SDMX Information Model

Introduction

System awareness of the SDMX Information Model, with components that support the relevant aspects of this model, will dramatically reduce both the resources and the time required to “go live”. The Information Model supports many use cases that form part of the statistical lifecycle.

Data and Metadata Structure Definitions

Data and metadata are structured, and this structure is defined in a structure definition (e.g. dimensionality, coding schemes, metadata attributes). The structure definition is pivotal to most applications that use or access SDMX data and metadata sources, such as data sets or databases. With it, it is possible to build components that can:

  • Import and export SDMX-formatted data or reference metadata to/from a database or file
  • Read and write SDMX-formatted files
  • Create database tables
  • Load databases (consume SDMX data sets and metadata sets)
  • Query databases and metadata repositories
  • Bring data and metadata together for dissemination or viewing
  • Visualize data in tables, graphs, maps, and charts
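As a rough illustration of the "Create database tables" capability, the sketch below derives a SQL `CREATE TABLE` statement from a simplified, in-memory description of a DSD. The `Dsd` class, the column types and the one-table layout are assumptions made for illustration; they are not defined by the SDMX standard or by any particular tool, and the example ids only loosely resemble an exchange-rate DSD.

```python
from dataclasses import dataclass, field


@dataclass
class Dsd:
    """Hypothetical, simplified view of a Data Structure Definition."""
    id: str
    dimensions: list[str]  # ordered Dimension ids forming the series key
    attributes: list[str] = field(default_factory=list)
    time_dimension: str = "TIME_PERIOD"
    primary_measure: str = "OBS_VALUE"


def create_table_sql(dsd: Dsd) -> str:
    """Build a CREATE TABLE statement with one column per DSD component.

    The Dimensions plus the time dimension form the primary key, so each
    observation can be stored only once.
    """
    cols = [f"  {d} VARCHAR(50) NOT NULL" for d in dsd.dimensions]
    cols.append(f"  {dsd.time_dimension} VARCHAR(20) NOT NULL")
    cols.append(f"  {dsd.primary_measure} DOUBLE PRECISION")
    cols += [f"  {a} VARCHAR(255)" for a in dsd.attributes]
    key = ", ".join(dsd.dimensions + [dsd.time_dimension])
    cols.append(f"  PRIMARY KEY ({key})")
    return f"CREATE TABLE {dsd.id} (\n" + ",\n".join(cols) + "\n);"


if __name__ == "__main__":
    # Illustrative ids only.
    exr = Dsd(
        "EXR_DSD",
        ["FREQ", "CURRENCY", "CURRENCY_DENOM", "EXR_TYPE"],
        ["OBS_STATUS"],
    )
    print(create_table_sql(exr))
```

Because every column is derived from the DSD, a new DSD yields a new table without hand-written schema code; a production tool would also add indexes and code-list lookup tables.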

Data and Metadata Flows and Categories

Dataflows and Metadataflows are the vital link between the structures that are used and the processes of reporting, collection, and dissemination. From a process perspective, a Dataflow represents in concrete form the type of data that is to be reported or disseminated, together with its technical rules (the DSD that specifies its structure, and any Constraints, described later, that apply) and its business rules (the organisations, known as “Data Providers”, that report this data). Note that in SDMX a Data Set is a container for data: it exists at the moment in time when the data is reported or disseminated. Its link to the structural metadata in the Information Model is the Dataflow. A similar mechanism exists for the Metadataflow.

These flows can themselves be categorised in many ways, in particular by a subject-domain Category Scheme. Such schemes contain broad subject categories, which can be hierarchical, and help users find data by drilling down. An example is shown below: a visualisation of the Category Scheme returned by the following REST call.

http://sdw-ws.ecb.europa.eu/CategoryScheme/ALL/
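A client can turn such a response into a drill-down tree. The sketch below assumes only that the SDMX-ML response nests elements whose local name is `Category`, each carrying an `id` attribute; it ignores XML namespaces so that it does not depend on a particular SDMX-ML version. Fetching the URL (for example with `urllib.request.urlopen`) is left out so the example runs offline, and the sample XML is invented.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace, e.g. '{urn:x}Category' -> 'Category'."""
    return tag.rsplit("}", 1)[-1]


def category_tree(elem: ET.Element) -> list:
    """Return (id, children) pairs for the nearest Category descendants."""
    tree = []
    for child in elem:
        if local_name(child.tag) == "Category":
            tree.append((child.get("id"), category_tree(child)))
        else:
            # Categories may sit inside wrapper elements such as CategoryScheme.
            tree.extend(category_tree(child))
    return tree


def print_tree(tree: list, depth: int = 0) -> None:
    """Indent each category under its parent, as a drill-down menu would."""
    for cat_id, children in tree:
        print("  " * depth + cat_id)
        print_tree(children, depth + 1)


# Invented sample response; a real one carries names, annotations, etc.
SAMPLE = """<Structure xmlns:s="urn:example:structure">
  <s:CategoryScheme id="SUBJECTS">
    <s:Category id="ECONOMY">
      <s:Category id="EXR"/>
      <s:Category id="PRICES"/>
    </s:Category>
    <s:Category id="MONEY"/>
  </s:CategoryScheme>
</Structure>"""

if __name__ == "__main__":
    print_tree(category_tree(ET.fromstring(SAMPLE)))
```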

Data Providers and Data Registration

The flows can be linked to the Data Providers that report the data or metadata. These Data Providers are maintained in an Organisation Scheme. One Data Provider can report data or metadata for many flows: a statistical agency typically reports many types of statistics (labour, population, education, national accounts, etc.). Conversely, an organisation collecting data typically has many Data Providers for any one flow. SDMX models this with the Provision Agreement, which links a flow to a Data Provider: each pairing of one Data Provider with one Dataflow or Metadataflow is one Provision Agreement.
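The many-to-many relationship between Data Providers and flows can be modelled by treating each Provision Agreement as one (Data Provider, flow) pair. The sketch below is a minimal, hypothetical model: the class name mirrors the Information Model term, but the code is not an SDMX library API, and the provider and flow ids are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisionAgreement:
    """One Data Provider linked to one Dataflow (or Metadataflow)."""
    data_provider: str
    flow: str

    @property
    def id(self) -> str:
        # Hypothetical naming convention, for display only.
        return f"{self.data_provider}_{self.flow}"


def flows_for_provider(pas, provider):
    """All flows that a given Data Provider reports."""
    return sorted({pa.flow for pa in pas if pa.data_provider == provider})


def providers_for_flow(pas, flow):
    """All Data Providers that report a given flow."""
    return sorted({pa.data_provider for pa in pas if pa.flow == flow})


agreements = {
    ProvisionAgreement("AGENCY_A", "LABOUR"),
    ProvisionAgreement("AGENCY_A", "POPULATION"),
    ProvisionAgreement("AGENCY_B", "LABOUR"),
}

if __name__ == "__main__":
    print(flows_for_provider(agreements, "AGENCY_A"))  # ['LABOUR', 'POPULATION']
    print(providers_for_flow(agreements, "LABOUR"))    # ['AGENCY_A', 'AGENCY_B']
```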

The Provision Agreement is an important object: not only does it link the real world of data reporting with the administrative world of data categorisation and structure, it is also linked to vital metadata that enables user applications to find data through registered data and metadata sources. Reporters or publishers of data and metadata can register its existence, which automatically links it to a Dataflow and thereby to a category and a structure.
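The effect of registration can be sketched as a chain of lookups: a registered source (here just a URL) is attached to a Provision Agreement, the agreement references a Dataflow, and the Dataflow is categorised, so the source becomes findable by category. The dictionary-based registry, the ids and the URLs below are hypothetical; a real SDMX Registry exposes this through its own registration and query interfaces.

```python
from collections import defaultdict

# Hypothetical Categorisations: Dataflow -> categories it is linked to.
categorisations = {
    "EXR": ["ECONOMY/EXCHANGE_RATES"],
    "LABOUR": ["SOCIETY/LABOUR"],
}
# Hypothetical Provision Agreements: id -> (Data Provider, Dataflow).
agreements = {"ECB_EXR": ("ECB", "EXR"), "NSI_LABOUR": ("NSI", "LABOUR")}

# Provision Agreement id -> registered source URLs.
registrations = defaultdict(list)


def register(pa_id: str, source_url: str) -> None:
    """Record that data for a Provision Agreement is available at source_url."""
    if pa_id not in agreements:
        raise KeyError(f"unknown provision agreement {pa_id}")
    registrations[pa_id].append(source_url)


def sources_for_category(category: str) -> list:
    """Registered sources whose Dataflow is linked to the given category."""
    found = []
    for pa_id, (_provider, flow) in agreements.items():
        if category in categorisations.get(flow, []):
            found.extend(registrations[pa_id])
    return found


register("ECB_EXR", "https://example.org/sdmx/exr")

if __name__ == "__main__":
    print(sources_for_category("ECONOMY/EXCHANGE_RATES"))
```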

Constraints

Constraints are metadata about data/metadata provisioning and about the data/metadata artefacts defined above (Data Providers, Dataflows, Provision Agreements, DSDs, MSDs) which restrict and define these artefacts. Specifying constraints gives richer semantics to data provisioning and to data/metadata structure artefacts, enabling more automated processing of the “information supply chain”.

Constraints specify subsets of key or target values, or of attribute values, that are contained in a Datasource, are to be provided for a Dataflow or Metadataflow, or are attached directly to a DSD or MSD. This is important metadata because the full range of possibilities implied by the DSD (e.g. the complete set of valid keys is the Cartesian product of the values in the code lists of all the Dimensions) is often larger than what is actually present in a specific Datasource, or than what is intended to be supplied for a specific Dataflow.
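The gap between the theoretical and the actual key space is easy to quantify. Assuming a DSD with three Dimensions and small, invented code lists, the set of valid keys is the Cartesian product of the code lists, computed here with `itertools.product`, while a hypothetical Datasource holds only a handful of those keys.

```python
from itertools import product
from math import prod

# Invented code lists for a DSD with three Dimensions.
code_lists = {
    "FREQ": ["A", "Q", "M"],
    "REF_AREA": ["DE", "FR", "IT", "ES"],
    "SEX": ["F", "M", "T"],
}

# Every key the DSD allows: one code from each list, in Dimension order.
all_keys = set(product(*code_lists.values()))

# Keys actually present in a hypothetical Datasource.
present_keys = {("A", "DE", "T"), ("A", "FR", "T"), ("Q", "DE", "T")}

# Size of the key space is the product of the code-list sizes: 3 * 4 * 3.
key_space_size = prod(len(codes) for codes in code_lists.values())

if __name__ == "__main__":
    print(f"{len(present_keys)} of {key_space_size} possible keys hold data")
```

Even this toy DSD allows 36 keys; with realistic code lists the product grows into millions, which is why a Constraint listing what is actually present is so valuable.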

Often a Data Provider will not be able to provide data for all key combinations, either because the combination itself is not meaningful, or simply because the provider does not have the data for that combination. In this case the Data Provider can constrain the Datasource (at the level of the Provision Agreement or the Data Provider) by supplying metadata that defines the key combinations or cube regions that are available. This is done by means of a Constraint.
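One way to picture a cube region is as a mapping from each Dimension to the set of codes it includes; a key lies inside the region when every one of its values is in the corresponding set. The sketch below follows that reading with invented Dimension ids and codes. Treating a Dimension that is absent from the region as unrestricted is an assumption of this sketch, and SDMX also allows regions that exclude values, which is not modelled here.

```python
# Hypothetical Dimension order from the DSD.
DIMENSIONS = ["FREQ", "REF_AREA", "SEX"]

# Hypothetical cube region supplied by a Data Provider: Dimension -> included codes.
region = {
    "FREQ": {"A", "Q"},
    "REF_AREA": {"DE", "FR"},
    # "SEX" is not listed, so this sketch treats it as unrestricted.
}


def in_region(key: tuple, region: dict) -> bool:
    """True if every key value is allowed by the region for its Dimension."""
    return all(
        dim not in region or value in region[dim]
        for dim, value in zip(DIMENSIONS, key)
    )


if __name__ == "__main__":
    for key in [("A", "DE", "T"), ("M", "DE", "T"), ("A", "IT", "F")]:
        print(key, in_region(key, region))
```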

Furthermore, it is often useful to define subsets or views of the DSD which restrict values in some code lists, especially where many such subsets restrict the same DSD. Such a view is specified by applying a Constraint to a Dataflow, and there can be many Dataflows that use the same DSD.
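A Dataflow-level Constraint can be sketched as a per-Dimension restriction applied on top of the DSD's code lists, which lets several Dataflows share one DSD while each exposes only a slice of it. The Dataflow ids, codes and the simple intersection logic below are invented for illustration.

```python
# Code lists of one hypothetical DSD shared by several Dataflows.
DSD_CODES = {
    "FREQ": ["A", "Q", "M", "D"],
    "CURRENCY": ["USD", "JPY", "GBP", "CHF"],
}

# Hypothetical Dataflow id -> restrictions on some Dimensions.
# Dimensions that are not mentioned keep the full DSD code list.
DATAFLOW_CONSTRAINTS = {
    "EXR_DAILY": {"FREQ": {"D"}},
    "EXR_MAJOR_MONTHLY": {"FREQ": {"M"}, "CURRENCY": {"USD", "JPY"}},
}


def view(dataflow: str) -> dict:
    """Code lists as seen through a Dataflow, keeping the DSD's code order."""
    limits = DATAFLOW_CONSTRAINTS.get(dataflow, {})
    return {
        dim: [c for c in codes if dim not in limits or c in limits[dim]]
        for dim, codes in DSD_CODES.items()
    }


if __name__ == "__main__":
    for flow in DATAFLOW_CONSTRAINTS:
        print(flow, view(flow))
```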

Whenever data is published or made available by a Data Provider, it must conform to a Dataflow (and hence to a DSD). The Dataflow is thus a means of enabling content-based processing.
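Conformance to a Dataflow can be checked mechanically before data is accepted. The sketch below, using invented ids and codes, rejects a series key that has the wrong number of values, uses a code outside the DSD's code lists, or falls outside a hypothetical Dataflow Constraint; a real validator would also check attributes, time formats and more.

```python
DIMENSIONS = ["FREQ", "CURRENCY"]
DSD_CODES = {"FREQ": {"A", "M", "D"}, "CURRENCY": {"USD", "JPY", "GBP"}}
DATAFLOW_CONSTRAINT = {"FREQ": {"D"}}  # hypothetical: this Dataflow is daily only


def validate_key(key: tuple) -> list:
    """Return a list of problems; an empty list means the key conforms."""
    if len(key) != len(DIMENSIONS):
        return [f"expected {len(DIMENSIONS)} key values, got {len(key)}"]
    problems = []
    for dim, value in zip(DIMENSIONS, key):
        if value not in DSD_CODES[dim]:
            problems.append(f"{dim}={value} is not in the DSD code list")
        elif dim in DATAFLOW_CONSTRAINT and value not in DATAFLOW_CONSTRAINT[dim]:
            problems.append(f"{dim}={value} is excluded by the Dataflow constraint")
    return problems


if __name__ == "__main__":
    for key in [("D", "USD"), ("M", "USD"), ("D", "EUR")]:
        print(key, validate_key(key) or "OK")
```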

Constraints can be extremely useful in a data visualisation system, such as dissemination of statistics on a website. In such a system a Cube Region can be used to specify the Dimension codes that actually exist in a datasource (these can be used to build relevant selection tables), and the Key Set can be used to specify the keys that exist in a datasource (these can be used to guide the user to select only those Dimension code values that will return data based on the Dimension values already selected).

The example below shows the currencies available for selection when the Exchange Rate Type chosen is “Spot”. Note that various euro-based currency indicators are not available for selection, as the spot rate is not relevant for them.
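The Key Set behaviour can be sketched as a filter: given the codes already chosen for some Dimensions, keep only the existing keys that match them and collect the codes that remain for the Dimension being selected. The keys are invented stand-ins for the exchange-rate case (`SPOT` and `AVG` are hypothetical Exchange Rate Type codes and `EUR_IND` a hypothetical euro-based indicator), not data from the ECB service.

```python
DIMENSIONS = ["CURRENCY", "EXR_TYPE"]

# Keys that actually hold data in the datasource (invented for illustration).
KEY_SET = {
    ("USD", "SPOT"), ("JPY", "SPOT"), ("GBP", "SPOT"),
    ("USD", "AVG"), ("EUR_IND", "AVG"),
}


def available_codes(target: str, selected: dict) -> set:
    """Codes of `target` that still return data, given the current selections."""
    idx = {dim: i for i, dim in enumerate(DIMENSIONS)}
    return {
        key[idx[target]]
        for key in KEY_SET
        if all(key[idx[dim]] == code for dim, code in selected.items())
    }


if __name__ == "__main__":
    # With the spot rate chosen, the euro-based indicator is not offered.
    print(sorted(available_codes("CURRENCY", {"EXR_TYPE": "SPOT"})))  # ['GBP', 'JPY', 'USD']
```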

Data and Metadata Discovery

Users and applications can discover the existence of data by querying a repository of structural information, such as an SDMX Registry. They can (1) drill down via categories, select a Dataflow (which, in a dissemination system, can be represented as a lower level of the category hierarchy), access the structure, and make a refined query based on dimension selections (e.g. a subset of dimension values such as countries, age range, sex); (2) be informed which data or metadata sources have the information required; and then, without leaving the query application, (3) direct the query to the database containing the information.
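The step of being informed which sources hold the requested data can be sketched by combining registered sources with the cube regions their providers have declared. The URLs, Dimension ids and codes below are hypothetical; in practice this information would come from an SDMX Registry query rather than an in-memory dictionary.

```python
# Hypothetical registered sources for one Dataflow, each with the cube region
# (Dimension -> included codes) that its provider has declared.
SOURCES = {
    "https://stats.example.org/sdmx": {"REF_AREA": {"DE", "FR"}, "SEX": {"F", "M"}},
    "https://data.example.net/sdmx": {"REF_AREA": {"IT"}, "SEX": {"F", "M", "T"}},
}


def sources_with_data(query: dict) -> list:
    """URLs whose region overlaps the query on every Dimension the query restricts.

    For an include-only cube region this per-Dimension overlap test is exact:
    if every restricted Dimension overlaps, at least one key lies in both.
    """
    matches = []
    for url, region in SOURCES.items():
        if all(
            dim not in region or bool(region[dim] & codes)
            for dim, codes in query.items()
        ):
            matches.append(url)
    return matches


if __name__ == "__main__":
    print(sources_with_data({"REF_AREA": {"DE", "IT"}, "SEX": {"T"}}))
```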

Registered data and metadata sources can also be used by applications that automate the reporting process: instead of data and metadata being pushed, their existence is registered, and these applications can then pull the data into their systems.

The Information Model in Action

The examples below show how a real system has implemented the SDMX standards to provide query and data visualisation facilities for any database that supports SDMX query messages. These examples cover many of the use cases described above.

In the demonstration below we queried the ECB SDMX Web Service and replicated a subset of its structural metadata and data in order to simulate the whole process of loading new data into a data dissemination system and querying it.

In the demonstration we queried the ECB SDMX Web Service for the Category Scheme and linked Dataflows (a “list” of data “topics” that lets a user “drill down” to find the data), and for the Exchange Rates DSD. The DSD is used to create a database using the Database Support Tool. These structural metadata are also stored in an SDMX Registry. We then queried the ECB SDMX Web Service to extract a subset of the data and saved it to the database. This database, which also supports SDMX queries, is then used to provide selections that use the Constraints engine in the Database Support Tool.

The structural metadata queried from the ECB Web Service is depicted schematically below.

This is the starting point of the process flow depicted below.

  1. Create Database Tables: the database tables can be created automatically, directly from a Data Structure Definition (DSD), as the DSD contains all of the information required to create tables that enable efficient data loading and querying.
  2. Load Database

     

    Whilst the Database Support Tool can run automatically in the background, here it is invoked from an add-in on the Metadata Technology Registry.

    In this tool, the database tables are created before the data set is loaded if they do not already exist. These tables are created from the information in the DSD (in SDMX this is called structural metadata).

  3. Data Discovery: this is achieved by a user system interacting with an SDMX-compliant queryable resource, such as an SDMX Registry, to retrieve structural metadata. The first query on the Registry retrieves the Category Scheme and the links to any Dataflows (linked by Categorisations in the Information Model). These are used to present the user with selection choices, as shown below.

    The user selects the Dataflow (Exchange Rates), and the Registry is queried for the DSD associated with it. The contents of the DSD are displayed to the user as selection screens.

    Selections are made from the code list for each of the Dimensions. Constraints are used to blank out any code value that would not return data, because there is no data for that Dimension value given the values already selected for the other Dimensions. This is a very useful feature, as it guides the user to make selections that return data.

  4. Query Database: based on the user selections, an SDMX-ML query is built and submitted to the web service for the database. The database responds with an SDMX-ML data set.
  5. Visualise Data: the data set, combined with the DSD structural metadata (which contains, for example, the code labels associated with the Dimension values that form the key of the observations), enables an application to render data as graphs, charts, tables, and maps in a meaningful way.
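Step 4 of the walkthrough builds an SDMX-ML query message. As an alternative illustration, the sketch below turns user selections into the series-key path of the SDMX RESTful web services API, where Dimension values are separated by dots, alternative codes are joined with `+`, and an empty position acts as a wildcard. The base URL, the Dataflow id, the Dimension order and the codes are assumptions; in a real client the Dimension order is read from the DSD.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed Dimension order, as it would be read from the DSD.
DIMENSIONS = ["FREQ", "CURRENCY", "CURRENCY_DENOM", "EXR_TYPE", "EXR_SUFFIX"]


def series_key(selections: dict) -> str:
    """Dot-separated key; '+' joins alternatives, an empty slot is a wildcard."""
    return ".".join("+".join(selections.get(dim, [])) for dim in DIMENSIONS)


def data_url(base: str, flow: str, selections: dict,
             start: Optional[str] = None) -> str:
    """Build a data query URL of the form {base}/data/{flow}/{key}."""
    url = f"{base.rstrip('/')}/data/{flow}/{series_key(selections)}"
    if start:
        url += "?" + urlencode({"startPeriod": start})
    return url


if __name__ == "__main__":
    # Hypothetical selections made on the selection screens.
    picks = {
        "FREQ": ["D"],
        "CURRENCY": ["USD", "JPY"],
        "CURRENCY_DENOM": ["EUR"],
        "EXR_TYPE": ["SP00"],
    }
    print(data_url("https://sdmx.example.org/rest", "EXR", picks, "2020-01"))
```

Leaving `EXR_SUFFIX` unselected produces a trailing empty position, which the REST key syntax reads as "all values" for that Dimension.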

Summary

This whole system works with any SDMX-compliant database and structural metadata resource. Often the two resources are combined in a single web service. This means that a user can discover and view data from a variety of data sources using a single user interface for query selection and visualisation.
