SDMX

Statistical Data and Metadata Exchange (SDMX) is a global standard, currently available in three backward-compatible versions: 1.0, 2.0 and 2.1. As of April 2011, version 2.1 is at the end of its public review stage. Within ISO, SDMX is governed by Working Group 2 of Technical Committee 154 (ISO/TC 154). Version 1.0 of SDMX is published as ISO/TS 17369. The standard is published at SDMX.org.

Problem Space

  • The collection, processing, and exchange of statistical data is a time-consuming and resource-intensive process.
  • There is growing demand for public access to public data, and a growing willingness among governments and other organisations to provide it.
  • Various international and national organisations are responding to these challenges and at the same time are seeking to gain benefits from interoperability and cross-system coherence.

Resolution Premise

The Statistical Data and Metadata Exchange (SDMX) initiative addresses these challenges and opportunities within the Problem Space:

  • By focusing on business practices in the field of statistical information.
  • By identifying more efficient processes for exchange and sharing of data and metadata using modern technology.

Sponsors

SDMX is an initiative sponsored by seven international organizations:

  • Bank for International Settlements
  • European Central Bank
  • Eurostat
  • International Monetary Fund
  • Organisation for Economic Cooperation and Development
  • United Nations
  • World Bank

The initiative was launched in 2001.

Endorsements/Recommendations

SDMX has been officially recommended:

  • February 2007: SDMX endorsed by the European Union’s Statistical Programme Committee (now European Statistical System Committee)
  • March 2008: UN Statistical Commission declares SDMX to be the preferred standard for data and metadata exchange

Adopters

The following are known adopters:

  • US Federal Reserve Board
  • Federal Reserve Bank of New York
  • European Central Bank and the European System of Central Banks
  • Joint External Debt Hub (WB, IMF, OECD, BIS)
  • UN/TRADECOM at UN Statistical Division
  • European Statistical System (Eurostat and National Statistical Institutes of the EU)
  • Mexican Federal System
  • IMF
  • Food and Agriculture Organization of the UN
  • Millennium Development Goals (UN System, others)
  • Bank for International Settlements
  • OECD
  • World Bank
  • UNESCO (Education)
  • Australian Bureau of Statistics
  • International Energy Agency
  • World Health Organization
  • There are many others

For Data Consumers

Imagine being able to find, query, and view data directly from the databases of the data producers using a single software tool. There is no longer any need to visit each website independently and work with its own navigation, query mechanism, and viewing options. Instead, a single tool helps you find the data, query the producer’s database, and view the results as graphs, charts and tables (often some of these searching and viewing options are not available on the producer’s website).

Too good to be true? Not so with SDMX. Many organisations, such as the IMF, ECB and OECD, have launched web services that can be queried using SDMX standards.
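
As a rough illustration of what such a query can look like, the sketch below builds a data request URL in the style of the SDMX 2.1 RESTful interface, where a dataflow identifier and a dot-separated series key form the path. The base URL, the dataflow `EXR`, and the dimension values are placeholders chosen for illustration, not an endpoint taken from this page; each provider documents its own entry point and supported parameters.

```python
from urllib.parse import quote, urlencode


def build_data_query(base_url, dataflow, key_parts, start_period=None, end_period=None):
    """Build a REST-style SDMX data query URL (illustrative sketch).

    key_parts holds one entry per dimension, in the order defined by the
    Data Structure Definition. An empty string acts as a wildcard, and a
    list of values is joined with '+' to mean "any of these".
    """
    encoded = []
    for part in key_parts:
        values = part if isinstance(part, (list, tuple)) else [part]
        encoded.append("+".join(quote(v, safe="") for v in values))
    key = ".".join(encoded)

    url = f"{base_url.rstrip('/')}/data/{quote(dataflow, safe='')}/{key}"
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint and key: monthly USD and GBP series against EUR.
example_url = build_data_query(
    "https://stats.example.org/rest",
    "EXR",
    ["M", ["USD", "GBP"], "EUR"],
    start_period="2010-01",
    end_period="2010-12",
)
print(example_url)
```

The same function works for any dataflow, because the only thing that changes between subject areas is the list of dimension values making up the key.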

For Data Collectors

Imagine being able to eliminate the many and varied formats in which data and reference metadata are reported. Imagine being able to load the reported data into a database immediately, so that it can be viewed in a meaningful way to assist the validation and cleaning stage. Imagine sharing the data and reference metadata collection with your statistical partners so that they collect from their community and you collect from yours, and then you share the data.

This can all be achieved by describing the data and reference metadata structures in SDMX and using the SDMX data and reference metadata formats for reporting and exchange.
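
A minimal sketch of the idea: once the structure of a collection is described, an incoming record can be checked against it before loading. The dictionary layout, dimension names, and codes below are hypothetical stand-ins for a Data Structure Definition and its code lists, not SDMX-ML itself.

```python
# Hypothetical structure: each dimension maps to its set of allowed codes.
STRUCTURE = {
    "FREQ": {"A", "Q", "M"},
    "REF_AREA": {"UK", "FR", "DE"},
    "SEX": {"M", "F", "T"},
}


def validate_record(record, structure):
    """Return a list of problems found in one reported record."""
    problems = []
    for dimension, allowed in structure.items():
        if dimension not in record:
            problems.append(f"missing dimension {dimension}")
        elif record[dimension] not in allowed:
            problems.append(f"invalid code {record[dimension]!r} for {dimension}")
    # In this sketch the observation value must be present and numeric.
    try:
        float(record.get("OBS_VALUE"))
    except (TypeError, ValueError):
        problems.append("OBS_VALUE is missing or not numeric")
    return problems


good = {"FREQ": "A", "REF_AREA": "UK", "SEX": "M", "OBS_VALUE": "23000000"}
bad = {"FREQ": "A", "REF_AREA": "XX", "OBS_VALUE": "n/a"}

print(validate_record(good, STRUCTURE))  # no problems reported
print(validate_record(bad, STRUCTURE))   # three problems reported
```

Because the checks are driven entirely by the structure description, adding a new collection means supplying a new structure rather than writing new validation code.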

For Data Reporters

Imagine being able to report the same data in the same format to all of the organisations that require your data. Even better, imagine not having to send the data at all. After all, the World Wide Web works on the principle of publishing and discovery. SDMX supports this paradigm too: make the data available either as an SDMX file at a URL or as a web service that can be queried, and publish this fact in an SDMX Registry. Interested parties can then discover the data, or be informed automatically by the SDMX Subscription/Notification service.

For Data Publishers

Imagine being able to build a web dissemination service that can support any type of data and reference metadata, and bring these together in a dissemination environment. Imagine being able to provide this without needing to change any software when new data are added. Imagine being able to create and load automatically the database that is used for the web dissemination service. Imagine being able to open this database to direct queries from applications outside the organisation, so that data is available to all.

All this is possible and extremely practical using SDMX. In fact, web dissemination is an increasingly popular use of the SDMX standards. With SDMX structural metadata such as a Data Structure Definition, the SDMX Web Service standards, and the SDMX standard data and reference metadata formats, such systems are easy to build and require very little maintenance effort.

Exchange Patterns

The diagrams below show the three main models of data exchange: Bilateral, Gateway and Data-Sharing. Bilateral exchange demands a lot of co-ordination even with standard formats; often there are no standard formats, which increases the complexity and the resources required and can adversely affect the timeliness of publication. The other two models make for more efficient and timely reporting of data and reference metadata, but they only work with well-defined standards.

Bilateral Exchange

[Diagram: Bilateral Exchange]

Institutions exchange data according to bilateral agreements regarding format, timing, protocols, etc.

This is a poor method because it is inefficient (note the large number of arrows) and because the agreements are not necessarily widely or consistently adopted.
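
The cost of the arrows can be made concrete with a small count. This sketch assumes the worst case, in which every pair of institutions maintains its own agreement, and compares it with an arrangement where each institution connects to a single shared point. Both functions are illustrative arithmetic, not part of SDMX.

```python
def bilateral_links(n):
    """Agreements needed when every pair of n institutions exchanges directly."""
    return n * (n - 1) // 2


def hub_links(n):
    """Connections needed when each of n institutions links to one shared point."""
    return n


for n in (5, 10, 50):
    print(n, bilateral_links(n), hub_links(n))
```

Under these assumptions the number of bilateral agreements grows with the square of the number of institutions, while the shared arrangement grows only linearly.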

Gateway Exchange

[Diagram: Gateway Exchange]

Institutions share the data they collect with their peers, in agreed formats among counterparty communities.

It is easier to harmonize codes, structures and meanings in this co-operative arrangement.

Data-Sharing Exchange

[Diagram: Data-Sharing Exchange]

This model demands standardised formats and protocols.

SDMX has the concept of a Registry/Repository (the middle barrel), where structures and other definitions are submitted and maintained. These are accessible to any user (subject to the access control mechanism of the Registry) and can be referenced and shared. Data providers/reporters on the left can submit content or register data and reference metadata sources, whilst the data consumers and data collectors on the right can obtain notifications of new content and changes to existing content. Note that the Registry does not contain any actual data set or metadata set; it provides a way for reporting applications to register these and for consuming applications to discover them.
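
The division of labour described here can be sketched in a few lines. The `Registry` class, its method names, and the URL below are hypothetical and exist only to show the pattern: the registry records where data can be found and notifies subscribers, but never stores the data itself.

```python
class Registry:
    """Toy registry: stores where data can be found, never the data itself."""

    def __init__(self):
        self._sources = {}      # dataflow id -> list of source URLs
        self._subscribers = {}  # dataflow id -> list of callbacks

    def subscribe(self, dataflow, callback):
        """Ask to be notified whenever a source is registered for dataflow."""
        self._subscribers.setdefault(dataflow, []).append(callback)

    def register_source(self, dataflow, url):
        """Record a data source location and notify subscribers."""
        self._sources.setdefault(dataflow, []).append(url)
        for callback in self._subscribers.get(dataflow, []):
            callback(dataflow, url)

    def discover(self, dataflow):
        """Return the registered source URLs for dataflow."""
        return list(self._sources.get(dataflow, []))


notifications = []
registry = Registry()

# A consumer subscribes first, then a reporter registers a source.
registry.subscribe("POPULATION", lambda flow, url: notifications.append((flow, url)))
registry.register_source("POPULATION", "https://stats.example.org/population.xml")

print(registry.discover("POPULATION"))
print(notifications)
```

A consumer that was notified would then fetch the data directly from the registered URL, which keeps the registry small and leaves the data under the control of its provider.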


Glossary of Key Terms

The following are the main terms that either map to elements of the SDMX Model or relate to the use and understanding of the model. Note that whilst care has been taken to ensure that the descriptions of these terms do not conflict with the same terms in the SDMX Metadata Common Vocabulary (MCV), the description here may (intentionally) differ from the description in the MCV.

  • Observation Value

    This is the value of the observed phenomenon as computed during the statistical data compilation process. This could be a population count, a cost, or indeed any kind of measurement.

  • Dimension

    A statistical concept used (most probably together with other statistical concepts) to identify a statistical series, such as a time series, e.g. a statistical concept indicating a certain economic activity or a geographical reference area. Observation Values are qualified by Dimensions: together, the Dimensions identify the Observation Value. For example, if it is said that “in the 2011 Census, the population of working men in the UK is 23 million”, then this might imply the dimensions are: time (2011), type (population), status (working), sex (men), country (UK). In visual terms these are often represented by the axes of a graph or table of values.

  • Key

    A single Observation Value is identified by a specific value for each of the Dimensions, a co-ordinate in the Cube, and together these values are called the Key of the Observation Value.

  • Data Cube (Multi-Dimensional Cube) and Sparse Cube

    A Cube is usually a subset of the full contents of available data whose (multi) dimensionality is defined by the Data Structure Definition. Whilst “cube” may imply three dimensions, it is really “multi-dimensional”. Sometimes large portions of this Cube can be empty because there is no data for specific keys; this is known as a Sparsely Populated Cube.

  • Time Series/Cross-sectional

    Data comprises a set of ordered observations on a quantitative characteristic of an individual or collective phenomenon, usually taken at different points of time. If the ordered set of observations is represented according to time then it is known as a Time Series. If it is represented according to the value of another Dimension (such as Country) then it is often referred to as Cross-sectional.

  • CodeList

    A list from which some statistical concepts (coded concepts) take their values when used in a structure, such as a Data Structure Definition. For example, the Dimension “sex” may use a code list with codes representing a language-independent identity for each of “Male” and “Female”.

  • Concept

    A concept is a unit of knowledge created by a unique combination of characteristics. In SDMX the “concept” is used as a building block for the data and metadata structure definitions. For example, the Concept of “country” can be used as a Dimension to give context and meaning to that Dimension. Concepts are maintained (in Concept Schemes) and are independent of any role that they may play in a structure definition. Thus a Concept can be used by other Dimensions in other Data Structure Definitions.

  • Attribute

    Attributes are additional metadata about Observation Values, such as Unit and Unit Multiplier. They generally include anything that isn’t a dimension but that adds to the context or interpretation of Observation Values.

  • Data Structure Definition (also known as KeyFamily)

    A Data Structure Definition is a logical description of the structure of a Cube. This description includes Dimensions, Attributes and Measures, each of which can have a coded, date/time, text, numeric or other representation. The Data Structure Definition enables data (in a data set) to be understood and tabulated.

  • Dataset

    A Dataset is an organised collection of data and associated metadata structured according to a known Data Structure Definition. It is the contents or some sub-contents of a Cube.

  • Metadata

    Metadata defines and describes other data. In SDMX there are two types of metadata: structural metadata defines things that are required to make the technical standards work, such as Dimension, Code List, Data Provider; reference metadata describes the contents and the quality of the statistical data.

  • Agency

    An agency (maintenance agency) is an organisation or other expert body that maintains SDMX structural metadata. In SDMX there can be a hierarchy of agencies, allowing an SDMX-recognised agency to administer its own agency schemes. All SDMX constructs are “owned” by an agency, and only that agency can amend or delete the construct.

  • Dataflow

    A Dataflow identifies the type of data that is to be reported or disseminated from a process perspective. It references a Data Structure Definition which specifies the dimensionality, code lists etc., and can be linked to Data Providers that report or publish this data, usually according to a known time schedule (often known as a “release schedule”).

  • Category

    A Category is an item at any level within a scheme that is used to categorise other objects, based on characteristics which the objects have in common. An example is a subject-matter domain scheme where the categories are used to categorise statistics in divisions such as labour, education, tourism, finance.

  • Provision

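    A Provision (Provision Agreement) links a Data Provider to a Dataflow, recording that the provider reports or publishes data for that Dataflow.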

  • Subscription

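    A Subscription is a request, lodged with an SDMX Registry, to be notified automatically of new content or of changes to existing content.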

  • Constraint

    A Constraint specifies actual content or valid content of a data or reference metadata source. It constrains the content implied by the Data Structure Definition or Metadata Structure Definition in terms of the actual code values (e.g. Dimension values) present or expected, or the actual or valid set of keys that are present or expected. 

  • Web Services

    A Web Service is a mechanism whereby a computer program can send a request to a web-based server and await a response, which may be returned asynchronously. It is the way that computer systems can be distributed amongst many servers across the internet, and how a computer system can provide one or more clearly defined services for use by other, external systems. Typically these services communicate over HTTP, exchanging XML messages through REST or SOAP interfaces.

  • XML, XML Schema and SDMX-ML

    XML is a general language that supports the construction of other languages using the same general syntax. XML Schema is an XML language that allows for the rules and lexicon of other XML languages to be defined in a machine-verifiable way. SDMX-ML is an XML language, which is defined by SDMX Schemas, allowing for structural metadata, data and metadata sets, and web service syntax to be defined according to the SDMX standard.
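
Several of the glossary terms above fit together in a way that is easier to see in a small example. The sketch below is a toy model, assuming three hypothetical dimensions and invented figures (only the 23 million for UK men in 2011 echoes the Dimension example): each Key is a tuple with one value per Dimension, a data set maps Keys to Observation Values, and a Time Series or Cross-section is a slice of that Cube along one Dimension.

```python
from itertools import product

# Hypothetical structure: dimension order and the codes each may take.
DIMENSIONS = {
    "REF_AREA": ["UK", "FR"],
    "SEX": ["M", "F"],
    "TIME": ["2010", "2011"],
}

# A data set maps each Key (one value per dimension) to an Observation Value,
# here in millions. The figures are invented for illustration.
dataset = {
    ("UK", "M", "2010"): 22.8,
    ("UK", "M", "2011"): 23.0,
    ("FR", "M", "2011"): 21.5,
}


def sparsity(data, dimensions):
    """Fraction of possible keys in the cube that hold no observation."""
    all_keys = set(product(*dimensions.values()))
    return 1 - len(set(data) & all_keys) / len(all_keys)


def slice_along(data, dimensions, vary, fixed):
    """Observations ordered along one dimension, with the others held fixed."""
    names = list(dimensions)
    result = []
    for code in dimensions[vary]:
        key = tuple(code if name == vary else fixed[name] for name in names)
        if key in data:
            result.append((code, data[key]))
    return result


# Time Series: vary TIME for UK men.
time_series = slice_along(dataset, DIMENSIONS, "TIME", {"REF_AREA": "UK", "SEX": "M"})
# Cross-section: vary REF_AREA for men in 2011.
cross_section = slice_along(dataset, DIMENSIONS, "REF_AREA", {"SEX": "M", "TIME": "2011"})

print(time_series)
print(cross_section)
print(sparsity(dataset, DIMENSIONS))
```

Storing only the populated Keys, as the dictionary does here, is one common way to handle a Sparsely Populated Cube without reserving space for every possible combination.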

SDMX-ML is an XML language that is defined by XML Schemas to allow SDMX definitions, queries and responses to be constructed and exchanged as files or data streams.

Each version of SDMX (1.0, 2.0, 2.1) has its own set of XML Schemas, which specify the syntax and semantics of these files and can be used to validate such files.

These standard formats enable computer applications to intercommunicate and share information in a consistent way. However, it is important to understand that SDMX-ML and SDMX-EDI are technical formats for exchanging information between systems: they are not appropriate for the actual processing of the data. Whilst the SDMX-ML schemas for version 2.1 take advantage of the inheritance constructs in the XML Schema language, and therefore mirror the inheritance in the SDMX Information Model, processing the content of the XML is better served by objects and interfaces that can take advantage of the real semantics of the SDMX Information Model.

The SDMX Information Model is pivotal to an SDMX system. The constructs in the SDMX-ML schemas are derived from the SDMX Information Model. Applications that are built using the SDMX Information Model can easily consume SDMX-ML instances (such as a Data Structure Definition) and process them in a way that takes advantage of the semantics of the Information Model. Such objects can be built so that all versions of SDMX are supported in a single implementation of the Information Model. This makes it very easy to read an SDMX-ML construct in one version of SDMX-ML (e.g. v2.0) and write it out in another version (e.g. v2.1).

A standard version-independent implementation model can be used as the basis of all sorts of derivative SDMX-enabled computer systems, allowing rapid development without each system either having to re-engineer the same solution or having to handle each version of SDMX-ML separately.
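
A minimal sketch of this approach, under loud assumptions: the `CodeList` class is a hypothetical, heavily simplified model object, and the two XML layouts are namespace-free stand-ins that would not validate against the actual SDMX-ML schemas. The point is only the shape of the design, in which readers and writers are version-specific while the object in the middle is not.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CodeList:
    """Version-independent model object (simplified, hypothetical)."""
    id: str
    codes: dict = field(default_factory=dict)  # code id -> description


# Toy element layouts per version. These are NOT the real SDMX-ML schemas.
LAYOUTS = {
    "2.0": {"root": "CodeList", "item": "Code", "id": "value", "text": "Description"},
    "2.1": {"root": "Codelist", "item": "Code", "id": "id", "text": "Name"},
}


def read_codelist(xml_text, version):
    """Parse one toy layout into the version-independent model."""
    layout = LAYOUTS[version]
    root = ET.fromstring(xml_text)
    if root.tag != layout["root"]:
        raise ValueError(f"unexpected root element {root.tag!r} for version {version}")
    codes = {}
    for item in root.findall(layout["item"]):
        codes[item.get(layout["id"])] = item.findtext(layout["text"])
    return CodeList(id=root.get("id"), codes=codes)


def write_codelist(codelist, version):
    """Serialise the model into the toy layout for the requested version."""
    layout = LAYOUTS[version]
    root = ET.Element(layout["root"], {"id": codelist.id})
    for code_id, description in codelist.codes.items():
        item = ET.SubElement(root, layout["item"], {layout["id"]: code_id})
        ET.SubElement(item, layout["text"]).text = description
    return ET.tostring(root, encoding="unicode")


v20 = (
    '<CodeList id="CL_SEX">'
    '<Code value="M"><Description>Male</Description></Code>'
    '<Code value="F"><Description>Female</Description></Code>'
    "</CodeList>"
)

model = read_codelist(v20, "2.0")   # read one version
v21 = write_codelist(model, "2.1")  # write another
print(v21)
```

Supporting a further version in this design means adding one more layout (or reader/writer pair), while any code that works with `CodeList` objects stays unchanged.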

The Information Model supports many use cases that form a part of the statistical lifecycle.
