Data and Metadata Structure Design

Data structures

The SDMX standards allow the logical structure of the data held in data sources to be defined, and these structures to be mapped to the logical format required by the data collector or web service. These logical structures are called data structure definitions. There is one data structure definition for each distinct structure of data, which usually means one per domain or "type" of data, such as demographics, national accounts, tourism, electricity consumption or sales. Data structure definitions are at the centre of the SDMX data standards. The underlying notion has been in use within the statistics and central banking community for 15 years; SDMX took the concept and created an XML schema to describe a data structure.

A data structure definition describes the logical structure of a data set or database. Schemas derived from this logical structure are used for data gathering and dissemination, for integration with existing systems and databases, and for mapping to other standards such as DDI and XBRL. Given a data structure definition, it is straightforward to exchange a data set, query a database, register the existence of data in a registry, or query a registry, because all of these operations are based on the data structure definition. It does not matter whether the data are economic, social, demographic, health, financial, sales or spending data: all of them can be described in a common way using a common meta-structure. This means that common tools can be developed to process queries, to transform and merge data sets, and to register data.
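
Because every data set is described by the same kind of structure, a query tool only needs to know the order of the Dimensions, not what the data mean. The minimal Python sketch below illustrates such a generic tool. The dot-separated key syntax (an empty position matches any code), the dimension names and the made-up values are all assumptions for illustration, not details taken from the SDMX standard as described here.

```python
from typing import Dict, List, Tuple

# Hypothetical Dimension order taken from a data structure definition.
DIMENSIONS: List[str] = ["FREQ", "REF_AREA", "INDICATOR"]

# A tiny in-memory "database": series key -> {period: value}. Values are made up.
DATA: Dict[Tuple[str, ...], Dict[str, float]] = {
    ("M", "DE", "SALES"): {"2006-01": 100.0, "2006-02": 101.5},
    ("M", "FR", "SALES"): {"2006-01": 98.2},
    ("A", "DE", "SALES"): {"2005": 1200.0},
}


def parse_key(pattern: str, dimensions: List[str]) -> List[str]:
    """Split a dot-separated key pattern; an empty position is a wildcard."""
    parts = pattern.split(".")
    if len(parts) != len(dimensions):
        raise ValueError(f"expected {len(dimensions)} positions, got {len(parts)}")
    return parts


def query(pattern: str, dimensions: List[str],
          data: Dict[Tuple[str, ...], Dict[str, float]]) -> List[Tuple[str, ...]]:
    """Return the matching series keys; nothing here depends on the subject matter."""
    wanted = parse_key(pattern, dimensions)
    return [key for key in data
            if all(w == "" or w == code for w, code in zip(wanted, key))]


print(query("M..SALES", DIMENSIONS, DATA))  # [('M', 'DE', 'SALES'), ('M', 'FR', 'SALES')]
```

The same `query` function would work unchanged for tourism or national accounts data, given that structure's own Dimension list.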

The design of good data structures is analogous to the design of good relational databases: each component of the multi-dimensional key of the data should do just one job and identify just one type of object in the multi-dimensional structure. This in turn leads to the development and use of code lists that each codify a single semantic, which in SDMX terms is called a "concept". The data structure technique leads to flexible and robust database design, and enables generic, re-usable software to be developed. For instance, a data structure definition expressed in XML (specifically in SDMX-ML) can drive data warehouse loading, retrieval and transformation without the software needing to know the type or precise structure of the data in advance.
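
When each key position carries exactly one concept with its own code list, generic software can validate any key using nothing but the structure definition. The Python sketch below illustrates this; the concept names, codes and the `validate_key` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical code lists: each one codifies a single concept.
CODE_LISTS: Dict[str, Dict[str, str]] = {
    "FREQ": {"A": "Annual", "Q": "Quarterly", "M": "Monthly"},
    "REF_AREA": {"DE": "Germany", "FR": "France", "IT": "Italy"},
}

# The structure names, in order, the concept behind each key position.
KEY_CONCEPTS: List[str] = ["FREQ", "REF_AREA"]


def validate_key(key: List[str]) -> List[str]:
    """Return a list of problems with a key; an empty list means it is valid."""
    if len(key) != len(KEY_CONCEPTS):
        return [f"expected {len(KEY_CONCEPTS)} codes, got {len(key)}"]
    problems = []
    for concept, code in zip(KEY_CONCEPTS, key):
        if code not in CODE_LISTS[concept]:
            problems.append(f"{code!r} is not a valid {concept} code")
    return problems


print(validate_key(["M", "DE"]))  # []
print(validate_key(["DE", "M"]))  # two problems: codes in the wrong positions
```

Had the design instead used a single dimension with combined codes such as "M_DE", its code list would have to enumerate every combination, and neither frequency nor area could be validated or queried on its own.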

A Data Structure comprises three basic components:

  1. The Dimensions, which together form the Data Key. Each Dimension is a Concept whose value in a data set is defined by the Representation of that Concept. In a data set that uses the Data Structure, the value of a Dimension, combined with the values of all the other Dimensions, uniquely identifies a set of observations (that is, it forms their key). It is usual, but not obligatory, to define "Time" as one of the Dimensions, as most observations are measured over time.
  2. The Measures, each of which is a Concept whose valid values are defined by its Representation. In a data set using the Data Structure there is an observation value for each Measure for each key in the data set.
  3. The Attributes, each of which is a Concept whose valid values are defined by its Representation. An Attribute value in a data set gives additional information (metadata) about the data set, one of its keys, or one of its observations, such as the measurement unit of an observation.
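
The three kinds of component can be modelled directly. In the minimal Python sketch below, a Representation is simplified to an optional set of allowed codes; the class names, concept names and codes are assumptions made for illustration rather than the SDMX information model itself.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Component:
    """A Dimension, Measure or Attribute: a Concept plus its Representation."""
    concept: str
    codes: FrozenSet[str] = frozenset()  # allowed values; empty means unrestricted


@dataclass
class DataStructure:
    dimensions: List[Component]  # together form the Data Key
    measures: List[Component]    # one observation value per Measure per key
    attributes: List[Component]  # additional information, e.g. unit of measure

    def check(self, key: Tuple[str, ...], values: Dict[str, float],
              attrs: Dict[str, str]) -> None:
        """Raise ValueError if one keyed observation does not fit this structure."""
        if len(key) != len(self.dimensions):
            raise ValueError("one value is needed for every Dimension")
        for dim, code in zip(self.dimensions, key):
            if dim.codes and code not in dim.codes:
                raise ValueError(f"{code!r} is not valid for {dim.concept}")
        missing = {m.concept for m in self.measures} - set(values)
        if missing:
            raise ValueError(f"no value for Measure(s) {sorted(missing)}")
        known = {a.concept: a for a in self.attributes}
        for name, value in attrs.items():
            if name not in known:
                raise ValueError(f"{name!r} is not an Attribute of this structure")
            if known[name].codes and value not in known[name].codes:
                raise ValueError(f"{value!r} is not valid for {name}")


# A hypothetical sales structure; all concept names and codes are illustrative.
SALES = DataStructure(
    dimensions=[Component("FREQ", frozenset({"M", "Q"})),
                Component("REF_AREA", frozenset({"DE", "FR"})),
                Component("TIME_PERIOD")],  # "Time" as the last Dimension
    measures=[Component("OBS_VALUE")],
    attributes=[Component("UNIT_MEASURE", frozenset({"EUR", "USD"}))],
)

SALES.check(("M", "DE", "2006-01"), {"OBS_VALUE": 123.4}, {"UNIT_MEASURE": "EUR"})
```

The `check` method knows nothing about sales; any structure built from the same three component lists can be checked the same way.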

If an application has access to the Data Structure, then all of the semantics of the data set are available to it. SDMX defines a single XML format (and, for EDIFACT users, a single EDI format) for the definition of the Data Structure. SDMX also defines a single EDI format for the contents of a data set, and a variety of XML formats to support the differing requirements placed on data sets. For a given Data Structure, each of these data set formats is predictable and can be generated automatically from the Data Structure. Furthermore, because the semantics of the data are the same in every format, one format can be transformed into another by a simple process. This idea of a set of data format standards driven by a meta-structure (the Data Structure) is one of the great strengths of SDMX, and gives it the ability to describe and support any type of data.
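
To illustrate why formats derived from the same structure are predictable and interconvertible, the Python sketch below derives a flat, one-row-per-observation CSV layout from a structure's component lists and converts to and from a series-oriented layout. Both layouts are simplified stand-ins invented for this example, not the SDMX-ML or SDMX-EDI formats, and the component names are hypothetical; the sketch also assumes time is the last Dimension.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical structure: Dimensions (time last), Measures and Attributes.
DIMENSIONS: List[str] = ["FREQ", "REF_AREA", "TIME_PERIOD"]
MEASURES: List[str] = ["OBS_VALUE"]
ATTRIBUTES: List[str] = ["UNIT_MEASURE"]

# Series-oriented layout: key without time -> period -> {component: value}.
Series = Dict[Tuple[str, ...], Dict[str, Dict[str, str]]]


def flat_header() -> List[str]:
    """The flat layout's columns follow mechanically from the structure."""
    return DIMENSIONS + MEASURES + ATTRIBUTES


def series_to_flat(series: Series) -> str:
    """Write one CSV row per observation."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(flat_header())
    for key, observations in series.items():
        for period, obs in observations.items():
            writer.writerow([*key, period, *(obs.get(c, "") for c in MEASURES + ATTRIBUTES)])
    return out.getvalue()


def flat_to_series(text: str) -> Series:
    """Group flat rows back into series; the inverse of series_to_flat."""
    series: Series = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[d] for d in DIMENSIONS[:-1])  # every Dimension except time
        obs = {c: row[c] for c in MEASURES + ATTRIBUTES}
        series.setdefault(key, {})[row[DIMENSIONS[-1]]] = obs
    return series


data: Series = {("M", "DE"): {"2006-01": {"OBS_VALUE": "10.5", "UNIT_MEASURE": "EUR"},
                              "2006-02": {"OBS_VALUE": "11.0", "UNIT_MEASURE": "EUR"}}}
flat = series_to_flat(data)
print(flat)
```

Neither conversion function mentions sales or a particular country; changing the three component lists is enough to handle a different structure.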

Metadata structures

SDMX also applies the idea of a meta-structure to metadata. Metadata is "data about data" and can therefore be data about anything. SDMX supports the counterpart of the Data Structure: the Metadata Structure. Where a Data Structure has a Data Key whose components identify the Measures, a Metadata Structure has a Metadata Key whose components identify the type of data (the object type) to which the metadata in a metadata set applies. Each metadata value in a metadata set is "indexed" by the unique logical identity of the object to which it relates. SDMX has a powerful metamodel that enables every object to be identified by the logical identifiers of the model classes, in a way analogous to how keys are defined in a relational database.
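
The indexing of metadata by object identity can be sketched in Python as follows. The object types, Metadata Key components and metadata concepts are hypothetical, and the sketch reduces the SDMX metamodel to a table of identifier tuples, in the spirit of the relational-key analogy.

```python
from typing import Dict, Tuple

# Hypothetical Metadata Structure: for each object type, the components of
# the Metadata Key that identify one object of that type.
METADATA_KEYS: Dict[str, Tuple[str, ...]] = {
    "Agency": ("AGENCY_ID",),
    "DataStructure": ("AGENCY_ID", "DSD_ID"),
    "Series": ("AGENCY_ID", "DSD_ID", "SERIES_KEY"),
}


class MetadataSet:
    """Metadata values indexed by the logical identity of the object they describe."""

    def __init__(self) -> None:
        # (object type, identifier values) -> {metadata concept: value}
        self._values: Dict[Tuple[str, Tuple[str, ...]], Dict[str, str]] = {}

    def attach(self, object_type: str, ids: Dict[str, str],
               concept: str, value: str) -> None:
        components = METADATA_KEYS[object_type]  # KeyError for an unknown type
        if set(ids) != set(components):
            raise ValueError(f"{object_type} is identified by {components}")
        index = (object_type, tuple(ids[c] for c in components))
        self._values.setdefault(index, {})[concept] = value

    def lookup(self, object_type: str, *id_values: str) -> Dict[str, str]:
        """Return all metadata attached to one object (empty if none)."""
        return self._values.get((object_type, tuple(id_values)), {})


ms = MetadataSet()
ms.attach("Series", {"AGENCY_ID": "AG1", "DSD_ID": "SALES", "SERIES_KEY": "M.DE"},
          "COLLECTION_METHOD", "Monthly survey")
ms.attach("Agency", {"AGENCY_ID": "AG1"}, "CONTACT", "statistics desk")
print(ms.lookup("Series", "AG1", "SALES", "M.DE"))  # {'COLLECTION_METHOD': 'Monthly survey'}
```

As with a relational primary key, the index stays valid however many metadata concepts are later attached to the same object.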

This may sound complex, but in practice it is simple: it relies on a good analysis of the model underpinning the data to which the metadata is to be attached, in just the same way as a Data Structure relies on a good analysis of the scope of the data and of how the data are identified.

© 2005-06 metadata technology Ltd. | All rights reserved                                          design by dee-gee.co.uk