CDS Metadata session

Sebastien Derriere & Thomas Boch

Abstract

Data can only be properly interpreted and used when accompanied by proper metadata. Image pixel values without a FITS header (astrometric metadata, instrument information, epoch...) and catalogue values without a description of their parameters (column types, units...) are useless.

An important step when publishing data to the VO is to ensure that relevant metadata are provided, allowing wide usage of the corresponding data. Several metadata standards have been and are being developed in the IVOA context to ensure the use of homogeneous metadata across the VO, and allow good interoperability.

This session will demonstrate how to assign standardized metadata, namely Unified Content Descriptors (UCDs) and units, to tables prior to publishing data to the VO.

We will also demonstrate a few use cases, showing how these metadata can be used by existing VO tools to perform advanced actions.

Tutorial steps

  • Assigning UCDs to a dataset
  • Assigning units to a dataset
  • Using metadata in VO tools

Software requirements

(note that the tutorial is proposed in PERL, but could also be performed in Python or other scripting/programming languages)

Tutorial guide

Introduction

Goals of this session:
  • show where metadata are used in the VO
  • see which formats and standards define VO metadata
  • show practical methods to add them to existing data
  • use these metadata in VO tools

The various exercises of this session are independent, and can be addressed in any order. Some very simple test data are provided, but you are encouraged to apply the demonstrated methods to your own datasets.

The general problem of publishing data to the VO: you have a dataset with its original description, and you need to identify what has to be done to publish it to the VO:

  • find the relevant data access protocol (ConeSearch, SIAP, SSAP, SNAP...)
  • identify which metadata will be needed in the VO format
    • convert original description to VO standards
  • convert the original data to the VO exchange format (e.g. database to VOTable)
    • translation layer
    • use existing libraries/tools
  • advertise your service by publishing it in a VO registry
    • fill-in VOResource metadata

Assigning UCDs to a dataset

Introduction

UCDs (Unified Content Descriptors) provide the semantic meaning of quantities (that is, what the quantity is). They are mainly used for describing the contents of columns in VOTable documents, with a ucd="" attribute in the FIELD element. But they can also be used to describe individual parameters, or tabular data in the registry.

UCDs are standardized and described in two reference documents (IVOA recommendations): one for the syntax rules and the other for the list of valid words.

Briefly put, a UCD consists of at least one word, or several separated by semicolons (;). The first word carries most of the meaning. To describe a magnitude measured in the V band, we can use the word phot.mag (describing a magnitude), and combine it with the word em.opt.V (describing the V band in the optical): the complete UCD will be phot.mag;em.opt.V
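The composition rule above can be sketched in a few lines of Python. This is only an illustration of the syntax (words separated by semicolons, atoms inside a word separated by dots); the helper names join_ucd and split_ucd are hypothetical, and the sketch does not check the words against the list of valid UCD words.

```python
def join_ucd(primary, *secondary):
    """Build a UCD string: the primary word first, then any further words."""
    return ";".join((primary,) + secondary)


def split_ucd(ucd):
    """Split a UCD into its words, and each word into its dot-separated atoms."""
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    if not words:
        raise ValueError("a UCD needs at least one word")
    return [w.split(".") for w in words]


# Magnitude measured in the V band, as in the text above.
ucd = join_ucd("phot.mag", "em.opt.V")
print(ucd)
print(split_ucd(ucd))
```

The first element returned by split_ucd corresponds to the word that carries most of the meaning.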

UCD-related documentation and tools can be found online at http://cdsweb.u-strasbg.fr/UCD/. A set of on-line tools is also available at http://cdsweb.u-strasbg.fr/UCD/tools.htx

The first step for data providers is to identify the relevant UCDs describing the data they want to publish to the VO.

We will use as a test dataset a catalogue of planetary nebulae in M33 (Ciardullo et al., 2004). You can download a CSV file of the data, and a file containing the description of the 11 selected columns. The goal is to find the relevant UCDs to describe these columns.

We will open the CSV file with TopCat: File, Load Table, format=CSV.

topcat_loadCSV.png

Then use the "Display column metadata" button, and make sure to check "UCD" in the Display menu.

topcat_ucd.png

To edit a metadata field of our table, simply double-click it and type the new value. To find the relevant UCDs, you can use the Manual search or the Automatic search as explained below.

Manual search

Try to find some UCDs, using the UCD builder ( http://cdsweb.u-strasbg.fr/UCD/cgi-bin/descr2ucd ).

Simply enter the parameter description, and the tool will suggest a UCD.

You can copy and paste the UCDs in the column metadata.

Automatic search

For large collections, it is desirable to automate the process of finding UCDs corresponding to descriptions. We can use the "assign" method of the UCD SOAP Web Services ( http://cdsweb.u-strasbg.fr/cdsws/ucdClient.gml ). We will pass each column description to the service, and it will give back the corresponding (best guess) UCD.

This SOAP Web Service, like the other available methods for UCD manipulation, can be consumed in a number of ways (PERL, Python, Java). We propose a simple PERL example for our problem. An example in Java is also available.

Edit the PERL script assign.pl (save and change extension to .pl) to find UCDs corresponding to the descriptions in the file apj_614.desc.

  • You can use hints from http://cdsweb.u-strasbg.fr/cdsws/tucdClient2.gml.
  • You need to make three changes where CHANGE_ME is written in the source
    • Provide the proper path to the file with the description of the columns
    • Give the path to the WSDL
    • Invoke the assign method of the service
Run your script to see the result, and copy/paste the UCDs in TopCat. A solution is available here.
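For readers working in Python rather than PERL, the batch loop can be sketched as below. This is not the tutorial's assign.pl: the call to the service is passed in as a function, and stub_assign is a hypothetical stand-in (a tiny keyword lookup) so that the sketch runs without network access. The column names and descriptions are invented for illustration and are not taken from apj_614.desc. In a real run, the stub would be replaced by a call to the "assign" method of the web service.

```python
def assign_all(descriptions, assign):
    """Return {column: ucd} by passing each column description to `assign`."""
    return {column: assign(text) for column, text in descriptions.items()}


# Hypothetical stand-in for the remote "assign" method: a keyword lookup.
KEYWORDS = [
    ("right ascension", "pos.eq.ra"),
    ("declination", "pos.eq.dec"),
    ("magnitude", "phot.mag"),
]


def stub_assign(description):
    """Return a best-guess UCD for a description, or '' when nothing matches."""
    text = description.lower()
    for keyword, ucd in KEYWORDS:
        if keyword in text:
            return ucd
    return ""


# Invented column descriptions, for illustration only.
columns = {
    "RAdeg": "Right ascension (J2000)",
    "DEdeg": "Declination (J2000)",
    "m5007": "[OIII] magnitude",
}

for column, ucd in assign_all(columns, stub_assign).items():
    print(column, ucd)
```

Keeping the service call behind a plain function argument makes it easy to swap the stub for a real client, or to review the suggested UCDs before copying them into TopCat.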

Result

Once you have assigned the UCDs, TopCat allows you to save the table in VOTable format (XML file). This VOTable will contain the data and the metadata (UCDs). We provide a solution VOTable.
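To give an idea of what ends up in the saved file, here is a minimal sketch that builds a single VOTable FIELD element carrying a ucd attribute, using only Python's standard library. The column name, datatype and helper name make_field are assumptions made for illustration; a real VOTable written by TopCat contains many more elements (RESOURCE, TABLE, DATA...).

```python
import xml.etree.ElementTree as ET


def make_field(name, datatype, ucd, unit=None):
    """Build a VOTable FIELD element describing one column."""
    attrs = {"name": name, "datatype": datatype, "ucd": ucd}
    if unit is not None:
        attrs["unit"] = unit
    return ET.Element("FIELD", attrs)


# Hypothetical column: a right ascension in decimal degrees.
field = make_field("RAdeg", "double", "pos.eq.ra;meta.main", unit="deg")
print(ET.tostring(field, encoding="unicode"))
```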

Note that some toolkits (such as SAADA) will assist you in the process of assigning UCDs to a dataset.

Assigning units to a dataset

We will use as a test dataset a catalogue of planetary nebulae in M33 (Ciardullo et al., 2004). You can download a CSV file of the data, and a file containing the description of the 11 selected columns. The goal is to find the relevant units to describe these columns.

In fact, most columns of this catalogue don't have units. We just know that:

  • Right ascension and declination are in decimal degrees
  • The OIII magnitude is in magnitudes
  • The H{alpha}+[NII] flux is in erg per cm2 per second. Since the column holds the log of the flux, put [ ] around the unit symbols to represent the log
  • Velocities are in km/s

We will open the CSV file with TopCat: File, Load Table, format=CSV. Then use the button "Display column metadata", and make sure to check "Units" in the Display menu.

Use the on-line resources to find the proper expression for the needed units. The unit attribute of the column metadata must contain a string of symbols, e.g. W.m-2.sr-1.
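As a rough illustration of how such a string of symbols is structured, the sketch below splits a unit like W.m-2.sr-1 into (symbol, exponent) pairs and recognises enclosing [ ] as the log marker. It is a simplified reading of the syntax, written for this tutorial's examples only: it assumes factors separated by dots with optional integer exponents, does not separate prefixes from units (km stays km), and ignores other constructs such as "/" or numeric factors. The bracketed string in the usage lines is one possible spelling, not necessarily the tutorial's solution.

```python
import re

# One factor: a run of letters followed by an optional signed integer exponent.
_FACTOR = re.compile(r"^([A-Za-z]+)([+-]?\d+)?$")


def parse_unit(unit):
    """Return (is_log, [(symbol, exponent), ...]) for a dot-separated unit string."""
    is_log = unit.startswith("[") and unit.endswith("]")
    if is_log:
        unit = unit[1:-1]
    factors = []
    for part in unit.split("."):
        match = _FACTOR.match(part)
        if match is None:
            raise ValueError("unsupported unit factor: %r" % part)
        symbol, exponent = match.groups()
        factors.append((symbol, int(exponent) if exponent else 1))
    return is_log, factors


print(parse_unit("W.m-2.sr-1"))
print(parse_unit("km.s-1"))
print(parse_unit("[erg.cm-2.s-1]"))
```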

Once you have assigned the units, you can save the table as VOTable (XML file). The VOTable will contain the data and the metadata (units). We provide a solution VOTable.

Using metadata in VO tools

These simple exercises will demonstrate how various metadata are used in VO tools.

UCDs

We will show two possible uses of UCDs:

  • Automated detection of columns
  • Use in Aladin filters

Launch Aladin, and load an image of M33 (File, Open, Aladin Images, choose Lw-POSSI.E for example). Then query SIMBAD (or some VizieR survey), and load the VOTable where you assigned the UCDs as a local file (or use the solution VOTable). Now select "Cross-match objects" from the Catalog menu: notice that the relevant coordinate columns are selected automatically, even though they have different names in the two tables. This is because the UCDs indicate unambiguously the nature of the columns.
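The idea behind this automatic selection can be sketched as follows. This is not Aladin's code, only an illustration of the principle: two tables with different column names are described as {column name: UCD} dictionaries (invented here), and the coordinate columns are found by looking at the first word of each UCD. The helper name find_column is hypothetical.

```python
def find_column(fields, word):
    """Return the first column whose UCD has `word` as its first word, else None."""
    for name, ucd in fields.items():
        if ucd.split(";")[0].strip() == word:
            return name
    return None


# Two invented tables whose coordinate columns have different names.
table_a = {
    "RAJ2000": "pos.eq.ra;meta.main",
    "DEJ2000": "pos.eq.dec;meta.main",
    "Vmag": "phot.mag;em.opt.V",
}
table_b = {"alpha": "pos.eq.ra", "delta": "pos.eq.dec", "m5007": "phot.mag"}

for table in (table_a, table_b):
    print(find_column(table, "pos.eq.ra"), find_column(table, "pos.eq.dec"))
```

Because the lookup relies on the UCDs and never on the column names, the same function works on both tables.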

Select "Create a filter" from the Catalog menu, and select "Draw circles proportional to object luminosity". Switching to "Advanced mode" reveals that the filter uses regular expressions on UCDs to indicate which column has to be used to interpret the expression. One generic filter can then operate on many different data sources, if UCDs are present.
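A minimal sketch of this kind of pattern matching is given below, using shell-style wildcards from Python's standard fnmatch module. It only illustrates why one pattern can serve several tables; Aladin's own matching rules may differ, and the tables and the helper name select_columns are invented for the example.

```python
from fnmatch import fnmatchcase


def select_columns(fields, pattern):
    """Return the names of the columns whose UCD matches a wildcard pattern."""
    return [name for name, ucd in fields.items() if fnmatchcase(ucd, pattern)]


# Invented {column name: UCD} descriptions of two different catalogues.
catalogue_1 = {"RAJ2000": "pos.eq.ra;meta.main", "Vmag": "phot.mag;em.opt.V"}
catalogue_2 = {"alpha": "pos.eq.ra", "m5007": "phot.mag"}

# The same pattern picks the magnitude column in both catalogues.
print(select_columns(catalogue_1, "phot.mag*"))
print(select_columns(catalogue_2, "phot.mag*"))
```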

Units

We will show the use of units, again using Aladin filters: launch Aladin, and load an image of M33 (File, Open, Aladin Images, choose Lw-POSSI.E for example). Then load the VOTable where you assigned the UCDs as a local file (or use the solution VOTable).

Select "Create a filter" from the Catalog menu, and switch to advanced mode. Copy and paste the following before applying (or load this as a local file):

$[spect.dopplerVeloc*]<-1.7e5m/s {draw blue square}

{draw red rhomb}

The filters use a conversion library that is able to interpret units, and here it performs the conversion between km/s and m/s on the fly.
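The effect of this conversion can be sketched in Python. The sketch is not Aladin's conversion library: it assumes a hand-written table of two velocity units and mimics the filter above, which compares a velocity column (in km/s, as stated earlier for this catalogue) to a threshold of -1.7e5 m/s. The names convert and shape_for are hypothetical.

```python
# Conversion factors to metres per second for the two units used here.
TO_M_PER_S = {"m/s": 1.0, "km/s": 1000.0}


def convert(value, from_unit, to_unit):
    """Convert a velocity between the units known to TO_M_PER_S."""
    return value * TO_M_PER_S[from_unit] / TO_M_PER_S[to_unit]


def shape_for(velocity, column_unit, threshold=-1.7e5, threshold_unit="m/s"):
    """Pick the symbol drawn by the filter for one velocity value."""
    value = convert(velocity, column_unit, threshold_unit)
    return "blue square" if value < threshold else "red rhomb"


# Velocities stored in km/s are compared to a threshold written in m/s.
for v in (-200.0, -150.0):
    print(v, shape_for(v, "km/s"))
```

Without the unit metadata on the column, the comparison would silently mix km/s and m/s and every object would fall on the same side of the threshold.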

Topic revision: r6 - 22 Jun 2009 - SebastienDerriere
 