Assigning metadata to your datasets (UCDs, units, utypes, characterization)

Abstract

Data can only be properly interpreted and used when the proper metadata are associated with them. Image pixel values without a FITS header (with astrometric metadata, instrument information, epoch...) or catalogue values without a description of the parameters (column types, units, ...) are useless.

An important step when publishing data to the VO is to ensure that relevant metadata are provided, allowing wide usage of the corresponding data. Several metadata standards have been and are being developed in the IVOA context to ensure the use of homogeneous metadata across the VO, and allow good interoperability.

This session will demonstrate how to assign standardized metadata prior to publishing data to the VO: Unified Content Descriptors (UCDs) and units for tables.

The IVOA has developed a few datamodels (DM) for interoperability: STC (for space time wavelength coordinates metadata), Spectrum DM (for spectra or time series), Characterisation (for Observation descriptions in data parameter space), Line DM (for accurate description of Atomic and Molecular Lines) and Theory data model (for description of outputs of simulations). After recalling the content of some of these datamodels, we will show how data producers can gather metadata information, organize it consistently with the IVOA DMs, and publish it.

We will also demonstrate a few use cases, showing how these metadata can be used by existing VO tools to perform advanced actions, and how client developers and end users can make use of the DMs, using various tools and various formats (XML, VOTable with utypes, FITS).

External References

Advisors (CDS)

  • Sébastien Derrière
  • Thomas Boch
  • François Bonnarel

Software Requirements

In addition to the common workshop software requirements, this session requires:
  • Perl 5.x
    • Linux: should be preinstalled
    • Windows
    • MacOSX: preinstalled

Download

http://www.euro-vo.org/dcaworkshop2008/HandsOn/metadata/AC1.tar


Hands-on session

Goals of this session:
  • show where metadata are used in the VO
  • show which formats and standards standardize VO metadata
  • show practical methods to add them to existing data
  • use these metadata in VO tools

The various exercises of this session are independent, and can be addressed in any order. Some very simple test data are provided, but you are encouraged to apply the demonstrated methods to your own datasets.

The general problem of publishing data to the VO is the following: you have some dataset, with its original description. You need to identify what has to be done to publish it to the VO:

  • find the relevant data access protocol (ConeSearch, SIAP, SSAP, SNAP...)
  • identify which metadata will be needed in the VO format
    • convert original description to VO standards
  • convert the original data to the VO exchange format (e.g. database to VOTable)
    • translation layer
    • use existing libraries/tools
  • advertise your service by publishing it in a VO registry
    • fill-in VOResource metadata

Assigning metadata

Assigning UCDs to a dataset

UCDs (Unified Content Descriptors) provide the semantic meaning of quantities (what the quantity is). They are mainly used for describing the contents of columns in VOTable documents, with a ucd="" attribute in the FIELD element. But they can also be used to describe individual parameters, or tabular data in the registry.

UCDs are standardized and described in two reference documents (IVOA recommendations): one for the syntax rules and the other for the list of valid words.

Briefly put, a UCD consists of at least one word, or several separated by semicolons (;). The first word carries most of the meaning. To describe a magnitude measured in the V band, we can use the word phot.mag (describing a magnitude), and combine it with the word em.opt.V (describing the V band in the optical): the complete UCD will be phot.mag;em.opt.V
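
The structure described above can be sketched in a few lines of Python. This is only an illustration of the syntax (words separated by semicolons, the first word carrying most of the meaning); the function names split_ucd and join_ucd are made up for this example, and nothing here checks the words against the list of valid words.

```python
def join_ucd(*words):
    """Combine UCD words into one UCD string, separated by semicolons."""
    return ";".join(words)


def split_ucd(ucd):
    """Split a UCD into its primary word, its other words, and their parts."""
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    if not words:
        raise ValueError("a UCD needs at least one word")
    return {
        "primary": words[0],                      # carries most of the meaning
        "secondary": words[1:],                   # qualifying words
        "parts": [w.split(".") for w in words],   # dot-separated hierarchy
    }


# Magnitude measured in the V band, as in the text above.
v_mag = join_ucd("phot.mag", "em.opt.V")
info = split_ucd(v_mag)
print(v_mag)
print(info["primary"], info["secondary"])
```

Validation against the official word list is deliberately left out; the on-line UCD tools remain the reference for that.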

UCD-related documentation and tools can be found online at http://cdsweb.u-strasbg.fr/UCD/. A set of on-line tools is also available: http://vizier.u-strasbg.fr/UCD/tools.htx

The first step for data providers is to identify the relevant UCDs describing the data they want to publish to the VO.

We will use as a test dataset a catalogue of planetary nebulae in M33 (Ciardullo et al., 2004). You can download a CSV file of the data, and a file containing the description of the 11 selected columns. The goal is to find the relevant UCDs to describe these columns.

We will open the CSV file with TopCat: File, Load Table, format=CSV. Then use the button "Display column metadata", and make sure to check "UCD" in the Display menu.

Once you have assigned metadata (either manually or automatically), you can save your work in a VOTable; TopCat will do the conversion.

Manual search

Try to find some UCDs, using the UCD builder ( http://cdsweb.u-strasbg.fr/UCD/cgi-bin/descr2ucd ).

You can copy and paste the UCDs in the column metadata.

Automatic search

For large collections, it is desirable to automate the process of finding UCDs corresponding to descriptions. We can use the "assign" method of the UCD SOAP Web Services ( http://cdsweb.u-strasbg.fr/cdsws/ucdClient.gml ). We will pass each column description to the service, and it will give back the corresponding (best guess) UCD.

This SOAP Web Service, like the other available methods for UCD manipulation, can be consumed in a number of ways (Perl, Python, Java). We propose a simple Perl example for our problem. An example in Java is also available.

Edit the Perl script assign.pl (save it and change the extension to .pl) to find the UCDs corresponding to the descriptions in the file apj_614.desc.

  • You can use hints from http://cdsweb.u-strasbg.fr/cdsws/tucdClient2.gml.
  • You need to make three changes where CHANGE_ME is written in the source
    • Provide the proper path to the file with the description of the columns
    • Give the path to the WSDL
    • Invoke the assign method of the service

Run your script to see the result, and copy/paste the UCDs in TopCat. A solution is available here.
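
For readers who prefer Python, the overall loop (read each column description, pass it to an "assign" function, collect the answers) can be sketched as follows. This is not the workshop solution: the tab-separated layout assumed for the description lines, the column names in the sample, and the keyword-based stand_in_assign function are all hypothetical. In real use, stand_in_assign would be replaced by a call to the "assign" method of the UCD web service.

```python
def read_descriptions(lines):
    """Return (column, description) pairs from 'name<TAB>description' lines.

    The tab-separated layout is an assumption made for this sketch.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        name, _, description = line.partition("\t")
        pairs.append((name.strip(), description.strip()))
    return pairs


def assign_all(pairs, assign):
    """Apply an 'assign' callable to every description; return {column: UCD}."""
    return {name: assign(description) for name, description in pairs}


def stand_in_assign(description):
    """Hypothetical offline stand-in for the web service's best-guess answer."""
    text = description.lower()
    if "right ascension" in text:
        return "pos.eq.ra"
    if "declination" in text:
        return "pos.eq.dec"
    if "magnitude" in text:
        return "phot.mag"
    return ""  # no guess


# Made-up sample lines; the real file is apj_614.desc.
sample = [
    "RAdeg\tRight ascension (J2000)\n",
    "DEdeg\tDeclination (J2000)\n",
    "m5007\t[OIII] magnitude\n",
]
ucds = assign_all(read_descriptions(sample), stand_in_assign)
for column, ucd in ucds.items():
    print(column, ucd)
```

Passing the assign function as an argument keeps the loop independent of how the service is reached, so the same code works with any SOAP client or with a local lookup.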

Once you have assigned the UCDs, you can save the table as VOTable (XML file). The VOTable will contain the data and the metadata (UCDs). We provide a solution VOTable.
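
To give an idea of what such a file contains, the sketch below builds a very small VOTable with Python's standard xml.etree.ElementTree module, carrying the UCD (and a unit) as attributes of each FIELD element. It is a minimal illustration, not a schema-validated document: the column names and the data row are placeholders invented for the example, and in the exercise TopCat writes the real file for you.

```python
import xml.etree.ElementTree as ET


def build_votable(fields, rows):
    """Build a minimal VOTable tree; 'fields' is a list of attribute dicts."""
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for attributes in fields:
        # Each FIELD carries name, datatype and the metadata (ucd, unit).
        ET.SubElement(table, "FIELD", attributes)
    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return votable


# Placeholder columns and values, not taken from the catalogue.
fields = [
    {"name": "RAdeg", "datatype": "double", "ucd": "pos.eq.ra", "unit": "deg"},
    {"name": "DEdeg", "datatype": "double", "ucd": "pos.eq.dec", "unit": "deg"},
]
rows = [(23.5, 30.6)]

xml_text = ET.tostring(build_votable(fields, rows), encoding="unicode")
print(xml_text)
```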

Note that some toolkits will assist you in the process of assigning UCDs to a dataset.

Finding proper units

We will use as a test dataset a catalogue of planetary nebulae in M33 (Ciardullo et al., 2004). You can download a CSV file of the data, and a file containing the description of the 11 selected columns. The goal is to find the relevant units to describe these columns.

In fact, most columns of this catalogue don't have units. We just know that:

  • Right ascension and declination are in decimal degrees
  • The OIII magnitude is in magnitudes
  • The H{alpha}+[NII] flux is in erg per cm2 per second. The column gives the log of the flux: put [ ] around the unit symbols to represent the log
  • Velocities are in km/s

We will open the CSV file with TopCat: File, Load Table, format=CSV. Then use the button "Display column metadata", and make sure to check "Units" in the Display menu.

Use the on-line resources to find the proper expression for the needed units. The unit attribute of the column metadata must contain a string of symbols, e.g. W.m-2.sr-1.
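
The layout of such a string (symbols joined by dots, exponents written directly after the symbol, square brackets for a log) can be sketched as below. The helper unit_string is made up for this illustration and only assembles text. It does not check that a symbol such as erg is accepted, which is precisely what the on-line resources are for.

```python
def unit_string(factors, log=False):
    """Assemble a unit string from (symbol, exponent) pairs.

    An exponent of 1 is left implicit; log=True wraps the result in [ ].
    Symbols are NOT validated here.
    """
    parts = []
    for symbol, exponent in factors:
        parts.append(symbol if exponent == 1 else f"{symbol}{exponent}")
    text = ".".join(parts)
    return f"[{text}]" if log else text


# The example string quoted in the text.
radiance = unit_string([("W", 1), ("m", -2), ("sr", -1)])
# Velocity written in the same dotted style.
velocity = unit_string([("km", 1), ("s", -1)])
# Log of a flux, using the symbols named in the list above (to be verified).
log_flux = unit_string([("erg", 1), ("cm", -2), ("s", -1)], log=True)

print(radiance, velocity, log_flux)
```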

Once you have assigned the units, you can save the table as VOTable (XML file). The VOTable will contain the data and the metadata (units). We provide a solution VOTable.

Metadata in the Registry

The final step in publishing a service or dataset to the VO is to register it in a VO Registry. This means providing some metadata for Curation, Coverage, etc. The metadata elements are defined by the various schemas: http://www.ivoa.net/xml/

In practice, registries provide simplified forms so that you do not have to write raw XML. You fill in some elements, and the corresponding XML is stored in the registry. You can explore some of the resources at http://vops1.hq.eso.org:8080/registry/browse.jsp and see that there are human-readable versions and the corresponding XML.

When you register a resource, you must provide an AuthorityId. For this workshop, this metadata element is ivo://org.euro-vo and points to a specific resource.

Characterization and utypes

This exercise will be introduced at the beginning of the afternoon session by F. Bonnarel. Slides in pdf

Launch CAMEA (JNLP)

How metadata are used

These simple exercises will demonstrate how various metadata are used in VO tools.

UCDs

We will show two possible uses of the UCDs:

  • Automated detection of columns
  • Use in Aladin filters

Launch Aladin, and load an image of M33 (File, Open, Aladin Images, choose Lw-POSSI.E for example). Then query SIMBAD (or some VizieR survey), and load the VOTable in which you assigned the UCDs as a local file (or use the solution VOTable). Now select "Cross-match objects" from the Catalog menu: notice that the relevant columns for the coordinates are automatically selected, even if they have different names. This is because the UCDs indicate unambiguously the nature of the columns.

Select "Create a filter" from the Catalog menu, and select "Draw circles proportional to object luminosity". Switching to "Advanced mode" reveals that the filter uses regular expressions on UCDs to indicate which column has to be used to interpret the expression. One generic filter can then operate on many different data sources, if UCDs are present.
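
The idea behind both uses (finding columns by what they contain rather than by what they are called) can be sketched in Python. This does not reproduce Aladin's own matching: it uses shell-style wildcards from the standard fnmatch module as a stand-in, ignores case as a simplifying assumption, and the two tables and their column names are invented for the example.

```python
from fnmatch import fnmatchcase


def columns_matching(column_ucds, pattern):
    """Return the names of the columns whose UCD matches a wildcard pattern."""
    wanted = pattern.lower()
    return [
        name
        for name, ucd in column_ucds.items()
        if ucd and fnmatchcase(ucd.lower(), wanted)
    ]


# Two hypothetical tables naming their coordinate columns differently.
table_a = {"RAdeg": "pos.eq.ra;meta.main", "DEdeg": "pos.eq.dec;meta.main",
           "m5007": "phot.mag"}
table_b = {"_RAJ2000": "pos.eq.ra;meta.main", "_DEJ2000": "pos.eq.dec;meta.main",
           "Vmag": "phot.mag;em.opt.V"}

for table in (table_a, table_b):
    ra = columns_matching(table, "pos.eq.ra*")
    dec = columns_matching(table, "pos.eq.dec*")
    mags = columns_matching(table, "phot.mag*")
    print(ra, dec, mags)
```

The same pattern selects the right column in both tables, which is why one generic filter or one cross-match dialog can work on data from different providers.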

Units

We will show the use of units, again using Aladin filters: Launch Aladin, and load an Image of M33 (File, Open, Aladin Images, choose Lw-POSSI.E for example). Then load the VOTable where you assigned the UCDs as a local file (or use the solution VOTable).

Select "Create a filter" from the Catalog menu, and switch to advanced mode. Copy and paste the following before applying (or load this as a local file):

$[spect.dopplerVeloc*]<-1.7e5m/s {draw blue square}

{draw red rhomb}

The filters use a conversion library that is able to interpret units; here it converts on the fly between m/s (the unit of the threshold in the filter) and km/s (the unit of the velocity column).
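
What the filter does can be mimicked in a few lines of Python: bring the threshold and the column value to the same unit, then compare. This is only a sketch of the principle, with a two-entry conversion table written for the example; it is not the conversion library used by Aladin.

```python
# Conversion factors to metres per second (only the two units needed here).
TO_M_PER_S = {"m/s": 1.0, "km/s": 1000.0}


def convert_velocity(value, from_unit, to_unit):
    """Convert a velocity between the units listed in TO_M_PER_S."""
    return value * TO_M_PER_S[from_unit] / TO_M_PER_S[to_unit]


def shape_for(velocity_km_s):
    """Mimic the filter: blue square below -1.7e5 m/s, red rhomb otherwise."""
    threshold_km_s = convert_velocity(-1.7e5, "m/s", "km/s")
    if velocity_km_s < threshold_km_s:
        return "blue square"
    return "red rhomb"


# Made-up velocities in km/s, the unit of the catalogue column.
print(shape_for(-250.0))
print(shape_for(-120.0))
```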

Utypes

Utypes are used in the description of footprints. These can then be provided by SIA servers. Launch Aladin, and load one image of M33. Then go to the All VO tab in the launch panel.

Open the detailed list, and unselect all. Then simply check the image resource #53 (SIAP Service HST preview images) and submit. For each group of results, you can preview the coverage by hovering the mouse over the metadata tree.

Topic attachments

  • CharMetadataAndUtypes.pdf (1546.4 K, 24 Jun 2008 - 17:44, FrancoisBonnarel)
  • SCRACStoCharxml.pl.txt (14.5 K, 24 Jun 2008 - 16:44, FrancoisBonnarel)
  • apj_614.csv (11.0 K, 23 Jun 2008 - 16:34, SebastienDerriere)
  • apj_614.desc (0.3 K, 24 Jun 2008 - 07:42, SebastienDerriere)
  • apj_614.xml (27.6 K, 24 Jun 2008 - 07:42, SebastienDerriere)
  • assign.pl.txt (1.0 K, 24 Jun 2008 - 07:44, SebastienDerriere)
  • assign_solved.pl.txt (1.0 K, 24 Jun 2008 - 07:44, SebastienDerriere)
  • veloc_filter.ajs (0.1 K, 24 Jun 2008 - 07:43, SebastienDerriere)
Topic revision: r11 - 24 Jun 2008 - FrancoisBonnarel