CDS Metadata session
Sebastien Derriere & Thomas Boch
Abstract
Data can only be properly interpreted and used when proper metadata
are associated. Image pixel values without a FITS header (with astrometric
metadata, instrument information, epoch...) or catalogue values without
a description of their parameters (column types, units, ...) are useless.
An important step when publishing data to the VO is to ensure that
relevant metadata are provided, allowing wide usage of the corresponding
data. Several metadata standards have been and are being developed in
the IVOA context to ensure the use of homogeneous metadata across the
VO, and allow good interoperability.
This session will demonstrate how to assign standardized metadata
prior to publishing data to the VO: Unified Content Descriptors (UCDs)
and units for table columns.
We will also demonstrate a few use cases, showing how these metadata
can be used by existing VO tools to perform advanced actions.
Tutorial steps
- Assigning UCDs to a dataset
- Assigning units to a dataset
- Using metadata in VO tools
Software requirements
(note that the tutorial is proposed in PERL, but could also be performed in Python or other scripting/programming languages)
Tutorial guide
Introduction
Goals of this session:
- show where metadata are used in the VO
- show which formats/standards standardize VO metadata
- show practical methods to add them to existing data
- use these metadata in VO tools
The various exercises of this session are independent, and can be
addressed in any order. Some very simple test data are provided,
but you are encouraged to try applying the demonstrated
methods to your own datasets.
The general problem of publishing data to the VO is the following: you have
a dataset, with its original description. You need to identify what has
to be done to publish it to the VO:
- find the relevant data access protocol (ConeSearch, SIAP, SSAP, SNAP...)
- identify which metadata will be needed in the VO format
- convert original description to VO standards
- convert the original data to the VO exchange format (e.g. database to VOTable)
  - translation layer
  - use existing libraries/tools
- advertise your service by publishing it in a VO registry
  - fill-in VOResource metadata
Assigning UCDs to a dataset
Introduction
UCDs (Unified Content Descriptors) provide the semantic meaning of
quantities (i.e. what the quantity is).
They are mainly used for describing the contents of columns in
VOTable documents, with a ucd="" attribute in the FIELD element.
But they can also be used to describe individual parameters, or
tabular data in the registry.
UCDs are standardized and described in two reference documents (IVOA recommendations):
one for the syntax rules and the other for the list of valid words.
Briefly put, a UCD consists of at least one word, or several separated by semicolons (;).
The first word carries most of the meaning. To describe a magnitude measured in the
V band, we can use the word phot.mag (describing a magnitude), and combine it with the
word em.opt.V (describing the V band in the optical): the complete UCD will be
phot.mag;em.opt.V
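The structure described above can be illustrated with a short Python sketch. It only splits a UCD string into its words (at the semicolons) and each word into its dot-separated parts; it does not check the words against the list of valid words, and the function name split_ucd is purely illustrative.

```python
def split_ucd(ucd):
    """Split a UCD into words (';' separated), each word into its '.' separated parts."""
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    if not words:
        raise ValueError("a UCD needs at least one word")
    return [w.split(".") for w in words]


parsed = split_ucd("phot.mag;em.opt.V")
# The first word carries most of the meaning.
primary = ".".join(parsed[0])
print("primary word:", primary)
print("all words:", parsed)
```

Here the primary word is phot.mag, and em.opt.V qualifies it.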
UCD-related documentation and tools can be found online at http://cdsweb.u-strasbg.fr/UCD/.
A set of on-line tools is also available at http://cdsweb.u-strasbg.fr/UCD/tools.htx
The first step for data providers is to identify the relevant UCDs describing the data
they want to publish to the VO.
We will use as a test dataset a catalogue of planetary nebulae in M33 (Ciardullo et al., 2004).
You can download a CSV file of the data, and a file containing the
description of the 11 selected columns. The goal is to find the
relevant UCDs to describe these columns.
We will open the CSV file with TopCat: File, Load Table, format=CSV.
Then use the "Display column metadata" button, and make sure to check "UCD" in the Display menu.
To edit a metadata field of the table, simply double-click it and type the new value.
To find the relevant UCDs, you can use the Manual search or the Automatic search as explained below.
Manual search
Try to find some UCDs, using the UCD builder (http://cdsweb.u-strasbg.fr/UCD/cgi-bin/descr2ucd).
Simply enter the parameter description, and the tool will suggest a UCD.
You can copy and paste the UCDs in the column metadata.
Automatic search
For large collections, it is desirable to automate the process of finding UCDs corresponding to
descriptions. We can use the "assign" method of the UCD SOAP Web Services
(http://cdsweb.u-strasbg.fr/cdsws/ucdClient.gml). We will pass each column description to
the service, and it will give back the corresponding (best guess) UCD.
This SOAP Web Service, like the other available methods for UCD manipulation, can be
consumed in a number of ways (PERL, Python, Java). We propose a simple PERL example
for our problem. An example in Java is also available.
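For readers who prefer Python, the loop around the service call might look like the sketch below. It is not the tutorial's assign.pl: the remote call is passed in as a function, so any SOAP client can be plugged in, and fake_assign is a hard-coded stand-in used only to make the example runnable offline. The names assign_all and fake_assign and the sample descriptions are assumptions made for illustration.

```python
def assign_all(descriptions, assign):
    """Return {description: suggested UCD}, calling assign() once per description."""
    results = {}
    for text in descriptions:
        text = text.strip()
        if not text:
            continue  # skip blank lines in the description file
        results[text] = assign(text)
    return results


def fake_assign(description):
    """Stand-in for the remote 'assign' call (NOT the real service)."""
    lookup = {"V magnitude": "phot.mag;em.opt.V"}
    return lookup.get(description, "")


ucds = assign_all(["V magnitude", "", "Some other column"], fake_assign)
for description, ucd in ucds.items():
    print(f"{description!r} -> {ucd or '(no suggestion)'}")
```

Keeping the service call behind a plain function makes it easy to test the loop without network access and to swap in a real client later.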
Edit the PERL script assign.pl (save it and change the extension to .pl) to find the UCDs
corresponding to the descriptions in the file apj_614.desc.
- You can use hints from http://cdsweb.u-strasbg.fr/cdsws/tucdClient2.gml.
- You need to make three changes where CHANGE_ME is written in the source:
  - Provide the proper path to the file with the description of the columns
  - Give the path to the WSDL
  - Invoke the assign method of the service
Run your script to see the result, and copy/paste the UCDs in TopCat. A solution is available here.
Result
Once you have assigned the UCDs, TopCat allows you to save the table in VOTable format (XML file).
This VOTable will contain the data and the metadata (UCDs). We provide a solution VOTable.
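To check which UCDs ended up in a saved VOTable, the FIELD elements can be read back with a few lines of Python. The sketch below uses only the standard library; SAMPLE is a stripped-down, hypothetical document (not the solution VOTable), and its column names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal VOTable-like document used only for this example.
SAMPLE = """<VOTABLE>
  <RESOURCE>
    <TABLE>
      <FIELD name="Vmag" datatype="float" ucd="phot.mag;em.opt.V"/>
      <FIELD name="Note" datatype="char" arraysize="*"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def field_ucds(votable_text):
    """Map each FIELD name to its ucd attribute (None when the attribute is absent)."""
    root = ET.fromstring(votable_text)
    result = {}
    for element in root.iter():
        # Compare the local tag name so that namespaced documents work too.
        if element.tag.rsplit("}", 1)[-1] == "FIELD":
            result[element.get("name")] = element.get("ucd")
    return result


for name, ucd in field_ucds(SAMPLE).items():
    print(name, "->", ucd)
```

Columns reported with None are the ones that still need a UCD.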
Note that some toolkits (such as SAADA) will assist you in the process of assigning UCDs to a dataset.
Assigning units to a dataset
We will use as a test dataset a catalogue of planetary nebulae in M33 (Ciardullo et al., 2004).
You can download a CSV file of the data, and a file containing the
description of the 11 selected columns. The goal is to find the
relevant units to describe these columns.
In fact, most columns of this catalogue don't have units. We just know that:
- Right ascension and declination are in decimal degrees
- The OIII magnitude is in magnitudes
- The H{alpha}+[NII] flux is in erg per cm2 per second. The column holds the log of the flux: put [ ] around the unit symbols to indicate the log
- Velocities are in km/s
We will open the CSV file with TopCat: File, Load Table, format=CSV.
Then use the "Display column metadata" button, and make sure to check "Units"
in the Display menu.
Use the on-line resources to find the proper expression for the needed units. The
unit attribute of the column metadata must contain a string of symbols, e.g. W.m-2.sr-1
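As an illustration of this notation, the following Python sketch assembles a unit string from (symbol, exponent) pairs and optionally wraps it in square brackets for a logarithmic quantity. The helper unit_string is hypothetical, and it does not validate the symbols: the accepted spelling of each unit should still be checked with the on-line resources.

```python
def unit_string(factors, log=False):
    """Join (symbol, exponent) pairs with dots; wrap in [ ] when the value is a log."""
    parts = []
    for symbol, exponent in factors:
        if exponent == 0:
            continue  # a zero exponent contributes nothing
        parts.append(symbol if exponent == 1 else f"{symbol}{exponent}")
    text = ".".join(parts)
    return f"[{text}]" if log else text


radiance = [("W", 1), ("m", -2), ("sr", -1)]
print(unit_string(radiance))
print(unit_string(radiance, log=True))
```

The first call reproduces the example string W.m-2.sr-1; the second shows the bracketed form used for the log of a quantity.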
Once you have assigned the units, you can save the table as VOTable (XML file). The VOTable will
contain the data and the metadata (units). We provide a solution VOTable.
Using metadata in VO tools
These simple exercises will demonstrate how various metadata are used in VO tools.
UCDs
We will show two possible uses of the UCDs:
- Automated detection of columns
- Use in Aladin filters
Launch Aladin, and load an image of M33 (File, Open, Aladin Images, choose Lw-POSSI.E for example).
Then query SIMBAD (or some VizieR survey), and load the VOTable where you assigned the UCDs
as a local file (or use the solution VOTable).
Now select "Cross-match objects" from the Catalog menu: notice that the relevant columns for the
coordinates are automatically selected, even if they have different names.
This is because the UCDs indicate unambiguously the nature of the columns.
Select "Create a filter" from the Catalog menu, and select "Draw circles proportional to object luminosity".
Switching to "Advanced mode" reveals that the filter uses regular expressions on UCDs to indicate which
column has to be used to interpret the expression. One generic filter can then operate on many different
data sources, if UCDs are present.
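The idea of selecting columns by UCD rather than by name can be mimicked in a few lines of Python. This is a sketch of the principle only, not of Aladin's implementation: it uses shell-style wildcards from the standard library, and the column names in the two tables are hypothetical.

```python
from fnmatch import fnmatchcase


def columns_matching(columns, pattern):
    """Return the names of the columns whose UCD matches a wildcard pattern."""
    return [name for name, ucd in columns.items() if ucd and fnmatchcase(ucd, pattern)]


# Two hypothetical tables that name their columns differently.
table_a = {"RAJ2000": "pos.eq.ra;meta.main", "DEJ2000": "pos.eq.dec;meta.main",
           "Vel": "spect.dopplerVeloc"}
table_b = {"ra_deg": "pos.eq.ra;meta.main", "dec_deg": "pos.eq.dec;meta.main"}

for table in (table_a, table_b):
    print(columns_matching(table, "pos.eq.ra*"), columns_matching(table, "pos.eq.dec*"))
print(columns_matching(table_a, "spect.dopplerVeloc*"))
```

The same patterns find the coordinate columns in both tables although the column names differ, which is what lets one generic filter or cross-match dialog work on many data sources.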
Units
We will show the use of units, again using Aladin filters:
Launch Aladin, and load an image of M33 (File, Open, Aladin Images, choose Lw-POSSI.E for example).
Then load the VOTable where you assigned the UCDs as a local file (or use the solution VOTable).
Select "Create a filter" from the Catalog menu, and switch to advanced mode. Copy and paste
the following before applying (or load this as a local file):
$[spect.dopplerVeloc*]<-1.7e5m/s {draw blue square}
{draw red rhomb}
The filters use a conversion library that is able to interpret units, and here
performs the conversion from m/s to km/s on the fly.
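The arithmetic behind this filter can be reproduced with a small Python sketch. It is not Aladin's conversion library: it knows only the two units involved, assumes the catalogue velocities are stored in km/s (as stated for this dataset), and assumes the second rule acts as a fallback for sources that fail the first test. The names TO_M_PER_S, convert and shape_for are illustrative.

```python
# Scale factors to metres per second for the two units used in this exercise.
TO_M_PER_S = {"m/s": 1.0, "km/s": 1000.0}


def convert(value, from_unit, to_unit):
    """Convert a velocity between the units listed in TO_M_PER_S."""
    return value * TO_M_PER_S[from_unit] / TO_M_PER_S[to_unit]


def shape_for(velocity_km_s, threshold=-1.7e5, threshold_unit="m/s"):
    """Mimic the filter: blue square below the threshold, red rhomb otherwise."""
    limit = convert(threshold, threshold_unit, "km/s")
    return "blue square" if velocity_km_s < limit else "red rhomb"


print(convert(-1.7e5, "m/s", "km/s"))
print(shape_for(-250.0))
print(shape_for(-100.0))
```

The threshold of -1.7e5 m/s corresponds to -170 km/s, so a source at -250 km/s is drawn as a blue square and one at -100 km/s as a red rhomb.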