Metadata extraction: from FITS files to databases with MEx
Note: this hands-on session runs only in the morning; in the afternoon you will deploy DALToolKit to create data access services using your database.
Abstract
Publishing images and spectra in a VO-compliant format is a two-step procedure: first, making sure all data is valid and described in a common, homogeneous way; second, providing query interfaces to the data, based on the validated data descriptions.
MEx is a tool to aid the first step (metadata ingestion), in which the data descriptions are extracted from FITS headers into a data repository. A set of required and optional metadata items is defined for each type of data, and their values are extracted from the FITS files through mapping rules. The data repository (most commonly a database) separates the metadata ingestion from the second step, the query service.
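To make the idea of mapping rules concrete, here is a minimal Python sketch. It does not reproduce the rule syntax or the API of MEx: the model item names, the `MAPPINGS` table and the `extract_metadata` helper are all hypothetical, chosen only to show how required and optional items could be filled from FITS keywords.

```python
# Hypothetical mapping rules: model item -> (FITS keyword, required?).
# These names are illustrative and are not the real MEx model items.
MAPPINGS = {
    "instrument.name": ("INSTRUME", True),
    "target.name": ("OBJECT", True),
    "exposure.time": ("EXPTIME", False),
}


def extract_metadata(header, mappings):
    """Apply mapping rules to a header dict; return (metadata, errors)."""
    metadata, errors = {}, []
    for item, (keyword, required) in mappings.items():
        if keyword in header:
            metadata[item] = header[keyword]
        elif required:
            # A missing optional item is silently skipped.
            errors.append(f"missing required item {item} (keyword {keyword})")
    return metadata, errors


# A toy header, as it could be read from a FITS file.
header = {"INSTRUME": "ISAAC", "EXPTIME": 120.0}
metadata, errors = extract_metadata(header, MAPPINGS)
print(metadata)
print(errors)
```

In this toy run the optional exposure time is extracted, while the absent OBJECT keyword is reported as an error because its model item is marked as required.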
The goal of this session is to load a simple MySQL database and deploy DALToolKit to provide SIA and SSA interfaces to the data.
Participants should bring their own data to ingest, preferably data they understand well (in particular the meaning of the FITS keywords). Supporting a custom database structure might require a small amount of programming.
External References
Advisors (ESO)
- Remco Slijkhuis
- Bruno Rino
- Jean-Christophe Malapert
Software Requirements
Download
http://www.euro-vo.org/dcaworkshop2008/HandsOn/mex_daltoolkit/
Bug report and feature requests
http://mex-eso.blogspot.com/
Session outline
Publishing data in the VO as a two-step procedure:
- gathering metadata (knowledge of the data, e.g. FITS keywords) in a homogeneous fashion: MEx (ESO)
- building a (web) service that searches data using the metadata and allows access to the data: DALToolKit (ESA)
The session is organized in two parts:
- a first one using ESO data
- a second one using your own data.
1st run: demo data, demo database
- (one time only) set up a database
- gathering metadata into the database
- map FITS keywords to "concepts" AKA model items: Mapping Editor
- get a sample FITS header
- define mappings for (at least) the required metadata
- test mappings against sample FITS file header
- ingest metadata into database: MEx
- execute mappings against all the FITS files
- build service
- MEx creates default DALToolKit configuration
2nd run: your data, your database
- your data:
- the Mapping Editor, revisited
- several types of data in one package: Catalogue Builder
- your data definition
- model items configuration
- your own mapedit
- your database:
- MEx scripting
- adapt DALToolKit configuration
Steps for using Simple Image Access Protocol
Set up a database
Make sure the database server is running, then connect to it as "superuser":
mysql -u root -p
Now, create a database:
create database esodata;
And finally, create a SIA table in the esodata database:
use esodata;
source samples/db/sia.sql
Check the table structure:
show tables;
desc SIA;
Get a FITS header
Extract a FITS header using the following command:
java -jar lib/fitshead-1.0.jar -x 0 samples/images/GOODS_ISAAC_03_H_V1.5.fits > sampleheader.txt
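For readers who want to see what such a header consists of, the following Python sketch reads the primary header of a FITS file using only the standard library. It relies on two facts of the FITS format: header cards are 80 ASCII characters long, and they are stored in 2880-byte blocks that end with an END card. It is an illustration only, not a replacement for the fitshead tool; the helper names and the in-memory demo header are assumptions made so that the example runs without a file.

```python
import io

CARD = 80      # a FITS header card is 80 ASCII characters
BLOCK = 2880   # FITS files are organised in 2880-byte blocks


def read_primary_header(stream):
    """Return the cards of the primary header, up to and including END."""
    cards = []
    while True:
        block = stream.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("truncated FITS header: END card not found")
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii").rstrip()
            cards.append(card)
            if card == "END":
                return cards


def make_demo_header():
    """Build a tiny in-memory primary header (hypothetical demo data)."""
    def card(key, value):
        # Keyword in columns 1-8, "= " in 9-10, value right-justified to 30.
        return f"{key:<8}= {value:>20}"

    cards = [card("SIMPLE", "T"), card("BITPIX", "8"),
             card("NAXIS", "0"), "END"]
    raw = "".join(c.ljust(CARD) for c in cards)
    return raw.ljust(BLOCK).encode("ascii")


cards = read_primary_header(io.BytesIO(make_demo_header()))
print("\n".join(cards))
```

With a real file, pass `open(path, "rb")` instead of the `io.BytesIO` object.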
Define/test mappings
- Open your favorite browser and go to the mapping editor interface:
http://vops1.hq.eso.org:8080/mapedit
- upload the model item list specific for this workshop: config/modelitem_definitions.txt
- select data type: image.reduced
- upload your FITS header using upload (text)
- click "Validate" to use the default rules
- edit mapping rules that generate errors or warnings
- fix errors
- Note: min and max RA and Dec are mandatory for DALToolKit
- when all are valid, download mappings file
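Since the minimum and maximum RA and Dec are mandatory, it can help to estimate them independently as a sanity check on your mappings. The sketch below is a rough approximation, not the method used by MEx or DALToolKit: it assumes a north-up image without rotation, a small field of view far from the poles, no wrap-around at RA = 0, and a header that provides CRVALn, CRPIXn and CDELTn. The `radec_bounds` helper and the header values are hypothetical; use a proper WCS library for real data.

```python
import math


def radec_bounds(h):
    """Approximate (ra_min, ra_max, dec_min, dec_max) in degrees."""
    cos_dec = math.cos(math.radians(h["CRVAL2"]))
    ras, decs = [], []
    # Pixel edges: 0.5 is the outer edge of the first pixel and
    # NAXISn + 0.5 the outer edge of the last one (FITS convention).
    for x in (0.5, h["NAXIS1"] + 0.5):
        ras.append(h["CRVAL1"] + (x - h["CRPIX1"]) * h["CDELT1"] / cos_dec)
    for y in (0.5, h["NAXIS2"] + 0.5):
        decs.append(h["CRVAL2"] + (y - h["CRPIX2"]) * h["CDELT2"])
    return min(ras), max(ras), min(decs), max(decs)


# Toy values, not taken from the workshop sample files.
header = {
    "NAXIS1": 1000, "NAXIS2": 1000,
    "CRPIX1": 500.5, "CRPIX2": 500.5,
    "CRVAL1": 53.0, "CRVAL2": 0.0,
    "CDELT1": -0.001, "CDELT2": 0.001,
}
ra_min, ra_max, dec_min, dec_max = radec_bounds(header)
print(ra_min, ra_max, dec_min, dec_max)
```

Taking the minimum and maximum over both edges keeps the result correct when CDELT1 is negative, which is the usual case for RA.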
Ingest
Set up a directory for sharing in Tomcat: unzip samples/files.zip into $CATALINA_BASE/webapps
Ingest data in the database:
- edit the MEx configuration file config/mex-daltookit.properties (e.g. database password, folder to copy data to)
- run MEx on the files and the mappings file:
java -jar mex-java.jar --type SIA -d samples/images -m samples/mappings/isaac.txt
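Conceptually, the ingestion step writes one metadata record per file into the database, which the query service later searches. The Python sketch below illustrates that idea with an in-memory SQLite database standing in for MySQL. The `sia_demo` table, its columns and the records are hypothetical: they do not reproduce samples/db/sia.sql or the internals of MEx.

```python
import sqlite3

# Stand-in for the MySQL database; table and column names are
# hypothetical and do not reproduce samples/db/sia.sql.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sia_demo (filename TEXT PRIMARY KEY, title TEXT, "
    "ra_min REAL, ra_max REAL, dec_min REAL, dec_max REAL)"
)

# One metadata record per file, as mapping rules might produce them.
records = [
    {"filename": "a.fits", "title": "Field A",
     "ra_min": 52.5, "ra_max": 53.5, "dec_min": -28.0, "dec_max": -27.0},
    {"filename": "b.fits", "title": "Field B",
     "ra_min": 10.0, "ra_max": 10.2, "dec_min": 5.0, "dec_max": 5.2},
]

# Named placeholders keep the values out of the SQL string itself.
conn.executemany(
    "INSERT INTO sia_demo VALUES "
    "(:filename, :title, :ra_min, :ra_max, :dec_min, :dec_max)",
    records,
)
conn.commit()

# A positional search then reduces to a bounding-box comparison.
ra, dec = 53.0, -27.5
rows = conn.execute(
    "SELECT filename FROM sia_demo WHERE ? BETWEEN ra_min AND ra_max "
    "AND ? BETWEEN dec_min AND dec_max",
    (ra, dec),
).fetchall()
print(rows)
```

The final query shows why the bounding-box columns matter: without them the service could not tell which images cover a requested position.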
Build service
- edit DALToolKit configuration if needed
- deploy DALToolKit service
cd DALToolkit
vim build.properties.local
ant deploy
- DALToolKit service config:
SIAP-v1.0-mex
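Once the service is deployed, it can be tested with a positional query. The Python sketch below builds a Simple Image Access v1.0 query URL from the standard POS, SIZE and FORMAT parameters (POS and SIZE in decimal degrees). The base URL is an assumption: the host, port and path depend on your Tomcat and DALToolKit configuration.

```python
from urllib.parse import urlencode


def sia_query_url(base_url, ra, dec, size, fmt="image/fits"):
    """Build a SIA v1.0 query URL; POS and SIZE are in decimal degrees."""
    params = {"POS": f"{ra},{dec}", "SIZE": str(size), "FORMAT": fmt}
    # Keep "," and "/" readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=",/")


# Hypothetical endpoint: replace with the URL of your deployed service.
BASE = "http://localhost:8080/SIAP-v1.0-mex/sia"
url = sia_query_url(BASE, 53.12, -27.8, 0.1)
print(url)
```

Pasting the resulting URL into a browser should return a VOTable listing the images that overlap the requested region, provided the service is running at that address.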
Tips for using Simple Spectral Access Protocol v0.1
- Database creation script:
samples/db/ssa-0.1.sql
- Example FITS header extraction:
java -jar lib/fitshead-1.0.jar -x 0 samples/spectra/esa/3C273_0136550101.fits > sampleheader.txt
- Ingestion using provided mappings:
java -jar mex-java.jar --type SSA-0.1 -d samples/spectra/esa -m samples/mappings/esa-0.1.txt
- DALToolKit service config:
SSAP-v0.1-mex
Tips for using Simple Spectral Access Protocol v1.0
- Database creation script:
samples/db/ssa.sql
- Example FITS header extraction:
java -jar lib/fitshead-1.0.jar -x 0 samples/spectra/fors2/GOODS_FORS2_GDS_J033202.99-274301.2_904509_V2.0.fits > sampleheader.txt
- Ingestion using provided mappings:
java -jar mex-java.jar --type SSA-1.0 -d samples/spectra/fors2 -m samples/mappings/fors2.txt
- DALToolKit service config:
SSAP-v1.0-mex
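A deployed SSA v1.0 service can be tested in a similar way. The Python sketch below builds a queryData URL and converts a wavelength range from Angstrom to metres, the unit that SSA v1.0 expects for the BAND parameter. The base URL and the `ssa_query_url` helper are assumptions, and the sketch targets v1.0 only; the v0.1 interface may differ.

```python
from urllib.parse import urlencode

ANGSTROM_IN_M = 1e-10  # 1 Angstrom expressed in metres


def ssa_query_url(base_url, ra, dec, size, band_angstrom=None):
    """Build an SSA v1.0 queryData URL; POS and SIZE are in degrees."""
    params = {"REQUEST": "queryData", "POS": f"{ra},{dec}", "SIZE": str(size)}
    if band_angstrom is not None:
        # BAND is a wavelength range "min/max" in metres.
        lo, hi = (w * ANGSTROM_IN_M for w in band_angstrom)
        params["BAND"] = f"{lo:g}/{hi:g}"
    return base_url + "?" + urlencode(params, safe=",/")


# Hypothetical endpoint: replace with the URL of your deployed service.
BASE = "http://localhost:8080/SSAP-v1.0-mex/ssa"
url = ssa_query_url(BASE, 187.28, 2.05, 0.05, band_angstrom=(4000, 7000))
print(url)
```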