Metadata extraction: from FITS files to databases with MEx
Note: this hands-on session runs only in the morning; in the afternoon you will deploy DALToolKit to create data access services using your database.
Abstract
Publishing images and spectra in a VO-compliant format is a two-step procedure: first, making sure all data is valid and described in a common, homogeneous way; then, providing query interfaces to the data using the validated data descriptions.
MEx is a tool to aid the first step (metadata ingestion), in which the data descriptions are extracted from FITS headers into a data repository. A set of required and optional metadata is defined for each type of data, and the values are extracted from the FITS files through mapping rules. The data repository (most commonly a database) separates the metadata ingestion from the second step, the query service.
The goal of this session is to load a simple MySQL database and deploy DALToolKit to provide SIA and SSA interfaces to the data.
Participants should bring their own data to ingest, and should understand it well, in particular the meaning of its FITS keywords. If a custom database structure has to be supported, a small amount of programming might be required.
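To make the idea of mapping rules concrete, here is a minimal Python sketch of what "extracting model items from a FITS header through mapping rules" could look like. It is not MEx's rule syntax or API: the model item names, the rule format, and the helpers `parse_header` and `apply_mappings` are all hypothetical, and the sample header is made up. It assumes the header is available as plain text, one card per line.

```python
# Illustrative sketch only: this is NOT MEx's rule syntax or API.
# Model item names, rule format and helper names are hypothetical.

SAMPLE_HEADER = """\
SIMPLE  =                    T / conforms to FITS standard
INSTRUME= 'ISAAC   '           / instrument name
EXPTIME =                120.0 / exposure time in seconds
HIERARCH ESO INS FILT1 NAME = 'H' / filter name
END
"""


def parse_value(raw):
    """Convert the value part of a header card to str, bool, int or float."""
    raw = raw.strip()
    if raw.startswith("'"):
        # Quoted string: take everything up to the closing quote.
        # (Escaped quotes '' inside strings are not handled here.)
        end = raw.find("'", 1)
        if end == -1:
            end = len(raw)
        return raw[1:end].strip()
    raw = raw.split("/", 1)[0].strip()  # drop the trailing comment
    if raw in ("T", "F"):
        return raw == "T"
    try:
        return int(raw)
    except ValueError:
        try:
            return float(raw)
        except ValueError:
            return raw


def parse_header(text):
    """Parse header cards (one per line) into a keyword -> value dict."""
    header = {}
    for line in text.splitlines():
        if line.startswith("HIERARCH "):
            # ESO HIERARCH convention: long keyword, then '=', then value.
            key, sep, raw = line[len("HIERARCH "):].partition("=")
            if not sep:
                continue
        elif line[8:10] == "= ":
            # Standard card: keyword in columns 1-8, '= ' in columns 9-10.
            key, raw = line[:8], line[10:]
        else:
            continue  # COMMENT, HISTORY, END, blank lines
        header[key.strip()] = parse_value(raw)
    return header


# One rule per model item: a function from the parsed header to a value.
MAPPINGS = {
    "instrument": lambda h: h["INSTRUME"],
    "exposure_time": lambda h: float(h["EXPTIME"]),
    "band": lambda h: h["ESO INS FILT1 NAME"],
    "observer": lambda h: h["OBSERVER"],
}
REQUIRED = {"instrument", "exposure_time"}


def apply_mappings(header, mappings, required):
    """Evaluate every rule; report required items whose keyword is missing."""
    values, errors = {}, []
    for item, rule in mappings.items():
        try:
            values[item] = rule(header)
        except KeyError as exc:
            if item in required:
                errors.append(f"{item}: missing keyword {exc.args[0]}")
    return values, errors


values, errors = apply_mappings(parse_header(SAMPLE_HEADER), MAPPINGS, REQUIRED)
print(values)
print(errors)
```

In this sketch an optional item whose keyword is absent is silently skipped, while a missing required item is reported as an error. This mirrors the distinction between required and optional metadata, but how MEx itself reports such cases is not described here.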
External References
Advisors (ESO)
- Remco Slijkhuis
- Bruno Rino
- Jean-Christophe Malapert
Software Requirements
Session outline
- Publishing data in the VO as a two-step procedure:
- gathering metadata (knowledge of the data, e.g. FITS keywords) in a homogeneous fashion
- building a (web) service that searches data using the metadata and allows access to the data
- gathering metadata: MEx (ESO)
- building the service: DALToolKit (ESA)
1st run: demo data, demo database
- (one time only) set up a database
- gathering metadata into the database
- map FITS keywords to "concepts" AKA model items: Mapping Editor
- get a sample FITS header
- define mappings for (at least) the required metadata
- test mappings against sample FITS file header
- ingest metadata into database: MEx
- execute mappings against all the FITS files
- build service
- MEx creates default DALToolKit configuration
2nd run: your data, your database
- your data:
- the Mapping Editor, revisited
- several types of data in one package: Catalogue Builder
- your data definition
- model items configuration
- your own mapedit
- your database:
- MEx scripting
- adapt DALToolKit configuration
Steps
- set up a database
- make sure database is running
- connect as "superuser"
mysql -u root -p
create database esodata;
use esodata
source samples/db/sia.sql
show tables;
desc SIA;
- get a sample FITS header
java -jar lib/fitshead-1.0.jar -x 0 samples/images/GOODS_ISAAC_03_H_V1.5.fits > sampleheader.txt
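For readers curious about what a header dump involves: a FITS primary header is a sequence of 80-character ASCII cards, stored in 2880-byte blocks and terminated by an END card. The following Python sketch reads those cards directly. It is not the implementation of the fitshead tool, and `read_primary_header` is a hypothetical helper name; it only handles the primary header, not extensions.

```python
BLOCK_SIZE = 2880  # FITS files are organised in 2880-byte blocks
CARD_SIZE = 80     # each header card is 80 ASCII characters


def read_primary_header(path):
    """Return the primary header cards of a FITS file as a list of strings.

    Hypothetical helper for illustration; the END card is not included.
    """
    cards = []
    with open(path, "rb") as fh:
        while True:
            block = fh.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                raise ValueError("END card not found before end of file")
            for start in range(0, BLOCK_SIZE, CARD_SIZE):
                card = block[start:start + CARD_SIZE].decode("ascii", errors="replace")
                if card[:8].rstrip() == "END":
                    return cards
                cards.append(card.rstrip())
```

Calling `read_primary_header("samples/images/GOODS_ISAAC_03_H_V1.5.fits")` and printing one card per line would give a text listing of the header cards; whether that matches the exact output format of the fitshead tool is not assumed here.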
- define/test mappings
- mapping editor: http://vops1.hq.eso.org:8080/mapedit
- upload the model item list specific for this workshop: config/modelitem_definitions.txt
- select data type: image.reduced
- upload fits header
- edit mapping rules
- fix errors
- Note: min and max RA and Dec are mandatory for DALToolKit
- when all are valid, download mappings file
- set up a directory for sharing in Tomcat
- unzip samples/files.zip into $CATALINA_BASE/webapps
- ingest
- edit the MEx configuration config/mex-daltookit.properties (e.g. db password, folder to copy data to)
- run mex on files + mappings file
java -jar mex-java.jar --type SIA -d samples/images -m samples/mappings/isaac.txt
- build service
- edit DALToolKit configuration if needed
- deploy DALToolKit service
cd DALToolKit
vim build.properties.local
ant deploy
[SIAP-v1.0-mex|SSAP-v0.1-mex|SSAP-v1.0-mex]
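A note on the mapping step above, which states that min and max RA and Dec are mandatory for DALToolKit. When a header carries no explicit bounds, they could in principle be derived from the WCS keywords. The Python sketch below shows the arithmetic under strong assumptions: an unrotated image, CDELT1/CDELT2 given in degrees per pixel, and a field small enough to treat the projection as linear. It ignores RA wrap-around at 0/360 degrees and fields near the poles. The function name `sky_bounds` and the example values are made up for illustration; this is not how MEx or the Mapping Editor computes the bounds.

```python
import math


def sky_bounds(header):
    """Approximate (ra_min, ra_max, dec_min, dec_max) in degrees.

    Assumes no rotation, CDELT in degrees/pixel and a small field of view.
    """
    n1, n2 = header["NAXIS1"], header["NAXIS2"]

    # Dec at the bottom and top image edges (edges lie at pixel 0.5 and N + 0.5).
    decs = [
        header["CRVAL2"] + (y - header["CRPIX2"]) * header["CDELT2"]
        for y in (0.5, n2 + 0.5)
    ]
    dec_min, dec_max = min(decs), max(decs)

    # An angular offset along RA spans 1/cos(dec) degrees of RA; use the edge
    # farthest from the equator so the bounds are not underestimated.
    cos_dec = math.cos(math.radians(max(abs(dec_min), abs(dec_max))))
    ras = [
        header["CRVAL1"] + (x - header["CRPIX1"]) * header["CDELT1"] / cos_dec
        for x in (0.5, n1 + 0.5)
    ]
    return min(ras), max(ras), dec_min, dec_max


# Made-up example values, not taken from the workshop sample files.
example = {
    "NAXIS1": 1000, "NAXIS2": 1000,
    "CRPIX1": 500.5, "CRPIX2": 500.5,
    "CRVAL1": 53.1, "CRVAL2": -27.8,
    "CDELT1": -0.0001, "CDELT2": 0.0001,
}
print(sky_bounds(example))
```

For rotated or distorted images the four corners would have to be transformed with a full WCS library instead; the linear shortcut is only meant to show where the four numbers come from.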