The Spectral Converter is a tool that transforms spectra originally stored in a ground-based, one-dimensional, image-like FITS
format into a fully IVOA SpectrumDM compliant format.
It can be used both on the command line and over the web when installed as a CGI.
Key concepts
- template A text file that follows the human-readable format of FITS headers. Its contents will be "patched into" the produced FITS file.
- context A Python script that allows you to customise the tool, including specifying a template and the metadata sources used to fill in the template variables.
- datasource A connection to either a database or a URL resource returning JSON.
- datacursor A query to execute against a datasource.
Quick start
How to run the tool
This will run the tool with minimum functionality. It will only convert the data in FITS files from the 1D-image format to a binary table.
On the command line:
convert.py your_fits_file_here.fits
The parameter, your_fits_file_here.fits, can be either a local file or a URL. The output FITS file can be specified using --output-file. By default, it is stored in your system's temporary folder (the exact location is printed on the screen).
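To give an idea of what this basic conversion involves: in the 1D-image format the flux values live in the image array, while the wavelength axis is encoded in WCS header keywords. The sketch below is not the tool's actual code; it simply reconstructs a linear wavelength axis, assuming the standard CRVAL1/CDELT1/CRPIX1 keywords.

```python
def wavelength_axis(header, npoints):
    """Reconstruct a linear wavelength axis from standard FITS WCS keywords.

    wave[i] = CRVAL1 + (pixel_i - CRPIX1) * CDELT1, with 1-based pixels.
    """
    crval1 = header["CRVAL1"]           # wavelength at the reference pixel
    cdelt1 = header["CDELT1"]           # wavelength step per pixel
    crpix1 = header.get("CRPIX1", 1.0)  # reference pixel (1-based)
    return [crval1 + (i + 1 - crpix1) * cdelt1 for i in range(npoints)]

# Example: 5 pixels starting at 4000 Angstrom with a 2 Angstrom step
header = {"CRVAL1": 4000.0, "CDELT1": 2.0, "CRPIX1": 1.0}
print(wavelength_axis(header, 5))  # [4000.0, 4002.0, 4004.0, 4006.0, 4008.0]
```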
How to provide your own template
The next level of functionality is to include FITS keywords (metadata) of your choosing to better describe the spectrum. This is achieved with a template: a text file that follows the human-readable format of FITS headers, e.g. (excerpt from an actual template):
APERTURE= '0.86' / [arcsec] Aperture (width or lengthxwidth)
DATE-OBS= '1997-02-07T10:23:47' / UT observation start time
EXPOSURE= 2699.7468 / [s] exposure duration
TSTART = 50486.43319054 / [d] MJD exposure start time
TSTOP = 50486.46591063 / [d] MJD exposure stop time
TMID = 50486.44955059 / [d] MJD exposure mid time
These metadata will be appended to the appropriate FITS extension.
The IVOA Spectrum Data Model, "Part 4: FITS serialization", details which FITS keywords to use.
For consistency, templates should be stored under the "context" folder, with a "txt" extension.
On the command line:
convert.py -t context/myheader.txt your_fits_file_here.fits
Note that in this template, all FITS keyword values are constants.
How to provide your own contexts
Templates with variable values are supported through contexts; for these, the context's datasources and datacursors must be in place.
Contexts are Python scripts, stored under the "context" folder (e.g. the context "archival_images" consists of context/archival_images.py), that allow you to define:
- how to fetch the original FITS file
- the template to "patch into" the binary table extension
- how to fetch metadata (datasources and datacursors)
They are also required to enable CGI.
See (and use) the provided context/default.py as a starting point. This context is appropriately named default: it is used when no "context" parameter is provided. Hence any changes made to this file will affect the usage described in the previous sections!
Below are the customisation options available. They consist of configuration items (properties of the Python object defined in the script) and functionality (methods of the Python object):
Properties:
- description A short description of the types of files this context applies to. It will be used for display in the help message of the tool
- template Path to the header template to apply
- urlfetch URL to use when fetching the original FITS file. The file identifier given as parameter to the tool will replace a "%s" found in this string
- datasources Dictionary of sources of data (database servers, json endpoints)
- datacursors Dictionary of queries/processing to execute against the datasources
- self_test_identifiers List of identifiers to use when running the tool in self test mode
Methods:
- init() Execute some initialisation code.
- retrieve_file(id) Returns the URL to use when fetching the original FITS file. If the urlfetch property is not flexible enough, it can be set to None, in which case this method will be called instead.
- pre_metadata_fetch(identifier, datacursor_name, datacursor, metadata) This method is run before metadata is fetched from each datacursor. It is an opportunity to change the query before it is run.
- post_metadata_fetch(hdulist_in, keys, metadata) This method is run after the metadata is fetched from the datasources, but before applying the values to the template. It is an opportunity to compute values, apply formatting, fix known errors in the metadata sources, etc.
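For orientation, a context script might look like the following sketch. All names and values here are hypothetical, and the exact registration mechanics expected by the tool are defined by context/default.py, which remains the authoritative starting point:

```python
# Hypothetical sketch of a context script, e.g. context/my_context.py.
# See context/default.py for the structure the tool actually expects.

class MyContext:
    description = "Archival 1D spectra served by our local archive"
    template = "context/myheader.txt"
    # "%s" is replaced with the identifier given on the command line / CGI
    urlfetch = "http://archive.example.org/fetch?id=%s"
    datasources = {}        # database servers or JSON endpoints
    datacursors = {}        # queries to run against the datasources
    self_test_identifiers = ["TEST.0001", "TEST.0002"]

    def init(self):
        pass  # one-off initialisation code

    def retrieve_file(self, identifier):
        # Only called when the urlfetch property is set to None
        return "http://archive.example.org/special/%s.fits" % identifier

    def post_metadata_fetch(self, hdulist_in, keys, metadata):
        # Compute/format values before they are applied to the template
        metadata.setdefault("ssa", {})["Curation_Publisher"] = "Example Archive"
```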
On the command line:
convert.py -c my_context your_fits_identifier_here
As a CGI:
http://localhost/cgi-bin/spectralconverter/convert.py?context=my_context&id=your_fits_identifier_here
Note that:
- Specifying my_context means the context located at context/my_context.py will be loaded
- Typically a context defines a smarter way to get at a file than its full URL; fetching it by archival identifier usually makes more sense.
Reference Information
Command line parameters
convert.py [options] [identifiers]
Identifiers are context dependent. If no context is specified, the
default context is assumed, and identifiers are file names.
The following options are accepted:
-v, --verbose enable debug messages
-h, --help display this message
--output-dir <path> set output folder
-o, --output-file <name> set output file (overrides --output-dir)
--force-overwrite silently overwrites an existing file
-c, --context <token> select context
-t, --template <path> template for new FITS header
-l, --log <path> log messages to a file rather than the console
--self-test execute a set of test conversions
CGI parameters
id an identifier for the file
context a token that identifies which context to apply
Templates
The template syntax is similar to a FITS header, with the following differences:
- FITS cards are separated with a newline
- text matching ${...} will be replaced with values fetched from the datasources
- text following the rightmost # (hash) symbol is considered a template comment and will be discarded
The placeholders are replaced with values fetched from the datasources as defined in the datacursors property of a context. The syntax for the template placeholders is:
${datacursor_name:metadata_key}
Where:
- datacursor_name name of a datacursor defined in the customisation, or "fits"
- metadata_key key defined within the datacursor
After processing the template variables and comments, the result should look like a FITS header, where FITS keywords semantics apply, namely:
- keyword names must be at most 8 characters long, or use the HIERARCH convention
- string values are enclosed in single quotes
- COMMENT, HISTORY and blank cards are allowed
- the card cannot exceed 80 characters in length
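These constraints can be checked mechanically. The following is a deliberately simplified sanity check, not the tool's validator (it does not handle the HIERARCH convention, for instance):

```python
def check_card(card):
    """Very rough sanity check of a single FITS header card.

    Returns None when the card looks acceptable, else an error message.
    """
    if len(card) > 80:
        return "card exceeds 80 characters"
    keyword = card[:8].rstrip()      # FITS keywords occupy columns 1-8
    if not keyword:                  # blank card
        return None
    if keyword in ("COMMENT", "HISTORY"):
        return None                  # free-text cards, no value indicator
    if card[8:10] != "= ":
        return "missing value indicator '= '"
    return None

print(check_card("EXPOSURE=             2699.7468 / [s] exposure duration"))  # None
```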
The values of the original FITS file primary header keywords are also available, as a built-in datacursor, using "fits" as the datacursor name.
Examples:
EQUINOX = ${ssa:CoordSys_SpaceFrame_Equinox} / Coordinates precessed to J2000
SPECSYS = '${modelvalues:SpectralCoord.RefPos}'
DATALEN = ${fits:NAXIS1} / Number of points in spectrum
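The replacement step itself amounts to a dictionary lookup per placeholder, plus stripping template comments. A simplified sketch of that behaviour (not the tool's implementation; note that the rightmost-hash rule would also discard a literal '#' inside a value):

```python
import re

# Matches ${datacursor_name:metadata_key}; keys may contain dots
PLACEHOLDER = re.compile(r"\$\{(\w+):([\w.]+)\}")

def expand_template(text, metadata):
    """Replace ${datacursor:key} placeholders and drop '#' template comments."""
    lines = []
    for line in text.splitlines():
        line = line.rsplit("#", 1)[0].rstrip()   # discard text after rightmost '#'
        line = PLACEHOLDER.sub(
            lambda m: str(metadata[m.group(1)][m.group(2)]), line)
        lines.append(line)
    return "\n".join(lines)

metadata = {"fits": {"NAXIS1": 4096}}
print(expand_template(
    "DATALEN = ${fits:NAXIS1} / Number of points  # template note", metadata))
```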
Contexts
datasources
Two types of datasources are available: database and JSON. For database access, a suitable database driver must be installed. Only Sybase is supported at present, but support for other databases can be added easily. Datasources are defined as entries in a dictionary. For databases:
_datasource_name_ : {
'vendor' : _vendor_name_, # 'sybase'
'server' : _server_alias_, # as defined in the Sybase interfaces file
'user' : _database_username_,
'password' : _password_,
'database' : _default_database_,
'isolation' : _isolation_level_, # recommended: '1' (prevents dirty reads)
},
For JSON:
_datasource_name_ : {
'vendor' : 'http-json',
'url' : _URL_, # a %s in the URL will be replaced with the file identifier
},
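A JSON datasource boils down to substituting the identifier into the URL, fetching the resource, and picking out the configured section. A hedged sketch using only the standard library (the tool's internals may differ; the example parses a canned response instead of making a network call):

```python
import json
from urllib.parse import quote

def json_lookup(datasource, identifier, section, fetch):
    """Resolve a 'http-json' datasource: fill in the URL, fetch, pick a section."""
    url = datasource["url"] % quote(identifier)
    return json.loads(fetch(url))[section]

datasource = {"vendor": "http-json",
              "url": "http://archive.example.org/fileinfo?id=%s"}

# Stand-in for a real HTTP fetch (e.g. urllib.request.urlopen(url).read())
def fake_fetch(url):
    return '{"header": {"OBJECT": "NGC 1068", "EXPTIME": 2699.7}}'

print(json_lookup(datasource, "ADP.2020-01-01", "header", fake_fetch))
```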
datacursors
Datacursors define how to fetch data from the datasources: queries to execute against the databases, sections to look up in the case of JSON.
'ssa' : {
'type' : 'db',
'datasource' : 'safdb',
'casesensitive' : False,
'query' : 'select * from ...' # a @identifier in the string will be replaced with the file identifier
},
'modelvalues' : {
'type' : 'db-vertical',
'datasource' : 'safdb',
'casesensitive' : False,
'query' : 'select col1, col2 from ...', # a @identifier in the string will be replaced with the file identifier
},
'ssa' : {
'type' : 'json',
'datasource' : 'eso-fileinfo',
'casesensitive' : False,
'section' : _key_in_JSON_response_,
},
The type of datacursor can be one of:
- db The query will return only one row. Each column will be mapped into a dictionary entry: the column name as key, the cell value as value.
- db-vertical The query will return multiple rows with two columns. Each row will be mapped into a dictionary entry: the first column name as key, the second column as value. To be used when the database is structured as key/value pairs.
- json A JSON dictionary, identified by the section, will be loaded.
The casesensitive value defines whether the keys are matched case sensitively. Setting it to False allows case insensitive variable names in the template.
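The difference between the db and db-vertical types is just how rows map into the metadata dictionary; with casesensitive set to False, keys are normalised (lowercased here) before lookup. A sketch of both mappings (assumed behaviour, not the tool's code):

```python
def map_db_row(columns, row, casesensitive=True):
    """'db' type: a single row; column names become dictionary keys."""
    keys = columns if casesensitive else [c.lower() for c in columns]
    return dict(zip(keys, row))

def map_db_vertical(rows, casesensitive=True):
    """'db-vertical' type: many key/value rows; the first column is the key."""
    return {(k if casesensitive else k.lower()): v for k, v in rows}

print(map_db_row(["RA", "DEC"], [40.67, -0.01], casesensitive=False))
print(map_db_vertical([("SpectralCoord.RefPos", "TOPOCENT")]))
```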
advanced (involves programming)
It is possible to do more than simply copy values from the datasources to the header: values can be transformed, formatted, overridden, etc. This is achieved by writing Python code in the context, typically in the pre_metadata_fetch and post_metadata_fetch methods.
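As an illustration, here is a hypothetical post-fetch hook that formats an exposure time and fixes a known bad value before the template is applied. All names and values are invented; the real hook signature and conventions are those documented for the context methods:

```python
def post_metadata_fetch(hdulist_in, keys, metadata):
    """Hypothetical hook: massage fetched metadata before template expansion."""
    ssa = metadata.setdefault("ssa", {})
    # Format: keep the exposure time to 4 decimal places
    if "EXPOSURE" in ssa:
        ssa["EXPOSURE"] = round(float(ssa["EXPOSURE"]), 4)
    # Fix a known bad value coming from the metadata source
    if ssa.get("TELESCOP") == "UNKNOWN":
        ssa["TELESCOP"] = "ESO-3.6"
    return metadata

md = {"ssa": {"EXPOSURE": "2699.746801", "TELESCOP": "UNKNOWN"}}
print(post_metadata_fetch(None, [], md))
```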
Installation
Requirements
and optionally, depending on your datasources:
How to install as a CGI under Apache
- edit spectralconverter/.htaccess to match your $PYTHONPATH
- copy the spectralconverter files to the Apache cgi-bin folder
- edit httpd.conf to enable .htaccess files
- configure a cache folder for Python eggs
MacOSX
The following details apply if you wish to use the Apache installation pre-installed on MacOSX.
- edit spectralconverter/.htaccess to match your $PYTHONPATH
- copy the spectralconverter files to the Apache cgi-bin folder:
cp -r spectralconverter /Library/WebServer/CGI-Executables/
- edit httpd.conf to enable .htaccess files:
sudo vi /etc/apache2/httpd.conf
find: <Directory "/Library/WebServer/CGI-Executables">
on the next line, replace "None" with "All": AllowOverride All
restart Apache (System Preferences -> Sharing -> Web Sharing)
- configure a cache folder for python eggs:
mkdir /Library/WebServer/.python-eggs
chmod 777 /Library/WebServer/.python-eggs
- or configure it to use an existing folder:
sudo vi .htaccess
SetEnv PYTHON_EGG_CACHE /tmp