Managing Metadata using AMGA
Introduction
AMGA is the Metadata Catalogue of the EGEE project and is part of its gLite middleware. It aims to provide a metadata service for Grid infrastructures that is general enough to be used by as many applications as possible, taking into account the requirements of the very diverse EGEE communities.
The main features of the AMGA catalogue are high performance (especially over WAN connections), tight integration with the Virtual Organisation (VO) management system of the EGEE Grid, which works with X509 Grid certificates and VOMS, fine-grained access control, and advanced replication features. Its SQL-like query language offers most of the features of modern SQL dialects, including complex joins and string and mathematical functions. The query language hides the differences between the vendor dialects of SQL: each query is translated into the correct dialect of the back end.
In this document we will discuss how to use the service.
The AMGA clients
It is possible to run the AMGA client from any gLite user interface having the AMGA client installed.
AMGA has two client programs:
- mdclient: a terminal client for the AMGA metadata server.
- mdcli: a client for the AMGA metadata server. It sends commands to an AMGA server and prints out the result.
To install the clients it is necessary to add or update your repository settings in /etc/yum.repos.d/.
The following repository files need to be added to /etc/yum.repos.d/ to install AMGA properly:
Then you can install the client:
[root@localhost ~]# yum install glite-amga-cli
Before starting the AMGA client application it is necessary to copy a configuration file into the home directory. There is a template file provided by the AMGA client installation that can be customised and used to access your own AMGA server.
[morgan@localhost ~]$ cp $GLITE_LOCATION/etc/mdclient.config $HOME/.mdclient.config
Then it is necessary to open the file using a text editor and change some values. The following fields should be changed accordingly:
Login=NULL | (no username will be used by the AMGA client)
UseSSL = require | (the client will use a secure connection)
Port=8822 | (AMGA server listening port)
HOST=amga.si.inaf.it | (AMGA server hostname)
AuthenticateWithCertificate = 1 | (the AMGA client will get your username from the certificate)
UseGridProxy = 1 | (the client will authenticate by looking for a local grid proxy)
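Before starting the client it can be handy to check the edited file. The following is only a minimal sketch: it assumes the file consists of plain `Key = Value` lines and that lines starting with `#` are comments (both assumptions, not taken from the AMGA documentation). The helper names `parse_mdclient_config` and `check_settings` are hypothetical.

```python
def parse_mdclient_config(text):
    """Parse 'Key = Value' lines into a dict.

    Assumption: blank lines and lines starting with '#' can be ignored.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Values listed in the table above (the host name is site specific).
EXPECTED = {
    "Login": "NULL",
    "UseSSL": "require",
    "Port": "8822",
    "AuthenticateWithCertificate": "1",
    "UseGridProxy": "1",
}


def check_settings(settings, expected=EXPECTED):
    """Return the sorted list of keys whose value differs from the expected one."""
    return sorted(k for k, v in expected.items() if settings.get(k) != v)


sample = """\
Login=NULL
UseSSL = require
Port=8822
HOST=amga.si.inaf.it
AuthenticateWithCertificate = 1
UseGridProxy = 1
"""

# An empty list means that every expected setting is in place.
print(check_settings(parse_mdclient_config(sample)))
```

To check a real file, pass the content of $HOME/.mdclient.config to `parse_mdclient_config` instead of `sample`.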
Now the AMGA client application can be started:
[morgan@localhost ~]$ mdclient
Connecting to amga.si.inaf.it:8822...
ARDA Metadata Server 1.3.0
Query>
A brief description of the general-purpose commands follows:
>> createdir [options]
Make a new folder. It can inherit the schema associated with the parent folder
>> rm pattern
Remove items corresponding to the given pattern
>> link
Make a link to another file or to an external URL
>> dir
List the content of a directory
>> listentries
List the items (not the collections) of a directory
>> stat
Show statistical information about a directory
>> chown
Change the ownership of a file or a directory
>> chmod
Change the access rights to a file or a directory
>> rmdir
Remove a directory
>> dump
Make a recursive dump starting from a given directory (the default is '/')
AMGA usage
Following a filesystem model, AMGA associates directories with DB tables, while the entries inside the directories are associated with files. This section describes a set of commands used to manipulate directories. Once a directory has been created, it is possible to associate with it a schema defining several attributes. By analogy with databases, directories can be thought of as table names and their attributes as column names. Each attribute is defined by the pair (attribute name, attribute datatype).
Creating a Directory
Following the filesystem metaphor, metadata can be browsed exactly as if the user were browsing a filesystem. The dir command lists all the entries belonging to the directory specified in the path. It shows both entries and sub-directories, which allows users to define complex metadata hierarchies.
Once mdclient has been started, the default directory is '/' (root). To override this setting, the user can change the DefaultDir parameter in the .mdclient.config file and define a different default directory.
Query> pwd
>> /planck/
Query> dir
>> /planck/test
>> collection
Query> ls
>> /planck/test
Query> createdir tutorial
Query> cd /planck/tutorial/
Query> cd ..
Query> rmdir /planck/tutorial/
As for a filesystem (or a database), it is possible to set and list the ACL for each directory (or attribute):
Query> acl_show /planck/tutorial/
>> root rw-
>> root:planck rwx
>> system:anyuser rx
>> system:planck rwx
Query> acl_add /planck/tutorial/ inaf rwx
Query> acl_show /planck/tutorial/
>> root rw-
>> root:inaf rwx
>> root:planck rwx
>> system:anyuser rx
>> system:planck rwx
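When ACLs are inspected from a script, the listing can be turned into a lookup table. The sketch below assumes that every output line has the form `principal rights`, optionally prefixed by `>>` as in the session above. The helper names `parse_acl` and `can_write` are hypothetical.

```python
def parse_acl(lines):
    """Map each principal (e.g. 'root:planck') to its rights string (e.g. 'rwx')."""
    acl = {}
    for line in lines:
        line = line.strip()
        if line.startswith(">>"):
            line = line[2:].strip()  # drop the prompt marker used in the transcript
        if not line:
            continue
        principal, rights = line.rsplit(None, 1)
        acl[principal] = rights
    return acl


def can_write(acl, principal):
    """True if the rights of the principal contain the 'w' flag."""
    return "w" in acl.get(principal, "")


listing = [
    ">> root rw-",
    ">> root:inaf rwx",
    ">> root:planck rwx",
    ">> system:anyuser rx",
    ">> system:planck rwx",
]
acl = parse_acl(listing)
print(can_write(acl, "root:inaf"), can_write(acl, "system:anyuser"))
```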
It is also possible to use chmod and chown to change the ACL or the directory owner. To remove the directory:
Query> rmdir /planck/tutorial/
Creating the schema (managing attributes)
Once a directory has been created, it is possible to associate with it a schema defining several attributes. By analogy with databases, directories can be thought of as table names and their attributes as column names. Each attribute is defined by the pair (attribute name, attribute datatype).
Adding attributes to a directory defines a collection for it. The type of an attribute is the name of an SQL datatype, which will be translated (if necessary) into a data type understood by the back-end DB. Valid types are:
int
float
varchar(n)
text
timestamp
numeric(p,s)
Query> createdir cosmology
Query> addattr /planck/cosmology/ Hubble float
Query> addattr /planck/cosmology/ index float
Query> addattr /planck/cosmology/ omegam float
Query> addattr /planck/cosmology/ omegal float
Query> addattr /planck/cosmology/ name varchar(30)
To list the created attributes:
Query> listattr /planck/cosmology/
>> Hubble
>> float
>> index
>> float
>> omegam
>> float
>> omegal
>> float
>> name
>> varchar(30)
Query>
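For larger schemas the addattr commands can be generated from a list of (name, type) pairs. This is only a sketch: the helper `addattr_commands` is hypothetical, and the type check simply mirrors the list of valid types given above.

```python
import re

# Pattern for the valid types listed above: int, float, varchar(n), text,
# timestamp, numeric(p,s).
VALID_TYPE = re.compile(
    r"^(int|float|text|timestamp|varchar\(\d+\)|numeric\(\d+,\d+\))$"
)


def addattr_commands(directory, schema):
    """Build one 'addattr <directory> <name> <type>' command per attribute."""
    commands = []
    for name, datatype in schema:
        if not VALID_TYPE.match(datatype):
            raise ValueError("unsupported type for %s: %s" % (name, datatype))
        commands.append("addattr %s %s %s" % (directory, name, datatype))
    return commands


cosmology_schema = [
    ("Hubble", "float"),
    ("index", "float"),
    ("omegam", "float"),
    ("omegal", "float"),
    ("name", "varchar(30)"),
]

for command in addattr_commands("/planck/cosmology/", cosmology_schema):
    print(command)
```

The printed lines can be pasted at the Query> prompt or sent one by one with mdcli.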
To remove an attribute:
Query> removeattr /planck/cosmology/ omegal
Adding Entries
Once the schema has been defined, entries can be inserted.
Query> addentry /planck/cosmology/001 name LCDM
Query> addentry /planck/cosmology/002 name LCDM Hubble 0.7 omegam 0.3 index 1
Query>
Listing the attributes:
Query> getattr /planck/cosmology/* name
>> /planck/cosmology
>> LCDM
>> 001
>> LCDM
>> 002
>> LCDM
Query> ls /planck/cosmology/
>> /planck/cosmology
>> 001
>> 002
Changing the value of an entry:
Query> setattr /planck/cosmology/001 name CDM
Query> getattr /planck/cosmology/* name
>> /planck/cosmology
>> 002
>> LCDM
>> 001
>> CDM
Query>
To remove an entry:
Query> rm /planck/cosmology/001
Query>
Making queries
One of the most important aspects of using metadata is the ability to find entries simply by querying for a particular attribute value.
Query> addentry /planck/cosmology/001 name WCDM Hubble 0.7 omegam 0.3 index 1
Query> find /planck/cosmology/ 'Hubble = 0.7'
>> 002
>> 001
You can also use > or <, or try more complex queries, e.g. 'like(...,"%2%")'.
If you use the selectattr command, you can choose which attributes are returned for the matching entries:
Query> selectattr .:name .:Hubble 'Hubble > 0.5'
>> LCDM
>> 0.69999999999999996
>> WCDM
>> 0.69999999999999996
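The long decimals in the output are not a data error. They are consistent with the value being stored as a double precision float and printed with 17 significant digits (an assumption about how the server formats floats, not something stated in the AMGA documentation). A short sketch reproduces the same strings:

```python
def as_17_digits(value):
    """Format a float with 17 significant digits, dropping trailing zeros."""
    return format(value, ".17g")


for value in (0.7, 0.5, 0.55):
    print(as_17_digits(value))

# Reading the printed string back gives exactly the original value.
print(float("0.69999999999999996") == 0.7)
```

Since 0.5 is exactly representable in binary it is printed unchanged, while 0.7 and 0.55 are not.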
Joining collections
Complex queries can also join several collections, in the same way as joins among tables in a database.
Create a schema for simulation type:
Query> addattr /planck/simulation key int
Query> addattr /planck/simulation name varchar(4)
Query> addentry /planck/simulation/001 key 1 name 'LCDM'
Query> addentry /planck/simulation/002 key 2 name 'SCDM'
Query> addentry /planck/simulation/003 key 3 name 'WCDM'
Query> cd /planck/cosmology/
Query> addattr /planck/cosmology/ key int
Query> addattr /planck/cosmology/ sim_key int
Query> addattr /planck/cosmology/ hubble float
Query> addattr /planck/cosmology/ omegal float
Query> addentry /planck/cosmology/001 key 1 sim_key 3 hubble 0.7 omegal 0.7
Query> addentry /planck/cosmology/003 key 2 sim_key 2 hubble 0.5 omegal 0.65
Query> addentry /planck/cosmology/002 key 3 sim_key 2 hubble 0.55 omegal 0.65
To join the two collections:
Query> selectattr /planck/cosmology:hubble /planck/simulation:name '/planck/simulation:key=/planck/cosmology:sim_key'
>> 0.69999999999999996
>> WCDM
>> 0.5
>> SCDM
>> 0.55000000000000004
>> SCDM
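Since directories behave like tables, the join above can be compared with a plain SQL join. The sketch below uses an in-memory SQLite database only as a stand-in: the table layout and the SQL text are assumptions for illustration, not the SQL that AMGA actually generates for its back end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE simulation (entry TEXT, key INTEGER, name VARCHAR(4));
CREATE TABLE cosmology  (entry TEXT, key INTEGER, sim_key INTEGER,
                         hubble REAL, omegal REAL);

INSERT INTO simulation VALUES ('001', 1, 'LCDM');
INSERT INTO simulation VALUES ('002', 2, 'SCDM');
INSERT INTO simulation VALUES ('003', 3, 'WCDM');

INSERT INTO cosmology VALUES ('001', 1, 3, 0.7,  0.7);
INSERT INTO cosmology VALUES ('003', 2, 2, 0.5,  0.65);
INSERT INTO cosmology VALUES ('002', 3, 2, 0.55, 0.65);
""")

# Same condition as in the selectattr call: simulation:key = cosmology:sim_key
rows = conn.execute("""
    SELECT c.hubble, s.name
    FROM cosmology AS c
    JOIN simulation AS s ON s.key = c.sim_key
    ORDER BY c.key
""").fetchall()

for hubble, name in rows:
    print(hubble, name)
```

Each cosmology entry is paired with the simulation whose key equals its sim_key, which gives the same (hubble, name) pairs as the selectattr output.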
Asynchronous access with mdcli
The mdcli command line tool allows metadata commands to be issued directly from the shell. Its output is intended to be easy to parse in scripts.
[morgan@localhost ~]$ mdcli 'ls /planck/cosmology'
001
003
002
[morgan@localhost ~]$
[morgan@localhost ~]$ mdcli 'selectattr /planck/cosmology:hubble /planck/simulation:name '/planck/simulation:key=/planck/cosmology:sim_key''
0.69999999999999996
WCDM
0.5
SCDM
0.55000000000000004
SCDM
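Because mdcli prints one value per line, a script has to regroup the lines into records. The following sketch assumes that the values of the selected attributes alternate in the order in which they were requested, as in the output above. The helper names are hypothetical, and `run_mdcli` needs a working mdcli installation and a valid proxy, so it is defined here but not called.

```python
import subprocess


def group_values(lines, width):
    """Group a flat list of output lines into tuples of `width` values."""
    values = [line.strip() for line in lines if line.strip()]
    if len(values) % width:
        raise ValueError("output length is not a multiple of %d" % width)
    return [tuple(values[i:i + width]) for i in range(0, len(values), width)]


def run_mdcli(command):
    """Run mdcli, passing the whole metadata command as a single argument."""
    result = subprocess.run(
        ["mdcli", command], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Output copied from the session above.
sample_output = [
    "0.69999999999999996", "WCDM",
    "0.5", "SCDM",
    "0.55000000000000004", "SCDM",
]
pairs = [(float(h), name) for h, name in group_values(sample_output, 2)]
print(pairs)
```

Passing the command as one list element avoids any shell quoting when mdcli is called from Python.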
Metadata commands are parsed into pieces separated by white space, similarly to shell commands. If you want white space to be part of one piece of the command, for example when you set an attribute to a string that contains white space, you must enclose that piece in single quotes: ' '. You need them every time a piece contains spaces. Double quotes, however, are used in metadata queries to distinguish strings from variable references and common values.
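The quoting rule can be applied automatically when commands are built by a script. This is a minimal sketch of the rule as described here: pieces containing white space are wrapped in single quotes. Values that themselves contain a single quote are rejected, because their escaping is not covered in this document. The helper names and the entry 004 are hypothetical.

```python
def quote_piece(piece):
    """Wrap a command piece in single quotes if it contains white space."""
    piece = str(piece)
    if "'" in piece:
        # Escaping of embedded single quotes is not covered by this sketch.
        raise ValueError("single quotes inside a value are not supported")
    if any(ch.isspace() for ch in piece):
        return "'" + piece + "'"
    return piece


def addentry_command(entry, attributes):
    """Build 'addentry <entry> <name> <value> ...' from (name, value) pairs."""
    parts = ["addentry", entry]
    for name, value in attributes:
        parts.append(name)
        parts.append(quote_piece(value))
    return " ".join(parts)


command = addentry_command(
    "/planck/cosmology/004", [("name", "Lambda CDM"), ("Hubble", 0.7)]
)
print(command)
```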
Advanced Usage
Indexing the collections
For collections containing a huge number of entries, or collections that are often joined with other collections, it is necessary to define indexes on one or more attributes.
To create an index the command is:
index_create idxname collection '(attribute)+' [algorithm]
This command creates an index named idxname on the specified collection directory, covering one or more attributes, with an optional algorithm.
The available algorithms depend on the backend.
The index is later referred to as collection/idxname.
Below are some algorithms valid for the PostgreSQL/MySQL backends:
btree | Implementation of Lehman-Yao high-concurrency B-trees.
rtree | Standard R-trees using Guttman's quadratic split algorithm.
hash | Implementation of Litwin's linear hashing.
To remove an index, use the command:
index_remove collection/idxname
Table constraints
The following commands define constraints on collection attributes.
Each example on constraints will assume the use of the following schema:
test1/
key int
sim_key int
hubble float
NOT NULL
constraint_add_not_null directory attribute constraint_name
Adds a NOT NULL constraint for the given attribute of the directory. constraint_name is the name used to refer to the constraint. It must be unique for that directory. Write permissions on the directory are necessary for this operation.
UNIQUE
constraint_add_unique directory attribute constraint_name
Adds a UNIQUE constraint for the given attribute of the directory. constraint_name is the name used to refer to the constraint. It must be unique for that directory. Write permissions on the directory are necessary for this operation.
REFERENCE
constraint_add_reference directory attribute referred_attr constraint_name
Adds a foreign key constraint for the given attribute of the directory. The foreign key is given by the referenced attribute, which must be fully qualified, including the table part, e.g. /dir:attr. constraint_name is the name used to refer to the constraint. It must be unique for that directory. Write permissions on the directory are necessary for this operation.
example
ALTER TABLE dir264 ADD FOREIGN KEY idcontrref REFERENCES (tab1."user:id");
CHECK
constraint_add_check directory check constraint_name
Adds a check constraint to the directory. Check constraints are boolean expressions which must be true for all entries inserted into the directory. An example would be events > 0, requiring the value assigned to the events attribute to be positive. constraint_name is the name used to refer to the constraint. It must be unique for that directory. Write permissions on the directory are necessary for this operation.
CONSTRAINTS LISTING
constraint_list directory
Prints all constraints of a directory. You need read permissions on the concerned directory.
To drop the constraint with the given constraint_name from the directory you can use constraint_drop directory constraint_name. Write permissions on the directory are necessary for this operation.
Example
The following constraint_drop command was executed just after the creation of a NOT NULL constraint as described above.
Query> constraint_drop test1/ idconstrnotnull
Query> constraint_list test1/
Query>
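The effect of the three value constraints can be tried out locally. The sketch below uses an in-memory SQLite table with the same columns as the test1/ schema, purely as a stand-in for the back-end database. Which constraint is put on which column, and the condition hubble > 0, are hypothetical choices made for illustration. How AMGA reports a violation to the client is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test1 (
        key     INTEGER UNIQUE,          -- like constraint_add_unique
        sim_key INTEGER NOT NULL,        -- like constraint_add_not_null
        hubble  REAL CHECK (hubble > 0)  -- like constraint_add_check
    )
""")


def try_insert(key, sim_key, hubble):
    """Insert one row and report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO test1 VALUES (?, ?, ?)", (key, sim_key, hubble))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert(1, 1, 0.7),     # valid row
    try_insert(1, 2, 0.7),     # duplicate key: violates UNIQUE
    try_insert(2, None, 0.7),  # missing sim_key: violates NOT NULL
    try_insert(3, 1, -1.0),    # negative hubble: violates CHECK
]
print(results)
```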
Sequences
AMGA allows the user to define sequences. Sequences can be associated with schemas and free the user from keeping track of progressive numbers and from unique constraint violations.
sequence_create seq_name increment start_value
Creates a sequence named seq_name that starts from start_value and is incremented by increment.
sequence_next dir/seqname
Gets the next value from the sequence named seqname located in dir.
sequence_remove dir/seqname
Removes the sequence named seqname associated with dir.
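The behaviour of a sequence can be pictured with a small local model. This is only a sketch: the class `Sequence` is hypothetical and it assumes that the first call to sequence_next returns start_value itself, which should be verified against a real server.

```python
class Sequence:
    """Local model of a sequence created with: sequence_create name increment start_value."""

    def __init__(self, increment, start_value):
        self.increment = increment
        self._next = start_value

    def next(self):
        """Return the current value and advance by the increment (like sequence_next)."""
        value = self._next
        self._next += self.increment
        return value


seq = Sequence(increment=2, start_value=10)
values = [seq.next() for _ in range(3)]
print(values)
```

Using such values as entry keys keeps them distinct without the user having to track the last number used.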
--
TaffoniGiuliano - 24 Aug 2008