Managing Metadata using AMGA

Introduction

AMGA is the metadata catalogue of the EGEE project and is part of its gLite middleware. It aims to provide a metadata service for Grid infrastructures that is general enough to be used by as many applications as possible, taking into account the requirements of the very diverse EGEE communities. The main features of the AMGA catalogue are high performance, especially over WAN connections; tight integration with the Virtual Organisation (VO) management system of the EGEE Grid, which is based on X509 Grid certificates and VOMS; fine-grained access control; and advanced replication features. AMGA provides an SQL-like query language that supports most features of modern SQL dialects, including complex joins and string and mathematical functions. This query language hides the differences between the vendor-specific SQL dialects and is translated into the dialect of the back-end database.

In this document we will discuss how to use the service.

The AMGA clients

The AMGA clients can be run from any gLite user interface on which they are installed. AMGA provides two client programs:

  • mdclient: an interactive terminal client for the AMGA metadata server.
  • mdcli: a command-line client for the AMGA metadata server. It sends commands to an AMGA server and prints out the result.

To install the clients, you must first add or update your repository settings in /etc/yum.repos.d/.

The following repository (.repo) files must be added to /etc/yum.repos.d/ to install AMGA properly:

Then you can install the client:

[root@localhost ~]# yum install glite-amga-cli

Before starting the AMGA client application it is necessary to copy a configuration file into the home directory. There is a template file provided by the AMGA client installation that can be customised and used to access your own AMGA server.

[morgan@localhost ~]$ cp $GLITE_LOCATION/etc/mdclient.config $HOME/.mdclient.config

Then open the file in a text editor and change the following fields accordingly:

Login = NULL (no username is used by the AMGA client)
UseSSL = require (the client uses a secure connection)
Port = 8822 (AMGA server listening port)
Host = amga.si.inaf.it (AMGA server hostname)
AuthenticateWithCertificate = 1 (the AMGA client takes your username from the certificate)
UseGridProxy = 1 (the client authenticates using a local grid proxy)
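If the configuration has to be prepared for several users or machines, the edit can be scripted. The following is a minimal Python sketch, assuming the file consists of "Key = value" lines plus "#" comments (check the template you copied). The function name update_mdclient_config is hypothetical and not part of AMGA; keys are matched case-insensitively because this sketch does not assume how strictly the client treats key case.

```python
import re

# Values from the list above (this document's example server).
SETTINGS = {
    "Login": "NULL",
    "UseSSL": "require",
    "Port": "8822",
    "Host": "amga.si.inaf.it",
    "AuthenticateWithCertificate": "1",
    "UseGridProxy": "1",
}

# An active (uncommented) "Key = value" line; "#..." lines never match.
LINE_RE = re.compile(r"^\s*([A-Za-z_]+)\s*=")


def update_mdclient_config(text: str, settings: dict) -> str:
    """Return `text` with the given keys set; unknown keys are appended."""
    pending = {key.lower(): (key, value) for key, value in settings.items()}
    out = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(1).lower() in pending:
            key = match.group(1)  # keep the spelling used in the file
            _, value = pending.pop(key.lower())
            out.append(f"{key} = {value}")
        else:
            out.append(line)  # comments and untouched settings are kept
    # Settings that were not present in the template are added at the end.
    out.extend(f"{key} = {value}" for key, value in pending.values())
    return "\n".join(out) + "\n"
```

Usage, once the template has been copied: path = pathlib.Path.home() / ".mdclient.config", then path.write_text(update_mdclient_config(path.read_text(), SETTINGS)).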

Now the AMGA client application can be started:

[morgan@localhost ~]$ mdclient 
Connecting to amga.si.inaf.it:8822...
ARDA Metadata Server 1.3.0
Query> 

The following is a brief description of the general-purpose commands:
>> createdir [options]
Make a new directory. It can inherit the schema associated with its parent directory.
>> rm pattern
Remove the entries matching the given pattern.
>> link
Make a link to another file or to an external URL.
>> dir
List the contents of a directory.
>> listentries
List the entries (not the collections) of a directory.
>> stat
Show status information about a directory.
>> chown
Change the owner of a file or a directory.
>> chmod
Change the access rights of a file or a directory.
>> rmdir
Remove a directory.
>> dump
Make a recursive dump starting from a given directory (the default is '/').

AMGA usage

Following a file-system model, AMGA associates directories with database tables, while the entries inside a directory correspond to files. This section describes the commands used to manipulate directories.

Creating a Directory

Following the file-system metaphor, metadata can be browsed exactly as if the user were browsing a file system. The dir command lists all the entries and sub-directories of the directory given in the path, which allows users to define complex metadata hierarchies. When mdclient starts, the default directory is '/' (root). To override this setting, change the DefaultDir parameter in the .mdclient.config file to define a different default directory.

Query> pwd
>> /planck/
Query> dir
>> /planck/test
>> collection
Query> ls
>> /planck/test
Query> createdir tutorial
Query> cd /planck/tutorial/ 
Query> cd ..

As in a file system (or a database), it is possible to set and list the ACLs of each directory (or attribute):

Query> acl_show /planck/tutorial/ 
>> root rw-
>> root:planck rwx
>> system:anyuser rx
>> system:planck rwx
Query> acl_add /planck/tutorial/ inaf rwx 
Query> acl_show /planck/tutorial/ 
>> root rw-
>> root:inaf rwx
>> root:planck rwx
>> system:anyuser rx
>> system:planck rwx
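When checking permissions from a script, the acl_show output can be turned into a lookup table. A minimal Python sketch follows; the helper names parse_acl and allows are hypothetical. It only reads the listed lines, treats principals such as root:inaf as opaque strings, and ignores group membership or inheritance.

```python
def parse_acl(lines):
    """Turn acl_show output lines such as '>> root:inaf rwx' into {principal: rights}."""
    acl = {}
    for line in lines:
        if line.startswith(">> "):
            line = line[3:]  # mdclient prefixes output lines with '>> '
        line = line.strip()
        if not line:
            continue
        principal, rights = line.split()
        acl[principal] = rights
    return acl


def allows(acl, principal, right):
    """True if `principal` is listed with the right 'r', 'w' or 'x'."""
    return right in acl.get(principal, "")


output = [
    ">> root rw-",
    ">> root:inaf rwx",
    ">> root:planck rwx",
    ">> system:anyuser rx",
    ">> system:planck rwx",
]
acl = parse_acl(output)
print(allows(acl, "root:inaf", "w"), allows(acl, "system:anyuser", "w"))
```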

chmod and chown can also be used to change the access rights or the owner of the directory. To remove the directory:

Query> rmdir /planck/tutorial/ 

Creating the schema (managing attributes)

Once a directory has been created, it is possible to associate a schema defining several attributes in it. In analogy with databases it is possible to think about directories as table names and their attributes as column names. Each attribute is defined by the couple: (attribute name, attribute datatype).

When attributes are added to a directory, the directory becomes a collection. The type of an attribute is the name of an SQL datatype, which is translated (if necessary) into a datatype understood by the back-end database. Valid types are:

int
float
varchar(n)
text
timestamp
numeric(p,s)

Query> createdir cosmology
Query> addattr /planck/cosmology/ Hubble float
Query> addattr /planck/cosmology/ index float
Query> addattr /planck/cosmology/ omegam float
Query> addattr /planck/cosmology/ omegal float
Query> addattr /planck/cosmology/ name  varchar(30)
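To make the table analogy concrete, the following Python sketch uses the standard-library sqlite3 module to build a plain SQL table that plays the role of /planck/cosmology after the addattr commands above. It is only an analogy: AMGA's real back-end layout is not described in this document, and the file column holding the entry name is an assumption.

```python
import sqlite3

# (attribute name, type) pairs from the addattr commands above.
ATTRIBUTES = [
    ("Hubble", "float"),
    ("index", "float"),
    ("omegam", "float"),
    ("omegal", "float"),
    ("name", "varchar(30)"),
]

conn = sqlite3.connect(":memory:")
# The directory becomes a table; "file" (an assumption) holds the entry name.
conn.execute("CREATE TABLE cosmology (file TEXT PRIMARY KEY)")
for attr, sql_type in ATTRIBUTES:
    # Each addattr is roughly an ALTER TABLE ... ADD COLUMN. "index" is an SQL
    # keyword, so every name is double-quoted as a delimited identifier.
    conn.execute(f'ALTER TABLE cosmology ADD COLUMN "{attr}" {sql_type}')

# Roughly what listattr reports: attribute names and declared types
# (the first row, the assumed "file" column, is skipped).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(cosmology)")][1:]
print(schema)
```

Plain SQL needs the quoting because index is a reserved word; in the addattr commands above, AMGA accepts it as an ordinary attribute name.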

To list the attributes that have been created:

Query> listattr /planck/cosmology/ 
>> Hubble
>> float
>> index
>> float
>> omegam
>> float
>> omegal
>> float
>> name
>> varchar(30)
Query> 

To remove an attribute:

Query> removeattr /planck/cosmology/ omegal

Adding Entries

Once the schema has been defined, entries can be added:

Query> addentry /planck/cosmology/001 name LCDM
Query> addentry /planck/cosmology/002 name LCDM Hubble 0.7 omegam 0.3 index 1 
Query> 

To list the value of an attribute for the entries of a directory:

Query> getattr /planck/cosmology/* name
>> /planck/cosmology
>> LCDM
>> 001
>> LCDM
>> 002
>> LCDM
Query> ls /planck/cosmology/
>> /planck/cosmology
>> 001
>> 002

To change the value of an attribute of an entry:

Query> setattr /planck/cosmology/001 name CDM
Query> getattr /planck/cosmology/* name
>> /planck/cosmology
>> 002
>> LCDM
>> 001
>> CDM
Query> 

To remove an entry:

Query> rm /planck/cosmology/001
Query> 

Making queries

One of the most important uses of metadata is finding entries by querying for a particular attribute value:

Query> addentry /planck/cosmology/001 name WCDM Hubble 0.7 omegam 0.3 index 1
Query> find /planck/cosmology/ 'Hubble = 0.7'
>> 002
>> 001
Besides =, you can also use > or <, or try more complex queries, e.g. 'like(...,"%2%")'.
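In plain SQL terms, find roughly corresponds to selecting the entry names whose attributes satisfy a WHERE condition. The following Python sketch uses sqlite3 as an analogy only; the table, the file column and the find helper are hypothetical, and SQL's LIKE operator is used as a stand-in for the like(...) function mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE cosmology (file TEXT PRIMARY KEY, Hubble FLOAT, "index" FLOAT,'
    " omegam FLOAT, name VARCHAR(30))"
)
# Entries 002 and 001 as added with addentry above.
conn.executemany(
    'INSERT INTO cosmology (file, name, Hubble, omegam, "index") VALUES (?, ?, ?, ?, ?)',
    [("002", "LCDM", 0.7, 0.3, 1), ("001", "WCDM", 0.7, 0.3, 1)],
)


def find(condition, params=()):
    """Return the names of the entries matching an SQL condition (trusted input only)."""
    sql = f"SELECT file FROM cosmology WHERE {condition} ORDER BY file"
    return [row[0] for row in conn.execute(sql, params)]


print(find("Hubble = ?", (0.7,)))     # both entries
print(find("file LIKE ?", ("%2%",)))  # only entry 002
```

Equality on floats works here only because both sides are the same literal 0.7; for computed values, a range condition such as Hubble > 0.69 AND Hubble < 0.71 is usually safer.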

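The selectattr command prints the values of the selected attributes for the entries matching the query (here . refers to the current directory, /planck/cosmology):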

Query> selectattr .:name .:Hubble 'Hubble > 0.5'
>> LCDM
>> 0.69999999999999996
>> WCDM
>> 0.69999999999999996
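The selectattr output is a flat list with one value per line, row after row. A minimal Python sketch for regrouping it in a script follows, assuming the number of selected attributes is known and that no value is empty (an empty value would shift the columns). The helper name group_rows is hypothetical. The same grouping with ncols=2 fits the name/type pairs printed by listattr.

```python
def group_rows(lines, ncols):
    """Group the flat output of selectattr (one value per line) into rows.

    Accepts both mdclient output ('>> value') and plain mdcli output ('value').
    """
    values = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">> "):
            line = line[3:]
        if line:  # skip blank lines
            values.append(line)
    if len(values) % ncols:
        raise ValueError(f"{len(values)} values cannot be split into rows of {ncols}")
    return [tuple(values[i:i + ncols]) for i in range(0, len(values), ncols)]


output = [">> LCDM", ">> 0.69999999999999996", ">> WCDM", ">> 0.69999999999999996"]
rows = [(name, float(hubble)) for name, hubble in group_rows(output, 2)]
print(rows)  # [('LCDM', 0.7), ('WCDM', 0.7)]
```

The long decimals such as 0.69999999999999996 are 17-significant-digit renderings of the stored double, so float() turns them back into exactly the value 0.7.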

Joining collections

Queries can also join several collections, much like joins between tables in SQL.

First, create a schema for the simulation types (the directory /planck/simulation must already exist, e.g. created with createdir):

Query> addattr /planck/simulation  key int
Query> addattr /planck/simulation name  varchar(4)
Query> addentry /planck/simulation/001 key 1 name 'LCDM'
Query> addentry /planck/simulation/002 key 2 name 'SCDM'
Query> addentry /planck/simulation/003 key 3 name 'WCDM'

Query> cd /planck/cosmology/ 
Query> addattr /planck/cosmology/ key int 
Query> addattr /planck/cosmology/ sim_key int 
Query> addattr /planck/cosmology/ hubble float 
Query> addattr /planck/cosmology/ omegal float 
Query> addentry /planck/cosmology/001 key 1 sim_key 3 hubble 0.7 omegal 0.7
Query> addentry /planck/cosmology/003 key 2 sim_key 2 hubble 0.5 omegal 0.65
Query> addentry /planck/cosmology/002 key 3 sim_key 2 hubble 0.55 omegal 0.65

To join the two collections:

Query> selectattr /planck/cosmology:hubble /planck/simulation:name '/planck/simulation:key=/planck/cosmology:sim_key'
>> 0.69999999999999996
>> WCDM
>> 0.5
>> SCDM
>> 0.55000000000000004
>> SCDM
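For readers used to SQL, the selectattr join above corresponds roughly to an inner join on simulation.key = cosmology.sim_key. The following sqlite3 sketch reproduces it with the same data; it is an analogy, not AMGA's generated SQL, and the file column for entry names is an assumption. The ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE simulation (file TEXT PRIMARY KEY, "key" INTEGER, name VARCHAR(4));
    CREATE TABLE cosmology (file TEXT PRIMARY KEY, "key" INTEGER, sim_key INTEGER,
                            hubble FLOAT, omegal FLOAT);
    INSERT INTO simulation VALUES ('001', 1, 'LCDM'), ('002', 2, 'SCDM'), ('003', 3, 'WCDM');
    INSERT INTO cosmology VALUES ('001', 1, 3, 0.7, 0.7),
                                 ('003', 2, 2, 0.5, 0.65),
                                 ('002', 3, 2, 0.55, 0.65);
    """
)

# Same condition as the selectattr query: /planck/simulation:key=/planck/cosmology:sim_key
rows = conn.execute(
    """
    SELECT c.hubble, s.name
    FROM cosmology AS c
    JOIN simulation AS s ON s."key" = c.sim_key
    ORDER BY c."key"
    """
).fetchall()
print(rows)  # [(0.7, 'WCDM'), (0.5, 'SCDM'), (0.55, 'SCDM')]
```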

Non-interactive access with mdcli

The mdcli command-line tool lets you issue metadata commands directly from the shell; its output is intended to be easy to parse in scripts.

[morgan@localhost ~]$ mdcli 'ls /planck/cosmology'
001
003
002
[morgan@localhost ~]$ 
[morgan@localhost ~]$ mdcli "selectattr /planck/cosmology:hubble /planck/simulation:name '/planck/simulation:key=/planck/cosmology:sim_key'"
0.69999999999999996
WCDM
0.5
SCDM
0.55000000000000004
SCDM

Metadata commands are split into parts separated by white space, similarly to shell commands. If white space must belong to a single part of the command, for example when setting an attribute to a string that contains spaces, enclose that part in single quotes: ' '. You need them every time a part contains spaces. Double quotes, however, are used inside queries to mark strings, distinguishing them from attribute references and other values.
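When mdcli is driven from a program rather than typed at a shell prompt, the quoting rule above can be applied programmatically, and passing the command as a single argument avoids nesting shell quotes. A minimal Python sketch, assuming mdcli is on the PATH and configured through ~/.mdclient.config; the helper names are hypothetical, and embedded single quotes are deliberately rejected because this sketch does not know how AMGA escapes them. check=True only catches failures that mdcli reports through a non-zero exit status.

```python
import subprocess


def quote_part(part: str) -> str:
    """Enclose a command part in single quotes if it contains white space."""
    if "'" in part:
        raise ValueError(f"single quotes in {part!r} are not handled by this sketch")
    return f"'{part}'" if any(ch.isspace() for ch in part) else part


def build_command(*parts: str) -> str:
    """Join the parts of a metadata command, quoting those that need it."""
    return " ".join(quote_part(p) for p in parts)


def run_mdcli(command: str) -> list:
    """Run one metadata command through mdcli and return its output lines."""
    # A list argument bypasses the shell, so the command reaches mdcli verbatim.
    result = subprocess.run(["mdcli", command], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

For example, run_mdcli(build_command("find", "/planck/cosmology/", "Hubble > 0.5")) sends find /planck/cosmology/ 'Hubble > 0.5'.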

Advanced Usage

Indexing the collections

For collections containing a large number of entries, or collections that are often joined with other collections, it is advisable to define indexes on one or more attributes.

To create an index, the command is: index_create idxname collection '(attribute)+' [algorithm]

This command creates an index named idxname on one or more attributes of the specified collection directory, optionally using the given algorithm. The available algorithms depend on the back end. The index is afterwards referred to as collection/idxname.

Some algorithms valid for the PostgreSQL/MySQL back ends are:

btree: implementation of Lehman-Yao high-concurrency B-trees.
rtree: standard R-trees using Guttman's quadratic split algorithm.
hash: implementation of Litwin's linear hashing.

To remove an index, use the command index_remove collection/idxname.
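The effect of an index on a frequently used join attribute can be observed with any SQL engine. The following sqlite3 sketch is an analogy, not AMGA itself: it fills a hypothetical cosmology table with generated rows and compares SQLite's query plan for a lookup on sim_key before and after creating an index. SQLite supports only B-tree indexes, so the algorithm argument of index_create has no counterpart here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE cosmology (file TEXT PRIMARY KEY, "key" INTEGER, sim_key INTEGER, hubble FLOAT)'
)
# Synthetic rows, only to have something to query.
conn.executemany(
    "INSERT INTO cosmology VALUES (?, ?, ?, ?)",
    [(f"{i:03d}", i, i % 3 + 1, 0.5 + i / 1000) for i in range(1, 1001)],
)


def plan(sql):
    """Return SQLite's query plan; the last column of each row describes one step."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT file FROM cosmology WHERE sim_key = 2"
before = plan(query)  # full table scan
conn.execute("CREATE INDEX idx_sim_key ON cosmology (sim_key)")
after = plan(query)   # lookup through idx_sim_key
print(before)
print(after)
```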

Table constraints

The following commands define constraints on collection attributes. Each constraint example assumes the following schema:

test1/

  key      int
  sim_key  int
  hubble   float

NOT NULL

constraint_add_not_null directory attribute constraint_name

Adds a NOT NULL constraint for the given attribute of the directory. constraint_name is the name used to refer to the constraint; it must be unique within that directory. Write permission on the directory is required for this operation.

UNIQUE

constraint_add_unique directory attribute constraint_name

Adds a UNIQUE constraint for the given attribute of the directory. constraint_name is the name used to refer to the constraint; it must be unique within that directory. Write permission on the directory is required for this operation.


REFERENCE

constraint_add_reference directory attribute referenced_attr constraint_name

Adds a foreign key constraint for the given attribute of the directory. The foreign key is given by the referenced attribute, which must be fully qualified including the table part, e.g. /dir:attr. constraint_name is the name used to refer to the constraint; it must be unique within that directory. Write permission on the directory is required for this operation.

Example

ALTER TABLE dir264 ADD FOREIGN KEY idcontrref REFERENCES (tab1."user:id");
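The semantics of a foreign key can be tried out locally with sqlite3. In this sketch, which is an analogy and not AMGA's generated SQL, the test1 schema's sim_key must match the key of an existing row in a hypothetical simulation table; the constraint name sim_key_ref is illustrative. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute('CREATE TABLE simulation ("key" INTEGER PRIMARY KEY, name VARCHAR(4))')
conn.execute(
    'CREATE TABLE test1 ("key" INTEGER, hubble FLOAT,'
    ' sim_key INTEGER CONSTRAINT sim_key_ref REFERENCES simulation ("key"))'
)
conn.execute("INSERT INTO simulation VALUES (2, 'SCDM')")
conn.execute("INSERT INTO test1 VALUES (1, 0.5, 2)")  # accepted: simulation key 2 exists

try:
    conn.execute("INSERT INTO test1 VALUES (2, 0.7, 9)")  # no simulation with key 9
    fk_violation = False
except sqlite3.IntegrityError:
    fk_violation = True
print(fk_violation)
```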


CHECK

constraint_add_check directory check constraint_name

Adds a check constraint to the directory. Check constraints are boolean expressions that must be true for all entries inserted into the directory. An example would be events > 0, requiring the value assigned to the events attribute to be positive. constraint_name is the name used to refer to the constraint; it must be unique within that directory. Write permission on the directory is required for this operation.
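The NOT NULL, UNIQUE and CHECK constraints described above behave like their SQL counterparts: an insertion that violates any of them is rejected. The following sqlite3 sketch illustrates this on the test1 schema; it is an analogy, not AMGA itself, and the constraint names and the hubble > 0 check are hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The test1 schema with one constraint of each kind (names are illustrative).
conn.execute(
    "CREATE TABLE test1 ("
    ' "key" INTEGER CONSTRAINT key_unique UNIQUE,'
    " sim_key INTEGER CONSTRAINT sim_key_not_null NOT NULL,"
    " hubble FLOAT,"
    " CONSTRAINT hubble_positive CHECK (hubble > 0))"
)


def try_insert(row):
    """Insert a (key, sim_key, hubble) row; return None if accepted, else the error text."""
    try:
        conn.execute("INSERT INTO test1 VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "valid row": try_insert((1, 3, 0.7)),
    "duplicate key (UNIQUE)": try_insert((1, 2, 0.5)),
    "missing sim_key (NOT NULL)": try_insert((2, None, 0.5)),
    "negative hubble (CHECK)": try_insert((3, 2, -0.1)),
}
for case, error in results.items():
    print(f"{case}: {error or 'accepted'}")
```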

CONSTRAINTS LISTING

constraint_list directory prints all constraints of a directory. Read permission on the directory is required.

To drop the constraint with the given constraint_name from the directory, use constraint_drop directory constraint_name. Write permission on the directory is required for this operation.

Example

The following constraint_drop command was executed just after creating a NOT NULL constraint named idconstrnotnull; afterwards, constraint_list shows no constraints.

Query> constraint_drop test1/ idconstrnotnull
Query> constraint_list test1/
Query> 

Sequences

AMGA allows users to define sequences. Sequences can be associated with schemas and free users from keeping track of progressive numbers and from unique-constraint violations.

sequence_create seq_name increment start_value creates a sequence named seq_name that starts at start_value and is incremented by increment.

sequence_next dir/seqname gets the next value from the sequence named seqname located in dir.

sequence_remove dir/seqname removes the sequence named seqname associated with dir.
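A sequence is essentially a server-side counter. The following Python class is a hypothetical in-memory model, not the AMGA API, showing how increment and start_value determine the values returned by successive sequence_next calls, here used to generate entry names such as 001, 002 and 003. It assumes, as PostgreSQL does for a fresh sequence, that the first value returned is start_value; this document does not state that explicitly.

```python
import itertools


class Sequence:
    """In-memory model of a sequence (hypothetical; not the AMGA API)."""

    def __init__(self, increment: int = 1, start_value: int = 1):
        # itertools.count yields start_value, start_value + increment, ...
        self._counter = itertools.count(start_value, increment)

    def next_value(self) -> int:
        """Counterpart of sequence_next: return the next value of the sequence."""
        return next(self._counter)


seq = Sequence(increment=1, start_value=1)
entry_names = [f"{seq.next_value():03d}" for _ in range(3)]
print(entry_names)  # ['001', '002', '003']
```

Because a real sequence lives on the server, concurrent clients each receive distinct values, which is what avoids the unique-constraint violations mentioned above; this local model cannot provide that guarantee.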

-- TaffoniGiuliano - 24 Aug 2008

Topic revision: r2 - 24 Aug 2008 - TaffoniGiuliano
 
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.