First steps with the DIRAC metadata functionality
Finding files using metadata
When you're uploading vast amounts of data, it's nice to be able to find it later. Metadata - data about the data - can help with this. DIRAC allows you to assign metadata such as strings, integers, and floating point numbers to files and directories (via their Logical File Names in the DIRAC File Catalog). You can then query the DFC to return a list of the files you want.
For example, once you have sourced your DIRAC environment, generated a proxy, and started the DFC CLI, you can find all files associated with the UserGuide experiment like so:
FC:/> find / experiment=UserGuide
Query: {'experiment': 'UserGuide'}
/gridpp/userguide/WELCOME.md
QueryTime 0.98 sec
We have assigned the value UserGuide to the file WELCOME.md for the experiment metadata index. The find command in the DFC CLI performs the query for us.
FC:/> help find
Find all files satisfying the given metadata information
usage: find [-q] [-D] <path> <meta_name>=<meta_value> [<meta_name>=<meta_value>]
FC:/> exit
In our query above, <path> was / (i.e. search the entire catalog from the base directory), <meta_name> was experiment (i.e. a metadata string index indicating to which experiment the data belongs), and <meta_value> was UserGuide (OK, so the UserGuide isn't really an experiment, at least not in the scientific sense, but you get the idea!).
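To make the mapping from command line to query concrete, here is a minimal Python sketch that splits a find command of the form shown in the usage line into its <path> and a query dictionary like the one echoed after "Query:". The function name parse_find is hypothetical and this is not DIRAC's own parser; it handles only the simple <meta_name>=<meta_value> terms used in this section and simply skips option flags such as -q and -D.

```python
def parse_find(command):
    """Split a DFC-CLI-style find command into (path, query dict).

    Illustrative only: this is not DIRAC's parser and it supports
    nothing beyond <meta_name>=<meta_value> terms.
    """
    tokens = command.split()
    if not tokens or tokens[0] != "find":
        raise ValueError("expected a command starting with 'find'")

    # Ignore option flags (e.g. -q, -D); everything else is positional.
    args = [t for t in tokens[1:] if not t.startswith("-")]
    if len(args) < 2:
        raise ValueError("usage: find <path> <meta_name>=<meta_value> ...")

    path, terms = args[0], args[1:]
    query = {}
    for term in terms:
        name, sep, value = term.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"bad metadata term: {term!r}")
        query[name] = value
    return path, query


if __name__ == "__main__":
    path, query = parse_find("find / experiment=UserGuide")
    print("Path: ", path)
    print("Query:", query)
```

For the query used above, the sketch returns the path / and the dictionary {'experiment': 'UserGuide'}, matching the Query line printed by the DFC CLI.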
You can get a list of all of the available commands in the DFC CLI by using the help command. To list the instructions for a given command (as above), type help [command].
There is only one file belonging to the UserGuide experiment in the DFC, and it's a pretty harmless Markdown file. But you can hopefully see how, particularly when we start using multiple metadata indices with different types, DIRAC's metadata functionality is going to be pretty useful.
Assigning metadata to a file
We can also use the DFC CLI to assign metadata to our files. Let's create a file with our favourite text editor and upload it to the grid using the DFC CLI:
$ vim TODO.md
$ cat TODO.md
ToDo
====
* Email Charles re. engine
* Re-do punchcards
* Write to Dad
$ dirac-dms-filecatalog-cli
Starting FileCatalog client
File Catalog Client $Revision: 1.17 $Date:
FC:/> add /gridpp/user/a/ada.lovelace/TODO.md TODO.md UKI-LT2-QMUL2-disk
File /gridpp/user/a/ada.lovelace/TODO.md successfully uploaded to the UKI-LT2-QMUL2-disk SE
We can now set the owner index for the LFN using the meta set command:
FC:/> meta set /gridpp/user/a/ada.lovelace/TODO.md owner ada.lovelace
/gridpp/user/a/ada.lovelace/TODO.md owner ada.lovelace
Again, use help meta to see the syntax for the meta commands.
We can now find the file again using the find command:
FC:/> find / owner=ada.lovelace
Query: {'owner': 'ada.lovelace'}
/gridpp/user/a/ada.lovelace/TODO.md
QueryTime 0.01 sec
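The set-then-find round trip can be mimicked with a toy, in-memory model. The sketch below is purely illustrative: the ToyCatalog class and its methods are hypothetical names, not part of DIRAC, and it matches metadata by exact equality only. It shows the idea that metadata is attached to an LFN and that a query returns every LFN under a path whose metadata matches all of the requested values.

```python
class ToyCatalog:
    """A toy stand-in for a file catalog: LFN -> {index name: value}.

    Hypothetical and in-memory only; it does not talk to the DFC.
    """

    def __init__(self):
        self._meta = {}

    def add(self, lfn):
        # Register an LFN with no metadata yet.
        self._meta.setdefault(lfn, {})

    def meta_set(self, lfn, name, value):
        # Attach one metadata value to an already registered LFN.
        if lfn not in self._meta:
            raise KeyError(f"unknown LFN: {lfn}")
        self._meta[lfn][name] = value

    def find(self, path, **query):
        # Return the LFNs under `path` whose metadata match every term.
        prefix = path if path.endswith("/") else path + "/"
        return sorted(
            lfn
            for lfn, meta in self._meta.items()
            if lfn.startswith(prefix)
            and all(meta.get(name) == value for name, value in query.items())
        )


if __name__ == "__main__":
    fc = ToyCatalog()
    fc.add("/gridpp/userguide/WELCOME.md")
    fc.meta_set("/gridpp/userguide/WELCOME.md", "experiment", "UserGuide")
    fc.add("/gridpp/user/a/ada.lovelace/TODO.md")
    fc.meta_set("/gridpp/user/a/ada.lovelace/TODO.md", "owner", "ada.lovelace")

    print(fc.find("/", owner="ada.lovelace"))
    print(fc.find("/gridpp/userguide", experiment="UserGuide"))
```

Passing more than one keyword to find narrows the result to LFNs that match every term, which is the behaviour to expect when several metadata indices are combined in one query.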
As we've said before, the DFC CLI is useful for small-scale operations on your data. Hopefully, though, you can start to appreciate the power of metadata when it comes to organising your data and performing analyses on it.
The most important thing for the moment, though, is that we can now put data on the Grid (i.e. on a Storage Element). This means we can use it in our Grid jobs without needing to upload it with our job as an inputfile. We'll now finish making our example workflow fully Grid-enabled in the next section, Using Grid-based data in your workflow.