Grid Storage using HEP's Storage Element
A guide to using grid tools to store and access files on a DPM grid Storage Element (SE). Members of LHC experiments should consult their own grid instructions about where and how to store their grid files.
Liverpool HEP Configuration
Liverpool HEP runs a Tier2 LCG computer cluster. Part of this is a high capacity, high performance Storage Element (SE) on which experiments can store raw data and processed output. Anyone with a grid certificate who is a member of an approved VO (which includes most GridPP-approved experiments) can access and store files on this SE.
Liverpool HEP uses the Disk Pool Manager (DPM) system to provide access to the storage. Some utilities are generic to all SE types, but DPM also has a number of DPM-specific utilities. You should be able to access all of these tools from any HEP Linux system, as these have the Grid UI tools installed by default.
All DPM files at Liverpool use a root path of /dpm/ph.liv.ac.uk/home/. VOs then have a home directory under this, and user data is stored under that, eg
- /dpm/ph.liv.ac.uk/home/lhcb/lhcb-file.txt
Authentication
In order to access the SE you must have a valid grid proxy with a VOMS extension. The VOMS extension identifies which VO and group you are a member of, and any special roles; DPM needs this in order to assign you the correct access privileges. eg
- voms-proxy-init -voms atlas
to authenticate as a plain ATLAS member. If you are using the proxy for batch jobs you might want to create one with a longer lifetime than the default (12 hours), eg for 48 hours
- voms-proxy-init --valid 48:00
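Putting the two options together, a typical session before submitting batch work might look like the sketch below (the VO name atlas is just an example; substitute your own). voms-proxy-info can be used to check the proxy afterwards.

```shell
# Create a 48-hour proxy with an ATLAS VOMS extension (adjust the VO to yours)
voms-proxy-init --voms atlas --valid 48:00

# Check the proxy: remaining lifetime ('timeleft') and VOMS attributes
voms-proxy-info --all
```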
If you do not have a grid certificate then you must apply for one; there are more instructions on the UK eScience website. You should then register with the Virtual Organisation (VO) for your experiment.
File Operations
Many basic file listing and modification operations are performed using the gfal-* and dpns-* utilities, which are SE-specific equivalents of standard file utilities.
Linux util | Grid util   | Comments
ls         | gfal-ls     | user/group names may be arbitrary
rm         | gfal-rm     | Removes the file and its catalogue entry; see also rfrm and dpns-rm
ln         | dpns-ln     |
chmod      | gfal-chmod  | Only accepts octal values
chown      | dpns-chown  | Usually only admins can use this
chgrp      | dpns-chgrp  | Usually only admins can use this
mkdir      | gfal-mkdir  |
mv         | dpns-rename | Only renames, doesn't move a file
getfacl    | dpns-getacl | Syntax is the same as for Linux ACLs
setfacl    | dpns-setacl | Syntax is the same as for Linux ACLs
du         | dpns-du     | Outputs the size of all child directories as well; quite slow
Permissions to perform file operations follow the usual unix-style user/group/other format.
Examples
List a directory
- gfal-ls root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/dteam/
List a file with extra detail
- gfal-ls -l gsiftp://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/dteam/bigfile.txt
Remove a file
- gfal-rm root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/dteam/bigfile2.txt
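The utilities in the table above can be combined into simple workflows, eg creating a new directory and opening it up to your group. This is a sketch; the directory name mydir is just an example.

```shell
# Create a directory on the SE (gfal-mkdir), then give the group
# write access (gfal-chmod only accepts octal modes)
gfal-mkdir root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/dteam/mydir
gfal-chmod 775 root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/dteam/mydir

# Confirm the new directory and its permissions
gfal-ls -l root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/dteam/
```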
Protocols and URLs
NB The SRM protocol (ie URLs beginning srm://) is now deprecated, and scripts using this protocol should be updated to use a different one, usually XROOTD.
There are a number of ways of referencing a particular directory or file on an SE, and files can be read and written using a number of protocols. For example, accessing a file with a path of
- /dpm/ph.liv.ac.uk/home/dteam/file1.txt
using the XROOTD protocol would use a URL of (note the extra '/' after the hostname for this protocol)
- root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/dteam/file1.txt
Using GridFTP would use a URL of
- gsiftp://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/dteam/file1.txt
Webdav access (mostly for web browsers but can also be used with curl) would use a URL of
- http://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/dteam/file1.txt
Which protocol you use depends on how the file is being accessed, but for HEP applications XROOTD is generally the most efficient.
The gfal-* grid UI tools will work with any valid protocol.
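The URL patterns above differ only in the scheme and in whether an extra '/' separates the hostname from the /dpm path. As a sketch, the small helper below builds a URL for a given protocol and DPM path (hepgrid11.ph.liv.ac.uk is the example host used throughout this guide):

```shell
# Build a DPM URL for a given protocol and absolute DPM path.
# XROOTD ("root") needs an extra '/' between host and path; GridFTP does not.
dpm_url() {
    proto="$1"   # "root" or "gsiftp"
    path="$2"    # absolute DPM path, eg /dpm/ph.liv.ac.uk/home/dteam/file1.txt
    host="hepgrid11.ph.liv.ac.uk"
    case "$proto" in
        root)   echo "root://${host}/${path}" ;;   # extra '/' -> root://host//dpm/...
        gsiftp) echo "gsiftp://${host}${path}" ;;
        *)      echo "unsupported protocol: $proto" >&2; return 1 ;;
    esac
}

dpm_url root /dpm/ph.liv.ac.uk/home/dteam/file1.txt
# root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/dteam/file1.txt
```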
Copying Files to or from an SE
All SE types can have a number of file operations performed on them using the GFAL file utilities, gfal-*. The main one is gfal-copy.
gfal-copy copies a file from one location to another. The location can be a file on a local filesystem or on a remote SE, for example (using GridFTP)
- gfal-copy file:/tmp/file.txt gsiftp://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/lhcb/file.txt
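gfal-copy also works in the other direction (SE to local disk) or between two SEs. The -p flag (create missing parent directories) and -f (overwrite an existing destination) are often useful; this sketch assumes a reasonably recent gfal2-util and the example paths used above.

```shell
# SE -> local disk
gfal-copy gsiftp://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/lhcb/file.txt file:/tmp/file.txt

# Local disk -> SE, creating any missing parent directories (-p)
# and overwriting an existing destination file (-f)
gfal-copy -p -f file:/tmp/file.txt root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/lhcb/newdir/file.txt
```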
To copy a file given an LFN (logical file name) you can use lcg-cp. Note that where LHCb code refers to a file as LFN:/lhcb/MC/2010/DST/00006481/0000/00006481_00000001_1.dst, you must replace the LFN: prefix with lfn:/grid to match the pattern lcg-cp is expecting.
Quota Tokens
Some experiments (notably ATLAS) divide their grid storage into reserved portions called quota tokens (these used to be called Space Tokens). These are analogous to user quotas on a normal filesystem, and many of these tokens exist at all sites that support the experiment. Liverpool HEP has some site-specific tokens, eg ATLASLIVERPOOLDISK for ATLAS, which reserves some local grid storage for local Liverpool ATLAS researchers. Please consult the local experiment framework coordinators about the correct use of the tokens.
Quota tokens apply to the directory files are written to and do not have to be specified on the command line. eg the following command will write a file to the ATLASLIVERPOOLDISK token automatically
-
gfal-copy /tmp/file.txt root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/file.txt
You can query the available space for a token with dpm-getspacemd, eg
[user@hepgrid1 ~]$ dpm-getspacemd --token_desc ATLASLIVERPOOLDISK
16b0346c-7977-49ba-91fa-6b41832abf07 ATLASLIVERPOOLDISK atlasPool atlas/uk 20.00T 10.14T Inf REPLICA ONLINE
Just using dpm-getspacemd without any arguments will list all tokens.
Direct File Access
Most file operations so far have involved copying files to or from the SE and local disk. SEs also have the capability to provide direct file access, where the file is located on the SE and the data is read over the network.
XROOTD
The most efficient access provided by DPM (and most other Grid storage systems) is called XROOTD. This gives an interface that programs can access but doesn't provide a normal file system interface like NFS. This means that any programs will need to have XROOTD support added. Luckily some programs have this support already, notably ROOT where a standard file path can be replaced with an XROOTD URL, for example
-
TFile::Open("root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/file.root")
This access requires a valid grid proxy.
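Outside ROOT, the standard xrootd client tools (if installed on your system) can read the same URLs, eg copying a file from the SE to local disk with xrdcp:

```shell
# Copy a file from the SE over XROOTD (requires a valid grid proxy)
xrdcp root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/file.root /tmp/file.root
```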
WEBDAV
While XROOTD provides the fastest access for ROOT files it does require a valid grid proxy and is only really supported by ROOT. Anonymous read-only access via WEBDAV is available on site at Liverpool. Full authenticated access with a grid proxy, like XROOTD, can be accessed from anywhere. WEBDAV can be accessed directly from some applications eg ROOT, or from standard web browsers. This uses a URL of the form
-
http://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/atlas/
eg in ROOT access a file with
-
TFile::Open("http://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/file.root")
For full authenticated access replace http with https. Your client application or browser will need a valid grid certificate or proxy for this access.
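From the command line, curl can use the same WEBDAV URLs. For the authenticated https form you can point curl at your grid proxy; this is a sketch, assuming the common default proxy location /tmp/x509up_u$(id -u) (check the X509_USER_PROXY environment variable on your system) and the usual CA certificate directory.

```shell
# Anonymous read-only access (works on site at Liverpool only)
curl -O http://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/file.root

# Authenticated access using a grid proxy as both client cert and key
curl --cert /tmp/x509up_u$(id -u) --cacert /tmp/x509up_u$(id -u) \
     --capath /etc/grid-security/certificates \
     -O https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/file.root
```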
Deleting Files
Files which have been registered in a file catalogue can be deleted using gfal-rm, for example
-
gfal-rm root://hepgrid11.ph.liv.ac.uk//dpm/ph.liv.ac.uk/home/atlas/file.txt
This will work on all SE technologies, local and remote. The local DPM system can also have files removed using the (faster) rfrm command, for example
- rfrm /dpm/ph.liv.ac.uk/home/atlas/file.txt
Note that rf* commands are deprecated and may not be available in the future.
There is a third command, dpns-rm, which only removes the file metadata entry, not the file itself. Generally users should not need this command; ask the local grid admins for help if files cannot be deleted properly.
Further Reading