Doing ATHENA analysis on the Liverpool Farm
Introduction
This is a collection of information on how to use the Liverpool farm for ATHENA analysis, e.g. with the LiverpoolAnalysis framework. Most of it I originally learned from Carl and Mike; I simply write down here what seems to work well. If you find incorrect information, please notify me, change the page, ....
File Storage
While it is possible to read (AOD) files from hepstore (e.g. /hepstore/store2/...), this is not recommended for large data sets.
Instead, it is best to store the data on the local mass storage system DPM under
/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/
This space is accessible after you have set up your grid environment, e.g. on hepgrid1 using
source /batchsoft/atlas/grid/setup.sh
Currently there is 20TB of storage with <2TB free (check with the command dpm-getspacemd). Concurrent access by many hosts from the farm should scale well for this storage area.
You may want to read the computing pages on this subject as well! https://hep.ph.liv.ac.uk/twiki/bin/view/Computing/GridStorageGuide
One drawback is that "standard" unix file handling commands (ls, rm, cp) will not work. Instead you have to use commands starting with rf or dpns-, e.g.
dpns-ls -l /dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/
rfdir /dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/
Note that these commands do not accept wildcards ("*"); you'd have to use small scripts, e.g. the sketch below.
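Since wildcards are not understood, a small wrapper can emulate them by filtering the output of dpns-ls. A minimal Python sketch of the idea (the script name and helper function are my own invention, adapt as needed):
#!/usr/bin/env python
# dpns_glob.py - emulate shell-style wildcards on top of dpns-ls (sketch only)
import fnmatch
import subprocess
import sys

dpmbase = "/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/"

def dpns_glob(directory, pattern):
    """Return the entries of a DPM directory whose names match the pattern."""
    out = subprocess.Popen(["dpns-ls", directory],
                           stdout=subprocess.PIPE).communicate()[0]
    return [name for name in out.split() if fnmatch.fnmatch(name, pattern)]

if __name__ == "__main__":
    pattern = "*"
    if len(sys.argv) > 1:
        pattern = sys.argv[1]
    # e.g.  python dpns_glob.py 'mc08.105802*'  lists all matching entries
    for entry in dpns_glob(dpmbase, pattern):
        print(entry)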
Downloading more Data
As far as I understand, this space is managed by all users - everyone can put new files there or delete (old) files. So we all have to use the commands responsibly (and NOT delete somebody else's files by mistake, for example!). If you want to download new files, there are special options for the dq2-get command, which you should use from hepgrid1 (other hosts are disfavoured for large downloads!):
export dpmbase='/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/'
dq2-get -k ATLASLIVERPOOLDISK -S srm://hepgrid11.ph.liv.ac.uk:8446/srm/managerv2?SFN=$dpmbase -p lcg mc08.105802.JF17_pythia_jet_filter.recon.AOD.e347_s462_r563/
Using the container identifier like mc08.105802.JF17_pythia_jet_filter.recon.AOD.e347_s462_r563/ will try to download ALL task IDs (tid) of the data set. To load a specific one, use e.g. mc08.105802.JF17_pythia_jet_filter.recon.AOD.e347_s462_r563_tid027563. Note also that you should not try to download very large data sets in one go. A few tens of GB per day should be fine. I've done ~2000 files in one session, which should not be an everyday action, but it seems to work OK. Often parts of the downloads fail and files end up with a "__DQ2-xxxx" extension or zero file size. One can use scripts to delete or re-download these, see the sketch below.
You should also keep in mind that, when trying to load additional files of a certain set, the dq2-get command may transfer already available files again, thus stressing the grid without need.
Important: Do not try to move files around on the DPM with e.g. the rfcp command, as this does not set the ATLASLIVERPOOLDISK token properly. It is possible (but not very convenient) to use lcg-cp in this case:
lcg-cp -v --vo atlas -b -D srmv2 -S ATLASLIVERPOOLDISK srm://hepgrid11.ph.liv.ac.uk:8446/srm/managerv2?SFN=$dpmbase/olddir/filename srm://hepgrid11.ph.liv.ac.uk:8446/srm/managerv2?SFN=$dpmbase/newdir/newfilename
Using the files in your analysis
The simplest way to load the files from your job options is to use a syntax like
svcMgr.EventSelector.InputCollections = [
"rfio:/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/mc08.106020.PythiaWenu_1Lepton.recon.AOD.e352_s462_r541/AOD.028292._04021.pool.root.1",
"rfio:/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/mc08.106020.PythiaWenu_1Lepton.recon.AOD.e352_s462_r541/AOD.028292._04022.pool.root.1",
...
]
The function getFileList from LiverpoolAnalysis and the script below already handle this automatically (to some extent).
Note that using the rfio protocol directly may lead to very poor performance at times when the network load is high! See the instructions below on how to improve this using FileStager.
Running Jobs on the Farm
Jobs have to be submitted from the machine hepcluster, so log on there. There you can use the standard batch commands qsub (submit), qstat (check job status) and qdel (delete jobs). See also the computing pages on this subject.
To create job scripts to run over a large DPM data sample with automatic job splitting, you can use the following scripts, which I got from Carl and modified (a stripped-down sketch of the splitting idea is shown at the end of this section). You will have to adapt things (minimally):
- Batch2.py Basic script (modified for use of FileStager and new topJobOptions, see below; original version: Batch.py)
- submit.sh An example of how to use the above Batch2.py
You will need to set up your grid certificate for DPM access (see above or submit.sh).
A random collection of things to observe and know:
- While the file list is simply appended to your job options file and you do not need to do anything special with your standard file, some options (output file name, number of events) are set by Python variables (MyOutput, MyEvents in the above case). Check my (new) top options file (older version), which is derived from the LiverpoolAnalysis Z example.
- The above script creates new directories for storing the submission scripts and the output. These are deleted without warning if you rerun, so take care if you need the old files.
- If you have large output files, they may not fit into your home area. One solution is to use the /scratch disk space for temporary storage. Be sure to move your important files to a different place later! My original solution was to store them on /hepstore disks, for which I used scp to copy the files to a computer with write access (farm machines have only read access). This works well if you have set up passwordless ssh login (see e.g. here or use google).
- The above script is set up to use the medium queue, which allows 24h (CPU) time and has ~50 nodes/job slots. There is also a short queue with fewer nodes and max. 1h (CPU) time.
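To illustrate the job splitting mentioned above, here is a stripped-down Python sketch (not the real Batch2.py, all names are made up): it splits a file list into chunks, writes one small job options snippet and one submission script per chunk, and prints the corresponding qsub commands for the medium queue, which you would then run on hepcluster.
#!/usr/bin/env python
# batch_sketch.py - stripped-down illustration of the Batch2.py idea (not the
# real script): split a file list into chunks and prepare one batch job each.
import os

def split_into_chunks(files, files_per_job):
    """Split the file list into sub-lists of at most files_per_job entries."""
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]

def prepare_jobs(files, files_per_job, top_options, workdir="batchjobs"):
    """Write per-job option snippets and submission scripts into workdir."""
    if not os.path.isdir(workdir):
        os.makedirs(workdir)
    for n, chunk in enumerate(split_into_chunks(files, files_per_job)):
        # per-job snippet defining the Python variables picked up by the top
        # options (variable names are made up; Batch2.py differs in detail)
        snippet = os.path.join(workdir, "job%03d.py" % n)
        open(snippet, "w").write("MyOutput = 'output_%03d.root'\n" % n
                                 + "MyInputFiles = %r\n" % chunk)
        # per-job shell script (ATHENA and grid setup left out here)
        script = os.path.join(workdir, "job%03d.sh" % n)
        open(script, "w").write("#!/bin/bash\nathena.py %s %s\n"
                                % (snippet, top_options))
        # submit from hepcluster by running the printed commands
        print("qsub -q medium " + script)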
Using FileStager to run Jobs on the Farm
While the "direct"
rfio
access to the files will work, there may be a serious performance drop in situations, when the DPM system is loaded heavily. One way to improve things considerably then, is to use
FileStager
. This will automatically download the files to be analysed in the background, the job can access it directly from the local disk, and eventually the file will be removed. The gain in speed can be up to a factor of 10 or so. Thanks again to John and Carl for helping me to get this working. Below you'll find preliminary instructions.
The FileStager documentation can be found here, but you probably will not need it.
First, you should update FileStager to the latest version (as of now this is FileStager-00-00-34). After setting up ATHENA, do in your working directory:
cmt co -r FileStager-00-00-34 Database/FileStager
cd Database/FileStager/cmt
cmt config
source setup.sh
gmake
Then, you need to configure things in your topJobOptions. I've updated the LiverpoolAnalysis example options LivZAnalysis/share/LivZBosonExample_topOptions.py with an option UseFileStager. You'll also need the additional configuration routine LivTools/python/LivTools_FileStagerConfig.py. Note that I also modified the logic of how the input files are defined. This also works with my Batch2.py mentioned above, use option -p. The new files are all in CVS.
-- JanKretzschmar - 05 Jun 2009
-- JanKretzschmar - 02 Feb 2009