Doing ATHENA analysis on the Liverpool Farm

Introduction

This is a collection of information on how to use the Liverpool farm for ATHENA analysis, e.g. with the LiverpoolAnalysis framework. Most of this I originally learned from Carl and Mike; I simply write down here what seems to work well. If you find incorrect information, please notify me, change the page, ....

File Storage

While it is possible to read (AOD) files from hepstore (e.g. /hepstore/store2/...), this is not recommended for large data sets.

Instead, it is best to store the data on the local mass storage system DPM, under /dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/. This space is accessible after you have set up your grid environment, e.g. on hepgrid1 using
source /batchsoft/atlas/grid/setup.sh
Currently there is 20 TB of storage with less than 2 TB free (check using the command dpm-getspacemd). Concurrent access by many hosts from the farm should scale well for this storage area.

You may want to read the computing pages on this subject as well! https://hep.ph.liv.ac.uk/twiki/bin/view/Computing/GridStorageGuide

One drawback is that "standard" Unix file handling commands (ls, rm, cp) will not work. Instead you have to use commands starting with rf or dpns-, e.g.
dpns-ls -l /dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/
rfdir /dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/
Note that these commands do not accept wildcards (i.e. "*"); you have to use small scripts instead.
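
For instance, a wildcard-like listing can be emulated by filtering the dpns-ls output. This is only a minimal sketch (not part of any package) and assumes your grid/DPM environment is set up as described above:

#!/usr/bin/env python
# Sketch: emulate 'ls <dir>/mc08.*' on DPM by filtering the dpns-ls output.
import subprocess, fnmatch

def dpns_ls_match(directory, pattern):
    """Return the entries of a DPM directory matching a wildcard pattern."""
    out = subprocess.Popen(["dpns-ls", directory],
                           stdout=subprocess.PIPE).communicate()[0]
    return [f for f in out.split() if fnmatch.fnmatch(f, pattern)]

if __name__ == "__main__":
    base = "/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk"
    for name in dpns_ls_match(base, "mc08.*"):
        print(name)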

Downloading more Data

As far as I understand, this space is managed by all users: everyone can put new files there or delete (old) files. So we all have to use the commands responsibly (and NOT delete somebody else's files by mistake, for example!). If you want to download new files, there are special options for the dq2-get command, which you should use from hepgrid1 (other hosts are disfavored for large downloads!):
export dpmbase='/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/'
dq2-get -k ATLASLIVERPOOLDISK -S srm://hepgrid11.ph.liv.ac.uk:8446/srm/managerv2?SFN=$dpmbase -p lcg mc08.105802.JF17_pythia_jet_filter.recon.AOD.e347_s462_r563/

Using the container identifier like mc08.105802.JF17_pythia_jet_filter.recon.AOD.e347_s462_r563/ will try to download ALL task IDs (tid) of the data set. To load a specific one, use e.g. mc08.105802.JF17_pythia_jet_filter.recon.AOD.e347_s462_r563_tid027563. Note also that you should not try to download very large data sets in one go; a few tens of GB per day should be fine. I have done ~2000 files in one session, which should not be an everyday action, but seems to work OK. Often parts of the downloads fail and files end up with a "__DQ2-xxxx" extension or zero file size. One can use scripts to delete or re-download these.
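
A cleanup of failed transfers can look like the following minimal sketch, which removes zero-size files and leftover "__DQ2-" files from one dataset directory so that a subsequent dq2-get can fetch them again. The dataset name is just the example from above; double-check what you are about to delete on the shared space:

#!/usr/bin/env python
# Sketch: remove zero-size or partially transferred ("__DQ2-...") files from a
# DPM dataset directory, so they can be re-downloaded with dq2-get.
import subprocess

dpmbase = "/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk"
dataset = "mc08.105802.JF17_pythia_jet_filter.recon.AOD.e347_s462_r563_tid027563"
directory = dpmbase + "/" + dataset

out = subprocess.Popen(["dpns-ls", "-l", directory],
                       stdout=subprocess.PIPE).communicate()[0]
for line in out.splitlines():
    if not line.startswith("-"):       # skip subdirectories etc.
        continue
    fields = line.split()
    size, name = int(fields[4]), fields[-1]
    if size == 0 or "__DQ2-" in name:
        path = directory + "/" + name
        print("removing " + path)
        subprocess.call(["rfrm", path])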

You should also keep in mind that, when trying to load additional files of a data set, dq2-get may transfer already available files again, thus stressing the grid without need.

Important: Do not try to move files around on the DPM with e.g. the rfcp command, as this does not set the ATLASLIVERPOOLDISK token properly. It is possible (but not very convenient) to use lcg-cp in this case:
lcg-cp -v --vo atlas -b -D srmv2 -S ATLASLIVERPOOLDISK  srm://hepgrid11.ph.liv.ac.uk:8446/srm/managerv2?SFN=$dpmbase/olddir/filename srm://hepgrid11.ph.liv.ac.uk:8446/srm/managerv2?SFN=$dpmbase/newdir/newfilename

Using the files in your analysis

The simplest way to load the files from your job options is to use a syntax like
svcMgr.EventSelector.InputCollections = [
    "rfio:/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/mc08.106020.PythiaWenu_1Lepton.recon.AOD.e352_s462_r541/AOD.028292._04021.pool.root.1",
    "rfio:/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/mc08.106020.PythiaWenu_1Lepton.recon.AOD.e352_s462_r541/AOD.028292._04022.pool.root.1",
    ...
]

The function getFileList from LiverpoolAnalysis and the script below already handle this (to some extent) automatically.
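
If you are not using getFileList, such a list can also be built directly from a dpns-ls listing inside the job options. This is only a minimal sketch; the helper name build_rfio_list is made up here and is not part of LiverpoolAnalysis:

# Sketch: build the InputCollections list from a DPM directory listing.
import subprocess
from AthenaCommon.AppMgr import ServiceMgr as svcMgr

def build_rfio_list(directory):
    """Return rfio: URLs for all pool.root files found in a DPM directory."""
    out = subprocess.Popen(["dpns-ls", directory],
                           stdout=subprocess.PIPE).communicate()[0]
    return ["rfio:" + directory + "/" + f
            for f in out.split() if ".pool.root" in f]

dataset = ("/dpm/ph.liv.ac.uk/home/atlas/atlasliverpooldisk/"
           "mc08.106020.PythiaWenu_1Lepton.recon.AOD.e352_s462_r541")
svcMgr.EventSelector.InputCollections = build_rfio_list(dataset)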

Note that using the rfio protocol directly may lead to very poor performance at times when the network load is high! See the instructions below on how to improve this using FileStager.

Running Jobs on the Farm

Jobs have to be submitted from the machine hepcluster, so log on there. There you can use the standard batch commands qsub (submit), qstat (check job status) and qdel (delete jobs). See also the computing pages on this subject.

To create job scripts to run over a large DPM data sample with automatic job splitting, you can use the following scripts, which I got from Carl and modified. You will have to adapt things (minimally):
  • Batch2.py Basic script (modified for use of FileStager and new topJobOptions, see below, original version Batch.py)
  • submit.sh An example of how to use the above Batch.py

You will need to set up your grid certificate for DPM access (see above or submit.sh).

A random collection of things to observe and know:
  • While the file list is simply appended to your job options file and you do not need to do anything special with your standard file, some options (output file name, number of events) are set by Python variables (MyOutput, MyEvents in the above case). Check my (new) top option file (older version), which is derived from the LiverpoolAnalysis Z example; a minimal sketch of the variable handling follows after this list.
  • The above script creates new directories for storing the submission scripts and the output. These are deleted without warning if you rerun, so take care if you need the old files.
  • If you have large output files, they may not fit into your home area. One solution is to use the /scratch disk space for temporary storage. Be sure to move your important files to a different place later! My original solution was to store them on /hepstore disks, for which I used scp to copy the files to a computer with write access (farm machines have only read access). This works well if you have set up passwordless ssh login (see e.g. here or use Google).
  • The above script is set up to use the medium queue, which allows 24h (CPU) time and has ~50 nodes/job slots. There is also a short queue with fewer nodes and max. 1h (CPU) time.
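
The following minimal sketch (not the actual top options file) shows how MyOutput and MyEvents can be picked up with safe defaults, so that the same options also run interactively; the default values are placeholders:

# Sketch: use the batch-defined variables MyOutput and MyEvents if present,
# otherwise fall back to defaults for interactive running (example values).
from AthenaCommon.AppMgr import theApp

if 'MyEvents' not in dir():
    MyEvents = -1                      # -1 = run over all events
if 'MyOutput' not in dir():
    MyOutput = "LivAnalysis.root"      # placeholder output file name

theApp.EvtMax = MyEvents
# MyOutput is then passed on to the analysis algorithm / histogram service
# as in the usual top options.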

Using FileStager to run Jobs on the Farm

While the "direct" rfio access to the files will work, there may be a serious performance drop in situations, when the DPM system is loaded heavily. One way to improve things considerably then, is to use FileStager. This will automatically download the files to be analysed in the background, the job can access it directly from the local disk, and eventually the file will be removed. The gain in speed can be up to a factor of 10 or so. Thanks again to John and Carl for helping me to get this working. Below you'll find preliminary instructions.

The FileStager documentation can be found here, but you probably will not need it.

First, you should update FileStager to the latest version (as of now this is FileStager-00-00-34). After setting up ATHENA, do the following in your working directory:
cmt co -r FileStager-00-00-34 Database/FileStager
cd Database/FileStager/cmt
cmt config
source setup.sh
gmake

Then you need to configure things in your topJobOptions. I have updated the LiverpoolAnalysis example options LivZAnalysis/share/LivZBosonExample_topOptions.py with an option UseFileStager. You will also need the additional configuration routine LivTools/python/LivTools_FileStagerConfig.py. Note that I also modified the logic of how the input files are defined. This will also work with my Batch2.py mentioned above, use option -p. The new files are all in CVS.
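
The following is only a rough sketch of the UseFileStager switch; the real logic is in the attached LivZBosonExample_topOptions.py and LivTools_FileStagerConfig.py, and everything apart from the names mentioned in the text is illustrative:

# Sketch of the UseFileStager switch in the top options.
from AthenaCommon.AppMgr import ServiceMgr as svcMgr

if 'UseFileStager' not in dir():
    UseFileStager = True

dpmFiles = []   # list of DPM file paths, e.g. from getFileList or Batch2.py -p

if UseFileStager:
    # Stage the files to local disk in the background; the actual setup is
    # done by the attached LivTools_FileStagerConfig.py and is not shown here.
    pass
else:
    # Fall back to direct rfio access.
    svcMgr.EventSelector.InputCollections = ["rfio:" + f for f in dpmFiles]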

-- JanKretzschmar - 05 Jun 2009 -- JanKretzschmar - 02 Feb 2009
Topic attachments
  • Batch.py.txt (5 K, 02 Feb 2009, JanKretzschmar)
  • Batch2.py.txt (6 K, 05 Jun 2009, JanKretzschmar)
  • LivTools_FileStagerConfig.py.txt (2 K, 05 Jun 2009, JanKretzschmar)
  • LivZBosonExample_topOptions.py.txt (17 K, 02 Feb 2009, JanKretzschmar)
  • LivZBosonExample_topOptions2.py.txt (14 K, 05 Jun 2009, JanKretzschmar)
  • submit.sh (1 K, 02 Feb 2009, JanKretzschmar)