Panda Athena
As an alternative to Ganga, you can use Panda for your distributed analysis. The information here should be enough to get you started, but for full documentation you should consult the PandaAthena TWiki:
https://twiki.cern.ch/twiki/bin/view/Atlas/PandaAthena
Initial set up
Before you use pathena for the first time, you will need to make sure that your nickname is registered with the ATLAS VO. To do this go to
https://lcg-voms.cern.ch:8443/vo/atlas/vomrs, click on the '+' by 'Member Info', then go to 'Edit Personal Info', tick the First name, Last name and nickname boxes and then click Search. If you don't already have a nickname then you can add this in the appropriate field (this should be of the format 'firstnamelastname').
You're now ready to start pathena-ing!
Submitting pathena jobs
Source your favourite ATLAS release, e.g.
source cmthome/setup.sh -tag=15.6.9,32
Source the grid setup (this now includes the pathena setup as default):
source /batchsoft/atlas/grid/setup.sh
Then, instead of running your athena jobs in the usual way, e.g.
athena myJobOptions.py
all you have to do is replace 'athena' with 'pathena' and specify input and output dataset names using --inDS and --outDS, like so:
pathena myJobOptions.py --inDS myFavouriteDataset --outDS user10.<nickname>.myTestNtuple.root
If you're running over a large number of files you can specify how many sub-jobs the job should be split into by adding "--split N" to the end of the command (where N is the number of sub-jobs you want to have).
You can also use the athena '-c' option in the same way as you normally would, e.g.
pathena -c 'Events=100' myJobOptions.py --inDS myFavouriteDataset --outDS user10.<nickname>.myTestNtuple.root
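If you reuse the same options across many submissions, it can help to assemble the command line in a small script. The sketch below is only an illustration: the helper name build_pathena_command and its parameters are hypothetical, and it uses only the options mentioned on this page (-c, --inDS, --outDS, --split). It builds the argument list and prints it, but does not submit anything.

```python
import shlex


def build_pathena_command(job_options, in_ds, out_ds, split=None, command=None):
    """Return the pathena argument list for one submission (hypothetical helper)."""
    args = ["pathena"]
    if command:
        # Same meaning as athena's -c option, e.g. "Events=100"
        args += ["-c", command]
    args += [job_options, "--inDS", in_ds, "--outDS", out_ds]
    if split is not None:
        # Number of sub-jobs to split the job into
        args += ["--split", str(split)]
    return args


if __name__ == "__main__":
    cmd = build_pathena_command(
        "myJobOptions.py",
        "myFavouriteDataset",
        "user10.nickname.myTestNtuple.root",
        split=10,
        command="Events=100",
    )
    # Print a shell-ready string; pass the list to subprocess to actually submit
    print(" ".join(shlex.quote(a) for a in cmd))
```

Keeping the arguments as a list (rather than one long string) avoids quoting problems with values such as 'Events=100' if the list is later handed to subprocess.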
Submitting multiple jobs
Use a simple script which takes a list of datasets and runs the pathena command on each one. An example can be found here:
PathenaSub.py.txt (remove the .txt suffix). The script is also included in OSUtilities/batch. It can also be very easily modified to run over several GRLs instead of datasets (see below).
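For orientation, a minimal sketch of what such a script might look like is given here. It is not the attached PathenaSub.py: the function names, the way the output dataset name is derived from the input name, and the dry_run switch are all assumptions made for illustration.

```python
import subprocess


def make_out_ds(in_ds, nickname, prefix="user10"):
    """Derive an output dataset name from an input one (naming scheme is an assumption)."""
    fields = in_ds.rstrip("/").split(".")
    # Keep the dataset number and short name, e.g. "109281.J5_pythia_jetjet_1muon"
    tag = ".".join(fields[1:3])
    return "%s.%s.%s.root" % (prefix, nickname, tag)


def submit_all(job_options, datasets, nickname, dry_run=True):
    """Run pathena once per input dataset; with dry_run=True only collect the commands."""
    commands = []
    for in_ds in datasets:
        cmd = ["pathena", job_options,
               "--inDS", in_ds,
               "--outDS", make_out_ds(in_ds, nickname)]
        commands.append(cmd)
        if not dry_run:
            # Requires the release and grid setup to have been sourced first
            subprocess.check_call(cmd)
    return commands


if __name__ == "__main__":
    datasets = [
        "mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/",
    ]
    for cmd in submit_all("myJobOptions.py", datasets, "nickname"):
        print(" ".join(cmd))
```

With dry_run left at True the script only prints the commands it would run, which is a cheap way to check the dataset names before submitting for real.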
Monitoring jobs
You can check the status of your jobs by going to the Panda Monitor page (http://panda.cern.ch:25980/server/pandamon/query) and entering the PandaID of the job(s) in the 'Job' field.
Once the jobs have completed you will get an email detailing the number of jobs submitted, and how many succeeded, failed or were cancelled.
You can also display the status of your jobs on the command line. In the terminal, type:
pbook
It will then retrieve information about all of the jobs you have submitted (it may take a few seconds to load if you've recently submitted a lot). Then you can display the status of a given job by doing:
>>> show(JobID)
e.g.
Start pBook 0.2.50
>>> show(33)
INFO : Getting status for JobID=33 ...
INFO : Updated JobID=33
======================================
JobID : 33
type : pathena
release : Atlas-15.6.10
cache :
PandaID : 1080485641-1080485664,1080485667-1080485678,1080485680-1080485692,1080485694-1080485695
nJobs : 50 + 1(build)
site : ANALY_LYON_DCACHE
cloud : FR
inDS : mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/
outDS : user10.katharineleney.J5muon.r1306.root
libDS : user.katharineleney.0614094720.367819.lib._000033
retryID : 0
provenanceID : 0
creationTime : 2010-06-14 09:47:23
lastUpdate : 2010-06-14 16:22:32
params : ../share/Htautau_jobOptions.py --inDS mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/ --outDS user10.katharineleney.J5muon.r1306.root
jobStatus : running
finished : 49
running : 2
>>>
If you don't specify a JobID (i.e. simply do 'show()') then the status of all uncompleted jobs will be displayed.
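The PandaID field in the pbook output is a compact list of ranges, while the Panda Monitor asks for individual PandaIDs. As a small convenience, the sketch below expands such a string into a list of IDs. The function name is hypothetical, and the comma-separated 'start-end' format is inferred only from the example output on this page.

```python
def expand_panda_ids(field):
    """Expand a string like '10-12,15' into [10, 11, 12, 15]."""
    ids = []
    for part in field.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            ids.extend(range(start, end + 1))  # ranges are taken as inclusive
        else:
            ids.append(int(part))
    return ids


if __name__ == "__main__":
    field = ("1080485641-1080485664,1080485667-1080485678,"
             "1080485680-1080485692,1080485694-1080485695")
    ids = expand_panda_ids(field)
    print(len(ids), "PandaIDs, first:", ids[0], "last:", ids[-1])
```

For the example job above this yields 51 IDs, which agrees with the 50 + 1(build) jobs reported in the nJobs field.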
Killing Jobs
Start pbook, as above, and then just do:
>>> kill(JobID)
Resubmit Failed Sub-Jobs
Again, it's dead easy... if a job or some of its sub-jobs failed, simply go to pbook and do:
>>> retry(JobID)
It will then pick out any jobs which failed last time and resubmit just those ones for you.
Retrieving the output
Once your jobs have finished you can retrieve the output files by doing:
dq2-get user10.<nickname>.myTestNtuple.root
And you're done!
* Use the merged AOD files - pathena complains if you don't.
* You can choose a specific site to use by adding "--site" to your command, e.g.
pathena -c 'Events=1000' ../share/Htautau_jobOptions.py --site UKI-NORTHGRID-LIV-HEP_MCDISK --inDS mc09_7TeV.109910.SherpabbAtautaulhMA120TB20.merge.AOD.e534_s765_s767_r1250_r1260/ --outDS user10.katharineleney.myTestNtuple.root
* You can tell pathena to run over your GRL by doing:
pathena myJobOptions.py --goodRunListXML MyLBCollection.xml --outDS user10...
It will then translate your GRL into a list of datasets and use this in place of the --inDS option.
See
https://twiki.cern.ch/twiki/bin/view/Atlas/PandaAthena#example_10_How_to_run_on_a_good for details.
--
KatharineLeney - 04 Jun 2010