UsingPathena (04 Aug 2010, KatharineLeney)

Panda Athena

As an alternative to Ganga, you can use Panda for your distributed analysis. The information here should be enough to get you started, but for full documentation you should consult the PandaAthena TWiki here: https://twiki.cern.ch/twiki/bin/view/Atlas/PandaAthena.

Initial set up

Before you use pathena for the first time, you will need to make sure that your nickname is registered with the ATLAS VO. To do this go to https://lcg-voms.cern.ch:8443/vo/atlas/vomrs, click on the '+' by 'Member Info', then go to 'Edit Personal Info', tick the First name, Last name and nickname boxes and then click Search. If you don't already have a nickname then you can add this in the appropriate field (this should be of the format 'firstnamelastname').
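If you are unsure what to enter, the 'firstnamelastname' format can be sketched as a small Python helper. It is hypothetical and rests on an assumption that is not a VOMRS rule: keep only ASCII letters and lower-case them. The nickname you actually register is whatever you type into the VOMRS form.

```python
def suggest_nickname(first_name: str, last_name: str) -> str:
    """Return a candidate nickname in the 'firstnamelastname' format.

    Assumption (not a VOMRS rule): keep only ASCII letters and lower-case them.
    """
    letters = [c for c in first_name + last_name if c.isascii() and c.isalpha()]
    return "".join(letters).lower()


print(suggest_nickname("Katharine", "Leney"))  # prints katharineleney
```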

You're now ready to start pathena-ing!

Submitting pathena jobs

Source your favourite ATLAS release, e.g.
source cmthome/setup.sh -tag=15.6.9,32

Source the grid setup (this now includes the pathena setup as default):
source /batchsoft/atlas/grid/setup.sh 

Then, instead of running your athena job in the usual way, e.g.
athena myJobOptions.py

simply replace 'athena' with 'pathena' and specify the input and output dataset names using --inDS and --outDS, like so:
pathena myJobOptions.py --inDS myFavouriteDataset --outDS user10.<nickname>.myTestNtuple.root 
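The output dataset names on this page follow the pattern user10.<nickname>.<name>. The sketch below assumes that pattern. It builds the name and the pathena argument list from Python, and the helper names make_out_ds and pathena_args are hypothetical, not part of pathena.

```python
import shlex
from typing import List


def make_out_ds(nickname: str, name: str, prefix: str = "user10") -> str:
    """Build an output dataset name of the form <prefix>.<nickname>.<name>."""
    # A dot inside the nickname would shift the fields of the dataset name.
    if not nickname or "." in nickname:
        raise ValueError("nickname must be non-empty and contain no dots")
    return f"{prefix}.{nickname}.{name}"


def pathena_args(job_options: str, in_ds: str, out_ds: str) -> List[str]:
    """Argument list for: pathena <jobOptions> --inDS <in_ds> --outDS <out_ds>."""
    return ["pathena", job_options, "--inDS", in_ds, "--outDS", out_ds]


out_ds = make_out_ds("katharineleney", "myTestNtuple.root")
args = pathena_args("myJobOptions.py", "myFavouriteDataset", out_ds)
print(shlex.join(args))
# prints: pathena myJobOptions.py --inDS myFavouriteDataset --outDS user10.katharineleney.myTestNtuple.root
```

Passing the list to subprocess.run(args, check=True) would actually submit the job. Building a list rather than one shell string avoids quoting problems.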

If you're running over a large number of files, you can specify how many sub-jobs the job should be split into by adding "--split N" to the end of the command, where N is the number of sub-jobs you want.

You can also use the athena '-c' option in the same way as you normally would, e.g.
pathena -c 'Events=100' myJobOptions.py --inDS myFavouriteDataset --outDS user10.<nickname>.myTestNtuple.root 
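The optional '-c' and '--split N' arguments can be added in the same places as in the commands above. This is a minimal sketch with a hypothetical helper name. It uses only the options described on this page, and the split count of 10 is an arbitrary example value.

```python
import shlex
from typing import List, Optional


def pathena_args(job_options: str, in_ds: str, out_ds: str,
                 command: Optional[str] = None,
                 n_split: Optional[int] = None) -> List[str]:
    """Build a pathena argument list with optional -c and --split."""
    args = ["pathena"]
    if command is not None:
        # Same meaning as athena's -c; no shell quotes are needed in a list.
        args += ["-c", command]
    args += [job_options, "--inDS", in_ds, "--outDS", out_ds]
    if n_split is not None:
        if n_split < 1:
            raise ValueError("n_split must be a positive integer")
        # --split N goes at the end of the command.
        args += ["--split", str(n_split)]
    return args


args = pathena_args("myJobOptions.py", "myFavouriteDataset",
                    "user10.katharineleney.myTestNtuple.root",
                    command="Events=100", n_split=10)
print(shlex.join(args))
# prints: pathena -c Events=100 myJobOptions.py --inDS myFavouriteDataset --outDS user10.katharineleney.myTestNtuple.root --split 10
```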

Submitting multiple jobs

To submit jobs over many datasets at once, use a simple script that takes a list of datasets and runs the pathena command on each one. An example can be found here: PathenaSub.py.txt (remove the .txt suffix); the script is also included in OSUtilities/batch. It can easily be modified to run over several GRLs instead of datasets (see below).
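The idea can be sketched in Python. This is not the attached PathenaSub.py. It is a minimal illustration that relies on one assumption: the second dot-separated field of each input name (the dataset ID, e.g. 109281) is reused to give every job a unique --outDS. It prints the commands by default and only runs them when dry_run=False.

```python
import shlex
import subprocess
from typing import List


def parse_dataset_list(text: str) -> List[str]:
    """One dataset name per line; blank lines and '#' comments are skipped."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def submission_commands(job_options: str, datasets: List[str],
                        nickname: str, suffix: str) -> List[List[str]]:
    """Build one pathena argument list per input dataset.

    Assumption: the second dot-separated field of the input name
    (e.g. 109281 in mc09_7TeV.109281...) keeps each --outDS unique.
    """
    commands = []
    for in_ds in datasets:
        fields = in_ds.split(".")
        tag = fields[1] if len(fields) > 1 else fields[0]
        out_ds = f"user10.{nickname}.{tag}.{suffix}"
        commands.append(["pathena", job_options, "--inDS", in_ds, "--outDS", out_ds])
    return commands


def submit_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or run it, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


dataset_text = """
# merged AODs to process
mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/
mc09_7TeV.109910.SherpabbAtautaulhMA120TB20.merge.AOD.e534_s765_s767_r1250_r1260/
"""
cmds = submission_commands("myJobOptions.py", parse_dataset_list(dataset_text),
                           "katharineleney", "myTestNtuple.root")
submit_all(cmds)  # dry run: only prints the two pathena commands
```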

Monitoring jobs

You can check the status of your jobs by going to the Panda Monitor page (http://panda.cern.ch:25980/server/pandamon/query) and entering the PandaID of the job(s) in the 'Job' field.

Once the jobs have completed you will get an email detailing the number of jobs submitted, and how many succeeded, failed or were cancelled.

You can also display the status of your jobs on the command line. In the terminal, type:
pbook

It will then retrieve information about all of the jobs you have submitted (this may take a few seconds if you've recently submitted a lot). You can then display the status of a given job with:
>>> show(JobID)

e.g.

Start pBook 0.2.50
>>> show(33)
INFO : Getting status for JobID=33 ...
INFO : Updated JobID=33
======================================
          JobID : 33
           type : pathena
        release : Atlas-15.6.10
          cache :
        PandaID : 1080485641-1080485664,1080485667-1080485678,1080485680-1080485692,1080485694-1080485695
          nJobs : 50 + 1(build)
           site : ANALY_LYON_DCACHE
          cloud : FR
           inDS : mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/
          outDS : user10.katharineleney.J5muon.r1306.root
          libDS : user.katharineleney.0614094720.367819.lib._000033
        retryID : 0
   provenanceID : 0
   creationTime : 2010-06-14 09:47:23
     lastUpdate : 2010-06-14 16:22:32
         params : ../share/Htautau_jobOptions.py --inDS mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/ --outDS user10.katharineleney.J5muon.r1306.root
      jobStatus : running
             finished : 49
              running : 2
>>>

If you don't specify a JobID (i.e. you simply type 'show()'), the status of all uncompleted jobs is displayed.
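In the show() output above, the PandaID field packs the sub-job IDs into comma-separated start-end ranges. To enter individual jobs in the Panda Monitor, you can expand the ranges with a short Python sketch. The range format is inferred from the example output above, and the helper names are hypothetical. For JobID 33 the ranges expand to 51 PandaIDs, which matches "nJobs : 50 + 1(build)" and the 49 finished plus 2 running sub-jobs.

```python
import re
from typing import Dict, List


def expand_panda_ids(field: str) -> List[int]:
    """Expand 'a-b,c-d,e' style ranges into a list of individual PandaIDs."""
    ids = []
    for part in field.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            ids.extend(range(start, end + 1))  # ranges are inclusive
        else:
            ids.append(int(part))
    return ids


def parse_show_output(text: str) -> Dict[str, str]:
    """Collect the 'key : value' lines printed by show(JobID)."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\w+)\s*:\s*(.*)$", line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


sample = """\
          JobID : 33
        PandaID : 1080485641-1080485664,1080485667-1080485678,1080485680-1080485692,1080485694-1080485695
          nJobs : 50 + 1(build)
      jobStatus : running
             finished : 49
              running : 2
"""
fields = parse_show_output(sample)
ids = expand_panda_ids(fields["PandaID"])
print(len(ids), fields["jobStatus"])  # prints: 51 running
```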

Killing Jobs

Start pbook, as above, and then just do:
>>> kill(JobID) 

Resubmit Failed Sub-Jobs

Again, it's dead easy: if a job, or some of its sub-jobs, failed, simply start pbook and do:
>>> retry(JobID)

It will then pick out any jobs which failed last time and resubmit just those ones for you.

Retrieving the output

Once your jobs have finished, you can retrieve the output files by running dq2-get on your output dataset (the name you gave to --outDS):
dq2-get user10.<nickname>.myTestNtuple.root
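If you submitted many jobs, you can generate one dq2-get command per output dataset. This is a minimal sketch that uses only the plain 'dq2-get <dataset>' form shown here. The function names are hypothetical, and it prints the commands unless dry_run=False is passed.

```python
import shlex
import subprocess
from typing import List


def dq2_get_commands(out_datasets: List[str]) -> List[List[str]]:
    """One 'dq2-get <dataset>' argument list per non-empty dataset name."""
    return [["dq2-get", ds.strip()] for ds in out_datasets if ds.strip()]


def fetch_all(out_datasets: List[str], dry_run: bool = True) -> None:
    for cmd in dq2_get_commands(out_datasets):
        if dry_run:
            print(shlex.join(cmd))  # show what would be run
        else:
            subprocess.run(cmd, check=True)  # raise if dq2-get fails


fetch_all(["user10.katharineleney.myTestNtuple.root"])
```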

And you're done!

A few extra notes...

* Use the merged AOD datasets; pathena complains if you don't.

* You can choose the site your jobs run at by adding "--site <siteName>" to your command, e.g.
pathena -c 'Events=1000' ../share/Htautau_jobOptions.py --site UKI-NORTHGRID-LIV-HEP_MCDISK --inDS mc09_7TeV.109910.SherpabbAtautaulhMA120TB20.merge.AOD.e534_s765_s767_r1250_r1260/ --outDS user10.katharineleney.myTestNtuple.root

* You can tell pathena to run over your GRL by doing:
pathena myJobOptions.py --goodRunListXML MyLBCollection.xml --outDS user10...

It will then translate your GRL into a list of datasets and use this in place of the --inDS option.
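As mentioned under "Submitting multiple jobs", the dataset loop can be adapted to run over several GRLs. Here is a minimal sketch. Each submission uses --goodRunListXML in place of --inDS. Two assumptions: the GRL file name (minus .xml) is reused to make each --outDS unique, and grls/periodA.xml is a made-up example path.

```python
import shlex
from pathlib import Path
from typing import List


def grl_commands(job_options: str, grl_files: List[str],
                 nickname: str) -> List[List[str]]:
    """One pathena argument list per GRL XML file (--goodRunListXML replaces --inDS).

    Assumption: the GRL file name without .xml is reused in --outDS so that
    every submission writes to its own output dataset.
    """
    commands = []
    for grl in grl_files:
        tag = Path(grl).stem  # 'grls/periodA.xml' -> 'periodA'
        out_ds = f"user10.{nickname}.{tag}.root"
        commands.append(["pathena", job_options,
                         "--goodRunListXML", grl, "--outDS", out_ds])
    return commands


cmds = grl_commands("myJobOptions.py", ["MyLBCollection.xml", "grls/periodA.xml"],
                    "katharineleney")
for cmd in cmds:
    print(shlex.join(cmd))  # dry run: print instead of submitting
```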

See https://twiki.cern.ch/twiki/bin/view/Atlas/PandaAthena#example_10_How_to_run_on_a_good for details.

-- KatharineLeney - 04 Jun 2010
Topic attachments
PathenaSub.py.txt: Script to submit multiple Pathena jobs (888 bytes, 04 Aug 2010 - 13:25, KatharineLeney)
Topic revision: r5 - 04 Aug 2010, KatharineLeney
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.