Running batch jobs on matrix


Matrix is a cluster of 8 dual-CPU nodes dedicated to CDF. Six of the subnodes can be logged into by typing 'ssh node1', 'ssh node3' ... 'ssh node7' (node2 does not seem to work). The status of the subnodes can be checked by typing 'pbsnodes -a'.

The cluster has two 1.8TB disk arrays available under /data1 and /data2. The free capacity of the disks can be checked by typing 'df' (disk free).
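Before writing large output it can help to see which of the two arrays has more room. The following is only a sketch built on the standard 'df' command; the function names avail_kb and roomiest are made up for this illustration and are not part of the matrix setup.

```shell
#!/bin/bash
# avail_kb DIR: print the free space, in kilobytes, of the filesystem
# that holds DIR. "-P" asks df for the portable one-line-per-filesystem
# format, so the value is always the 4th field of the 2nd line.
avail_kb() {
    df -P -k "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# roomiest DIR...: print whichever of the given directories currently
# has the most free space. Directories that df cannot read are skipped.
roomiest() {
    local best="" best_kb=-1 dir kb
    for dir in "$@"; do
        kb=$(avail_kb "$dir")
        [ -n "$kb" ] || continue
        if [ "$kb" -gt "$best_kb" ]; then
            best="$dir"
            best_kb="$kb"
        fi
    done
    echo "$best"
}
```

On matrix this would be called as 'roomiest /data1 /data2'. For a quick look by eye, plain 'df -h /data1 /data2' prints the same information in human-readable units.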

Batch jobs with pbs

Jobs which require a long time to run (typically more than 1 CPU hour) are best run as batch jobs. Matrix uses pbs (portable batch system) for submitting batch jobs.

Prepare a shell script that executes your job (all examples can be found on matrix:~oldeman/pbs). Type 'qsub' followed by the name of the script to submit the job. A line containing the job number appears. Type 'qstat' to check the status of the job. When the job is finished, two files are produced: one contains the output of the job, and the other contains the error messages of the job.

(this example script runs for about 30 seconds and calculates the 200th prime number).
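A job script of this kind could look roughly like the sketch below. It is not the original example from ~oldeman/pbs: the job name 'prime', the file name prime.sh and the function nth_prime are placeholders chosen here, and this version finishes much faster than 30 seconds.

```shell
#!/bin/bash
#PBS -N prime
# Hypothetical job script (the job name "prime" is a placeholder).
# Lines starting with "#PBS" are read by qsub as options; the shell
# itself treats them as ordinary comments.

# nth_prime N: print the N-th prime number, found by trial division.
nth_prime() {
    local target="$1" count=0 candidate=1 d is_prime
    while [ "$count" -lt "$target" ]; do
        candidate=$((candidate + 1))
        is_prime=1
        d=2
        # Only divisors up to the square root need to be tried.
        while [ $((d * d)) -le "$candidate" ]; do
            if [ $((candidate % d)) -eq 0 ]; then
                is_prime=0
                break
            fi
            d=$((d + 1))
        done
        if [ "$is_prime" -eq 1 ]; then
            count=$((count + 1))
        fi
    done
    echo "$candidate"
}

echo "The 200th prime number is $(nth_prime 200)"
```

Saved as prime.sh (again a made-up name), it would be submitted with 'qsub prime.sh' and followed with 'qstat'. Because the "#PBS" lines are comments, the script can also be run directly with 'bash prime.sh' to test it before submitting.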


Submitting multiple jobs

The main advantage of running batch jobs on the cluster is that you can run multiple jobs (up to 12) in parallel. Instead of submitting each job by hand, you can write a script that launches multiple jobs. Of the many ways to do that, I find the most straightforward is a script that uses sed to make modified copies of the original script and then submits each copy.
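The sed approach can be sketched as follows. This is an illustration under assumptions, not the launch script from ~oldeman/pbs: it assumes the original job script contains the token JOBNUM wherever the copies should differ (for example in an output file name or a random seed), and the names make_job_copy, launch_jobs and SUBMIT are invented here.

```shell
#!/bin/bash
# Command used to submit a job script. Setting SUBMIT=echo gives a dry
# run that only prints the names of the files that would be submitted.
SUBMIT="${SUBMIT:-qsub}"

# make_job_copy TEMPLATE N: write a copy of TEMPLATE in which every
# occurrence of the (hypothetical) token JOBNUM is replaced by N, and
# print the name of the new file, e.g. myjob.sh -> myjob_3.sh.
make_job_copy() {
    local template="$1" n="$2"
    local copy="${template%.sh}_${n}.sh"
    sed "s/JOBNUM/${n}/g" "$template" > "$copy"
    echo "$copy"
}

# launch_jobs TEMPLATE COUNT: create COUNT modified copies of TEMPLATE,
# numbered 1..COUNT, and submit each one.
launch_jobs() {
    local template="$1" count="$2" n copy
    for n in $(seq 1 "$count"); do
        copy=$(make_job_copy "$template" "$n")
        $SUBMIT "$copy"
    done
}
```

With these functions defined, 'launch_jobs myjob.sh 12' would fill all 12 job slots from a template called myjob.sh (a made-up name). Keeping the generated copies on disk makes it easy to check afterwards exactly what each job ran.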

Running CDF software in multiple batch jobs

CDF software is always a bit more complicated to run than normal jobs. Things to take into account:

An example script that produces 1000 inclusive B decays (generator-level only) and a matching launch script can be found with the other examples on matrix:~oldeman/pbs.

Last updated on 25/05/05, Rolf Oldeman.