Job Management on Beocat

Beocat uses a fork of the OpenPBS batch system called Torque for job management. It supports all of the normal PBS commands and attributes, and adds some extensions.

Basic introduction

qsub is the tool used to submit jobs to the cluster. A 'job' is simply a shell script. As an extremely simple example, my_job.sh is a short shell script that reports back the name of the machine it runs on and that machine's uptime.

#!/bin/bash
echo "$(hostname): $(uptime)"

Please note that the first line is actually unnecessary. All scripts are run through the shell you specify to qsub, which defaults to your normal shell. In fact, job scripts need not even be executable. (The first line was just handy for debugging the script interactively.)

kuffs@[clotho] ~ % qsub my_job.sh 
213.clotho.beocat

So we have submitted our job and it was given job number 213. You can use qstat to view the status of the jobs in the queue:

kuffs@[clotho] ~ % qstat
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
213.clotho          my_job.sh        kuffs                  0 Q batch          

This listing shows our job's id, its name, who launched it, the time it has used so far, its state, and its queue. Under the 'S' column there is a 'Q', meaning that the job is queued and waiting to be assigned somewhere to run. This isn't a big deal; it usually takes the scheduler a few seconds to figure out what it is going to do.

Sure enough, try a few seconds later and we get new information.

kuffs@[clotho] ~ % qstat
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
213.clotho          my_job.sh        kuffs                  0 E batch          

Look under the status column and it is now 'E', exiting after finishing.
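If you want to check a job's state from a script rather than by eye, the state letter can be pulled out of the qstat listing. The following is only a minimal sketch: job_state is a hypothetical helper name (not a Torque or Beocat command), and it assumes the default qstat column layout shown above, where the state is the second-to-last column.

```shell
#!/bin/bash
# job_state JOBNUM: read default `qstat` output on stdin and print the
# one-letter state (Q, E, ...) of the job whose id starts with "JOBNUM.".
# Hypothetical helper; assumes the column layout shown in the listing above.
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 { print $(NF-1) }'
}

# Sample text copied from the listing above. On the cluster you would
# pipe the real command in instead:  qstat | job_state 213
sample='Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
213.clotho          my_job.sh        kuffs                  0 E batch'

printf '%s\n' "$sample" | job_state 213
```

If the job is no longer listed, the helper prints nothing, which matches what happens once the job has left the queue.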

A little while later, we check on the status again:

kuffs@[clotho] ~ % qstat

This time we don't receive any output, because the job has finished and is no longer in the queue. A quick look in our current directory, however, reveals two new files: 'my_job.sh.e213' and 'my_job.sh.o213'. These two files hold the text written to STDERR and STDOUT by your job, respectively.

kuffs@[clotho] ~ % cat my_job.sh.e213

A quick peek in my_job.sh.e213 shows there is no error output.

kuffs@[clotho] ~ % cat my_job.sh.o213 
virgrack6:  14:17:40 up 9 days,  2:47,  0 users,  load average: 2.64, 2.28, 2.21

And my_job.sh.o213 has the output from the program.
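Checking both files by hand gets repetitive, so here is a small sketch that does it in one step. show_job_output is a hypothetical helper name, and it assumes only the file naming seen above: the job name followed by '.o' or '.e' and the job number.

```shell
#!/bin/bash
# show_job_output NAME NUMBER [DIR]: print a job's captured STDOUT, then
# report whether anything was written to STDERR.
# Hypothetical helper; relies on the <name>.o<number> / <name>.e<number>
# file names shown above. DIR defaults to the current directory.
show_job_output() {
    local name=$1 num=$2 dir=${3:-.}
    local out="$dir/$name.o$num" err="$dir/$name.e$num"

    # STDOUT file: print it if it exists.
    if [ -f "$out" ]; then
        cat "$out"
    fi

    # STDERR file: -s is true only for a file that exists and is non-empty.
    if [ -s "$err" ]; then
        echo "stderr ($err):"
        cat "$err"
    else
        echo "no error output"
    fi
}

# Usage on the cluster, from the directory the job was submitted in:
#   show_job_output my_job.sh 213
```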

I encourage you to check out the man pages for both qsub and qstat for much more information.

Selecting specific nodes

The biggest part of job submission is definitely allocating the correct resources for your job. The official documentation is quite good in this respect.

Check it out at http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml
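As a rough illustration of the kind of request that documentation covers, the job script below asks for two processors on one node, some memory, and a short run time. The numbers are made up for the example and are not Beocat recommendations; report_allocation is a hypothetical function name. It relies on PBS_NODEFILE, which Torque sets inside a running job to the path of a file listing the allocated processor slots, and it falls back to a message when run outside the batch system.

```shell
#!/bin/bash
# Illustrative resource request (example values only):
#PBS -l nodes=1:ppn=2
#PBS -l mem=512mb
#PBS -l walltime=00:10:00

# Print how many processor slots were allocated on each node.
# PBS_NODEFILE is only set inside a running job, so check for it first.
report_allocation() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        echo "allocated slots per node:"
        sort "$PBS_NODEFILE" | uniq -c
    else
        echo "not running under the batch system"
    fi
}

report_allocation
```

Submitted with qsub, the output file should list each allocated node with a slot count; run directly in your shell, it just prints the fallback message, which makes the script easy to try before submitting.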

I will add a few notes, though.

Helpful tips

Scripting trick

To save yourself a bit of typing on the command line when testing, all the qsub parameters can be added as special comments in your job's shell script.

For example, let's say you have these parameters to give to qsub:

>qsub -l walltime=20:00:00 -S /bin/bash -N sim_pass3 my_job.sh

This would run the script my_job.sh using /bin/bash as the shell with a maximum run time of 20 hours and named 'sim_pass3'. This could get to be a pain to remember and edit every time. Fortunately, we can use special comments in my_job.sh that make our life easier:

#!/bin/bash
#PBS -l walltime=20:00:00
#PBS -S /bin/bash
#PBS -N sim_pass3
echo "$(hostname): $(uptime)"

Now you would submit the job with the much simpler command:

>qsub my_job.sh
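Before submitting, it can be useful to see which options a script carries. The sketch below lists the '#PBS' comments in a script; pbs_directives is a hypothetical helper name, not part of Torque. It assumes, per my reading of the qsub man page, that qsub stops scanning for these comments at the first line that is an actual command, so anything after that point is skipped here too.

```shell
#!/bin/bash
# pbs_directives [FILE]: print the qsub options embedded as "#PBS" comments
# in a job script (read from FILE, or from stdin when no file is given).
# Hypothetical helper. Scanning stops at the first command line, on the
# assumption that qsub does the same.
pbs_directives() {
    awk '
        /^#PBS[ \t]/       { sub(/^#PBS[ \t]+/, ""); print; next }
        /^#/ || /^[ \t]*$/ { next }   # other comments and blank lines
        { exit }                      # first command: stop scanning
    ' "$@"
}

# Demo using the script from the example above.
pbs_directives <<'EOF'
#!/bin/bash
#PBS -l walltime=20:00:00
#PBS -S /bin/bash
#PBS -N sim_pass3
echo "$(hostname): $(uptime)"
EOF
```

On the cluster you would run it against the file itself, for example pbs_directives my_job.sh.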

Troubleshooting

Job won't start, always in queued state

So you've submitted a job, and can't figure out why it won't start.

kuffs@[clotho] ~ % qsub my_job.sh -l mem=100G
219.clotho.beocat
kuffs@[clotho] ~ % qstat -n

clotho.beocat: 
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
219.clotho.beocat    kuffs    batch    my_job.sh     --      1  --  100gb 00:05 Q   -- 
    --          

Everything looks fine, except that the job stays in the 'Q' state and the scheduler doesn't seem to want to select nodes for it to run on. There is a handy tool, checkjob, for finding out why.

kuffs@[clotho] ~ % checkjob 219
checking job 219

State: Idle  EState: Deferred
Creds:  user:kuffs  group:kuffs_users  class:batch  qos:DEFAULT
WallTime: 00:00:00 of 00:05:00
SubmitTime: Wed Nov 15 17:09:03
  (Time Queued  Total: 00:14:14  Eligible: 00:00:15)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Dedicated Resources Per Task: PROCS: 1  MEM: 100G


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE

job is deferred.  Reason:  NoResources  (cannot create reservation for job '219' (intital reservation attempt)
)
Holds:    Defer  (hold reason:  NoResources)
PE:  38.18  StartPriority:  1
cannot select job 219 for partition DEFAULT (job hold active)

This incredibly detailed output tells us that the scheduler can't find the resources to give our task (Reason: NoResources). Poking through more of the output, we see 'Dedicated Resources Per Task: PROCS: 1  MEM: 100G'... doh! The scheduler happens to be right: there are no nodes in the cluster that have 100G of RAM.
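Since only a few lines of that output mattered here, a small filter can save some poking around. This is a sketch under stated assumptions: checkjob_summary is a hypothetical helper name, and it simply keeps the lines whose labels appear in the output above (State, Dedicated Resources Per Task, Holds). Other checkjob versions may label things differently.

```shell
#!/bin/bash
# checkjob_summary: read `checkjob` output on stdin and keep only the lines
# that explained the problem above: job state, per-task resource request,
# and any holds. Hypothetical helper; labels are taken from the sample output.
checkjob_summary() {
    awk '
        /^State:/                        { print }
        /^Dedicated Resources Per Task:/ { print }
        /^Holds:/                        { print }
    '
}

# Demo on an excerpt of the output shown above. On the cluster:
#   checkjob 219 | checkjob_summary
checkjob_summary <<'EOF'
State: Idle  EState: Deferred
Total Tasks: 1
Dedicated Resources Per Task: PROCS: 1  MEM: 100G
Holds:    Defer  (hold reason:  NoResources)
PE:  38.18  StartPriority:  1
EOF
```

With the state, the request, and the hold reason side by side, a mistyped unit like 100G instead of 100m is much easier to spot.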

So we delete the broken job.

kuffs@[clotho] ~ % qdel 219

And resubmit our job, sans the typo.

kuffs@[clotho] ~ % qsub my_job.sh -l mem=100m
221.clotho.beocat

And things run just fine.

kuffs@[clotho] ~ % checkjob 221
checking job 221

State: Running
Creds:  user:kuffs  group:kuffs_users  class:batch  qos:DEFAULT
WallTime: 00:00:00 of 00:05:00
SubmitTime: Wed Nov 15 17:30:17
  (Time Queued  Total: 00:00:14  Eligible: 00:00:14)

StartTime: Wed Nov 15 17:30:31
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Dedicated Resources Per Task: PROCS: 1  MEM: 100M
Allocated Nodes:
[virgrack6:1]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE

Reservation '221' (00:00:00 -> 00:05:00  Duration: 00:05:00)
PE:  1.00  StartPriority:  1

The scheduler has chosen one of the processors on virgrack6 for our task.