Sun Grid Engine
About
- Sun Grid Engine (SGE) is the scheduling system that Beocat is moving to. It replaces the Torque/Maui setup that is currently in place, and it has fewer of the pitfalls that troubled that setup.
New Features
- Automated Checkpointing
- Proper Accounting: Oversubscribed nodes will no longer be scheduled.
- Better Documentation
- Better Resource Management
Getting Started with SGE
- Although Torque/Maui and SGE use similar commands, many of them are new or have changed, so users will need to read about them and experiment with them.
qsub and qrsh
- Previously, the qsub command let you specify whether a job was to be interactive or batch. Now this is done with two separate commands: qsub submits a batch job, while qrsh submits an interactive one.
Resource requests
- To specify the resources you need, pass them as arguments to either the qsub or qrsh command. If I needed 1G of memory, I would specify it like so:
mozes@loki ~ $ qsub -l mem=1G
or
mozes@loki ~ $ qrsh -l mem=1G
To request 4G of memory and infiniband with a max runtime of 2hrs, it is very similar:
mozes@loki ~ $ qsub -l mem=4G,ib=TRUE,h_rt=2:00:00
or
mozes@loki ~ $ qrsh -l mem=4G,ib=TRUE,h_rt=2:00:00
There is a default of mem=1G and h_rt=1:00:00 for all requests.
A full listing of the objects you can request is available from the following command:
mozes@loki ~ $ qconf -sc
Sample Output:
name               shortcut   type      relop requestable consumable default  urgency
arch               a          RESTRING  ==    YES         NO         NONE     0
calendar           c          RESTRING  ==    YES         NO         NONE     0
cpu                cpu        DOUBLE    >=    YES         NO         0        0
display_win_gui    dwg        BOOL      ==    YES         NO         0        0
h_core             h_core     MEMORY    <=    YES         NO         0        0
h_cpu              h_cpu      TIME      <=    YES         NO         0:0:0    0
h_data             h_data     MEMORY    <=    YES         NO         0        0
h_fsize            h_fsize    MEMORY    <=    YES         NO         0        0
h_rss              h_rss      MEMORY    <=    YES         NO         0        0
h_rt               h_rt       TIME      <=    YES         NO         0:0:0    0
h_stack            h_stack    MEMORY    <=    YES         NO         0        0
h_vmem             h_vmem     MEMORY    <=    YES         NO         0        0
hostname           h          HOST      ==    YES         NO         NONE     0
infiniband         ib         BOOL      ==    YES         NO         FALSE    0
load_avg           la         DOUBLE    >=    NO          NO         0        0
load_long          ll         DOUBLE    >=    NO          NO         0        0
load_medium        lm         DOUBLE    >=    NO          NO         0        0
load_short         ls         DOUBLE    >=    NO          NO         0        0
mem_free           mf         MEMORY    <=    YES         YES        0        0
mem_total          mt         MEMORY    <=    YES         NO         0        0
mem_used           mu         MEMORY    >=    YES         NO         0        0
memory             mem        MEMORY    <=    FORCED      YES        0        0
min_cpu_interval   mci        TIME      <=    NO          NO         0:0:0    0
myrinet            mx         BOOL      ==    YES         NO         FALSE    0
np_load_avg        nla        DOUBLE    >=    NO          NO         0        0
np_load_long       nll        DOUBLE    >=    NO          NO         0        0
np_load_medium     nlm        DOUBLE    >=    NO          NO         0        0
np_load_short      nls        DOUBLE    >=    NO          NO         0        0
num_proc           p          INT       ==    YES         NO         0        0
qname              q          RESTRING  ==    YES         NO         NONE     0
rerun              re         BOOL      ==    NO          NO         0        0
s_core             s_core     MEMORY    <=    YES         NO         0        0
s_cpu              s_cpu      TIME      <=    YES         NO         0:0:0    0
s_data             s_data     MEMORY    <=    YES         NO         0        0
s_fsize            s_fsize    MEMORY    <=    YES         NO         0        0
s_rss              s_rss      MEMORY    <=    YES         NO         0        0
s_rt               s_rt       TIME      <=    YES         NO         0:0:0    0
s_stack            s_stack    MEMORY    <=    YES         NO         0        0
s_vmem             s_vmem     MEMORY    <=    YES         NO         0        0
seq_no             seq        INT       ==    NO          NO         0        0
slots              s          INT       <=    YES         YES        1        1000
swap_free          sf         MEMORY    <=    YES         NO         0        0
swap_rate          sr         MEMORY    >=    YES         NO         0        0
swap_rsvd          srsv       MEMORY    >=    YES         NO         0        0
swap_total         st         MEMORY    <=    YES         NO         0        0
swap_used          su         MEMORY    >=    YES         NO         0        0
tmpdir             tmp        RESTRING  ==    NO          NO         NONE     0
virtual_free       vf         MEMORY    <=    YES         NO         0        0
virtual_total      vt         MEMORY    <=    YES         NO         0        0
virtual_used       vu         MEMORY    >=    YES         NO         0        0
Multi-threaded Jobs
SGE's multi-threaded job requests are very different from the ones you are used to with Torque/Maui. They are managed with what SGE terms Parallel Environments. In Torque/Maui, you would request a number of machines and a number of processors per machine. In the SGE world, you simply request a number of processors and ask for an environment that will allocate them.
Let's say that you want 8 processors. There are currently three ways of requesting them, depending on what you want to accomplish.
mozes@loki ~ $ qsub -pe mpi-spread 8 mpijob.sh
or
mozes@loki ~ $ qsub -pe single 8 mpijob.sh
or
mozes@loki ~ $ qsub -pe mpi-fill 8 mpijob.sh
These submissions would have different behaviors. The first would give you 8 cores wherever it could get them, spread across as many nodes as possible. This would be useful for a client-server model that needs lots of machines, not necessarily lots of cores.
The second would give you 8 cores on a single machine. This limits you to requesting 16 or fewer cores, as our largest machine only has 16 cores. It is useful for an application that is threaded, but doesn't know anything about message passing between machines.
The third will give you 8 cores, but will try to place them on as few machines as possible. The downside, as it stands now, is that it will put them anywhere if doing so lets the job be scheduled sooner, meaning it can have the same effect as the first.
If you must have this job using otherwise empty nodes, your best bet is to make a request like this:
mozes@loki ~ $ qsub -pe mpi-fill 8 -q '*@@scouts' mpijob.sh
This will force the job to start on the scouts. It may still get split across up to 8 machines, but at least the nodes will be similar.
One nice feature of the scheduler is the ability to request a range of processors; it will give you as many of them as it can. This can be done using the following syntax: "2|3|5-8|10|16". This would give you an effective request of either 16, 10, 8, 7, 6, 5, 3, or 2 cores, depending on how many cores are available when you submit.
Memory Requests with Multi-core jobs
- Memory requests in the SGE environment are much different from the setup used with Torque/Maui. Previously memory requests were per-job. In SGE memory requests are per-core. For example:
mozes@loki ~ $ qrsh -l mem=1G -pe mpi-spread 32
This would have requested 1G of memory per slot, ending up with a total request of 32G. If you actually wanted 1G for the entire job, you would need to compute 1G/32cores == 1024M/32cores == 32M/core. This means that you should make the following request when submitting the job:
mozes@loki ~ $ qrsh -l mem=32M -pe mpi-spread 32
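The per-slot arithmetic can be wrapped in a small shell helper. This is only a sketch: the function name mem_per_slot is made up for illustration (it is not an SGE command), and it assumes the total is given in megabytes. It rounds up, so the slots together never add up to less than the total you asked for.

```shell
# mem_per_slot TOTAL_MB SLOTS
# Hypothetical helper, not part of SGE: prints the per-slot value to pass
# to "-l mem=" so that SLOTS slots add up to at least TOTAL_MB megabytes.
mem_per_slot() {
    local total_mb=$1
    local slots=$2
    # Integer division, rounded up.
    echo "$(( (total_mb + slots - 1) / slots ))M"
}

# 1G (1024M) spread over 32 slots
mem_per_slot 1024 32
```

The result could then be substituted into a request, for example qrsh -l mem=$(mem_per_slot 1024 32) -pe mpi-spread 32.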
Node Group requests
- Sometimes it is beneficial to run jobs on a specific set of nodes. This can be done like this:
mozes@loki ~ $ qsub -q '*@@scouts' test.sh
Please note the "@@" syntax. There are two '@' symbols because hostgroup names always begin with @. A group listing can be obtained from:
mozes@loki ~ $ qconf -shgrpl
Sample Output:
@brutes
@brutes-large
@brutes-small
@fiends
@ib
@mostbrutes
@mostbrutes-large
@mostbrutes-small
@mostfiends
@mostnodes
@mostrogues
@mostscouts
@mosttitans
@mx
@rogues
@scouts
@scouts-rack1
@scouts-rack2
@scouts-rack3
@scouts-rack4
@titans
I would like to note that you shouldn't ever need to use any of the groups prefixed with '@most' as these are used for internal setup. Jobs will work, but they will be restricted to roughly 3/4 of whatever group you are trying to use.
To see the machines in each group you would do something like:
mozes@loki ~ $ qconf -shgrp @rogues
Sample Output:
group_name @rogues
hostlist rogue1.beocat rogue2.beocat rogue3.beocat rogue4.beocat rogue5.beocat \
rogue6.beocat rogue7.beocat rogue8.beocat rogue9.beocat \
rogue10.beocat rogue11.beocat rogue12.beocat rogue13.beocat \
rogue14.beocat rogue15.beocat rogue16.beocat
Checkpointing
- As was mentioned earlier, SGE has some automatic checkpointing support. There are a few caveats to this, however.
- Multi-node jobs DO NOT checkpoint correctly if they use OpenMPI. This support is coming, but it is not available now.
- Applications must be dynamically linked, or they must be rebuilt with the checkpointing libraries statically linked in.
- Checkpointing must be enabled by the user if they want to use it. I will not enable it blindly for the entire cluster.
- Checkpoints only work on a single application (I will expand on this later in the section).
What is Checkpointing
- Checkpointing is the ability for the running job to be "snapshotted" at a scheduled interval. If the machine the job is running on were to power off unexpectedly, the job could be resumed from the "snapshot."
Will Checkpointing work for me?
- Well, it depends.
- Does your application use OpenMPI? If so, is it a multi-node job? If you answered yes to both of these questions, then checkpointing will not work for you.
- Is your application statically linked? If so, can you recompile the application? If you answered yes to the first question and no to the second, then checkpointing will not work for you.
- Are you using more than one application? If so, are they running at the same time? If you answered yes to both of these questions, checkpointing MAY work for you.
- For all other use cases, I would assume it will work.
1.) OpenMPI MultiNode jobs
- In short, this will not work. Please do not attempt to enable checkpointing on these jobs, because I cannot guarantee the application will be stable. In the future checkpointing should work for this type of job, but support is not built into the current stable release of OpenMPI.
2.) Statically Linked Binaries
If your application is statically linked, and you do not have access to the source code to re-compile, there is no way that checkpointing can be enabled for this job.
If your application is statically linked, and you have access to the source code, you will need to recompile the application to enable the support. This can be done like so:
mozes@loki ~ $ gcc -o my_app my_app.c -L/usr/lib64/ -lcr
Finally, if this is not enough information to make your application "checkpointable," e-mail the administrator (beocat@cis.ksu.edu), who should be able to help you.
3.) More than one application per job
The main question that exists for this piece is this: "Can you serialize the steps you perform?"
For example, I had an application that got its input from STDIN:
*WRONG*
mozes@loki ~ $ cr_run cat FILE | cr_run osm2navit Blah.bin &
[1] 10101
mozes@loki ~ $ cr_checkpoint 10101
Unable to checkpoint PID 10101
or
mozes@loki ~ $ cr_run bzcat FILE.bz2 | osm2navit Blah.bin &
[1] 10103
mozes@loki ~ $ cr_checkpoint 10103
Unable to checkpoint PID 10103
By changing the 'cat FILE' to a shell construct that essentially does the same thing, I was able to checkpoint the application:
*RIGHT*
mozes@loki ~ $ cr_run osm2navit Blah.bin < FILE &
[1] 10102
mozes@loki ~ $ cr_checkpoint 10102
PID 10102 Checkpointed Successfully
or
mozes@loki ~ $ cr_run bzcat FILE.bz2 > FILE
mozes@loki ~ $ cr_run osm2navit Blah.bin < FILE &
[1] 10104
mozes@loki ~ $ cr_checkpoint 10104
PID 10104 Checkpointed Successfully
Once you have successfully serialized your job script, you should be able to enable checkpointing on your job. Follow the instructions under "4.) All Others" to enable it.
4.) All others
- If your use case hasn't been screened out by the other caveats, then to make checkpointing work for you, you will need to make the following changes:
- Prefix the application in your script with cr_run
- Specify the Checkpointing environment, and how often you would like the application to be checkpointed
mozes@loki ~ $ qsub -ckpt BLCR -c 12:00:00 checkpoint.sh
In the preceding snippet of code, the option '-ckpt BLCR' specifies that you want to use BLCR as your checkpointing environment. As we do not support any other type of checkpointing, this will always be the same (if you want checkpointing).
The option '-c 12:00:00' states that you want your application checkpointed every 12 hours. You can make this interval longer, but you cannot make it shorter: every 12 hours is the most frequent checkpointing we allow.
Monitoring your job
- You have now submitted your job, and you want to see whether or not it has started.
The qstat command
- If you have used Beocat for very long, you might remember the qstat command from the Torque/Maui setup. qstat from SGE is very similar, but targeted towards SGE. To check running jobs, use the qstat command:
mozes@loki ~ $ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    101 0.00000 sge_test.s mozes        qw    04/27/2009 10:00:36                                    1
The useful information for us is the piece under the item "state." "qw" means that the job is queued and waiting. If the job were running, the output would be more like this:
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    101 0.50000 sge_test.s mozes        r     04/27/2009 10:00:50 batch.q@titan4.beocat              1
If that doesn't provide you with enough useful information, you could always use "qstat -ext":
job-ID  prior   ntckts  name       user         project          department state cpu        mem      io      tckts ovrts otckt ftckt stckt share queue                          slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    102 0.00000 0.00000 sge_test.s mozes        CIS              defaultdep qw                                    0     0     0     0     0     0.00                                 1
or
job-ID  prior   ntckts  name       user         project          department state cpu        mem      io      tckts ovrts otckt ftckt stckt share queue                          slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    102 0.50000 0.50000 sge_test.s mozes        CIS              defaultdep r     0:00:02:36 96.56516 0.00000 0     0     0     0     0     0.00  batch.q@titan4.beocat          1
One of the niceties of SGE is that memory utilization and actual CPU time are reported to the scheduler. Thus we have almost realtime statistics on per job utilization.
I now want to know what the states are for all of my jobs, and what resources I requested for them:
mozes@loki ~ $ qstat -f -u mozes -r -ne
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
batch.q@titan4.beocat          BIPC  0/1/16         2.95     lx24-amd64
    102 0.50000 sge_test.s mozes        r     04/27/2009 10:05:35     1
       Full jobname:     sge_test.sub
       Master Queue:     batch.q@titan4.beocat
       Hard Resources:   h_rt=3600 (0.000000)
                         memory=4G (0.000000)
       Soft Resources:
       Hard requested queues: *@titan4.beocat
The -f means you want the full output. Unfortunately, this means that it will print information for all hosts, too. The -ne suppresses the output from hosts that are not used. The -r tells qstat that you want information about the resources requested, and the -u mozes says you want info just about mozes' jobs.
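If you only care about the state of one job, the plain qstat output can be filtered with awk. The sketch below is an illustration, not an SGE feature: job_state is a made-up helper name, and it assumes the default column order shown in the sample output (job-ID first, state fifth) and a job name without spaces.

```shell
# job_state JOB_ID
# Hypothetical helper, not an SGE command: reads plain `qstat` output on
# stdin and prints the state column (qw, r, ...) for the given job ID.
# Assumes the default column order: job-ID prior name user state ...
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Example using a line copied from the sample qstat output:
echo "101 0.00000 sge_test.s mozes qw 04/27/2009 10:00:36 1" | job_state 101
```

On the cluster the helper would be fed from the real command, as in qstat | job_state 101. It prints nothing if the job is not listed.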
Diagnosing jobs
- What if you have submitted a job, and it just won't start? That is where "qstat -j" comes into play.
mozes@loki ~ $ qstat -f -u mozes -r -ne
############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    104 0.50000 sge_test.s mozes        qw    04/27/2009 11:16:48    32
       Full jobname:     sge_test.sub
       Requested PE:     single 32
       Hard Resources:   h_rt=3600 (0.000000)
                         memory=4G (0.000000)
       Soft Resources:
       Hard requested queues: *@@titans
mozes@loki ~ $ qstat -j
scheduling info:
    queue instance "batch.q@brute4.beocat" dropped because it is overloaded: np_load_avg=1.341250 (no load adjustment) >= 1.25
    Jobs can not run because queue instance is not contained in its hard queue list
        104
    Jobs can not run because available slots combined under PE are not in range of job
        104
Well, the last few lines of this command show that job 104, my job, cannot start because I requested more slots on a single machine than there are anywhere. To fix this, we could do something like this:
mozes@loki ~ $ qalter -pe mpi-fill 32 104
modified parallel environment of job 104
modified slot range of job 104
This modifies the job, telling it to use a different parallel environment. When we run the qstat -j command, we see that:
mozes@loki ~ $ qstat -j
scheduling info:
    queue instance "batch.q@titan1.beocat" dropped because it is full
    queue instance "batch.q@titan2.beocat" dropped because it is full
    queue instance "batch.q@titan1.beocat" dropped because it is overloaded: memory=0.000000 (no load value) <= 0G
    queue instance "batch.q@titan2.beocat" dropped because it is overloaded: memory=0.000000 (no load value) <= 0G
mozes@loki ~ $ qstat -f -u mozes -r -ne
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
batch.q@titan1.beocat          BIPC  0/16/16        0.86     lx24-amd64    a
    104 0.50000 sge_test.s mozes        r     04/27/2009 11:26:35    16
       Full jobname:     sge_test.sub
       Master Queue:     batch.q@titan1.beocat
       Requested PE:     mpi-fill 32
       Granted PE:       mpi-fill 32
       Hard Resources:   h_rt=3600 (0.000000)
                         memory=4G (0.000000)
       Soft Resources:
       Hard requested queues: *@@titans
---------------------------------------------------------------------------------
batch.q@titan2.beocat          BIPC  0/16/16        0.15     lx24-amd64    a
    104 0.50000 sge_test.s mozes        r     04/27/2009 11:26:35    16
       Full jobname:     sge_test.sub
       Master Queue:     batch.q@titan2.beocat
       Requested PE:     mpi-fill 32
       Granted PE:       mpi-fill 32
       Hard Resources:   h_rt=3600 (0.000000)
                         memory=4G (0.000000)
       Soft Resources:
       Hard requested queues: *@@titans
The job seems to be running.
Other qsub features
Output file control
Current Working Directory
- By default, qsub places the output files in your home directory. If you feel that they should be organized differently, an easy way to do that is to use the -cwd option. cwd stands for current working directory; with it, the output files are placed in whatever directory you are in when you submit the job.
Combining STDOUT and STDERR
- In your qsub command, you can add "-j yes" to add the errors directly into the output file.
Job Naming
Sometimes it is nice to be able to specify the job name so that you can better see what is happening in the queue. To name your job, add -N $WhatEverNameYouWant to the qsub command.
Other SGE Features
Job limits
- Because of SGE's increased ability to handle parallel jobs, and some users' affinity for filling the queue with long jobs, we have implemented a couple of limits to keep the cluster usable for everyone.
Max Threads/Jobs per user
There is a hard limit on the number of cores any one person can utilize at any point in time. It is currently set to 500, which I think is reasonable. If, for some reason, you need this limit raised, e-mail the administrator with the request and a detailed explanation of why you need that many cores.
To determine how many cores you have utilized, and what the limit is, you need to have a job running and then run the qquota command:
mozes@loki ~ $ qquota
resource quota rule limit                filter
--------------------------------------------------------------------------------
max_slots_per_user/1 slots=4/500          users *
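The used/limit pair can be pulled out of that output with sed if you want to use it in a script. This is a sketch under the assumption that qquota prints the limit as slots=USED/LIMIT, as in the sample above; the function name slots_used_and_limit is made up for illustration.

```shell
# slots_used_and_limit
# Hypothetical helper, not an SGE command: reads qquota output on stdin
# and prints "USED LIMIT" for every line containing slots=USED/LIMIT.
slots_used_and_limit() {
    sed -n 's/.*slots=\([0-9][0-9]*\)\/\([0-9][0-9]*\).*/\1 \2/p'
}

# Example using the rule line from the sample qquota output:
echo "max_slots_per_user/1 slots=4/500 users *" | slots_used_and_limit
```

On the cluster this would be run as qquota | slots_used_and_limit. Header and separator lines produce no output because they do not contain a slots= entry.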
Time Limits
- Roughly 1/4 of the nodes cannot be used for jobs longer than 72 hours. This means that there should always be cores available for short jobs. You do not need to know which machines are set up this way; the scheduler will handle it on its own. qstat -j will tell you why a job is not starting.
Accounting
- Sometimes it is useful to know about the resources you have utilized. This can be done through the qacct command:
mozes@loki ~ $ qacct -o $USER
OWNER       WALLCLOCK         UTIME         STIME           CPU        MEMORY        IO       IOW
======================================================================================================================
mozes           21856     11525.840       572.390     14276.910     32520.257     0.000     0.000
It may be more useful if you specify a time limit for this query, for example the last day (-d 1):
mozes@loki ~ $ qacct -o $USER -d 1
OWNER       WALLCLOCK         UTIME         STIME           CPU        MEMORY        IO       IOW
======================================================================================================================
mozes            3832      2962.040       145.480      3234.560      7785.764     0.000     0.000
If you work with a group of people on Beocat, and need to account for everyone's time, you can query by project. NOTE: This only works if you have a group set up in Beocat.
mozes@loki ~ $ qacct -P `qconf -suser $USER | grep default_project | awk '{ print $2 }'`
PROJECT     WALLCLOCK         UTIME         STIME           CPU        MEMORY        IO       IOW
========================================================================================================================
CIS             12917      7656.330       360.300      9121.270     20576.334     0.000     0.000
Default SGE Requests
- SGE has the ability to utilize configuration files for submitting job files. This is done through the ~/.sge_request file.
mozes@loki ~ $ vim ~/.sge_request
# This file is used as a configuration file for SGE jobs, you can set default arguments here.
# I don't like the default request of 1G of memory for my jobs
-l mem=2G
# I want to be mailed on (a)bort, (b)egin, and (e)nd of my job
-m abe
# I want to be mailed at my k-state e-mail
-M mozes@ksu.edu
One thing to remember is that SGE uses several of these files. It first uses the one located at /opt/sge/default/common/sge_request, then looks for ~/.sge_request, then looks for a .sge_request file in the current directory. If you don't want it to use any of them, you can use "qsub -clear" followed by any options you need. This stops any options qsub found in the sge_request files from being used.
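When a job picks up an option you did not expect, it can help to see which of these files actually exist. The sketch below only checks the three locations named above, in the same order; request_files is a made-up helper name, and it does not read or interpret the files.

```shell
# request_files
# Hypothetical helper, not an SGE command: prints, in the lookup order
# described above, each sge_request file that currently exists.
request_files() {
    for f in /opt/sge/default/common/sge_request \
             "$HOME/.sge_request" \
             "$PWD/.sge_request"; do
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}

request_files
```

If you run it from your home directory, the second and third entries are the same file, so it may be listed twice.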
Job script options
- If you were fortunate enough to use the Torque/Maui setup, you may have used an option of specifying resource requests within the job script. This was done via "#PBS $OPTION". This can also be done in SGE:
mozes@loki ~ $ vim sge_test.sub
# Specify my shell as sh
#$ -S /usr/local/bin/sh
# Give me a 2 hour limit to finish the job
#$ -l h_rt=2:00:00
echo "Running osm2navit"
/usr/bin/env cr_run ~/navit/navit/osm2navit Blah.bin < ~/osm/blah.osm
echo "finished osm2navit with exit code $?"
As you can see, I used the construct "#$ $OPTION" to implement submit options in the job script.
SGE Environment variables
- Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that SGE makes available to you.
HOSTNAME=titan1.beocat
SGE_TASK_STEPSIZE=undefined
SGE_INFOTEXT_MAX_COLUMN=5000
SHELL=/usr/local/bin/sh
NHOSTS=2
SGE_O_WORKDIR=/homes/mozes
TMPDIR=/tmp/105.1.batch.q
SGE_O_HOME=/homes/mozes
SGE_ARCH=lx24-amd64
SGE_CELL=default
RESTARTED=0
ARC=lx24-amd64
USER=mozes
QUEUE=batch.q
PVM_ARCH=LINUX64
SGE_TASK_ID=undefined
SGE_BINARY_PATH=/opt/sge/bin/lx24-amd64
SGE_STDERR_PATH=/homes/mozes/sge_test.sub.e105
SGE_STDOUT_PATH=/homes/mozes/sge_test.sub.o105
SGE_ACCOUNT=sge
SGE_RSH_COMMAND=builtin
JOB_SCRIPT=/opt/sge/default/spool/titan1/job_scripts/105
JOB_NAME=sge_test.sub
SGE_NOMSG=1
SGE_ROOT=/opt/sge
REQNAME=sge_test.sub
SGE_JOB_SPOOL_DIR=/opt/sge/default/spool/titan1/active_jobs/105.1
ENVIRONMENT=BATCH
PE_HOSTFILE=/opt/sge/default/spool/titan1/active_jobs/105.1/pe_hostfile
SGE_CWD_PATH=/homes/mozes
NQUEUES=2
SGE_O_LOGNAME=mozes
SGE_O_MAIL=/var/mail/mozes
TMP=/tmp/105.1.batch.q
JOB_ID=105
LOGNAME=mozes
PE=mpi-fill
SGE_TASK_FIRST=undefined
SGE_O_HOST=loki
SGE_O_SHELL=/bin/bash
SGE_CLUSTER_NAME=beocat
REQUEST=sge_test.sub
NSLOTS=32
SGE_STDIN_PATH=/dev/null
Sometimes it is nice to know what hosts you have access to during a PE job. Check the file named by PE_HOSTFILE to find out.
If your job has been restarted, it is nice to be able to change what happens rather than redoing all of your work. In that case, RESTARTED will equal 1.
There are lots of useful environment variables there; I will leave it to you to identify the ones you want.
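As a rough illustration of using RESTARTED, a job script can branch on it before doing any work. This is a sketch only: start_mode is a made-up function name, and what "resuming" means (for example, reading your own saved state) depends entirely on your application.

```shell
# start_mode
# Hypothetical helper: prints "resume" when SGE reports that the job was
# restarted (RESTARTED=1), and "fresh" otherwise, including when the
# variable is not set at all (for example, outside of a job).
start_mode() {
    if [ "${RESTARTED:-0}" = "1" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}

case "$(start_mode)" in
    resume) echo "Job was restarted: continuing from saved state" ;;
    fresh)  echo "First run: starting from the beginning" ;;
esac
```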
Non-mpi multi-node jobs
- Some people need to run multi-node jobs that are not MPI aware. Most of them use ssh or rsh to connect to the other nodes. Under Torque/Maui, ssh access was allowed to nodes that had one of your jobs allocated to them. This does not function in the same way with SGE. SGE provides the utility qrsh to manage this sort of thing. Earlier in this document I stated that qrsh is used for interactive jobs. While this is true, it also allows access to currently running jobs, provided you have given it the correct options. Here is an example:
#!/bin/bash
# Let's say I am running a script that needs two threads, each on one node.
# Since I already know that I will have 2 nodes, I need to figure out the name of the other host I have access to
OTHERHOST="`cat $PE_HOSTFILE | awk '{print $1}' | grep -v $HOSTNAME`"
# Now I am going to fire up a server on the other host
qrsh -inherit $OTHERHOST $SERVERAPPS &
# Now I continue on with my script, confident the server is running on the other host.
$CLIENTAPPS
The qrsh construct will work directly from loki, provided you have set the correct environment variables and the job you are connecting to is a PE job. The relevant environment variables are JOB_ID and SGE_TASK_ID. JOB_ID is simply the job number. SGE_TASK_ID can be anything, and it is commonly set to "undefined." This can be done in one simple command:
mozes@loki ~ $ JOB_ID=161 SGE_TASK_ID=undefined qrsh -inherit scout74.beocat env
MANPATH=/opt/sge/man:/opt/sge/man:/opt/sge/man:/usr/share/man:/usr/local/share/man
SGE_INFOTEXT_MAX_COLUMN=5000
SHELL=/usr/local/bin/tcsh
SSH_CLIENT=10.0.0.6 38240 22
SGE_CELL=default
USER=mozes
PATH=/tmp/161.1.batch.q:/usr/local/bin:/bin:/usr/bin
MAIL=/var/mail/root
PWD=/homes/mozes
SGE_ROOT=/opt/sge
SGE_NOMSG=1
HOME=/homes/mozes
SHLVL=5
LOGNAME=mozes
SSH_CONNECTION=10.0.0.6 38240 10.0.0.154 22
SGE_CLUSTER_NAME=beocat
_=/opt/sge/bin/lx24-amd64/sge_execd
QRSH_PORT=loki.beocat:36473
QRSH_COMMAND=env
REQNAME=petask
JOB_NAME=petask
JOB_SCRIPT=QRSH
SGE_BINARY_PATH=/opt/sge/bin/lx24-amd64
REQUEST=petask
HOSTNAME=scout74.beocat
QUEUE=batch.q
JOB_ID=161
ENVIRONMENT=BATCH
ARC=lx24-amd64
NQUEUES=1
NSLOTS=1
NHOSTS=1
RESTARTED=0
TMPDIR=/tmp/161.1.batch.q
TMP=/tmp/161.1.batch.q
PE=mpi-spread
PE_HOSTFILE=/opt/sge/default/spool/scout74/active_jobs/161.1/2.scout74/pe_hostfile
SGE_RSH_COMMAND=builtin
SGE_O_HOME=/homes/mozes
SGE_O_LOGNAME=mozes
SGE_O_PATH=/opt/sge/bin/lx24-amd64:/homes/mozes/bin:/homes/mozes/.homefiles/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.2:/opt/intel/fce/10.1.008/bin:/opt/vmware/server/bin
SGE_O_SHELL=/bin/bash
SGE_O_MAIL=/var/mail/mozes
SGE_O_HOST=loki
SGE_O_WORKDIR=/homes/mozes
SGE_TASK_ID=undefined
SGE_TASK_FIRST=undefined
SGE_TASK_LAST=undefined
SGE_TASK_STEPSIZE=undefined
SGE_ARCH=lx24-amd64
SGE_ACCOUNT=sge
SGE_JOB_SPOOL_DIR=/opt/sge/default/spool/scout74/active_jobs/161.1/2.scout74
SGE_STDIN_PATH=/dev/null
SGE_STDOUT_PATH=/dev/null
SGE_STDERR_PATH=/dev/null
SGE_CWD_PATH=/homes/mozes
HOSTTYPE=x86_64-linux
VENDOR=unknown
OSTYPE=linux
MACHTYPE=x86_64
GROUP=mozes_users
HOST=scout74
REMOTEHOST=
PE Hostfile syntax
- Some of you will need to know what hosts, and what processors on those hosts, a PE job has access to. In the job an environment variable called PE_HOSTFILE gets set, pointing to a file with the following syntax:
hostname.domainname #ofprocessors queue@hostname.domainname <NULL>
Sample:
scout41.beocat 1 batch.q@scout41.beocat <NULL>
scout42.beocat 1 batch.q@scout42.beocat <NULL>
scout43.beocat 1 batch.q@scout43.beocat <NULL>
scout44.beocat 1 batch.q@scout44.beocat <NULL>
titan8.beocat 1 batch.q@titan8.beocat <NULL>
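Given that layout, the first two columns are usually all a job script needs. Here is a small sketch that lists each host with its processor count and adds up the total; list_pe_hosts is a made-up helper name, and it assumes the four-column format shown above.

```shell
# list_pe_hosts FILE
# Hypothetical helper, not an SGE command: prints "host processors" for
# each line of a PE hostfile, followed by a "total N" line.
list_pe_hosts() {
    awk '{ print $1, $2; total += $2 } END { print "total", total + 0 }' "$1"
}

# Inside a PE job, SGE sets PE_HOSTFILE; outside of a job this does nothing.
if [ -n "${PE_HOSTFILE:-}" ] && [ -f "$PE_HOSTFILE" ]; then
    list_pe_hosts "$PE_HOSTFILE"
fi
```

For a single-node or fully filled job the total should match NSLOTS, which is a quick sanity check before starting any helper processes.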
Memory Request Table
- Here is a table of some common memory-core combinations. Each column is the total memory needed by the job, each row is the number of cores, and each cell is the per-core memory to request:
Memory/Cores  512M   1G     2G     3G      4G      8G      16G     32G      64G
1 Core        512M   1G     2G     3G      4G      8G      16G     32G      64G
2 Cores       256M   512M   1G     1536M   2G      4G      8G      16G      32G
3 Cores       171M   342M   683M   1G      1350M   2700M   5400M   10800M   21600M
4 Cores       128M   256M   512M   768M    1G      2G      4G      8G       16G
8 Cores       64M    128M   256M   384M    512M    1G      2G      4G       8G
16 Cores      32M    64M    128M   192M    256M    512M    1G      2G       4G
32 Cores      16M    32M    64M    96M     128M    256M    512M    1G       2G
64 Cores      8M     16M    32M    48M     64M     128M    256M    512M     1G
The math to get these numbers is simple:
Memory Needed per job
---------------------
   Number of Cores
Simple conversions between Megabyte and Gigabyte are done like so:
1 Terabyte    1024 Gigabytes
1 Gigabyte    1024 Megabytes
1 Megabyte    1024 Kilobytes
1 Kilobyte    1024 Bytes
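The unit conversions can also be scripted, which is handy when a total is given in gigabytes and you want to do the per-core division in megabytes. This is a minimal sketch: to_mb is a made-up helper name, and it only understands whole numbers with an M, G, or T suffix.

```shell
# to_mb VALUE
# Hypothetical helper, not an SGE command: converts a whole-number value
# with an M, G, or T suffix (for example 4G) to megabytes, using the
# factor of 1024 from the conversion list above.
to_mb() {
    local value=$1
    local number=${value%[MGT]}
    case "$value" in
        *T) echo $(( number * 1024 * 1024 )) ;;
        *G) echo $(( number * 1024 )) ;;
        *M) echo "$number" ;;
        *)  echo "unsupported unit: $value" >&2; return 1 ;;
    esac
}

# 4G expressed in megabytes
to_mb 4G
```

Dividing the result by the number of cores gives the exact per-core figure in megabytes; some of the 3-core entries in the table are rounded to simpler numbers.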