The GWDG Scientific Compute Cluster
This scheme shows the basic cluster setup at GWDG. The cluster is distributed across two facilities, with the “ehemalige Fernmeldezentrale” facility hosting the older resources and the shared /scratch file system, and the “Faßberg” facility hosting the latest resources and the shared /scratch2 file system. The shared /scratch2 is usually the best choice for temporary data in your jobs, but it is only available at the Faßberg resources (selectable with ''-R scratch2''). The scheme also shows the queues and resources by which nodes are selected using the ''-q'' and ''-R'' options of ''bsub''.
''bsub'': Specifying node properties with ''-R''
''-R scratch[2]''
The node must have access to shared ''/scratch'' or ''/scratch2''.
''-R work''
The node must have access to one of the shared ''/work'' directories.
''-R "ncpus=<x>"''
Choose only nodes with a job slot count of ''x''. This is useful with ''span[ptile=<x>]''.
''-R big''
Choose the nodes with the maximum memory per core available in the queue. Currently this only distinguishes ''gwdaxxx'' from ''gwdpxxx'' nodes.
''-R latest''
Always use the latest (and usually most powerful) nodes available in the queue. To get a list of the current latest nodes, run the command ''bhosts -R latest'' on one of the frontends. You can also check the Latest Nodes page for more information.
''bsub'': Using Job Scripts
A job script is a shell script with a special comment section: in each line beginning with ''#BSUB'', the following text is interpreted as ''bsub'' options. Here is an example:
#!/bin/sh
#BSUB -q mpi
#BSUB -W 00:10
#BSUB -o out.%J
/bin/hostname
Job scripts are submitted by the following command:
bsub < <script name>
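Because the ''#BSUB'' lines are ordinary shell comments, the file remains a valid shell script; only ''bsub'' interprets them. A quick sketch to illustrate this (the file name ''jobscript.sh'' is arbitrary):

```shell
# Write the example job script from above to a file. The #BSUB lines are
# plain comments to the shell; bsub reads them as submission options.
cat > jobscript.sh <<'EOF'
#!/bin/sh
#BSUB -q mpi
#BSUB -W 00:10
#BSUB -o out.%J
/bin/hostname
EOF

# Show the options bsub would pick up from the comment section:
grep '^#BSUB' jobscript.sh
```

Submitting the file with ''bsub < jobscript.sh'' then applies exactly those options, as if they had been given on the command line.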
Exclusive jobs
An exclusive job uses all of its allocated nodes exclusively, i.e., it never shares a node with another job. This is useful, for example, if you require all of a node's memory (but not all of its CPU cores), or for SMP/MPI hybrid jobs.
To submit an exclusive job, add ''-x'' to your ''bsub'' options. For example, to submit a single-task job which uses a complete fat node with 256 GB of memory, you could use:
bsub -x -q fat -R big ./mytask
(-R big requests a 256 GB node, excluding the 128 GB nodes in the fat queue)
For submitting an OpenMP/MPI hybrid job with a total of 8 MPI processes, spread evenly across 2 nodes, use:
export OMP_NUM_THREADS=4
bsub -x -q mpi -n 8 -R span[ptile=4] -a intelmpi mpirun.lsf ./hybrid_job
(each MPI process creates 4 OpenMP threads in this case).
Please note that fairshare evaluation and accounting are done based on the number of job slots allocated. So the first example would count as 64 slots for both fairshare and accounting.
Using exclusive jobs does not require reserving all of a node's slots explicitly (e.g., with ''span[ptile='!']'') and subsequently using the MPI library's ''mpiexec'' or ''mpiexec.hydra'' to set the process number, as we explain in our introductory course. This makes submitting a hybrid job as an exclusive job more straightforward.
However, there is a disadvantage: LSF will not reserve the additional job slots required to get a node exclusively. Therefore, when the cluster is very busy, an exclusive job needing a lot of nodes may wait significantly longer.
A Note On Job Memory Usage
LSF will try to fill up each node with processes up to its job slot limit. Therefore, no process in your job may use more memory than is available per core! If your per-core memory requirements are too high, you have to request more job slots in order to allow your job to use their memory as well. If your job's memory usage increases with the number of processes, you have to leave additional job slots empty, i.e., not run processes on them.
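As a worked example of this rule (the numbers are illustrative assumptions, not actual queue specifications): on a node with 256 GB of memory and 64 job slots, each slot corresponds to 4 GB, so a process needing 16 GB should be given 4 slots:

```shell
# Worked example: how many job slots to request per process.
# All numbers are illustrative assumptions, not actual queue limits.
NODE_MEM_GB=256     # total memory of the node
NODE_SLOTS=64       # job slot limit of the node
PER_PROC_GB=16      # memory one of your processes needs

MEM_PER_SLOT_GB=$((NODE_MEM_GB / NODE_SLOTS))        # memory behind one slot
SLOTS_PER_PROC=$((PER_PROC_GB / MEM_PER_SLOT_GB))    # slots to request per process
echo "${MEM_PER_SLOT_GB} GB per slot, request ${SLOTS_PER_PROC} slots per process"
```

Requesting ''-n'' as processes × slots-per-process and leaving the extra slots empty then reserves the memory behind those slots for your job.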
Recipe: Reserving Memory for OpenMP
The following job script recipe demonstrates using empty job slots for reserving memory for OpenMP jobs:
#!/bin/sh
#BSUB -q fat
#BSUB -W 00:10
#BSUB -o out.%J
#BSUB -n 64
#BSUB -R big
#BSUB -R "span[hosts=1]"
export OMP_NUM_THREADS=8
./myopenmpprog
Disk Space Options
You have the following options for attributing disk space to your jobs:
/local
This is the local hard disk of the node. It is a fast option for storing temporary data - and in the case of the ''gwda'', ''gwdd'', ''dfa'', ''dge'', ''dmp'', ''dsu'', and ''dte'' nodes, an SSD-based and therefore very fast one. There is automatic file deletion for the local disks.
/scratch
This is the shared scratch space, available on the ''gwda'' and ''gwdd'' nodes and on the frontends ''gwdu101'' and ''gwdu102''. You can use ''-R scratch'' to make sure you get a node with access to shared ''/scratch''. It is very fast. There is no automatic file deletion, but also no backup! We may have to delete files manually when we run out of space; you will receive a warning before this happens.
/scratch2
This space is the same as ''/scratch'' described above, except it is ONLY available on the ''dfa'', ''dge'', ''dmp'', ''dsu'', and ''dte'' nodes and on the frontend ''gwdu103''. You can use ''-R scratch2'' to make sure you get a node with access to that space.
$HOME
Your home directory is available everywhere, permanent, and comes with backup. Your attributed disk space can be increased. It is comparatively slow, however.
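For the node-local disk, a unique per-job directory keeps your temporary files separate from other jobs. A minimal job script sketch (the ''hostname'' call stands in for your real application; the ''/tmp'' fallback is only added here so the pattern can also be tried outside the cluster, where ''/local'' does not exist):

```shell
#!/bin/sh
#BSUB -q mpi
#BSUB -W 01:00
#BSUB -o out.%J

# Pick the node-local disk; fall back to /tmp so this sketch also runs
# off-cluster (on the compute nodes this is simply /local).
LOCAL_BASE=/local
[ -d "$LOCAL_BASE" ] || LOCAL_BASE=/tmp

# Create a unique per-job directory; LSB_JOBID is set by LSF inside a job.
MYLOCAL=$(mktemp -d "${LOCAL_BASE}/${USER:-$(id -un)}.job${LSB_JOBID:-0}.XXXXXXXX")

/bin/hostname > "$MYLOCAL/hostname.txt"   # stand-in for your real application

rm -rf "$MYLOCAL"                          # clean up before the job ends
```

Cleaning up at the end of the script is good practice even though ''/local'' has automatic file deletion.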
Recipe: Using ''/scratch''
This recipe shows how to run Gaussian09 using ''/scratch'' for temporary files:
#!/bin/sh
#BSUB -q fat
#BSUB -n 64
#BSUB -R "span[hosts=1]"
#BSUB -R scratch
#BSUB -W 24:00
#BSUB -C 0
#BSUB -a openmp

export g09root="/usr/product/gaussian"
. $g09root/g09/bsd/g09.profile

mkdir -p /scratch/${USER}
MYSCRATCH=`mktemp -d /scratch/${USER}/g09.XXXXXXXX`
export GAUSS_SCRDIR=${MYSCRATCH}

g09 myjob.com myjob.log

rm -rf $MYSCRATCH
Using ''/scratch2''
Currently the latest nodes do NOT have access to ''/scratch''. They only have access to the shared ''/scratch2''.
If you use scratch space only for storing temporary data, and do not need to access data stored previously, you can request /scratch or /scratch2:
#BSUB -R "scratch||scratch2"
In that case ''/scratch2'' is linked to ''/scratch'' on the latest nodes. You can simply use ''/scratch/${USERID}'' for the temporary data (don't forget to create it on ''/scratch2''). On the latest nodes the data will then be stored in ''/scratch2'' via the mentioned symlink.
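Putting this together, a job script sketch that works on both node generations could look as follows (the ''echo'' line stands in for your real application; the ''/tmp'' fallback is an assumption added here so the pattern can also be tried outside the cluster):

```shell
#!/bin/sh
#BSUB -q mpi
#BSUB -W 01:00
#BSUB -o out.%J
#BSUB -R "scratch||scratch2"

# On the latest nodes /scratch is a symlink to /scratch2, so the /scratch
# path works on both node generations. Fall back to /tmp off-cluster.
SCRATCH_BASE=/scratch
[ -d "$SCRATCH_BASE" ] || SCRATCH_BASE=/tmp

# Per-user directory plus a unique per-job subdirectory for temporary data.
mkdir -p "${SCRATCH_BASE}/${USER:-$(id -un)}"
MYSCRATCH=$(mktemp -d "${SCRATCH_BASE}/${USER:-$(id -un)}/job.XXXXXXXX")

echo "temporary data" > "$MYSCRATCH/data.txt"   # stand-in for your application

rm -rf "$MYSCRATCH"   # scratch has no automatic deletion, so clean up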
Miscellaneous LSF Commands
While ''bsub'' is arguably the most important LSF command, you may also find the following commands useful:
bjobs
Lists current jobs. Useful options are: -p, -l, -a, <jobid>, -u all, -q <queue>, -m <host>.
bhist
Lists older jobs. Useful options are: -l, -n, <jobid>.
lsload
Status of cluster nodes. Useful options are: -l, <hostname>.
bqueues
Status of batch queues. Useful options are: -l, <queue>.
bhpart
Why do I have to wait? ''bhpart'' shows current user priorities. Useful options are: -r, <host partition>.
bkill
The Final Command. It has two use modes:
- ''bkill <jobid>'': kills the job with the given job id.
- ''bkill <selection options> 0'': kills all jobs matching the selection options. Useful selection options are: -q <queue>, -m <host>.
Have a look at the respective man pages of these commands to learn more about them!
Getting Help
The following sections show you where you can get status information and where you can get support in case of problems.
Information sources
- Cluster status page
- HPC announce mailing list
Using the GWDG Support Ticket System
Write an email to hpc@gwdg.de. In the body:
- State that your question is related to the batch system.
- State your user id (''$USER'').
- If you have a problem with your jobs, please always send the complete standard output and error!
- If you have a lot of failed jobs, send at least two outputs. You can also list the job ids of all failed jobs to help us understand your problem even better.
- If you don’t mind us looking at your files, please state this in your request. You can limit your permission to specific directories or files.