Number of Nodes, Tasks and Cores
Choosing the right number of nodes, tasks, and cores can be a bit confusing at first, so we try to give some guidelines here.
There are two ways to get more cores, and therefore more compute power, for your job: either you request many tasks with one core each, or you request one task with many cores. These options are controlled with the
--ntasks-per-node parameter for tasks and
-c for cores per task in Slurm. So what's the difference between tasks and cores?
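As a sketch, the two request styles look like this in a batch script (the second style is shown commented out; pick one):

```shell
#!/bin/bash
# Style 1: many tasks with one core each (typical for MPI programs)
#SBATCH --ntasks-per-node=24
#SBATCH -c 1

# Style 2: one task with many cores (typical for shared-memory programs)
##SBATCH --ntasks-per-node=1
##SBATCH -c 24
```

Both styles give the job 24 cores in total; the difference is in how many times Slurm will launch your program, as explained below.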
Tasks are mainly a tool for MPI jobs. If you allocate many tasks, Slurm expects you to start your program many times in parallel with
mpirun. If you have 10 tasks available and use
srun to start your program, your program will be executed 10 times in parallel. This is good for jobs that rely on MPI for communication. These usually use many tasks with one core each.
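A minimal MPI batch script for this case might look like the following (`./my_mpi_program` is a placeholder for your own executable):

```shell
#!/bin/bash
#SBATCH --ntasks-per-node=10   # 10 tasks...
#SBATCH -c 1                   # ...with one core each

# srun launches one copy of the program per task, so
# my_mpi_program is started 10 times in parallel and the
# copies communicate with each other via MPI.
srun ./my_mpi_program
```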
Many programs, however, do not use MPI. These use shared memory parallelization (SMP), can only run on a single node, and only need to be started once. If, for example, you use Python and the multiprocessing library to parallelize your calculations, you only want your program to be started once, but still have access to multiple cores. In this case you want a single task with access to multiple cores.
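A batch script for such a shared-memory job could look like this (a sketch; `my_script.py` and its `--workers` option are placeholders for your own program):

```shell
#!/bin/bash
#SBATCH --ntasks-per-node=1   # start the program only once...
#SBATCH -c 8                  # ...but give it access to 8 cores

# The program is launched a single time and handles its own
# parallelism, e.g. via Python's multiprocessing library.
# Slurm sets SLURM_CPUS_PER_TASK to the value of -c, so the
# program can read it to know how many cores it may use.
python my_script.py --workers "$SLURM_CPUS_PER_TASK"
```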
MPI Jobs and the Number of Nodes
While MPI tasks work fine communicating via the network/interconnect, nothing beats the speed of shared memory communication. MPI defaults to using shared memory where possible and only uses the interconnect for communication between nodes. That means you should pack your tasks as tightly onto nodes as possible. This not only speeds up your program, but also reduces the load on our network.
Our smallest medium nodes have 24 cores. This means that up to 24 tasks will always fit on a single node, 48 tasks will fit on two nodes, and so on. You should request the correct number of nodes using the
-N option. But this is only the smallest common denominator. We also have the amp nodes in our medium partition, which have 96 cores. So you can even place 96 tasks onto a single node, thereby eliminating any network communication within MPI and speeding up your job. Only when going above 96 tasks is it strictly necessary to spread your tasks among more than one node.
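For example, a 96-task MPI job can be packed onto a single amp node like this (a sketch; the partition name follows the text above, `./my_mpi_program` is a placeholder):

```shell
#!/bin/bash
#SBATCH -p medium              # partition containing the 96-core amp nodes
#SBATCH -N 1                   # one node is enough...
#SBATCH --ntasks-per-node=96   # ...for all 96 tasks on a 96-core node

# All 96 MPI ranks run on the same node, so their
# communication stays in shared memory.
srun ./my_mpi_program
```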