High Performance Computing
Welcome to the official documentation of the Scientific Compute Cluster (SCC), the high performance computing system operated by the GWDG for both the Max Planck Society and the University of Göttingen.
This documentation will give you the necessary information to get access to the system, find the right software or compile your own, and run calculations.
Latest News
- [hpc-announce] GWDG HPC Downtime June 12th until 14th (2023/06/09 12:29)
- [hpc-announce] You are invited to our GöHPCoffee on upcoming wednesday (2023/06/06 10:37)
- [hpc-announce] GWDG HPC Downtime June 12th until 14th (2023/05/30 12:30)
- [hpc-announce] Status of Bioinformatics Services: Update (2023/05/30 11:51)
- [hpc-announce] Bioinformatics Services: Update (2023/05/26 19:42)
An archive of all news items can be found at the HPC-announce mailing list.
Accessing the system
To use the compute cluster, you need a full GWDG account. Most employees of the University of Göttingen and the Max Planck Institutes already have such an account. This account is not activated for the use of the compute resources by default. More information on how to get your account activated or how to get an account can be found here.
Once your account is activated, you can log in to the frontend nodes, e.g. login-mdc.hpc.gwdg.de. These nodes are only accessible via ssh from the GÖNET. If you come from the internet, you need to either use a VPN or use our login server. You can find detailed instructions here.
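For example, a connection from within the GÖNET could look like this (the username jdoe is only a placeholder for your own GWDG account name):

$ ssh jdoe@login-mdc.hpc.gwdg.de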
Submitting jobs
Our compute cluster is divided into frontends and compute nodes. The frontends are meant for editing, compiling, and interacting with the batch system. Please do not use them for intensive testing, i.e. calculations longer than a few minutes. All users share resources on the frontends and will be impaired in their daily work if you overuse them.
To run a program on one (or more) of the compute nodes, you need to interact with our batch system, or scheduler, Slurm. You can do this with several different commands, such as srun, sbatch, and squeue. A very simple example for such an interaction would be this:
$ srun hostname
dmp023
This runs the program hostname on one of our compute nodes. However, the program would only get access to a single core and very little memory. That is not a problem for the hostname program, but if you want to calculate something more serious, you will need access to more resources. You can find out how to do that in our Slurm documentation.
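As a rough sketch of what a batch job requesting more resources might look like (the partition, the resource values, the binary name gmx_mpi, and the input file topol.tpr are only illustrative assumptions; the actual limits and recommended settings are in the Slurm documentation), you could submit a script like the following with sbatch jobscript.sh:

#!/bin/bash
#SBATCH --partition=medium          # partition to run in (illustrative choice)
#SBATCH --nodes=1                   # number of nodes
#SBATCH --ntasks-per-node=16        # MPI ranks per node
#SBATCH --mem=32G                   # memory per node (assumed value)
#SBATCH --time=02:00:00             # wall-clock limit

module load gromacs                 # load the software you need
srun gmx_mpi mdrun -s topol.tpr     # start the parallel program through Slurm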
Software
We provide a growing number of programs, libraries, and software on our system. These are available as modules. You can find a list with the module avail command and load them via module load. For example, if you want to run GROMACS, you simply use module load gromacs to get the most recent version. Additionally, we use a package management tool called Spack to install software. A guide on how to use modules and Spack is available here.
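A typical interactive session could look like this (the module names and versions listed by module avail depend on the currently installed software stack):

$ module avail gromacs    # list the available GROMACS modules
$ module load gromacs     # load the default (most recent) version
$ module list             # show the modules currently loaded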
We provide different compilers and libraries if you want to compile your software on your own. As with the rest of the software, these are available as modules. These include gcc, intel, and nvhpc as compilers, openmpi and intel-oneapi-mpi as MPI libraries, and others such as mpi4py, fftw, and hdf5. You can find more specific instructions on code compilation on our dedicated page.
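As a brief sketch (the exact module names and versions depend on the installed stack, and hello_mpi.c is just a placeholder for your own source file), compiling and running a simple MPI program could look like this:

$ module load gcc openmpi           # compiler and MPI library as modules
$ mpicc -O2 -o hello hello_mpi.c    # mpicc wraps gcc with the required MPI flags
$ srun -n 4 ./hello                 # run the result with 4 MPI ranks via Slurm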
Performance Engineering and Analysis
Performance engineering, analysis, and optimization are essential for HPC applications, especially considering the substantial resources spent on assembling and operating large computing systems with complex microprocessors (X-PUs) and memory architectures.
Performance analysis and optimization of HPC applications involve three main steps: application instrumentation, run-time measurement of key events, and visual analysis of profiles and event traces.
The performance tools currently available on our clusters are LIKWID, Score-P, Vampir, and Scalasca for CPUs, and the Nsight tool suite (Systems, Compute, and Graphics) for NVIDIA GPUs. The tools and how to use them on the cluster are documented on the Performance Tools page.
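As a minimal sketch of that workflow using Score-P (the module name scorep is an assumption and app.c stands in for your own source code; the actual module names and recommended settings are on the Performance Tools page):

$ module load scorep                # load the instrumentation tool (module name assumed)
$ scorep mpicc -O2 -o app app.c     # Score-P prefixes the compile/link command to instrument the code
$ srun -n 4 ./app                   # the instrumented run writes a scorep-* directory with profile data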
A short note on naming
The frontends and transfer nodes also have descriptive names of the form $func-$site.hpc.gwdg.de based on their primary function and site, where $func is either login or transfer, while $site is mdc (modular data center, access to scratch). For example, to reach any login node at the MDC site, you would connect to login-mdc.hpc.gwdg.de.
Hardware Overview
This documentation is valid for the following hardware:
Nodes | # | CPU | GPU | Cores (sockets ✕ cores per socket) | Frequency | Memory | Interconnect | Partition | Launched |
---|---|---|---|---|---|---|---|---|---|
gwde001 | 1 | Haswell Intel E7-4809 v3 | none | 4✕8 | 2.0 GHz | 2 TB | OPA | fat+ | 2016-01 |
dsu[001-004] | 4 | Haswell Intel E5-4620 v3 | none | 4✕10 | 2.0 GHz | 1.5 TB | none | fat+ | 2016-08 |
dge[001-003] | 3 | Broadwell Intel E5-2630 v4 | NVidia GTX 1080 | 2✕10 | 2.2 GHz | 128 GB | none | gpu | 2017-06 |
dge[008-009] | 2 | Broadwell Intel E5-2630 v4 | NVidia GTX 980 | 2✕10 | 2.2 GHz | 128 GB | none | gpu-int | 2017-06 |
dfa[001-012] | 12 | Broadwell Xeon E5-2650 v4 | none | 2✕12 | 2.2 GHz | 512 GB | none | fat | 2016-08 |
amp[001-094] | 94 | Cascade Lake Intel Platinum 9242 | none | 2✕48 | 2.3 GHz | 384 GB | OPA | medium | 2020-11 |
amp[095-096] | 2 | Cascade Lake Intel Platinum 9242 | none | 2✕48 | 2.3 GHz | 384 GB | OPA | int | 2020-11 |
agq[001-012] | 12 | Cascade Lake Intel Gold 6242 | NVidia Quadro RTX5000 | 2✕16 | 2.8 GHz | 192 GB | OPA | gpu | 2020-11 |
agt[001-002] | 2 | Cascade Lake Intel Gold 6252 | NVidia Tesla V100 / 32G | 2✕24 | 2.1 GHz | 384 GB | OPA | gpu | 2020-11 |
Explanation: Systems marked with an asterisk (*) are only available for research groups participating in the corresponding hosting agreement. GB = Gigabyte, TB = Terabyte, GHz = Gigahertz, IB = InfiniBand, OPA = Omni-Path Architecture.
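To see how these partitions are currently configured and which nodes they contain, you can also query Slurm directly (the output columns chosen here are only an example):

$ sinfo -o "%P %D %c %m"    # partition, node count, CPUs per node, and memory per node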
For a complete overview of the hardware located in Göttingen, see https://www.gwdg.de/web/guest/hpc-on-campus/scc