Jupyter-Hub on the SCC
Note that to use Jupyter-Hub on the Scientific Compute Cluster (SCC), you need a GWDG account that is activated for use of the compute cluster. If your GWDG account is not yet activated, you can request activation by sending an informal email to firstname.lastname@example.org.
How to use Jupyter-Hub on the SCC
Jupyter-Hub on the SCC can be used in the same way as any other Jupyter / Jupyter-Hub installation and currently supports R kernels. After successful authentication, two options for spawning Jupyter notebooks are available:
- GWDG HPC
- GWDG HPC with IPython Parallel
If you just need a Jupyter notebook, select the first option. If you want to use IPython Parallel, select the second option.
IPython Parallel lets you increase the available computational resources by spawning compute workers on any nodes of the HPC cluster (not in the interactive queue).
Jupyter notebooks and IPython Parallel workers run as normal jobs on the cluster, so you will receive an email with the output after the jobs have finished.
Jupyter notebooks themselves run on the nodes of the interactive queue: about 20 CPUs and about 64 GB of memory per node are available to notebooks and are shared among all users working on the same node at the same time. Currently there are 4 nodes in the interactive queue, and more nodes will be added if demand is high.
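Because the CPUs of an interactive node are shared, it can help to check how many CPUs your notebook process may actually use before sizing a thread or process pool. The following Python sketch assumes a Linux node, where os.sched_getaffinity reports the CPUs the process is allowed to run on; whether the SCC restricts notebooks in this way is an assumption, and the number says nothing about how busy those CPUs already are with other users' work.

```python
import os


def usable_cpus() -> int:
    """Number of CPUs this process is allowed to run on."""
    try:
        # Linux: respects the CPU affinity mask of the process
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to all CPUs of the machine
        return os.cpu_count() or 1


print("CPUs on this node:", os.cpu_count())
print("CPUs usable by this notebook:", usable_cpus())
```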
You can also use the shared /scratch storage, which is suitable for large files (terabytes), as well as your HOME directory.
Jupyter notebooks do not start in the root of your HOME directory but in the folder ~/jupyterhub-gwdg. Place your notebooks and files in this folder.
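Because the working directory is ~/jupyterhub-gwdg, relative paths in a notebook resolve against that folder rather than against HOME. A minimal sketch for reaching a file in the root of HOME follows; the helper home_file and the file name data.csv are hypothetical.

```python
from pathlib import Path

# Relative paths are resolved against the notebook's working directory,
# which here is expected to be ~/jupyterhub-gwdg rather than ~ itself.
print("Working directory:", Path.cwd())


def home_file(name: str) -> Path:
    """Hypothetical helper: absolute path to a file in the root of HOME."""
    return Path.home() / name


# e.g. open(home_file("data.csv")) instead of open("data.csv")
print(home_file("data.csv"))
```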
Currently, a single Jupyter notebook session can run for at most 8 hours; after that it is terminated, but your files remain intact.
Using IPython Parallel
To use IPython Parallel, start Jupyter with the “GWDG HPC with IPython Parallel” spawner option.
After the Jupyter notebook has launched, you can start engines from the “IPython Clusters” tab of the web interface. There, enter the number of engines for the slurm profile and click the start button.
Note that the workers start as normal jobs in the medium partition, so it may take some time until they are running. The GUI has no way to show the state of the workers, so please wait until the engines have been spawned. You can, however, always check the current state of the jobs by running
squeue -u $USER
in the terminal.
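If you prefer to check the worker jobs from a notebook instead of a terminal, a small Python wrapper around squeue can count jobs per state. This sketch uses squeue's -h (no header) and -o (output format) options with the %i (job ID) and %T (job state) fields; the helper names job_states and my_job_states are hypothetical, and the parsing assumes exactly those two fields per line.

```python
import getpass
import subprocess
from collections import Counter


def job_states(squeue_output: str) -> Counter:
    """Count jobs per state in output of `squeue -h -o "%i %T"`."""
    states = Counter()
    for line in squeue_output.splitlines():
        fields = line.split()
        if len(fields) == 2:  # expected form: "<jobid> <STATE>"
            states[fields[1]] += 1
    return states


def my_job_states() -> Counter:
    """Run squeue for the current user and count the jobs per state."""
    # -h drops the header line, -o selects job ID (%i) and full state name (%T)
    out = subprocess.run(
        ["squeue", "-u", getpass.getuser(), "-h", "-o", "%i %T"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return job_states(out)


# In a notebook cell on the cluster:
# print(my_job_states())
```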
After the engines are up, the spawned cluster of workers can be checked by the following script:
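import ipyparallel as ipp
# Connect to the engines started with the "slurm" profile
c = ipp.Client(profile="slurm")
# IDs of the engines that are currently registered
print(c.ids)
# Run the function once on every engine and collect the results
print(c[:].apply_sync(lambda: "Hello, World"))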
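Since the GUI does not report when the engines are ready, a notebook can instead poll the number of registered engines before submitting work. The helper wait_for_count below is hypothetical and accepts any zero-argument counting function, so it does not depend on a particular ipyparallel version; recent ipyparallel releases also provide Client.wait_for_engines, which may be preferable if the installed version has it. The engine count of 4 in the commented usage is only an example value.

```python
import time


def wait_for_count(count, n, timeout=600.0, interval=10.0):
    """Poll count() until it returns at least n, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        current = count()
        if current >= n:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {current} of {n} engines after {timeout} s")
        time.sleep(interval)


# Usage with the client from the check script:
# import ipyparallel as ipp
# c = ipp.Client(profile="slurm")
# wait_for_count(lambda: len(c.ids), n=4)
# print(c[:].map_sync(lambda x: x * x, range(8)))
```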
Workers are currently configured to run for at most 1 hour. To change this, you can edit the worker submission scripts in
Installing additional Python modules
Additional Python modules can be installed via the terminal and the Python package manager “pip”. To do this, a terminal must be opened via the menu “New” → “Terminal”. Afterwards
python3 -m pip install --user <module>
installs a new module in the home directory.
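To confirm where `pip install --user` puts modules and whether the notebook kernel can see a given module, a cell like the following can help. It uses only the standard library (importlib.util.find_spec and site.getusersitepackages); numpy is just an example name. Note that the python3 found in the terminal is not necessarily the same interpreter as the notebook kernel, and a kernel started before the user site directory existed may need a restart before it finds newly installed modules.

```python
import importlib.util
import site
import sys


def is_installed(module: str) -> bool:
    """True if `module` can be found by the current interpreter."""
    return importlib.util.find_spec(module) is not None


# Interpreter running this kernel, and where `pip install --user` targets for it
print("Kernel interpreter:", sys.executable)
print("User site-packages:", site.getusersitepackages())
print("numpy importable:", is_installed("numpy"))
```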
The installation of large Python modules like “tensorflow” may fail with the message “No space left on device”. This happens because the temporary space under “/tmp” is too small for pip to process the downloaded packages. The following steps use a temporary directory in the much larger user home directory for this one installation:
mkdir -v ~/.user-temp
TMPDIR=~/.user-temp python3 -m pip install --user <module>
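The same workaround can be driven from Python, which also makes it easy to compare the free space in the temporary directory with that in HOME. In this sketch, pip_install_user and free_gib are hypothetical helpers; the install uses sys.executable, the interpreter running the code, which from a notebook is the kernel's Python rather than whatever python3 the terminal finds. TMPDIR is set only for the pip subprocess, mirroring the one-off shell command.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def free_gib(path) -> float:
    """Free space on the file system containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def pip_install_user(module: str, tmpdir: Path, dry_run: bool = False):
    """Hypothetical helper: `pip install --user module` with TMPDIR set to tmpdir."""
    tmpdir.mkdir(parents=True, exist_ok=True)   # like `mkdir -v ~/.user-temp`
    env = dict(os.environ, TMPDIR=str(tmpdir))  # applies only to this subprocess
    cmd = [sys.executable, "-m", "pip", "install", "--user", module]
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env


# tempfile.gettempdir() honours TMPDIR, so this shows the directory pip would use
print(f"Free in {tempfile.gettempdir()}: {free_gib(tempfile.gettempdir()):.1f} GiB")
print(f"Free in HOME: {free_gib(Path.home()):.1f} GiB")
# pip_install_user("tensorflow", Path.home() / ".user-temp")
```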
You can also use self-defined kernels and install conda environments in the non-parallel notebook. Please refer to Installing additional environments via conda.