Jupyter-Hub on the SCC

GWDG also offers Jupyter-Hub on the HPC system as a beta service for Python users.

Note that to use Jupyter-Hub on the Scientific Compute Cluster (SCC) you need a GWDG account that is activated for use of the compute cluster. Information on how to activate your account can be found here.

How to use Jupyter-Hub on the SCC

Jupyter-Hub on the SCC can be used in the same way as the regular Jupyter / Jupyter-Hub service and currently supports Python and R kernels. After successful authentication, three options for spawning Jupyter notebooks are available:

  1. GWDG HPC
  2. GWDG HPC with IPython Parallel
  3. GWDG HPC with Own Container

If you just need a Jupyter notebook, select the first option. If you want to use IPython Parallel, select the second option. If you have your own Singularity container and want to use the notebook from that container, select the third option.

IPython Parallel allows you to increase computational resources by spawning compute workers on any nodes of the HPC cluster (not only in the interactive queue).

A Jupyter notebook and IPython Parallel workers run as normal jobs in the cluster.

There are several options you can set on the spawning page to adjust the resources, such as the number of cores and the amount of memory.

Options

Job profile: this option selects which notebook you want to use: the normal one, the one with IPython Parallel, or your own container.

Singularity container location: if you selected your own container, you have to provide the full path (optionally using the $HOME variable) to the container you want to spawn. More on this further down in the documentation.

Duration: the duration of the job. Note that after this time the job will be killed and you will have to spawn the notebook again.

Number of cores: this option sets the number of cores accessible to the notebook. Note that the cores are not reserved for exclusive use and may be shared if more notebooks are spawned than there are available resources.

Amount of memory: this option sets the amount of memory accessible to the notebook. As with the number of cores, memory may be shared if there are more notebooks than resources.

Notebook's Home directory: you can provide a custom location as the default home directory of the notebook. This path will be opened by default when the notebook is spawned.
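You can verify from inside a running notebook which resources the session can actually use. A minimal sketch, assuming a Linux host (`os.sched_getaffinity` is Linux-only, and the memory figure is the node total, not your job's share):

```python
import os

# CPU cores the notebook process is allowed to run on (Linux-only call)
usable_cores = len(os.sched_getaffinity(0))

# Total physical memory of the node - note this is the node total,
# not the share granted to your job
total_mem_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3

print(f"Usable cores: {usable_cores}")
print(f"Node memory:  {total_mem_gb:.1f} GB")
```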

Resources

Jupyter notebooks in Jupyter-Hub on HPC service are launched in the interactive queue of the High Performance Computing Cluster.

This means that about 24 CPUs and about 128 GB of memory per node in the interactive queue are available for Jupyter notebooks and are shared between all users who simultaneously use the same node. Currently there are 4 nodes in the interactive queue, and more nodes will be added in case of high demand.

You can also use the shared /scratch (/scratch2) storage, which allows you to store large files (terabytes), as well as your HOME directory.
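From a notebook cell you can check how much free space these locations offer. A small sketch; the /scratch paths exist only on the cluster nodes, so nonexistent paths are simply skipped:

```python
import os
import shutil

# Print free space for the shared scratch filesystems and for $HOME;
# paths that do not exist on the current machine are skipped
for path in ("/scratch", "/scratch2", os.path.expanduser("~")):
    if os.path.isdir(path):
        free_tb = shutil.disk_usage(path).free / 1024**4
        print(f"{path}: {free_tb:.2f} TiB free")
```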

By default, Jupyter notebooks do not start in the root of your HOME directory but in the folder ~/jupyterhub-gwdg. Place your notebooks and files in that folder, or set the home directory option described above.
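If the folder does not exist yet, you can create it from a terminal or from a notebook cell, for example:

```python
from pathlib import Path

# Default start folder of notebooks spawned by Jupyter-Hub on the SCC
nb_home = Path.home() / "jupyterhub-gwdg"
nb_home.mkdir(exist_ok=True)
print(nb_home)
```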

Currently one Jupyter notebook session can run for a maximum of 8 hours; after that it will be killed, but your files will remain intact.

Using IPython Parallel

In order to make use of IPython Parallel, Jupyter should be started with the GWDG HPC with IPython Parallel spawner.

After the Jupyter notebook is launched, you can start engines using the “IPython Clusters” tab of the web interface. There, select the number of engines to run for the slurm profile and click the start button.

Note that the workers start as normal jobs in the medium partition, which may take some time. The GUI does not offer any way to check the state of the workers, so please wait until the engines are spawned. You can always check the current state of the jobs with the command squeue -u $USER, run in a terminal.

Once the engines are up, the spawned cluster of workers can be checked with the following script:

import ipyparallel as ipp

# Connect to the controller of the "slurm" profile
c = ipp.Client(profile="slurm")
# List the IDs of the engines that have registered so far
c.ids
# Run a trivial function on every engine to verify they respond
c[:].apply_sync(lambda: "Hello, World")

The workers are currently configured to run for a maximum of 1 hour. If you want to change that, you can edit the submission scripts of the workers in ~/.ipython/profile_slurm/ipcluster_config.py
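For example, the time limit of the engine jobs can be raised in that file. A sketch; the exact launcher class and trait names depend on your ipyparallel version, so treat this as an assumption to check against the generated config:

```python
# Excerpt of ~/.ipython/profile_slurm/ipcluster_config.py
c = get_config()  # provided by IPython when the config file is loaded

# Assumption: the profile uses the Slurm launchers; request a 2-hour
# wall time for the engine batch jobs instead of the default 1 hour
c.SlurmEngineSetLauncher.timelimit = "2:00:00"
```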

Installing additional Python modules

Additional Python modules can be installed via the terminal and the Python package manager “pip”. To do this, a terminal must be opened via the menu “New” → “Terminal”.

By default, the Internet is not accessible from within the notebook. In order to install or download anything from the Internet, you need to use the proxy by exporting the following environment variables:

export https_proxy="https://www-cache.gwdg.de:3128/"
export http_proxy="http://www-cache.gwdg.de:3128/"

Afterwards

python3 -m pip install --user <module>

installs a new module in the home directory.
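The same proxy settings can also be applied from within a notebook cell, so that subsequent package installations (for example via %pip) inherit them:

```python
import os

# Mirror the shell proxy configuration in the notebook's environment;
# child processes such as pip inherit these variables
os.environ["https_proxy"] = "https://www-cache.gwdg.de:3128/"
os.environ["http_proxy"] = "http://www-cache.gwdg.de:3128/"
print(os.environ["http_proxy"])
```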

The installation of large Python modules like “tensorflow” may fail with the message “No space left on device”. This is caused by the temporary space under “/tmp” being too small for pip to unpack and build the downloaded packages. The following steps use a temporary directory in the much larger user home directory for this one installation:

mkdir -v ~/.user-temp
TMPDIR=~/.user-temp python3 -m pip install --user <module>
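The same redirection works from within Python: the tempfile module honours TMPDIR, so temporary files created by the process end up in the larger directory. A sketch:

```python
import os
import tempfile

# Create the larger temp directory in $HOME and point TMPDIR at it
user_tmp = os.path.expanduser("~/.user-temp")
os.makedirs(user_tmp, exist_ok=True)
os.environ["TMPDIR"] = user_tmp

# Discard tempfile's cached default so TMPDIR is read again
tempfile.tempdir = None
print(tempfile.gettempdir())
```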

You can also use self-defined kernels and install conda environments in the non-parallel notebook. Please refer to Installing additional environments via conda.

Running your own Singularity Container

You can build your own Singularity container with a Jupyter notebook and run it. This allows you not only to be independent of our Jupyter notebooks, but also to easily spawn the same notebook in your local environment for development or tests.

Here is an example of the Singularity container you might use for your own Jupyter notebook:

Bootstrap: docker
From: ubuntu:18.04
 
%post
 
apt-get -y update
apt-get -y install net-tools
apt-get -y install curl wget
 
# Install conda and check the md5 sum provided on the download site
export CONDA_DIR=/opt/conda
export PATH=$CONDA_DIR/bin:$PATH
export MINICONDA_VERSION=4.7.12.1
 
cd /tmp && \
wget --quiet https://repo.continuum.io/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh
echo "81c773ff87af5cfac79ab862942ab6b3 *Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh" | md5sum -c -
/bin/bash Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -f -b -p $CONDA_DIR
rm Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh
$CONDA_DIR/bin/conda config --system --prepend channels conda-forge
$CONDA_DIR/bin/conda config --system --set auto_update_conda false
$CONDA_DIR/bin/conda config --system --set show_channel_urls true
$CONDA_DIR/bin/conda install --quiet --yes conda="${MINICONDA_VERSION%.*}.*"
conda update --all --quiet --yes
conda clean -tipsy
 
conda install --quiet --yes \
    'notebook=6.0.3' \
    'jupyterhub=1.0.0' \
    'jupyterlab=2.1.5'
 
 
curl -sL https://deb.nodesource.com/setup_8.x | bash -
apt-get install -y nodejs
 
jupyter labextension install @jupyterlab/hub-extension
 
%environment
 
XDG_RUNTIME_DIR=""
PATH=/opt/conda/bin:${PATH}

You can extend that container as much as you want. The important thing is to make the jupyterhub-singleuser binary, which comes with the jupyterhub package, available in PATH (done by the last line). This binary is called when the notebook starts. It is also best to use the versions of notebook, jupyterhub and jupyterlab shown above, for compatibility with the Jupyter-Hub on the SCC.
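You can confirm this requirement from a shell inside the container with `which jupyterhub-singleuser`; an equivalent check in Python (a sketch that prints a warning instead of failing when the binary is missing):

```python
import shutil

# Look up the single-user server entry point on PATH
singleuser = shutil.which("jupyterhub-singleuser")
if singleuser:
    print(f"Found: {singleuser}")
else:
    print("jupyterhub-singleuser not found - check PATH in %environment")
```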

Unfortunately you cannot build containers on the SCC. Please install Singularity on your local machine, build the image, and then transfer it to the SCC.
