====== Singularity on the SCC ======

[[https://sylabs.io/singularity/|Singularity]] is a containerization system focused on scientific needs and designed to run on HPC resources. At GWDG, Singularity can be used by simply loading the corresponding module:

<code>
module load singularity
</code>

After the module is loaded you are ready to pull and run your containers. Unlike with Docker, you can provide your own container images. For building, you can use Docker images or Singularity bootstrap files. The documentation for the build process can be found at https://sylabs.io/docs/.

====== Examples ======

Several examples of Singularity use cases are shown below.

===== Jupyter and IPython Parallel with Singularity =====

As an example, we will pull and deploy a Singularity image containing Jupyter and IPython Parallel. First create a new folder in your ''%%$HOME%%'' directory. Then change into that directory and pull a container from the Docker or Singularity registries, or upload a locally built image. Here we use a public Singularity image from Singularity Hub, [[https://www.singularity-hub.org/collections/81|shub://A33a/sjupyter]]. Because Singularity Hub builds images automatically, pulling can take some time. If you need your container more quickly, either build it locally or push it to Docker Hub. To pull the image, run the following command:

<code>
singularity pull --name sjupyter.sif shub://A33a/sjupyter
</code>

Now the sjupyter.sif image is ready to run. To submit the corresponding job, run:

<code>
srun --pty -p int singularity shell sjupyter.sif
</code>

Here we request a shell inside the container on the interactive partition ''int''.

===== GPU access within the container =====

GPU devices are visible within the container by default; only the driver and the necessary libraries have to be installed in, or bound into, the container. You can install the Nvidia driver yourself or bind it into the container. To bind it automatically, run the container with the ''%%--%%nv'' flag, for instance ''singularity shell %%--%%nv sjupyter.sif''.

If you want to use a specific driver version, you can either install it within the container or link an existing driver provided by the cluster into the container. For the driver to be visible inside the container, you have to add its location to the environment variable ''LD_LIBRARY_PATH''. Here is an example of linking Nvidia driver version 384.111:

<code>
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/cm/local/apps/cuda-driver/libs/384.111/lib64
</code>

When running the container, the corresponding path has to be bound into it with the ''-B'' option:

<code>
singularity shell -B /cm/local/apps jupyterCuda.sif
</code>

Libraries such as CUDA and cuDNN have to be listed in ''LD_LIBRARY_PATH'' as well:

<code>
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-9.0/lib64
</code>

Here CUDA v9.0 is installed within the container at ''/usr/local/cuda-9.0''.

If you want to use the ''nvidia-smi'' command, add its location to the ''PATH'' environment variable. On the cluster it is located at ''/cm/local/apps/cuda/libs/current/bin'':

<code>
export PATH=${PATH}:/cm/local/apps/cuda/libs/current/bin
</code>
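Putting these settings together, the following is a minimal sketch of an interactive GPU session with such a container. The partition name ''gpu'' is an assumption for this example; adjust it, the container name, and the library paths to your actual setup:

<code>
# Request an interactive shell on a GPU node; --nv binds the host driver,
# -B makes the cluster's driver and CUDA tool directories visible inside the container.
srun --pty -p gpu --gres=gpu:1 singularity shell --nv -B /cm/local/apps jupyterCuda.sif

# Inside the container: make the CUDA libraries, the driver, and nvidia-smi visible.
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-9.0/lib64:/cm/local/apps/cuda-driver/libs/current/lib64
export PATH=${PATH}:/cm/local/apps/cuda/libs/current/bin
nvidia-smi
</code>

If ''nvidia-smi'' lists the GPU, the device and driver are correctly visible inside the container.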
The example below is a Singularity bootstrap file that can be used to build a container based on an Nvidia Docker image with preinstalled CUDA v9.0 and cuDNN v7 on Ubuntu 16.04 (more Nvidia images can be found on [[https://hub.docker.com/r/nvidia/cuda/|Docker Hub]]). As an example, the GPU version of TensorFlow is installed. The container also uses the current Nvidia drivers installed on the GWDG cluster:

<code>
Bootstrap: docker
From: nvidia/cuda:9.0-cudnn7-runtime

%post
    apt-get -y update
    apt-get -y install python3-pip
    pip3 install --upgrade pip
    pip3 install tensorflow-gpu

%environment
    PATH=${PATH}:${LSF_BINDIR}:/cm/local/apps/cuda/libs/current/bin
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-9.0/lib64:/cm/local/apps/cuda-driver/libs/current/lib64
    CUDA_PATH=/usr/local/cuda-9.0
    CUDA_ROOT=/usr/local/cuda-9.0
</code>

You can shell into the container with:

<code>
singularity shell -B /cm/local/apps CONTAINERNAME.sif
</code>

===== Distributed PyTorch on GPU =====

If you use PyTorch for machine learning, you may want to try running it in a container on our GPU nodes using its distributed package. The complete documentation can be found at [[https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:pytorch_on_the_hpc_clusters|PyTorch on the HPC]].
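As a rough sketch only, such a job could be launched along the following lines; the partition name ''gpu'', the image ''pytorch.sif'', and the script ''train.py'' are placeholders, and the setup of ''torch.distributed'' itself is described in the linked documentation:

<code>
# Two nodes, one task and one GPU per node; each task runs the training
# script inside the container with GPU support enabled (--nv).
srun -p gpu -N 2 --ntasks-per-node=1 --gres=gpu:1 \
    singularity exec --nv pytorch.sif python3 train.py
</code>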