====== Hail ======
===== Introduction =====
//Hail is an open-source, general-purpose, Python-based data analysis library with additional data types and methods for working with genomic data.//
A recent version of Hail is installed on the HPC system.
===== Preparing a Spark Cluster =====
Hail runs on top of an [[https://spark.apache.org|Apache Spark]] cluster, so the first step is to start a Spark cluster inside your batch job.
==== Environment Variables ====
Start by loading the module for the ''JAVA'' environment:
<code>
module load JAVA/
</code>
Spark attempts to write its logs into the global installation directory, which is read-only. Please specify a writable log directory via the environment variable ''SPARK_LOG_DIR'' instead, for example:
<code>
export SPARK_LOG_DIR=$HOME/
</code>
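The log directory should exist and be writable before the first Spark job uses it. A minimal sketch, assuming a hypothetical location ''$HOME/spark-logs'' (any writable path works):
<code bash>
# Create a personal Spark log directory and point SPARK_LOG_DIR at it.
# The path $HOME/spark-logs is an assumption -- substitute any writable location.
mkdir -p "$HOME/spark-logs"
export SPARK_LOG_DIR="$HOME/spark-logs"
</code>
Putting these lines into your job scripts (or your ''~/.bashrc'') ensures the variable is set in every Spark job.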
==== Submitting Spark Applications ====
<WRAP center round info 60%>
If you're just interested in running Hail, you can safely skip this section and continue with [[#running_hail|Running Hail]] below.
</WRAP>

Applications can be submitted almost as described in the [[https://spark.apache.org/docs/latest/submitting-applications.html|Spark documentation]], except that the Spark cluster is set up by a wrapper script within the batch job. An example batch script:
<code>
#!/bin/bash
#SBATCH -p medium
#SBATCH -N 4
#SBATCH --ntasks-per-node=1
#SBATCH -t 01:00:00

lsf-spark-submit.sh $SPARK_ARGS
</code>
where ''SPARK_ARGS'' are the arguments you would otherwise pass to ''spark-submit''.

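For example, ''SPARK_ARGS'' could hold the application jar together with its resource options. A minimal sketch; the class name, jar file, and memory setting below are placeholders, not part of the original page:
<code bash>
# Hypothetical spark-submit arguments -- class and jar names are placeholders.
SPARK_ARGS="--class org.example.MyApp --executor-memory 4G myapp.jar"
echo "$SPARK_ARGS"
</code>
With this definition, the batch script above effectively runs ''lsf-spark-submit.sh --class org.example.MyApp --executor-memory 4G myapp.jar''.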
==== Interactive Sessions ====
A Spark cluster to be used interactively with Scala from the [[https://spark.apache.org|Spark]] shell can be started with:
<code>
srun -p int -N 4 --ntasks-per-node=20 -t 01:00:00 lsf-spark-shell.sh
</code>
===== Running Hail =====
The Hail user interface requires a sufficiently recent ''python'' module, which can be loaded with:
<code>
module load python/
</code>
Currently the following Python packages are available in this environment:
<code>
Package         Version
--------------- -------
bokeh
Jinja2
MarkupSafe
numpy
packaging
pandas
parsimonious
pip
pyparsing
pyspark
python-dateutil 2.7.3
pytz            2018.5
PyYAML
scipy
setuptools
six
tornado
wheel
</code>
<WRAP center round help 60%>
Do you need additional Python packages for your Hail workflow that might also be of interest to other users? In that case, please create a support request and we will consider adding them.
</WRAP>

A job running the ''pyspark'' console preloaded with Hail can be started interactively with:
<code>
srun -p int -N 4 --ntasks-per-node=20 -t 01:00:00 lsf-pyspark-hail.sh
</code>
Once the console is running, initialize Hail with the global Spark context ''sc'':
<code>
import hail as hl
hl.init(sc)
</code>
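The same initialization can also be kept in a small driver script for non-interactive use. A sketch; the file name is hypothetical, and how such a script is passed to the cluster depends on the local Spark setup:
<code bash>
# Write a minimal Hail driver script (hail_example.py is a placeholder name).
cat > hail_example.py <<'EOF'
import hail as hl   # Hail's conventional import alias
hl.init(sc)         # attach Hail to the job's Spark context
# ... your Hail analysis goes here ...
EOF
</code>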
+ | |||
+ | --- // |