</code>

We’re now ready to deploy a Spark cluster. Since the resources of the HPC system are managed by [[en:services:application_services:high_performance_computing:running_jobs_slurm|Slurm]], the entire setup has to be submitted as a job. This can be done conveniently by running the script ''scc_spark_deploy.sh'', which accepts the same arguments as the ''sbatch'' command used to submit generic batch jobs. The default setup is:

<code>
#SBATCH --partition fat
#SBATCH --time=0-02:00:00
#SBATCH --qos=short
#SBATCH --nodes=4
#SBATCH --job-name=Spark
#SBATCH --output=scc_spark_job-%j.out
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
</code>

If you would like to override these default values, you can do so by passing the corresponding Slurm parameters to the script:

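For example, a sketch of such an invocation (the exact command is not shown in this revision; the flag values here are assumptions based on the description below, requesting two worker nodes for two hours):

<code>
scc_spark_deploy.sh --nodes=2 --time=0-02:00:00
</code>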
<code>
Submitted batch job 872699
</code>

In particular, if you do not want to share the node's resources, you need to add the ''%%--%%exclusive'' parameter.

In this case, the ''%%--%%nodes'' parameter has been set to request a total of two worker nodes, and ''%%--%%time'' is used to request a job runtime of two hours. If you would like to set a longer runtime, add the ''%%--%%qos=normal'' parameter in addition to ''%%--%%time''. The job ID is reported back; we can use it to check whether the job is already running and, if so, on which nodes: