Page en:services:application_services:high_performance_computing:workshops:matlab-on-scc-2023: created 2023/01/30 20:32 by vend; current revision 2023/02/27 10:44 by vend (moved to Software Section).
====== Getting Started with Parallel Computing using MATLAB on the SCC HPC Cluster ======
This document provides the steps to configure MATLAB to submit jobs to a cluster, retrieve results, and debug errors.

===== CONFIGURATION – MATLAB client on the cluster =====
After logging in to the cluster, load the MATLAB module and call the shell script configCluster.sh to configure MATLAB to submit parallel jobs to the cluster:
$ module load matlab
$ configCluster.sh
Jobs will now be submitted to the cluster by default rather than run on the local machine.
===== INSTALLATION and CONFIGURATION – MATLAB client on the desktop =====
The SCC MATLAB support package can be found at TBD.
Download the appropriate archive file and start MATLAB. Extract the archive in the folder returned by userpath:
>> userpath
Configure MATLAB to run parallel jobs on your cluster by calling configCluster:
>> configCluster
Submitting jobs to the remote cluster requires SSH credentials.
Jobs will now be submitted to the cluster by default rather than run on the local machine.
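You can quickly check that configCluster took effect, on either the cluster or the desktop, by asking MATLAB which profile parcluster will use by default. The minimal sketch below does this with parallel.defaultClusterProfile and parallel.clusterProfiles from the Parallel Computing Toolbox. This page does not state the exact name of the profile that configCluster creates, so compare the output with what you expect.

```matlab
% Name of the profile that parcluster (called with no arguments) will use
profileName = parallel.defaultClusterProfile;

% Names of all profiles known to this MATLAB installation (cell array)
allProfiles = parallel.clusterProfiles;

fprintf('Default cluster profile: %s\n', profileName);
% (:)' turns the cell array into a row, which strjoin expects
fprintf('Available profiles: %s\n', strjoin(allProfiles(:)', ', '));
```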
NOTE: To submit to the local machine instead, run the following command:
>> % Get a handle to the local resources
>> c = parcluster('
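The name of the built-in local profile depends on the MATLAB release. The sketch below assumes the profile is called 'Processes' in R2022b and later and 'local' in earlier releases, and picks whichever one exists:

```matlab
% Assumption: R2022b+ ships a 'Processes' profile; older releases use 'local'
if any(strcmp(parallel.clusterProfiles, 'Processes'))
    localCluster = parcluster('Processes');
else
    localCluster = parcluster('local');
end

% NumWorkers is the maximum number of local workers this profile allows
fprintf('Local profile "%s" allows up to %d workers\n', ...
    localCluster.Profile, localCluster.NumWorkers);
```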
===== CONFIGURING JOBS =====
Before submitting a job, we can specify various parameters to pass to it, such as queue, e-mail, and wall time. The following is a partial list of parameters:
>> % Get a handle to the cluster
>> c = parcluster;

>> % Specify the account to use
>> c.AdditionalProperties.AccountName = '

>> % Request feature/
>> c.AdditionalProperties.Constraint = '

>> % Request email notification of job status
>> c.AdditionalProperties.EmailAddress = '

>> % Specify number of GPUs to use
>> c.AdditionalProperties.GpusPerNode = 1;
>> c.AdditionalProperties.GpuCard = '

>> % Specify memory to use, per core (default: 4gb)
>> c.AdditionalProperties.MemUsage = '

>> % Specify the queue to use
>> c.AdditionalProperties.QueueName = '

>> % Request entire node(s) (default: false)
>> c.AdditionalProperties.RequireExclusiveNode = true;

>> % Specify the wall time (e.g., 5 hours)
>> c.AdditionalProperties.WallTime = '

To make these changes persist between MATLAB sessions, save the profile after modifying AdditionalProperties:
>> c.saveProfile

To see the values of the current configuration options, display AdditionalProperties:

>> % To view current properties
>> c.AdditionalProperties

To unset a value that is no longer needed, assign an empty value and save the profile again:
>> % Turn off email notifications
>> c.AdditionalProperties.EmailAddress = '';
>> c.saveProfile
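If you set several properties together, keeping them in one place can help. The sketch below stores hypothetical values in a struct and copies them onto the cluster profile. The property names come from the list above. The values, including the hh:mm:ss wall-time format, are placeholders and assumptions. The code checks isprop first because a local profile may not have AdditionalProperties.

```matlab
% Hypothetical settings: names match the AdditionalProperties above,
% values are placeholders (the wall-time format is an assumption)
settings = struct( ...
    'WallTime', '05:00:00', ...
    'MemUsage', '4gb', ...
    'RequireExclusiveNode', false);

c = parcluster;   % default profile, i.e. the cluster after configCluster

if isprop(c, 'AdditionalProperties')
    names = fieldnames(settings);
    for k = 1:numel(names)
        % Dynamic field name: same as c.AdditionalProperties.WallTime = ...
        c.AdditionalProperties.(names{k}) = settings.(names{k});
    end
    c.saveProfile;   % persist the settings between MATLAB sessions
else
    warning('Profile "%s" has no AdditionalProperties; was configCluster run?', c.Profile);
end
```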
===== INTERACTIVE JOBS – MATLAB client on the cluster =====
To run an interactive pool job on the cluster, continue to use parpool as you’ve done before.
>> % Get a handle to the cluster
>> c = parcluster;

>> % Open a pool of 64 workers on the cluster
>> pool = c.parpool(64);

Rather than running on the local machine, the pool can now run across multiple nodes on the cluster.

>> % Run a parfor over 1000 iterations
>> parfor idx = 1:1000
      a(idx) = …
   end

Once we’re done with the pool, delete it.

>> % Delete the pool
>> pool.delete
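The sketch below combines these steps. It reuses a pool that is already open (found with gcp('nocreate')) or starts a small one, preallocates the sliced output before the parfor loop, and deletes the pool at the end. The pool size of 4 and the squared-index computation are placeholders for your own values and work.

```matlab
c = parcluster;   % default profile (the cluster after configCluster)

% Reuse an open pool if there is one; otherwise start a small pool,
% capped at the number of workers the profile allows
pool = gcp('nocreate');
if isempty(pool)
    pool = c.parpool(min(4, c.NumWorkers));
end

n = 1000;
a = zeros(1, n);      % preallocate the sliced output variable
parfor idx = 1:n
    a(idx) = idx^2;   % placeholder for the real per-iteration work
end

pool.delete;          % release the workers
```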
===== INDEPENDENT BATCH JOB =====
Use the batch command to submit asynchronous jobs to the cluster.
>> % Get a handle to the cluster
>> c = parcluster;

>> % Submit job to query where MATLAB is running on the cluster
>> job = c.batch(@pwd,

>> % Query job for state
>> job.State

>> % If state is finished, fetch the results
>> job.fetchOutputs{:

>> % Delete the job after results are no longer needed
>> job.delete
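Instead of polling job.State, you can call wait, which blocks until the job has finished. The sketch below submits the same pwd query with one output and no inputs, waits for it, and reads the single output from the cell array returned by fetchOutputs.

```matlab
c = parcluster;

% batch(cluster, function, number of outputs, {inputs})
job = c.batch(@pwd, 1, {});

wait(job);                  % block until the job reaches the 'finished' state
out = job.fetchOutputs;     % 1-by-1 cell array holding pwd's return value
workerFolder = out{1};
fprintf('Job %d ran in folder: %s\n', job.ID, workerFolder);

job.delete;                 % remove the job and its stored data
```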

To retrieve a list of currently running or completed jobs, call parcluster to get a handle to the cluster, then read its Jobs property:
>> c = parcluster;
>> jobs = c.Jobs;
Once we’ve identified the job we want, we can retrieve the results as we’ve done previously.
fetchOutputs retrieves function output arguments; if batch was called with a script, use load instead.
To view results of a previously completed job:
>> % Get a handle to the job with ID 2
>> job2 = c.Jobs(2);

NOTE: You can view a list of your jobs, as well as their IDs, using the above c.Jobs command.
>> % Fetch results for job with ID 2
>> job2.fetchOutputs{:
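c.Jobs is an array, so c.Jobs(2) selects the second entry by position rather than by the job's ID property. The sketch below lists every job with its position, ID, and state. It then uses findJob to select jobs by property value, which does not depend on their order in the array.

```matlab
c = parcluster;

% List every job stored for this profile with its ID and current state
jobs = c.Jobs;
for k = 1:numel(jobs)
    fprintf('Position %d: ID %d, state %s\n', k, jobs(k).ID, jobs(k).State);
end

% Select jobs by property value instead of by position in c.Jobs
finishedJobs = c.findJob('State', 'finished');
fprintf('%d finished job(s)\n', numel(finishedJobs));
```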
===== PARALLEL BATCH JOB =====
Users can also submit parallel workflows with the batch command. The examples in this section use the following function, saved as parallel_example.m:
function [t, A] = parallel_example(iter)

% Default to 8 iterations when called with no arguments
if nargin==0
    iter = 8;
end

disp('

% Each iteration pauses for 2 seconds, so the total run time
% depends on how many workers share the iterations
t0 = tic;
parfor idx = 1:iter
    A(idx) = idx;
    pause(2)
    idx   % display the loop index
end
t = toc(t0);

disp('

% Save A to RESULTS.mat in the current folder
save RESULTS A

end

This time, when we use the batch command to run a parallel job, we’ll also specify a MATLAB Pool.
>> % Get a handle to the cluster
>> c = parcluster;

>> % Submit a batch pool job using 4 workers for 16 simulations
>> job = c.batch(@parallel_example,
'

>> % View current job status
>> job.State

>> % Fetch the results after a finished state is retrieved
>> job.fetchOutputs{:
ans =
    8.8872
The job ran in 8.89 seconds using four workers.
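These timings follow from how parallel_example is written. Each of the N = 16 iterations pauses for 2 seconds, and the Pool workers share the iterations. Assuming the iterations are spread evenly across P workers and ignoring overhead, a rough estimate is:

```latex
T(P) \approx \left\lceil \frac{N}{P} \right\rceil t_{\mathrm{iter}},
\qquad N = 16,\; t_{\mathrm{iter}} = 2\,\mathrm{s}

T(4) \approx \left\lceil \tfrac{16}{4} \right\rceil \cdot 2\,\mathrm{s} = 8\,\mathrm{s},
\qquad
T(8) \approx \left\lceil \tfrac{16}{8} \right\rceil \cdot 2\,\mathrm{s} = 4\,\mathrm{s}
```

The measured 8.89 s with four workers and 4.73 s with eight workers are each less than a second above these estimates. The gap is overhead that this simple model does not include. Doubling the workers gives a speedup of about 8.8872 / 4.7270 ≈ 1.88.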
We’ll run the same simulation but increase the Pool size. This time, to retrieve the results later, we’ll keep track of the job ID.
NOTE: For some applications,
>> % Get a handle to the cluster
>> c = parcluster;

>> % Submit a batch pool job using 8 workers for 16 simulations
>> job = c.batch(@parallel_example,
'

>> % Get the job ID
>> id = job.ID
id =
    4
>> % Clear job from workspace (as though we quit MATLAB)
>> clear job
Once we have a handle to the cluster, we’ll call the findJob method to search for the job with the specified job ID.
>> % Get a handle to the cluster
>> c = parcluster;

>> % Find the old job
>> job = c.findJob('

>> % Retrieve the state of the job
>> job.State
ans =
    finished
>> % Fetch the results
>> job.fetchOutputs{:
ans =
    4.7270
This time the job ran in 4.73 seconds using eight workers.
Alternatively,

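You can try the same record-the-ID, find-it-later pattern without parallel_example. In the sketch below, a trivial job (@plus with inputs 2 and 3, chosen only for illustration) is submitted and its ID is kept. The job variable is then cleared, and findJob looks the job up again by ID.

```matlab
c = parcluster;

% Submit a trivial job and remember only its ID
job = c.batch(@plus, 1, {2, 3});
id = job.ID;
clear job                       % as though MATLAB had been restarted

% Later: look the job up again by its ID
job = c.findJob('ID', id);
wait(job);                      % block until the job has finished
out = job.fetchOutputs;         % {5} for plus(2, 3)
result = out{1};
fprintf('Job %d (%s) returned %d\n', id, job.State, result);

job.delete;
```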
===== DEBUGGING =====
If a serial job produces an error, call the getDebugLog method to view the error log file. When submitting independent jobs with multiple tasks, specify the task number.
>> c.getDebugLog(job.Tasks(3))
For Pool jobs, specify only the job object.
>> c.getDebugLog(job)
When troubleshooting a job, the cluster admin may request the scheduler ID of the job. This can be derived by calling schedID:
>> schedID(job)
ans =
    25539
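Before reading the scheduler log, it is often quicker to check the error recorded on the task itself. The sketch below deliberately submits a failing job, calling error with a made-up identifier 'demo:fail', and reads the task's ErrorMessage property. With a real job you would skip the submission and inspect your own job object.

```matlab
c = parcluster;

% Deliberately failing job: error(id, message) runs on the worker
job = c.batch(@error, 0, {'demo:fail', 'Deliberate failure'});
wait(job);

% A batch job submitted without a Pool contains a single task
task = job.Tasks(1);
msg = task.ErrorMessage;        % empty if the task completed without error
if ~isempty(msg)
    fprintf('Task %d of job %d failed: %s\n', task.ID, job.ID, msg);
end

job.delete;
```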

===== HELPER FUNCTIONS =====
^ Function ^ Description ^ Desktop-only ^
| clusterFeatures | List of scheduler features/ | |
| clusterGpuCards | List of cluster GPU cards | |
| clusterQueueNames | List of scheduler queue names | |
| disableArchiving | Modify file archiving to resolve file mirroring issue | true |
| fixConnection | Re-establish cluster connection | true |
| willRun | Explain why job is not running | |

===== TO LEARN MORE =====
To learn more about the MATLAB Parallel Computing Toolbox, check out these resources:
  * Parallel Computing Coding Examples
  * Parallel Computing Documentation
  * Parallel Computing Overview
  * Parallel Computing Tutorials
  * Parallel Computing Videos
  * Parallel Computing Webinars