Differences
This shows you the differences between two versions of the page.
en:services:application_services:high_performance_computing:workshops:matlab-on-scc-2023 [2023/01/30 20:59] – [TO LEARN MORE] vend | en:services:application_services:high_performance_computing:workshops:matlab-on-scc-2023 [2023/02/27 10:44] (current) – moved to Software Section vend | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Getting Started with Parallel Computing using MATLAB on the SCC HPC Cluster ====== | ||
- | This document provides the steps to configure MATLAB to submit jobs to a cluster, retrieve results, and debug errors. | ||
- | |||
- | ===== CONFIGURATION – MATLAB client on the cluster ===== | ||
- | |||
- | After logging into the cluster, configure MATLAB to run parallel jobs on your cluster by calling the shell script '' | ||
- | < | ||
- | $ module load matlab | ||
- | $ configCluster.sh | ||
- | </ | ||
- | |||
- | Jobs will now be submitted to the cluster by default rather than to the local machine. | ||
- | |||
- | ===== INSTALLATION and CONFIGURATION – MATLAB client on the desktop ===== | ||
- | |||
- | The SCC MATLAB support package can be found at {{ : | ||
- | Download the appropriate archive file and start MATLAB. | ||
- | < | ||
- | >> userpath | ||
- | </ | ||
- | |||
- | Configure MATLAB to run parallel jobs on your cluster by calling configCluster. | ||
- | |||
- | < | ||
- | >> configCluster | ||
- | </ | ||
- | |||
- | Submission to the remote cluster requires SSH credentials. | ||
- | |||
- | Jobs will now be submitted to the cluster by default rather than to the local machine. | ||
- | |||
- | **NOTE:** If you would like to submit to the local machine then run the following command: | ||
- | |||
- | < | ||
- | >> % Get a handle to the local resources | ||
- | >> c = parcluster(' | ||
- | </ | ||
- | |||
- | ===== CONFIGURING JOBS ===== | ||
- | Prior to submitting the job, we can specify various parameters to pass to our jobs, such as queue, e-mail, walltime, etc. The following is a partial list of parameters. | ||
- | |||
- | < | ||
- | >> % Get a handle to the cluster | ||
- | >> c = parcluster; | ||
- | |||
- | >> % Specify the account to use | ||
- | >> c.AdditionalProperties.AccountName = ' | ||
- | |||
- | >> % Request feature/ | ||
- | >> c.AdditionalProperties.Constraint = ' | ||
- | |||
- | >> % Request email notification of job status | ||
- | >> c.AdditionalProperties.EmailAddress = ' | ||
- | |||
- | >> % Specify number of GPUs to use | ||
- | >> c.AdditionalProperties.GpusPerNode = 1; | ||
- | >> c.AdditionalProperties.GpuCard = ' | ||
- | |||
- | >> % Specify memory to use, per core (default: 4gb) | ||
- | >> c.AdditionalProperties.MemUsage = ' | ||
- | |||
- | >> % Specify the queue to use | ||
- | >> c.AdditionalProperties.QueueName = ' | ||
- | |||
- | >> % Request entire node(s) (default: false) | ||
- | >> c.AdditionalProperties.RequireExclusiveNode = true; | ||
- | |||
- | >> % Specify the wall time (e.g., 5 hours) | ||
- | >> c.AdditionalProperties.WallTime = ' | ||
- | </ | ||
- | |||
- | Save changes after modifying " | ||
- | < | ||
- | >> c.saveProfile | ||
- | </ | ||
- | |||
- | To see the values of the current configuration options, display " | ||
- | |||
- | < | ||
- | >> % To view current properties | ||
- | >> c.AdditionalProperties | ||
- | </ | ||
- | |||
- | Unset a value when no longer needed. | ||
- | |||
- | < | ||
- | >> % Turn off email notifications | ||
- | >> c.AdditionalProperties.EmailAddress = ''; | ||
- | >> c.saveProfile | ||
- | </ | ||
- | |||
- | ===== INTERACTIVE JOBS - MATLAB client on the cluster ===== | ||
- | |||
- | To run an interactive pool job on the cluster, continue to use " | ||
- | |||
- | < | ||
- | >> % Get a handle to the cluster | ||
- | >> c = parcluster; | ||
- | |||
- | >> % Open a pool of 64 workers on the cluster | ||
- | >> pool = c.parpool(64); | ||
- | </ | ||
- | |||
- | Rather than running on the local machine, the pool can now run across multiple nodes on the cluster. | ||
- | |||
- | < | ||
- | >> % Run a parfor over 1000 iterations | ||
- | >> parfor idx = 1:1000 | ||
- | a(idx) = … | ||
- | end | ||
- | </ | ||
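As a concrete stand-in for the elided loop body above, here is a minimal sketch of a complete parfor loop. The computation (squaring the index) and the names n and total are assumptions chosen purely for illustration. The loop uses whatever pool is currently open.

```matlab
% Illustrative only: the loop body is a placeholder, not the original workload.
n = 1000;
a = zeros(1, n);          % preallocate the sliced output variable

parfor idx = 1:n
    a(idx) = idx^2;       % each iteration is independent of the others
end

total = sum(a);           % combine the results on the client after the loop
```

Preallocating a is optional for parfor, but it makes the output size explicit and keeps the code valid if the loop is later changed back to a plain for loop.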
- | |||
- | Once we’re done with the pool, delete it. | ||
- | |||
- | < | ||
- | >> % Delete the pool | ||
- | >> pool.delete | ||
- | </ | ||
- | |||
- | ===== INDEPENDENT BATCH JOB ===== | ||
- | |||
- | Use the batch command to submit asynchronous jobs to the cluster. | ||
- | |||
- | < | ||
- | >> % Get a handle to the cluster | ||
- | >> c = parcluster; | ||
- | |||
- | >> % Submit job to query where MATLAB is running on the cluster | ||
- | >> job = c.batch(@pwd, | ||
- | |||
- | >> % Query job for state | ||
- | >> job.State | ||
- | |||
- | >> % If state is finished, fetch the results | ||
- | >> job.fetchOutputs{: | ||
- | |||
- | >> % Delete the job after results are no longer needed | ||
- | >> job.delete | ||
- | </ | ||
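Fetching outputs before a job has finished raises an error, so a script typically blocks until completion first. The sketch below assumes the default profile returned by parcluster and uses the functional forms batch, wait, and fetchOutputs from Parallel Computing Toolbox. The argument values (one output, no inputs) are illustrative and are not taken from the truncated call above.

```matlab
c = parcluster;                 % handle to the default cluster profile

% Assumed arguments: request 1 output from pwd and pass no inputs.
job = batch(c, @pwd, 1, {});

wait(job);                      % block until the job reaches the finished state
out = fetchOutputs(job);        % cell array with one cell per requested output
workerDir = out{1};             % folder in which the worker ran pwd

delete(job);                    % remove the job data once it is no longer needed
```

For long-running jobs, polling job.State from time to time, as shown above, avoids blocking the MATLAB session.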
- | |||
- | To retrieve a list of currently running or completed jobs, call " | ||
- | |||
- | < | ||
- | >> c = parcluster; | ||
- | >> jobs = c.Jobs; | ||
- | </ | ||
- | |||
- | Once we’ve identified the job we want, we can retrieve the results as we’ve done previously. | ||
- | " | ||
- | To view results of a previously completed job: | ||
- | |||
- | < | ||
- | >> % Get a handle to the job with ID 2 | ||
- | >> job2 = c.Jobs(2); | ||
- | </ | ||
- | |||
- | **NOTE:** You can view a list of your jobs, as well as their IDs, using the above c.Jobs command. | ||
- | < | ||
- | >> % Fetch results for job with ID 2 | ||
- | >> job2.fetchOutputs{: | ||
- | </ | ||
- | |||
- | ===== PARALLEL BATCH JOB ===== | ||
- | |||
- | Users can also submit parallel workflows with the batch command. | ||
- | |||
- | < | ||
- | function [t, A] = parallel_example(iter) | ||
- | |||
- | if nargin==0 | ||
- | iter = 8; | ||
- | end | ||
- | |||
- | disp(' | ||
- | |||
- | t0 = tic; | ||
- | parfor idx = 1:iter | ||
- | A(idx) = idx; | ||
- | pause(2) | ||
- | idx | ||
- | end | ||
- | t = toc(t0); | ||
- | |||
- | disp(' | ||
- | |||
- | save RESULTS A | ||
- | |||
- | end | ||
- | </ | ||
- | |||
- | This time, when we use the batch command to run a parallel job, we’ll also specify a MATLAB Pool. | ||
- | |||
- | < | ||
- | >> % Get a handle to the cluster | ||
- | >> c = parcluster; | ||
- | |||
- | >> % Submit a batch pool job using 4 workers for 16 simulations | ||
- | >> job = c.batch(@parallel_example, | ||
- | ' | ||
- | |||
- | >> % View current job status | ||
- | >> job.State | ||
- | |||
- | >> % Fetch the results after a finished state is retrieved | ||
- | >> job.fetchOutputs{: | ||
- | ans = | ||
- | 8.8872 | ||
- | </ | ||
- | |||
- | The job ran in 8.89 seconds using four workers. | ||
- | |||
- | We’ll run the same simulation but increase the Pool size. This time, to retrieve the results later, we’ll keep track of the job ID. | ||
- | |||
- | **NOTE:** For some applications, | ||
- | |||
- | < | ||
- | >> % Get a handle to the cluster | ||
- | >> c = parcluster; | ||
- | |||
- | >> % Submit a batch pool job using 8 workers for 16 simulations | ||
- | >> job = c.batch(@parallel_example, | ||
- | ' | ||
- | |||
- | >> % Get the job ID | ||
- | >> id = job.ID | ||
- | id = | ||
- | 4 | ||
- | >> % Clear job from workspace (as though we quit MATLAB) | ||
- | >> clear job | ||
- | </ | ||
- | |||
- | Once we have a handle to the cluster, we’ll call the " | ||
- | |||
- | < | ||
- | >> % Get a handle to the cluster | ||
- | >> c = parcluster; | ||
- | |||
- | |||
- | >> % Find the old job | ||
- | >> job = c.findJob(' | ||
- | |||
- | >> % Retrieve the state of the job | ||
- | >> job.State | ||
- | ans = | ||
- | finished | ||
- | >> % Fetch the results | ||
- | >> job.fetchOutputs{: | ||
- | ans = | ||
- | 4.7270 | ||
- | </ | ||
- | |||
- | The job now runs in 4.73 seconds using eight workers. | ||
- | Alternatively, | ||
- | |||
- | {{ : | ||
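The two timings reported above follow from simple arithmetic: each iteration of parallel_example pauses for 2 seconds, so with evenly distributed work the loop needs about ceil(iter / workers) rounds of 2 seconds each, plus startup and scheduling overhead. A small sketch of that estimate follows. The helper name estimateSeconds is hypothetical, and the figure of 16 iterations is taken from the comments in the submissions above.

```matlab
pauseSeconds = 2;   % pause(2) inside the parfor body of parallel_example

% Idealized estimate: iterations are split evenly, each round costs one pause.
estimateSeconds = @(iter, workers) ceil(iter / workers) * pauseSeconds;

fourWorkers  = estimateSeconds(16, 4);   % 8 s estimated, 8.89 s measured
eightWorkers = estimateSeconds(16, 8);   % 4 s estimated, 4.73 s measured
```

The gap between each estimate and the measured value is overhead that this simple model does not capture.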
- | |||
- | ===== DEBUGGING ===== | ||
- | |||
- | If a serial job produces an error, call the " | ||
- | |||
- | < | ||
- | >> c.getDebugLog(job.Tasks(3)) | ||
- | </ | ||
- | |||
- | For Pool jobs, only specify the job object. | ||
- | < | ||
- | >> c.getDebugLog(job) | ||
- | </ | ||
- | When troubleshooting a job, the cluster admin may request the scheduler ID of the job. This can be obtained by calling schedID: | ||
- | < | ||
- | >> schedID(job) | ||
- | ans = | ||
- | 25539 | ||
- | </ | ||
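Before pulling the full debug log, it can be quicker to check which tasks failed and why. The sketch below is an assumption-laden illustration rather than part of the SCC support package: it submits a deliberately failing task (the error identifier demo:fail is made up) and then reads the ErrorMessage property that Parallel Computing Toolbox records on each task object.

```matlab
c = parcluster;

% Hypothetical failing task, used only to have an error to inspect.
job = batch(c, @() error('demo:fail', 'Intentional failure'), 0, {});
wait(job);

% Each task records the error raised on the worker, if any.
for k = 1:numel(job.Tasks)
    msg = job.Tasks(k).ErrorMessage;
    if ~isempty(msg)
        fprintf('Task %d failed: %s\n', k, msg);
    end
end

delete(job);
```

The index of a failing task found this way is the index to pass as job.Tasks(k) in the getDebugLog call shown earlier.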
- | |||
- | ===== HELPER FUNCTIONS ===== | ||
- | ^ Function | ||
- | | clusterFeatures | ||
- | | clusterGpuCards | ||
- | | clusterQueueNames | ||
- | | disableArchiving | ||
- | | fixConnection | ||
- | | willRun | ||
- | |||
- | ===== TO LEARN MORE ===== | ||
- | |||
- | To learn more about the MATLAB Parallel Computing Toolbox, check out these resources: | ||
- | | ||
- | * Parallel Computing Coding Examples | ||
- | * Parallel Computing Documentation | ||
- | * Parallel Computing Overview | ||
- | * Parallel Computing Tutorials | ||
- | * Parallel Computing Videos | ||
- | * Parallel Computing Webinars |