Center for High Performance Computing (CHPC)

MATLAB Resources

Getting Started

Requirements:

  • A working installation of MATLAB on your workstation (you will be submitting jobs from your local system to run on our cluster).
  • Your MATLAB must include the "Parallel Computing Toolbox" before you can submit jobs to our cluster.  You can use the 'ver' command to see which toolboxes you have installed.
  • You'll need an account on our cluster.
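You can check the toolbox requirement from the MATLAB command window.  A quick sketch (in these releases, 'distcomp' was the short name 'ver' used for the Parallel Computing Toolbox):

```matlab
% Check that the Parallel Computing Toolbox is installed.
% 'distcomp' was the toolbox's directory name in the R2010b-R2012a era.
v = ver('distcomp');
if isempty(v)
    error('Parallel Computing Toolbox not found; cluster submission will not work.');
else
    fprintf('Found %s, version %s\n', v.Name, v.Version);
end
```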

Starting with MATLAB R2012a, MathWorks has changed the way you interface with the cluster.  I've included separate instructions below for >=R2012a and <=R2011b.

>=R2012a

First you'll need to configure MATLAB to interface with our cluster.  With previous versions of MATLAB you could edit an existing configuration file to point to the latest version of MATLAB.  Starting with R2012a, the way configuration files are stored has changed, so you'll need to start with a fresh one.  I've included a sample template to get you started, but it will need to be customized.  You can download the configuration template here (this is for R2012a).  You can import this file into MATLAB by selecting Parallel->Import Cluster Profile.  If you select Parallel->Manage Cluster Profiles, you should see it listed as CHPC2012a with the description "Radiology Cluster".

You're now ready to start customizing this configuration.  You'll find an edit button in the bottom-right of the window.  You'll need to change the following:

  • JobStorageLocation - You'll want to set this to some directory on your local workstation like /home/USERNAME/MATLAB.  If this directory doesn't exist you'll need to create it.  MATLAB will use this directory to stage job data before copying over to the login nodes.
  • NumWorkers - You may want to change this eventually, but for now leave it at 16.
  • independentSubmitFcn - Change this to some subdirectory under your account on the login nodes.  MATLAB will store job data under this directory after copying it off of your workstation.  NOTE: If you're running OS X, check whether the function is defined with a leading lower-case letter.  If not, you may need to change this manually!
  • communicatingSubmitFcn - Same as the above.
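If you prefer the command line to the Profile Manager, the same properties can also be adjusted programmatically.  This is only a sketch: the profile name comes from the imported template above, and the paths are examples you'll need to adapt.

```matlab
% Load the imported cluster profile and customize it (R2012a profile API).
c = parcluster('CHPC2012a');
c.JobStorageLocation = '/home/USERNAME/MATLAB';  % local staging directory (create it first)
c.NumWorkers = 16;                               % leave at 16 for now
saveProfile(c);                                  % write the changes back to the profile
```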

Next, you'll need to copy over the MATLAB scripts for interfacing with the queuing system.  Example scripts are provided, which you can copy either from your local MATLAB installation (MATLABROOT\toolbox\distcomp\examples\integration\pbs\nonshared\ under Windows) or from our login nodes (/export/matlab/R2012a/toolbox/distcomp/examples/integration/pbs/nonshared/).  Copy them into your local toolbox (e.g. MATLABROOT\toolbox\local under Windows, MATLABROOT/toolbox/local under UNIX).

Besides the MATLAB-provided files, there are three customized files that you'll need to download to your local toolbox.  You'll need to overwrite the default files you copied in from the previous step:

communicatingJobWrapper.sh

communicatingSubmitFcn.m

independentSubmitFcn.m

The communicatingJobWrapper.sh will allow your parallel jobs to use an MPI optimized for our cluster, while the MATLAB functions are customized to integrate with our queuing system and will allow you to pass variables from your MATLAB jobs to request job-specific resources (e.g. memory, walltime, node type).

You should now be ready to test things.  Click the "Validate" icon on the toolbar.  It should prompt you for your username, whether you want to use an identity file (select no), and then your password.  If you're logged into the cluster, you can run 'qstat' and you should see jobs being created.  If everything is working correctly, you should see it pass the first four tests ('Cluster connection test', 'Job test', 'SPMD job test', and 'Pool job test').  It will fail the fifth test, 'MATLAB pool test', but this can be ignored.  If any of the other tests fail, you'll probably want to re-run them to make sure the failure is reproducible, and also click on 'Show Details' in case it provides any useful debugging information.

<=R2011b

First, you'll need to configure MATLAB to interface with our cluster.  I've included a sample template to get you started, but it will need to be customized.  You can download the file here (this is for R2011b).  You can upload this file to MATLAB by first selecting Parallel->Manage Configurations, then File->Import.  The configuration should now appear in your 'Configuration Manager' as CHPC, with the description 'Radiology Cluster'.  Select it then click File->Properties.  You'll need to edit a few of these properties:

"Root folder of MATLAB installation for workers" - If you're running anything other than R2010b, you'll need to point to the same version on the login node (e.g. /export/matlab/R2011a)

"Folder where job data is stored" - This is the directory on your local system where you'll store files.  On a UNIX system it might look like "/home/USERNAME/MATLAB" (you'll probably need to create this directory) while on a Windows machine it might look like "C:\Users\USERNAME\Documents\MATLAB\Dist Computing Scripts\Job Data".  You'll want to replace USERNAME with your real username on the system.

"Function called when submitting * jobs" - This is the directory on the remote system (i.e. one of the login nodes).  A good choice woud be "/home/USERNAME/MATLAB".  Again, you'll need to create this directory first, and replace USERNAME with your real username on the system.

Next, you'll need to copy over the MATLAB scripts for interfacing with the queuing system.  Example scripts are provided, which you can copy either from your local MATLAB installation (MATLABROOT\toolbox\distcomp\examples\integration\pbs\nonshared\ under Windows) or from our login nodes (/export/matlab/R2010b/toolbox/distcomp/examples/integration/pbs/nonshared/).  Copy them into your local toolbox (e.g. MATLABROOT\toolbox\local under Windows, MATLABROOT/toolbox/local under UNIX).

Besides the MATLAB-provided files, there are four custom files that you'll need to download to your local toolbox:

ParallelIntelMPIWrapper.sh

mpiLibConf.m

The mpiLibConf.m function above may cause problems when running on your local system.  You may wish to add some logic to point to the local MPI library when not using the cluster (thanks to Charlie Gaona for this fix):

function [primaryLib, extras] = mpiLibConf
% mpiLibConf - select the MPI library based on where the job is running
if strcmp( getenv( 'MDCE_DECODE_FUNCTION' ), 'parallel.cluster.generic.parallelDecodeFcn' )
    % CHPC Option - use the Intel MPI build installed on the cluster
    extras = {};
    primaryLib = [ '/export/intel/Compiler/12.0/impi/4.0.1.007' '/lib64/' 'libmpich.so' ];
else
    % Local scheduler - don't use the MPD build
    warning( 'Local scheduler: about to use default installed MPICH2 build' );
    extras = {};
    primaryLib = 'mpich2ssm.dll';
end

distributedSubmitFcn.m

parallelSubmitIntelMPIFcn.m

These last two files have been customized to: a) specify the correct number of cores per node, b) specify a walltime of 4 hours by default, and c) request the appropriate number of licenses so your job won't run unless there are sufficient licenses available.  If you need to run longer jobs, or jobs that need more processors or memory, you can set these per-job by using CHPC_* global variables.  Please see the examples for how to do this, or contact me if you have questions.
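As an illustration of the per-job settings, the pattern looks roughly like the following.  The exact CHPC_* variable names are defined in our example scripts; the ones shown here are placeholders, so check the examples for the real names.

```matlab
% Hypothetical CHPC_* globals -- see the example scripts for the actual names.
global CHPC_WALLTIME CHPC_PROCS
CHPC_WALLTIME = '12:00:00';   % request 12 hours instead of the 4-hour default
CHPC_PROCS    = 24;           % request more processors than the default
% ... then submit your job as usual; the customized submit functions read
% these globals when building the qsub resource request.
```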

You should now be ready to test the configuration.  You should restart MATLAB to make sure it's using the latest copies of your files; otherwise it might not prompt you for your username/password.  Make sure CHPC is selected in the 'Configuration Manager' and then click the 'Start Validation' button in the lower-right corner.  If you are logged into the cluster, you should see batch jobs start up (use the 'qstat' command to list jobs).  Be aware that these tests will take a few minutes, and the last test, Matlabpool, will fail.  This is a known problem, but it shouldn't cause problems running your jobs.  Also, you may experience time-outs; if so, increase the testing time by unchecking "Use Default" for "Max Time Per Stage" and setting it to 1200 seconds.

If the 3 other tests pass, you're now ready to start running parallel batch jobs. 

You'll probably want to test out your scripts by running small jobs locally, then scale up the problem and run the larger problems on the cluster.  See the Presentation and File sections for examples on parallel programming.

Potential Pitfalls:

  • If you are running Windows, you might find that your firewall is blocking access to the cluster.  You can allow your Windows firewall to accept MATLAB connections by running the following as a privileged user:
    MATLABROOT\toolbox\distcomp\bin\addMatlabToWindowsFirewall.bat
    where MATLABROOT is the location that you installed MATLAB under.
  • If you are using MATLAB pre-2010b, you'll need to configure your workstation to be able to communicate to the login nodes via SSH without being prompted for a passphrase/password.  If you are using >=2010b, this is not necessary.

If you have questions or problems, please let us know.

Example MATLAB scripts

These example scripts are designed to be used as templates for users to build their own wrappers around their existing scripts to submit to the cluster.  The example consists of the MATLAB script paramSweep.m (which depends on the script odesystem.m).  I have two methods for submitting the jobs: batch_ode_wait.m submits the job, waits for it to finish, and then retrieves any results.  batch_ode.m submits the job and 'tags' it with a unique name; once the job has been submitted to the cluster, the script has finished.  You then use the corresponding script batch_return.m to retrieve the results, entering the same unique tag to identify the job.  The benefit of the former method is that you can capture the diary information and retrieve the results in one step, while the latter method allows you to submit multiple jobs.
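A minimal sketch of the two submission patterns, using the Parallel Computing Toolbox's batch API (the script names come from the example archive; the options you need depend on your configuration):

```matlab
% Pattern 1 (batch_ode_wait style): submit, block until finished, fetch results.
job = batch('paramSweep', 'FileDependencies', {'odesystem.m'});
wait(job);            % block until the cluster job completes
diary(job);           % display the captured command-window output
out = load(job);      % load the job's workspace variables into a struct

% Pattern 2 (batch_ode / batch_return style): submit now, retrieve later.
job2 = batch('paramSweep', 'FileDependencies', {'odesystem.m'});
id = job2.ID;         % note this identifier; a later session can look the job
                      % up by ID, wait for it, and load its results.
```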

These scripts are under development so feedback from the users is always welcome.

MATLAB_examples.zip

It's pretty common for things to go wrong with MATLAB jobs, especially when you're just getting started.  Some useful diagnostics can be found on the login nodes within the job directory (e.g. /home/USERNAME/MATLAB/JobXX).  The first thing to check is the main log JobXX.mpiexec.log.  If a single worker is dying, you might look at TaskX.java.log.  These files may be deleted when you delete the job or retrieve the results from your workstation, so you may wish to check the output first.

Presentations

You can view the slides from our "Hands on Parallel Computing with MATLAB" seminar held in August 2010:

We also have Demo files from this presentation:

ZIP DATA Files

ZIP TASK Files

In October 2011 we held workshops on using the Parallel Computing Toolbox ("Parallel Computing with MATLAB") and on serial optimization ("MATLAB Under the Hood").  We have the slides from the workshops, PCT_Masterclass.pdf and MATLAB_UnderTheHood.pdf, as well as the example files PCT_Seminar.zip and MATLAB_UnderTheHood.zip.

NOTE: These are only available on the Wash U network.

Helpful Links

MATLAB Central:  http://www.mathworks.com/matlabcentral/

NEW! Online student tutorials:  http://www.mathworks.com/academia/student_center/tutorials/index.html?link=body

Training:  http://www.mathworks.com/services/training/index.html

Additional Webinars available now:  http://www.mathworks.com/company/events/webinars/?s_cid=HP_E_RW