Running Matlab Parallel Server at the SSCC

Matlab Parallel Server goes a step beyond Matlab’s Parallel Computing Toolbox by letting you run Matlab jobs that use multiple servers. At the SSCC, Matlab Parallel Server runs workers on the SSCC’s Slurm cluster. That means you can easily put several hundred cores to work on your Matlab job, and on a quiet day a thousand or more.

The syntax for Matlab Parallel Server is often identical to that of the Parallel Computing Toolbox, including the parpool() function and parfor loops. Note that you’ll run your job on Linstat, not Slurm: Matlab Parallel Server will take care of submitting the workers to the Slurm cluster as an array of jobs. The information needed to do so is stored in a cluster configuration. The cluster configuration includes settings that sometimes change between jobs, so you may need to create multiple configurations.

Note

SSCC Staff have worked out how to run jobs using Matlab Parallel Server, but we don’t actually know Matlab and won’t be able to help you write code that uses it. If your job isn’t working and you’re not sure whether the problem is in your code or in how you’re running it, feel free to ask and we’ll help as best we can.

Importing a Cluster Configuration

We have provided a starting cluster configuration in a file called 2gb_short.mlsettings. It should work well for many jobs, but it is limited to 2GB of memory per worker and a maximum job length of six hours. We’ll describe how to change that shortly.

The easy way to import this configuration into Matlab is to copy the following code, paste it into the Matlab editor, and run it. You can also run it as a script from the Linux command line.

% Import the '2gb short' Slurm cluster profile
%     download
filename = 'SSCC_Default.mlsettings'
websave(filename, 'https://ssc.wisc.edu/pubs/mps/2gb_short.mlsettings')
%     import
sscc_cluster = parallel.importProfile(filename)
newCluster = parcluster(sscc_cluster)

Tip

Open OnDemand is a very easy way to log into an SSCC Linux server and get the Matlab graphical user interface. However, you can’t run Matlab Parallel Server jobs on the OnDemand servers. You can log into Linstat from an OnDemand session by starting a terminal and typing ssh -Y linstat. Then you can run Matlab Parallel Server jobs from that terminal.

Alternatively, you can download 2gb_short.mlsettings, put it somewhere in your Linux home directory (mapped as Z: on SSCC Windows computers), and import it in Matlab.

To do so, click the HOME tab at the very top, then find the group of buttons with the label ENVIRONMENT and click on Parallel. Click Create and Manage Clusters… and Import, then locate and open the configuration file you downloaded.

(Screenshot: Finding "Parallel")
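Whichever way you import the configuration, you can confirm that Matlab now knows about it. The following is a minimal sketch, assuming the imported profile is named 2gb short as described in this guide. It uses parallel.clusterProfiles, which recent Matlab releases mark as superseded by parallel.listProfiles, so you may prefer the newer function if your release has it.

```matlab
% List the cluster profiles known to this Matlab installation.
[profiles, defaultProfile] = parallel.clusterProfiles;
disp(profiles);
fprintf('Default profile: %s\n', defaultProfile);

% Check for the imported profile by name ('2gb short' is the name
% used in this guide).
if any(strcmp(profiles, '2gb short'))
    disp('The 2gb short profile is available.');
else
    disp('The 2gb short profile was not found; try importing it again.');
end
```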

Running a Parallel Job

At this point you’ll have two entries under Cluster Profile on the left. Processes (Default) uses the Parallel Computing Toolbox to run workers on the server you’re on. 2gb short uses Matlab Parallel Server to run workers on the SSCC Slurm cluster. You specify which you want to use when you run parpool(). For example,

parpool('Processes', 16);

creates a pool of 16 workers on the server you’re on, while

parpool('2gb short', 256);

creates a pool of 256 workers on the SSCC Slurm cluster. (Starting a parallel pool is a slow process, so be patient.) The syntax for parfor is the same either way.
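To illustrate that the parfor code does not depend on the kind of pool, here is a small sketch. The pool size of 4 and the sum-of-squares calculation are illustrative choices, not part of the SSCC setup. Replacing 'Processes' with '2gb short' would run the same loop on the Slurm cluster.

```matlab
% Start a small pool on the current server. Swap in '2gb short'
% (and a larger size) to run the same loop on the Slurm cluster.
pool = parpool('Processes', 4);

n = 1000;
squares = zeros(1, n);
parfor i = 1:n
    squares(i) = i^2;   % each iteration is independent of the others
end
total = sum(squares);
fprintf('Sum of squares 1..%d = %d\n', n, total);

% Shut the pool down so the workers are released.
delete(pool);
```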

As an example, try downloading and running primenumbers.m. This simple program demonstrates how to use the different kinds of parallel pools, and compares their performance. It will also ensure you’ve got the cluster configuration installed properly. It will take several minutes to run.

You can run Matlab Parallel Server jobs using Matlab’s graphical user interface, but unless your job is very short you almost certainly want to run it in the background so it will keep going even if you log out or get disconnected. You can do that (like any other Matlab script) with:

matlab -nodisplay < my_script.m > my_script.log &
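A script meant to run unattended should save its results and release its workers on its own. The following is one possible layout for my_script.m, offered as a sketch. The pool size, the placeholder computation, and the output file name my_results.mat are all hypothetical.

```matlab
% my_script.m: a hypothetical layout for a background run
try
    pool = parpool('2gb short', 64);    % pool size is an example value
    results = zeros(1, 1000);
    parfor i = 1:numel(results)
        results(i) = sqrt(i);           % placeholder for real work
    end
    save('my_results.mat', 'results');  % write output before exiting
catch err
    % Record the failure in the log file rather than failing silently
    fprintf('Job failed: %s\n', err.message);
end
delete(gcp('nocreate'));                % release any workers still held
```

The final line closes whatever pool is still open, whether the job succeeded or failed.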

Some observations from our testing:

  • If you try to use more workers than the Slurm cluster can provide, your program will (eventually) crash, saying the parallel pool failed to start. The job will not wait in the Slurm queue until additional resources become available. Check what resources are available at the time you want to start your job using Slurm Status.
  • If any of your workers are preempted by higher priority jobs, the entire process will need to restart. Researchers in Economics will probably want to use the econ-grad or econ-fac partition for longer jobs.
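Because the pool fails to start rather than waiting in the queue, one option is to fall back to a smaller pool. The sketch below assumes the failure surfaces as an ordinary Matlab error that try/catch can intercept, which we have not verified in every case. The list of pool sizes is an example. Since a failed start can take a long time to report, each fallback adds to the wait.

```matlab
% Try progressively smaller pools until one starts.
% The sizes and the '2gb short' profile name are example values.
poolSizes = [256 128 64];
pool = [];
for n = poolSizes
    try
        pool = parpool('2gb short', n);
        fprintf('Started a pool with %d workers.\n', pool.NumWorkers);
        break
    catch err
        fprintf('Could not start %d workers: %s\n', n, err.message);
        delete(gcp('nocreate'));   % clean up any partially started pool
    end
end
if isempty(pool)
    warning('No pool could be started; check Slurm Status and try later.');
end
```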

Creating a New Cluster Configuration

The 2gb short configuration contains two settings you may need to change, depending on your job:

First, the amount of memory per core is set to 2GB. You can increase this if your job needs more memory. Note that most of the servers in the SSCC Slurm cluster have 128 cores and 256GB of memory (2GB per core), so if you increase the memory per core, your job will not be able to use all the cores in these servers. (A few servers have 4GB per core; more have 8GB per core. See the Cluster Specifications for details. The high memory servers tend to be in high demand.)
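The trade-off between memory per core and usable cores is simple arithmetic. This sketch uses the 128-core, 256GB figures quoted above and assumes one core per worker.

```matlab
% Figures from the text: a typical server has 128 cores and 256GB.
coresPerServer = 128;
memPerServerGB = 256;

for memPerWorkerGB = [2 4 8]
    % Workers are limited by whichever runs out first: cores or memory.
    workers = min(coresPerServer, floor(memPerServerGB / memPerWorkerGB));
    fprintf('%dGB per worker: up to %d workers per server\n', ...
        memPerWorkerGB, workers);
end
```

At 4GB per worker, for example, only 64 of the 128 cores on such a server can be used.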

Second, workers are submitted to the Slurm partition short, which limits them to running for six hours but lets them use the servers reserved for short jobs. sscc is the default partition, with a maximum job length of 10 days. econ, econ-grad, and econ-fac are partitions that give users in the Department of Economics priority on the servers that the department helped pay for. For details see Partitions and Priorities.

Matlab saves these settings as a line of standard Slurm arguments that are passed in when submitting jobs to Slurm. Currently the line is:

--mem-per-cpu=2G --partition=short --ntasks=^N^ --cpus-per-task=^T^

If you wanted to use 4GB per core and the econ-grad partition you’d change it to:

--mem-per-cpu=4G --partition=econ-grad --ntasks=^N^ --cpus-per-task=^T^

You could pass in other Slurm arguments as well. Don’t change --ntasks or --cpus-per-task unless you know what you’re doing; they’re meant to be set elsewhere.

To change these settings, you’ll need to make a new profile. The following script duplicates the 2gb short profile, changes the memory per core to 4GB and the partition to sscc, and saves it as 4gb sscc.

%%
% Modify '2gb short' 
%      create a cluster object, to be modified
myCluster = parcluster('2gb short')

%      **Enter new settings here**
myCluster.ResourceTemplate = "--mem-per-cpu=4G --partition=sscc --ntasks=^N^ --cpus-per-task=^T^"

%      report the cluster object's new properties
myCluster
%      save this profile for use in future Matlab sessions
saveAsProfile(myCluster, '4gb sscc')
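Once the profile is saved, you can check it and use it by name in any later Matlab session. This is a brief sketch, assuming the profile was saved as 4gb sscc as in the script above. The pool size of 32 is an example value.

```matlab
% Load the saved profile by name and confirm the Slurm arguments.
checkCluster = parcluster('4gb sscc');
disp(checkCluster.ResourceTemplate);

% Use it like any other profile (pool size is an example value).
pool = parpool('4gb sscc', 32);
delete(pool);
```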

You can also do this by going to the Cluster Profile Manager (where you imported 2gb short in the first place), selecting 2gb short, then clicking Duplicate and Edit. The settings you want are in the ADDITIONAL SLURM PROPERTIES section, Resource arguments box. To give the configuration a new name, right-click on its name on the left.

You may notice the option to validate your configuration. This can be extremely helpful for distinguishing between problems with your configuration and problems with your code. However, in our testing we found you must specify at least 7GB of memory per core for it to validate properly, even though programs will run successfully with less. Also, when it asks for the number of workers to use for validation, use a small number like 4. Validation will fail if the Slurm cluster doesn’t have the required resources available at the time you run it.

Note

There are more settings you can change, such as the number of threads (cores) per worker. If you find it useful to change these settings, or have any other insights about using Matlab Parallel Server at the SSCC, we’d love to hear from you and then share your findings with other SSCC researchers.
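As a starting point for experimenting with threads per worker, the sketch below sets the cluster object’s NumThreads property and saves the result under a new name. We have not tested this at the SSCC. It assumes that NumThreads is what fills in the ^T^ placeholder in the resource arguments, and the profile name 2gb short 2 threads is hypothetical.

```matlab
% Start from the existing profile and request 2 threads per worker.
threadedCluster = parcluster('2gb short');
threadedCluster.NumThreads = 2;   % assumption: fills in ^T^ (cpus-per-task)

% Report the modified settings, then save them under a new name.
threadedCluster
saveAsProfile(threadedCluster, '2gb short 2 threads');  % hypothetical name
```

Because --mem-per-cpu applies to each core, a worker with 2 threads would be allotted 4GB in total under this profile.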