3 Program Information
In this chapter we’ll discuss how to run the most popular programs at the SSCC.
This does not cover everything that’s installed on the servers: see our software database and the additional list of Biomedical Research Software on Silo. If you plan to use other software we presume you know how to run it, but you may want to read the Linux chapter and the Slurm chapter if you will submit jobs to Slurm.
You are welcome to install additional software on the SSCC’s Linux servers if you can install it in your home directory without needing to use sudo. Otherwise, contact the Help Desk for assistance.
Note that JupyterLab has its own section, though it’s used to run other programs interactively.
3.1 R
3.1.1 Running R Interactively
RStudio Server allows you to run R interactively on Linstat or LinSilo (including LinSiloBig) with the same user interface you’re used to in Windows or on a Mac. The user interface runs in a web browser on your computer, making it very responsive, while the computation happens on the server.
To use RStudio Server on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:
https://ssc.wisc.edu/rstudio-server
To use RStudio Server on LinSilo or LinSiloBig you must first log into Silo. Then go to the programs menu and under LinSilo you’ll find shortcuts for RStudio Server.
You can also run R in JupyterLab.
3.1.2 Running R in Batch Mode
To run an R job on Linstat briefly so you can monitor what resources it requires, go to the Terminal tab in RStudio Server and type:
R CMD BATCH --no-save my_R_script.R &
where my_R_script.R
should be replaced by the name of your R script.
3.1.3 Submitting R Jobs to Slurm
To submit an R job to Slurm, go to the Terminal tab in RStudio Server and type:
ssubmit --cores=C --mem=Mg "R CMD BATCH --no-save my_R_script.R"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_R_script.R
should be replaced by the name of your R script.
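For example, a job expected to use 8 cores and 16 GB of memory (illustrative values; substitute your own estimates and script name) would be submitted with:

```shell
ssubmit --cores=8 --mem=16g "R CMD BATCH --no-save my_R_script.R"
```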
Slurm Assistant can craft this command for you.
You may also want to read Converting R Loops to Parallel Loops Using Slurm (The Easy Way).
3.1.4 Installing Packages
The SSCC installs a large number of the most popular packages for you. You can install additional packages using the standard install.packages()
function. Do so in an interactive session using RStudio Server (or the R command line). You only need to install a package once, so do not include calls to install.packages()
in your research scripts. Attempts to install packages in batch mode or in jobs submitted to Slurm will usually fail. Note that if you installed packages on Winstat you’ll need to install them again on the Linux file system, but packages installed using Linstat or LinSilo will be available in Slurm or SlurmSilo respectively.
In Silo, you can only install packages directly from CRAN and Bioconductor. If you need to install packages from other locations (e.g., GitHub), move the package files into Silo with Globus and then install them locally. See the SSCC’s detailed instructions for installing packages in Silo.
The SSCC updates R before each semester (in August and January). R uses a different library path with each “major” update. Major updates are when the first or second number in the R version changes, which usually happens in the summer. (Going from 4.3.0 to 4.3.1 is not a major update, while going from 4.3.0 to 4.4.0 or 5.0.0 is.) You will therefore need to reinstall any packages you want to use after each major update to ensure you have versions that are compatible with the current version of R. If you use a lot of additional packages, consider creating a script that installs all of them, which you can rerun after each major update. Try something like:
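```r
# List every additional package you use, then reinstall them all
# after each major R update
myPackages <- c("lmerTest", "stargazer", "ggeffects")
install.packages(myPackages)
```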
3.1.5 Computing Resources
Base R uses just one core, though packages exist that allow it to use multiple cores for parallel processing. Sometimes packages build in parallel processing without making that clear; running a test job and monitoring it will tell you if that’s the case. But you should assume R jobs will only need one core unless you have reason to believe otherwise.
3.1.6 What to Read in Linux Essentials
Read the section on specifying file locations.
3.2 Stata
3.2.1 Running Stata Interactively
To run Stata interactively on Linstat, log into the server and type:
xstata
This will give the same graphical user interface as in Windows or on a Mac. In Open OnDemand, click Applications, Statistics, Stata.
The interface can be somewhat sensitive to network lag. If you find the user interface is slow to respond you may want to write most of your code on Winstat or your own computer and just log in to Linstat to run it. You can also use JupyterLab to do interactive work in Stata.
3.2.2 Running Stata in Batch Mode
To run a Stata job on Linstat briefly so you can monitor what resources it requires, log in and type:
stata -b do my_do_file &
where my_do_file should be replaced by the name of your do file (including the .do extension is optional).
3.2.3 Submitting Stata Jobs to Slurm
To submit a Stata job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "stata -b do my_do_file"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_do_file
should be replaced by the name of your do file.
Slurm Assistant can craft this command for you.
You may also want to read Converting Stata Loops to Parallel Loops Using Slurm (The Easy Way).
3.2.4 Installing Packages
You can install Stata packages with the standard ssc install
or net
commands. Note that if you installed packages on Winstat you’ll need to install them again to use them on our Linux servers (including Slurm), but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo respectively.
In Silo, you can only install packages directly from the Statistical Software Components archive (SSC, not related to the SSCC). If you need to install packages from other locations, follow these steps:
1. Open Stata on Winstat.
2. Change your package installation location. Choose a location that does not already exist; this will make it easier to find the files you install. For example, to change it to Z:/silo_ado, run:
sysdir set PLUS Z:/silo_ado
3. Install the Stata package(s) you want. For example:
net install packagename, from("url")
replacing packagename with the name of the package and url with the site where it is hosted.
4. Find the files Stata installed. Assuming you set the PLUS folder to Z:/silo_ado, the files will be in Z:/silo_ado/letter, where letter is the first letter of the package you installed. The files will usually be .ado or .sthlp files but may include other types.
5. Close Stata on Winstat to reset the PLUS directory to its default (U:/ado/plus).
6. In Silo, move the files to Z:/ado/plus/letter, where letter is the name of the folder where you found the package files in step 4. If any of these folders do not exist, create them manually.
You may now use the package in Stata on Silo. Confirm with help packagename.
3.2.5 Computing Resources
The SSCC has 64-core Stata MP installed on all our Linux servers. Stata MP automatically parallelizes everything it can, but not all Stata commands can be parallelized. Start out by reserving 64 cores, but pay attention to the email you’ll get when your job finishes. If it tells you your job did not use very much of the CPU time it reserved you can reduce the number of cores you use for similar jobs in the future.
Note that only two servers in SlurmSilo have 64 cores (and one is frequently used for GPU jobs), so you may want to only reserve 44 cores so your job can run on other servers.
3.2.6 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.3 Python
3.3.1 Running Python Interactively
To run Python interactively, we suggest you use JupyterLab. You can run Spyder or PyCharm on Linstat and LinSilo, but they are very sensitive to network lag and performance is generally poor. You can use VS Code, Spyder, or PyCharm on Winstat/WinSilo to write Python scripts you will run on Linstat/LinSilo or submit to Slurm/SlurmSilo.
A Python job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.
3.3.2 Running Python in Batch Mode
To run a Python job on Linstat briefly so you can monitor what resources it requires, log in and type:
python my_python_script.py &
where my_python_script.py
should be replaced by the name of your Python script.
3.3.3 Submitting Python Jobs to Slurm
To submit a Python job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "python my_python_script.py"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_python_script.py
should be replaced by the name of your Python script.
Slurm Assistant can craft this command for you.
You may also want to read Converting Python Loops to Parallel Loops Using Slurm (The Easy Way).
3.3.4 Installing Packages
We strongly recommend that you use conda environments to manage your Python packages. Instructions can be found in Using Conda Environments for Python at the SSCC. A conda environment contains a version of Python and your Python packages that you control and that will not be updated unless you update them. Python and Python packages change in ways that will break your code much more frequently than other languages the SSCC supports (in part because you’re expected to use something like conda to prevent that from being a problem), so without a conda environment your programs may not work for very long.
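As a sketch of that workflow (the environment name and package list here are just examples; see the linked instructions for SSCC-specific details):

```shell
# Create an environment with a pinned Python version and your packages
conda create -n myproject python=3.11 numpy pandas

# Activate it before running or installing anything
conda activate myproject

# Record exact versions so the environment can be recreated later
conda env export > environment.yml
```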
The SSCC installs the Anaconda distribution of Python, which includes a large number of useful packages. It is adequate for beginners or simple tasks where you do not care about long-term reproducibility. The SSCC updates Anaconda (and all our other software) in January and August, and there’s a chance that your code will stop working every time Anaconda is updated unless you use a conda environment.
To install a Python package without using a conda environment, run:
pip install --user package_name
where package_name
should be replaced by the name of the package you want to install. You can upgrade an existing package by adding the --upgrade
switch. Note that if you installed packages on Winstat you’ll need to install them again, but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo.
In Silo, you can only install packages directly from the PyPI archive. If you need to install packages from other locations, often you can install them on Linstat and then move them into the corresponding location on LinSilo using Globus. Otherwise, contact the Help Desk for assistance.
By default pip
will install packages from the SSCC’s local mirror of PyPI, but the mirror does not contain older versions of packages. If you need an older version (notably, as of this writing tensorflow
requires an older version of gast
) add -i https://pypi.org/simple
to your pip command before the name of the package to install.
3.3.5 Computing Resources
While Python itself can only use a single core, many Python functions have components written in C/C++ that can use multiple cores. You’ll need to do a test run and monitor your job’s actual resource usage to determine what to reserve.
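One quick check before a test run is to see how many cores the server has in total (this is the machine’s capacity, not what your job will necessarily use):

```shell
python3 -c "import os; print(os.cpu_count())"
```

Compare that with the CPU usage you observe while your job runs to decide how many cores to reserve.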
3.3.6 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.4 Julia
3.4.1 Running Julia Interactively
To run Julia interactively, use any of these three options:
- Command line: type julia to get the Julia REPL (read-eval-print loop)
- Visual Studio Code: run VS Code on either Winstat or your own computer, and use the Remote-SSH extension to execute your Julia code on Linstat
- JupyterLab: run your code in a Jupyter Notebook after installing a Julia kernel
A Julia job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.
3.4.2 Running Julia in Batch Mode
To run a Julia job on Linstat briefly so you can monitor what resources it requires, log in and type:
julia my_julia_script.jl &
where my_julia_script.jl
should be replaced by the name of your Julia script.
3.4.3 Submitting Julia Jobs to Slurm
To submit a Julia job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "julia my_julia_script.jl"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_julia_script.jl
should be replaced by the name of your Julia script.
Slurm Assistant can craft this command for you.
3.4.4 Julia Packages
Julia compiles packages into “images” but many images compiled on Linstat’s Intel CPUs cannot run on the AMD CPUs in most of the Slurm servers (and vice versa). The easy solution is to tell Julia not to use pre-compiled package images in Slurm:
ssubmit --cores=C --mem=Mg "julia --pkgimage=no my_julia_script.jl"
This does mean Julia will need to recompile its packages each time the job is run. If you’re determined to avoid this, you can 1) only use the small number of Intel servers in the Slurm cluster, or 2) only install and use packages using the AMD servers in Slurm (i.e. never run Julia on Linstat). You can restrict your job to using either Intel or AMD CPUs with the Slurm switches --constraint=intel
or --constraint=amd
. But we really recommend just using the Julia switch --pkgimage=no
.
The SSCC updates the Julia standard libraries twice each year, usually in August and January. The packages you install depend on the standard libraries. An update to the standard libraries may require an update to some of your packages. To ensure that your carefully crafted Julia code works with future updates of the Julia standard libraries (the core Julia packages), you should organize your work in projects. To set up a Julia project, read Using Julia Projects.
3.4.5 SlurmClusterManager
The SlurmClusterManager package allows Julia to run jobs on multiple servers (also called nodes), letting you use hundreds of cores. However, it interacts with Slurm differently from other programs.
To use N cores with SlurmClusterManager, you set --ntasks
to N rather than --cores
. This runs N copies of your program with one core each, and SlurmClusterManager allows them to interact with each other. Set --nodes
to the number of servers you want to use. --mem
then sets the amount of memory to use on each server. For example, to run parallel_script.jl
using 512 cores split across four servers, run:
ssubmit --ntasks=512 --nodes=4 --mem=250g 'julia parallel_script.jl'
(Since you’re using all the cores in the four servers you might as well use all the memory too; in any case, Julia jobs using lots of cores usually require lots of memory.)
SlurmClusterManager does not require that you set --pkgimage=no
. In fact it won’t work if you do.
There is currently an apparent compatibility problem between SlurmClusterManager and Julia versions greater than 1.8. This leads to error messages like:
error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
However, the finalizer method only runs when objects are cleaned up. In our testing, Julia code that used SlurmClusterManager with recent versions of Julia finished its work before throwing this error. If that’s true for your code, you can ignore the error. Otherwise, you can use Julia 1.8.3 by replacing julia
with /software/julia-1.8.3/bin/julia
in your command. Note that you’ll need to install SlurmClusterManager using Julia 1.8.3.
SSCC staff have very limited experience with Julia, so if you learn anything that helps you run Julia on the Slurm cluster (especially a fix for the finalizer error) let us know and we’ll be happy to share it with your fellow Julia users.
3.4.6 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.5 Matlab
3.5.1 Running Matlab Interactively
To run Matlab interactively on Linstat, log into the server and type:
matlab
This will give the same graphical user interface as in Windows or on a Mac.
The interface can be somewhat sensitive to network lag. If you’re seeing poor performance, you can get a web-based version of the Matlab user interface by typing:
matlab-proxy-app
You’ll be given a URL that you can copy into a browser on your own computer. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN.
Alternatively you can run the same application via JupyterLab. (Matlab run through JupyterLab does not let you write Jupyter Notebooks; it just runs the web version of Matlab.)
The Matlab web interface has a button that allows you to shut down the program, but you’ll also need to go back to your Linux session and press Ctrl-c
to fully shut down the application on the server.
3.5.2 Running Matlab in Batch Mode
To run a Matlab job on Linstat briefly so you can monitor what resources it requires, log in and type:
matlab -nodisplay < my_matlab_script.m &
where my_matlab_script.m
should be replaced by the name of your Matlab script.
3.5.3 Submitting Matlab Jobs to Slurm
To submit a Matlab job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "matlab -nodisplay < my_matlab_script.m"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_matlab_script.m
should be replaced by the name of your Matlab script.
Slurm Assistant can craft this command for you.
3.5.4 Matlab Parallel Server
With Matlab Parallel Server, a Matlab job running on Linstat can create a parallel pool that spans multiple Slurm servers. This allows a job to use hundreds of cores; thousands if they’re available. For details, see Running Matlab Parallel Server at the SSCC.
3.5.5 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.6 SAS
3.6.1 Running SAS Interactively
SAS Studio allows you to run SAS interactively on Linstat or LinSilo. The user interface is not the same as Windows SAS, but it’s easy to use. The interface runs in a web browser on your computer or on WinSilo, making it very responsive, while the computation happens on the server.
To use SAS Studio on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:
https://ssc.wisc.edu/sas-studio
To run SAS Studio on LinSilo (or LinSiloBig) first log into WinSilo as usual and then use the shortcut under LinSilo in the Programs menu.
You can also run SAS in JupyterLab on Linstat if you want to use Jupyter Notebooks.
3.6.2 Running SAS in Batch Mode
To run a test SAS job on Linstat so you can monitor what resources it requires, log in and type:
sas my_sas_program.sas &
where my_sas_program.sas
should be replaced by the name of your SAS program.
3.6.3 Submitting SAS Jobs to Slurm
To submit a SAS job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "sas my_sas_program.sas"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_sas_program.sas
should be replaced by the name of your SAS program.
Slurm Assistant can craft this command for you.
3.6.4 Computing Resources
By default SAS uses just four cores and 2GB of memory. You can tell SAS to use as much memory as it needs (or is available) by adding the -memsize 0
switch to the sas
command.
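For example, to run a batch job with no memory limit (substituting the name of your SAS program):

```shell
sas -memsize 0 my_sas_program.sas &
```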
Unlike almost all other statistical software, SAS does not load the data it uses into memory. Instead, it loads one observation from the input data set, processes it, and then writes it to the output data set. That makes it highly dependent on disk I/O. Modern servers have dozens of cores but just one local disk and one or two network connections to the file server, so running many SAS jobs on the same server will lead to poor performance.
SAS stores temporary data sets on the server’s local disk and permanent data sets on the network file system. If you will be using very large data sets (>10GB) or run into problems with disk space, read Using Large SAS Data Sets with Linstat/Slurm.
3.6.5 What to Read in Linux Essentials
If you only plan to use SAS Studio you only need to read the section on specifying file locations. Otherwise you should read the entire chapter.
3.7 Fortran
The SSCC’s Linux servers have the GNU Fortran, Intel Fortran, and AMD Fortran compilers installed.
GNU Fortran is in everyone’s path, and the compiler can be invoked with gfortran
.
Intel Fortran is installed in /opt/intel/oneapi/compiler/latest/bin/
(add that to your path if you plan to use it regularly). Note that the “Intel Fortran Compiler Classic” (ifort) is now deprecated and will be discontinued in late 2024. The replacement is ifx.
AMD Fortran is installed in /opt/AMD/aocc-compiler/bin
(add that to your path if you plan to use it regularly). The compiler can be invoked with flang
.
The Linstat servers have Intel CPUs and most of the Slurm servers have AMD CPUs. GNU Fortran is always a safe choice, but you may get better performance by using the compiler that matches the CPUs the job will run on. However, which compiler gives the best performance can depend on your exact code.
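For example, a typical optimized build with GNU Fortran might look like this (file names are placeholders):

```shell
gfortran -O2 my_program.f90 -o my_program
```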
3.7.1 What to Read in Linux Essentials
If you’re new to Linux you should read the entire chapter.
3.8 JupyterLab
JupyterLab is a programming environment for working with Jupyter Notebooks, which can contain text, code, and the results of running that code in a single convenient file. It comes set up for Python, but you can install “kernels” on Linstat that allow you to write Jupyter Notebooks in Julia, R, Stata, SAS, or other languages.
JupyterLab’s user interface runs in a web browser on your computer, making it very responsive, while the code is run on the server. (If you accidentally open a web browser on the server it will be extremely unresponsive, and you should quit and start over.)
JupyterLab also includes a terminal so you can run commands on the server, a basic text editor, and viewers for some common file types like CSV files and images.
You must be on SSCC’s network to use JupyterLab on Linstat. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. To use JupyterLab on LinSilo you must first log into Silo.
3.8.1 Running JupyterLab
To use JupyterLab, first log into the server you want to use. Then you must set the working directory to the location of the files you want to work with, or a directory above them, using the cd
(change directory) command. We’ll discuss how to do so in the next chapter. JupyterLab can only see files and directories that are underneath the directory it starts in. Then type:
sscc-jupyter
You’ll then see a web address (twice) that you need to copy. Be sure to copy the entire address and nothing but the address. If you’re using X-Win32 or Open OnDemand you can right-click on the first address and choose Copy Link Address
. Paste that in a browser on your computer.
When you’re done using JupyterLab, click File
, Shut Down
so it shuts down the JupyterLab process on the server. Then you can close the browser tab on your computer.
3.8.2 Converting Jupyter Notebooks to Scripts
Before you can submit a job written in a Jupyter Notebook to Slurm you must convert it to a script. You can do so with the following Linux command:
jupyter nbconvert --to script my_notebook.ipynb
where my_notebook.ipynb
should be replaced by the name of the notebook you want to convert. You can use wildcards to convert multiple notebooks, including *.ipynb
to convert all the notebooks in the current directory.
The script will contain all the code in the notebook plus any markdown text converted into comments.
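For example, to convert every notebook in the current directory at once:

```shell
jupyter nbconvert --to script *.ipynb
```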
3.8.3 What to Read in Linux Essentials
You’ll need to read the entire chapter.