3  Program Information

In this chapter we’ll discuss how to run the most popular programs at the SSCC.

This does not cover everything that’s installed on the servers: see our software database and the additional list of Biomedical Research Software on Silo. If you plan to use other software, we presume you know how to run it, but you should still read the Linux chapter, and the Slurm chapter if you will submit jobs to Slurm.

You are welcome to install additional software on the SSCC’s Linux servers if you can install it in your home directory without needing to use sudo. Otherwise, contact the Help Desk for assistance.

Note that JupyterLab has its own section, even though it is used to run Python, Julia, R, Stata, or SAS interactively.

3.1 R

3.1.1 Running R Interactively

RStudio Server allows you to run R interactively on Linstat or LinSilo (including LinSiloBig) with the same user interface you’re used to in Windows or on a Mac. The user interface runs in a web browser on your computer, making it very responsive, while the computation happens on the server.

To use RStudio Server on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:

https://ssc.wisc.edu/rstudio-server

To use RStudio Server on LinSilo or LinSiloBig you must first log into Silo. Then go to the programs menu and under LinSilo you’ll find shortcuts for RStudio Server.

You can also run R in JupyterLab.

3.1.2 Running R in Batch Mode

To run an R job on Linstat briefly so you can monitor what resources it requires, go to the Terminal tab in RStudio Server and type:

R CMD BATCH --no-save my_R_script.R &

where my_R_script.R should be replaced by the name of your R script.

3.1.3 Submitting R Jobs to Slurm

To submit an R job to Slurm, go to the Terminal tab in RStudio Server and type:

ssubmit --cores=C --mem=Mg "R CMD BATCH --no-save my_R_script.R"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_R_script.R should be replaced by the name of your R script.
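For example, a hypothetical job using 8 cores and 20GB of memory (replace these values with your own) would be submitted as:

ssubmit --cores=8 --mem=20g "R CMD BATCH --no-save my_R_script.R"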

Slurm Assistant can craft this command for you.

You may also want to read Converting R Loops to Parallel Loops Using Slurm (The Easy Way).

3.1.4 Installing Packages

The SSCC installs a large number of the most popular packages for you. You can install additional packages using the standard install.packages() function. Do so in an interactive session using RStudio Server (or the R command line). You only need to install a package once, so do not include calls to install.packages() in your research scripts. Attempts to install packages in batch mode or in jobs submitted to Slurm will usually fail. Note that if you installed packages on Winstat you’ll need to install them again on the Linux file system, but packages installed using Linstat or LinSilo will be available in Slurm or SlurmSilo respectively.

In Silo, you can only install packages directly from CRAN and Bioconductor. If you need to install packages from other locations (e.g., GitHub), move the package files into Silo with Globus and then install them locally. For detailed instructions, click here.
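Once the package source file is on Silo, you can typically install it from the local file with install.packages(). A minimal sketch, assuming a hypothetical file name and location:

# install from a local source file rather than a repository (path is hypothetical)
install.packages("~/downloads/mypackage_1.0.tar.gz", repos = NULL, type = "source")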

The SSCC updates R before each semester (in August and January). R uses a different path for libraries with each “major” update. A major update is when the first or second number in the R version changes, which usually happens in the summer. (Going from 4.3.0 to 4.3.1 is not a major update, while going from 4.3.0 to 4.4.0 or 5.0.0 is.) You will therefore need to reinstall any packages you want to use after each major update to ensure you have versions compatible with the current version of R. If you use a lot of additional packages, consider creating a script that installs all of them so you can rerun it after each major update. Try something like:

myPackages <- c("lmerTest", "stargazer", "ggeffects")
install.packages(myPackages)

3.1.5 Computing Resources

Base R uses just one core, though packages exist that allow it to use multiple cores for parallel processing. Sometimes packages build in parallel processing without making that clear; running a test job and monitoring it will tell you if that’s the case. You should assume an R job needs only one core unless you have reason to believe otherwise.
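If you do parallelize explicitly, here is a minimal sketch using the base parallel package (the task and core count are placeholders; match mc.cores to the cores you reserve):

library(parallel)
# apply a toy function across 100 inputs using 8 worker processes
results <- mclapply(1:100, function(i) sqrt(i), mc.cores = 8)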

3.1.6 What to Read in Linux Essentials

Read the section on specifying file locations.

3.2 Stata

3.2.1 Running Stata Interactively

To run Stata interactively on Linstat, log into the server and type:

xstata

This will give you the same graphical user interface as in Windows or on a Mac.

The interface can be somewhat sensitive to network lag. If you find the user interface is slow to respond you may want to write most of your code on Winstat or your own computer and just log in to Linstat to run it. You can also use JupyterLab to do interactive work in Stata.

3.2.2 Running Stata in Batch Mode

To run a Stata job on Linstat briefly so you can monitor what resources it requires, log in and type:

stata -b do my_do_file &

where my_do_file should be replaced by the name of your do file (including the .do extension is optional).

3.2.3 Submitting Stata Jobs to Slurm

To submit a Stata job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "stata -b do my_do_file"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_do_file should be replaced by the name of your do file.

Slurm Assistant can craft this command for you.

You may also want to read Converting Stata Loops to Parallel Loops Using Slurm (The Easy Way).

3.2.4 Installing Packages

You can install Stata packages with the standard ssc install or net commands. Note that if you installed packages on Winstat you’ll need to install them again to use them on our Linux servers (including Slurm), but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo.
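For example, to install the estout package from SSC (estout here is just an illustration; substitute the package you need), run the following in Stata:

ssc install estout, replace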

In Silo, you can only install packages directly from the Statistical Software Components archive (SSC, not related to the SSCC). If you need to install packages from other locations, follow these steps:

  1. Open Stata on Winstat.

  2. Change your package installation location. Choose a location that does not already exist; this will make it easier to find the files you install.

    For example, to change it to Z:/silo_ado, run sysdir set PLUS Z:/silo_ado

  3. Install the Stata package(s) you want.

    For example, net install packagename, from("url"), replacing packagename with the name of the package and url with the address of the site where it is hosted.

  4. Find the files Stata installed.

    Assuming you set the PLUS folder to Z:/silo_ado, the files will be in Z:/silo_ado/letter, where letter is the first letter of the package you installed. The files will usually be .ado or .sthlp files but may include other types.

  5. Close Stata on Winstat to reset the PLUS directory to its default (U:/ado/plus).

  6. Copy the files to Silo using Globus.

  7. In Silo, move the files to Z:/ado/plus/letter, where letter is the name of the folder where you found the package files in step 4. If any of these folders do not exist, manually create them.

  8. You may now use the package in Stata on Silo. Confirm with help *packagename*.

3.2.5 Computing Resources

The SSCC has 64-core Stata MP installed on all our Linux servers. Stata MP automatically parallelizes everything it can, but not all Stata commands can be parallelized. Start out by reserving 64 cores, but pay attention to the email you’ll get when your job finishes. If it tells you your job did not use very much of the CPU time it reserved you can reduce the number of cores you use for similar jobs in the future.

Note that only one server in SlurmSilo has 64 cores, so you may want to only reserve 44 cores so your job can run on other servers.

3.2.6 What to Read in Linux Essentials

You’ll need to read the entire chapter.

3.3 Python

3.3.1 Running Python Interactively

To run Python interactively, we suggest you use JupyterLab. You can run Spyder on Linstat and LinSilo, but it is very sensitive to network lag and performance is generally poor. Alternatively, use Spyder or VS Code on Winstat/WinSilo to write Python scripts that you will then run on Linstat/LinSilo or submit to Slurm/SlurmSilo.

A Python job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.

3.3.2 Running Python in Batch Mode

To run a Python job on Linstat briefly so you can monitor what resources it requires, log in and type:

python my_python_script.py &

where my_python_script.py should be replaced by the name of your Python script.

3.3.3 Submitting Python Jobs to Slurm

To submit a Python job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "python my_python_script.py"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_python_script.py should be replaced by the name of your Python script.

Slurm Assistant can craft this command for you.

You may also want to read Converting Python Loops to Parallel Loops Using Slurm (The Easy Way).

3.3.4 Installing Packages

To install a Python package, run:

pip install --user package_name

where package_name should be replaced by the name of the package you want to install. Note that if you installed packages on Winstat you’ll need to install them again, but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo.

The SSCC installs the Anaconda distribution of Python, which includes a large number of useful packages. You can upgrade an existing package by adding the --upgrade switch.

In Silo, you can only install packages directly from the PyPI archive. If you need to install packages from other locations, often you can install them on Linstat and then move them into the corresponding location on LinSilo using Globus. Otherwise, contact the Help Desk for assistance.

By default pip will install packages from the SSCC’s local mirror of PyPI, but the mirror does not contain older versions of packages. If you need an older version (notably, as of this writing tensorflow requires an older version of gast) add -i https://pypi.org/simple to your pip command before the name of the package to install.
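For example, to install a specific older version from PyPI itself (the package name and version here are placeholders):

pip install --user -i https://pypi.org/simple package_name==1.2.3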

3.3.5 Python Environments

The SSCC regularly updates the software on its servers, which can introduce problems for Python users: some Python packages do not work with the latest version of Python or of other packages. We strongly recommend using environments, which allow you to fix a combination of Python and package versions so that your code will continue to run even after the SSCC updates its software. To learn more, read our article on using Conda environments for Python at the SSCC.
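As a minimal sketch (the environment name and versions are placeholders, and this assumes Conda is available on the server; see the linked article for SSCC-specific details):

conda create --name myproject python=3.10 numpy pandas
conda activate myproject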

3.3.6 Computing Resources

While Python itself can only use a single core, many Python functions have components written in C/C++ that can use multiple cores. You’ll need to do a test run and monitor your job’s actual resource usage to determine what to reserve.

3.3.7 What to Read in Linux Essentials

You’ll need to read the entire chapter.

3.4 Julia

3.4.1 Running Julia Interactively

To run Julia interactively, use JupyterLab or type julia to get the Julia REPL (read-eval-print loop, also known as the command line).

A Julia job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.

3.4.2 Running Julia in Batch Mode

To run a Julia job on Linstat briefly so you can monitor what resources it requires, log in and type:

julia my_julia_script.jl &

where my_julia_script.jl should be replaced by the name of your Julia script.

3.4.3 Submitting Julia Jobs to Slurm

To submit a Julia job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "julia my_julia_script.jl"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_julia_script.jl should be replaced by the name of your Julia script.

Slurm Assistant can craft this command for you.

3.4.4 Julia Packages

Julia compiles packages into “images” but many images compiled on Linstat’s Intel CPUs cannot run on the AMD CPUs in most of the Slurm servers (and vice versa). The easy solution is to tell Julia not to use pre-compiled package images in Slurm:

ssubmit --cores=C --mem=Mg "julia --pkgimage=no my_julia_script.jl"

This does mean Julia will need to recompile its packages each time the job runs. If you’re determined to avoid that, you can either 1) use only the small number of Intel servers in the Slurm cluster, or 2) install and use packages only on the AMD servers in Slurm (i.e., never run Julia on Linstat). You can restrict your job to either Intel or AMD CPUs with the Slurm switches --constraint=intel or --constraint=amd. But we really recommend just using the Julia switch --pkgimage=no.
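For example, restricting a Julia job to the AMD servers might look like the following, assuming ssubmit passes Slurm switches through to the scheduler (if it does not, ask the Help Desk how to add them):

ssubmit --cores=C --mem=Mg --constraint=amd "julia my_julia_script.jl"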

3.4.5 What to Read in Linux Essentials

You’ll need to read the entire chapter.

3.5 Matlab

3.5.1 Running Matlab Interactively

To run Matlab interactively on Linstat, log into the server and type:

matlab

This will give the same graphical user interface as in Windows or on a Mac.

The interface can be somewhat sensitive to network lag. If you’re seeing poor performance, you can get a web-based version of the Matlab user interface by typing:

matlab-proxy-app

You’ll be given a URL that you can copy into a browser on your own computer. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN.

Alternatively you can run the same application via JupyterLab. (Matlab run through JupyterLab does not let you write Jupyter Notebooks; it just runs the web version of Matlab.)

The Matlab web interface has a button that allows you to shut down the program, but you’ll also need to go back to your Linux session and press Ctrl-c to fully shut down the application on the server.

3.5.2 Running Matlab in Batch Mode

To run a Matlab job on Linstat briefly so you can monitor what resources it requires, log in and type:

matlab -nodisplay < my_matlab_script.m &

where my_matlab_script.m should be replaced by the name of your Matlab script.

3.5.3 Submitting Matlab Jobs to Slurm

To submit a Matlab job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "matlab -nodisplay < my_matlab_script.m"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_matlab_script.m should be replaced by the name of your Matlab script.

Slurm Assistant can craft this command for you.

3.5.4 Matlab Parallel Server

With Matlab Parallel Server, a Matlab job running on Linstat can create a parallel pool that spans multiple Slurm servers. This allows a job to use hundreds of cores; thousands if they’re available. For details, see Running Matlab Parallel Server at the SSCC.
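As a rough sketch of what such a job might contain (the cluster profile name, worker count, and function are hypothetical placeholders; the linked article describes the actual setup):

% create a parallel pool of 128 workers on a hypothetical cluster profile
c = parcluster('sscc-slurm');
pool = parpool(c, 128);
results = zeros(1, 1000);
parfor i = 1:1000
    results(i) = my_simulation(i);  % my_simulation is a placeholder function
end
delete(pool)  % release the workers when done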

3.5.5 What to Read in Linux Essentials

You’ll need to read the entire chapter.

3.6 SAS

3.6.1 Running SAS Interactively

SAS Studio allows you to run SAS interactively on Linstat or LinSilo. The user interface is not the same as Windows SAS, but it’s easy to use. The interface runs in a web browser on your computer or on WinSilo, making it very responsive, while the computation happens on the server.

To use SAS Studio on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:

https://ssc.wisc.edu/sas-studio

To run SAS Studio on LinSilo (or LinSiloBig) first log into WinSilo as usual and then use the shortcut under LinSilo in the Programs menu.

You can also run SAS in JupyterLab on Linstat if you want to use Jupyter Notebooks.

3.6.2 Running SAS in Batch Mode

To run a test SAS job on Linstat so you can monitor what resources it requires, log in and type:

sas my_sas_program.sas &

where my_sas_program.sas should be replaced by the name of your SAS program.

3.6.3 Submitting SAS Jobs to Slurm

To submit a SAS job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "sas my_sas_program.sas"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_sas_program.sas should be replaced by the name of your SAS program.

Slurm Assistant can craft this command for you.

3.6.4 Computing Resources

By default SAS uses just four cores and 2GB of memory. You can tell SAS to use as much memory as it needs (or is available) by adding the -memsize 0 switch to the sas command.
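For example, a batch run with the memory limit removed:

sas -memsize 0 my_sas_program.sas &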

Unlike almost all other statistical software, SAS does not load the data it uses into memory. Instead, it loads one observation from the input data set, processes it, and then writes it to the output data set. That makes it highly dependent on disk I/O. Modern servers have dozens of cores but just one local disk and one or two network connections to the file server, so running many SAS jobs on the same server will lead to poor performance.

SAS stores temporary data sets on the server’s local disk and permanent data sets on the network file system. Local disk is much faster, but space is limited. You may need to make very large data sets you don’t plan to keep “permanent” just so they’re stored on the network file system and then erase them when you’re done with them. In practice, Linux uses memory as a buffer for both local and network disk, and small to medium data sets may always be available in the buffer and thus very fast.

3.6.5 What to Read in Linux Essentials

If you only plan to use SAS Studio you only need to read the section on specifying file locations. Otherwise you should read the entire chapter.

3.7 Fortran

The SSCC’s Linux servers have the GNU Fortran, Intel Fortran, and AMD Fortran compilers installed.

GNU Fortran is in everyone’s path, and the compiler can be invoked with gfortran.

Intel Fortran is installed in /opt/intel/oneapi/compiler/latest/bin/ (add that to your path if you plan to use it regularly). Note that the “Intel Fortran Compiler Classic” (ifort) is now considered deprecated and will be discontinued in late 2024; its replacement is ifx.

AMD Fortran is installed in /opt/AMD/aocc-compiler/bin (add that to your path if you plan to use it regularly). The compiler can be invoked with flang.

The Linstat servers have Intel cores and most of the Slurm servers have AMD cores. GNU Fortran is always a safe choice, but you may get better performance by using the compiler that matches the cores the job will run on. However, which compiler will give the best performance can depend on your exact code.
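For example, to compile the same (hypothetical) source file with each compiler:

gfortran -O2 my_program.f90 -o my_program
/opt/intel/oneapi/compiler/latest/bin/ifx -O2 my_program.f90 -o my_program
/opt/AMD/aocc-compiler/bin/flang -O2 my_program.f90 -o my_program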

3.7.1 What to Read in Linux Essentials

If you’re new to Linux, you should read the entire chapter.

3.8 JupyterLab

JupyterLab is a programming environment for working with Jupyter Notebooks, which can contain text, code, and the results of running that code in a single convenient file. We have installed “kernels” on Linstat that allow you to write Jupyter Notebooks in Python, Julia, R, Stata, and SAS. JupyterLab is the preferred method for working interactively on Linstat and LinSilo in Python and Julia. Stata, SAS, and R users may be interested in JupyterLab for the Jupyter Notebooks, though RStudio has its own version of Notebooks.

JupyterLab’s user interface runs in a web browser on your computer, making it very responsive, while the code is run on the server. (If you accidentally open a web browser on the server it will be extremely unresponsive, and you should quit and start over.)

JupyterLab also includes a terminal so you can run commands on the server, a basic text editor, and viewers for some common file types like CSV files and images.

You must be on SSCC’s network to use JupyterLab on Linstat. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. To use JupyterLab on LinSilo you must first log into Silo.

3.8.1 Running JupyterLab

To use JupyterLab, first log into the server you want to use. JupyterLab can only see files and directories underneath the directory it starts in, so use the cd (change directory) command to set your working directory to the location of the files you want to work with, or a directory above them. (We’ll discuss how to do so in the next chapter.) Then type:

sscc-jupyter
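For example, if your notebooks live in a hypothetical directory ~/projects/my_analysis, you would type:

cd ~/projects/my_analysis
sscc-jupyter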

You’ll then see a web address (twice) that you need to copy. Be sure to copy the entire address and nothing but the address. If you’re using X-Win32 you can right-click on the first address and choose Copy Link Address. Paste that address into a browser on your computer.

When you’re done using JupyterLab, click File, Shut Down so it shuts down the JupyterLab process on the server. Then you can close the browser tab on your computer.

3.8.2 Converting Jupyter Notebooks to Scripts

Before you can submit a job written in a Jupyter Notebook to Slurm you must convert it to a script. You can do so with the following Linux command:

jupyter nbconvert --to script my_notebook.ipynb

where my_notebook.ipynb should be replaced by the name of the notebook you want to convert. You can use wildcards to convert multiple notebooks, including *.ipynb to convert all the notebooks in the current directory.
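For example, to convert every notebook in the current directory:

jupyter nbconvert --to script *.ipynb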

The script will contain all the code in the notebook plus any markdown text converted into comments.

3.8.3 What to Read in Linux Essentials

You’ll need to read the entire chapter.