3 Program Information
In this chapter we’ll discuss how to run the most popular programs at the SSCC.
This does not cover everything that’s installed on the servers: see our software database and the additional list of Biomedical Research Software on Silo. If you plan to use other software we presume you know how to run it, but you may want to read the Linux chapter and the Slurm chapter if you will submit jobs to Slurm.
You are welcome to install additional software on the SSCC’s Linux servers if you can install it in your home directory without needing to use sudo. Otherwise, contact the Help Desk for assistance.
Note that JupyterLab has its own section, though it’s used to run other programs interactively.
3.1 R
3.1.1 Running R Interactively
RStudio Server allows you to run R interactively on Linstat or LinSilo (including LinSiloBig) with the same user interface you’re used to in Windows or on a Mac. The user interface runs in a web browser on your computer, making it very responsive, while the computation happens on the server.
To use RStudio Server on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:
https://ssc.wisc.edu/rstudio-server
To use RStudio Server on LinSilo or LinSiloBig you must first log into Silo. Then go to the programs menu and under LinSilo you’ll find shortcuts for RStudio Server.
You can also run R in JupyterLab.
3.1.2 Running R in Batch Mode
To run an R job on Linstat briefly so you can monitor what resources it requires, go to the Terminal tab in RStudio Server and type:
R CMD BATCH --no-save my_R_script.R &
where my_R_script.R
should be replaced by the name of your R script.
3.1.3 Submitting R Jobs to Slurm
To submit an R job to Slurm, go to the Terminal tab in RStudio Server and type:
ssubmit --cores=C --mem=Mg "R CMD BATCH --no-save my_R_script.R"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_R_script.R
should be replaced by the name of your R script.
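For example, a job expected to use 8 cores and 16 GB of memory (illustrative values; substitute your own estimates and script name) would be submitted with:

```shell
ssubmit --cores=8 --mem=16g "R CMD BATCH --no-save my_R_script.R"
```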
Slurm Assistant can craft this command for you.
You may also want to read Converting R Loops to Parallel Loops Using Slurm (The Easy Way).
3.1.4 Installing Packages
The SSCC installs a large number of the most popular packages for you. You can install additional packages using the standard install.packages()
function. Do so in an interactive session using RStudio Server (or the R command line). You only need to install a package once, so do not include calls to install.packages()
in your research scripts. Attempts to install packages in batch mode or in jobs submitted to Slurm will usually fail. Note that if you installed packages on Winstat you’ll need to install them again on the Linux file system, but packages installed using Linstat or LinSilo will be available in Slurm or SlurmSilo respectively.
In Silo, you can only install packages directly from CRAN and Bioconductor. If you need to install packages from other locations (e.g., GitHub), move the package files into Silo with Globus and then install them locally. See the SSCC’s detailed instructions for installing packages in Silo.
The SSCC updates R before each semester (in August and January). R uses a different library path with each “major” update. Major updates are when the first or second number in the R version changes, which usually happens in the summer. (Going from 4.3.0 to 4.3.1 is not a major update, while going from 4.3.0 to 4.4.0 or 5.0.0 is.) You will therefore need to reinstall any packages you want to use after each major update to ensure you have versions that are compatible with the current version of R. If you use a lot of additional packages, consider creating a script that installs all of them, which you can rerun after each major update. Try something like:
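```r
# List every additional package you use, then reinstall them all
# after each major R update
myPackages <- c("lmerTest", "stargazer", "ggeffects")
install.packages(myPackages)
```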
3.1.5 Computing Resources
Base R uses just one core, though packages exist that allow it to use multiple cores for parallel processing. Sometimes packages build in parallel processing without making that clear; running a test job and monitoring it will tell you if that’s the case. But you should assume R jobs will only need one core unless you have reason to believe otherwise.
3.1.6 What to Read in Linux Essentials
Read the section on specifying file locations.
3.2 Stata
3.2.1 Running Stata Interactively
To run Stata interactively on Linstat, log into the server and type:
xstata
This will give the same graphical user interface as in Windows or on a Mac. In Open OnDemand, click Applications, Statistics, Stata.
The interface can be somewhat sensitive to network lag. If you find the user interface is slow to respond you may want to write most of your code on Winstat or your own computer and just log in to Linstat to run it. You can also use JupyterLab to do interactive work in Stata.
3.2.2 Running Stata in Batch Mode
To run a Stata job on Linstat briefly so you can monitor what resources it requires, log in and type:
stata -b do my_do_file &
where my_do_file should be replaced by the name of your do file (including the .do extension is optional).
3.2.3 Submitting Stata Jobs to Slurm
To submit a Stata job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "stata -b do my_do_file"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_do_file
should be replaced by the name of your do file.
Slurm Assistant can craft this command for you.
You may also want to read Converting Stata Loops to Parallel Loops Using Slurm (The Easy Way).
3.2.4 Installing Packages
You can install Stata packages with the standard ssc install
or net
commands. Note that if you installed packages on Winstat you’ll need to install them again to use them on our Linux servers (including Slurm), but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo respectively.
In Silo, you can only install packages directly from the Statistical Software Components archive (SSC, not related to the SSCC). If you need to install packages from other locations, follow these steps:
1. Open Stata on Winstat.
2. Change your package installation location. Choose a location that does not already exist; this will make it easier to find the files you install. For example, to change it to Z:/silo_ado, run:
sysdir set PLUS Z:/silo_ado
3. Install the Stata package(s) you want. For example:
net install packagename, from("url")
replacing packagename with the name of the package and url with the site where it is hosted.
4. Find the files Stata installed. Assuming you set the PLUS folder to Z:/silo_ado, the files will be in Z:/silo_ado/letter, where letter is the first letter of the package you installed. The files will usually be .ado or .sthlp files but may include other types.
5. Close Stata on Winstat to reset the PLUS directory to its default (U:/ado/plus).
6. In Silo, move the files to Z:/ado/plus/letter, where letter is the name of the folder where you found the package files in step 4. If any of these folders do not exist, create them manually.
You may now use the package in Stata on Silo. Confirm with help packagename.
3.2.5 Computing Resources
The SSCC has 64-core Stata MP installed on all our Linux servers. Stata MP automatically parallelizes everything it can, but not all Stata commands can be parallelized. Start out by reserving 64 cores, but pay attention to the email you’ll get when your job finishes. If it tells you your job did not use very much of the CPU time it reserved you can reduce the number of cores you use for similar jobs in the future.
Note that only two servers in SlurmSilo have 64 cores (and one is frequently used for GPU jobs), so you may want to only reserve 44 cores so your job can run on other servers.
3.2.6 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.3 Python
3.3.1 Running Python Interactively
To run Python interactively, we suggest you use JupyterLab. You can run Spyder or PyCharm on Linstat and LinSilo, but they are very sensitive to network lag and performance is generally poor. You can use VS Code, Spyder, or PyCharm on Winstat/WinSilo to write Python scripts you will run on Linstat/LinSilo or submit to Slurm/SlurmSilo.
A Python job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.
3.3.2 Running Python in Batch Mode
To run a Python job on Linstat briefly so you can monitor what resources it requires, log in and type:
python my_python_script.py &
where my_python_script.py
should be replaced by the name of your Python script.
3.3.3 Submitting Python Jobs to Slurm
To submit a Python job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "python my_python_script.py"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_python_script.py
should be replaced by the name of your Python script.
Slurm Assistant can craft this command for you.
You may also want to read Converting Python Loops to Parallel Loops Using Slurm (The Easy Way).
3.3.4 Installing Packages
We strongly recommend that you use conda environments to manage your Python packages. Instructions can be found in Using Conda Environments for Python at the SSCC. A conda environment contains a version of Python and your Python packages that you control and that will not be updated unless you update them. Python and Python packages change in ways that will break your code much more frequently than other languages the SSCC supports (in part because you’re expected to use something like conda to prevent that from being a problem), so without a conda environment your programs may not work for very long.
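As a sketch of that workflow (the environment name and package list here are just examples; see the linked instructions for SSCC-specific details):

```shell
# Create an environment with a pinned Python version and your packages
conda create -n myproject python=3.11 numpy pandas

# Activate it before running or installing anything
conda activate myproject

# Record exact versions so the environment can be recreated later
conda env export > environment.yml
```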
The SSCC installs the Anaconda distribution of Python, which includes a large number of useful packages. It is adequate for beginners or simple tasks where you do not care about long-term reproducibility. The SSCC updates Anaconda (and all our other software) in January and August, and there’s a chance that your code will stop working every time Anaconda is updated unless you use a conda environment.
To install a Python package without using a conda environment, run:
pip install --user package_name
where package_name
should be replaced by the name of the package you want to install. You can upgrade an existing package by adding the --upgrade
switch. Note that if you installed packages on Winstat you’ll need to install them again, but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo.
In Silo, you can only install packages directly from the PyPI archive. If you need to install packages from other locations, often you can install them on Linstat and then move them into the corresponding location on LinSilo using Globus. Otherwise, contact the Help Desk for assistance.
By default pip
will install packages from the SSCC’s local mirror of PyPI, but the mirror does not contain older versions of packages. If you need an older version (notably, as of this writing tensorflow
requires an older version of gast
) add -i https://pypi.org/simple
to your pip command before the name of the package to install.
3.3.5 Computing Resources
While Python itself can only use a single core, many Python functions have components written in C/C++ that can use multiple cores. You’ll need to do a test run and monitor your job’s actual resource usage to determine what to reserve.
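One quick check before a test run is to see how many cores the server has in total (this is the machine’s capacity, not what your job will necessarily use):

```shell
python3 -c "import os; print(os.cpu_count())"
```

Compare that with the CPU usage you observe while your job runs to decide how many cores to reserve.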
3.3.6 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.4 Julia
3.4.1 Running Julia Interactively
To run Julia interactively, use any of these three options:
- Command line: type julia to get the Julia REPL (read-eval-print loop)
- Visual Studio Code: run VS Code on either Winstat or your own computer, and use the Remote-SSH extension to execute your Julia code on Linstat
- JupyterLab: run your code in a Jupyter Notebook after installing a Julia kernel
A Julia job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.
3.4.2 Running Julia in Batch Mode
To run a Julia job on Linstat briefly so you can monitor what resources it requires, log in and type:
julia my_julia_script.jl &
where my_julia_script.jl
should be replaced by the name of your Julia script.
3.4.3 Submitting Julia Jobs to Slurm
To submit a Julia job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "julia my_julia_script.jl"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_julia_script.jl
should be replaced by the name of your Julia script.
Slurm Assistant can craft this command for you.
3.4.4 Julia Packages
Julia compiles packages into “images” but many images compiled on Linstat’s Intel CPUs cannot run on the AMD CPUs in most of the Slurm servers (and vice versa). The easy solution is to tell Julia not to use pre-compiled package images in Slurm:
ssubmit --cores=C --mem=Mg "julia --pkgimage=no my_julia_script.jl"
This does mean Julia will need to recompile its packages each time the job is run. If you’re determined to avoid this, you can 1) only use the small number of Intel servers in the Slurm cluster, or 2) only install and use packages using the AMD servers in Slurm (i.e. never run Julia on Linstat). You can restrict your job to using either Intel or AMD CPUs with the Slurm switches --constraint=intel
or --constraint=amd
. But we really recommend just using the Julia switch --pkgimage=no
.
The SSCC updates the Julia standard libraries twice each year, usually in August and January. The packages you install depend on the standard libraries. An update to the standard libraries may require an update to some of your packages. To ensure that your carefully crafted Julia code works with future updates of the Julia standard libraries (the core Julia packages), you should organize your work in projects. To set up a Julia project, read Using Julia Projects.
3.4.5 SlurmClusterManager
The SlurmClusterManager package allows Julia to run jobs on multiple servers (also called nodes), letting you use hundreds of cores. However, it interacts with Slurm differently from other programs.
To use N cores with SlurmClusterManager, you set --ntasks
to N rather than --cores
. This runs N copies of your program with one core each, and SlurmClusterManager allows them to interact with each other. Set --nodes
to the number of servers you want to use. --mem
then sets the amount of memory to use on each server. For example, to run parallel_script.jl
using 512 cores split across four servers, run:
ssubmit --ntasks=512 --nodes=4 --mem=250g 'julia parallel_script.jl'
(Since you’re using all the cores in the four servers you might as well use all the memory too; in any case, Julia jobs using lots of cores usually require lots of memory.)
SlurmClusterManager does not require that you set --pkgimage=no
. In fact it won’t work if you do.
There is currently an apparent compatibility problem between SlurmClusterManager and Julia versions greater than 1.8. This leads to error messages like:
error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
However, the finalizer method only runs when objects are cleaned up. In our testing, Julia code that used SlurmClusterManager with recent versions of Julia finished its work before throwing this error. If that’s true for your code, you can ignore the error. Otherwise, you can use Julia 1.8.3 by replacing julia
with /software/julia-1.8.3/bin/julia
in your command. Note that you’ll need to install SlurmClusterManager using Julia 1.8.3.
SSCC staff have very limited experience with Julia, so if you learn anything that helps you run Julia on the Slurm cluster (especially a fix for the finalizer error) let us know and we’ll be happy to share it with your fellow Julia users.
3.4.6 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.5 Matlab
3.5.1 Running Matlab Interactively
To run Matlab interactively on Linstat, log into the server and type:
matlab
This will give the same graphical user interface as in Windows or on a Mac.
The interface can be somewhat sensitive to network lag. If you’re seeing poor performance, you can get a web-based version of the Matlab user interface by typing:
matlab-proxy-app
You’ll be given a URL that you can copy into a browser on your own computer. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN.
Alternatively you can run the same application via JupyterLab. (Matlab run through JupyterLab does not let you write Jupyter Notebooks; it just runs the web version of Matlab.)
The Matlab web interface has a button that allows you to shut down the program, but you’ll also need to go back to your Linux session and press Ctrl-c
to fully shut down the application on the server.
3.5.2 Running Matlab in Batch Mode
To run a Matlab job on Linstat briefly so you can monitor what resources it requires, log in and type:
matlab -nodisplay < my_matlab_script.m &
where my_matlab_script.m
should be replaced by the name of your Matlab script.
3.5.3 Submitting Matlab Jobs to Slurm
To submit a Matlab job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "matlab -nodisplay < my_matlab_script.m"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_matlab_script.m
should be replaced by the name of your Matlab script.
Slurm Assistant can craft this command for you.
3.5.4 Matlab Parallel Server
With Matlab Parallel Server, a Matlab job running on Linstat can create a parallel pool that spans multiple Slurm servers. This allows a job to use hundreds of cores; thousands if they’re available. For details, see Running Matlab Parallel Server at the SSCC.
3.5.5 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.6 SAS
3.6.1 Running SAS Interactively
SAS Studio allows you to run SAS interactively on Linstat or LinSilo. The user interface is not the same as Windows SAS, but it’s easy to use. The interface runs in a web browser on your computer or on WinSilo, making it very responsive, while the computation happens on the server.
To use SAS Studio on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:
https://ssc.wisc.edu/sas-studio
To run SAS Studio on LinSilo (or LinSiloBig) first log into WinSilo as usual and then use the shortcut under LinSilo in the Programs menu.
You can also run SAS in JupyterLab on Linstat if you want to use Jupyter Notebooks.
3.6.2 Running SAS in Batch Mode
To run a test SAS job on Linstat so you can monitor what resources it requires, log in and type:
sas my_sas_program.sas &
where my_sas_program.sas
should be replaced by the name of your SAS program.
3.6.3 Submitting SAS Jobs to Slurm
To submit a SAS job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "sas my_sas_program.sas"
where C
should be replaced by the number of cores your job will use, M
should be replaced by the number of gigabytes of memory your job will use, and my_sas_program.sas
should be replaced by the name of your SAS program.
Slurm Assistant can craft this command for you.
3.6.4 Computing Resources
By default SAS uses just four cores and 2GB of memory. You can tell SAS to use as much memory as it needs (or is available) by adding the -memsize 0
switch to the sas
command.
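For example, to run a batch job with no memory limit (substituting the name of your SAS program):

```shell
sas -memsize 0 my_sas_program.sas &
```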
Unlike almost all other statistical software, SAS does not load the data it uses into memory. Instead, it loads one observation from the input data set, processes it, and then writes it to the output data set. That makes it highly dependent on disk I/O. Modern servers have dozens of cores but just one local disk and one or two network connections to the file server, so running many SAS jobs on the same server will lead to poor performance.
SAS stores temporary data sets on the server’s local disk and permanent data sets on the network file system. If you will be using very large data sets (>10GB) or run into problems with disk space, read Using Large SAS Data Sets with Linstat/Slurm.
3.6.5 What to Read in Linux Essentials
If you only plan to use SAS Studio you only need to read the section on specifying file locations. Otherwise you should read the entire chapter.
3.7 Fortran
The SSCC’s Linux servers have the GNU Fortran, Intel Fortran, and AMD Fortran compilers installed.
GNU Fortran is in everyone’s path, and the compiler can be invoked with gfortran
.
Intel Fortran is installed in /opt/intel/oneapi/compiler/latest/bin/
(add that to your path if you plan to use it regularly). Note that the “Intel Fortran Compiler Classic” (ifort) is now deprecated and will be discontinued in late 2024. The replacement is ifx.
AMD Fortran is installed in /opt/AMD/aocc-compiler/bin
(add that to your path if you plan to use it regularly). The compiler can be invoked with flang
.
The Linstat servers have Intel CPUs and most of the Slurm servers have AMD CPUs. GNU Fortran is always a safe choice, but you may get better performance by using the compiler that matches the CPUs the job will run on. However, which compiler gives the best performance can depend on your exact code.
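For example, a typical optimized build with GNU Fortran might look like this (file names are placeholders):

```shell
gfortran -O2 my_program.f90 -o my_program
```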
3.7.1 What to Read in Linux Essentials
If you’re new to Linux you should read the entire chapter.
3.8 JupyterLab
JupyterLab is a programming environment for working with Jupyter Notebooks, which can contain text, code, and the results of running that code in a single convenient file. It comes set up for Python, but you can install “kernels” on Linstat that allow you to write Jupyter Notebooks in Julia, R, Stata, SAS, or other languages.
JupyterLab’s user interface runs in a web browser on your computer, making it very responsive, while the code is run on the server. (If you accidentally open a web browser on the server it will be extremely unresponsive, and you should quit and start over.)
JupyterLab also includes a terminal so you can run commands on the server, a basic text editor, and viewers for some common file types like CSV files and images.
You must be on SSCC’s network to use JupyterLab on Linstat. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. To use JupyterLab on LinSilo you must first log into Silo.
3.8.1 Running JupyterLab
To use JupyterLab, first log into the server you want to use. Then you must set the working directory to the location of the files you want to work with, or a directory above them, using the cd
(change directory) command. We’ll discuss how to do so in the next chapter. JupyterLab can only see files and directories that are underneath the directory it starts in. Then type:
sscc-jupyter
You’ll then see a web address (twice) that you need to copy. Be sure to copy the entire address and nothing but the address. If you’re using X-Win32 or Open OnDemand you can right-click on the first address and choose Copy Link Address
. Paste that in a browser on your computer.
When you’re done using JupyterLab, click File
, Shut Down
so it shuts down the JupyterLab process on the server. Then you can close the browser tab on your computer.
3.8.2 Converting Jupyter Notebooks to Scripts
Before you can submit a job written in a Jupyter Notebook to Slurm you must convert it to a script. You can do so with the following Linux command:
jupyter nbconvert --to script my_notebook.ipynb
where my_notebook.ipynb
should be replaced by the name of the notebook you want to convert. You can use wildcards to convert multiple notebooks, including *.ipynb
to convert all the notebooks in the current directory.
The script will contain all the code in the notebook plus any markdown text converted into comments.
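For example, to convert every notebook in the current directory at once:

```shell
jupyter nbconvert --to script *.ipynb
```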
3.8.3 What to Read in Linux Essentials
You’ll need to read the entire chapter.