3 Program Information
In this chapter we’ll discuss how to run the most popular programs at the SSCC.
This does not cover everything that’s installed on the servers: see our software database and the additional list of Biomedical Research Software on Silo. If you plan to use other software we presume you know how to run it, but you will want to read the Linux chapter and the Slurm chapter if you will submit jobs to Slurm.
You are welcome to install additional software on the SSCC’s Linux servers if you can install it in your home directory without needing to use sudo. Otherwise, contact the Help Desk for assistance.
Note that JupyterLab has its own section, though it’s used to run Python, Julia, R, Stata, or SAS interactively.
3.1 R
3.1.1 Running R Interactively
RStudio Server allows you to run R interactively on Linstat or LinSilo (including LinSiloBig) with the same user interface you’re used to in Windows or on a Mac. The user interface runs in a web browser on your computer, making it very responsive, while the computation happens on the server.
To use RStudio Server on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:
https://ssc.wisc.edu/rstudio-server
To use RStudio Server on LinSilo or LinSiloBig you must first log into Silo. Then go to the programs menu and under LinSilo you’ll find shortcuts for RStudio Server.
You can also run R in JupyterLab.
3.1.2 Running R in Batch Mode
To run an R job on Linstat briefly so you can monitor what resources it requires, go to the Terminal tab in RStudio Server and type:
R CMD BATCH --no-save my_R_script.R &
where my_R_script.R should be replaced by the name of your R script.
3.1.3 Submitting R Jobs to Slurm
To submit an R job to Slurm, go to the Terminal tab in RStudio Server and type:
ssubmit --cores=C --mem=Mg "R CMD BATCH --no-save my_R_script.R"
where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_R_script.R should be replaced by the name of your R script.
Slurm Assistant can craft this command for you.
You may also want to read Converting R Loops to Parallel Loops Using Slurm (The Easy Way).
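As a filled-in example, the values below are hypothetical; substitute the cores, memory, and script name for your own job:

```shell
# Hypothetical values: 4 cores, 8 GB of memory, and a placeholder script name.
ssubmit --cores=4 --mem=8g "R CMD BATCH --no-save my_R_script.R"
```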
3.1.4 Installing Packages
The SSCC installs a large number of the most popular packages for you. You can install additional packages using the standard install.packages() function. Do so in an interactive session using RStudio Server (or the R command line). You only need to install a package once, so do not include calls to install.packages() in your research scripts. Attempts to install packages in batch mode or in jobs submitted to Slurm will usually fail. Note that if you installed packages on Winstat you’ll need to install them again on the Linux file system, but packages installed using Linstat or LinSilo will be available in Slurm or SlurmSilo respectively.
In Silo, you can only install packages directly from CRAN and Bioconductor. If you need to install packages from other locations, often you can install them on Linstat and then move them into the corresponding location on LinSilo using Globus. Otherwise, contact the Help Desk for assistance.
The SSCC updates R before each semester (in August and January). You will need to reinstall any packages you install after each update, to ensure you have a version that is compatible with the current version of R. If you use a lot of additional packages, consider creating a script you can run once a semester that installs them all.
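One way to sketch such a once-a-semester script is a single Rscript call run from the Linstat command line. The package names here are placeholders; if a non-interactive install fails, paste the install.packages() call into the RStudio Server console instead.

```shell
# Reinstall your packages after the semester's R update.
# The package names are placeholders; list the ones you actually use.
Rscript -e 'install.packages(c("data.table", "ggplot2", "lme4"))'
```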
3.1.5 Computing Resources
Base R uses just one core, though packages exist that allow it to use multiple cores for parallel processing. Sometimes packages build in parallel processing functionality without making that clear; running a test job and monitoring it will tell you if that’s the case. But you should assume R jobs will only need one core unless you have reason to believe otherwise.
3.1.6 What to Read in Linux Essentials
Read the section on specifying file locations.
3.2 Stata
3.2.1 Running Stata Interactively
To run Stata interactively on Linstat, log into the server and type:
xstata
This will give the same graphical user interface as in Windows or on a Mac.
The interface can be somewhat sensitive to network lag. If you find the user interface is slow to respond you may want to write most of your code on Winstat or your own computer and just log in to Linstat to run it. You can also use JupyterLab to do interactive work in Stata.
3.2.2 Running Stata in Batch Mode
To run a Stata job on Linstat briefly so you can monitor what resources it requires, log in and type:
stata -b do my_do_file &
where my_do_file should be replaced by the name of your do file (including .do in the name is optional).
3.2.3 Submitting Stata Jobs to Slurm
To submit a Stata job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "stata -b do my_do_file"
where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_do_file should be replaced by the name of your do file.
Slurm Assistant can craft this command for you.
You may also want to read Converting Stata Loops to Parallel Loops Using Slurm (The Easy Way).
3.2.4 Installing Packages
You can install Stata packages with the standard ssc install or net commands. Note that if you installed packages on Winstat you’ll need to install them again, but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo.
In Silo, you can only install packages directly from the Statistical Software Components archive (SSC, not related to the SSCC). If you need to install packages from other locations, often you can install them on Linstat and then move them into the corresponding location on LinSilo using Globus. Otherwise, contact the Help Desk for assistance.
3.2.5 Computing Resources
The SSCC has 64-core Stata MP installed on all our Linux servers. Stata MP automatically parallelizes everything it can, but not all Stata commands can be parallelized. Start out by reserving 64 cores, but pay attention to the email you’ll get when your job finishes. If it tells you your job did not use very much of the CPU time it reserved you can reduce the number of cores you use for similar jobs in the future.
Note that only one server in SlurmSilo has 64 cores, so you may want to only reserve 44 cores so your job can run on other servers.
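Putting these two points together, a SlurmSilo submission that can run on any server might look like the following; the memory value and do file name are placeholders:

```shell
# 44 cores so the job is not limited to the single 64-core SlurmSilo server;
# the memory value and do file name are placeholders.
ssubmit --cores=44 --mem=32g "stata -b do my_do_file"
```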
3.2.6 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.3 Python
3.3.1 Running Python Interactively
To run Python interactively, we suggest you use JupyterLab. You can run Spyder or PyCharm on Linstat and LinSilo, but they are very sensitive to network lag and performance is generally poor. You can use them or VS Code on Winstat/WinSilo to write Python scripts you will run on Linstat/LinSilo or submit to Slurm/SlurmSilo.
A Python job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.
3.3.2 Running Python in Batch Mode
To run a Python job on Linstat briefly so you can monitor what resources it requires, log in and type:
python my_python_script.py &
where my_python_script.py should be replaced by the name of your Python script.
3.3.3 Submitting Python Jobs to Slurm
To submit a Python job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "python my_python_script.py"
where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_python_script.py should be replaced by the name of your Python script.
Slurm Assistant can craft this command for you.
You may also want to read Converting Python Loops to Parallel Loops Using Slurm (The Easy Way).
3.3.4 Installing packages
To install a Python package, run:
pip install --user package_name
where package_name should be replaced by the name of the package you want to install. Note that if you installed packages on Winstat you’ll need to install them again, but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo.
The SSCC installs the Anaconda distribution of Python, which includes a large number of useful packages. You can upgrade an existing package by adding the --upgrade switch.
In Silo, you can only install packages directly from the PyPI archive. If you need to install packages from other locations, often you can install them on Linstat and then move them into the corresponding location on LinSilo using Globus. Otherwise, contact the Help Desk for assistance.
By default pip will install packages from the SSCC’s local mirror of PyPI, but the mirror does not contain older versions of packages. If you need an older version (notably, as of this writing tensorflow requires an older version of gast) add -i https://pypi.org/simple to your pip command before the name of the package to install.
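For example (the package name and version are illustrative, not a recommendation):

```shell
# Bypass the SSCC's local mirror and install a specific older version from PyPI.
pip install --user -i https://pypi.org/simple gast==0.2.2
```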
3.3.5 Computing Resources
While Python itself can only use a single core, many Python functions have components written in C/C++ that can use multiple cores. You’ll need to do a test run and monitor your job’s actual resource usage to determine what to reserve.
3.3.6 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.4 Julia
3.4.1 Running Julia Interactively
To run Julia interactively, use JupyterLab or type julia to get the Julia REPL (read-eval-print loop, also known as the command line).
A Julia job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.
3.4.2 Running Julia in Batch Mode
To run a Julia job on Linstat briefly so you can monitor what resources it requires, log in and type:
julia my_julia_script.jl &
where my_julia_script.jl should be replaced by the name of your Julia script.
3.4.3 Submitting Julia Jobs to Slurm
To submit a Julia job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "julia my_julia_script.jl"
where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_julia_script.jl should be replaced by the name of your Julia script.
Slurm Assistant can craft this command for you.
3.4.4 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.5 Matlab
3.5.1 Running Matlab Interactively
To run Matlab interactively on Linstat, log into the server and type:
matlab
This will give the same graphical user interface as in Windows or on a Mac.
The interface can be somewhat sensitive to network lag. If you’re seeing poor performance, you can get a web-based version of the Matlab user interface by typing:
matlab-proxy-app
You’ll be given a URL that you can copy into a browser on your own computer. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN.
Alternatively you can run the same application via JupyterLab. (Matlab run through JupyterLab does not let you write Jupyter Notebooks; it just runs the web version of Matlab.)
The Matlab web interface has a button that allows you to shut down the program, but you’ll also need to go back to your Linux session and press Ctrl-c to fully shut down the application on the server.
3.5.2 Running Matlab in Batch Mode
To run a Matlab job on Linstat briefly so you can monitor what resources it requires, log in and type:
matlab -nodisplay < my_matlab_script.m &
where my_matlab_script.m should be replaced by the name of your Matlab script.
3.5.3 Submitting Matlab Jobs to Slurm
To submit a Matlab job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "matlab -nodisplay < my_matlab_script.m"
where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_matlab_script.m should be replaced by the name of your Matlab script.
Slurm Assistant can craft this command for you.
3.5.4 What to Read in Linux Essentials
You’ll need to read the entire chapter.
3.6 SAS
3.6.1 Running SAS Interactively
SAS Studio allows you to run SAS interactively on Linstat or LinSilo. The user interface is not the same as Windows SAS, but it’s easy to use. The interface runs in a web browser on your computer or on WinSilo, making it very responsive, while the computation happens on the server.
To use SAS Studio on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:
https://ssc.wisc.edu/sas-studio
To run SAS Studio on LinSilo (or LinSiloBig) first log into WinSilo as usual and then use the shortcut under LinSilo in the Programs menu.
You can also run SAS in JupyterLab on Linstat if you want to use Jupyter Notebooks.
3.6.2 Running SAS in Batch Mode
To run a test SAS job on Linstat so you can monitor what resources it requires, log in and type:
sas my_sas_program.sas &
where my_sas_program.sas should be replaced by the name of your SAS program.
3.6.3 Submitting SAS Jobs to Slurm
To submit a SAS job to Slurm, log in and type:
ssubmit --cores=C --mem=Mg "sas my_sas_program.sas"
where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_sas_program.sas should be replaced by the name of your SAS program.
Slurm Assistant can craft this command for you.
3.6.4 Computing Resources
By default SAS uses just four cores and 2GB of memory. You can tell SAS to use as much memory as it needs (or is available) by adding the -memsize 0 switch to the sas command.
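For example, the batch command from earlier with the memory cap removed (my_sas_program.sas is a placeholder):

```shell
# -memsize 0 lets SAS use as much memory as it needs (or as is available).
sas -memsize 0 my_sas_program.sas &
```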
Unlike almost all other statistical software, SAS does not load the data it uses into memory. Instead, it loads one observation from the input data set, processes it, and then writes it to the output data set. That makes it highly dependent on disk I/O. Modern servers have dozens of cores but just one local disk and one or two network connections to the file server, so running many SAS jobs on the same server will lead to poor performance.
SAS stores temporary data sets on the server’s local disk and permanent data sets on the network file system. Local disk is much faster, but space is limited. You may need to make very large data sets you don’t plan to keep “permanent” just so they’re stored on the network file system and then erase them when you’re done with them. In practice, Linux uses memory as a buffer for both local and network disk, and small to medium data sets may always be available in the buffer and thus very fast.
3.6.5 What to Read in Linux Essentials
If you only plan to use SAS Studio you only need to read the section on specifying file locations. Otherwise you should read the entire chapter.
3.7 JupyterLab
JupyterLab is a programming environment for working with Jupyter Notebooks, which can contain text, code, and the results of running that code in a single convenient file. We have installed “kernels” on Linstat that allow you to write Jupyter Notebooks in Python, Julia, R, Stata, and SAS. JupyterLab is the preferred method for working interactively on Linstat and LinSilo in Python and Julia. Stata, SAS, and R users may be interested in JupyterLab for the Jupyter Notebooks, though RStudio has its own version of Notebooks.
JupyterLab’s user interface runs in a web browser on your computer, making it very responsive, while the code is run on the server. (If you accidentally open a web browser on the server it will be extremely unresponsive, and you should quit and start over.)
JupyterLab also includes a terminal so you can run commands on the server, a basic text editor, and viewers for some common file types like CSV files and images.
You must be on SSCC’s network to use JupyterLab on Linstat. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. To use JupyterLab on LinSilo you must first log into Silo.
3.7.1 Running JupyterLab
To use JupyterLab, first log into the server you want to use. Then you must set the working directory to the location of the files you want to work with, or a directory above them, using the cd (change directory) command. We’ll discuss how to do so in the next chapter. JupyterLab can only see files and directories that are underneath the directory it starts in. Then type:
sscc-jupyter
You’ll then see a web address (twice) that you need to copy. Be sure to copy the entire address and nothing but the address. If you’re using X-Win32 you can right-click on the first address and choose Copy Link Address. Paste that into a browser on your computer.
When you’re done using JupyterLab, click File, Shut Down so it shuts down the JupyterLab process on the server. Then you can close the browser tab on your computer.
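A typical session on Linstat might look like this, where my_project is a placeholder for your own project directory:

```shell
# Start JupyterLab from the directory containing your files;
# it can only see files and directories underneath where it starts.
cd my_project
sscc-jupyter
```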
3.7.2 Converting Jupyter Notebooks to Scripts
Before you can submit a job written in a Jupyter Notebook to Slurm you must convert it to a script. You can do so with the following Linux command:
jupyter nbconvert --to script my_notebook.ipynb
where my_notebook.ipynb should be replaced by the name of the notebook you want to convert. You can use wildcards to convert multiple notebooks, including *.ipynb to convert all the notebooks in the current directory.
The script will contain all the code in the notebook plus any markdown text converted into comments.
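For example, converting a notebook and then submitting the resulting script; the resource values and file names here are placeholders:

```shell
# nbconvert writes my_notebook.py alongside the notebook.
jupyter nbconvert --to script my_notebook.ipynb
ssubmit --cores=1 --mem=4g "python my_notebook.py"
```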
3.7.3 What to Read in Linux Essentials
You’ll need to read the entire chapter.