2  Program Information

In this chapter we’ll discuss how to run the most popular programs at the SSCC.

This does not cover everything that’s installed on the servers: see our software database and the additional list of Biomedical Research Software on Silo. If you plan to use other software we presume you know how to run it, but you will want to read the Linux chapter and the Slurm chapter if you will submit jobs to Slurm.

You are welcome to install additional software on the SSCC’s Linux servers if you can install it in your home directory without needing to use sudo. Otherwise, contact the Help Desk for assistance.

Note that JupyterLab has its own section, though it’s used to run Python, Julia, R, Stata, or SAS interactively.

2.1 R

2.1.1 Running R Interactively

RStudio Server allows you to run R interactively on Linstat or LinSilo (including LinSiloBig) with the same user interface you’re used to in Windows or on a Mac. The user interface runs in a web browser on your computer, making it very responsive, while the computation happens on the server.

To use RStudio Server on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:

https://ssc.wisc.edu/rstudio-server

To use RStudio Server on LinSilo or LinSiloBig you must first log into Silo. Then go to the programs menu and under LinSilo you’ll find shortcuts for RStudio Server.

You can also run R in JupyterLab.

2.1.2 Running R in Batch Mode

To run an R job on Linstat briefly so you can monitor what resources it requires, go to the Terminal tab in RStudio Server and type:

R CMD BATCH --no-save my_R_script.R &

where my_R_script.R should be replaced by the name of your R script.
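For example, with a hypothetical script named analysis.R, you can start the job in the background and then watch the log file, since R CMD BATCH writes output and errors to a file named after the script with .Rout appended:

```shell
# Run analysis.R (a hypothetical script name) in the background.
R CMD BATCH --no-save analysis.R &

# R CMD BATCH writes output and errors to analysis.Rout;
# follow it as the job runs (press Ctrl-c to stop watching):
tail -f analysis.Rout
```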

2.1.3 Submitting R Jobs to Slurm

To submit an R job to Slurm, go to the Terminal tab in RStudio Server and type:

ssubmit --cores=C --mem=Mg "R CMD BATCH --no-save my_R_script.R"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_R_script.R should be replaced by the name of your R script.
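As a concrete sketch, suppose your script is named analysis.R (a hypothetical name) and a test run showed it uses 8 cores and about 16GB of memory:

```shell
# Hypothetical values: 8 cores, 16GB of memory, script analysis.R.
ssubmit --cores=8 --mem=16g "R CMD BATCH --no-save analysis.R"
```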

You may also want to read Converting R Loops to Parallel Loops Using Slurm (The Easy Way).

2.1.4 Installing Packages

You can install packages using the standard install.packages() function. Do so in an interactive session using RStudio Server (or the R command line if you prefer). You only need to install a package once, so do not include calls to install.packages() in your research scripts. Attempts to install packages in batch mode or in jobs submitted to Slurm will usually fail. Note that if you installed packages on Winstat you’ll need to install them again on the Linux file system, but packages installed using Linstat or LinSilo will be available in Slurm or SlurmSilo respectively.

In Silo, you can only install packages directly from CRAN and Bioconductor. If you need to install packages from other locations, often you can install them on Linstat and then move them into the corresponding location on LinSilo using Globus. Otherwise, contact the Help Desk for assistance.

The SSCC updates R before each semester (in August and January). You will need to reinstall any packages you install after each update, to ensure you have a version that is compatible with the current version of R.

2.1.5 Computing Resources

Base R uses just one core, though packages exist that allow it to use multiple cores for parallel processing. Sometimes packages will build in parallel processing functionality without making that clear; running a test job and monitoring it will tell you if that’s the case. But you should assume R jobs will only need one core unless you have reason to believe otherwise.
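One way to monitor a test job you started in the background on the same server is the standard top utility:

```shell
# Show your own processes, sorted by CPU usage (press q to quit).
# Roughly 100% CPU for your R process means it is using one core;
# several hundred percent means a package is running in parallel.
top -u $USER
```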

2.1.6 What to Read in Linux Essentials

Read the section on specifying file locations.

2.2 Stata

2.2.1 Running Stata Interactively

To run Stata interactively on Linstat, log into the server and type:

xstata

This will give the same graphical user interface as in Windows or on a Mac.

The interface can be somewhat sensitive to network lag. If you find the user interface is slow to respond you may want to write most of your code on Winstat or your own computer and just log in to Linstat to run it, or use JupyterLab.

2.2.2 Running Stata in Batch Mode

To run a Stata job on Linstat briefly so you can monitor what resources it requires, log in and type:

stata -b do my_do_file &

where my_do_file should be replaced by the name of your do file (including .do in the name is optional).

2.2.3 Submitting Stata Jobs to Slurm

To submit a Stata job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "stata -b do my_do_file"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_do_file should be replaced by the name of your do file.

You may also want to read Converting Stata Loops to Parallel Loops Using Slurm (The Easy Way).

2.2.4 Installing Packages

You can install Stata packages with the standard ssc install or net commands. Note that if you installed packages on Winstat you’ll need to install them again, but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo.

In Silo, you can only install packages directly from the Statistical Software Components archive (SSC, not related to the SSCC). If you need to install packages from other locations, often you can install them on Linstat and then move them into the corresponding location on LinSilo using Globus. Otherwise, contact the Help Desk for assistance.

2.2.5 Computing Resources

The SSCC has 64-core Stata MP installed on all our Linux servers. Stata MP automatically parallelizes everything it can, but not all Stata commands can be parallelized. Start out by reserving 64 cores, but pay attention to the email you’ll get when your job finishes. If it tells you your job did not use very much of the CPU time it reserved you can reduce the number of cores you use for similar jobs in the future.

Note that only one server in SlurmSilo has 64 cores, so you may want to reserve only 44 cores so your job can run on the other servers as well.
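Putting the pieces together, a hypothetical do file named model.do that needs to be able to run anywhere in SlurmSilo might be submitted with:

```shell
# 44 cores so the job is not restricted to the one 64-core server;
# model.do and 40GB of memory are hypothetical values.
ssubmit --cores=44 --mem=40g "stata -b do model"
```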

2.2.6 What to Read in Linux Essentials

You’ll need to read the entire chapter.

2.3 Python

2.3.1 Running Python Interactively

To run Python interactively, we suggest you use JupyterLab. You can run Spyder or PyCharm on Linstat and LinSilo, but they are very sensitive to network lag and performance is generally poor. You can use them on Winstat/WinSilo to write Python scripts you will run on Linstat/LinSilo or submit to Slurm/SlurmSilo.

A Python job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.

2.3.2 Running Python in Batch Mode

To run a Python job on Linstat briefly so you can monitor what resources it requires, log in and type:

python my_python_script.py &

where my_python_script.py should be replaced by the name of your Python script.

2.3.3 Submitting Python Jobs to Slurm

To submit a Python job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "python my_python_script.py"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_python_script.py should be replaced by the name of your Python script.

You may also want to read Converting Python Loops to Parallel Loops Using Slurm (The Easy Way).

2.3.4 Installing Packages

To install a Python package, run:

pip install --user package_name

where package_name should be replaced by the name of the package you want to install. Note that if you installed packages on Winstat you’ll need to install them again, but packages installed using Linstat or LinSilo will also be available in Slurm or SlurmSilo.

The SSCC installs the Anaconda distribution of Python, which includes a large number of useful packages. You can upgrade an existing package by adding the --upgrade switch.
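For example, to install or later upgrade a single package in your home directory (pandas is used here purely as an example package name):

```shell
# Install into your home directory (no sudo needed):
pip install --user pandas

# Upgrade it later:
pip install --user --upgrade pandas
```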

2.3.5 Computing Resources

While Python itself can only use a single core, many Python functions have components written in C/C++ that can use multiple cores. You’ll need to do a test run and monitor your job’s actual resource usage to determine what to reserve.

2.3.6 What to Read in Linux Essentials

You’ll need to read the entire chapter.

2.4 Julia

2.4.1 Running Julia Interactively

To run Julia interactively, use JupyterLab.

A Julia job written as a Jupyter Notebook needs to be converted to a script before you can submit it to Slurm. See Converting Jupyter Notebooks to Scripts.

2.4.2 Running Julia in Batch Mode

To run a Julia job on Linstat briefly so you can monitor what resources it requires, log in and type:

julia my_julia_script.jl &

where my_julia_script.jl should be replaced by the name of your Julia script.

2.4.3 Submitting Julia Jobs to Slurm

To submit a Julia job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "julia my_julia_script.jl"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_julia_script.jl should be replaced by the name of your Julia script.

2.4.4 What to Read in Linux Essentials

You’ll need to read the entire chapter.

2.5 Matlab

2.5.1 Running Matlab Interactively

To run Matlab interactively on Linstat, log into the server and type:

matlab

This will give the same graphical user interface as in Windows or on a Mac.

The interface can be somewhat sensitive to network lag. If you’re seeing poor performance, you can get a web-based version of the Matlab user interface by typing:

matlab-proxy-app

You’ll be given a URL that you can copy into a browser on your own computer. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN.

Alternatively you can run the same application via JupyterLab. (Matlab run through JupyterLab does not let you write Jupyter Notebooks; it just runs the web version of Matlab.)

The Matlab web interface has a button that allows you to shut down the program, but you’ll also need to go back to your Linux session and press Ctrl-c to fully shut down the application on the server.

2.5.2 Running Matlab in Batch Mode

To run a Matlab job on Linstat briefly so you can monitor what resources it requires, log in and type:

matlab -nodisplay < my_matlab_script.m &

where my_matlab_script.m should be replaced by the name of your Matlab script.
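Since Matlab reads the script from standard input here, you can also capture its output in a log file with ordinary shell redirection (run_model.m and run_model.log are hypothetical names):

```shell
# Run the script in the background, saving output to a log file.
matlab -nodisplay < run_model.m > run_model.log &

# Watch the log as the job runs (press Ctrl-c to stop watching):
tail -f run_model.log
```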

2.5.3 Submitting Matlab Jobs to Slurm

To submit a Matlab job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "matlab -nodisplay < my_matlab_script.m"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_matlab_script.m should be replaced by the name of your Matlab script.

2.5.4 What to Read in Linux Essentials

You’ll need to read the entire chapter.

2.6 SAS

2.6.1 Running SAS Interactively

SAS Studio allows you to run SAS interactively on Linstat or LinSilo. The user interface is not the same as Windows SAS, but it’s easy to use. The interface runs in a web browser on your computer or on WinSilo, making it very responsive, while the computation happens on the server.

To use SAS Studio on Linstat you must be on SSCC’s network. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. Then open a web browser and go to:

https://ssc.wisc.edu/sas-studio

To run SAS Studio on LinSilo (or LinSiloBig) first log into WinSilo as usual and then use the shortcut under LinSilo in the Programs menu.

You can also run SAS in JupyterLab on Linstat if you want to use Jupyter Notebooks.

2.6.2 Running SAS in Batch Mode

To run a test SAS job on Linstat so you can monitor what resources it requires, log in and type:

sas my_sas_program.sas &

where my_sas_program.sas should be replaced by the name of your SAS program.

2.6.3 Submitting SAS Jobs to Slurm

To submit a SAS job to Slurm, log in and type:

ssubmit --cores=C --mem=Mg "sas my_sas_program.sas"

where C should be replaced by the number of cores your job will use, M should be replaced by the number of gigabytes of memory your job will use, and my_sas_program.sas should be replaced by the name of your SAS program.

2.6.4 Computing Resources

By default SAS uses just four cores and 2GB of memory. You can tell SAS to use as much memory as it needs (or is available) by adding the -memsize 0 switch to the sas command.
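For example, to run a batch SAS job that can use as much memory as it needs (my_sas_program.sas as in the earlier examples):

```shell
# -memsize 0 removes the default 2GB memory limit.
sas -memsize 0 my_sas_program.sas &
```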

Unlike almost all other statistical software, SAS does not load the data it uses into memory. Instead, it loads one observation from the input data set, processes it, and then writes it to the output data set. That makes it highly dependent on disk I/O. Modern servers have dozens of cores but just one local disk and one or two network connections to the file server, so running many SAS jobs on the same server will lead to poor performance.

SAS stores temporary data sets on the server’s local disk and permanent data sets on the network file system. Local disk is much faster, but space is limited. You may need to make very large data sets you don’t plan to keep “permanent” and then erase them when you’re done with them. In practice, Linux uses memory as a buffer for both local and network disk, and small to medium data sets may always be available in the buffer and thus very fast.

2.6.5 What to Read in Linux Essentials

If you only plan to use SAS Studio you only need to read the section on specifying file locations. Otherwise you should read the entire chapter.

2.7 JupyterLab

JupyterLab is a programming environment for working with Jupyter Notebooks, which can contain text, code, and the results of running that code in a single convenient file. We have installed “kernels” on Linstat that allow you to write Jupyter Notebooks in Python, Julia, R, Stata, and SAS. JupyterLab is the preferred method for working interactively on Linstat and LinSilo in Python and Julia. Stata, SAS, and R users may be interested in JupyterLab for the Jupyter Notebooks, though RStudio has its own version of Notebooks.

JupyterLab’s user interface runs in a web browser on your computer, making it very responsive, while the code is run on the server. (If you accidentally open a web browser on the server it will be extremely unresponsive, and you should quit and start over.)

JupyterLab also includes a terminal so you can run commands on the server, a basic text editor, and viewers for some common file types like CSV files and images.

You must be on SSCC’s network to use JupyterLab on Linstat. If you are outside the Sewell Social Sciences Building, or on the wireless network in the building, you need to first connect to the SSCC network using VPN. To use JupyterLab on LinSilo you must first log into Silo.

2.7.1 Running JupyterLab

To use JupyterLab, first log into the server you want to use. Then set the working directory to the location of the files you want to work with, or a directory above them, using the cd (change directory) command (we’ll discuss how in the next chapter). JupyterLab can only see files and directories that are underneath the directory it starts in. Then type:

sscc-jupyter

You’ll then see a web address (twice) that you need to copy. Be sure to copy the entire address and nothing but the address. If you’re using X-Win32 you can right-click on the first address and choose Copy Link Address. Paste the address into a browser on your computer.

When you’re done using JupyterLab, click File, Shut Down so it shuts down the JupyterLab process on the server. Then you can close the browser tab on your computer.
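A typical session therefore starts like this (the project directory is a hypothetical example):

```shell
# Move to the directory containing your notebooks; JupyterLab
# will only be able to see files underneath it.
cd ~/projects/thesis

# Start JupyterLab and copy the address it prints into a browser.
sscc-jupyter
```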

2.7.2 Converting Jupyter Notebooks to Scripts

Before you can submit a job written in a Jupyter Notebook to Slurm you must convert it to a script. You can do so with the following Linux command:

jupyter nbconvert --to script my_notebook.ipynb

where my_notebook.ipynb should be replaced by the name of the notebook you want to convert. You can use wildcards to convert multiple notebooks, including *.ipynb to convert all the notebooks in the current directory.

The script will contain all the code in the notebook plus any markdown text converted into comments.
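For instance, converting a hypothetical Python notebook named analysis.ipynb produces analysis.py, which you can then submit to Slurm as described in the Python section:

```shell
# Convert analysis.ipynb (hypothetical name) to a script.
# A Python notebook produces analysis.py; Julia and R
# notebooks produce .jl and .r files respectively.
jupyter nbconvert --to script analysis.ipynb
```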

2.7.3 What to Read in Linux Essentials

You’ll need to read the entire chapter.