2  Computing Resources

In this chapter we’ll briefly introduce the different computing resources the SSCC makes available, talk about the kinds of jobs that should be run on each, and then discuss how to learn to use them. Server specifications are at the end of the chapter for those who are interested in the details.

2.1 What SSCC Makes Available

The SSCC has two computing environments: a regular computing environment, and a computing environment called Silo with special security controls for working with very sensitive data. The two environments have the same kinds of resources available; the Silo version of each has “Silo” in its name.

2.1.1 Winstat/WinSilo

Winstat and WinSilo are the SSCC’s general-purpose Windows computing clusters. They consist of Windows Remote Desktop Servers that you can log into from any location to run a wide variety of statistical and other software with full access to SSCC storage.

Winstat is intended for interactive work and running short jobs. Its computing power is comparable to that of a good laptop. Winstat sessions are automatically closed after 24 hours.

WinSilo is used for both interactive work and bigger jobs; its capabilities are comparable to Winstat for Big Jobs.

2.1.2 Winstat for Big Jobs/WinSilo

On Winstat for Big Jobs or WinSilo you can use up to 24 cores and 128GB of memory in a familiar Windows environment. You can also start a job and then disconnect from your session, and your job will keep running for as long as it needs to. (Keep an eye on the scheduled monthly downtime; the date of the next downtime is on the desktop background.)

Be sure to sign out of Winstat for Big Jobs or WinSilo when you’re not using it. Idle sessions continue to use resources, and enough of them will slow down the server for everyone.

2.1.3 Linstat/LinSilo

Linstat and LinSilo are the SSCC’s clusters of interactive Linux servers. Two of the Linstat servers (Linstat1-2) have 36 cores each; the other two (Linstat7-8) have 48 cores and a small T4 GPU each. On any Linstat server you can use up to 500GB of memory. The LinSilo servers have 44 cores each and you can use up to 250GB of memory, while LinSiloBig has 80 cores and you can use up to 500GB of memory.

You can log into Linstat or LinSilo directly, or use web-based interfaces like JupyterLab, RStudio Server, and SAS Server. (To use LinSilo you first log into Silo and connect from there.) Linstat and LinSilo are meant for interactive work and short jobs. You’ll also use them to develop code you’ll submit to Slurm.

2.1.4 Slurm/SlurmSilo

Slurm is a powerful system for scheduling and managing computing jobs. The SSCC has Slurm clusters in both our regular environment (Slurm) and in Silo (SlurmSilo). You can submit jobs to Slurm from Linstat and to SlurmSilo from LinSilo. When you submit your job you’ll tell Slurm how many cores and how much memory it needs, and you’ll get exclusive use of those resources while your job is running.
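In practice, a Slurm job is a shell script with `#SBATCH` directives at the top declaring the resources it needs. As a minimal sketch (the job name, resource numbers, and analysis script are placeholders for illustration; the SSCC cluster may have its own defaults and partition names):

```shell
#!/bin/bash
# Minimal Slurm submission script. The job name, resource numbers,
# and the R script below are placeholders for illustration only.
#SBATCH --job-name=my-analysis
#SBATCH --cpus-per-task=8    # cores to reserve
#SBATCH --mem=64G            # memory to reserve
#SBATCH --time=24:00:00      # upper bound on run time

# Everything below runs with exclusive use of the reserved resources.
Rscript my_analysis.R
```

You would submit this with `sbatch myjob.sh` and check on it with `squeue -u $USER`, both standard Slurm commands.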

Slurm consists of more than 40 servers with a total of over 5,000 cores. Most of the servers have 128 cores each, and some of them have 1TB (1,024GB) of memory. Two of the servers have a pair of powerful A100 GPUs each: on one server each GPU has 80GB of memory, and on the other each has 40GB.

SlurmSilo consists of seven servers with a total of 364 cores. Five are identical to LinSilo and one is identical to LinSiloBig. The last server has 64 cores, 512GB of memory, and two powerful L40S GPUs.

For more details about the Slurm servers, see the Cluster Specifications.

2.2 What Resources Should I Use?

To figure out what resources you should use for your work, start by asking yourself why you don’t just do it on your laptop:

2.2.1 My laptop is too slow

If your laptop is just generally old and slow (we understand grad student budgets) you can log into Winstat for better performance.

If the reason it’s slow is that your project involves big data sets or computationally intensive methods, Linstat will be faster. If what you’re doing can take advantage of multiple cores, it will be much faster.

2.2.2 I need more memory

  • On Winstat for Big Jobs you can use up to 128GB of memory, depending on how much other people are using.

  • On Linstat you can use up to 500GB of memory, depending on how much other people are using.

  • On Slurm you can reserve up to 1,000GB of memory, and you’ll get it for sure.

  • On WinSilo you can use up to 128GB of memory, depending on how much other people are using.

  • On LinSilo you can use up to 250GB of memory and on LinSiloBig up to 500GB, depending on how much other people are using.

  • On SlurmSilo you can reserve up to 750GB of memory, and you’ll get it for sure.

2.2.3 I need to run long jobs

If you need to run long jobs, you don’t want to have to stay connected to the server the whole time. (We can define a “long job” as one that’s still running when you’re ready to log off and do something else.)

If your long job is not very computationally intensive, you can start it on Winstat for Big Jobs and then disconnect from the session, and the job will keep running.

If your long job is computationally intensive, use Slurm.

2.2.4 My job needs a lot of cores

You can use up to 48 cores on Linstat, depending on how many cores other people are using.

You can use up to 128 cores on a single server in Slurm, or 128 cores per server if your job can use multiple servers.

You can use up to 80 cores on a single server in SlurmSilo, or 44 cores per server if your job can use multiple servers.

2.2.5 I need to run a bunch of jobs

This is what Slurm was made for. Submit them all to Slurm and it will try to run them all at the same time on as many servers as they need. If it can’t run them all, the excess jobs will wait in the queue and then run as soon as other jobs finish.
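One common way to submit a batch of similar jobs is a Slurm job array, which runs many copies of one script, each with its own task ID. A sketch under assumed file names and numbers (all placeholders):

```shell
#!/bin/bash
# Run the same analysis on 50 input files as a Slurm job array.
# The script name, file layout, and resource numbers are placeholders.
#SBATCH --array=1-50
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

# Slurm sets SLURM_ARRAY_TASK_ID (here 1..50) for each task;
# use it to pick that task's input file.
Rscript my_analysis.R data/input_${SLURM_ARRAY_TASK_ID}.csv
```

Slurm runs as many of the 50 tasks at once as resources allow and queues the rest.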

2.2.6 I need a GPU

The T4 GPUs in Linstat7 and Linstat8 are old and don’t have a lot of memory, but they can still dramatically speed up jobs that benefit from a GPU. Use them for small GPU jobs or to test and develop GPU code before submitting it to Slurm.

Use the GPUs in Slurm or SlurmSilo for larger machine learning tasks or anything else that can benefit from GPUs.

2.2.7 I need to run jobs that use multiple servers

Use Slurm for jobs that parallelize their work across multiple servers rather than just the cores in one server; it was made for this too.
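In Slurm terms, a multi-server job requests multiple nodes and launches its tasks with `srun`. A sketch under the assumption that your program can coordinate across servers (for example via MPI); the numbers and program name are placeholders:

```shell
#!/bin/bash
# Request 4 servers (nodes) running 32 tasks each, 128 tasks total.
# The program itself must be written to parallelize across nodes,
# e.g. with MPI; the name and numbers here are placeholders.
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32

srun ./my_multi_node_program
```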

2.2.8 I need to run big jobs, but I have an instructional account

The Social Science Computing Cooperative has a research mission and an instructional support mission, and some SSCC accounts are meant for class work rather than research. These accounts do not have access to Linstat, Slurm, or Silo, so you’re limited to Winstat for Big Jobs.

If you do not have a Z: drive (Linux home directory) when you log into Winstat, you have an instructional account. That’s probably because:

  • You were given an SSCC account for a specific class.

  • You are a graduate student in a department that is in the Social Science Division of L&S but is not a member of the Cooperative, like Political Science or the LaFollette Institute.

  • You are a graduate student in the Department of Economics who has not yet had a need for the Linux servers.

Graduate students in Economics can request an SSCC account meant for research with the approval of an account sponsor. This can be any faculty member in Economics. We suggest speaking with them before filling out the form.

Researchers in other departments can join the SSCC by becoming Individual Members of the Cooperative.

2.3 Learning to Use the Resources You Need

Now that you’ve identified the computing resources you need to use, this section will walk you through what you’ll want to read to learn to use them.

2.3.1 Winstat for Big Jobs/WinSilo

Using Winstat will show you how to log into both Winstat and Winstat for Big Jobs. Once you’re in, it’s a familiar Windows environment. Just keep in mind that Winstat for Big Jobs has a fraction of the computing power available on Linstat or through Slurm. If you need to run more than a handful of jobs that are big enough to need Winstat for Big Jobs, it’s probably worth learning how to send them to Slurm.

Using Silo will teach you how to log in to WinSilo. Once you’re in, it’s the same as Winstat for Big Jobs except for the security measures needed for sensitive data.

If you won’t use Linux at all, you don’t need to read anything else in this book.

2.3.2 Linstat/LinSilo

First go to the Program Information chapter and read the section for the program(s) you want to run (R, Stata, Python, etc.). Since you won’t be using Slurm you can skip the parts that talk about it.

Then go to the Linux Essentials chapter. You’ll need to read about how to specify file locations in Linux. The Program Information chapter will tell you if you also need to read about how to log into Linstat directly.

2.3.3 Slurm/SlurmSilo

First go to the Program Information chapter and read the section for the program(s) you want to run (R, Stata, Python, etc.). If you’re sure you won’t be running anything interactively you can skip the parts that talk about interactive work.

Then read the Linux Essentials chapter, which will teach you how to specify file locations in Linux and how to log into Linstat or LinSilo so you can submit jobs from there to Slurm or SlurmSilo. (RStudio Server users can skip the part about logging in and use the terminal it provides.)

Finally, read the Slurm chapter to learn how to submit jobs to Slurm.

2.4 Server Specifications

The following list describes the SSCC’s research computing servers. For a discussion of how they’re intended to be used and how to learn to use them, start at the beginning of this chapter. Servers with “Silo” in their name are in the SSCC’s special secure computing environment for working with very sensitive data (Silo).

2.4.1 Winstat

The Winstat servers are virtual machines running Windows with 4 Intel cores. Each user may use up to 17GB of memory. Both the cores and memory are shared with other users.

2.4.2 Winstat for Big Jobs & WinSilo

Winstat for Big Jobs and WinSilo are virtual machines running Windows with 24 Intel cores. Each user may use up to 128GB of memory. Both the cores and memory are shared with other users.

2.4.3 Linstat

Linstat consists of four servers running Linux. Two of them (Linstat7-8) have 48 Intel cores and an NVidia T4 GPU. Two of them (Linstat1-2) have 36 Intel cores (and no GPU). Each user may use up to 500GB of memory. Both the cores and memory are shared with other users.

2.4.4 LinSilo

LinSilo consists of two servers running Linux with 44 Intel cores. Each user may use up to 250GB of memory. Both the cores and memory are shared with other users. SMPH researchers have priority on these servers.

2.4.5 LinSiloBig

LinSiloBig consists of one server running Linux with 80 Intel cores. Each user may use up to 500GB of memory. Both the cores and memory are shared with other users. SMPH researchers have priority on this server.

2.4.6 Slurm & SlurmSilo

For information on the SSCC’s Slurm clusters, see the Cluster Specifications section of the Slurm chapter.

2.4.7 Downtime

All of the SSCC’s servers except for Slurm and SlurmSilo are taken offline, patched, and rebooted from 6:00AM-8:00AM on the Wednesday after the third Tuesday of the month (the week after Microsoft’s “Patch Tuesday”). This will interrupt any jobs that are running on the servers at the time. On Winstat and WinSilo, the date of the next downtime is given in the background of the desktop.

Slurm and SlurmSilo servers are patched and rebooted between jobs so no jobs are interrupted. You may occasionally see a server in a “draining” state, meaning that it is finishing its current jobs but not taking new ones so it can be patched and rebooted once they’re done. This is one of the reasons long jobs should be sent to Slurm.