This workshop will introduce you to Stata and its core concepts. It has three goals:
First, to prepare you for the rest of the Stata track of SSCC's Data Science Tools for Research curriculum. All the other workshops will rely on the concepts and syntax you'll learn in this workshop.
Second, to prepare you to excel in research methods and applied statistics courses that use Stata. You'll go in already knowing how Stata works and why it does what it does, so you can focus on learning the material for the course.
Third, to teach you how to make your Stata work reproducible right from the beginning, so you never have to unlearn any bad habits.
There are two different approaches one can take to Stata. One is to use it as an interactive tool: you start Stata, load your data, and start typing or clicking on commands. This can be a good way to explore your data, figure out what you want to do, and check that your programs worked properly. It can also be useful when you're trying to learn something new because you get immediate feedback. However, interactive work cannot be easily or reliably reproduced, or modified if you change your mind. It's also very difficult to recover from mistakes—there's no "undo" command in Stata.
The other approach is to treat Stata as a programming language. In this approach you write your programs, called do files, and run them. A do file contains the same commands you'd type in interactive Stata, but since they're written in a permanent file they can be debugged or modified and then rerun at will. They also serve as an exact record of how you obtained your results—a lab notebook for the social scientist. Any work you intend to publish, present or rely on in any way should be done using do files. Thus this workshop will for the most part ignore Stata's graphical user interface and prepare you to write do files.
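As a minimal sketch of what such a file might look like, the example below loads a data set, runs two commands, and records everything in a log. The file names `example.do` and `example.log` are hypothetical, and opening a log at the top is a common convention rather than something Stata requires. The `auto` data set is an example data set installed with Stata.

```stata
* example.do: a minimal do file (hypothetical file name)
capture log close                       // close any log left open by an earlier run
log using example.log, text replace     // hypothetical log name; records commands and results

clear all                               // start from a clean slate
sysuse auto                             // load an example data set installed with Stata

summarize price mpg                     // descriptive statistics
regress price mpg weight                // a simple linear regression

log close                               // finish the log
```

If the file is saved in the working directory, typing `do example` in the Command window runs every command in order. Rerunning it after an edit reproduces the results from scratch.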
To get the most out of Introduction to Stata you need to be an active participant. Open Stata, and type in and run the example code yourself. This will help you retain more, and ensure you get all the details right—Stata is always happy to tell you when you're wrong. Do the exercises (some of them are straightforward applications of what you just learned; others will require more creativity). Using Stata is not something you read and understand—it's a skill you must practice.
Introduction to Stata includes the following sections:
- Using Stata
- Structure of a Stata Data Set
- Elements of Stata Syntax
- Do Files
- Creating and Changing Variables
Running Stata at the SSCC
The SSCC makes Stata available in our computer labs, on our Winstat, Linstat, and Condor servers, and in the Silo secure computing enclave. For details about the capabilities of the SSCC's servers see Computing Resources at the SSCC. UW-Madison students, faculty, and staff can also download Stata from the Campus Software Library and install it on their own computers.
Most people will get all the computing power they need from Stata/SE running on their laptop, a lab computer, or Winstat (our easy-to-use Windows-based servers), so we won't talk about how to use the more powerful servers. But here are some reasons why you might use them:
- Your Stata job is taking a long time. Stata/MP allows Stata to use multiple cores, which will speed up many processes dramatically. You can use Stata/MP on Winstat, and it's the default on Linstat and Condor. Linstat has more cores than Winstat. See Running Stata/MP at the SSCC for details.
- Your Stata job is still taking a long time and you don't want to babysit it. On Linstat you can start a Stata job "in the background" and it will keep running until it's done, even if you log out.
- Your Stata job needs more memory. Winstat has much more memory than almost all laptops, and Linstat has even more.
- You need to run a lot of Stata jobs. Condor allows you to submit Stata jobs to a pool of servers.
- You need to work with HIPAA data or other sensitive data. Silo is SSCC's secure computing enclave, combining secure data storage and secure servers that are very similar to Winstat and Linstat. You can connect to it from any location using your own computer, but all the data and computation stay in the secure environment.
To start Stata on Winstat or another Windows computer, type Stata in the search box next to the Windows Logo button, or click on the button and find Stata in the programs list. On Linstat, type xstata.
Stata's User Interface
When you start up Stata, the first thing you'll see is the main user interface window.
The big central section is the Results window, where you'll see the results of the commands you run. Under it is the Command window, where you'll type those commands (when you're not running a do file).
On the left is the History window, which contains a history of the commands you've run. Click once on a command to paste it back into the Command window for editing. Double-click on a command to run it again. You can also press Page Up when you're in the Command window to recall past commands. Right-click on a command or block of commands to copy it into the clipboard or send it to the Do File editor. This allows you to take something you've done interactively and turn it into part of a do file.
Beneath the History window Stata displays the working directory. This is where Stata will save files if you don't specify another location.
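The working directory can also be checked from the Command window. The sketch below assumes the working directory is writable. It saves a copy of an example data set without giving a path, confirms that the file landed in the working directory, and then deletes it. The file name `auto_copy` is made up for the illustration.

```stata
pwd                                     // display the current working directory
display "`c(pwd)'"                      // the same location, as stored by Stata

sysuse auto, clear                      // example data set installed with Stata
save auto_copy, replace                 // no path given, so it goes to the working directory
confirm file "`c(pwd)'/auto_copy.dta"   // raises an error if the file is not there
erase auto_copy.dta                     // clean up the example file
```

To work somewhere else, the `cd` command followed by a folder path changes the working directory.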
On the right is the Variables window, which contains a list of the variables in the current data set. Click once on a variable name to select it, and information about the variable will be shown in the Properties window on the bottom right. Click twice, and the variable name will be pasted into the Command window. You can also start typing a variable name in the Command window and press Tab, and Stata will either complete the variable name or give you a list of variables that match what you've typed so far.
The Properties window also shows properties of the data set. One to keep an eye on is the size, or how much memory it requires. Stata must load your entire data set into memory. Modern computers have so much memory that most Stata users never have to worry about it, but big data users must make sure they don't run out. If you try to use more memory than your computer has, the operating system will use disk space as memory and Stata will become so slow that it's practically unusable.
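The same information is available from commands, which is handy inside a do file. The sketch below uses an example data set installed with Stata. It computes an approximate size for the data as observations times bytes per observation, which is an estimate and may not match the Properties window exactly. It assumes a version of Stata recent enough to have the `memory` command.

```stata
sysuse auto, clear       // example data set installed with Stata
describe, short          // header only; also stores results in r()
display "approximate data size in bytes: " r(N) * r(width)
memory                   // report on Stata's current memory usage
compress                 // use smaller storage types where no information is lost
```

Running `compress` before saving a large data set is one way to reduce the memory it needs without changing its contents.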
If a command is running, the button on the far right of the top toolbar will turn stop-sign red. Clicking it will tell Stata to stop what it's doing—though it may take some time to notice. Pressing q will do the same thing.
The menus above the toolbar give you access to most of Stata's commands and a graphical user interface for running them, but you'll work much faster if you type them instead. There are some situations where the menus are useful. Importing data from non-Stata formats is one, because the graphical user interface will give you a preview of the data you can use to figure out the correct options for importing it. Making graphs is another, because there are so many options for graphs that setting them using the graphical user interface is a good alternative to memorizing them all. When you do something using the menus, Stata will craft a command based on what you chose and add it to the history just as if you'd typed it. You can then paste the command into a do file to make it reproducible.
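For instance, filling in a graph dialog for a scatter plot might add a command along these lines to the history. This is an illustrative sketch only. The exact text Stata generates depends on the choices made in the dialog, and the variables and title here are just examples based on a data set installed with Stata.

```stata
sysuse auto, clear                                          // example data set installed with Stata
twoway (scatter mpg weight), title("Mileage vs. weight")    // roughly the form a graph dialog produces
```

Once a command like this is in a do file, the graph can be recreated or adjusted without going back through the menus.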
Next: Structure of a Data Set
Last Revised: 5/27/2020