If you’re creating a table to help you better understand your data or results, you don’t need to worry about its appearance. But if your goal is to create tables for publication, then getting a table to look just right can take some effort. This is done (mostly) with the collect styles command.
Keep in mind you can make changes to your table after you export it to Word (or whatever you’ll use), so you don’t want to spent unlimited time figuring out how to get Stata to make it perfect, especially at first. But then that part of your workflow is not reproducible.
3.1 Setting Up
Load the ACS data and regress income on female and edu, storing the results in a collection:
clearalluse https://sscc.wisc.edu/sscc/pubs/stata_tables/acscollect: reg income female i.edu
Source | SS df MS Number of obs = 15,158
-------------+---------------------------------- F(5, 15152) = 656.63
Model | 4.5380e+12 5 9.0759e+11 Prob > F = 0.0000
Residual | 2.0943e+13 15,152 1.3822e+09 R-squared = 0.1781
-------------+---------------------------------- Adj R-squared = 0.1778
Total | 2.5481e+13 15,157 1.6811e+09 Root MSE = 37178
-------------------------------------------------------------------------------
income | Coefficient Std. err. t P>|t| [95% conf. interval]
--------------+----------------------------------------------------------------
female | -17536.46 605.1548 -28.98 0.000 -18722.64 -16350.28
|
edu |
High Schoo.. | 7976.849 940.9349 8.48 0.000 6132.503 9821.194
Some College | 14402.38 924.3338 15.58 0.000 12590.58 16214.19
Bachelor's | 32385.91 1054.114 30.72 0.000 30319.72 34452.1
Advanced D.. | 53206.38 1270.507 41.88 0.000 50716.03 55696.73
|
_cons | 25628.12 790.4475 32.42 0.000 24078.75 27177.49
-------------------------------------------------------------------------------
Now turn that into a one-column table like the ones from the last chapter, with predictor (colname) as super rows and coefficients and standard errors (result[_r_b _r_se]) as rows:
collect layout (colname#result[_r_b _r_se])
Collection: default
Rows: colname#result[_r_b _r_se]
Table 1: 21 x 1
---------------------------------
Female |
Coefficient | -17536.46
Std. error | 605.1548
Less Than High School |
Coefficient | 0
Std. error | 0
High School Diploma |
Coefficient | 7976.849
Std. error | 940.9349
Some College |
Coefficient | 14402.38
Std. error | 924.3338
Bachelor's |
Coefficient | 32385.91
Std. error | 1054.114
Advanced Degree |
Coefficient | 53206.38
Std. error | 1270.507
Intercept |
Coefficient | 25628.12
Std. error | 790.4475
---------------------------------
That’s got all the numbers we want and in the right places, but the appearance has a long way to go.
This is an example of a style that applies to the entire table.
3.3 Controlling Row and Column Headers
Other styles can be applied to specific dimensions or even individual tags. For example, collect style header controls how row and column headers are displayed, and the option level(hide) means to hide the labels for the individual levels. That sounds like a great way to get rid of the “Coefficient” and “Std. error” labels we don’t want, but we don’t want to get rid of the predictor labels (“Female”, “Advanced Degree”, etc.) too.
A collect styles command takes a target to act on just like most other Stata commands take a list of variables to act on. The target can be a dimension, a tag, or even an individual cell (specified with interactions). In this case, the target is the result dimension:
-------------------------------
Female | -17536.46
| 605.1548
High School Diploma | 7976.849
| 940.9349
Some College | 14402.38
| 924.3338
Bachelor's | 32385.91
| 1054.114
Advanced Degree | 53206.38
| 1270.507
Intercept | 25628.12
| 790.4475
-------------------------------
Better, but now it’s hard to distinguish the coefficients from the standard errors.
3.4 Formatting Results
You can fix that by putting parentheses around the standard errors. This is done with the collect style cell command, which allows you to control the appearance of individual cells. The sformat() option lets you put text before or after the content of the cell. %s represents the original content, so sformat("(%s)") means “take the content of the cell (%s) and put parentheses around it.”
You only want this to apply to the standard errors, so specify the target result[_r_se]:
--------------------------------
Female | -17536.46
| (605.1548)
High School Diploma | 7976.849
| (940.9349)
Some College | 14402.38
| (924.3338)
Bachelor's | 32385.91
| (1054.114)
Advanced Degree | 53206.38
| (1270.507)
Intercept | 25628.12
| (790.4475)
--------------------------------
The nformat() option lets you set numeric formats. You can use the format %8.1f to display just one digit to the right of the decimal point. (%8.1fc would add commas to the numbers.) This time, we want to apply the change to the entire table:
collect style cell, nformat(%8.1f)collect preview
------------------------------
Female | -17536.5
| (605.2)
High School Diploma | 7976.8
| (940.9)
Some College | 14402.4
| (924.3)
Bachelor's | 32385.9
| (1054.1)
Advanced Degree | 53206.4
| (1270.5)
Intercept | 25628.1
| (790.4)
------------------------------
3.5 Controlling Header Labels
Now let’s consider the row labels. While we can deduce that most of them represent levels of an education variable, that should be explicit.
By default the colname titles are not shown for factor variables. You can turn it on for edu and set the title to the variable label with:
This is clear but ugly. We can apply the stack style to the rows, which will put the title at the top, but only if we get rid of the equals sign “binder” between the title and the level label:
collect style rowstack, nobindercollect preview
---------------------------------
Female | -17536.5
| (605.2)
Educational Attainment |
High School Diploma | 7976.8
| (940.9)
Some College | 14402.4
| (924.3)
Bachelor's | 32385.9
| (1054.1)
Advanced Degree | 53206.4
| (1270.5)
Intercept | 25628.1
| (790.4)
---------------------------------
3.6 Changing Labels
Some of these labels are kind of long. You could fix that by changing the variable and value labels in the data set before you create the table, but it’s also easy to change labels in a collection.
Start by looking at the labels for the edu dimension:
collect labellist edu
Collection: default
Dimension: edu
Label: Educational Attainment
Level labels:
1 Less Than High School
2 High School Diploma
3 Some College
4 Bachelor's
5 Advanced Degree
New labels for levels of a dimension are specified very much like value labels. Start with collect label levels, followed by the name of the dimension whose labels you want to change. Then you give the level name followed by the new label in quotes. End with the modify option since you’re changing existing labels.
Change the label for level 2 of edu to just “High School” with:
collect labellevels edu 2 "High School", modify
We can save a lot more space by changing “Educational Attainment” to just “Education.” The edu dimension has the label “Educational Attainment” and we could change it with collect label dim edu "Education". (Note how this is much like changing a variable label.) But it wouldn’t do us any good, because the “Educational Attainment” that’s actually being used is a level label of colname:
collect labellist colname
Collection: default
Dimension: colname
Label: Covariate names and column names
Level labels:
_cons Intercept
edu Educational Attainment
female Female
This level label can be changed just like changing the level labels for the edu dimension, but with the interesting twist that the level name is edu rather than a number:
collect labellevels colname edu "Education", modifycollect preview
----------------------------
Female | -17536.5
| (605.2)
Education |
High School | 7976.8
| (940.9)
Some College | 14402.4
| (924.3)
Bachelor's | 32385.9
| (1054.1)
Advanced Degree | 53206.4
| (1270.5)
Intercept | 25628.1
| (790.4)
----------------------------
3.7 Stars for Significance
Now let’s apply stars to the coefficients that are statistically significant. This is done with the collect stars command. You then give the result the stars are based on, usually _r_p, followed by a list of threshold numbers and the text that should be applied if _r_p is below that value. The attach() option specifies which result should get the stars, normally _r_b for a regression.
Suppose you want to put three stars after coefficients with a p-value below .001, two stars after those with a p-value below .01, and one star after those with a p-value below .05. However, to make all the numbers line up, actually put three stars, two stars followed by a space, and one star followed by two spaces respectively. Coefficients that don’t get any stars need three spaces to line up, so you can define a threshold of 1 for them.
--------------------------------
Female | -17536.5***
| (605.2)
Education |
High School | 7976.8***
| (940.9)
Some College | 14402.4***
| (924.3)
Bachelor's | 32385.9***
| (1054.1)
Advanced Degree | 53206.4***
| (1270.5)
Intercept | 25628.1***
| (790.4)
--------------------------------
If you want the standard errors to align with the coefficient numbers rather than the stars, add three spaces to them. Remember that we used collect style cell and the sformat() option to add parentheses to the standard errors, and you can use it again to add parentheses and spaces:
--------------------------------
Female | -17536.5***
| (605.2)
Education |
High School | 7976.8***
| (940.9)
Some College | 14402.4***
| (924.3)
Bachelor's | 32385.9***
| (1054.1)
Advanced Degree | 53206.4***
| (1270.5)
Intercept | 25628.1***
| (790.4)
--------------------------------
3.8 Table Titles
This table could use a title, which you can set with collect title. Unless you’re feeling creative, call it “Model 1”:
collect title Model 1collect preview
Model 1
--------------------------------
Female | -17536.5***
| (605.2)
Education |
High School | 7976.8***
| (940.9)
Some College | 14402.4***
| (924.3)
Bachelor's | 32385.9***
| (1054.1)
Advanced Degree | 53206.4***
| (1270.5)
Intercept | 25628.1***
| (790.4)
--------------------------------
3.9 Text Formatting
You can control the appearance of the title with collect style title. The font() option lets you choose a font and then apply further options to it. To just make the title bold without changing the font, use font(, bold). The comma tells Stata that you’re going straight to options without setting a font.
You won’t see the bold font in the web book, and it’s pretty subtle in Stata’s results window. But you’ll see it clearly when you export the table.
collect style title, font(, bold)collect preview
Model 1
--------------------------------
Female | -17536.5***
| (605.2)
Education |
High School | 7976.8***
| (940.9)
Some College | 14402.4***
| (924.3)
Bachelor's | 32385.9***
| (1054.1)
Advanced Degree | 53206.4***
| (1270.5)
Intercept | 25628.1***
| (790.4)
--------------------------------
Stata’s results window uses a monospace font (all characters have the same width). This keeps things lined up. To make all the cells in your table use a monospace font when they’re exported, like courier, use the font() option with collect style cell (and no target so it applies to everything).
collect style cell, font(courier)
3.10 Borders
Finally, let’s get rid of the separator line between the row labels and the numbers. This is done with the border() option of collect style cell. The separator is a border on the right of the label cells, and you can get rid of it by setting the right border pattern to nill:
Model 1
------------------------------
Female -17536.5***
(605.2)
Education
High School 7976.8***
(940.9)
Some College 14402.4***
(924.3)
Bachelor's 32385.9***
(1054.1)
Advanced Degree 53206.4***
(1270.5)
Intercept 25628.1***
(790.4)
------------------------------
Now it’s time to export the results to the format you want to use it in. Options include Word, Excel, PDF, HTML, LaTeX, Markdown, and plain text. Do so with the collect export command, followed by the file name. Stata will deduce the file type you want from the file extension (.docx for Word, .xlsx for Excel, etc.). Don’t forget to add the replace option so you can run your do file more than once.
(collection default exported to file model1.docx)
(collection default exported to file model1.xlsx)
(collection default exported to file model1.html)
You’ll probably need to tweak the results a bit: the Word table could be narrower, for example, while the Excel columns need to be wider. But you won’t have to do anything (like retype numbers) that would introduce errors.
Exercise
The commands below create a table of t-test results (you can also think of it as a table of summary statistics that happens to include whether the differences between the groups are significant). Make it pretty–but don’t try to add stars for significance. (Solution)
You may want to run collect label list result to see what results ttest generates.
collect diff=(r(mu_1) - r(mu_2)) means that in addition to the usual results from running a t-test, the collect command should calculate the difference in the means between the two populations and store it in the result dimension as diff. You can use similar code to calculate any values you need.
What makes this table a little complicated to lay out (and why I did it for you) is that we want some of the levels of the result dimension to be rows and some to be columns. The solution is to create new row and col dimensions and put the appropriate levels of result in each.