3  Appearance

If you’re creating a table to help you better understand your data or results, you don’t need to worry about its appearance. But if your goal is to create tables for publication, then getting a table to look just right can take some effort. This is done (mostly) with the collect styles command.

Keep in mind you can make changes to your table after you export it to Word (or whatever you’ll use), so you don’t want to spent unlimited time figuring out how to get Stata to make it perfect, especially at first. But then that part of your workflow is not reproducible.

3.1 Setting Up

Load the ACS data and regress income on female and edu, storing the results in a collection:

clear all
use https://sscc.wisc.edu/sscc/pubs/stata_tables/acs
collect: reg income female i.edu




      Source |       SS           df       MS      Number of obs   =    15,158
-------------+----------------------------------   F(5, 15152)     =    656.63
       Model |  4.5380e+12         5  9.0759e+11   Prob > F        =    0.0000
    Residual |  2.0943e+13    15,152  1.3822e+09   R-squared       =    0.1781
-------------+----------------------------------   Adj R-squared   =    0.1778
       Total |  2.5481e+13    15,157  1.6811e+09   Root MSE        =     37178

-------------------------------------------------------------------------------
       income | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
       female |  -17536.46   605.1548   -28.98   0.000    -18722.64   -16350.28
              |
          edu |
High Schoo..  |   7976.849   940.9349     8.48   0.000     6132.503    9821.194
Some College  |   14402.38   924.3338    15.58   0.000     12590.58    16214.19
  Bachelor's  |   32385.91   1054.114    30.72   0.000     30319.72     34452.1
Advanced D..  |   53206.38   1270.507    41.88   0.000     50716.03    55696.73
              |
        _cons |   25628.12   790.4475    32.42   0.000     24078.75    27177.49
-------------------------------------------------------------------------------

Now turn that into a one-column table like the ones from the last chapter, with predictor (colname) as super rows and coefficients and standard errors (result[_r_b _r_se]) as rows:

collect layout (colname#result[_r_b _r_se])

Collection: default
      Rows: colname#result[_r_b _r_se]
   Table 1: 21 x 1

---------------------------------
Female                |          
  Coefficient         | -17536.46
  Std. error          |  605.1548
Less Than High School |          
  Coefficient         |         0
  Std. error          |         0
High School Diploma   |          
  Coefficient         |  7976.849
  Std. error          |  940.9349
Some College          |          
  Coefficient         |  14402.38
  Std. error          |  924.3338
Bachelor's            |          
  Coefficient         |  32385.91
  Std. error          |  1054.114
Advanced Degree       |          
  Coefficient         |  53206.38
  Std. error          |  1270.507
Intercept             |          
  Coefficient         |  25628.12
  Std. error          |  790.4475
---------------------------------

That’s got all the numbers we want and in the right places, but the appearance has a long way to go.

3.2 Removing Base Levels (Reference Categories)

You can get rid of the rows for base levels with:

collect style showbase off
collect preview



-------------------------------
Female              |          
  Coefficient       | -17536.46
  Std. error        |  605.1548
High School Diploma |          
  Coefficient       |  7976.849
  Std. error        |  940.9349
Some College        |          
  Coefficient       |  14402.38
  Std. error        |  924.3338
Bachelor's          |          
  Coefficient       |  32385.91
  Std. error        |  1054.114
Advanced Degree     |          
  Coefficient       |  53206.38
  Std. error        |  1270.507
Intercept           |          
  Coefficient       |  25628.12
  Std. error        |  790.4475
-------------------------------

This is an example of a style that applies to the entire table.

3.3 Controlling Row and Column Headers

Other styles can be applied to specific dimensions or even individual tags. For example, collect style header controls how row and column headers are displayed, and the option level(hide) means to hide the labels for the individual levels. That sounds like a great way to get rid of the “Coefficient” and “Std. error” labels we don’t want, but we don’t want to get rid of the predictor labels (“Female”, “Advanced Degree”, etc.) too.

A collect styles command takes a target to act on just like most other Stata commands take a list of variables to act on. The target can be a dimension, a tag, or even an individual cell (specified with interactions). In this case, the target is the result dimension:

collect style header result, level(hide)
collect preview



-------------------------------
Female              | -17536.46
                    |  605.1548
High School Diploma |  7976.849
                    |  940.9349
Some College        |  14402.38
                    |  924.3338
Bachelor's          |  32385.91
                    |  1054.114
Advanced Degree     |  53206.38
                    |  1270.507
Intercept           |  25628.12
                    |  790.4475
-------------------------------

Better, but now it’s hard to distinguish the coefficients from the standard errors.

3.4 Formatting Results

You can fix that by putting parentheses around the standard errors. This is done with the collect style cell command, which allows you to control the appearance of individual cells. The sformat() option lets you put text before or after the content of the cell. %s represents the original content, so sformat("(%s)") means “take the content of the cell (%s) and put parentheses around it.”

You only want this to apply to the standard errors, so specify the target result[_r_se]:

collect style cell result[_r_se], sformat("(%s)")
collect preview



--------------------------------
Female              |  -17536.46
                    | (605.1548)
High School Diploma |   7976.849
                    | (940.9349)
Some College        |   14402.38
                    | (924.3338)
Bachelor's          |   32385.91
                    | (1054.114)
Advanced Degree     |   53206.38
                    | (1270.507)
Intercept           |   25628.12
                    | (790.4475)
--------------------------------

The nformat() option lets you set numeric formats. You can use the format %8.1f to display just one digit to the right of the decimal point. (%8.1fc would add commas to the numbers.) This time, we want to apply the change to the entire table:

collect style cell, nformat(%8.1f)
collect preview



------------------------------
Female              | -17536.5
                    |  (605.2)
High School Diploma |   7976.8
                    |  (940.9)
Some College        |  14402.4
                    |  (924.3)
Bachelor's          |  32385.9
                    | (1054.1)
Advanced Degree     |  53206.4
                    | (1270.5)
Intercept           |  25628.1
                    |  (790.4)
------------------------------

3.5 Controlling Header Labels

Now let’s consider the row labels. While we can deduce that most of them represent levels of an education variable, that should be explicit.

By default the colname titles are not shown for factor variables. You can turn it on for edu and set the title to the variable label with:

collect style header edu, title(label)
collect preview



-----------------------------------------------------
Female                                     | -17536.5
                                           |  (605.2)
Educational Attainment=High School Diploma |   7976.8
                                           |  (940.9)
Educational Attainment=Some College        |  14402.4
                                           |  (924.3)
Educational Attainment=Bachelor's          |  32385.9
                                           | (1054.1)
Educational Attainment=Advanced Degree     |  53206.4
                                           | (1270.5)
Intercept                                  |  25628.1
                                           |  (790.4)
-----------------------------------------------------

This is clear but ugly. We can apply the stack style to the rows, which will put the title at the top, but only if we get rid of the equals sign “binder” between the title and the level label:

collect style row stack, nobinder
collect preview



---------------------------------
Female                 | -17536.5
                       |  (605.2)
Educational Attainment |         
  High School Diploma  |   7976.8
                       |  (940.9)
  Some College         |  14402.4
                       |  (924.3)
  Bachelor's           |  32385.9
                       | (1054.1)
  Advanced Degree      |  53206.4
                       | (1270.5)
Intercept              |  25628.1
                       |  (790.4)
---------------------------------

3.6 Changing Labels

Some of these labels are kind of long. You could fix that by changing the variable and value labels in the data set before you create the table, but it’s also easy to change labels in a collection.

Start by looking at the labels for the edu dimension:

collect label list edu

  Collection: default
   Dimension: edu
       Label: Educational Attainment
Level labels:
           1  Less Than High School
           2  High School Diploma
           3  Some College
           4  Bachelor's
           5  Advanced Degree

New labels for levels of a dimension are specified very much like value labels. Start with collect label levels, followed by the name of the dimension whose labels you want to change. Then you give the level name followed by the new label in quotes. End with the modify option since you’re changing existing labels.

Change the label for level 2 of edu to just “High School” with:

collect label levels edu 2 "High School", modify

We can save a lot more space by changing “Educational Attainment” to just “Education.” The edu dimension has the label “Educational Attainment” and we could change it with collect label dim edu "Education". (Note how this is much like changing a variable label.) But it wouldn’t do us any good, because the “Educational Attainment” that’s actually being used is a level label of colname:

collect label list colname

  Collection: default
   Dimension: colname
       Label: Covariate names and column names
Level labels:
       _cons  Intercept
         edu  Educational Attainment
      female  Female

This level label can be changed just like changing the level labels for the edu dimension, but with the interesting twist that the level name is edu rather than a number:

collect label levels colname edu "Education", modify
collect preview



----------------------------
Female            | -17536.5
                  |  (605.2)
Education         |         
  High School     |   7976.8
                  |  (940.9)
  Some College    |  14402.4
                  |  (924.3)
  Bachelor's      |  32385.9
                  | (1054.1)
  Advanced Degree |  53206.4
                  | (1270.5)
Intercept         |  25628.1
                  |  (790.4)
----------------------------

3.7 Stars for Significance

Now let’s apply stars to the coefficients that are statistically significant. This is done with the collect stars command. You then give the result the stars are based on, usually _r_p, followed by a list of threshold numbers and the text that should be applied if _r_p is below that value. The attach() option specifies which result should get the stars, normally _r_b for a regression.

Suppose you want to put three stars after coefficients with a p-value below .001, two stars after those with a p-value below .01, and one star after those with a p-value below .05. However, to make all the numbers line up, actually put three stars, two stars followed by a space, and one star followed by two spaces respectively. Coefficients that don’t get any stars need three spaces to line up, so you can define a threshold of 1 for them.

collect stars _r_p .001 "*** " .01 "** " .05 "* " 1 "   ", attach(_r_b)
collect preview



--------------------------------
Female            | -17536.5*** 
                  |      (605.2)
Education         |             
  High School     |   7976.8*** 
                  |      (940.9)
  Some College    |  14402.4*** 
                  |      (924.3)
  Bachelor's      |  32385.9*** 
                  |     (1054.1)
  Advanced Degree |  53206.4*** 
                  |     (1270.5)
Intercept         |  25628.1*** 
                  |      (790.4)
--------------------------------

If you want the standard errors to align with the coefficient numbers rather than the stars, add three spaces to them. Remember that we used collect style cell and the sformat() option to add parentheses to the standard errors, and you can use it again to add parentheses and spaces:

collect style cell _r_se, sformat("(%s)   ")
collect preview



--------------------------------
Female            | -17536.5*** 
                  |   (605.2)   
Education         |             
  High School     |   7976.8*** 
                  |   (940.9)   
  Some College    |  14402.4*** 
                  |   (924.3)   
  Bachelor's      |  32385.9*** 
                  |  (1054.1)   
  Advanced Degree |  53206.4*** 
                  |  (1270.5)   
Intercept         |  25628.1*** 
                  |   (790.4)   
--------------------------------

3.8 Table Titles

This table could use a title, which you can set with collect title. Unless you’re feeling creative, call it “Model 1”:

collect title Model 1
collect preview



Model 1
--------------------------------
Female            | -17536.5*** 
                  |   (605.2)   
Education         |             
  High School     |   7976.8*** 
                  |   (940.9)   
  Some College    |  14402.4*** 
                  |   (924.3)   
  Bachelor's      |  32385.9*** 
                  |  (1054.1)   
  Advanced Degree |  53206.4*** 
                  |  (1270.5)   
Intercept         |  25628.1*** 
                  |   (790.4)   
--------------------------------

3.9 Text Formatting

You can control the appearance of the title with collect style title. The font() option lets you choose a font and then apply further options to it. To just make the title bold without changing the font, use font(, bold). The comma tells Stata that you’re going straight to options without setting a font.

You won’t see the bold font in the web book, and it’s pretty subtle in Stata’s results window. But you’ll see it clearly when you export the table.

collect style title, font(, bold)
collect preview



Model 1
--------------------------------
Female            | -17536.5*** 
                  |   (605.2)   
Education         |             
  High School     |   7976.8*** 
                  |   (940.9)   
  Some College    |  14402.4*** 
                  |   (924.3)   
  Bachelor's      |  32385.9*** 
                  |  (1054.1)   
  Advanced Degree |  53206.4*** 
                  |  (1270.5)   
Intercept         |  25628.1*** 
                  |   (790.4)   
--------------------------------

Stata’s results window uses a monospace font (all characters have the same width). This keeps things lined up. To make all the cells in your table use a monospace font when they’re exported, like courier, use the font() option with collect style cell (and no target so it applies to everything).

collect style cell, font(courier)

3.10 Borders

Finally, let’s get rid of the separator line between the row labels and the numbers. This is done with the border() option of collect style cell. The separator is a border on the right of the label cells, and you can get rid of it by setting the right border pattern to nill:

collect style cell, border(right, pattern(nil))
collect preview



Model 1
------------------------------
Female            -17536.5*** 
                    (605.2)   
Education                     
  High School       7976.8*** 
                    (940.9)   
  Some College     14402.4*** 
                    (924.3)   
  Bachelor's       32385.9*** 
                   (1054.1)   
  Advanced Degree  53206.4*** 
                   (1270.5)   
Intercept          25628.1*** 
                    (790.4)   
------------------------------

Now it’s time to export the results to the format you want to use it in. Options include Word, Excel, PDF, HTML, LaTeX, Markdown, and plain text. Do so with the collect export command, followed by the file name. Stata will deduce the file type you want from the file extension (.docx for Word, .xlsx for Excel, etc.). Don’t forget to add the replace option so you can run your do file more than once.

collect export model1.docx, replace
collect export model1.xlsx, replace
collect export model1.html, replace

(collection default exported to file model1.docx)

(collection default exported to file model1.xlsx)

(collection default exported to file model1.html)

You’ll probably need to tweak the results a bit: the Word table could be narrower, for example, while the Excel columns need to be wider. But you won’t have to do anything (like retype numbers) that would introduce errors.

Exercise

The commands below create a table of t-test results (you can also think of it as a table of summary statistics that happens to include whether the differences between the groups are significant). Make it pretty–but don’t try to add stars for significance. (Solution)

You may want to run collect label list result to see what results ttest generates.

collect diff=(r(mu_1) - r(mu_2)) means that in addition to the usual results from running a t-test, the collect command should calculate the difference in the means between the two populations and store it in the result dimension as diff. You can use similar code to calculate any values you need.

What makes this table a little complicated to lay out (and why I did it for you) is that we want some of the levels of the result dimension to be rows and some to be columns. The solution is to create new row and col dimensions and put the appropriate levels of result in each.

collect clear
collect diff=(r(mu_1) - r(mu_2)), tag(var[Age]): ttest age, by(hispanic)
collect diff=(r(mu_1) - r(mu_2)), tag(var[Income]): ttest income, by(hispanic)
collect addtag col[not_hispanic], fortags(result[mu_1 sd_1])
collect addtag col[hispanic], fortags(result[mu_2 sd_2])
collect addtag col[Difference], fortags(result[diff])
collect addtag col[pval], fortags(result[p])
collect addtag row[mean], fortags(result[mu_1 mu_2 diff p])
collect addtag row[sd], fortags(result[sd_1 sd_2])
collect layout (var#row) (col)



Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
Not Hisp |  13,401    40.73838    .1085502    12.56607     40.5256    40.95115
Hispanic |   1,757    35.68754    .2817152    11.80854      35.135    36.24007
---------+--------------------------------------------------------------------
Combined |  15,158    40.15292    .1022155    12.58455    39.95257    40.35328
---------+--------------------------------------------------------------------
    diff |            5.050842    .3166677                4.430135    5.671549
------------------------------------------------------------------------------
    diff = mean(Not Hisp) - mean(Hispanic)                        t =  15.9500
H0: diff = 0                                     Degrees of freedom =    15156

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000


Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
Not Hisp |  13,401    35102.85    364.9115    42243.14    34387.57    35818.13
Hispanic |   1,757    22463.13    654.2966    27425.88    21179.85    23746.41
---------+--------------------------------------------------------------------
Combined |  15,158    33637.75    333.0283    41001.73    32984.97    34290.52
---------+--------------------------------------------------------------------
    diff |            12639.72     1035.28                10610.44    14668.99
------------------------------------------------------------------------------
    diff = mean(Not Hisp) - mean(Hispanic)                        t =  12.2090
H0: diff = 0                                     Degrees of freedom =    15156

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

(4 items changed in collection default)

(4 items changed in collection default)

(2 items changed in collection default)

(2 items changed in collection default)

(8 items changed in collection default)

(4 items changed in collection default)


Collection: default
      Rows: var#row
   Columns: col
   Table 1: 6 x 4

--------------------------------------------------
       | not_hispanic hispanic Difference     pval
-------+------------------------------------------
Age    |                                          
  mean |     40.73838 35.68754   5.050842 8.26e-57
  sd   |     12.56607 11.80854                    
Income |                                          
  mean |     35102.85 22463.13   12639.72 4.03e-34
  sd   |     42243.14 27425.88                    
--------------------------------------------------