2 Table 1 with Hierarchical Data

This is another Table 1, but the complication this time is that it’s for hierarchical data. The data set consists of children who can have multiple placements into foster homes, and we want summary statistics both by child and by placement.

2.1 Setting up

Load the data with:

clear all
use https://sscc.wisc.edu/sscc/pubs/real_world_tables/table1b

Take a moment to look it over and identify the variable names.

2.2 `tag` and `by`

The egen function tag() takes an identifier variable as its argument, and then creates a new variable which is 1 for the first observation for that identifier, and zero for all the others. (Note that this has nothing to do with the tags used by collect.)

This data set has a variable id that identifies all the placements for a given child, so create a tag for each child with:

egen child = tag(id)

The classic use of a tag like this is in an if condition. For example, compare:

tab sex


        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |      1,717       58.88       58.88
     Female |      1,199       41.12      100.00
------------+-----------------------------------
      Total |      2,916      100.00

tab sex if child


        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |        516       51.60       51.60
     Female |        484       48.40      100.00
------------+-----------------------------------
      Total |      1,000      100.00

The first table gives the frequencies of sex for all the placements. The second gives the frequencies of sex for all the children. The percentages are different because in this (made up) data set boys tend to have more placements.

Now create a table of descriptive statistics, adding the by(child) option:

dtable i.race i.sex, by(child)


-----------------------------------------------------
                             tag(id)                 
                 0             1            Total    
-----------------------------------------------------
N          1,916 (65.7%) 1,000 (34.3%) 2,916 (100.0%)
Race                                                 
  White      675 (35.2%)   340 (34.0%)  1,015 (34.8%)
  Black      612 (31.9%)   329 (32.9%)    941 (32.3%)
  Hispanic   629 (32.8%)   331 (33.1%)    960 (32.9%)
Sex                                                  
  Male     1,201 (62.7%)   516 (51.6%)  1,717 (58.9%)
  Female     715 (37.3%)   484 (48.4%)  1,199 (41.1%)
-----------------------------------------------------

This gives three sets of descriptive statistics: one for the observations where child is zero, one for the observations where child is one, and one (Total) for all observations.

The group where child is zero contains all the observations other than the first observation for each child. You’re not interested in the descriptive statistics for this group.

The group where child is one contains one observation per child and thus gives child-level descriptive statistics. (Note how it matches the second set of frequencies calculated earlier.) You want these descriptive statistics.

The descriptive statistics for all observations gives placement-level descriptive statistics. You want these descriptive statistics too.

Take a look at the layout of this collection:

collect layout


Collection: DTable
      Rows: var
   Columns: child#result
   Table 1: 8 x 3

-----------------------------------------------------
                             tag(id)                 
                 0             1            Total    
-----------------------------------------------------
N          1,916 (65.7%) 1,000 (34.3%) 2,916 (100.0%)
Race                                                 
  White      675 (35.2%)   340 (34.0%)  1,015 (34.8%)
  Black      612 (31.9%)   329 (32.9%)    941 (32.3%)
  Hispanic   629 (32.8%)   331 (33.1%)    960 (32.9%)
Sex                                                  
  Male     1,201 (62.7%)   516 (51.6%)  1,717 (58.9%)
  Female     715 (37.3%)   484 (48.4%)  1,199 (41.1%)
-----------------------------------------------------

You have one column per level of child per level of result. Get a list of the levels of child with:

collect levelsof child


Collection: DTable
 Dimension: child
    Levels: 0 1 .m

.m is the Total column. You’re interested in that and the 1 column, so specify that only those levels should be included. Also specify that the number of observations (_N) should come at the end rather than the beginning by listing the levels of var in the order you want:

collect layout (var[i.race i.sex _N]) (child[1 .m]#result)


Collection: DTable
      Rows: var[i.race i.sex _N]
   Columns: child[1 .m]#result
   Table 1: 8 x 2

---------------------------------------
                      tag(id)          
                 1            Total    
---------------------------------------
Race                                   
  White      340 (34.0%)  1,015 (34.8%)
  Black      329 (32.9%)    941 (32.3%)
  Hispanic   331 (33.1%)    960 (32.9%)
Sex                                    
  Male       516 (51.6%)  1,717 (58.9%)
  Female     484 (48.4%)  1,199 (41.1%)
N          1,000 (34.3%) 2,916 (100.0%)
---------------------------------------

Next change the labels for the child dimension:

collect label levels child 1 "Child" .m "Placement", modify

The label tag(id) is the variable label egen created for the child variable. dtable used it as the title for the child header. You don’t really need anything there, so hide that title:

collect style header child, title(hide)

Finally, change the label for _N from N to Sample Size. (This is purely a matter of taste.)

collect label levels var _N "Sample Size", modify
collect preview


----------------------------------------
                Child        Placement  
----------------------------------------
Race                                    
  White       340 (34.0%)  1,015 (34.8%)
  Black       329 (32.9%)    941 (32.3%)
  Hispanic    331 (33.1%)    960 (32.9%)
Sex                                     
  Male        516 (51.6%)  1,717 (58.9%)
  Female      484 (48.4%)  1,199 (41.1%)
Sample Size 1,000 (34.3%) 2,916 (100.0%)
----------------------------------------

2.3 `collect unget`

Better, but the percentages on Sample Size don’t make much sense.

As of Stata 19 you can remove items from the result dimension with collect unget. Just specify the result to be removed and for which tags:

collect unget percent, fortags(var[_N])
collect preview

(3 items removed from collection DTable)

-------------------------------------
               Child      Placement  
-------------------------------------
Race                                 
  White     340 (34.0%) 1,015 (34.8%)
  Black     329 (32.9%)   941 (32.3%)
  Hispanic  331 (33.1%)   960 (32.9%)
Sex                                  
  Male      516 (51.6%) 1,717 (58.9%)
  Female    484 (48.4%) 1,199 (41.1%)
Sample Size       1,000         2,916
-------------------------------------

2.1 Setting up

2.2 tag and by

2.3 collect unget

2.2 `tag` and `by`

2.3 `collect unget`