SAS PROC FREQ

Doug Hemken

March 2017

Introduction

PROC FREQ is the core procedure for tasks that revolve around frequency tables and crosstabulations (contingency tables). You will use PROC FREQ for

  • one-way tables
  • two-way tables
  • chi-square tests of association
  • categorical measures of association, and other measures based on contingency tables
  • bar charts
  • creating data sets of counts or percents

Documentation is found in the Base SAS 9.4 Procedures Guide: Statistical Procedures.

Basic Use - One-Way Tables

In its simplest use, PROC FREQ gives you counts and percents.

proc freq data=sashelp.heart;
    tables smoking_status;
run;
                            The FREQ Procedure

                              Smoking Status
 
                                                Cumulative    Cumulative
  Smoking_Status       Frequency     Percent     Frequency      Percent
  ----------------------------------------------------------------------
  Heavy (16-25)            1046       20.22          1046        20.22  
  Light (1-5)               579       11.19          1625        31.41  
  Moderate (6-15)           576       11.13          2201        42.55  
  Non-smoker               2501       48.35          4702        90.90  
  Very Heavy (> 25)         471        9.10          5173       100.00  

                          Frequency Missing = 36
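
The introduction lists bar charts among the uses of PROC FREQ. As a minimal sketch, assuming ODS Graphics is available in your SAS session, the PLOTS=FREQPLOT option on the TABLES statement requests a bar chart of the same one-way table:

```sas
/* Assumes ODS Graphics can be enabled in this session */
ods graphics on;

proc freq data=sashelp.heart;
    tables smoking_status / plots=freqplot;  /* bar chart of the counts */
run;
```

The table is still printed; the plot is produced in addition to it.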

Minimal Specification

This is the barest PROC FREQ. Here, the default data set is _LAST_, and the default variables to analyze are _ALL_. (This example produces a lot of output, so I haven't shown it here.)

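data heart;     /* WORK.HEART becomes the most recently created data set */
    set sashelp.heart;
    run;

proc freq;      /* no DATA= and no TABLES: every variable in _LAST_ */
run;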
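
For comparison, the implicit defaults can be written out. The sketch below should behave the same as the bare step above, assuming WORK.HEART is still the most recently created data set:

```sas
/* Explicit form of the bare step: the same defaults, spelled out */
proc freq data=_last_;
    tables _all_;    /* a one-way table for every variable */
run;
```

In practice it is usually safer to name the data set and the variables, since _LAST_ changes every time a step creates a new data set.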

Table Requests

You can request one-way, two-way, or n-way tables. A table request "algebra" lets you specify many n-way tables compactly.

You can request multiple tables per TABLES statement, and you can use multiple TABLES statements per PROC.

One-way

proc freq;
    tables smoking_status;
    run;
                            The FREQ Procedure

                              Smoking Status
 
                                                Cumulative    Cumulative
  Smoking_Status       Frequency     Percent     Frequency      Percent
  ----------------------------------------------------------------------
  Heavy (16-25)            1046       20.22          1046        20.22  
  Light (1-5)               579       11.19          1625        31.41  
  Moderate (6-15)           576       11.13          2201        42.55  
  Non-smoker               2501       48.35          4702        90.90  
  Very Heavy (> 25)         471        9.10          5173       100.00  

                          Frequency Missing = 36

Two-way

Two-way tables are specified as row*column.

proc freq;
    tables smoking_status*sex;
    run;
                            The FREQ Procedure

                      Table of Smoking_Status by Sex

                Smoking_Status(Smoking Status)     Sex

                Frequency        |
                Percent          |
                Row Pct          |
                Col Pct          |Female  |Male    |  Total
                -----------------+--------+--------+
                Heavy (16-25)    |    339 |    707 |   1046
                                 |   6.55 |  13.67 |  20.22
                                 |  32.41 |  67.59 |
                                 |  11.87 |  30.51 |
                -----------------+--------+--------+
                Light (1-5)      |    422 |    157 |    579
                                 |   8.16 |   3.03 |  11.19
                                 |  72.88 |  27.12 |
                                 |  14.78 |   6.78 |
                -----------------+--------+--------+
                Moderate (6-15)  |    340 |    236 |    576
                                 |   6.57 |   4.56 |  11.13
                                 |  59.03 |  40.97 |
                                 |  11.90 |  10.19 |
                -----------------+--------+--------+
                Non-smoker       |   1682 |    819 |   2501
                                 |  32.51 |  15.83 |  48.35
                                 |  67.25 |  32.75 |
                                 |  58.89 |  35.35 |
                -----------------+--------+--------+
                Very Heavy (> 25 |     73 |    398 |    471
                )                |   1.41 |   7.69 |   9.10
                                 |  15.50 |  84.50 |
                                 |   2.56 |  17.18 |
                -----------------+--------+--------+
                Total                2856     2317     5173
                                    55.21    44.79   100.00

                          Frequency Missing = 36

The table request "algebra" consists of variable lists, the asterisk for "crossing" variables, and parentheses to distribute the crossing operation.

So a specification like A*(B C) is equivalent to A*B A*C. In a data set with variables A through F, stored in that order, the specification (A--C)*(D--F) would mean A*D A*E A*F B*D B*E B*F C*D C*E C*F.
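
As a sketch of the grouping rule, the two TABLES statements below should request the same pair of two-way tables, so the step prints each table twice. It assumes the BP_Status variable in SASHELP.HEART, which is not used elsewhere in this document:

```sas
/* Grouped and expanded forms of the same table requests */
proc freq data=sashelp.heart;
    tables sex*(smoking_status bp_status);      /* grouped form  */
    tables sex*smoking_status sex*bp_status;    /* expanded form */
run;
```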

Multiple requests per tables statement

As implied by the request algebra, a single TABLES statement may have more than one table request, and even more than one type of table requested.

proc freq;
    tables smoking_status sex*smoking_status;
    run;
                            The FREQ Procedure

                              Smoking Status
 
                                                Cumulative    Cumulative
  Smoking_Status       Frequency     Percent     Frequency      Percent
  ----------------------------------------------------------------------
  Heavy (16-25)            1046       20.22          1046        20.22  
  Light (1-5)               579       11.19          1625        31.41  
  Moderate (6-15)           576       11.13          2201        42.55  
  Non-smoker               2501       48.35          4702        90.90  
  Very Heavy (> 25)         471        9.10          5173       100.00  

                          Frequency Missing = 36

                      Table of Sex by Smoking_Status

      Sex       Smoking_Status(Smoking Status)

      Frequency|
      Percent  |
      Row Pct  |
      Col Pct  |Heavy (1|Light (1|Moderate|Non-smok|Very Hea|  Total
               |6-25)   |-5)     | (6-15) |er      |vy (> 25|
               |        |        |        |        |)       |
      ---------+--------+--------+--------+--------+--------+
      Female   |    339 |    422 |    340 |   1682 |     73 |   2856
               |   6.55 |   8.16 |   6.57 |  32.51 |   1.41 |  55.21
               |  11.87 |  14.78 |  11.90 |  58.89 |   2.56 |
               |  32.41 |  72.88 |  59.03 |  67.25 |  15.50 |
      ---------+--------+--------+--------+--------+--------+
      Male     |    707 |    157 |    236 |    819 |    398 |   2317
               |  13.67 |   3.03 |   4.56 |  15.83 |   7.69 |  44.79
               |  30.51 |   6.78 |  10.19 |  35.35 |  17.18 |
               |  67.59 |  27.12 |  40.97 |  32.75 |  84.50 |
      ---------+--------+--------+--------+--------+--------+
      Total        1046      579      576     2501      471     5173
                  20.22    11.19    11.13    48.35     9.10   100.00

                          Frequency Missing = 36

Multiple tables statements

It is also possible to have multiple TABLES statements. This is especially useful when specifying different table options for distinct groups of variables.

/* suppress some default output, and request additional statistics */
proc freq;
    tables smoking_status / nocum;
    tables smoking_status*sex / nopercent nocol chisq;
    run;
                            The FREQ Procedure

                              Smoking Status
 
                Smoking_Status       Frequency     Percent
                ------------------------------------------
                Heavy (16-25)            1046       20.22 
                Light (1-5)               579       11.19 
                Moderate (6-15)           576       11.13 
                Non-smoker               2501       48.35 
                Very Heavy (> 25)         471        9.10 

                          Frequency Missing = 36

                      Table of Smoking_Status by Sex

                Smoking_Status(Smoking Status)     Sex

                Frequency        |
                Row Pct          |Female  |Male    |  Total
                -----------------+--------+--------+
                Heavy (16-25)    |    339 |    707 |   1046
                                 |  32.41 |  67.59 |
                -----------------+--------+--------+
                Light (1-5)      |    422 |    157 |    579
                                 |  72.88 |  27.12 |
                -----------------+--------+--------+
                Moderate (6-15)  |    340 |    236 |    576
                                 |  59.03 |  40.97 |
                -----------------+--------+--------+
                Non-smoker       |   1682 |    819 |   2501
                                 |  67.25 |  32.75 |
                -----------------+--------+--------+
                Very Heavy (> 25 |     73 |    398 |    471
                )                |  15.50 |  84.50 |
                -----------------+--------+--------+
                Total                2856     2317     5173

                          Frequency Missing = 36

               Statistics for Table of Smoking_Status by Sex

          Statistic                     DF       Value      Prob
          ------------------------------------------------------
          Chi-Square                     4    743.4890    <.0001
          Likelihood Ratio Chi-Square    4    771.5109    <.0001
          Mantel-Haenszel Chi-Square     1     40.7641    <.0001
          Phi Coefficient                       0.3791          
          Contingency Coefficient               0.3545          
          Cramer's V                            0.3791          

                       Effective Sample Size = 5173
                          Frequency Missing = 36

Using Summary Data

When the data are already cell counts, rather than one observation per subject, the WEIGHT statement tells PROC FREQ which variable holds the counts. (List input reads character values into 8-character variables by default, which is why "cappuccino" appears as "cappucci" in the output below.)

/* the data in summary or "table" form */
data coffee2;
  input loc $ type $ count;
datalines;
drive-up cappuccino 2
window cappuccino 4
drive-up espresso 6
window espresso 2
drive-up iced 2
window iced 2
drive-up kona 2
window kona 9
;

proc print; run; /* look at data values in output */
                   Obs      loc       type        count

                    1     drive-up    cappucci      2  
                    2     window      cappucci      4  
                    3     drive-up    espresso      6  
                    4     window      espresso      2  
                    5     drive-up    iced          2  
                    6     window      iced          2  
                    7     drive-up    kona          2  
                    8     window      kona          9  
proc freq data=coffee2;       /* weighted data */
    tables type*loc / nopercent norow chisq;
    weight count;  /* many procs have a "freq" statement */
    run;
                            The FREQ Procedure

                           Table of type by loc

                    type      loc

                    Frequency|
                    Col Pct  |drive-up|window  |  Total
                    ---------+--------+--------+
                    cappucci |      2 |      4 |      6
                             |  16.67 |  23.53 |
                    ---------+--------+--------+
                    espresso |      6 |      2 |      8
                             |  50.00 |  11.76 |
                    ---------+--------+--------+
                    iced     |      2 |      2 |      4
                             |  16.67 |  11.76 |
                    ---------+--------+--------+
                    kona     |      2 |      9 |     11
                             |  16.67 |  52.94 |
                    ---------+--------+--------+
                    Total          12       17       29

                    Statistics for Table of type by loc

          Statistic                     DF       Value      Prob
          ------------------------------------------------------
          Chi-Square                     3      6.4509    0.0916
          Likelihood Ratio Chi-Square    3      6.7244    0.0812
          Mantel-Haenszel Chi-Square     1      1.8100    0.1785
          Phi Coefficient                       0.4716          
          Contingency Coefficient               0.4266          
          Cramer's V                            0.4716          

           WARNING: 88% of the cells have expected counts less 
                    than 5. Chi-Square may not be a valid test.

                             Sample Size = 29
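
To see what the WEIGHT statement contributes, here is a sketch of the same table request without it. PROC FREQ then treats each of the eight data lines as a single observation, so every cell count is 1 and the total is 8 rather than 29:

```sas
/* Same table request with the WEIGHT statement omitted, for comparison */
proc freq data=coffee2;
    tables type*loc / nopercent norow;
run;
```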

You can also use PROC FREQ to create summary data sets, with the OUT= option on the TABLES statement.

proc freq data=coffee2 noprint;   /* save a crosstab as summary data */
    tables loc * type / out=coffeetable;
    /* name a data set for the output*/
    weight count;
    run;

proc print data=coffeetable; run;
              Obs      loc       type        COUNT    PERCENT

               1     drive-up    cappucci      2       6.8966
               2     drive-up    espresso      6      20.6897
               3     drive-up    iced          2       6.8966
               4     drive-up    kona          2       6.8966
               5     window      cappucci      4      13.7931
               6     window      espresso      2       6.8966
               7     window      iced          2       6.8966
               8     window      kona          9      31.0345
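
By default the OUT= data set holds only COUNT and PERCENT. As a sketch, adding the OUTPCT option should also save the row and column percents for a two-way table (as PCT_ROW and PCT_COL). The data set name coffeepct is hypothetical:

```sas
/* Save row and column percents along with the counts */
proc freq data=coffee2 noprint;
    tables loc*type / out=coffeepct outpct;   /* coffeepct: hypothetical name */
    weight count;
run;

proc print data=coffeepct; run;
```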