SAS PROC FREQ
Doug Hemken
March 2017
Introduction
PROC FREQ provides a core command for a number of tasks that revolve around frequency tables and crosstabulations (contingency tables). You will use PROC FREQ for
- one-way tables
- two-way tables
- chi-square tests of association
- categorical measures of association, and other measures based on contingency tables
- bar charts
- creating data sets of counts or percents
Documentation is found in the Base SAS 9.4 Procedures Guide: Statistical Procedures.
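As a minimal sketch of the bar chart item in the list above: the PLOTS=FREQPLOT option on the TABLES statement requests a frequency plot, which by default is drawn as a bar chart. This assumes ODS Graphics is available in your SAS session.

```sas
/* assumes ODS Graphics is available; switch it on in case it is off */
ods graphics on;

proc freq data=sashelp.heart;
  /* plots=freqplot requests a bar chart of the one-way frequencies */
  tables smoking_status / plots=freqplot;
run;
```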
Basic Use - One-Way Tables
In its simplest use, PROC FREQ gives you counts and percents.
proc freq data=sashelp.heart;
tables smoking_status;
run;
The FREQ Procedure
Smoking Status
Cumulative Cumulative
Smoking_Status Frequency Percent Frequency Percent
----------------------------------------------------------------------
Heavy (16-25) 1046 20.22 1046 20.22
Light (1-5) 579 11.19 1625 31.41
Moderate (6-15) 576 11.13 2201 42.55
Non-smoker 2501 48.35 4702 90.90
Very Heavy (> 25) 471 9.10 5173 100.00
Frequency Missing = 36
Minimal Specification
This is the barest PROC FREQ. Here, the default data set is _LAST_, and the default variables to analyze are _ALL_. (This example produces a lot of output, so I haven't shown it here.)
data heart;
set sashelp.heart;
run;
proc freq;
run;
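Spelled out, the bare call above should behave like the following sketch, which writes the two defaults explicitly. It assumes the heart data set created above is still the most recently created data set in the session.

```sas
/* the same request with the defaults written out */
proc freq data=_last_;  /* _LAST_ refers to the most recently created data set */
  tables _all_;         /* a one-way table for every variable */
run;
```

Continuous variables such as Height and Weight get one table row per distinct value, which is why the output is so long.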
Table Requests
You can request one-way, two-way, or n-way tables. There is a table request "algebra" for n-way tables.
You can request multiple tables per TABLES statement, and you can use multiple TABLES statements per PROC.
One-way
proc freq;
tables smoking_status;
run;
The FREQ Procedure
Smoking Status
Cumulative Cumulative
Smoking_Status Frequency Percent Frequency Percent
----------------------------------------------------------------------
Heavy (16-25) 1046 20.22 1046 20.22
Light (1-5) 579 11.19 1625 31.41
Moderate (6-15) 576 11.13 2201 42.55
Non-smoker 2501 48.35 4702 90.90
Very Heavy (> 25) 471 9.10 5173 100.00
Frequency Missing = 36
Two-way
Specified as row*column.
proc freq;
tables smoking_status*sex;
run;
The FREQ Procedure
Table of Smoking_Status by Sex
Smoking_Status(Smoking Status) Sex
Frequency |
Percent |
Row Pct |
Col Pct |Female |Male | Total
-----------------+--------+--------+
Heavy (16-25) | 339 | 707 | 1046
| 6.55 | 13.67 | 20.22
| 32.41 | 67.59 |
| 11.87 | 30.51 |
-----------------+--------+--------+
Light (1-5) | 422 | 157 | 579
| 8.16 | 3.03 | 11.19
| 72.88 | 27.12 |
| 14.78 | 6.78 |
-----------------+--------+--------+
Moderate (6-15) | 340 | 236 | 576
| 6.57 | 4.56 | 11.13
| 59.03 | 40.97 |
| 11.90 | 10.19 |
-----------------+--------+--------+
Non-smoker | 1682 | 819 | 2501
| 32.51 | 15.83 | 48.35
| 67.25 | 32.75 |
| 58.89 | 35.35 |
-----------------+--------+--------+
Very Heavy (> 25 | 73 | 398 | 471
) | 1.41 | 7.69 | 9.10
| 15.50 | 84.50 |
| 2.56 | 17.18 |
-----------------+--------+--------+
Total 2856 2317 5173
55.21 44.79 100.00
Frequency Missing = 36
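Extending the same syntax to three variables gives an n-way table. A minimal sketch, assuming the BP_Status variable in sashelp.heart: the last two variables in the request form the rows and columns, and each level of the variable in front of them gets its own two-way table.

```sas
proc freq data=sashelp.heart;
  /* one smoking_status-by-sex table for each level of bp_status */
  tables bp_status*smoking_status*sex / nopercent nocol;
run;
```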
The table request "algebra" consists of variable lists, the asterisk for "crossing" variables, and parentheses to distribute the crossing operation.
So a specification like A*(B C) is equal to A*B A*C. In a data set with variables A through F, in order, the specification (A--C)*(D--F) would mean A*D A*E A*F B*D B*E B*F C*D C*E C*F.
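The distribution rule can be checked on sashelp.heart. In this sketch, which assumes the BP_Status and Weight_Status variables in that data set, the two TABLES statements should request the same pair of two-way tables, so each table appears twice in the output.

```sas
proc freq data=sashelp.heart;
  /* grouped form: sex is crossed with each variable in the parentheses */
  tables sex*(bp_status weight_status);
  /* expanded form: the same two requests written out */
  tables sex*bp_status sex*weight_status;
run;
```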
Multiple requests per tables statement
As implied by the request algebra, a single TABLES statement may have more than one table request, and even more than one type of table.
proc freq;
tables smoking_status sex*smoking_status;
run;
The FREQ Procedure
Smoking Status
Cumulative Cumulative
Smoking_Status Frequency Percent Frequency Percent
----------------------------------------------------------------------
Heavy (16-25) 1046 20.22 1046 20.22
Light (1-5) 579 11.19 1625 31.41
Moderate (6-15) 576 11.13 2201 42.55
Non-smoker 2501 48.35 4702 90.90
Very Heavy (> 25) 471 9.10 5173 100.00
Frequency Missing = 36
Table of Sex by Smoking_Status
Sex Smoking_Status(Smoking Status)
Frequency|
Percent |
Row Pct |
Col Pct |Heavy (1|Light (1|Moderate|Non-smok|Very Hea| Total
|6-25) |-5) | (6-15) |er |vy (> 25|
| | | | |) |
---------+--------+--------+--------+--------+--------+
Female | 339 | 422 | 340 | 1682 | 73 | 2856
| 6.55 | 8.16 | 6.57 | 32.51 | 1.41 | 55.21
| 11.87 | 14.78 | 11.90 | 58.89 | 2.56 |
| 32.41 | 72.88 | 59.03 | 67.25 | 15.50 |
---------+--------+--------+--------+--------+--------+
Male | 707 | 157 | 236 | 819 | 398 | 2317
| 13.67 | 3.03 | 4.56 | 15.83 | 7.69 | 44.79
| 30.51 | 6.78 | 10.19 | 35.35 | 17.18 |
| 67.59 | 27.12 | 40.97 | 32.75 | 84.50 |
---------+--------+--------+--------+--------+--------+
Total 1046 579 576 2501 471 5173
20.22 11.19 11.13 48.35 9.10 100.00
Frequency Missing = 36
Multiple tables statements
It is also possible to have multiple TABLES statements. This is especially useful when specifying different table options for distinct groups of variables.
/* suppress some default output, and request additional statistics */
proc freq;
tables smoking_status / nocum;
tables smoking_status*sex / nopercent nocol chisq;
run;
The FREQ Procedure
Smoking Status
Smoking_Status Frequency Percent
------------------------------------------
Heavy (16-25) 1046 20.22
Light (1-5) 579 11.19
Moderate (6-15) 576 11.13
Non-smoker 2501 48.35
Very Heavy (> 25) 471 9.10
Frequency Missing = 36
Table of Smoking_Status by Sex
Smoking_Status(Smoking Status) Sex
Frequency |
Row Pct |Female |Male | Total
-----------------+--------+--------+
Heavy (16-25) | 339 | 707 | 1046
| 32.41 | 67.59 |
-----------------+--------+--------+
Light (1-5) | 422 | 157 | 579
| 72.88 | 27.12 |
-----------------+--------+--------+
Moderate (6-15) | 340 | 236 | 576
| 59.03 | 40.97 |
-----------------+--------+--------+
Non-smoker | 1682 | 819 | 2501
| 67.25 | 32.75 |
-----------------+--------+--------+
Very Heavy (> 25 | 73 | 398 | 471
) | 15.50 | 84.50 |
-----------------+--------+--------+
Total 2856 2317 5173
Frequency Missing = 36
Statistics for Table of Smoking_Status by Sex
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 4 743.4890 <.0001
Likelihood Ratio Chi-Square 4 771.5109 <.0001
Mantel-Haenszel Chi-Square 1 40.7641 <.0001
Phi Coefficient 0.3791
Contingency Coefficient 0.3545
Cramer's V 0.3791
Effective Sample Size = 5173
Frequency Missing = 36
Using Summary Data
/* the data in summary or "table" form */
data coffee2;
input loc $ type $ count;
datalines;
drive-up cappuccino 2
window cappuccino 4
drive-up espresso 6
window espresso 2
drive-up iced 2
window iced 2
drive-up kona 2
window kona 9
;
proc print; run; /* look at data values in output */
Obs loc type count
1 drive-up cappucci 2
2 window cappucci 4
3 drive-up espresso 6
4 window espresso 2
5 drive-up iced 2
6 window iced 2
7 drive-up kona 2
8 window kona 9
proc freq data=coffee2; /* weighted data */
tables type*loc / nopercent norow chisq;
weight count; /* many procs have a "freq" statement */
run;
The SAS System 1
15:01 Wednesday, March 8, 2017
The FREQ Procedure
Table of type by loc
type loc
Frequency|
Col Pct |drive-up|window | Total
---------+--------+--------+
cappucci | 2 | 4 | 6
| 16.67 | 23.53 |
---------+--------+--------+
espresso | 6 | 2 | 8
| 50.00 | 11.76 |
---------+--------+--------+
iced | 2 | 2 | 4
| 16.67 | 11.76 |
---------+--------+--------+
kona | 2 | 9 | 11
| 16.67 | 52.94 |
---------+--------+--------+
Total 12 17 29
Statistics for Table of type by loc
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 3 6.4509 0.0916
Likelihood Ratio Chi-Square 3 6.7244 0.0812
Mantel-Haenszel Chi-Square 1 1.8100 0.1785
Phi Coefficient 0.4716
Contingency Coefficient 0.4266
Cramer's V 0.4716
WARNING: 88% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Sample Size = 29
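To see what the WEIGHT statement is doing, one can expand the summary data to one row per counted observation and run the same table request without weights. This is a sketch only: the data set name coffee_long and the loop counter i are hypothetical names introduced here. It should reproduce the cell counts in the table above.

```sas
/* expand each summary row into "count" identical rows */
data coffee_long;
  set coffee2;
  do i = 1 to count;
    output;
  end;
  drop i count;
run;

proc freq data=coffee_long;   /* no WEIGHT statement: each row counts once */
  tables type*loc / nopercent norow;
run;
```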
You can also use PROC FREQ to create summary data sets, with the OUT= option on the TABLES statement.
proc freq data=coffee2 noprint; /* save a crosstab as summary data */
tables loc * type / out=coffeetable;
/* name a data set for the output*/
weight count;
run;
proc print data=coffeetable; run;
Obs loc type COUNT PERCENT
1 drive-up cappucci 2 6.8966
2 drive-up espresso 6 20.6897
3 drive-up iced 2 6.8966
4 drive-up kona 2 6.8966
5 window cappucci 4 13.7931
6 window espresso 2 6.8966
7 window iced 2 6.8966
8 window kona 9 31.0345
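The output data set above holds only the overall percent. As a sketch, adding the OUTPCT option to the same TABLES statement should also save row and column percents (as PCT_ROW and PCT_COL). The data set name coffeepct is a hypothetical name introduced here.

```sas
proc freq data=coffee2 noprint;
  /* outpct adds PCT_ROW and PCT_COL alongside COUNT and PERCENT */
  tables loc*type / out=coffeepct outpct;
  weight count;
run;

proc print data=coffeepct; run;
```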