Data Set Options
Whenever you specify a SAS data set, you can also use data set options to modify how that data set is read or written. Most of these options can be used with either input or output data sets. Some of them also have analogous SAS statements that may be used in most DATA steps and PROCs.
There are many data set options. Those used most commonly are for
- renaming variables
- keeping selected observations
- keeping selected variables
- recording the source of an observation (when merging)
Some of the most used options are discussed in Subsetting SAS data sets.
Here we will look at the RENAME option to illustrate the differences between input, output, and statement specifications.
In general, data set options take the form
data-set-name ( option1 option2 ...)
where options are specified in parentheses after the data set name. (Options may include their own parentheses, as you will see in the first example.) Options are separated by spaces.
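As a small sketch of this general form, the step below puts two options on a single input data set. The particular combination of KEEP= and RENAME= is our own choice for illustration, not something taken from the examples that follow. KEEP= is processed before RENAME=, so it lists the original variable name.

```sas
/* Two options on one input data set, separated by a space.      */
/* RENAME= carries its own parentheses inside the option list.   */
/* KEEP= is applied before RENAME=, so it uses the name height.  */
proc print data=sashelp.class(keep=name height rename=(height=inches));
run;
```

The printed listing should contain only the variables name and inches; sashelp.class itself is left unchanged.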
Output Options
The data set created by a DATA step is the output data set. Some PROCs also produce output data sets.
In a PROC step
PROC SUMMARY (similar to PROC MEANS) produces an output data set with summary statistics.
proc summary data=sashelp.class;
  var height;
  output out=descriptives;
run;

proc print data=descriptives;
run;
Obs    _TYPE_    _FREQ_    _STAT_     Height
  1       0        19       N        19.0000
  2       0        19       MIN      51.3000
  3       0        19       MAX      72.0000
  4       0        19       MEAN     62.3368
  5       0        19       STD       5.1271
To rename the variable _FREQ_, we could specify
proc summary data=sashelp.class;
  var height;
  output out=descriptives(rename=(_FREQ_=NOBS));
run;

proc print data=descriptives;
run;
Obs    _TYPE_    NOBS    _STAT_     Height
  1       0       19      N        19.0000
  2       0       19      MIN      51.3000
  3       0       19      MAX      72.0000
  4       0       19      MEAN     62.3368
  5       0       19      STD       5.1271
Notice that we could rename height in a few different places. Adding a RENAME option to the input data set for PROC SUMMARY renames height as the data are read, so both the procedure and its output data set see the new name (sashelp.class itself is not changed). This would also require using the new name on the VAR statement. If we were instead to rename height on the output data set, the name would change only in that data set.
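The two placements can be sketched side by side. The data set names descriptives_in and descriptives_out are hypothetical, chosen only to keep the two results apart.

```sas
/* Sketch 1: rename on the input data set.                       */
/* PROC SUMMARY sees only the new name, so VAR must use inches.  */
proc summary data=sashelp.class(rename=(height=inches));
  var inches;
  output out=descriptives_in;
run;

/* Sketch 2: rename on the output data set.                      */
/* The procedure still works with height; only descriptives_out  */
/* stores the variable as inches.                                */
proc summary data=sashelp.class;
  var height;
  output out=descriptives_out(rename=(height=inches));
run;
```

Both output data sets should end up with a variable named inches; the difference is which name the VAR statement has to use.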
In a DATA step
DATA steps also have output data sets (say that three times, fast!).
Notice that if we rename a variable in the output data set, statements in the DATA step must still use the original name.
data class(rename=(height=inches));
  set sashelp.class;
  /* the rename happens on output, so the step still refers to height */
  bmi = (weight/height**2)*703;
run;

proc means data=class;
  var bmi inches;
run;
The MEANS Procedure
Variable    N           Mean        Std Dev        Minimum        Maximum
-------------------------------------------------------------------------
bmi        19     17.8632519      2.0926193     13.4900007     21.4296601
inches     19     62.3368421      5.1270752     51.3000000     72.0000000
-------------------------------------------------------------------------
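For contrast, here is a sketch of the same step with RENAME moved to the input data set. This variant is our own illustration of the input case. Because the variable is renamed as it is read, statements in the DATA step must use the new name.

```sas
/* Rename on input: height arrives in the step already named inches. */
data class;
  set sashelp.class(rename=(height=inches));
  /* height no longer exists inside this step; use inches instead */
  bmi = (weight/inches**2)*703;
run;

proc means data=class;
  var bmi inches;
run;
```

The resulting data set class should match the one produced with the output option above; only the name used in the bmi assignment differs.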
Statements instead of Options
Many data set options have analogous DATA step statements. These have the same effect as an output data set option, so other statements in the DATA step still use the original variable names.
data class;
  set sashelp.class;
  rename age=years height=inches;
  /* the RENAME statement takes effect on output; use height here */
  bmi = (weight/height**2)*703;
run;

proc means data=class;
  var years inches bmi;
run;
The MEANS Procedure
Variable    N           Mean        Std Dev        Minimum        Maximum
-------------------------------------------------------------------------
years      19     13.3157895      1.4926722     11.0000000     16.0000000
inches     19     62.3368421      5.1270752     51.3000000     72.0000000
bmi        19     17.8632519      2.0926193     13.4900007     21.4296601
-------------------------------------------------------------------------
The exception to statements-as-output-options is the WHERE statement, which is applied to the input data set as observations are read. See Subsetting SAS data sets for more details of that data set option/statement.
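A minimal sketch of why this matters, assuming we want to subset on a variable computed in the step. The cutoff of 18 and the data set name heavier are hypothetical. Because bmi does not exist in sashelp.class, a WHERE statement could not reference it, but a WHERE= option on the output data set can.

```sas
/* WHERE= on the output data set is applied as observations are     */
/* written, so it can use bmi, which is created inside the step.    */
data heavier(where=(bmi > 18));
  set sashelp.class;
  bmi = (weight/height**2)*703;
run;

/* A WHERE statement here would be applied to sashelp.class as it   */
/* is read, so it may only reference variables in that data set:    */
data teens;
  set sashelp.class;
  where age >= 13;
run;
```

In the second step, writing set sashelp.class(where=(age >= 13)) as an input option should select the same observations as the WHERE statement.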