Often we want to execute a statement or a group of statements for just selected observations in our data set. In SAS this is accomplished with an IF-THEN statement.
The basic syntax for IF-THEN is just
IF condition THEN statement;
where condition is an expression that SAS can evaluate as true or false (a numeric result of 0 or missing is false; any other value is true), and statement is any executable SAS statement.
The statement is executed only if the condition is true.
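To run a group of statements rather than a single one, the THEN clause can be a DO group closed by END. Here is a minimal sketch; the data set name cars_flag and the variable cyl_flag are hypothetical names chosen for illustration.

```sas
/* Sketch: cars_flag and cyl_flag are hypothetical names */
data cars_flag;
  set sashelp.cars;
  /* DO ... END lets one condition control several statements */
  if (cylinders eq .) then do;
    cyl_flag = 1;                          /* mark the observation */
    put 'Missing cylinders: ' make model;  /* and report it in the log */
  end;
run;
```

Observations that fail the condition skip the whole group, so cyl_flag is left missing for them.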
Two frequent uses of IF-THEN are searching for unusual values in your data and recoding variables.
One use of IF-THEN is to identify data values of interest and then report them, either in the log or in a separate data set. For example, we might want to know which observations in the cars data are missing information about their number of cylinders.
data _null_;
set sashelp.cars;
if (cylinders eq .) then put make model cylinders;
run;
The PUT statement writes each matching observation to the log:
Mazda RX-8 4dr automatic .
Mazda RX-8 4dr manual .
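To keep these observations in a data set instead, use an OUTPUT statement and then print the result:
data cyl_missing;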
set sashelp.cars;
if (cylinders eq .) then output;
run;
proc print data=cyl_missing noobs;
var make model cylinders;
run;
Make Model Cylinders
Mazda RX-8 4dr automatic .
Mazda RX-8 4dr manual .
This has an even simpler form, known as the “subsetting IF”. With a subsetting IF, the statements that follow the IF are executed only for observations for which the condition is true, and only those observations are written to the output data set. (Note, though, that a WHERE statement is usually more efficient, because it selects observations before they are read into the program data vector.)
data cyl_missing;
set sashelp.cars;
if cylinders eq .;
run;
proc print data=cyl_missing noobs;
var make model cylinders;
run;
Make Model Cylinders
Mazda RX-8 4dr automatic .
Mazda RX-8 4dr manual .
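For comparison, here is a sketch of the same subset written with a WHERE statement. WHERE is applied as observations are read from sashelp.cars, so non-matching observations never enter the DATA step at all.

```sas
data cyl_missing;
  set sashelp.cars;
  /* WHERE filters observations as they are read in */
  where cylinders is missing;
run;

proc print data=cyl_missing noobs;
  var make model cylinders;
run;
```

One design consideration: WHERE can only test variables that already exist in the input data set, so a condition on a variable computed inside the DATA step still needs a subsetting IF.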
Recodes written with IF-THEN are often better handled as IF-THEN/ELSE. This is both more computationally efficient and less prone to logic errors.
One common approach to coding indicator variables is to initialize a variable with one value (typically 0), and use IF-THEN to indicate the other value.
A problem with this approach is that people often forget to account for missing data.
data cars;
set sashelp.cars;
eightcyl = 0;
if (cylinders eq 8) then eightcyl = 1;
run;
proc freq data=cars;
tables cylinders*eightcyl / nocol nopercent missing;
run;
The FREQ Procedure
Table of Cylinders by eightcyl
Cylinders eightcyl
Frequency|
Row Pct | 0| 1| Total
---------+--------+--------+
. | 2 | 0 | 2
| 100.00 | 0.00 |
---------+--------+--------+
3 | 1 | 0 | 1
| 100.00 | 0.00 |
---------+--------+--------+
4 | 136 | 0 | 136
| 100.00 | 0.00 |
---------+--------+--------+
5 | 7 | 0 | 7
| 100.00 | 0.00 |
---------+--------+--------+
6 | 190 | 0 | 190
| 100.00 | 0.00 |
---------+--------+--------+
8 | 0 | 87 | 87
| 0.00 | 100.00 |
---------+--------+--------+
10 | 2 | 0 | 2
| 100.00 | 0.00 |
---------+--------+--------+
12 | 3 | 0 | 3
| 100.00 | 0.00 |
---------+--------+--------+
Total 341 87 428
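Note that the two observations with missing cylinders were coded 0, as if we knew they did not have eight cylinders. Another approach here is to explicitly code both values of the indicator, leaving eightcyl missing when cylinders is missing.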
data cars;
set sashelp.cars;
if (cylinders ne .) then eightcyl = 0;
if (cylinders eq 8) then eightcyl = 1;
run;
proc freq data=cars;
tables cylinders*eightcyl / nocol nopercent missing;
run;
The FREQ Procedure
Table of Cylinders by eightcyl
Cylinders eightcyl
Frequency|
Row Pct | .| 0| 1| Total
---------+--------+--------+--------+
. | 2 | 0 | 0 | 2
| 100.00 | 0.00 | 0.00 |
---------+--------+--------+--------+
3 | 0 | 1 | 0 | 1
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
4 | 0 | 136 | 0 | 136
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
5 | 0 | 7 | 0 | 7
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
6 | 0 | 190 | 0 | 190
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
8 | 0 | 0 | 87 | 87
| 0.00 | 0.00 | 100.00 |
---------+--------+--------+--------+
10 | 0 | 2 | 0 | 2
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
12 | 0 | 3 | 0 | 3
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
Total 2 339 87 428
This second approach gets us the desired result, but it checks both conditions for every observation. It would be more efficient with an ELSE statement.
An ELSE statement tells SAS what to do if the condition is not true, and must be the next statement after an IF-THEN. Indentation is not required, but helps us humans see the statements as a group.
The syntax is:
IF condition THEN statement1;
ELSE statement2;
If the condition is true, statement1 executes; if it is not, statement2 executes. Note that statement2 can itself be an IF-THEN, which lets you handle many possibilities. Revising our recode example from above:
data cars;
set sashelp.cars;
if (cylinders eq 8) then eightcyl = 1;
else if (cylinders ne .) then eightcyl = 0;
run;
proc freq data=cars;
tables cylinders*eightcyl / nocol nopercent missing;
run;
The FREQ Procedure
Table of Cylinders by eightcyl
Cylinders eightcyl
Frequency|
Row Pct | .| 0| 1| Total
---------+--------+--------+--------+
. | 2 | 0 | 0 | 2
| 100.00 | 0.00 | 0.00 |
---------+--------+--------+--------+
3 | 0 | 1 | 0 | 1
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
4 | 0 | 136 | 0 | 136
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
5 | 0 | 7 | 0 | 7
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
6 | 0 | 190 | 0 | 190
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
8 | 0 | 0 | 87 | 87
| 0.00 | 0.00 | 100.00 |
---------+--------+--------+--------+
10 | 0 | 2 | 0 | 2
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
12 | 0 | 3 | 0 | 3
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
Total 2 339 87 428
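Because a SAS comparison evaluates to 1 when true and 0 when false, the indicator can also be assigned directly from the comparison. This is an alternative technique, not the one used above; it is sketched here with the MISSING function as a guard so that missing cylinders stay missing.

```sas
data cars;
  set sashelp.cars;
  /* (cylinders eq 8) evaluates to 1 or 0; the guard skips missing values */
  if not missing(cylinders) then eightcyl = (cylinders eq 8);
run;

proc freq data=cars;
  tables cylinders*eightcyl / nocol nopercent missing;
run;
```

Since eightcyl is not retained, it is reset to missing at the start of each iteration, so the two cars with missing cylinders keep a missing indicator.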
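In another example, suppose we wanted to recode vehicle weight (a continuous variable) into thousand-pound categories.
One approach might be to use a series of IF-THEN statements. Notice that this example assumes there is no missing data: SAS treats a missing numeric value as smaller than any number, so a missing weight would satisfy the first condition and be coded 2. For each observation, all five conditions are checked.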
data cars;
set sashelp.cars;
if (weight lt 3000) then wgt = 2;
if (weight ge 3000 and weight lt 4000) then wgt = 3;
if (weight ge 4000 and weight lt 5000) then wgt = 4;
if (weight ge 5000 and weight lt 6000) then wgt = 5;
if (weight ge 6000) then wgt = 6;
run;
A more efficient approach uses IF-THEN/ELSE:
data cars;
set sashelp.cars;
if weight lt 3000 then wgt = 2;
else if weight lt 4000 then wgt = 3;
else if weight lt 5000 then wgt = 4;
else if weight lt 6000 then wgt = 5;
else wgt = 6;
run;
proc freq data=cars;
tables wgt / nocum;
run;
The FREQ Procedure
wgt Frequency Percent
----------------------------
2 87 20.33
3 238 55.61
4 81 18.93
5 19 4.44
6 3 0.70
Here, only one condition is checked for the 87 observations in the lowest category, and most observations (87 + 238 = 325 of 428) are handled with one or two checks. The code is also much more readable.
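Because a missing numeric value is smaller than any number in SAS, the IF-THEN/ELSE chain above would also put a missing weight into category 2. A sketch that handles missing data explicitly tests for it first:

```sas
data cars;
  set sashelp.cars;
  /* test for missing first; otherwise "weight lt 3000" would be true */
  if missing(weight) then wgt = .;
  else if weight lt 3000 then wgt = 2;
  else if weight lt 4000 then wgt = 3;
  else if weight lt 5000 then wgt = 4;
  else if weight lt 6000 then wgt = 5;
  else wgt = 6;
run;

proc freq data=cars;
  tables wgt / nocum missing;
run;
```

The extra check costs one comparison per observation, and it makes the handling of missing values visible in the code rather than an accident of sort order.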
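SELECT-WHEN offers another way to write the same recode. Like IF-THEN/ELSE, SELECT-WHEN stops checking conditions once it encounters a true one: it executes that WHEN's statement and skips the rest, and the OTHERWISE statement handles any observation that matches none of the WHEN conditions.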
data cars;
set sashelp.cars;
select;
when (weight lt 3000) wgt = 2;
when (weight lt 4000) wgt = 3;
when (weight lt 5000) wgt = 4;
when (weight lt 6000) wgt = 5;
otherwise wgt = 6;
end;
run;
proc freq data=cars;
tables wgt / nocum;
run;
The FREQ Procedure
wgt Frequency Percent
----------------------------
2 87 20.33
3 238 55.61
4 81 18.93
5 19 4.44
6 3 0.70
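SELECT also has a second form, in which an expression follows the SELECT keyword and each WHEN lists values to compare against it. As a sketch, here is the eight-cylinder indicator from earlier written that way, assuming (as with an ordinary equality comparison) that `when (.)` matches missing values:

```sas
data cars;
  set sashelp.cars;
  select (cylinders);
    when (.) eightcyl = .;   /* keep missing cylinders missing */
    when (8) eightcyl = 1;
    otherwise eightcyl = 0;  /* every other cylinder count */
  end;
run;

proc freq data=cars;
  tables cylinders*eightcyl / nocol nopercent missing;
run;
```

Keep the OTHERWISE statement even when you expect every value to be covered: if no WHEN matches and there is no OTHERWISE, SAS stops with an error.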