Do Groups
From time to time you will want to do several calculations based on the same condition. Repeatedly checking the same condition is computationally inefficient and makes your code harder to read. The DO statement is the tool you are looking for. The DO and END statements mark the beginning and end of a block of statements that are executed as a unit.
For example, suppose you are working with the cars data set, and want to calculate the effect of imposing a 100% tariff on foreign cars. (“We’re going to put a 100 percent tariff on every single car that comes across the line” - Donald Trump, 17 March 2024.) You might want to create three new variables: tariff, MSRP_tariff, and invoice_tariff.
data cars;
  set sashelp.cars;
  msrp2024 = msrp*1.7; /* inflation, 2004 to 2024 */
  invoice2024 = invoice*1.7;
  if (origin ne "USA") then do;
    tariff = msrp2024;
    MSRP_tariff = msrp2024 + tariff;
    invoice_tariff = invoice2024 + tariff;
  end;
  else do;
    tariff = 0;
    MSRP_tariff = msrp2024 + tariff;
    invoice_tariff = invoice2024 + tariff;
  end;
run;
proc means data=cars n mean stddev;
  var msrp2024 msrp_tariff invoice2024 invoice_tariff;
run;
The MEANS Procedure
Variable N Mean Std Dev
-----------------------------------------------------
msrp2024 428 55717.25 33033.92
MSRP_tariff 428 94865.53 70575.84
invoice2024 428 51024.99 29991.60
invoice_tariff 428 90173.27 67791.10
-----------------------------------------------------
[Not the best example: you can easily rewrite this without the DO!]
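As the note above says, the two DO groups differ only in the value of tariff, so the condition is really needed in just one place. A minimal sketch of one possible rewrite, using the IFN function to choose the tariff (this is one alternative, not necessarily the rewrite the author had in mind):

```sas
data cars;
  set sashelp.cars;
  msrp2024    = msrp*1.7;      /* inflation, 2004 to 2024 */
  invoice2024 = invoice*1.7;
  /* IFN returns the second argument when the condition is true,
     and the third argument when it is false */
  tariff = ifn(origin ne "USA", msrp2024, 0);
  MSRP_tariff    = msrp2024 + tariff;
  invoice_tariff = invoice2024 + tariff;
run;
```

Because only tariff depends on origin, the PROC MEANS step above should report the same summary statistics for this version.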
Do Loops (Iteration)
At other times you will find you want to execute some statement(s) repeatedly. Do loops show up regularly for repeating statements across several variables (see arrays), but might also be used for checking values in a list, or generating random data.
An iterative DO can take several forms:
DO var = value-list;
DO var = start TO stop <BY step> <WHILE (condition) | UNTIL (condition)>;
DO WHILE (condition);
DO UNTIL (condition);
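The last two forms have no counter: they repeat for as long as a condition holds (WHILE, checked at the top of the loop) or until it becomes true (UNTIL, checked at the bottom, so the body always runs at least once). A minimal sketch, in which the starting balance, the 5% growth rate, and all variable names are assumptions chosen for illustration:

```sas
data growth;
  balance = 1000;                /* hypothetical starting balance */
  year = 0;
  do while (balance < 2000);     /* condition is tested before each pass */
    year = year + 1;
    balance = balance*1.05;      /* assumed 5% growth per year */
    output;
  end;
run;

data at_least_once;
  n = 10;
  do until (n ge 3);             /* condition is tested after each pass */
    n = n + 1;                   /* runs once even though n already exceeds 3 */
    output;
  end;
run;
```

In the second step the condition is already true before the loop starts, yet one observation is still written, because DO UNTIL does not test its condition until the end of the first pass.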
DO value-list
We can iterate over an arbitrary list of values. This is the only form of the DO loop that can step directly through a set of character values!
For example, if we wanted to simulate scores for three teams:
data teams;
  length team $ 6;
  do team = "red", "yellow", "green";
    score = rand('integer', 90, 100);
    output;
  end;
run;
proc print noobs; run;
team score
red 94
yellow 92
green 99
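The value list works for numeric values as well, and a list item may itself be a start TO stop range. A minimal sketch (the data set and variable names here are made up for illustration):

```sas
data levels;
  /* a range followed by two individual values */
  do x = 1 to 3, 10, 20;
    output;
  end;
run;
proc print data=levels noobs; run;
```

This should produce five observations, with x taking the values 1, 2, 3, 10, and 20 in that order.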
DO integer sequence
The most common form of the DO loop iterates over a sequence of integers, counting iterations.
For example, if we wanted to simulate data for a two-group t-test, with 15 observations per group:
data simulated;
  do i = 1 to 30;
    if (i le 15) then group = 1;
    else group = 2;
    y = rand('normal', group+3); /* mean is group+3; the standard deviation defaults to 1 */
    output;
  end;
  drop i; /* it is common to drop the counter */
run;
ods select statistics ttests;
proc ttest data=simulated;
  class group;
  var y;
run;
The TTEST Procedure
Variable: y
group Method N Mean Std Dev Std Err
1 15 4.3296 1.2600 0.3253
2 15 5.0437 0.6097 0.1574
Diff (1-2) Pooled -0.7141 0.9898 0.3614
Diff (1-2) Satterthwaite -0.7141 0.3614
group Method Minimum Maximum
1 2.2711 6.4245
2 3.7822 5.8285
Diff (1-2) Pooled
Diff (1-2) Satterthwaite
Method Variances DF t Value Pr > |t|
Pooled Equal 28 -1.98 0.0581
Satterthwaite Unequal 20.215 -1.98 0.0620
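Because the data are random, rerunning the DATA step will give numbers that differ from those shown above. A minimal sketch of one way to make a run repeatable, assuming an arbitrary seed passed to CALL STREAMINIT (the seed value is an illustration, not part of the original example, so the results will still not match the listing above):

```sas
data simulated;
  call streaminit(12345);        /* arbitrary seed, chosen only for illustration */
  do i = 1 to 30;
    if (i le 15) then group = 1;
    else group = 2;
    y = rand('normal', group+3);
    output;
  end;
  drop i;
run;
```

With the seed fixed, repeated runs of this step should generate the same values of y each time.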
DO BY steps
A DO loop need not proceed by steps of 1, and can even count backward. In this example, note that the loop never lands exactly on the stop value of 1: x takes the values 10, 8, 6, 4, and 2, and the loop ends because the next value, 0, would pass the stop value.
data countdown;
  do x = 10 to 1 by -2;
    output;
  end;
run;
proc print noobs; run;
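One consequence worth checking: when the loop finishes, the index variable holds the first value that failed the stop test, not the last value used inside the loop. A minimal sketch that writes the final value to the log (DATA _NULL_ is used here only to avoid creating a data set):

```sas
data _null_;
  do x = 10 to 1 by -2;
    /* nothing to do inside the loop; we only care about x afterwards */
  end;
  put x=;   /* writes the value of x after the loop has ended */
run;
```

The log should show x=0, since 0 is the first value in the sequence that passes the stop value of 1.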
Variable iteration parameters
While the preceding examples give fixed start, stop, and step values, these may in fact be given by variables.
Suppose we are given summary data to work with.
proc summary data=sashelp.class;
  class sex;
  types sex;
  var height;
  output out=class_summary mean=mhght stddev=sdhght;
run;
proc print data=class_summary noobs;
run;
Sex _TYPE_ _FREQ_ mhght sdhght
F 1 9 60.5889 5.01833
M 1 10 63.9100 4.93794
We could use the means and standard deviations to simulate data, and begin the work of developing a power analysis (although this particular analysis could be done with the appropriate formula).
data sim_class;
  set class_summary;
  do i = 1 to _freq_;
    sim_height = rand('normal', mhght, sdhght);
    output;
  end;
  drop i;
run;
ods select statistics ttests;
proc ttest data=sim_class;
  class sex;
  var sim_height;
run;
The TTEST Procedure
Variable: sim_height
Sex Method N Mean Std Dev Std Err
F 9 64.0680 6.1354 2.0451
M 10 61.4186 6.1271 1.9376
Diff (1-2) Pooled 2.6493 6.1310 2.8170
Diff (1-2) Satterthwaite 2.6493 2.8172
Sex Method Minimum Maximum
F 57.1867 77.1330
M 49.6289 74.1450
Diff (1-2) Pooled
Diff (1-2) Satterthwaite
Method Variances DF t Value Pr > |t|
Pooled Equal 17 0.94 0.3601
Satterthwaite Unequal 16.786 0.94 0.3603
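To move from a single simulated data set toward a power estimate, the simulation can be wrapped in an outer DO loop that counts replicates. The following is a rough sketch under several assumptions: the number of replicates (1000), the seed, and the names rep, sim_power, sim_tests, pooled, and reject are all made up for illustration, and the step assumes the TTests ODS table provides Method and Probt columns.

```sas
data sim_power;
  call streaminit(2024);             /* arbitrary seed, for repeatable runs */
  set class_summary;
  do rep = 1 to 1000;                /* assumed number of replicates */
    do i = 1 to _freq_;              /* same inner loop as before */
      sim_height = rand('normal', mhght, sdhght);
      output;
    end;
  end;
  drop i;
run;

proc sort data=sim_power;            /* BY processing needs sorted data */
  by rep sex;
run;

ods exclude all;                     /* suppress 1000 sets of printed output */
ods output ttests=sim_tests;
proc ttest data=sim_power;
  by rep;
  class sex;
  var sim_height;
run;
ods exclude none;

data pooled;
  set sim_tests;
  where method = "Pooled";
  reject = (probt < 0.05);           /* 1 if the test rejects at the 5% level */
run;

proc means data=pooled mean;         /* mean of reject estimates the power */
  var reject;
run;
```

The mean of reject is the proportion of replicates in which the pooled test rejects, which serves as a simulation-based estimate of power for these group sizes, means, and standard deviations.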