Do Groups and Loops

Do Groups

From time to time you will want to do several calculations based on the same condition. Repeatedly checking the same condition is computationally inefficient and makes your code harder to read. The DO statement addresses both problems: the DO and END statements mark the beginning and end of a block of statements that are executed as a unit.

For example, suppose you are working with the cars data set, and want to calculate the effect of imposing a 100% tariff on foreign cars. (“We’re going to put a 100 percent tariff on every single car that comes across the line” - Donald Trump, 17 March 2024.) You might want to create three new variables: tariff, MSRP_tariff, and invoice_tariff. In the code below, the tariff is taken to be the full inflation-adjusted MSRP.

data cars;
  set sashelp.cars;
  msrp2024    = msrp   *1.7; /* inflation, 2004 to 2024 */
  invoice2024 = invoice*1.7;

  /* foreign cars: the tariff equals the inflation-adjusted MSRP */
  if (origin ne "USA") then do;
    tariff         = msrp2024;
    MSRP_tariff    = msrp2024    + tariff;
    invoice_tariff = invoice2024 + tariff;
  end;
  /* domestic cars: no tariff */
  else do;
    tariff         = 0;
    MSRP_tariff    = msrp2024    + tariff;
    invoice_tariff = invoice2024 + tariff;
  end;
run;

proc means data=cars n mean stddev;
  var msrp2024 msrp_tariff invoice2024 invoice_tariff;
run;
                            The MEANS Procedure

           Variable            N            Mean         Std Dev
           -----------------------------------------------------
           msrp2024          428        55717.25        33033.92
           MSRP_tariff       428        94865.53        70575.84
           invoice2024       428        51024.99        29991.60
           invoice_tariff    428        90173.27        67791.10
           -----------------------------------------------------

[Not the best example: you can easily rewrite this without the DO!]
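As a minimal sketch of such a rewrite: in a DATA step a comparison evaluates to 1 when true and 0 when false, so the tariff can be computed in a single assignment and the two sums no longer need to be repeated. The data set name cars_no_do is just a placeholder chosen for this illustration.

```sas
data cars_no_do;   /* hypothetical name, to avoid overwriting cars */
  set sashelp.cars;
  msrp2024    = msrp   *1.7; /* same inflation factor as above */
  invoice2024 = invoice*1.7;

  /* (origin ne "USA") is 1 for foreign cars and 0 for domestic cars */
  tariff         = msrp2024 * (origin ne "USA");
  MSRP_tariff    = msrp2024    + tariff;
  invoice_tariff = invoice2024 + tariff;
run;
```

The DO group version is still the better pattern when the two branches differ in more than one value, or when the statements in each branch are not simple arithmetic.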

Do Loops (Iteration)

At other times you will want to execute one or more statements repeatedly. DO loops show up regularly for repeating statements across several variables (see arrays), but they are also useful for checking values in a list or generating random data.

A DO loop can take several forms:

DO var = value-list;
DO var = start TO stop <BY step> <WHILE (condition) | UNTIL (condition)>;
DO WHILE (condition);
DO UNTIL (condition);
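The following sketch contrasts the indexed form with the two conditional forms. It is only an illustration: the variable names i and n are arbitrary, and the results are written to the log with PUT rather than to a data set. The point to notice is that a WHILE condition is checked at the top of each pass, while an UNTIL condition is checked at the bottom.

```sas
data _null_;
  /* index variable with start, stop, and step */
  do i = 1 to 5 by 2;
    put "indexed: " i=;
  end;

  /* WHILE is tested before each pass, so this body never runs */
  n = 10;
  do while (n < 10);
    put "while:   " n=;
    n = n + 1;
  end;

  /* UNTIL is tested after each pass, so this body runs once */
  n = 10;
  do until (n >= 10);
    put "until:   " n=;
    n = n + 1;
  end;
run;
```

The log should show three "indexed" lines, no "while" lines, and a single "until" line.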

DO value-list

We can iterate over an arbitrary list of values, separated by commas. This is also the most direct way to iterate over a set of character values.

For example, if we wanted to simulate scores for three teams:

data teams;
  length team $ 6;
  do team = "red", "yellow", "green";
    score = rand('integer', 90, 100);
    output;
  end;
run;

proc print noobs; run;
                              team      score

                              red         94 
                              yellow      92 
                              green       99 
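A value list is not limited to character constants. As a small sketch (the data set and variable names doses and dose are made up for this example), single numbers and a start TO stop BY step range can be combined in one list:

```sas
data doses;   /* hypothetical example data set */
  /* three single values followed by a range */
  do dose = 0, 1, 5, 10 to 30 by 10;
    output;
  end;
run;

proc print data=doses noobs; run;
```

This produces six observations, with dose taking the values 0, 1, 5, 10, 20, and 30.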

DO integer sequence

The most common form of the DO loop will be to iterate over a sequence of integers, counting iterations.

For example, if we wanted to simulate data for a two-group t-test, with 15 observations per group:

data simulated;
  do i = 1 to 30;
    if (i le 15) then group = 1;
    else group = 2;
    /* mean is group+3; the standard deviation defaults to 1 */
    y = rand('normal', group+3);
    output;
  end;
  drop i; /* it is common to drop the counter */
run;

ods select statistics ttests;
proc ttest data=simulated;
  class group;
  var y;
run;
                            The TTEST Procedure
 
                               Variable:  y

  group         Method             N        Mean     Std Dev     Std Err

  1                               15      4.3296      1.2600      0.3253
  2                               15      5.0437      0.6097      0.1574
  Diff (1-2)    Pooled                   -0.7141      0.9898      0.3614
  Diff (1-2)    Satterthwaite            -0.7141                  0.3614

            group         Method            Minimum     Maximum

            1                                2.2711      6.4245
            2                                3.7822      5.8285
            Diff (1-2)    Pooled                               
            Diff (1-2)    Satterthwaite                        

        Method           Variances        DF    t Value    Pr > |t|

        Pooled           Equal            28      -1.98      0.0581
        Satterthwaite    Unequal      20.215      -1.98      0.0620
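The same kind of data can be generated with one loop nested inside another, which removes the need for the IF/ELSE. The sketch below also sets a seed with CALL STREAMINIT so that the random numbers are reproducible; the seed value and the data set name simulated2 are arbitrary choices for this illustration, so the results will not match the output shown above.

```sas
data simulated2;              /* hypothetical name */
  call streaminit(20240317);  /* arbitrary seed, for reproducibility */
  do group = 1 to 2;          /* outer loop: one pass per group */
    do i = 1 to 15;           /* inner loop: 15 observations per group */
      y = rand('normal', group+3);
      output;
    end;
  end;
  drop i;
run;
```

Here group is itself a loop index, so it is kept in the output data set while the inner counter i is dropped.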

DO BY steps

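A DO loop need not proceed by steps of 1, and can even count backward by using a negative step. In this example, note that the loop never lands exactly on the stop value of 1: the last value output is 2, and the loop ends because the next value, 0, would pass the stop value.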

data countdown;
  do x = 10 to 1 by -2;
    output;
  end;
run;

proc print noobs; run;
                                     x

                                    10
                                     8
                                     6
                                     4
                                     2
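One consequence is worth seeing directly: the index variable is updated one more time before the loop ends, so its value after the loop is not the last value used inside the loop. A minimal sketch, writing to the log instead of a data set:

```sas
data _null_;
  do x = 10 to 1 by -2;
    put "inside: " x=;
  end;
  /* x was decremented once more before the stop value was checked */
  put "after:  " x=;
run;
```

The log shows x taking the values 10, 8, 6, 4, and 2 inside the loop, and x=0 after it.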

Variable iteration parameters

While the preceding examples give fixed start, stop, and step values, these may in fact be given by variables.
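When the stop value comes from a variable, it helps to know that start, stop, and step are evaluated once, before the first pass through the loop. The sketch below (with arbitrary variable names i and n) changes the stop variable inside the loop to show that the number of iterations does not change.

```sas
data _null_;
  n = 3;
  do i = 1 to n;
    n = 10;      /* reassigning the stop variable has no effect on the loop */
    put i= n=;
  end;
run;
```

The loop still runs three times, even though n equals 10 from the first pass onward.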

Suppose we are given summary data to work with.

proc summary data=sashelp.class;
    class sex;
    types sex;
    var height;
    output out=class_summary mean=mhght stddev=sdhght;
    run;
proc print data=class_summary noobs;
run;
               Sex    _TYPE_    _FREQ_     mhght      sdhght

                F        1         9      60.5889    5.01833
                M        1        10      63.9100    4.93794

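We could use the means and standard deviations to simulate data, a first step toward a simulation-based power analysis (although this particular analysis could be done with the appropriate formula). Because the stop value is the variable _freq_, each row of the summary data generates as many simulated observations as there were in the original group.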

data sim_class;
  set class_summary;
  do i = 1 to _freq_;
    sim_height = rand('normal', mhght, sdhght);
    output;
  end;
  drop i;
run;

ods select statistics ttests;
proc ttest data=sim_class;
  class sex;
  var sim_height;
  run;
                            The TTEST Procedure
 
                           Variable:  sim_height

  Sex           Method             N        Mean     Std Dev     Std Err

  F                                9     64.0680      6.1354      2.0451
  M                               10     61.4186      6.1271      1.9376
  Diff (1-2)    Pooled                    2.6493      6.1310      2.8170
  Diff (1-2)    Satterthwaite             2.6493                  2.8172

            Sex           Method            Minimum     Maximum

            F                               57.1867     77.1330
            M                               49.6289     74.1450
            Diff (1-2)    Pooled                               
            Diff (1-2)    Satterthwaite                        

        Method           Variances        DF    t Value    Pr > |t|

        Pooled           Equal            17       0.94      0.3601
        Satterthwaite    Unequal      16.786       0.94      0.3603
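To move from one simulated data set toward a power estimate, the simulation can be wrapped in an outer loop over replicates and the t-test run once per replicate. What follows is a rough sketch under stated assumptions, not a finished analysis: the names rep, sim_reps, sim_ttests, and sim_power, the seed, the choice of 100 replicates, and the 0.05 threshold are all illustrative. It relies on the TTests ODS table from PROC TTEST, whose p-value column is assumed here to be named Probt.

```sas
data sim_reps;
  if _n_ = 1 then call streaminit(12345);  /* arbitrary seed */
  set class_summary;
  do rep = 1 to 100;                       /* assumed number of replicates */
    do i = 1 to _freq_;
      sim_height = rand('normal', mhght, sdhght);
      output;
    end;
  end;
  drop i;
run;

/* rows are grouped by sex first, so sort before BY-group processing */
proc sort data=sim_reps;
  by rep sex;
run;

ods exclude all;                 /* suppress 100 sets of printed output */
proc ttest data=sim_reps;
  by rep;
  class sex;
  var sim_height;
  ods output ttests=sim_ttests;  /* capture the test results as data */
run;
ods exclude none;

data sim_power;
  set sim_ttests;
  where method = "Pooled";
  reject = (probt < 0.05);       /* 1 if this replicate rejects, else 0 */
run;

proc means data=sim_power n mean;
  var reject;
run;
```

The mean of reject is the proportion of replicates in which the pooled test rejected at the 0.05 level, which serves as a simulation estimate of power. With only 100 replicates that estimate is noisy, so a real analysis would use many more.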