Working with Numeric Data

Assigning Values

The most common thing you will do in a DATA step is to assign values to a variable. The basic assignment statement is

variable = expression;

where variable names either a new or existing variable, and expression is some combination of constants, variable names, operators, and functions. When an expression is evaluated (executed) it results in a data value.

Unlike most SAS statements, the assignment statement does not begin with a SAS keyword. Assignment always occurs within a DATA step. The SAS language includes all the usual arithmetic operators and numeric functions.
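As a minimal sketch of what an expression can contain, the step below combines constants, operators, and functions in a few assignments. The data set name example and the new variable names are made up for illustration; age, height, and weight come from sashelp.class.

```sas
data example;
  set sashelp.class;
  age_months = age * 12;          /* variable, operator, and constant */
  ratio      = weight / height;   /* two existing variables           */
  height_rnd = round(height, 1);  /* function: round to the nearest 1 */
  log_weight = log(weight);       /* function: natural logarithm      */
run;
```

Each statement creates a new numeric variable because the name on the left does not already exist in the input data set.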

For example, using the SAS example data set, class, we can add a variable for the body mass index of each student, calculated from the existing variables height (in inches) and weight (in pounds). In this example, we read the data from the SASHELP library and create a new copy in the WORK library (in WORK because we have not named any other library on the DATA statement). We call the new variable bmi.

data class;
  set sashelp.class;
  bmi = (weight/height**2)*703;
run;

proc means data=class n mean stddev;
  var weight height bmi;
run;

                            The MEANS Procedure

              Variable     N            Mean         Std Dev
              ----------------------------------------------
              Weight      19     100.0263158      22.7739335
              Height      19      62.3368421       5.1270752
              bmi         19      17.8632519       2.0926193
              ----------------------------------------------

A DATA step can include any number of assignment statements, and they are executed in order, one observation at a time. You should think of a DATA step as an implicit loop: SAS reads in one observation according to a statement that is (usually) at the top of the step, runs through each line of the step with that one observation, and outputs the observation to the output data set at the bottom of the step. Then SAS returns to the top of the step, and repeats for as long as it finds a new observation to read. See Understanding SAS DATA Steps for a more detailed explanation.
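One way to watch the implicit loop at work is to write a line to the log on each pass. The following is only a sketch: the variable half_age is hypothetical, and DATA _NULL_ is used so that no output data set is created. The automatic variable _N_ counts the iterations of the step.

```sas
data _null_;
  set sashelp.class;            /* read one observation per iteration   */
  half_age = age / 2;           /* executed once for each observation   */
  put _n_= name= half_age=;     /* write the current values to the log  */
run;
```

The log should show one line per observation read, with _N_ increasing by one each time.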

Overwriting Existing Variables (Recoding)

Another approach to our bmi calculation begins by converting the Imperial heights and weights into SI units. In this example, notice that we are overwriting weight and height: each appears on both the left and the right of the assignment operator (the equals sign). Because statements are executed in order, the order matters. The bmi statement must come after the two conversions, and it would be a mistake to put it first. Worse, SAS would still run the misordered step and give us output, with no error messages in the log!

data class;
  set sashelp.class;
  weight = weight/2.2;
  height = height/39.37;
  bmi    = weight/height**2;
run;

proc means data=class n mean stddev;
  var weight height bmi;
run;

                            The MEANS Procedure

              Variable     N            Mean         Std Dev
              ----------------------------------------------
              Weight      19      45.4665072      10.3517880
              Height      19       1.5833590       0.1302280
              bmi         19      17.9024862       2.0972154
              ----------------------------------------------

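Compare with the following log and output, where bmi is calculated before the units are converted. The bmi values are nonsense, yet the log contains no errors or warnings!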

2          data class;
3            set sashelp.class;
4            bmi    = weight/height**2;
5            weight = weight/2.2;
6            height = height/39.37;
7          run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 6 variables.

8          
9          proc means data=class n mean stddev;
10           var weight height bmi;
11         run;

NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The PROCEDURE MEANS printed page 1.
                            The MEANS Procedure

              Variable     N            Mean         Std Dev
              ----------------------------------------------
              Weight      19      45.4665072      10.3517880
              Height      19       1.5833590       0.1302280
              bmi         19       0.0254100       0.0029767
              ----------------------------------------------
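One way to make this kind of ordering mistake easier to spot is to keep the original variables and store the converted values under new names. This is a sketch of that alternative, not the approach used above; class_si, weight_kg, and height_m are names invented for the example.

```sas
data class_si;
  set sashelp.class;
  weight_kg = weight / 2.2;             /* original weight is left untouched */
  height_m  = height / 39.37;           /* original height is left untouched */
  bmi       = weight_kg / height_m**2;  /* uses only the converted values    */
run;
```

If the bmi statement were moved to the top of this step, weight_kg and height_m would still be missing when it runs, so bmi would be missing for every observation and the log would carry a note that missing values were generated. That is much harder to overlook than a plausible-looking but wrong number.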

Missing Values

When using most SAS operators and functions, a missing value in an expression results in a missing value (with a few important exceptions). In

z = x + y;

if either x or y is missing, z will be missing.

data missing;
  input x y;
  z = x + y;
datalines;
59 1
60 .
. -39
;

proc print noobs; run;

                               x      y     z

                              59      1    60
                              60      .     .
                               .    -39     .
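To see the difference between an operator and one of the exceptions, the sketch below computes the same total twice, once with the + operator and once with the SUM() function, and counts the missing arguments with NMISS(). The data set and variable names here are made up for the example.

```sas
data missing2;
  input x y;
  z_plus = x + y;        /* missing if either x or y is missing */
  z_sum  = sum(x, y);    /* adds only the nonmissing arguments  */
  n_miss = nmiss(x, y);  /* number of missing arguments         */
datalines;
59 1
60 .
. -39
;

proc print noobs; run;
```

With these three observations, z_plus is 60, missing, and missing, while z_sum is 60, 60, and -39, and n_miss is 0, 1, and 1.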

Observation-wise Summary Statistics

The functions that calculate summary statistics within an observation are exceptions to missing value propagation. Functions like MEAN(), STD(), STDERR(), and SUM() usually have several variables as arguments. They ignore missing arguments and calculate the result from the nonmissing values. As long as enough values are nonmissing within an observation (at least one for MEAN() and SUM(), at least two for STD() and STDERR()), the result is also not missing.

These functions can generally take one of two forms, using either variable names separated by commas, or OF and a variable list.

mean(var1, var2, var3 ...)
mean(of varlist)

data missing;
  input x1 x2 x3 x4;
  a = mean(x1, x2, x3, x4);
  b = std (x1, x2, x3, x4);
  c = stderr(of x1-x4);
  d = sum   (of _numeric_);
datalines;
59 1 7 2
60 . -3 5
. -39 . 0
;

proc print noobs; run;

    x1     x2    x3    x4        a          b          c           d

    59      1     7     2     17.2500    27.9568    13.9784     128.185
    60      .    -3     5     20.6667    34.2977    19.8018     136.766
     .    -39     .     0    -19.5000    27.5772    19.5000     -11.423
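Notice that d in the output above is larger than the sum of x1 through x4. The list _numeric_ stands for every numeric variable defined so far in the step, so it also picks up a, b, and c: in the first observation, 59 + 1 + 7 + 2 + 17.25 + 27.9568 + 13.9784 = 128.185. If only the x variables are wanted, a numbered range list is a safer choice. The sketch below assumes the same input data; total and n_valid are names chosen for the example.

```sas
data missing;
  input x1 x2 x3 x4;
  total   = sum(of x1-x4);  /* adds only x1, x2, x3, x4        */
  n_valid = n(of x1-x4);    /* number of nonmissing values used */
datalines;
59 1 7 2
60 . -3 5
. -39 . 0
;

proc print noobs; run;
```

Here total is 69, 62, and -39, and n_valid is 4, 3, and 2, which shows how many values each total is based on.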

Sum Operator

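As described in Understanding SAS DATA Steps, the sum statement (variable + expression;), sometimes informally called the sum operator, accumulates a running total across observations. It does not propagate missing values: a missing value is simply skipped, and the total carries forward unchanged.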
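A small sketch makes the contrast concrete. It accumulates the same values two ways: with a sum statement, and with an ordinary assignment to a retained variable. The names running, total, and plus_total are invented for the example.

```sas
data running;
  input x;
  retain plus_total 0;            /* keep plus_total across observations */
  total + x;                      /* sum statement: skips a missing x    */
  plus_total = plus_total + x;    /* assignment: propagates a missing x  */
datalines;
10
.
5
;

proc print noobs; run;
```

Here total is 10, 10, and 15, while plus_total is 10 in the first observation and missing from then on, because once the retained value is missing every later addition is missing too.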