The INPUT statement in a SAS DATA step builds observations variable by variable. By default, SAS keeps reading records from the input file until it has filled in the values for the current observation. When a record is too short to supply all the values in an observation, SAS reads a new record (and puts a NOTE in the log). However, this is not always what we want, and there are other options.
Multiple Records per Observation
By default, SAS will “flow over” to the next record when a record is too short. This default behavior is the FLOWOVER option of the INFILE statement.
data scores;
infile datalines;
input team $ name $ score1 score2 score3;
datalines;
red
Joe 11 32 76
blue
Mitchell 13 29 82
green
Susan 14 27 74
;
proc print; run;
Obs team name score1 score2 score3
1 red Joe 11 32 76
2 blue Mitchell 13 29 82
3 green Susan 14 27 74
The log includes this NOTE:
NOTE: SAS went to a new line when INPUT statement reached past the end of
a line.
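Because FLOWOVER is the default, naming it on the INFILE statement should not change anything. The sketch below is simply the first example with the option spelled out, which can make the intended behavior explicit to someone reading the program later.

```sas
/* Same data as above. FLOWOVER is the default, so naming it */
/* explicitly should produce the same three observations.    */
data scores;
  infile datalines flowover;
  input team $ name $ score1 score2 score3;
  datalines;
red
Joe 11 32 76
blue
Mitchell 13 29 82
green
Susan 14 27 74
;
proc print; run;
```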
It is not particularly important to SAS that every observation have the same number of records!
data scores;
infile datalines;
input team $ name $ score1 score2 score3;
datalines;
red
Joe
11 32 76
blue Mitchell 13 29 82
green
Susan 14 27 74
;
proc print; run;
Obs team name score1 score2 score3
1 red Joe 11 32 76
2 blue Mitchell 13 29 82
3 green Susan 14 27 74
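When every observation does occupy the same number of records, as in the first example, the record boundaries can instead be made explicit with the / line-pointer control, which moves INPUT to the next record. A sketch reusing the first example's data: here INPUT never has to read past the end of a line. Note that this layout assumes exactly two records per observation, so it would not read the irregular data just above correctly.

```sas
/* Each observation spans exactly two records: the team on */
/* the first, the name and scores on the second. The /     */
/* pointer moves INPUT to the next record explicitly.      */
data scores;
  infile datalines;
  input team $ / name $ score1 score2 score3;
  datalines;
red
Joe 11 32 76
blue
Mitchell 13 29 82
green
Susan 14 27 74
;
proc print; run;
```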
It is, however, important to SAS that every observation have the same number of data values! Without placeholders for the missing values, the data values become misaligned. Luckily, text data are seldom this poorly organized, but it is instructive to see what goes wrong.
data scores;
infile datalines;
input team $ name $ score1 score2 score3;
datalines;
red
Joe
blue Mitchell 13 29 82
green
Susan 14 27 74
;
proc print; run;
Obs team name score1 score2 score3
1 red Joe . . 13
2 green Susan 14 27 74
The log tells us:
NOTE: Invalid data for score1 in line 8 1-4.
NOTE: Invalid data for score2 in line 8 6-12.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----
8 blue Mitchell 13 29 82
team=red name=Joe score1=. score2=. score3=13 _ERROR_=1 _N_=1
NOTE: SAS went to a new line when INPUT statement reached past the end of
a line.
Because the first two lines were too short to fill an observation, three records were read. When SAS tries to create score1 from the third data value it encounters, “blue”, it finds that this text isn’t numeric, sets score1 to missing, and sets the _ERROR_ flag to 1. The same thing happens when SAS tries to use the text “Mitchell” as the value of score2. The next text, “13”, is valid as score3. SAS writes NOTEs in the log and writes the offending record (the contents of the _INFILE_ buffer) to the log, along with a RULE to aid in counting columns. Finally, because the _ERROR_ flag is 1, SAS writes the PDV (program data vector) to the log as well.
Notice as well that SAS did not try to use the text “29 82” for anything. When the first observation was “complete”, it was output. Then SAS returned to the top of the DATA step and began by reading a new record. By default, each iteration of the DATA step begins with a newly read record.
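And notice that SAS still produced a data set! Invalid data generate NOTEs rather than ERROR messages, so the step runs to completion and the only evidence of trouble is in the log. Habitually checking the log is important!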
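One way to make such problems harder to overlook, sketched here as an optional addition rather than a required practice, is to test the automatic variable _ERROR_ right after the INPUT statement and write a message of your own with PUTLOG. The message wording is illustrative; starting it with WARNING: simply makes it easy to search the log for. SAS still writes its own NOTEs and PDV dump as before.

```sas
/* Same misaligned data as above. _ERROR_ is set to 1 when */
/* INPUT meets invalid data, so the test below flags the   */
/* affected observation with an extra message in the log.  */
data scores;
  infile datalines;
  input team $ name $ score1 score2 score3;
  if _error_ then
    putlog 'WARNING: invalid data while building observation ' _n_=;
  datalines;
red
Joe
blue Mitchell 13 29 82
green
Susan 14 27 74
;
proc print; run;
```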
Fixing FLOWOVER
There are several alternatives to FLOWOVER, including MISSOVER and TRUNCOVER. These options are specified on the INFILE statement.
In the case where each record represents an observation and any missing data values are all at the end of the record, the MISSOVER option is useful. With MISSOVER, when INPUT reaches the end of a record before values have been read for all of its variables, SAS sets the remaining variables to missing. SAS does not automatically go to a new line.
data scores;
infile datalines missover;
input team $ name $ score1 score2 score3;
datalines;
red Joe
blue Mitchell 13 29 82
green Susan 14 27 74
;
proc print; run;
Obs team name score1 score2 score3
1 red Joe . . .
2 blue Mitchell 13 29 82
3 green Susan 14 27 74
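The same step with TRUNCOVER in place of MISSOVER is sketched below. With list input, as here, the two options should behave alike: values missing from the end of a short record are set to missing, so the output should match the MISSOVER example. The options differ for column and formatted input when the last field on a record is shorter than the width INPUT asks for: MISSOVER sets that variable to missing, while TRUNCOVER keeps the shorter value.

```sas
/* Same data as the MISSOVER example. With list input,   */
/* TRUNCOVER also sets the trailing values to missing    */
/* instead of going to a new line.                       */
data scores;
  infile datalines truncover;
  input team $ name $ score1 score2 score3;
  datalines;
red Joe
blue Mitchell 13 29 82
green Susan 14 27 74
;
proc print; run;
```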
Misalignment like that in the FLOWOVER example is avoided altogether if your text data have placeholders for missing values!
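In list input, a single period is read as a missing numeric value, so it can serve as such a placeholder. A sketch using the earlier misaligned data, with Joe's record completed by periods: because every observation now has the same number of data values, the default FLOWOVER behavior should produce three observations and no invalid-data NOTEs.

```sas
/* Periods hold the places of Joe's missing scores, so */
/* every observation has five data values.             */
data scores;
  infile datalines;
  input team $ name $ score1 score2 score3;
  datalines;
red Joe . . .
blue Mitchell 13 29 82
green Susan 14 27 74
;
proc print; run;
```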
Multiple Observations per Record
Where the text data are arranged so that there are multiple observations per record (usually when there are only a few variables), we don’t want SAS to read a new record with every iteration of the DATA step. To change this behavior, we put @@ (the double trailing @) at the end of the INPUT statement, which holds the current record across iterations of the DATA step.
data scores;
infile datalines;
input name $ score1 score2 score3 team $ @@;
datalines;
Joe 11 32 76 red Mitchell 13 29 82 blue
Susan 14 27 74 green
;
proc print; run;
Obs name score1 score2 score3 team
1 Joe 11 32 76 red
2 Mitchell 13 29 82 blue
3 Susan 14 27 74 green
With @@, SAS still builds each observation one data value at a time. In FLOWOVER mode, an observation that is incomplete when INPUT reaches the end of a line is continued on the next line.
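To illustrate, here is a sketch using the data from the @@ example rearranged so that Mitchell’s observation starts on the first line and finishes on the second. Since FLOWOVER is still in effect, INPUT should continue onto the next line and the same three observations should result.

```sas
/* Mitchell's observation starts on the first line and  */
/* ends on the second. In FLOWOVER mode INPUT continues */
/* on the next line; @@ holds each line between DATA    */
/* step iterations.                                     */
data scores;
  infile datalines;
  input name $ score1 score2 score3 team $ @@;
  datalines;
Joe 11 32 76 red Mitchell 13
29 82 blue Susan 14 27 74 green
;
proc print; run;
```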