Ch 7: Understanding DATA Step Processing Flashcards
Which of the following is not created during the compilation phase?
a. the data set descriptor
b. the first observation
c. the program data vector
d. the _N_ and _ERROR_ automatic variables
Correct answer: b During the compilation phase, the program data vector is created. The program data vector includes the two automatic variables _N_ and _ERROR_. The descriptor portion of the new SAS data set is created at the end of the compilation phase. The descriptor portion includes the name of the data set, the number of observations and variables, and the names and attributes of the variables. Observations are not written until the execution phase.
During the compilation phase, SAS scans each statement in the DATA step, looking for syntax errors. Which of the following is not considered a syntax error?
a. incorrect values and formats
b. invalid options or variable names
c. missing or invalid punctuation
d. missing or misspelled keywords
Correct answer: a Syntax checking can detect many common errors, but it cannot verify the values of variables or the correctness of formats.
Unless otherwise directed, how does the DATA step execute?
a. once for each compilation phase
b. once for each DATA step statement
c. once for each record in the input file
d. once for each variable in the input file
Correct answer: c The DATA step executes once for each record in the input file, unless otherwise directed.
At the beginning of the execution phase, the value of _N_ is 1, the value of _ERROR_ is 0, and the values of the remaining variables are set to the following:
a. 0
b. 1
c. undefined
d. missing
Correct answer: d The remaining variables are initialized to missing. Missing numeric values are represented by periods, and missing character values are represented by blanks.
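A minimal sketch of how the initial state of the program data vector could be inspected in the log. The variable names Name and Score and the in-line data are illustrative assumptions, not part of the original question.

```sas
/* Hypothetical example: Name, Score, and the data lines are made up */
data _null_;
   put 'Before INPUT: ' _all_;   /* PDV at the top of each iteration */
   input Name $ Score;
   put 'After INPUT:  ' _all_;   /* PDV after the record is read */
   datalines;
Ann 90
Bob 85
;
run;
```

On the first iteration, the first PUT statement should show Score as a period, Name as blank, _N_ as 1, and _ERROR_ as 0.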
Suppose you run a program that causes three DATA step errors. What is the value of the automatic variable _ERROR_ when the observation that contains the third error is processed?
a. 0
b. 1
c. 2
d. 3
Correct answer: b The default value of _ERROR_ is 0, which means there is no data error. When an error occurs, whether one error or multiple errors, the value is set to 1.
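The following sketch illustrates that _ERROR_ is a flag rather than a counter. The variables A and B and the data lines are hypothetical; the second record deliberately contains two invalid numeric values.

```sas
/* Hypothetical example: A and B are numeric, so 'x' and 'y' are invalid */
data _null_;
   input A B;
   put _N_= _ERROR_=;   /* write the automatic variables to the log */
   datalines;
1 2
x y
3 4
;
run;
```

Even though the second record triggers two data errors, the log should show _ERROR_=1 for that iteration, and _ERROR_=0 again for the third record.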
Which of the following actions occurs at the beginning of an iteration of the DATA step?
a. The automatic variables _N_ and _ERROR_ are incremented by one.
b. The DATA step stops execution.
c. The descriptor portion of the data set is written.
d. The values of variables created in programming statements are reset to missing in the program data vector.
Correct answer: d By default, at the end of the DATA step, the values in the program data vector are written to the data set as an observation. Then, control returns to the top of the DATA step, the value of the automatic variable _N_ is incremented by one, and the values of variables that were created in programming statements are reset to missing. The automatic variable _ERROR_ is reset to 0 if necessary.
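The reset behavior can be observed with a short sketch like the one below. The variables Amount and Total and the data lines are assumptions made for illustration.

```sas
/* Hypothetical example: Total is created by an assignment statement */
data _null_;
   put 'Top of iteration: ' _N_= Total=;   /* Total before it is assigned */
   input Amount;
   Total = Amount * 2;
   datalines;
5
7
;
run;
```

Total should appear as missing (a period) at the top of every iteration, including the second, because it is reset when control returns to the top of the step. The PUT statement also runs on the final iteration in which INPUT finds no further records.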
Consider the following DATA step. Based on the sample input file below, in what order are the variables stored in the new SAS data set?
data work.fin2;
set cert.finance;
if Salary>25000 then Raise=0.03;
else Raise=0.05;
NewSalary=(Salary*Raise)+Salary;
run;
a. SSN Name Salary Date Raise NewSalary
b. Raise NewSalary SSN Name Salary Date
c. NewSalary Raise SSN Name Salary Date
d. SSN Name Date Salary Raise NewSalary
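Correct answer: a The order in which variables are defined in the DATA step determines the order in which the variables are stored in the data set. The SET statement defines the input variables first, in their existing order (SSN, Name, Salary, Date), followed by Raise and then NewSalary.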
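One way to check the stored order is PROC CONTENTS with the VARNUM option, which lists variables by their position in the data set. This sketch assumes the DATA step from the question has already run and that the cert library is assigned.

```sas
/* Assumes work.fin2 was created by the DATA step in the question */
proc contents data=work.fin2 varnum;
run;
```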
What happens when SAS cannot interpret syntax errors?
a. Data set variables contain missing values.
b. The DATA step does not compile.
c. The DATA step still compiles, but it does not execute.
d. The DATA step still compiles and executes.
Correct answer: c When SAS cannot interpret syntax errors, the DATA step compiles, but it does not execute.
What is wrong with this program?
data work.fin2;
set cert.finance;
length Raise $9;
if Salary>25000 then Raise='3 Percent';
else Raise='5 Percent';
if Salary>25000 then NewSalary=(25000*0.03)+Salary;
else NewSalary=(Salary*0.05)+Salary;
length Bonus $5;
Bonus=Raise*0.02;
run;
a. There is a missing semicolon on the second line.
b. There is a missing semicolon on the third line.
c. The variables Bonus and Raise have the incorrect length.
d. The variable type for Bonus is incorrect.
Correct answer: d The variable type for Bonus is incorrect: it is declared as a character variable ($5) but is assigned the result of a numeric expression. When a variable has the wrong type for an operation, SAS attempts to convert it automatically. If the conversion fails, SAS continues processing and produces output with missing values.
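A reduced sketch of the automatic conversion, using only the Raise and Bonus names from the question. For simplicity, Bonus is left numeric here rather than declared as $5.

```sas
/* Reduced illustration: '3 Percent' cannot be converted to a number */
data _null_;
   Raise = '3 Percent';
   Bonus = Raise * 0.02;   /* SAS attempts character-to-numeric conversion */
   put Bonus= _ERROR_=;
run;
```

The log should contain a note that character values were converted to numeric values, followed by an invalid numeric data note, and Bonus should be missing with _ERROR_ set to 1.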
Which procedure produces distinct values of variables and can be used to clean your data?
a. PROC CONTENTS
b. PROC MEANS
c. PROC FREQ
d. PROC PRINT
Correct answer: c The FREQ procedure detects invalid character and numeric values by looking at distinct values. You can use PROC FREQ to identify any variables that were not given an expected value.
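A short sketch of this use of PROC FREQ. The data set work.survey and the variable Gender are hypothetical, and the values 'f' and 'X' stand in for data entry errors.

```sas
/* Hypothetical data: 'f' and 'X' are unexpected values */
data work.survey;
   input Gender $;
   datalines;
F
M
f
X
M
;
run;

/* List each distinct value with its frequency */
proc freq data=work.survey;
   tables Gender / nocum nopercent;
run;
```

Each distinct value gets its own row in the frequency table, so unexpected values such as 'f' and 'X' stand out next to the expected 'F' and 'M'.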
At the start of DATA step processing, during the compilation phase, variables are created in the program data vector (PDV), and observations are set to which of the following:
a. blank.
b. missing.
c. 0.
d. there are no observations.
Correct answer: d At the bottom of the DATA step, the compilation phase is complete, and the descriptor portion of the new SAS data set is created. There are no observations because the DATA step has not yet executed.
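As a rough illustration, the sketch below produces a data set that has a descriptor portion but no observations. The data set name work.shell and its variables are assumptions. Strictly speaking, the execution phase does begin here, but the STOP statement ends it before any observation is written.

```sas
/* Hypothetical example: variables are defined at compile time */
data work.shell;
   length Name $ 20 Salary 8;
   stop;                 /* end execution before any output occurs */
run;

/* The descriptor should list 2 variables and 0 observations */
proc contents data=work.shell;
run;
```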