Module 5: Combining SAS Data Sets Flashcards
What are the two phases of data step processing?
1) Compilation phase
2) Execution phase
What happens during the compilation phase?
1) Each statement in the DATA step is scanned for syntax errors
2) SAS identifies the type and length of each variable and creates the descriptor portion of the output data set
3) SAS sets up the Program Data Vector (PDV), which holds one observation at a time
What does the Program Data Vector (PDV) contain?
- all variables in the source data
- all variables created in the data step statements
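- automatic variables: _N_ (counts the DATA step iterations as SAS moves on to the next observation) and _ERROR_ (flags errors); these are not written to the output data set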
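
Example (a minimal sketch; the data set scores and its variables are made up for illustration): PUT _ALL_ writes every variable currently in the PDV to the log, so you can see the source variables, the new variable, and the automatic variables side by side.

```sas
/* Hypothetical input data, made up for this example */
data scores;
   input name $ score;
   datalines;
Ann 90
Bob 85
;
run;

data check_pdv;
   set scores;          /* name and score come from the source data */
   bonus = score + 5;   /* bonus is created by a DATA step statement */
   put _all_;           /* writes all PDV variables, including _N_ and _ERROR_, to the log */
run;
```

The log should list name, score, bonus, _ERROR_ and _N_ for each observation, but only name, score and bonus end up in check_pdv.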
What happens during the execution phase?
1) Initializes the variables in the PDV to missing
2) Fills the PDV with the variable values from the 1st observation
3) Executes the remaining statements until all variables are filled
4) At the end of the step, writes the observation to the output data set and moves on to the next observation
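
Example (a rough sketch with made-up data; scores and total are hypothetical names): writing the PDV to the log at the top and bottom of the step is one way to watch these steps happen.

```sas
/* Hypothetical input data, made up for this example */
data scores;
   input name $ score;
   datalines;
Ann 90
Bob 85
;
run;

data trace;
   put 'Top of iteration:    ' _all_;   /* before SET reads the next observation */
   set scores;                          /* fills the PDV from the current observation */
   total = score * 2;                   /* fills the new variable */
   put 'Bottom of iteration: ' _all_;   /* just before the observation is written out */
run;
```

In the log, total should be missing at the top of every iteration. From the second iteration on, name and score still show the previous observation's values at the top, because variables read by SET are retained until the next observation overwrites them.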
What happens when you stack two or more data sets?
The observations are stacked on top of one another. Use this when the data sets have the same variables but different observations
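
Example (a minimal sketch; jan, feb and q1 are made-up data set names): listing several data sets on one SET statement stacks them in the order listed.

```sas
/* Two hypothetical data sets with the same variables */
data jan;
   input id sales;
   datalines;
1 100
2 150
;
run;

data feb;
   input id sales;
   datalines;
3 120
4 90
;
run;

/* Stack: all observations from jan, followed by all observations from feb */
data q1;
   set jan feb;
run;
```

Here q1 ends up with four observations and the same two variables.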
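Write the syntax to create permanent variables from the temporary IN= variables when merging or stacking data.
data name;
set data1 (in=a) data2 (in=b);
indataa = a;  /* 1 if the observation came from data1, otherwise 0 */
indatab = b;  /* 1 if the observation came from data2, otherwise 0 */
run;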
T/F: A warning message appears when trying to merge/stack variables with different lengths.
True: SAS warns you of possible truncation in the log
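
Example (a sketch with made-up data sets; short_ds, long_ds and city are hypothetical names): the variable takes its length from the first data set listed, so longer values from later data sets can be cut off. Declaring the length before SET is one common way around this.

```sas
data short_ds;
   length city $ 3;
   city = 'NYC';
run;

data long_ds;
   length city $ 11;
   city = 'Los Angeles';
run;

/* city gets length 3 from short_ds, so 'Los Angeles' is truncated to 'Los' */
data stacked;
   set short_ds long_ds;
run;

/* Declaring the longer length first keeps the full value */
data stacked_ok;
   length city $ 11;
   set short_ds long_ds;
run;
```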
When merging or stacking variables, what properties of variables do you have to keep in mind?
1) length of variables
2) variable type
What step do you have to do before a merge?
Sort each data set by the key (the BY variable they have in common)
T/F: Sort is automatically set to descending.
False: The default is ascending; you need to specify DESCENDING before each variable you want sorted in descending order
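
Example (a minimal sketch; scores is a made-up data set): DESCENDING only affects the variable that immediately follows it.

```sas
/* Hypothetical input data, made up for this example */
data scores;
   input name $ score;
   datalines;
Ann 90
Bob 85
Cal 90
;
run;

proc sort data=scores out=scores_sorted;
   by descending score name;   /* score is descending; name is still ascending */
run;
```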
What happens when you merge data sets that have variables in common other than the BY variables?
SAS keeps the value of the variable from the LAST input data set listed
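
Example (a sketch of a one-to-one match-merge; old_info, new_info and status are made-up names): both data sets carry status, and the merged result takes it from the data set listed last.

```sas
data old_info;
   input id status $;
   datalines;
1 old
2 old
;
run;

data new_info;
   input id status $;
   datalines;
1 new
2 new
;
run;

/* Both inputs must be sorted by the BY variable before the merge */
proc sort data=old_info; by id; run;
proc sort data=new_info; by id; run;

data both;
   merge old_info new_info;   /* status comes from new_info, the last one listed */
   by id;
run;
```

If you need both values, renaming one of them with the RENAME= data set option before the merge avoids the overwrite.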
Given the following code, which statement is applied first?
data check2;
set one;
keep x z;
rename x = firstname;
run;
1) runs keep statement first
2) renames x variable
T/F: The rename statement is always “executed” after the drop/keep statement.
True
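
Example (a minimal sketch; the contents of data set one are made up): because KEEP is applied before RENAME, the KEEP statement has to use the original name.

```sas
/* Hypothetical input data, made up for this example */
data one;
   x = 'Ann';
   y = 1;
   z = 2;
run;

data check2;
   set one;
   keep x z;               /* uses the old name x, not firstname */
   rename x = firstname;   /* applied after KEEP; output has firstname and z */
run;
```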
T/F: Given the following code, x and y are outputs for three1 and…
data three1(keep=x y);
set one;
run;
no other variables are stored in the PDV memory
False: The other variables from data set one are still read into the PDV for this DATA step; KEEP= on the output data set only limits what is written to three1
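
Example (a sketch; the contents of data set one and the name three2 are made up): PUT _ALL_ shows what is in the PDV, so you can compare KEEP= on the output data set with KEEP= on the input data set.

```sas
/* Hypothetical input data, made up for this example */
data one;
   x = 1;
   y = 2;
   z = 3;
run;

/* KEEP= on the output: z is still in the PDV, it is just not written out */
data three1(keep=x y);
   set one;
   put _all_;
run;

/* KEEP= on the input: z is never read into the PDV */
data three2;
   set one(keep=x y);
   put _all_;
run;
```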
T/F: Given the following code, the renamed variables are stored in…
data five3;
set two (rename=(a=group1 b=group2));
run;
the PDV.
True: The variables are renamed as they are read into the PDV.
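
Example (a minimal sketch; the contents of data set two and the variable total are made up): since the rename happens on the way into the PDV, statements inside the step use the new names.

```sas
/* Hypothetical input data, made up for this example */
data two;
   a = 10;
   b = 20;
run;

data five3;
   set two (rename=(a=group1 b=group2));
   total = group1 + group2;   /* the new names are the ones available here */
run;
```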