Ch 10: Combining SAS Data Sets Flashcards
Which program combines Work.One and Work.Two to produce Work.Three?
a.
data work.three;
set work.one;
set work.two;
run;
b.
data work.three;
set work.one work.two;
run;
c.
data work.three;
set work.one work.two;
by varx;
run;
d.
data work.three;
merge work.one work.two;
by varx;
run;
Correct answer: a
This example is a case of one-to-one matching, which requires multiple SET statements. Where same-named variables occur, values that are read from the second data set replace those that are read from the first data set. Also, the number of observations in the new data set is the number of observations in the smallest original data set.
Which program combines Cert.Props1 and Cert.Props2 to produce Work.Props3?
a.
data work.props3;
set cert.props1;
set cert.props2;
run;
b.
data work.props3;
set cert.props1 cert.props2;
run;
c.
data work.props3;
set cert.props1 cert.props2;
by actor;
run;
d.
data work.props3;
merge cert.props1 cert.props2;
by actor;
run;
Correct answer: b
This is a case of concatenation, which requires a list of data set names in the SET statement and one or more BY variables in the BY statement. Notice that observations in each BY group are read sequentially, in the order in which the data sets and BY variables are listed. The new data set contains all the variables from all the input data sets, as well as the total number of records from all input data sets.
If you submit the following program, which new data set is created?
data work.jobsatis;
set work.dataone work.datatwo;
run;
proc print data=work.jobsatis noobs;
run;
A
B
C
D none of the above
Correct answer: a
Concatenating appends the observations from one data set to another data set. The new data set contains the total number of records from all input data sets, so b is incorrect. All the variables from all the input data sets appear in the new data set, so c is incorrect.
If you concatenate the data sets below in the order shown, what is the value of Sale in observation 2 of the new data set?
a.
missing
b.
$30,000
c.
$40,000
d.
You cannot concatenate these data sets.
Correct answer: a
The concatenated data sets are read sequentially, in the order in which they are listed in the SET statement. The second observation in Work.Reps does not contain a value for Sale, so a missing value appears for this variable. (Note that if you merge the data sets, the value of Sale for the second observation is $30,000.)
What happens if you merge the following data sets by the variable SSN?
a.
The values of Age in the data set 1st overwrite the values of Age in the data set 2nd.
b.
The values of Age in the data set 2nd overwrite the values of Age in the data set 1st.
c.
The DATA step fails because the two data sets contain same-named variables that have different values.
d.
The values of Age in the data set 2nd are set to missing.
Correct answer: b
If you have variables with the same name in more than one input data set, values of the same-named variable in the first data set in which it appears are overwritten by values of the same-named variable in subsequent data sets.
Suppose you merge data sets Cert.Set1 and Cert.Set2 below:
Which output does the following program create?
data work.merged;
merge cert.set1(in=in1) cert.set2(in=in2);
by id;
if in1 and in2;
run;
proc print data=work.merged;
run;
Correct answer: a
The DATA step uses the IN= data set option, and the subsetting IF statement excludes unmatched observations from the output data set. So, answers a and b, which contain unmatched observations, are incorrect.
The data sets Cert.Spring and Cert.Sum both contain a variable named Blue. How do you prevent the values of the variable Blue from being overwritten when you merge the two data sets?
a.
data work.merged;
merge cert.spring(in=blue)
cert.summer;
by fabric;
run;
b.
data work.merged;
merge cert.spring(out=blue)
cert.summer;
run;
c.
data work.merged;
merge cert.spring(blue=navy)
cert.summer;
by fabric;
run;
d.
data work.merged;
merge cert.spring(rename=(blue=navy))
cert.summer;
by fabric;
run;
Correct answer: d
Match-merging overwrites same-named variables in the first data set with same-named variables in subsequent data sets. To prevent overwriting, rename variables by using the RENAME= data set option in the MERGE statement.
What happens if you submit the following program to merge Cert.Donors1 and Cert.Donors2, shown below?
data work.merged;
merge cert.donors1 cert.donors2;
by id;
run;
a.
The merged data set contains some missing values because not all observations have matching observations in the other data set.
b.
The merged data set contains eight observations.
c.
The DATA step produces errors.
d.
Values for Units in Cert.Donors2 overwrite values of Units in Cert.Donors1.
Correct answer: c
The two input data sets are not sorted by values of the BY variable, so the DATA step produces errors and stops processing.
If you merge Cert.Staff1 and Cert.Staff2 below by ID, how many observations does the new data set contain?
a.
4
b.
5
c.
6
d.
9
Correct answer: c
In this example, the new data set contains one observation for each unique value of ID. The new data set is shown below.
If you merge data sets Work.Reps, Work.Close, and Work.Bonus by ID, what is the value of Bonus in the third observation in the new data set?
a.
$4,000
b.
$3,000
c.
missing
d.
You cannot tell from the information given.
Correct answer: a
In the new data set, the third observation is the second observation for ID number 2 (Kelly Windsor). The value for Bonus is retained from the previous observation because the BY variable value did not change. The new data set is shown below.