3: Exploring and Validating Data Flashcards
Based on the following program and data, how many rows will be included in the payment table?
proc sort data=payment dupout=dups nodupkey;
by ID;
run;
ID Amount
A $997.54
A $833.88
B $879.05
C $894.77
C $894.77
C $998.26
a. 1 b. 3 c. 5 d. 6
B
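The NODUPKEY option keeps the first row for each unique value of ID: A, B, and C. Because the step has no OUT= option, those three rows replace the payment table, and the removed duplicate rows are written to dups.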
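To check the answer, the step can be run against a copy of the data. The following is a minimal sketch: the DATA step that rebuilds payment is an assumption for illustration (the card only shows the values), while the PROC SORT step is the one from the question.

```sas
/* Assumed rebuild of the payment table from the values on the card */
data payment;
   input ID $ Amount :dollar8.;
   format Amount dollar8.2;
   datalines;
A $997.54
A $833.88
B $879.05
C $894.77
C $894.77
C $998.26
;
run;

/* No OUT= option, so payment itself is replaced by the de-duplicated rows */
proc sort data=payment dupout=dups nodupkey;
   by ID;
run;

proc print data=payment;   /* rows kept: the first A, B, and the first C */
run;

proc print data=dups;      /* rows removed as duplicate keys */
run;
```

After the sort, payment holds three rows and dups holds the other three.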
Which of the following FORMAT statements was used to create this output?
Obs Order_ID Order_Date Delivery_Date
1 1230058123 11JAN07 01/11/07
2 1230080101 15JAN07 01/19/07
3 1230106883 20JAN07 01/22/07
4 1230147441 28JAN07 01/28/07
5 1230315085 27FEB07 02/27/07
a. format Order_Date date9. Delivery_Date mmddyy8.; b. format Order_Date date7. Delivery_Date mmddyy8.; c. format Order_Date ddmmmyy. Delivery_Date mmddyy8.; d. format Order_Date monyy7. Delivery_Date mmddyy8.;
B
The DATE7. format displays a two-digit day, three-letter month abbreviation, and two-digit year. The MMDDYY8. format displays a two-digit month, day, and year, separated by slashes.
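As a rough illustration, the sketch below reads two of the rows shown above and prints them with the same formats. The table name orders and the DATA step are assumptions made only for this example.

```sas
/* Assumed sample table holding two rows from the card */
data orders;
   input Order_ID Order_Date :date7. Delivery_Date :mmddyy8.;
   datalines;
1230058123 11JAN07 01/11/07
1230080101 15JAN07 01/19/07
;
run;

/* DATE7. displays ddMMMyy; MMDDYY8. displays mm/dd/yy */
proc print data=orders;
   format Order_Date date7. Delivery_Date mmddyy8.;
run;
```

Changing date7. to date9. in the FORMAT statement would display a four-digit year instead (for example, 11JAN2007), which is why option a does not match the output.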
The format name must include a period delimiter in the FORMAT statement.
a. True b. False
A
The period is a required syntax element in a format name within a FORMAT statement.
Which row or rows will be selected by the following WHERE statement?
where Job_Title like "Sales%";
Obs Last_Name First_Name Country Job_Title
1 Wu Christine AU Sales Rep I
2 Stone Kimiko AU Sales Manager
3 Hoffman Fred AU Insurance Sales
a. row 1 b. row 2 c. row 3 d. rows 1 and 2 e. all rows
D
This WHERE statement selects rows in which Job_Title begins with Sales, followed by any number of additional characters, because the percent sign comes after Sales. Row 3 is not selected because its value begins with Insurance.
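The pattern can be tried on the three rows from the card. This is a sketch only: the table name staff and the DATA step are assumptions, and the comparison is case sensitive, so the pattern must match the case stored in the data.

```sas
/* Assumed sample table built from the rows on the card */
data staff;
   infile datalines dsd;
   length Last_Name First_Name $ 12 Country $ 2 Job_Title $ 20;
   input Last_Name First_Name Country Job_Title;
   datalines;
Wu,Christine,AU,Sales Rep I
Stone,Kimiko,AU,Sales Manager
Hoffman,Fred,AU,Insurance Sales
;
run;

/* % matches any number of characters, so only titles starting with Sales qualify */
proc print data=staff;
   where Job_Title like "Sales%";
run;
```

Moving the wildcard to both sides ("%Sales%") would select all three rows, since Insurance Sales also contains Sales.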
Which statement about this PROC SORT step is true?
proc sort data=orion.staff;
out=work.staff;
by descending Salary Manager_ID;
run;
a. The sorted table overwrites the input table. b. The rows are sorted by Salary in descending order, and then by Manager_ID in descending order. c. A semicolon should not appear after the input data set name. d. The sorted table contains only the columns specified in the BY statement.
C
This PROC SORT step has a syntax error: a semicolon in the middle of the PROC SORT statement. If you correct this syntax error, this step sorts orion.staff by Salary in descending order and by Manager_ID in ascending order. The step then creates the temporary data set staff that contains the sorted rows and all columns.
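For comparison, a corrected version of the step might look like the sketch below. It assumes the libref orion is already assigned and contains the staff table.

```sas
/* Assumes the libref orion is already assigned */
proc sort data=orion.staff
          out=work.staff;              /* OUT= is an option on the PROC SORT statement */
   by descending Salary Manager_ID;    /* DESCENDING applies only to Salary */
run;
```

To sort both columns in descending order, the keyword would have to be repeated: by descending Salary descending Manager_ID;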
Which of the following statements selects from a table only those rows where the value of the column Style is RANCH, SPLIT, or TWOSTORY?
a. where Style='RANCH' or 'SPLIT' or 'TWOSTORY'; b. where Style in 'RANCH' or 'SPLIT' or 'TWOSTORY'; c. where Style in (RANCH, SPLIT, TWOSTORY); d. where Style in ('RANCH', 'SPLIT', 'TWOSTORY');
D
In the WHERE statement, the IN operator enables you to select rows based on several values. You specify values in parentheses and separate them with spaces or commas. Character values must be enclosed in quotation marks and must be in the same case as in the data set.
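A short sketch of the IN operator follows. The table name houses, the Price column, and the sample rows are hypothetical and serve only to make the step runnable.

```sas
/* Hypothetical sample table */
data houses;
   input Style $ Price;
   datalines;
RANCH 64000
SPLIT 65850
CONDO 80050
TWOSTORY 55850
ranch 70000
;
run;

/* Values are quoted and case sensitive, so the lowercase ranch row is not selected */
proc print data=houses;
   where Style in ('RANCH', 'SPLIT', 'TWOSTORY');
run;
```

The same list can be written with spaces instead of commas: where Style in ('RANCH' 'SPLIT' 'TWOSTORY');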
Which of the following statements selects rows in which Amount is less than or equal to $5,000 or Rate equals 0.095?
a. where amount <= 5000 or rate=0.095; b. where amount le 5000 or rate=0.095; c. where amount <= 5000 or rate eq 0.095; d. all of the above
D
All of the statements shown here select rows in which Amount is less than or equal to $5,000 or Rate equals 0.095. The mnemonic operators LE and EQ are equivalent to the symbols <= and =.
Which statement creates the macro variable flower and assigns the value Plumeria?
a. %let flower=Plumeria; b. %let flower="Plumeria"; c. %let &flower=Plumeria; d. %let &flower="Plumeria";
A
In the %LET statement, the name of the macro variable is followed by an equal sign and the unquoted value. The ampersand is added when you use the macro variable.
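The difference between the unquoted and quoted forms shows up in the log. The sketch below uses %PUT to write the values; the second macro variable name, flower_q, is made up for the comparison.

```sas
%let flower=Plumeria;
%put The value is &flower;      /* log shows: The value is Plumeria */

/* Quotation marks in %LET become part of the stored value */
%let flower_q="Plumeria";
%put The value is &flower_q;    /* log shows: The value is "Plumeria" */

/* The ampersand is used when the macro variable is referenced */
title "Sales of &flower";
```

Macro variable references resolve inside double quotation marks, as in the TITLE statement, but not inside single quotation marks.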
Which statement in a PROC MEANS step lets you specify the numeric columns to analyze?
a. TABLES b. VARS c. VAR d. KEEP=
C
You use the VAR statement to specify the numeric columns to analyze in PROC MEANS. If you don’t specify the VAR statement, all numeric columns are analyzed.
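As a small example, the step below uses the sashelp.class sample table that ships with SAS, which is an assumption about the environment rather than part of the card.

```sas
/* Analyze only Height and Weight; without VAR, Age would be summarized too */
proc means data=sashelp.class;
   var Height Weight;
run;
```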
Suppose you have a table that includes flower sales to all your retail outlets. You want to see the distinct values of Flower_Type with a count and percentage for each. Which procedure would you use?
a. PRINT b. MEANS c. UNIVARIATE d. FREQ
D
PROC FREQ output includes the distinct values for the column, as well as a frequency count, percent, cumulative frequency, and cumulative percent.
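A minimal sketch of such a step is shown below. The table name flower_sales, the Outlet column, and the sample rows are hypothetical; only Flower_Type comes from the card.

```sas
/* Hypothetical sample of flower sales by outlet */
data flower_sales;
   input Flower_Type :$12. Outlet $;
   datalines;
Plumeria North
Orchid North
Plumeria South
Hibiscus East
Plumeria East
Orchid West
;
run;

/* One row per distinct Flower_Type with count, percent, and cumulative statistics */
proc freq data=flower_sales;
   tables Flower_Type;
run;
```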