2.2 & 2.3 Flashcards
Which of the following is true regarding the values of computed columns during the execution phase of a DATA step?
a) values of computed columns are recalculated and previous values overwritten for each row in the input data set
b) by default all computed columns are reset to missing when the PDV is reinitialized
c) you cannot use a DATA step to create an accumulating column
b) by default all computed columns are reset to missing when the PDV is reinitialized
Which of the following DATA steps successfully creates an accumulating column for YTDRain?
a) data houston2017; set pg2.weather_houston; retain YTDRain 0; YTDRain=YTDRain+DailyRain; run;
b) data houston2017; set pg2.weather_houston; retain YTDRain; YTDRain=YTDRain+DailyRain; run;
c) data houston2017; set pg2.weather_houston; YTDRain=YTDRain+DailyRain; retain YTDRain 0; run;
d) None of the above. You cannot create an accumulating column in a DATA step.
a) data houston2017; set pg2.weather_houston; retain YTDRain 0; YTDRain=YTDRain+DailyRain; run;
The RETAIN statement is a compile-time statement that sets a rule for one or more columns to keep their value each time the PDV is reinitialized, rather than being reset to missing. It also provides the option of establishing an initial value in the PDV before the first iteration of the DATA step.
RETAIN column <initial-value>;
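The RETAIN pattern described above can be sketched as follows. This is a minimal example, assuming a small hypothetical table work.rain that stands in for the course table pg2.weather_houston so the step can run on its own.

```sas
/* Hypothetical sample data standing in for pg2.weather_houston */
data work.rain;
    input DailyRain;
    datalines;
0.5
0.2
1.0
;
run;

data work.rain_ytd;
    set work.rain;
    /* Keep YTDRain across iterations and start it at 0 */
    retain YTDRain 0;
    YTDRain = YTDRain + DailyRain;
run;
```

With these three rows the running total should be 0.5, 0.7, and 1.7. Note that a missing DailyRain would make YTDRain missing from that row onward, because an assignment with + propagates missing values.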
Which of the following is NOT true regarding the SUM statement?
a) The SUM statement syntax is column+expression, where the accumulating column is to the left of the + sign
b) The SUM statement automatically sets the initial value of the accumulating column to 0.
c) The RETAIN statement is required in order for the SUM statement to work properly.
d) The SUM statement adds the value of the column or constant to the right of the plus sign to the accumulating column for each row.
e) The SUM statement ignores missing values.
c) The RETAIN statement is required in order for the SUM statement to work properly.
What sum statement would you add to this program to create the column named DayNum, which increments by 1 for each row in the input data set?
data zurich2017;
set pg2.weather_zurich;
YTDRain_mm+Rain_mm;
???
run;
a) SUM(DayNum, 1);
b) DayNum + 1;
c) retain DayNum 1;
d) DayNum+Rain_mm;
b) DayNum + 1;
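A short sketch of the sum statement in action, assuming a hypothetical table work.rain in place of pg2.weather_zurich. The second row is deliberately missing to show that the sum statement skips missing values.

```sas
/* Hypothetical sample data; the second Rain_mm value is missing */
data work.rain;
    input Rain_mm;
    datalines;
2
.
3
;
run;

data work.rain_sum;
    set work.rain;
    /* Sum statements: retained automatically, start at 0, ignore missing */
    YTDRain_mm + Rain_mm;
    DayNum + 1;
run;
```

For these rows YTDRain_mm should be 2, 2, 5 and DayNum should be 1, 2, 3. No RETAIN statement is needed.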
What statement is needed in order to process data in groups?
a) RETAIN
b) ORDER BY
c) SORT
d) BY
d) BY
Which of the following is true when processing data in groups in a DATA step?
a) It is not necessary to sort the data by the desired groups
b) Two special columns, FIRST.by-column and LAST.by-column, are added to the PDV.
c) The FIRST. and LAST. variables are permanent and will be added to the output table by default.
d) The FIRST. variable is 1 for the first row within a group, and . for all other rows.
b) Two special columns, FIRST.by-column and LAST.by-column, are added to the PDV.
During the execution phase, the FIRST. and LAST. variables are assigned a value of 0 or 1. The FIRST. variable is 1 for the first row within a group, and 0 for all other rows. Similarly, the LAST. variable is 1 for the last row within a group, and 0 for all other rows.
These temporary variables contain important information that you can use before they are dropped when a row is written to the output table.
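Because FIRST. and LAST. are dropped from the output, one way to inspect them is to copy them into ordinary columns. The sketch below assumes a hypothetical table work.storms with columns Basin and Wind; the names IsFirst and IsLast are made up for illustration.

```sas
/* Hypothetical sample data */
data work.storms;
    input Basin $ Wind;
    datalines;
NA 100
NA 120
EP 90
;
run;

/* BY-group processing expects the data to be sorted by the BY column */
proc sort data=work.storms;
    by Basin;
run;

data work.flags;
    set work.storms;
    by Basin;
    /* Copy the temporary flags so they reach the output table */
    IsFirst = first.Basin;
    IsLast  = last.Basin;
run;
```

After sorting, the EP row is both first and last in its group (1, 1), while the two NA rows get (1, 0) and (0, 1).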
True/False - You can use the FIRST. and LAST. variables, along with the BY and WHERE statements to subset rows during the execution phase of the DATA step.
False - The WHERE statement is a compile-time statement that establishes rules about which rows are read INTO the PDV. Therefore, the WHERE expression must be based on columns that exist in the input table referenced in the SET statement. The FIRST. and LAST. variables are not in the input table.
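Since WHERE cannot see FIRST. and LAST., a common alternative is the subsetting IF statement, which is evaluated during execution after the row is in the PDV. The sketch below assumes the same kind of hypothetical work.storms table (Basin, Wind) and a made-up output column TotalWind.

```sas
/* Hypothetical sample data */
data work.storms;
    input Basin $ Wind;
    datalines;
NA 100
NA 120
EP 90
;
run;

proc sort data=work.storms;
    by Basin;
run;

data work.basin_total;
    set work.storms;
    by Basin;
    /* Reset the accumulator at the start of each group */
    if first.Basin then TotalWind = 0;
    TotalWind + Wind;
    /* Subsetting IF: only the last row of each group continues to output */
    if last.Basin;
    keep Basin TotalWind;
run;
```

This should write one row per group: EP with 90 and NA with 220.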
True/False - If multiple columns are listed on the BY statement in the DATA step, then each column has its own FIRST./LAST. variables in the PDV.
True
Summarizing data within groups can be performed in the DATA step or in procedures such as PROC MEANS. What are some examples of when you might choose to use either the DATA step or PROC MEANS?
a) The DATA step enables you to do other calculations or manipulations at the same time summarizations occur.
b) PROC MEANS is more complex to code, but offers more statistics.
c) The DATA step is better for very large data sets.
d) Both are equivalent, and their use depends on personal preference.
a) The DATA step enables you to do other calculations or manipulations at the same time summarizations occur.
b - PROC MEANS might be simpler to code, and it is easy to request various statistics.
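To illustrate how compact a PROC MEANS summary can be, here is a minimal sketch that assumes a hypothetical table work.storms with columns Basin and Wind. The statistics requested are only examples.

```sas
/* Hypothetical sample data */
data work.storms;
    input Basin $ Wind;
    datalines;
NA 100
NA 120
EP 90
;
run;

/* CLASS groups the rows without requiring a prior sort */
proc means data=work.storms sum mean max;
    class Basin;
    var Wind;
run;
```

Requesting another statistic is a matter of adding a keyword to the PROC MEANS statement, whereas a DATA step would need extra logic for each one.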
Which of the following statements about SAS functions are true? List all that apply.
a) SAS functions are named, predefined processes which can be used to produce a value
b) A function must include at least 1 argument as input
c) Based on the arguments, the function performs its specified computation or manipulation and returns a value.
d) In the SAS documentation, functions and call routines are grouped by category.
a, c, d
b - The function can accept none, one, or several arguments as input.
True/False - Column lists can help you reduce the number of columns you have to specify in function arguments or in other SAS statements.
True
Suppose you have a data set with numeric columns: Quiz1, Quiz2, Quiz3, Quiz4, and Quiz5. You want to write a DATA step that will calculate the average of these columns and format all numeric columns in the data set as 3.1. Which DATA step below will accomplish this? List all that apply.
a) data quiz_summary; set pg2.class_quiz; AvgQuiz = mean(Q:); format Q: 3.1; run;
b) data quiz_summary; set pg2.class_quiz; AvgQuiz = mean(of Q:); format of Quiz1-AvgQuiz 3.1; run;
c) data quiz_summary; set pg2.class_quiz; AvgQuiz = mean(of Q:); format Quiz1--AvgQuiz 3.1; run;
d) data quiz_summary; set pg2.class_quiz; AvgQuiz = mean(of Q:); format _numeric_ 3.1; run;
c, d
b) You don’t need to use the OF keyword in the FORMAT statement. The OF keyword is required when you use column lists as arguments in a function or call routine.
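The column-list rules above can be tried with a small sketch. It assumes a hypothetical table work.class_quiz standing in for pg2.class_quiz, with a Name column added for illustration.

```sas
/* Hypothetical sample data standing in for pg2.class_quiz */
data work.class_quiz;
    input Name $ Quiz1-Quiz5;
    datalines;
Ann 7 8 9 9 6
Bob 5 6 7 8 9
;
run;

data work.quiz_summary;
    set work.class_quiz;
    /* OF is required before a column list used as a function argument */
    AvgQuiz = mean(of Quiz1-Quiz5);
    /* No OF in the FORMAT statement; the double dash follows PDV order */
    format Quiz1--AvgQuiz 3.1;
run;
```

AvgQuiz should be 7.8 for Ann and 7.0 for Bob. Without OF, the argument Quiz1-Quiz5 would be parsed as the subtraction Quiz1 minus Quiz5 rather than as a list.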
Which of the following are keywords that can be used to specify groups of columns and eliminate the need to write them all out?
a) _NUMERIC_
b) _CHARACTER_
c) _ALL_
d) _NONE_
e) a, b and c
e
Which of the following is true regarding the CALL SORTN routine?
a) The routine takes the columns provided as arguments and reorders them according to the numeric values in the rows.
b) The routine takes the columns provided as arguments, and reorders the numeric values for each row from low to high.
c) Using CALL before the SORTN is optional.
d) The routine assigns the lowest value to a new column.
b)
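A minimal sketch of CALL SORTN on a single row, using made-up values assigned directly in the step:

```sas
data _null_;
    Quiz1 = 6; Quiz2 = 2; Quiz3 = 9;
    /* Reorders the values within this row from low to high */
    call sortn(of Quiz1-Quiz3);
    put Quiz1= Quiz2= Quiz3=;
run;
```

The log should show Quiz1=2 Quiz2=6 Quiz3=9. The column names stay where they are; only the values move.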
What SAS function could you use to assign a random number to each record in a new variable?
a) RAND
b) RANDOM
c) RANGE
d) LARGEST
a) RAND
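As a sketch of how RAND might be used, the step below writes three rows with a uniform random number each. The seed value, the table name work.random, and the choice of the uniform distribution are all assumptions for illustration.

```sas
data work.random;
    /* Optional: fix the seed so the stream is reproducible */
    call streaminit(12345);
    do id = 1 to 3;
        /* Uniform random number between 0 and 1 */
        u = rand('uniform');
        output;
    end;
run;
```

In a step that reads an existing table with SET, the same assignment would give each input row its own random value.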
True/False - The LARGEST function will identify the value from the provided arguments with the highest value.
False - The LARGEST function returns the k-th largest value with k being the first argument provided in the function.
LARGEST(k, value-1 <, value-2, ...>)
value - specifies the numeric constant, variable, or expression to be processed.
Given the data set and data step below, what will the LARGEST function return?
Quiz1 Quiz2 Quiz3 Quiz4 Quiz5
1 2 3 4 5
1 6 7 4 5
data quiz_analysis;
set pg2.class_quiz;
Quiz1st = largest(1, of Quiz1-Quiz5);
run;
a) 7
b) 5 for the first observation, 7 for the second observation
c) 7 for both observations
d) The code will error because the second argument is not specified correctly
b) The LARGEST function returns the maximum score from the columns Quiz1 through Quiz5 for each row.
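The card above can be reproduced with a self-contained sketch. The table work.class_quiz is a hypothetical stand-in for pg2.class_quiz, and the Quiz2nd column is added here only to show the effect of changing k.

```sas
/* The two rows from the card, as a hypothetical stand-in table */
data work.class_quiz;
    input Quiz1-Quiz5;
    datalines;
1 2 3 4 5
1 6 7 4 5
;
run;

data work.quiz_analysis;
    set work.class_quiz;
    Quiz1st = largest(1, of Quiz1-Quiz5);   /* highest value in the row */
    Quiz2nd = largest(2, of Quiz1-Quiz5);   /* second-highest value */
run;
```

Quiz1st should be 5 and 7, and Quiz2nd should be 4 and 6, for the first and second rows respectively.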
Which of the following statements will find the average value of Quiz1, Quiz2, and Quiz3 and round the result to the nearest tenth?
a) round(mean(Quiz1, Quiz2, Quiz3), .1);
b) round(mean(Quiz1, Quiz2, Quiz3));
c) mean(round(Quiz1, Quiz2, Quiz3), .1);
a)
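A quick check of the nested ROUND and MEAN call, with made-up quiz scores:

```sas
data _null_;
    Quiz1 = 7; Quiz2 = 8; Quiz3 = 8;
    /* The mean is 23/3, about 7.6667; rounding unit .1 gives the nearest tenth */
    Avg = round(mean(Quiz1, Quiz2, Quiz3), .1);
    put Avg=;
run;
```

The log should show Avg=7.7. Omitting the second argument of ROUND would round to the nearest integer instead.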
What will this function return:
CEIL(1.99999)
a) 2
b) 1
c) 1.9
d) 1.99999
a) 2
What will this function return:
FLOOR(1.99999)
a) 2
b) 1
c) 1.9
d) 1.99999
b) 1
What will this function return:
INT(1.9999)
a) 2
b) 1
c) 1.9
d) 1.9999
b) 1
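CEIL, FLOOR, and INT can be compared side by side. The negative value below is added as an illustration because it is where FLOOR and INT differ.

```sas
data _null_;
    do x = 1.99999, -1.99999;
        c = ceil(x);    /* smallest integer greater than or equal to x */
        f = floor(x);   /* largest integer less than or equal to x */
        i = int(x);     /* integer portion, truncating toward zero */
        put x= c= f= i=;
    end;
run;
```

The log should show c=2 f=1 i=1 for 1.99999 and c=-1 f=-2 i=-1 for -1.99999.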
True/False - A datetime value in SAS is stored as the number of seconds from midnight on January 1, 1960.
True
What arguments can the INTCK function take?
a) ('interval', start-date, end-date)
b) (interval, start-date, end-date, 'method')
c) (start-date, end-date, 'interval', 'method')
d) none of the above
a) ('interval', start-date, end-date)
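A small sketch of the three-argument INTCK call, using made-up dates and variable names:

```sas
data _null_;
    start_dt = '15JAN2020'd;
    end_dt   = '01FEB2020'd;
    /* Counts interval boundaries crossed between the two dates */
    months = intck('month', start_dt, end_dt);
    days   = intck('day', start_dt, end_dt);
    put months= days=;
run;
```

The log should show months=1 days=17. Only one month boundary (the start of February) is crossed, even though fewer than 31 days have elapsed.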