Hodgepodge Flashcards
SUM statement
- variable + expression;
- variable = the accumulator variable. This variable must be numeric. It is automatically set to 0 before the first observation is read, and its value is retained from one DATA step execution to the next.
Useful when you want to create a variable that accumulates the values of another variable.
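A minimal sketch of a sum statement building a running total. The data set names work.sales and work.running and the variables amount and total are hypothetical, chosen only for illustration:

```sas
/* Hypothetical input: three observations with a numeric variable amount. */
data work.sales;
   input amount;
   datalines;
10
20
30
;
run;

data work.running;
   set work.sales;
   /* Sum statement: total starts at 0 and is retained across iterations. */
   total + amount;
run;

proc print data=work.running;
run;
```

With this input, total should hold 10, 30, and 60 on the three observations.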
In a sum statement, the accumulator variable is automatically initialized to 0. What if you want to initialize the accumulator variable to a different number?
Use RETAIN statement.
RETAIN variable (initial-value);
5 facts about the RETAIN statement
- assigns an initial value to a retained variable
- prevents variables from being initialized each time the DATA step executes
- is a compile-time only statement that creates variables if they do not already exist
- initializes the retained variable to missing before the first execution of the DATA step if you do not supply an initial value
- has no effect on variables that are read with SET, MERGE, or UPDATE statements
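A minimal sketch of RETAIN giving an accumulator a starting value other than 0. The data set work.balance and the variables balance and deposit are hypothetical names, not from the original:

```sas
data work.balance;
   /* Start the accumulator at 100 rather than the default 0. */
   retain balance 100;
   input deposit;
   /* The sum statement adds each deposit to the retained balance. */
   balance + deposit;
   datalines;
10
20
30
;
run;

proc print data=work.balance;
run;
```

With this input, balance should hold 110, 130, and 160.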
Which statement stops the processing of the current observation?
DELETE statement
IF expression THEN DELETE;
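A minimal sketch of DELETE in use, assuming a hypothetical data set work.no2 with a single numeric variable x:

```sas
data work.no2;
   input x;
   /* DELETE stops processing this observation, so it is never written out. */
   if x = 2 then delete;
   datalines;
1
2
3
;
run;

proc print data=work.no2;
run;
```

Only the observations with x=1 and x=3 should reach work.no2.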
SELECT statement
SELECT (select-expression);
WHEN (expression) statement;
WHEN (expression2) statement2;
(OTHERWISE statement3);
END;
Note: select-expression evaluates to a single value.
Example:
select (a);
when (1) x=x*10;
when (3,4,5) x=x*100;
otherwise;
end;
This means: when variable a is 1, x is multiplied by 10. When a is 3, 4, or 5, x is multiplied by 100. When a is any other value, nothing happens.
What happens if the result of all SELECT-WHEN comparisons is false and no OTHERWISE statement is present?
SAS issues an error message and stops executing the DATA step.
select;
when (toy="Bear" and month in ('OCT', 'NOV', 'DEC')) price = 45.00;
when (toy="Bear" and month in ('JAN', 'FEB')) price = 25.00;
when (toy="Bear") price = 35.00;
otherwise;
end;
What is the price when toy is Bear and month is FEB?
25.00
If more than one WHEN statement has a true when-expression, only the first WHEN statement is used. Once a when-expression is true, no other when-expressions are evaluated.
Which statement can be used to write a message to the log?
PUT statement
PUT specification(s);
Specifications:
- character string, e.g. 'MY NOTE'
- one or more data set variables (it will output the variable value)
- the automatic variables _N_ and _ERROR_
- the special name _ALL_ (writes the values of all variables, including _N_ and _ERROR_)
- and others
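A minimal sketch of a few PUT specifications, using a hypothetical variable x read from in-stream data. DATA _NULL_ runs the step without creating a data set:

```sas
data _null_;
   input x;
   /* A character string plus named output for x and _N_. */
   put 'MY NOTE: ' x= _n_=;
   /* _ALL_ writes every variable, including _N_ and _ERROR_. */
   if x > 1 then put _all_;
   datalines;
1
2
;
run;
```

The messages appear in the SAS log, one line per executed PUT statement.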
What happens if IF is not followed by THEN?
e.g. IF x=2;
This is a subsetting IF. If the expression is true, SAS continues processing the observation. If it is false, SAS stops processing the observation, does not output it, and returns to the top of the DATA step, i.e. only observations with x=2 are output to the data set.
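A minimal sketch of a subsetting IF, assuming a hypothetical data set work.only2 with numeric variables x and y:

```sas
data work.only2;
   input x y;
   /* Subsetting IF: only observations with x=2 continue to the implicit output. */
   if x = 2;
   datalines;
1 10
2 20
2 30
3 40
;
run;

proc print data=work.only2;
run;
```

Two of the four input observations should be written to work.only2.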
What two temporary variables are created when you use the BY statement with the SET statement?
FIRST.variable
LAST.variable
where variable = the BY variable
What will be printed?
data company.budget (keep=dept payroll);
set work.temp;
by dept;
if first.dept then payroll=0;
payroll+yearly;
if last.dept;
run;
proc print data=company.budget;
sum payroll;
run;
The total payroll for each department, and the grand sum for payroll.
payroll acts as an accumulator variable: it is reset to 0 at the first observation of each department and then accumulates yearly, so it yields the total of yearly for each department
What happens to the first.variable and last.variable when you specify multiple BY variables?
A change in the value of a primary BY variable forces LAST.secondaryvariable = 1, i.e. it marks the last observation for the secondary BY variable as well
What are the values for first.variable and last.variable?
FIRST.variable = 1 for the first observation in a BY group
FIRST.variable = 0 for any other observation in a BY group
LAST.variable = 1 for the last observation in a BY group
LAST.variable = 0 for any other observation in a BY group
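A minimal sketch that writes the FIRST. and LAST. values to the log for two BY variables. The data set work.staff and the variables dept, team, and yearly in it are hypothetical. The input is already in sorted order, which BY-group processing requires:

```sas
/* Hypothetical input, sorted by dept and then team. */
data work.staff;
   input dept $ team $ yearly;
   datalines;
A T1 100
A T1 200
A T2 300
B T2 400
;
run;

data _null_;
   set work.staff;
   by dept team;
   /* Named output shows each flag as 1 or 0 for every observation. */
   put dept= team= first.dept= last.dept= first.team= last.team=;
run;
```

The third and fourth observations share team=T2, but dept changes between them, so last.team should be 1 on the third observation and first.team should be 1 on the fourth.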
How do you access an observation directly (without having to process each observation that precedes it)?
Use POINT=variable
variable is a temporary numeric variable that contains the observation number of the observation to access
data work.getobs5;
obsnum=5;
set company.usa POINT=obsnum;
OUTPUT;
STOP;
run;
Note that you need the OUTPUT statement because the STOP statement immediately stops processing before the end of the DATA step (when it would normally output).
Note that you also need the STOP statement, or the DATA step loops continuously, because POINT= never reaches an end-of-file condition.
OUTPUT statement and 3 facts about it
OUTPUT dataset1 dataset2;
- Overrides default way in which the DATA step writes obs to output, so obs are only added when the explicit OUTPUT statement is executed
- All data sets specified in OUTPUT statement must also appear in the DATA statement
- Using OUTPUT statement without a following data set name causes the current observation to be written to all data sets that are named in the DATA statement
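A minimal sketch of OUTPUT routing observations to two data sets. The names work.low, work.high, and score are hypothetical:

```sas
/* Both data sets used in OUTPUT are named in the DATA statement. */
data work.low work.high;
   input score;
   /* An explicit OUTPUT replaces the implicit output at the end of the step. */
   if score < 50 then output work.low;
   else output work.high;
   datalines;
20
80
45
;
run;
```

Here work.low should receive the observations with scores 20 and 45, and work.high the observation with score 80.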
How to detect the end of a data set?
END=variable
variable is a temporary variable that serves as an end-of-file marker
data work.addtoend;
set sasuser.stress2 end=last;
TotalTime=totalmin*60+totalsec;
if last;
run;
proc print data=work.addtoend;
run;
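This displays only one observation, the last one in the data set. TotalTime is a grand total only if totalmin and totalsec are themselves accumulated with sum statements earlier in the step; as written, TotalTime is computed from the values on the last observation.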
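For TotalTime to be a grand total, totalmin and totalsec need to be accumulated with sum statements. A minimal sketch under that assumption, using a hypothetical input data set work.times with numeric variables minutes and seconds (none of these names come from the original):

```sas
/* Hypothetical input: one row per event, split into minutes and seconds. */
data work.times;
   input minutes seconds;
   datalines;
1 30
2 15
0 45
;
run;

data work.grandtotal (keep=TotalTime);
   set work.times end=last;
   /* Sum statements accumulate across all observations. */
   TotalMin + minutes;
   TotalSec + seconds;
   TotalTime = TotalMin*60 + TotalSec;
   /* Keep only the final observation, which carries the grand total. */
   if last;
run;

proc print data=work.grandtotal;
run;
```

With this input, the single output observation should have TotalTime = 270 (3 minutes plus 90 seconds).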
What is a difference in the processing of a raw data file vs. a SAS data file?
Raw data file - SAS sets the value of each variable in the DATA step to missing at the beginning of each iteration
SAS data file - while reading an existing data set with the SET statement, SAS retains the values of existing variables (and variables created by a sum statement) from one observation to the next
When SAS reads a raw data file, SAS sets the value of each variable in the DATA step to missing at the beginning of each iteration EXCEPT in these 5 cases:
- variables named in a RETAIN statement
- variables created in a sum statement
- data elements in a _TEMPORARY_ array
- any variables created by using options in the FILE or INFILE statements
- automatic variables
When does automatic character-to-numeric conversion occur? (4 cases)
When a character value is
- assigned to a previously defined numeric variable, e.g. rate=payrate where rate is a numeric variable
- used in an arithmetic expression, e.g. salary=payrate*hours
- compared to a numeric value, using a comparison operator, e.g. if payrate>=rate
- specified in a function that requires numeric arguments, e.g. NewRate=sum(payrate,raise)
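True or False: automatic character-to-numeric conversion occurs with WHERE statement comparisons, e.g. where character = 4
False. The WHERE statement does not convert automatically; comparing a character variable to a numeric value produces an error and the program stops running.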
How do you convert character data values to numeric?
Use the INPUT function: INPUT(source, informat)
- source = character variable, constant, or expression to be converted to a numeric value
- informat = must be a numeric informat
How do you concatenate character strings?
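A minimal sketch of the INPUT function converting a character value to numeric. The data set work.convert and the sample value are hypothetical; COMMA9. is a standard numeric informat that removes embedded commas:

```sas
data work.convert;
   /* payrate is a character variable holding digits and a comma. */
   payrate = '1,250.50';
   /* INPUT reads the character value with a numeric informat. */
   rate = input(payrate, comma9.);
   put rate=;
run;
```

After the step, rate is a numeric variable and can be used in arithmetic without relying on automatic conversion.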
Use concatenation operator ||
e.g. assignment=site || '/' || dept joins the values of site and dept with a slash between them (site/dept)
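A minimal sketch of concatenation, assuming hypothetical lengths and values for site and dept. The || operator keeps any trailing blanks stored in a character variable, so the sketch also shows the TRIM function, which removes them before joining:

```sas
data _null_;
   length site $ 8 dept $ 4;
   site = 'NY';
   dept = 'HR';
   /* || keeps the trailing blanks that pad site to its length of 8. */
   padded = site || '/' || dept;
   /* TRIM removes trailing blanks from site before concatenating. */
   trimmed = trim(site) || '/' || dept;
   put padded= trimmed=;
run;
```

Comparing the two values in the log shows the blanks between NY and the slash in padded, and none in trimmed.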