Hodgepodge Flashcards
SUM statement
- variable* + expression
- variable =* accumulator variable. This variable must be numeric. The variable is automatically set to 0 at compile time before the first observation is read. The variable’s value is retained from one DATA step excecution to the next.
Useful when want to create a variable that accumulates the values of another variable.
In a sum statement, the accumulator variable is automatically initialized to 0. What if you want to initialized the accumulator variable to a different number?
Use RETAIN statement.
RETAIN variable (initial-value);
5 facts about RETAIN statement
- assigns an inital value to a retained variable
- prevents variables from being initialized each time the DATA step executes
- is a compile-time only statement that creates variables if they do not already exist
- initializes the retained variable to missing before the first execution of the DATA step if you do not supply an initial value
- has no effect on variables that are read with SET, MERGE, or UPDATE statements
Which statement stops the processing of the current observation?
DELETE statement
IF expression THEN DELETE;
SELECT statement
SELECT (select-expression);
WHEN (expression) statement;
WHEN (expression2) statement2;
(OTHERWISE statement3);
END;
Note: select-expression evaluates to a single value.
Example:
select (a);
when (1) x=x*10;
when (3,4,5) x=x*100;
otherwise;
end;
This means: when variable a is 1, x is multiplied by 10. When a is 3,4, or 5, x is multiplied by 100. When a is any other value, nothing happens.
What happens if the result of all SELECT-WHEN comparisions is false and no OTHERWISE statement is present?
SAS issues an error message and stops executing the DATA step.
select;
when (toy=”Bear” and month in (‘OCT’, ‘NOV’, ‘DEC’)) price = 45.00;
when (toy=”Bear” and month in (‘JAN’, ‘FEB’)) price = 25.00;
when (toy=”Bear”) price = 35.00;
otherwise;
end;
What is the price when month is FEB?
25.00
If more than one WHEN statement has a true when-expression, only the first WHEN statement is used. Once a when-expression is true, no other when-expressions are evaluated.
Which statement can be used to write a message to the log?
PUT statement
PUT specification(s);
Specifications:
- character string, e.g. ‘MY NOTE’
- one or more data set variables (it will output the variable value)
- the automatic variables _N_ and _ERROR_
- the automatic variable _ALL_
- and others
What happens if IF is not followed by THEN?
e.g. IF x=2;
If the expression is true, SAS continues processing observation. If it’s false, it will stop and return to top of data step. i.e. only observations with x=2 are output to the data set.
What two temporary variables are created when you use the BY statement with the SET statement?
FIRST.variable
LAST.variable
where variable = the BY variable
What will be printed?
data company.budget (keep=dept payroll);
set work.temp;
by dept;
if first.dept then payroll=0;
payroll+yearly;
if last.dept;
run;
proc print data=company.budget;
sum payroll;
run;
The total payroll for each department, and the grand sum for payroll.
payroll acts as an accumulator variable that will yield the total of payroll+yearly for each department
What happens to the first.variable and last.variable when you specify multiple BY variables?
A change in the value of a primary BY variable forces the LAST.secondayvariable = 1, i.e. it forces the last observation for the secondary variable
What are the values for first.variable and last.variable?
FIRST.variable = 1 for the first observation in a BY group
FIRST.variable = 0 for any other observation in a BY group
LAST.variable = 1 for the lastobservation in a BY group
LAST.variable = 0 for any other observation in a BY group
How do you access an observation directly (without having to process each observation that precedes it);
Use POINT=variable
variable is a temporary numeric variable that contains the observation number of the observation to access
data work.getobs5;
obsnum=5;
set company.usa POINT=obsnum;
OUTPUT;
STOP;
run;
Note that you need the OUTPUT statement because STOP statement immediately stops processing before the end of the DATA step (when it would normally output).
Note that you also need STOP statement or you get continuous looping.
OUTPUT statement and 3 facts about it
OUTPUT dataset1 dataset2;
- Overrides default way in which the DATA step writes obs to output, so obs are only added when the explicit OUTPUT statement is executed
- All data sets specified in OUTPUT statement must also appear in the DATA statement
- Using OUTPUT statement without a following data set name causes the current observation to be written to all data sets that are named in the DATA statement
How to detect the end of a data set?
END=variable
variable is temporary variable that serves as end-of-file marker
data work.addtoend;
set sasuser.stress2 end=last;
TotalTime=totalmin*60+totalsec;
if last;
run;
proc print data=work.addtoend;
run;
**This displays only one observation - the grand total of the accumulator variable TotalTime
What is a difference in the processing of a raw data file vs. a SAS data file?
Raw data file - SAS sets the value of each variable in the DATA step to missing at the beginning of each iteration
SAS data file - while reading an existing data set with the SET statement, SAS retains the values of existing variables (and variables created by a sum statement) from one observation to the next
When SAS reads a raw data file, SAS sets the value of each variable in the DATA step to missing at the beginning of each iteration EXCEPT in these 5 cases:
- variables names in a RETAIN statement
- variables created in a sum statement
- data elements in a _TEMPORARY_ array
- any variables created by using options in the FILE or INFILE statements
- automatic variables
When does automatic character-to-numeric converstion occur? (4 cases)
When a character value is
- assigned to a previoulsy defined numeric variable, e.g. rate=payrate where rate is a numeric variable
- used in an arithmetic expression, e.g. salary=payrate*hours
- compared to a numeric value, using a comparison operator, e.g. if payrate>=rate
- specified in a function that requires numeric arguments, e.g. NewRate=sum(payrate,raise)
True or False: does automatic character-to-numeric conversion occur with WHERE statement comparisons? e.g. where character = 4
No, the program stops running
How do you convert character data values to numeric?
Use INPUT function: INPUT (source, informat)
- source* = character variable, constant, or expression to be converted to a numeric value
- informat* must be numeric
how do you concatenate character strings?
Use concatenation operator ||
e.g. assignment=site || ‘/’ || dept will output site/dept
How do you convert numeric data values to character?
use PUT function: PUT(source,format)
source = numeric variable, constant, or expression to be converted to a character value
Need to specify a numeric format
e.g. put(site, 2.)
Which function returns a specified word from a character string?
SCAN function
SCAN(argument, n, [‘delimiters’])
- argument* = character variable or expression to scan
- n =* which number word to return
- delimiters =* what separate the words. If none specified, default delimiters are used
When is it best to use a SCAN function over a SUBSTR function?
When:
- you know the order of the words in the character string
- the starting position of the words variaes
- the words are separated by some delimiter
Which function can you use to extract a substring or replace character values?
SUBSTR function
SUBSTR(argument, position, (n))
Extracting a substring: newvariable=SUBSTR()
- argument =* the character variable or expression to scan
- position* = the character position to start from
- n* = the number of characters to extract. If n is omitted, all remaining characters are included in the substring
Replacing characters: SUBSTR()=’replacementsubstring’
- argument =* the character variable or expression to modify
- position* = the position where replacement will begin
- n* = the number of characters to replace. If n is omitted, all remaining characters are replaced
What function can you use to trim trailing blanks from character values?
TRIM function
TRIM(argument)
argument = a character expression
Which function concatenates character strings, removes leading and trailing blanks, and inserts separators?
CATX function
CATX(separator, string1, string2)
separator = character string that is used as a separator between concatenated strings
How can you search a charcter value for a specific string?
use INDEX function
INDEX(source, ‘excerpt’)
- source* = character variable or expression to search
- excerpt* = character string you’re looking for
The INDEX function returns the position of the excerpt’s first character. If it does not find the excerpt, it returns a value of 0.
Which function works similarly to INDEX function?
FIND function
FIND(string, substring, (modifiers), (startpos))
modifiers: i causes the FIND function to ignore character case during search. t trims trailing blanks from string and substring
Which function replaces or removes all occurrences of a pattern of characters within a character string?
TRANWRD function
TRANWRD(source, target, replacement)
- source* = source string that you want to translate
- target* = string that SAS searches for in source
- replacement* = string that replaces target
Which functions give a length of 200 bytes to their variables?
TRANWRD, CATX
How does the SCAN function treat consecutive delimiters?
as one delimiter
DO loop general form
DO index-variable=start TO stop BY increment;
SAS statements
END;
Note: if you omit increment, the default is 1
In a DO loop, you normally just get one observation with the value of the accumulator variable. How do you print observations for every iteration of a DO loop?
Add OUTPUT before END
DO index-variable=start to end by increment;
SAS statements;
OUTPUT;
END;
DO UNTIL statement and DO WHILE statement
DO UNTIL (condition);
sas statements;
END;
DO WHILE (condition);
sas statements;
END;
Note that condition must be in parenthesis
Create a sample data set: sample every tenth observation if the 5,000 obs in Factory.Widgets.
data work.subset;
do sample=10 to 5000 by 10;
set factory.widgets POINT=sample;
output;
end;
stop;
run;
Notes:
- “sample” is the index variable. POINT=index-variable
- need STOP statement to prevent continuous looping when using the POINT= option.
- Need OUTPUT statement inside the DO loop to output each obs selected
ARRAY statement
ARRAY array-name{dimension} elements;
e.g. array sales{4} qtr1 qtr2 qtr3 qtr4;
Example with DO loop:
data work.report;
set master.class;
array wt{6} w1-w6;
do day=1 to 6;
wt{day}=wt{day}*2.2;
end;
run;
What happens if you use * to specify the dimension of an array?
e.g. array sales{*} qtr1 qtr2 qtr3 qrt4
SAS determines the dimension of the array by counting the number of elements
What symbols can you use to enclose the dimension of an array?
( ) , { } , []
Which function returns the number of elements in an array?
DIM(array-name)
Example:
data my.data;
set my.fitclass;
array wt{*} weight1-weight6;
do i=1 to DIM(wt);
wt{i}=wt{i}*2.2;
end;
run;
How do you create variables in an ARRAY statement?
You omit the array elements from the statement:
ARRAY array-name{dimension}
SAS will create new variables with default names: the array-name followed by consecutive numbers 1 to the dimension of the array
or you can also name the new variables as elements
How to add initial values to an array when creating new variables?
array array-name{dimension} element1 element2 (initialvalue1 initial value2)
How do you create a temporary array?
array array-name{dimension} _TEMPORARY_
What do the trailing at sign (@) and double trailing at sign (@@) do?
@: holds the input record for the execution of the next INPUT statement
@@: holds the input record for the execution of the next INPUT statement, even across iterations of the DATA step
When is a record held by @@ released? (2 cases)
- The input pointer moves past the end of the record - then the pointer will move down to the next record
- an INPUT statement that has no trailing sign executes
What are the difference between:
FLOWOVER
MISSOVER
TRUNCOVER
STOPOVER
?
All are used when reading raw data files.
FLOWOVER is the default.
MISSOVER sets vars to missing when there are missing values at end of a record. If you use it with column input, it can set variables that are shorter than the specification in the input statement to missing
TRUNCOVER forces input statement to stop reading when it gets to the end of a short line. Works similar to MISSOVER, but doesn’t have the column input problem
STOPOVER stops the data step when it reads a short line, e.g. there are missing values at end of record
What does the forward slash (/) line pointer control do?
It allows you to read multiple records sequentially. Useful if one observation is spread out over several lines in raw data file
e.g.
input Lname $ 1-8 Fname$ 10-15 /
Department $ 1-12 JobCode $15-19 /
Salary comma10.;
Note: that the raw data file contains the same number of records for each observation
What does the #n line pointer control do?
1 Lname $ Fname $;
It specifies which line to read and allows you to read records non-sequentially
e.g.
input #2 Department $ 1-12 JobCode $ 15-19
This program will read the values of department and jobcode starting at the second record and will go back to first record for lname and fname
When is the ampersand (&) modifier used?
In modified list input.
It’s used to read character values that contain one or more single embedded blanks. The value is read until two or more consecutive blanks are encountered.
How do informats work differently in formatted input vs. modified list input?
In formatted input, the informat determines both the length of character variables and the number of columns that are read.
In modified list input, the informat only determines the length of the variable, not the number of columns read
What is the minimum width for the informat TIMEw.?
5 - anything less will give you an error message
WEEKDATEw.
WORDDATEw.
Mon, Apr 5, 1999
Apr 5, 1999
What does VARNUM option do in PROC CONTENTS?
lists the variables in the order of creation
Which option suppresses page numbers in LISTING output?
Which option suppresses the date?
OPTIONS NONUMBER;
OPTIONS NODATE;
Note: OPTIONS statement is a global statement and will remain in effect until modified or end SAS session
Define:
PAGENO= option
PAGESIZE= option
LINESIZE= option
All used with LISTING output
PAGENO identifies the number to appear on the first page of the document
PAGESIZE specifies how many lines each page of output contains
LINESIZE (or LS) specifies the width of the print line for the output and log
YEARCUTOFF= option
It specifies the first year of the 100-year span that SAS uses for dates
e.g.
YEARCUTOFF=1950 spans 100 years from 1950 to 2049
Default yearcutoff = 1920 so from 1920 to 2019
which means that two digit years like ‘55 will come out as 1955