Hodgepodge Flashcards

1
Q

SUM statement

A
  • variable* + expression
  • variable =* accumulator variable. This variable must be numeric. The variable is automatically set to 0 at compile time before the first observation is read. The variable’s value is retained from one DATA step excecution to the next.

Useful when want to create a variable that accumulates the values of another variable.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

In a sum statement, the accumulator variable is automatically initialized to 0. What if you want to initialized the accumulator variable to a different number?

A

Use RETAIN statement.

RETAIN variable (initial-value);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

5 facts about RETAIN statement

A
  1. assigns an inital value to a retained variable
  2. prevents variables from being initialized each time the DATA step executes
  3. is a compile-time only statement that creates variables if they do not already exist
  4. initializes the retained variable to missing before the first execution of the DATA step if you do not supply an initial value
  5. has no effect on variables that are read with SET, MERGE, or UPDATE statements
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Which statement stops the processing of the current observation?

A

DELETE statement

IF expression THEN DELETE;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

SELECT statement

A

SELECT (select-expression);

WHEN (expression) statement;

WHEN (expression2) statement2;

(OTHERWISE statement3);

END;

Note: select-expression evaluates to a single value.

Example:

select (a);

when (1) x=x*10;

when (3,4,5) x=x*100;

otherwise;

end;

This means: when variable a is 1, x is multiplied by 10. When a is 3,4, or 5, x is multiplied by 100. When a is any other value, nothing happens.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What happens if the result of all SELECT-WHEN comparisions is false and no OTHERWISE statement is present?

A

SAS issues an error message and stops executing the DATA step.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

select;

when (toy=”Bear” and month in (‘OCT’, ‘NOV’, ‘DEC’)) price = 45.00;

when (toy=”Bear” and month in (‘JAN’, ‘FEB’)) price = 25.00;

when (toy=”Bear”) price = 35.00;

otherwise;

end;

What is the price when month is FEB?

A

25.00

If more than one WHEN statement has a true when-expression, only the first WHEN statement is used. Once a when-expression is true, no other when-expressions are evaluated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Which statement can be used to write a message to the log?

A

PUT statement

PUT specification(s);

Specifications:

  • character string, e.g. ‘MY NOTE’
  • one or more data set variables (it will output the variable value)
  • the automatic variables _N_ and _ERROR_
  • the automatic variable _ALL_
  • and others
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What happens if IF is not followed by THEN?

e.g. IF x=2;

A

If the expression is true, SAS continues processing observation. If it’s false, it will stop and return to top of data step. i.e. only observations with x=2 are output to the data set.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What two temporary variables are created when you use the BY statement with the SET statement?

A

FIRST.variable

LAST.variable

where variable = the BY variable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What will be printed?

data company.budget (keep=dept payroll);

set work.temp;

by dept;

if first.dept then payroll=0;

payroll+yearly;

if last.dept;

run;

proc print data=company.budget;

sum payroll;

run;

A

The total payroll for each department, and the grand sum for payroll.

payroll acts as an accumulator variable that will yield the total of payroll+yearly for each department

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What happens to the first.variable and last.variable when you specify multiple BY variables?

A

A change in the value of a primary BY variable forces the LAST.secondayvariable = 1, i.e. it forces the last observation for the secondary variable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are the values for first.variable and last.variable?

A

FIRST.variable = 1 for the first observation in a BY group

FIRST.variable = 0 for any other observation in a BY group

LAST.variable = 1 for the lastobservation in a BY group

LAST.variable = 0 for any other observation in a BY group

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How do you access an observation directly (without having to process each observation that precedes it);

A

Use POINT=variable

variable is a temporary numeric variable that contains the observation number of the observation to access

data work.getobs5;

obsnum=5;

set company.usa POINT=obsnum;

OUTPUT;

STOP;

run;

Note that you need the OUTPUT statement because STOP statement immediately stops processing before the end of the DATA step (when it would normally output).

Note that you also need STOP statement or you get continuous looping.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

OUTPUT statement and 3 facts about it

A

OUTPUT dataset1 dataset2;

  1. Overrides default way in which the DATA step writes obs to output, so obs are only added when the explicit OUTPUT statement is executed
  2. All data sets specified in OUTPUT statement must also appear in the DATA statement
  3. Using OUTPUT statement without a following data set name causes the current observation to be written to all data sets that are named in the DATA statement
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How to detect the end of a data set?

A

END=variable

variable is temporary variable that serves as end-of-file marker

data work.addtoend;

set sasuser.stress2 end=last;

TotalTime=totalmin*60+totalsec;

if last;

run;

proc print data=work.addtoend;

run;

**This displays only one observation - the grand total of the accumulator variable TotalTime

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is a difference in the processing of a raw data file vs. a SAS data file?

A

Raw data file - SAS sets the value of each variable in the DATA step to missing at the beginning of each iteration

SAS data file - while reading an existing data set with the SET statement, SAS retains the values of existing variables (and variables created by a sum statement) from one observation to the next

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

When SAS reads a raw data file, SAS sets the value of each variable in the DATA step to missing at the beginning of each iteration EXCEPT in these 5 cases:

A
  1. variables names in a RETAIN statement
  2. variables created in a sum statement
  3. data elements in a _TEMPORARY_ array
  4. any variables created by using options in the FILE or INFILE statements
  5. automatic variables
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

When does automatic character-to-numeric converstion occur? (4 cases)

A

When a character value is

  1. assigned to a previoulsy defined numeric variable, e.g. rate=payrate where rate is a numeric variable
  2. used in an arithmetic expression, e.g. salary=payrate*hours
  3. compared to a numeric value, using a comparison operator, e.g. if payrate>=rate
  4. specified in a function that requires numeric arguments, e.g. NewRate=sum(payrate,raise)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

True or False: does automatic character-to-numeric conversion occur with WHERE statement comparisons? e.g. where character = 4

A

No, the program stops running

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How do you convert character data values to numeric?

A

Use INPUT function: INPUT (source, informat)

  • source* = character variable, constant, or expression to be converted to a numeric value
  • informat* must be numeric
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

how do you concatenate character strings?

A

Use concatenation operator ||

e.g. assignment=site || ‘/’ || dept will output site/dept

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

How do you convert numeric data values to character?

A

use PUT function: PUT(source,format)

source = numeric variable, constant, or expression to be converted to a character value

Need to specify a numeric format

e.g. put(site, 2.)

24
Q

Which function returns a specified word from a character string?

A

SCAN function

SCAN(argument, n, [‘delimiters’])

  • argument* = character variable or expression to scan
  • n =* which number word to return
  • delimiters =* what separate the words. If none specified, default delimiters are used
25
Q

When is it best to use a SCAN function over a SUBSTR function?

A

When:

  1. you know the order of the words in the character string
  2. the starting position of the words variaes
  3. the words are separated by some delimiter
26
Q

Which function can you use to extract a substring or replace character values?

A

SUBSTR function

SUBSTR(argument, position, (n))

Extracting a substring: newvariable=SUBSTR()

  • argument =* the character variable or expression to scan
  • position* = the character position to start from
  • n* = the number of characters to extract. If n is omitted, all remaining characters are included in the substring

Replacing characters: SUBSTR()=’replacementsubstring’

  • argument =* the character variable or expression to modify
  • position* = the position where replacement will begin
  • n* = the number of characters to replace. If n is omitted, all remaining characters are replaced
27
Q

What function can you use to trim trailing blanks from character values?

A

TRIM function

TRIM(argument)

argument = a character expression

28
Q

Which function concatenates character strings, removes leading and trailing blanks, and inserts separators?

A

CATX function

CATX(separator, string1, string2)

separator = character string that is used as a separator between concatenated strings

29
Q

How can you search a charcter value for a specific string?

A

use INDEX function

INDEX(source, ‘excerpt’)

  • source* = character variable or expression to search
  • excerpt* = character string you’re looking for

The INDEX function returns the position of the excerpt’s first character. If it does not find the excerpt, it returns a value of 0.

30
Q

Which function works similarly to INDEX function?

A

FIND function

FIND(string, substring, (modifiers), (startpos))

modifiers: i causes the FIND function to ignore character case during search. t trims trailing blanks from string and substring

31
Q

Which function replaces or removes all occurrences of a pattern of characters within a character string?

A

TRANWRD function

TRANWRD(source, target, replacement)

  • source* = source string that you want to translate
  • target* = string that SAS searches for in source
  • replacement* = string that replaces target
32
Q

Which functions give a length of 200 bytes to their variables?

A

TRANWRD, CATX

33
Q

How does the SCAN function treat consecutive delimiters?

A

as one delimiter

34
Q

DO loop general form

A

DO index-variable=start TO stop BY increment;

SAS statements

END;

Note: if you omit increment, the default is 1

35
Q

In a DO loop, you normally just get one observation with the value of the accumulator variable. How do you print observations for every iteration of a DO loop?

A

Add OUTPUT before END

DO index-variable=start to end by increment;

SAS statements;

OUTPUT;

END;

36
Q

DO UNTIL statement and DO WHILE statement

A

DO UNTIL (condition);

sas statements;

END;

DO WHILE (condition);

sas statements;

END;

Note that condition must be in parenthesis

37
Q

Create a sample data set: sample every tenth observation if the 5,000 obs in Factory.Widgets.

A

data work.subset;

do sample=10 to 5000 by 10;

set factory.widgets POINT=sample;

output;

end;

stop;

run;

Notes:

  • “sample” is the index variable. POINT=index-variable
  • need STOP statement to prevent continuous looping when using the POINT= option.
  • Need OUTPUT statement inside the DO loop to output each obs selected
38
Q

ARRAY statement

A

ARRAY array-name{dimension} elements;

e.g. array sales{4} qtr1 qtr2 qtr3 qtr4;

Example with DO loop:

data work.report;

set master.class;

array wt{6} w1-w6;

do day=1 to 6;

wt{day}=wt{day}*2.2;

end;

run;

39
Q

What happens if you use * to specify the dimension of an array?

e.g. array sales{*} qtr1 qtr2 qtr3 qrt4

A

SAS determines the dimension of the array by counting the number of elements

40
Q

What symbols can you use to enclose the dimension of an array?

A

( ) , { } , []

41
Q

Which function returns the number of elements in an array?

A

DIM(array-name)

Example:

data my.data;

set my.fitclass;

array wt{*} weight1-weight6;

do i=1 to DIM(wt);

wt{i}=wt{i}*2.2;

end;

run;

42
Q

How do you create variables in an ARRAY statement?

A

You omit the array elements from the statement:

ARRAY array-name{dimension}

SAS will create new variables with default names: the array-name followed by consecutive numbers 1 to the dimension of the array

or you can also name the new variables as elements

43
Q

How to add initial values to an array when creating new variables?

A

array array-name{dimension} element1 element2 (initialvalue1 initial value2)

44
Q

How do you create a temporary array?

A

array array-name{dimension} _TEMPORARY_

45
Q

What do the trailing at sign (@) and double trailing at sign (@@) do?

A

@: holds the input record for the execution of the next INPUT statement

@@: holds the input record for the execution of the next INPUT statement, even across iterations of the DATA step

46
Q

When is a record held by @@ released? (2 cases)

A
  1. The input pointer moves past the end of the record - then the pointer will move down to the next record
  2. an INPUT statement that has no trailing sign executes
47
Q

What are the difference between:

FLOWOVER

MISSOVER

TRUNCOVER

STOPOVER

?

A

All are used when reading raw data files.

FLOWOVER is the default.

MISSOVER sets vars to missing when there are missing values at end of a record. If you use it with column input, it can set variables that are shorter than the specification in the input statement to missing

TRUNCOVER forces input statement to stop reading when it gets to the end of a short line. Works similar to MISSOVER, but doesn’t have the column input problem

STOPOVER stops the data step when it reads a short line, e.g. there are missing values at end of record

48
Q

What does the forward slash (/) line pointer control do?

A

It allows you to read multiple records sequentially. Useful if one observation is spread out over several lines in raw data file

e.g.

input Lname $ 1-8 Fname$ 10-15 /

Department $ 1-12 JobCode $15-19 /

Salary comma10.;

Note: that the raw data file contains the same number of records for each observation

49
Q

What does the #n line pointer control do?

A

1 Lname $ Fname $;

It specifies which line to read and allows you to read records non-sequentially

e.g.

input #2 Department $ 1-12 JobCode $ 15-19

This program will read the values of department and jobcode starting at the second record and will go back to first record for lname and fname

50
Q

When is the ampersand (&) modifier used?

A

In modified list input.

It’s used to read character values that contain one or more single embedded blanks. The value is read until two or more consecutive blanks are encountered.

51
Q

How do informats work differently in formatted input vs. modified list input?

A

In formatted input, the informat determines both the length of character variables and the number of columns that are read.

In modified list input, the informat only determines the length of the variable, not the number of columns read

52
Q

What is the minimum width for the informat TIMEw.?

A

5 - anything less will give you an error message

53
Q

WEEKDATEw.

WORDDATEw.

A

Mon, Apr 5, 1999

Apr 5, 1999

54
Q

What does VARNUM option do in PROC CONTENTS?

A

lists the variables in the order of creation

55
Q

Which option suppresses page numbers in LISTING output?

Which option suppresses the date?

A

OPTIONS NONUMBER;

OPTIONS NODATE;

Note: OPTIONS statement is a global statement and will remain in effect until modified or end SAS session

56
Q

Define:

PAGENO= option

PAGESIZE= option

LINESIZE= option

A

All used with LISTING output

PAGENO identifies the number to appear on the first page of the document

PAGESIZE specifies how many lines each page of output contains

LINESIZE (or LS) specifies the width of the print line for the output and log

57
Q

YEARCUTOFF= option

A

It specifies the first year of the 100-year span that SAS uses for dates

e.g.

YEARCUTOFF=1950 spans 100 years from 1950 to 2049

Default yearcutoff = 1920 so from 1920 to 2019

which means that two digit years like ‘55 will come out as 1955