SAS P1 L6 Flashcards

1
Q

What can a DATA step read? (3)

A

A DATA step can read:

  • SAS data sets
  • Excel worksheets
  • Raw data files
2
Q

How do you create a new data set from an existing data set?
Which statements are needed?
Syntax?

A

Use a DATA step:
DATA output-SAS-data-set;   /* DATA statement */
SET input-SAS-data-set;     /* SET statement */
RUN;                        /* RUN statement */
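A minimal sketch of the template above, assuming the SASHELP.CLASS sample data set that ships with SAS is available; the output name WORK.CLASS_COPY is illustrative only.

```sas
/* Create a new data set from an existing one */
data work.class_copy;    /* DATA statement: names the new (output) data set */
   set sashelp.class;    /* SET statement: reads the existing (input) data set */
run;                     /* RUN statement: ends the step */
```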

3
Q

What does a DATA step do?

How do you select only particular observations?

A
  • A DATA step with a SET statement reads all observations and all variables from the input data set sequentially
  • Use a WHERE statement to subset (select only certain) observations that meet a particular condition
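A short sketch of subsetting with WHERE, again assuming SASHELP.CLASS is available; the condition and the output name are only illustrations. Only observations that satisfy the condition are read into the step.

```sas
/* Only observations that meet the WHERE condition are read */
data work.girls_13plus;
   set sashelp.class;
   where Sex = 'F' and Age >= 13;
run;
```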
4
Q
How do you write a SAS DATE CONSTANT?
What must be used?
What happens to it? How many digits in the year?
Examples: how do you write
1/1/2000?
31/12/11?
1/4/04?
Where can it be used?
A
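SAS DATE CONSTANT
written as 'ddmmmyy'D or 'ddmmmyyyy'D
- must be enclosed in quotes
- will be converted to a SAS date value (number of days since January 1, 1960)
- use the yyyy form for a 4-digit year; a 2-digit year is interpreted using the YEARCUTOFF= system option
- D can be upper or lower case
Examples:   '01JAN2000'D
                    '31Dec11'D
                    '1apr04'd

Can be used in any SAS expression, including WHERE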
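A sketch of date constants in an assignment statement and in a WHERE statement. It assumes the SASHELP.STOCKS sample data set (with variables Stock and Date) is available; the output names are illustrative, and the DATE9. format is used only so the values display readably.

```sas
/* Date constants in assignment statements */
data work.date_values;
   d1 = '01JAN2000'd;          /* 4-digit year */
   d2 = '31Dec11'd;            /* 2-digit year: century set by YEARCUTOFF= */
   d3 = '1apr04'd;             /* month name and D are not case sensitive */
   days = d1;                  /* unformatted copy: days since 01JAN1960 */
   format d1-d3 date9.;        /* display d1-d3 as ddMMMyyyy */
run;

/* Date constant in a WHERE statement */
data work.ibm_since_2000;
   set sashelp.stocks;
   where Stock = 'IBM' and Date >= '01JAN2000'd;
run;
```

Printing WORK.DATE_VALUES shows d1 as a formatted date, while days shows the raw SAS date value stored behind it.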

5
Q
How do you assign a value to a variable?
Syntax?
Does the variable have to be existing or new?
What is an expression?
Keyword?
Possible operands?
Possible operators?
Operator hierarchy? How do you change it?
What if an operand has a missing value?
A

Assign a value to a variable with an assignment statement:
variable = expression;
- variable can be existing or new
- expression is a set of instructions: a series of operands and operators that produces a value
- NO keyword
- operands: character constants, numeric constants, date constants, character variables, numeric variables
- operators: symbols for arithmetic calculations, or SAS functions
- operators follow the usual arithmetic hierarchy; use parentheses to change the order
- if any operand in an arithmetic expression has a missing value, the result is a missing value (some functions, such as SUM, ignore missing arguments)
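A sketch of assignment statements and missing-value propagation. The variables Salary and Bonus and their values are hypothetical; they are read inline with INPUT and DATALINES only so the example is self-contained.

```sas
data work.totals;
   input Salary Bonus;              /* hypothetical numeric variables */
   Total  = Salary + Bonus;         /* missing if either operand is missing */
   Total2 = sum(Salary, Bonus);     /* SUM function ignores missing arguments */
   Half   = (Salary + Bonus) / 2;   /* parentheses change the order of operations */
   datalines;
50000 5000
60000 .
;
run;
```

For the second observation, Total and Half are missing while Total2 is 60000.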

6
Q

What does a SET statement do? (2)

How do you exclude/include variables?

A

SET statement:
READS all variables from the input data set and, by default, WRITES them to the output data set

Exclude or include variables with the DROP and KEEP statements

7
Q

What does DROP do?
Syntax?
How many variables per DROP statement?
How are variables separated?

A

DROP specifies the variables to EXCLUDE from the output data set:
DROP variable1 variable2 ...;
(one or more variable names per statement, separated by spaces)
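A minimal DROP sketch, assuming SASHELP.CLASS is available; the output name is illustrative. Two variables are listed in one DROP statement, separated by a space.

```sas
data work.class_no_size;
   set sashelp.class;
   drop Height Weight;     /* both excluded from the output data set */
run;
```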

8
Q

What does KEEP do?
Syntax?
How are variables separated?
What must be included?

A

KEEP specifies the variables to INCLUDE in the output data set:
KEEP variable1 variable2 ...;
(variable names separated by spaces)

must include every variable to be written, new variables too!
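A KEEP sketch showing why a new variable must be listed, assuming SASHELP.CLASS is available. BMI is a hypothetical new variable (703 times weight in pounds divided by height in inches squared); if it were left out of the KEEP statement, it would not be written.

```sas
data work.class_bmi;
   set sashelp.class;
   BMI = 703 * Weight / Height**2;   /* hypothetical new variable */
   keep Name Age BMI;                /* BMI must be listed to be written */
run;
```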

9
Q

How do you decide whether to use DROP or KEEP?

Any effect on the input data set?

A

It doesn't really matter whether you use DROP or KEEP.
Use whichever one lets you list the fewest variables.

DROP and KEEP have no effect on the input data set.

10
Q

How does SAS process a DATA step? Phases?

A

SAS processes a DATA step in two phases:

the compilation phase and the execution phase

11
Q

What happens during compilation? (4 steps)

What does the descriptor portion include?

A

Compilation:

  1. SAS SCANS each statement in the DATA step for syntax errors
  2. COMPILES the program: converts it to machine code if no errors are found
  3. CREATES the PDV (program data vector) to hold the current obs
  4. When compilation is complete, RECORDS (creates) the descriptor portion of the new data set

Descriptor portion includes the data set name, variable names, and variable attributes such as type and length

12
Q
What is a PDV?
Where is it?
What is automatically included? (2)
       What are these used for? Default value?
How much space is used?
What is PDV used for?
A

PDV = program data vector
- area of memory where SAS builds one observation
- contains 2 automatic variables that can be used during processing but are not written to the data set:
_N_ = iteration number of the DATA step
_ERROR_ = signals that an error caused by data occurred during execution. Default = 0 = no error (set to 1 when an error occurs)
- a slot is added for each variable in the input data set

13
Q

What are the automatic variables in a PDV?
What are they used for?
Default values?

A

The PDV has 2 automatic variables that can be used during processing but are not written to the data set:
_N_ = iteration number of the DATA step
_ERROR_ = signals that an error caused by data occurred during execution. Default = 0 = no error (set to 1 when an error occurs)
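A sketch of using the automatic variables during processing, assuming SASHELP.CLASS is available. Because _N_ and _ERROR_ are not written to the output, the value of _N_ is copied into a regular (illustrative) variable named Iteration to keep it.

```sas
data work.class_iter;
   set sashelp.class;
   putlog _n_= _error_= Name=;   /* writes the automatic variables to the log */
   Iteration = _n_;              /* copy _N_ into a regular variable to keep it */
run;
```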

14
Q

What supplies variable name, type, length to PDV?

A

The descriptor portion of the input data set supplies the attributes:

variable name, type, length

15
Q

How are variables created in the PDV? (5-step process)
What if a variable is being dropped?
Where does information about variable attributes come from?

A
  1. ADD a SLOT for each variable in the input data set
  2. GET ATTRIBUTES from the descriptor portion of the input data set:
    variable name, type, length
  3. ADD any NEW variable (for example, one created by an assignment statement) to the PDV
  4. [In the compilation phase, SAS FLAGS any variable to be dropped from the output]
  5. At the BOTTOM of the DATA step: the compilation phase is complete, and the descriptor portion of the new data set is RECORDED
16
Q
Processing the DATA step: Execution Phase
What happens? Default?
Step by step:
What is the PDV initialized to?
What happens when the SET statement executes?
When does the new variable get a value?
What happens at the bottom of the DATA step?
What about dropped variables?
Then what?
What is retained? What happens to those values?
What about the value of the new variable?
How long does it continue?
A

Reads and processes observations and creates the data portion of the new data set
By default, executes once for each observation in the input data set
1. Execution phase starts =>
PDV INITIALIZED to missing values
2. SET statement executes =>
READS the first obs INTO the PDV
(Bonus, the new variable, is still missing)
3. ASSIGNMENT statement executes =>
ASSIGNS a value to Bonus
4. Bottom of DATA step => implicit OUTPUT:
values in PDV WRITTEN TO the NEW data set
(dropped variables are not written)
5. Implicit RETURN => GOES BACK to the top of the DATA step for the next iteration
- Values read from the input data set are retained in the PDV
- They are overwritten when the next obs is read
- The new variable is reinitialized to missing
Continues until the last observation in the input data set has been read (end of file)
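A sketch that makes the execution phase visible in the log, assuming SASHELP.CLASS is available. HtCm is a hypothetical new variable playing the role of Bonus, and the OBS=2 data set option limits the input to two observations so the log stays short. PUTLOG _ALL_ writes every PDV variable, including _N_ and _ERROR_.

```sas
data work.class_pdv;
   set sashelp.class(obs=2);             /* read only the first 2 observations */
   putlog 'After SET:        ' _all_;    /* new variable HtCm is still missing */
   HtCm = Height * 2.54;                 /* assignment gives HtCm a value */
   putlog 'After assignment: ' _all_;
run;                                     /* implicit OUTPUT, then implicit RETURN */
```

In the log, HtCm shows as missing after each SET and has a value after the assignment, matching steps 2 and 3 above.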

17
Q

What is the IF statement used for?

Syntax?
What is the expression?
Can special WHERE operators be used?

A

The subsetting IF statement is used to subset obs, including on values created in the DATA step
Syntax:
IF expression;
- expression = set of instructions (as in the WHERE statement)
- cannot use special WHERE operators in IF
(ex: BETWEEN-AND, IS MISSING, IS NULL, CONTAINS, LIKE)
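A subsetting IF sketch, assuming SASHELP.CLASS is available. HtCm is a hypothetical variable created in the step; an observation is written only when the IF expression is true.

```sas
data work.tall;
   set sashelp.class;
   HtCm = Height * 2.54;   /* value created in this DATA step */
   if HtCm > 160;          /* subsetting IF: obs is written only when true */
run;
```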

18
Q

Flow chart of the subsetting IF statement

A

Flow chart:
DATA statement
=> read an obs
=> IF expression false? obs is not written; go back to the DATA statement for the next obs
=> IF expression true? continue processing the obs
=> output obs to data set

19
Q

When do you use WHERE?
When do you use IF?

In a PROC step?
In a DATA step?

What must WHERE reference? What does this mean?

A

PROC step => MUST use WHERE (subsetting IF is not allowed)
DATA step => can ALWAYS use IF
can SOMETIMES use WHERE
- WHERE must reference variables that exist in the input data set
- if the step reads several data sets and the variable is not in all of them, can't use WHERE
- if a variable is created in an assignment statement, can't use it in WHERE: it doesn't exist in the input data
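A sketch contrasting the two, assuming SASHELP.CLASS is available; output names and the hypothetical HtCm variable are illustrative. WHERE filters on a variable that exists in the input data; the new variable can only be tested with a subsetting IF.

```sas
/* PROC step: use WHERE */
proc print data=sashelp.class;
   where Age >= 15;
run;

/* DATA step: WHERE on an input variable, IF on a new variable */
data work.tall_boys;
   set sashelp.class;
   where Sex = 'M';           /* Sex exists in the input data set */
   HtCm = Height * 2.54;      /* HtCm exists only in the PDV */
   if HtCm > 160;             /* WHERE HtCm > 160 would fail here */
run;
```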

20
Q

What does the LABEL statement do in a DATA step?
Where are labels stored?
What about labels and formats in a later PROC PRINT?
How do you get PROC PRINT to display labels?

A
  • when you use a LABEL statement in a DATA step, the labels are permanently associated with the variables
  • they are stored in the descriptor portion
  • labels and formats specified in a later PROC PRINT override the permanent ones for that report only (the permanent labels are not changed)
  • must include the LABEL option in PROC PRINT to display the labels
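A sketch of permanent labels, assuming SASHELP.CLASS is available; the label text and output name are illustrative. The LABEL option on PROC PRINT makes the stored labels appear as column headers.

```sas
data work.class_lbl;
   set sashelp.class;
   label Height = 'Height (inches)'
         Weight = 'Weight (pounds)';   /* stored in the descriptor portion */
run;

proc print data=work.class_lbl label;  /* LABEL option: labels as column headers */
run;
```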
21
Q

What does PROC CONTENTS do?

A

Displays the contents of the descriptor portion of a data set
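A minimal sketch, assuming SASHELP.CLASS is available. The report lists the data set attributes and, for each variable, its name, type, length, and any format or label.

```sas
proc contents data=sashelp.class;   /* prints the descriptor portion */
run;
```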

22
Q

What does FORMAT do in a DATA step?

What is formatted?

A
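  • FORMAT permanently associates a format with a variable when used in a DATA step
  • the format is assigned to the variable (refer to it by variable name, not by its label); it changes how values are displayed, not the stored values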
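A sketch of a permanent format, assuming SASHELP.CLASS is available; the 6.1 format and the output name are illustrative. Even though Weight has a label, the FORMAT statement refers to it by its variable name, and the stored values stay the same.

```sas
data work.class_fmt;
   set sashelp.class;
   label Weight = 'Weight (pounds)';
   format Height Weight 6.1;   /* refer to variables by name, not by label */
run;
```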