Midterm 1 Flashcards
The DATA step
The DATA step manipulates data.
The input for a DATA step can be of several types, such as raw data or a SAS data set.
The output from a DATA step can be of several types, such as a SAS data set or a report.
Do all SAS programs contain a DATA step?
No
PROC step
In general, the PROC step analyzes data, produces output, or manages SAS files.
The input for a PROC step is usually a SAS data set.
The output from a PROC step can be of several types, such as a report or an updated SAS data set.
SAS Statements
A SAS statement is a series of items that might include keywords, SAS names, special characters, and operators.
The two types of SAS statements are:
Those that are used in DATA and PROC steps.
Those that are global in scope and can be used anywhere in a SAS program.
All SAS statements end with a semicolon.
Global statements
Used anywhere in a SAS program.
Stay in effect until changed or canceled, or until you end your SAS session.
e.g., TITLE, OPTIONS, FOOTNOTE
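A minimal sketch of how global statements persist across steps. It assumes the sample table sashelp.class that ships with most SAS installations; the title and footnote text are made up for illustration.

```sas
/* Global statements: placed outside any step, they stay in effect until changed. */
options nodate;
title 'Class roster';
footnote 'Source: sashelp.class';

proc print data=sashelp.class(obs=3);
run;

/* No new TITLE or FOOTNOTE here, so the ones above still apply. */
proc means data=sashelp.class;
run;

/* A TITLE or FOOTNOTE statement with no text cancels the current setting. */
title;
footnote;
```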
SAS data sets
A SAS file stored in a SAS library that SAS creates and processes.
Contains data values that are organized as a table of observations (rows) and variables (columns).
Contains descriptor information such as the data types and lengths of the variables.
SAS libraries
A collection of one or more SAS files, including SAS data sets, that are referenced and stored as a unit.
A logical name (libref) can be assigned to a SAS library using the LIBNAME statement.
Libref
A libref can be up to 8 characters long.
It must begin with a letter or an underscore.
It can contain only letters, digits, or underscores.
e.g., libname project 'C:\workshop\winsas\lwcrb';
Which of the following sentences is true concerning the LIBNAME statement:
A. The LIBNAME statement must go in a DATA step.
B. The LIBNAME statement must end in a semicolon.
C. The LIBNAME statement must be the first statement in a program.
D. The LIBNAME statement must be followed by the RUN statement.
B. The LIBNAME statement must end in a semicolon.
Two-level SAS data set name
A SAS data set can be referenced using a two-level SAS data set name: libref.dataset
e.g., proc sort data=work.enroll;
libref is the logical name that is associated with the physical location of the SAS library.
dataset is the data set name, which can be up to 32 characters long, must begin with a letter or an underscore, and can contain only letters, digits, and underscores.
One-level SAS data set name
A data set referenced with a one-level name is assigned to the Work library by default.
e.g., proc sort data=enroll out=project.enroll;
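A short sketch contrasting one-level and two-level names. It reuses the libref and path from the LIBNAME card (that folder must exist on your machine) and assumes sashelp.class as sample input.

```sas
/* Path taken from the LIBNAME card; point it at a folder that exists for you. */
libname project 'C:\workshop\winsas\lwcrb';

/* One-level name: enroll is stored as work.enroll. */
data enroll;
   set sashelp.class;
run;

/* Two-level name on OUT=: the sorted copy is stored in the project library. */
proc sort data=enroll out=project.enroll;
   by name;
run;
```

After the session ends, work.enroll is gone, while project.enroll remains in the folder that the libref points to.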
Temporary SAS Data sets
A temporary SAS data set is one that exists only for the current SAS session or job.
The Work library is a temporary data library.
Data sets held in the Work library are deleted at the end of the SAS session.
Permanent SAS data sets
A data set that resides on the external storage medium of your computer and is not deleted when the SAS session terminates.
Any data library referenced with a LIBNAME statement is considered a permanent data library by default.
Variables
Data values are organized into columns called variables.
Variables have attributes, such as the name and type, that enable you to identify them and that define how they can be used.
Variable names
Variable names can be up to 32 characters long.
Must begin with a letter or an underscore.
Can contain only letters, digits, or underscores.
Which of the following variable names is valid? A. street# B. zip_code C. 2address D. last name
B. zip_code
Character variables
Character variables are stored with a length of 1 to 32,767 bytes with 1 character equaling 1 byte.
Character variables can contain letters, numeric digits, and other special characters.
Numeric variables
Numeric variables are stored as floating-point numbers with a default byte size of 8.
To be stored as a floating point number, the numeric value can contain numeric digits, plus or minus sign, decimal point, and E for scientific notation.
How many of the following data sets aren't permanent data sets? work.enroll temp.enroll project.enroll enroll
Two (work.enroll and enroll)
How should a date be stored in SAS?
a. character
b. numeric
b. numeric
SAS Dates
A SAS date value is a value that represents the number of days between January 1, 1960, and a specified date.
Dates before January 1, 1960 are negative numbers.
Dates after January 1, 1960, are positive numbers.
To reference a SAS date value in a program, use a SAS date constant.
A SAS date constant is a date (DDMMMYYYY) in quotation marks followed by the letter D.
ex. '12NOV1986'd
What is the numeric SAS date value for December 25, 1959? A. -6 B. -7 C. 6 D. 8
B. -7
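The date arithmetic above can be checked in the log with a small DATA _NULL_ step. This is only a sketch; the variable names are made up for illustration.

```sas
data _null_;
   day0 = '01JAN1960'd;     /* the reference date, stored as 0 */
   xmas = '25DEC1959'd;     /* seven days earlier, stored as -7 */
   put day0= xmas=;         /* writes day0=0 xmas=-7 to the log */
   put xmas= date9.;        /* same value shown with a date format: xmas=25DEC1959 */
run;
```

The stored value is always the number; a format such as DATE9. only changes how it is displayed.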
Missing Data
Missing data is a value that indicates that no data value is stored for the variable in the current observation.
A missing numeric value is displayed as a single period (.).
A missing character value is displayed as a blank space.
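A small sketch of both kinds of missing value, written to the log. The variable names are hypothetical, and the MISSING function is used here only as one convenient way to test for them.

```sas
data _null_;
   length city $ 10;
   age  = .;       /* missing numeric value */
   city = ' ';     /* missing character value */
   if missing(age)  then put 'age is missing:  ' age=;
   if missing(city) then put 'city is missing: ' city=;
run;
```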
CONTENTS procedure
The CONTENTS procedure shows the descriptor portion of a SAS data set.
e.g., proc contents data=project.enroll; run;
The VARNUM option can be used to print the variable list in the order of the variables' positions in the data set.
Which step displays the directory of the project library and suppresses printing the contents of individual data sets?
A. proc contents data=project; run;
B. proc contents data=project.all;
C. proc contents data=project nocontents; run;
D. proc contents data=project._all_ nods; run;
D. proc contents data=project._all_ nods; run;
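A sketch of the two PROC CONTENTS variations described above, run against the sashelp library (an assumption; substitute any libref you have assigned).

```sas
/* Descriptor portion of one data set, variables listed in position order. */
proc contents data=sashelp.class varnum;
run;

/* Directory of the whole library; NODS suppresses the per-data-set details. */
proc contents data=sashelp._all_ nods;
run;
```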
PRINT Procedure
The PRINT procedure shows the data portion of a SAS data set.
ex. proc print data=project.enroll; run;
Comments
Two ways to add comments:
* comment;
/* comment */
What is the name of the data set being read?
data work.newprice;
set golf.supplies;
golf.supplies
What is the name of the data set being created?
data work.newprice;
set golf.supplies;
work.newprice
SET statement
The SET statement reads an observation from one or more SAS data sets for further processing in the DATA step.
By default, the SET statement reads all variables and all observations from the input data sets.
The SET statement can read temporary or permanent data sets.
Compilation phase
During the compilation phase, SAS does the following:
Checks the syntax of the SAS statements.
Translates the statements into machine code.
Identifies the name, type, and length of each variable.
The following three items are potentially created:
input buffer
program data vector
descriptor information
Input Buffer
The input buffer is a logical area in memory into which SAS reads each record of a raw data file when SAS executes an INPUT statement.
This buffer is created only when the DATA step reads raw data.
When the data step reads a SAS data set, SAS reads the data directly into the program data vector.
Program Data Vector (PDV)
A logical area in memory where SAS builds a data set, one observation at a time.
Along with data set variables and computed variables, the PDV contains the following two automatic variables:
- the _N_ variable, which counts the number of times the DATA step begins to iterate.
- the _ERROR_ variable, which signals the occurrence of an error caused by the data during execution. It is either 0 (no error) or 1 (one or more errors occurred).
Which of the following statements is false concerning the _N_ and _ERROR_ variables?
A. SAS does not write the _N_ and _ERROR_ variables to the output data set.
B. SAS increments the _N_ variable by 1 for each iteration of the DATA step.
C. SAS automatically generates the _N_ and _ERROR_ variables for every DATA step.
D. SAS sets the _ERROR_ variable equal to the total number of errors caused by the data during execution.
D. SAS sets the _ERROR_ variable equal to the total number of errors caused by the data during execution.
Which one of the following is not one of the items in the PDV at compile time?
A. byte size of the variable
B. Initial value of the variable.
C. Name of the variable
D. Type (character or numeric) of the variable
B. Initial value of the variable.
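One way to look at the PDV, including the automatic variables _N_ and _ERROR_, is to write it to the log with PUT _ALL_. This is a sketch that assumes sashelp.class as input; age_next is a made-up computed variable.

```sas
data work.pdv_demo;
   set sashelp.class(obs=3 keep=name age);
   age_next = age + 1;      /* computed variable, also held in the PDV */
   put _all_;               /* log shows name, age, age_next, _ERROR_, and _N_ */
run;

/* The output data set holds only name, age, and age_next. */
proc contents data=work.pdv_demo;
run;
```

The log lines show _N_ increasing by 1 on each iteration, while the PROC CONTENTS listing confirms that neither automatic variable is written to the output data set.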
Descriptor information
Information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes.
e.g., the name of the data set, the date and time that the data set was created, and the names, data types, and lengths of the variables.
Execution Phase
During the execution phase, SAS does the following:
- Initializes the PDV to missing and sets the initial values of _N_ and _ERROR_
- Reads data values into the PDV
- Executes any subsequent programming statements
- Outputs the observation to a SAS data set
- Returns to the top of the DATA step
- Resets the PDV to missing for any variables not read directly from a data set and increments _N_ by 1
- Repeats the process until the end of file is detected.
How many times does SAS iterate through a DATA step with 9 observations?
Nine times
DROP statement
The DROP statement specifies the names of the variables to omit from the output data set.
Use the DROP= data set option after an output data set name to specify the variables to omit from that specific output data set.
data work.total(drop=name total test1 test2);
KEEP statement
The KEEP statement specifies the names of the variables to write to the output data set.
Use the KEEP= data set option after an output data set name to specify the variables to write to that specific output data set.
data work.total(keep=name total test1 test2);
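A brief sketch comparing the statement form with the data set option form. It assumes sashelp.class (variables Name, Sex, Age, Height, Weight) as input; the output data set names are hypothetical.

```sas
/* KEEP statement: applies to every data set the step creates. */
data work.names;
   set sashelp.class;
   keep name age;
run;

/* DROP= data set option: applies only to the data set it follows. */
data work.no_size(drop=height weight)
     work.everything;
   set sashelp.class;
run;
```

Here work.no_size omits Height and Weight, while work.everything, created in the same step, still contains all five variables.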
FORMAT Statements
The FORMAT statement associates formats with variable values.
ex.
data work.newprice;
set golf.supplies;
saleprice=price*0.75;
format saleprice dollar18.2;
run;
Formats assigned with a FORMAT statement in a DATA step are permanent attributes (stored in the descriptor portion).