Fundamentals Flashcards
What are the three functions of a DATA statement in a DATA step?
- Signals the beginning of a DATA step
- Defines where the SAS data set is stored
- Names the data set
What is the function of the INFILE statement in a DATA step?
The INFILE statement identifies the text file to read.
What is the function of the INPUT statement in a DATA step?
The INPUT statement defines the name, type (character or numeric) and length of each column being generated from the raw data.
What is the function of a RUN statement in a DATA step?
The RUN statement marks the end of the DATA step; the step executes once the code is submitted.
Give an example of a list input sequence in a DATA step.
Data sasdatasetname;
infile 'textfilename';
input column1 column2 column3…etc;
run;
Explain list input in a DATA step
- A method of reading data values from an input data file where values are delimited by spaces, tabs, commas or another specified character.
- Variables are specified in the INPUT statement and read in the specified order.
- With list input, the INPUT statement scans along the raw data record; when it finds a space (the default delimiter), it assumes the end of the field has been reached.
Name some limitations of list input
- Blanks must separate the fields (by default)
- Character values have a default length of eight characters; longer values are truncated
- No embedded blanks are allowed by default. For example, data containing ‘Dave Derry’ as a name will be read as ‘Dave’ in one column and ‘Derry’ in another.
- Columns become mismatched if a value is missing
Briefly describe column input
Requires the column location of the variable values to be known and specified in the input statement.
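A minimal sketch of column input, assuming made-up data in which the name occupies columns 1-10 and the age columns 12-13 (the data set and variable names are hypothetical). Because the columns are fixed, an embedded blank in the name is read correctly:

```sas
/* Hypothetical layout: name in columns 1-10, age in columns 12-13 */
data work.people;
  input name $ 1-10 age 12-13;
  datalines;
Dave Derry 34
Ann Lee    27
;
run;
```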
Describe formatted input
- This style of INPUT statement allows values to be read using an informat (a template used by SAS to read values)
- Requires that the column location at which to start reading the value and the name of the informat to use are specified
Write the syntax for formatted input
@Value_Start_Position var_name informat_name
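As an illustration of the syntax above, here is a small sketch using invented sales records (the data set name, variable names and column positions are assumptions). Each variable is read from a stated start column with a standard SAS informat ($5., DATE9., COMMA8.):

```sas
data work.sales;
  /* region starts at column 1, sale date at column 7, amount at column 17 */
  input @1 region $5. @7 saledate date9. @17 amount comma8.;
  format saledate date9.;   /* display the stored date value readably */
  datalines;
North 10JAN2016 1,250.50
South 03FEB2016   980.00
;
run;
```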
What is the purpose of using a DATA step cut and paste method?
This method has two purposes:
1. Paste data copied from another program
2. Type the data directly into the SAS program
How is the DATA step cut and paste method used?
- Uses a DATALINES (or CARDS) statement instead of an INFILE statement; it is placed after the INPUT statement, immediately before the data lines
- The data is pasted or typed into the editor, making it unsuitable for large amounts of data
- This technique can be used with list, column or formatted input
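A short sketch of the method with list input; the data set name and values are made up for illustration. The data lines follow the DATALINES statement and end with a line containing only a semicolon:

```sas
data work.scores;
  input name $ score;   /* list input: values separated by blanks */
  datalines;
Dave 82
Ann 91
;
run;
```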
What is delimiter-sensitive data (DSD)?
DSD files are files whose values are not separated by blank spaces; instead, another character (such as a comma) sits between values.
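One way to read such data is the DSD option on the INFILE statement, sketched here with invented comma-separated records (names and values are assumptions). With DSD the default delimiter is a comma, two consecutive delimiters are treated as a missing value, and quotation marks around a value are removed:

```sas
data work.staff;
  infile datalines dsd;   /* DSD: comma-delimited, quotes stripped */
  length name $ 20;       /* allow names longer than 8 characters  */
  input name $ dept $ salary;
  datalines;
"Derry, Dave",Sales,30000
Ann Lee,,28000
;
run;
```

In the second record the department is empty, so DEPT is read as missing rather than shifting SALARY into the wrong column.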
Arrange the following SAS keywords in the correct order for reading an external file.
INPUT
RUN
INFILE
DATA
Match the SAS keywords with the correct definition:
a) Specifies the name of the text file that the program is to read;
b) Completes the DATA step processing;
c) Starts the DATA step processing and names the output table;
d) Defines the variable names and types.
- DATA: Starts the DATA step processing and names the output table
- INFILE: Specifies the name of the text file that the program needs to read
- INPUT: Defines the variable names and type
- RUN: Completes the DATA step processing
Briefly explain the Import procedure
- PROC IMPORT converts external data, such as space-, tab- or comma-delimited files and database files (e.g. Excel spreadsheets), into SAS data sets
- Provides a simple syntax while writing and running the DATA step code in the background
What is the basic syntax for a PROC IMPORT procedure?
PROC IMPORT DATAFILE="filename"/fileref
OUT=sas-table-name
<DBMS=identifier>
<REPLACE>;
RUN;
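A hedged example of the syntax filled in for a CSV file. The file path and output table name are hypothetical; GETNAMES=YES asks SAS to take variable names from the first row of the file:

```sas
proc import datafile="/path/to/people.csv"   /* hypothetical path */
    out=work.people
    dbms=csv
    replace;
  getnames=yes;
run;
```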
What is the basic syntax for a PROC EXPORT procedure?
PROC EXPORT DATA=sas-table-name
OUTFILE="filename"/fileref
<DBMS=identifier>
<REPLACE>;
RUN;
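A comparable sketch for exporting, using the SASHELP.CLASS sample table that ships with SAS; the output path is hypothetical:

```sas
proc export data=sashelp.class
    outfile="/path/to/class.csv"   /* hypothetical path */
    dbms=csv
    replace;
run;
```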
How many types of variable does SAS have?
2.
a. Character (character values must be enclosed in quotation marks)
b. Numeric
What are numeric expressions?
- Mathematical expressions are constructed in the SAS language using operators such as +, -, * and /, e.g. A = c + d
- Expressions within parentheses are evaluated prior to expressions outside of parentheses.
Describe the LENGTH statement.
- The LENGTH statement allows the programmer to control how the new variable will be created
- As a general rule, LENGTH statements should always be placed at the beginning of the DATA step
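A small sketch of why placement matters, using the SASHELP.CLASS sample table (the output table and the GRADE variable are hypothetical). Without the LENGTH statement, GRADE would take its length from the first assignment ('Junior', 6 characters) and 'Intermediate' would be truncated:

```sas
data work.grades;
  length grade $ 12;   /* declared before the variable is first used */
  set sashelp.class;
  if age < 13 then grade = 'Junior';
  else grade = 'Intermediate';
run;
```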
In which case would you use a SET statement over INFILE and INPUT statements?
The DATA step is used to manipulate data. The source data can either be external (non-SAS) or an existing SAS table.
- If the source file is external, INFILE and INPUT statements are used
- If the source data is an existing SAS table, a SET statement is used instead
What is a Set statement?
- Through implied DATA step looping, a SET statement reads all observations in a SAS data set unless options are used to dictate otherwise.
- By default, all variables are read and their properties are as defined in the source data set.
- The SET statement reads one observation (row) from a SAS table each time it is executed
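For illustration, a minimal step that reads the SASHELP.CLASS sample table and adds one computed column (the output table name and HEIGHT_CM are hypothetical; HEIGHT in that table is in inches). No explicit loop is written; the step repeats once per observation:

```sas
data work.class_metric;
  set sashelp.class;            /* reads one row per iteration */
  height_cm = height * 2.54;    /* new numeric variable        */
run;
```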
What is conditional processing? Give 3 conditional statements.
Conditional processing allows the programmer to control which statements execute based on values found in the data.
- IF expression; Subsets the observations written to the output table.
- IF expression THEN action; Performs programming statements when the condition in the expression is met.
- IF expression THEN action; ELSE action; Performs one set of programming statements when the condition is met and another when it is not met.
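The three forms can be sketched in one step against the SASHELP.CLASS sample table (the output table and the TITLE variable are hypothetical):

```sas
data work.teens;
  set sashelp.class;
  if age >= 13;                       /* subsetting IF: keep only these rows */
  if sex = 'F' then title = 'Ms';     /* IF-THEN                             */
  else title = 'Mr';                  /* ELSE: runs when the IF is false     */
run;
```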
What is a “Subsetting IF” statement?
If the observation meets the condition of the IF statement, it is allowed into the output data; otherwise it is not written.
What is an “IF-Then” statement?
‘IF – THEN’ statements are used to evaluate a condition and execute a SAS statement if the condition is true
What is an “IF-Then” and “Else” statement?
IF – THEN and ELSE statements are used to evaluate a condition and execute one statement if the condition is true, but execute a different statement if the condition is false.
What is an “IF-THEN-DO” statement?
Use IF-THEN-DO to conditionally execute multiple statements.
*The block must start with DO and finish with END;
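A brief sketch with the SASHELP.CLASS sample table; BAND and SENIOR are hypothetical variables. Each DO block groups two assignments under one condition:

```sas
data work.flags;
  set sashelp.class;
  length band $ 6;
  if age >= 14 then do;
    band = 'Senior';
    senior = 1;
  end;
  else do;
    band = 'Junior';
    senior = 0;
  end;
run;
```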
What does the OUTPUT statement in a DATA step do?
- The OUTPUT statement is used to write an observation to a specific table
- Once a programmer uses the OUTPUT statement within a DATA step, the implicit OUTPUT at the end of each DATA step iteration is disabled.
- Therefore, if the OUTPUT statement is used, every write to a data set must be done explicitly with further OUTPUT statements
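A common use is splitting one table into several. In this sketch (the output table names are hypothetical) each row of SASHELP.CLASS is written to exactly one of two tables named in the DATA statement:

```sas
data work.boys work.girls;
  set sashelp.class;
  if sex = 'M' then output work.boys;
  else output work.girls;
run;
```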
Explain the Compilation phase
- Initial step where SAS scans the program code to identify any syntax errors or issues.
- Determines the input and output files to be used in the program.
- Sets up the LPDV which acts as a transient area of memory for SAS holding the variables and values as SAS reads and processes the data
Explain the Execution phase
- In this phase, data is read into the LPDV one observation at a time. Before each read, values are initialised to missing and the iteration variable _N_ increases by 1.
- Additional programming statements are executed to manipulate the data and create new variables.
- The LPDV is then written to the output file.
- The execution phase loops back to the beginning and repeats until all observations have been processed.
- The execution phase terminates when the last observation has been processed
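To watch the loop at work, the automatic variable _N_ can be copied into a data set variable or written to the log. A small sketch using SASHELP.CLASS (ROW and the output table name are hypothetical):

```sas
data work.trace;
  set sashelp.class;
  row = _n_;            /* _N_ itself is not written to the output table */
  put _n_= name=;       /* writes the iteration number and NAME to the log */
run;
```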
What is the LPDV?
- Logical Program Data Vector (LPDV) is a temporary storage area used during data processing.
- It holds the variables and their corresponding values as SAS reads and manipulates data.
How are dataset options applied?
- Data set options in SAS are used to modify the behavior and characteristics of the resulting SAS dataset, such as storage format and sorting requirements.
- They are applied after the name of a SAS data set and must be specified in parentheses:
SAS-data-set (option-name=……)
What option is used in a PROC Sort step to remove duplicate values of the BY variable(s) ?
- NODUPKEY Removes rows with duplicate BY variable values. The check for duplicates is only made against the columns that are listed on the BY statement.
*DUPOUT= When sorting a data set using the NODUPKEY option, the DUPOUT= option can be specified together with a data set name, that will be used to store the duplicate observations
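Both options together, sketched on the SASHELP.CLASS sample table (the two output table names are hypothetical). The first row for each AGE is kept in the OUT= table and the remaining rows with a repeated AGE go to the DUPOUT= table:

```sas
proc sort data=sashelp.class
          out=work.one_per_age
          nodupkey
          dupout=work.age_dups;
  by age;
run;
```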
Write code to create a SAS format that could be used to display people's ages (held as numeric integer variables) as either ‘Young’ or ‘Old’ (‘Young’ people are those under 30 and everyone else is ‘Old’)
OPTIONS FMTSEARCH = (AMADEUS);
proc format;
value agefmt
low -< 30 = 'Young'
30 - high = 'Old';
run;
data datasetname;
set datasetname;
format age agefmt.;
run;
Explain the following concepts :
a) SAS Format
b) SAS Informat
- SAS format: A mechanism used to write out data values in a different form from the way in which they are actually stored, e.g. date formats or character formats. It acts as a mask that makes a variable look different without changing its stored value.
- SAS informat : A rule or instruction that tells SAS how to read/interpret data values
After this statement is run, the NEWDATE variable is held as a SAS date. What date will it represent?
newdate = INTNX('month', '10JAN2016'd, 1);
- Newdate is 1 February 2016.
If you wanted the result to keep the same day of the month (10 February 2016), you'd need to add 'S' as a fourth argument after the 1.
newdate = INTNX('month', '10JAN2016'd, 1, 'S');
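The two alignments can be compared side by side in a DATA _NULL_ step that writes to the log; the variable names are hypothetical:

```sas
data _null_;
  start         = '10JAN2016'd;
  first_of_next = intnx('month', start, 1);        /* default: start of interval, 01FEB2016 */
  same_day      = intnx('month', start, 1, 'S');   /* same day of month, 10FEB2016          */
  put first_of_next= date9. same_day= date9.;
run;
```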
True/False: Data sets referenced in a SET statement are input to the step
True
Data sets referenced in a SET statement are used as input to the data step in SAS
True/False: Data sets referenced in a DATA step are output to the step
False, as worded.
- Data sets named in the DATA statement are the output of the step: they are written from it, not supplied to it. Data sets referenced elsewhere in the step, such as in a SET statement, are inputs.
- Observations are written to the output data sets automatically at the end of each iteration (the implicit OUTPUT), unless an explicit OUTPUT statement is used to control when and where they are written.
When should you use a KEEP option?
Use the KEEP option after the name of a SAS data set in order to specify the variables that we need to load into memory or include in the output data set
When should you use a DROP option?
Use the DROP option after the name of a SAS data set in order to prevent some variables from being loaded into memory and included in the output data set
When should you use a RENAME option?
Use the RENAME option after the name of a SAS data set in order to change a variable name
* RENAME option can be used to rename multiple variables with a hyphen between the first and last variable name. The variables must have a numeric suffix.
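A sketch combining the three options on SASHELP.CLASS (the output table and the names STUDENT and HEAVY are hypothetical). On the input table KEEP= is applied before RENAME=, so KEEP= lists the original name; inside the step the variable is known by its new name:

```sas
data work.slim (drop=weight);                      /* WEIGHT is not written out  */
  set sashelp.class (keep=name age weight
                     rename=(name=student));       /* only three variables read  */
  heavy = (weight > 100);                          /* 1 if true, 0 if false      */
run;
```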
When should you use a WHERE option?
Use this option to limit the number of observations that need to be read for processing, or to limit the number of observations that are to be written to an output table.
*The WHERE expression to evaluate must be enclosed in parentheses
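For example, applied to the input table so that only matching rows are read (the output table name is hypothetical):

```sas
data work.teen_girls;
  set sashelp.class (where=(sex = 'F' and age >= 13));
run;
```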