Access and Create Data Structures Flashcards
Data types in SAS
Two data types: numeric and character.
A dot (.) represents
A missing numeric value. (A missing character value is represented by a blank.)
SAS is case-sensitive. True or False?
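False. SAS keywords and variable names are case-insensitive. Character data values and quoted strings, however, are compared as written.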
DATA step
DATA steps are used to create data sets and to read and modify data.
PROC step
PROC steps are used to process SAS data sets, such as those created by a DATA step. Processing includes analyzing data, performing utility functions, and printing reports.
Statements that indicate the end of step
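RUN; QUIT; (for procedures that support run-group processing); or the start of a new DATA or PROC step. STOP and ABORT halt DATA step execution but are not step boundaries.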
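A minimal sketch of one DATA step followed by one PROC step, each closed with RUN;. The data set name teens is a hypothetical choice; SASHELP.CLASS is one of the sample data sets shipped with SAS.

```sas
/* DATA step: build a new data set from a SASHELP sample table */
data teens;                     /* "teens" is a made-up name */
    set sashelp.class;
    where age >= 13;
run;                            /* step boundary: the DATA step ends here */

/* PROC step: analyze the data set created above */
proc means data=teens mean;
    var height;
run;                            /* step boundary: the PROC step ends here */
```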
SASHELP
It is a library that contains information that controls your SAS session along with sample SAS data sets.
SASUSER
It is a library that stores any information regarding changes to the default settings for the SAS windowing environment. You can also store SAS data sets, SAS programs, and other SAS files in the SASUSER library.
WORK library
It is the temporary storage location for data sets. It is the default library: SAS places any data set created without a library reference in WORK. These files are temporary and are deleted at the end of the session.
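As a small illustration, a one-level data set name and the two-level name that begins with WORK refer to the same temporary data set. The data set and variable names here are hypothetical.

```sas
/* One-level name: SAS stores this as WORK.INVENTORY */
data inventory;
    item = 'bolt';
    qty  = 120;
run;

/* Two-level name: the same temporary data set, with WORK written out */
proc print data=work.inventory;
run;
```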
DATALINES
It is a statement used in a DATA step to enter rows of raw data directly in the program. The data lines follow the statement immediately and end at the first line that contains a semicolon. DATALINES must be the last statement in the DATA step. The CARDS statement can be used in place of DATALINES.
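A short sketch of in-stream data, including a dot for a missing numeric value. The data set name, variable names, and values are invented for illustration.

```sas
data temps;
    input city $ high low;
    /* DATALINES is the last statement; the raw data follows it */
    datalines;
Austin 95 72
Boise 88 .
;
run;

/* The dot on the second line is read as a missing value for low */
proc print data=temps;
run;
```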
INFILE
It tells SAS the name and location of the external raw data file to read. The INFILE statement must follow the DATA statement and precede the INPUT statement.
INFILE 'C:\MyDir\Desktop\Home\users.dat';
LRECL
LRECL stands for logical record length. It is an INFILE option that specifies the length of a record in the data file, and it is needed when records are longer than the default. A record is one row of the data file (typically one observation).
INFILE 'location\user.txt' LRECL = 2000;
The dollar sign ($) after a variable name indicates…
that the variable is of the character data type.
INPUT streetName $ streetNumber;
TITLE
The TITLE statement tells SAS to put the text enclosed in quotation marks on the top of each page of output. Without this statement, SAS would put the words “The SAS System” at the top of each page.
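A minimal sketch of TITLE in use. The title text is an arbitrary example; SASHELP.CLASS is a sample data set shipped with SAS.

```sas
/* Replace the default "The SAS System" heading */
title 'Class Roster';

proc print data=sashelp.class;
run;

/* A TITLE statement with no text clears the title for later output */
title;
```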
LENGTH
It defines the number of bytes used to store the values of a variable. Numeric variables have a default length of 8, as do character variables read with list input. Using LENGTH, a character variable can be given any length between 1 and 32,767.
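The sketch below shows LENGTH placed before INPUT so that character values longer than 8 bytes are not truncated. All names and values are hypothetical.

```sas
data staff;
    /* Declare lengths before the variables are first used */
    length fullname $ 20 dept $ 4;
    input fullname $ dept $;
    datalines;
Christopher HR
Alexandria IT
;
run;

/* Without the LENGTH statement, list input would keep only the
   first 8 characters of each name */
proc print data=staff;
run;
```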
Informats
Informats tell SAS how to interpret raw data as it is read. They are useful when dealing with non-standard data. There are three types of informats: character, numeric, and date.
Formatted inputs
An input style in which each variable in the INPUT statement is followed by an informat: character ($informatw.), numeric (informatw.d), or date (informatw.), where w is the width and d is the number of decimal places.
INPUT Name $16. +1 Age 3. ;
What does ‘+1’ mean here?
‘+1’ is a relative column pointer control: it moves the input pointer forward one column, skipping the column between Name and Age.
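A runnable sketch of the INPUT statement above, using in-stream data laid out in fixed columns. The data set name and the data values are invented; Name occupies columns 1 to 16, column 17 is skipped, and Age is read from columns 18 to 20.

```sas
data people;
    /* $16. reads columns 1-16, +1 skips column 17, 3. reads columns 18-20 */
    input Name $16. +1 Age 3.;
    datalines;
John Smith       42
Mary Ann Jones   37
;
run;

proc print data=people;
run;
```

Because $16. reads a fixed width, the embedded blanks in the names are kept as part of the value.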
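Difference between @ and @n
@n is an absolute column pointer control: it moves the input pointer to column n. An @ on its own at the end of an INPUT statement is a line-hold specifier.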
Difference between @ and @@ in INPUT statement
@ and @@ are both line-hold specifiers.
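A trailing @ holds the current raw data line for the next INPUT statement in the same iteration of the DATA step. It is typically used to read part of a line, test it with an IF condition, and then decide whether to read the rest of the line or discard the observation. The line is released at the end of the iteration.
A trailing @@ is used when one line holds more than one observation. It holds the line across iterations of the DATA step, so the next observation continues reading from where the previous one stopped.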
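The two line-hold specifiers can be sketched side by side. Data set names, variable names, and values are all hypothetical.

```sas
/* Trailing @: read part of the line, test it, then read the rest */
data adults;
    input age 1-2 @;            /* read age and hold the line */
    if age < 18 then delete;    /* drop the observation without reading further */
    input name $ 4-11;          /* read the remainder of the held line */
    datalines;
25 Alice
12 Bobby
40 Carla
;
run;

/* Trailing @@: several observations on each line */
data scores;
    input id score @@;          /* hold the line across iterations */
    datalines;
1 90 2 85 3 77
4 60 5 95
;
run;
```

In this sketch, adults keeps the two lines with an age of 18 or more, and scores gets one observation per id/score pair, five in total.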
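INFILE statement options
FIRSTOBS=n tells SAS at what line to begin reading data.
OBS=n tells SAS the last line of raw data to read. When reading starts at line 1, SAS reads n lines.
MISSOVER: by default, SAS moves to the next line of raw data if the current line ends before every variable in the INPUT statement has a value. With MISSOVER, SAS stays on the current line and assigns missing values to the remaining variables.
TRUNCOVER: like MISSOVER, SAS does not move to the next line when a record is too short. In addition, a value that is shorter than the width the INPUT statement expects is kept rather than set to missing.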
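A small sketch of MISSOVER with list input. INFILE DATALINES is used here only so the example is self-contained; the data set name, variable names, and values are made up.

```sas
data readings;
    /* MISSOVER: do not go to the next line when values run out */
    infile datalines missover;
    input id t1 t2 t3;
    datalines;
1 20 21 22
2 19
3 18 17
;
run;

/* Each line becomes one observation; the short lines get missing
   values for the variables that have no data */
proc print data=readings;
run;
```

Without MISSOVER, SAS would pull values for the second observation from the third line, and the result would have two observations instead of three.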
DELIMITER= or DLM=
It is an option in the INFILE statement that lets SAS read data from files that use a delimiter other than the default, which is a space. The delimiter is specified as the option value in single quotes. If the delimiter is a multi-character string, DLMSTR= is used instead.
ASCII value for tab delimiter. DLM=?
DLM = '09'X is used for tab delimiters (hexadecimal 09 is the ASCII tab character).
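A minimal sketch of DLM= with a pipe character as the delimiter. The data set name, variables, and values are hypothetical, and INFILE DATALINES stands in for an external file.

```sas
data parts;
    /* Values are separated by | instead of blanks */
    infile datalines dlm='|';
    input code $ qty price;
    datalines;
A10|4|2.50
B22|10|0.75
;
run;

proc print data=parts;
run;
```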
DSD option
DSD stands for Delimiter-Sensitive Data.
DSD assumes the delimiter is a comma if the DLM= option is not specified.
DSD ignores delimiters embedded in a data value that is enclosed in quotation marks. The quotation marks are not read as part of the data value.
By default, SAS treats two consecutive delimiters as a single delimiter. With DSD, two consecutive delimiters indicate a missing value, so the DSD option is used in INFILE when the data contains missing values.
It is common to use DSD along with MISSOVER, because SAS might otherwise read from the next line when the last value on a line is missing.
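The DSD behaviors listed above can be sketched with comma-separated in-stream data. The data set name, variable names, and values are invented for illustration.

```sas
data contacts;
    /* DSD: comma delimiter, quoted values, consecutive commas = missing */
    infile datalines dsd missover;
    length name $ 20 city $ 15;
    input name $ city $ age;
    datalines;
"Smith, John",Boston,41
"Lee, Ana",,29
Omar,Denver,
;
run;

/* Line 1: the comma inside the quotes stays in name, quotes are dropped.
   Line 2: the two consecutive commas give a missing city.
   Line 3: nothing follows the last comma, so age is missing. */
proc print data=contacts;
run;
```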