Fundamentals Flashcards

1
Q

What are the three functions of a data statement in a DATA step?

A
  1. Signals the beginning of a data step
  2. Defines where to store SAS dataset
  3. Names the dataset
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the function of the Infile statement in a DATA step?

A

The infile statement declares the text file to import.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the function of the input statement in a DATA step?

A

Input statement defines the name, type (character or numeric) and length of each column being generated from the raw data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the function of a run statement in a DATA step?

A

The run statement completes the DATA step and executes the code once it’s submitted.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Give an example of a list input sequence in a DATA step.

A

Data sasdatasetname;
infile ‘textfilename’;
input column1 column2 column3…etc;
run;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Explain List input in a DATA

A
  • A method of reading data values from an input data file where values are delimited by spaces, tabs, commas or other specified character.
  • Variables are specified in the INPUT statement and read in the specified order.
    -With List Input, the INPUT statement scans along the raw data record line, when a space is found, it assumes the end of the field has been reached.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Name some limitations of list input

A
  • Blanks must separate the fields
  • The values for character columns are restricted in that: Default length is eight characters
  • No embedded blanks are allowed with the default. For example, data containing ‘Dave Derry’ as a name will take ‘Dave’ as 1 column and ‘Derry’ as another.
  • Mismatched columns if there’s missing data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Briefly describe column input

A

Requires the column location of the variable values to be known and specified in the input statement.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Briefly describe column input

A

Requires the column location of the variable values to be known and specified in the input statement.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Describe formatted input

A
  • This style of INPUT statement allows values to be read using an informat (a template used by SAS to read values)
  • Requires that the column location at which to start
    reading the value and the name of the informat to use are specified
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Write the syntax for formatted input

A

@Value_Start_Position var_name informat_name

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the purpose of using a DATA step cut and paste method?

A

This method has two purposes;
1. Cut and paste data from another program
2. Type the date into the SAS program

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How is the DATA step cut and paste method used?

A
  • Uses datalines/cards statement instead of infile statement (in a different place)
    -The data is pasted or typed into the editor making it unsuitable for large amounts of data
  • This technique can be used with any list, column or formatted input methods.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is delimiter-sensitive data (DSD)?

A

DSD are files that do not come separated by blank spaces, instead, they use another character in between values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Arrange the
following SAS keywords in the correct order for reading an external file.

INPUT
RUN
INFILE
DATA

Match the SAS keywords with the correct definition:

a) Specifies the name of the text file that the program is to read;
b) Completes the DATA step processing;
c) Starts the DATA step processing and names the output table;
d) Defines the variable names and types.

A
  1. DATA: Starts the DATA step processing and names the output table
  2. INFILE: Specifies the name of the text file that the program needs to read
  3. INPUT: Defines the variable names and type
  4. RUN: Completes the DATA step processing
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Briefly explain the Import procedure

A
  • Proc import converts external data such as space, tab,comma delimited files, and database files (e.g excel spreadsheets) into SAS data sets
  • Provides a simple syntax whilst writing and running the DATA step code in the background
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is the basic syntax for a proc import procedure?

A

Proc import datafile=”filename/fileref
out=sas-table-name
<DBMS=Identifier>

<Replace>;
run;
</Replace>

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is the basic syntax for a proc export procedure?

A

Proc import data=sas-table-name
outfile= “filename”/fileref
<DBMS=Identifier>

<Replace>;
run;
</Replace>

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How many variables does a SAS program have?

A

2.
a. Character (Must be enclosed in quotation marks)
b. Numeric

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What are numeric expressions?

A
  • Mathematical expressions can be constructed in the SAS language various signs such e.g A=c+d, - * / etc
  • Expressions within parentheses are evaluated prior to expressions outside of parentheses.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What are numeric expressions?

A
  • Mathematical expressions can be constructed in the SAS language various signs such e.g A=c+d, - * / etc
  • Expressions
    within parentheses are evaluated prior to expressions outside of parentheses.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

describe The LENGTH statement.

A
  • The LENGTH statement allows the programmer to control how the new variable will be
    created
  • As a general rule, LENGTH statements should always be placed at the beginning of the DATA step
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

In which case would you use a Set statement over infile and input statements?

A

The DATA step is used to manipulate data. The source data can either be external ‘Non SAS’, or an existing SAS table
-If the source file used is external, Infile and Input statements are used
- If the source data is an existing SAS table then a SET statement is used instead:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is a Set statement?

A
  • Through implied DATA step looping, a SET statement reads all observations in a SAS data set unless options are used to dictate otherwise.
  • By default, all variables are read and their properties are as defined in the source data set.
  • The SET statement reads a observation or row from a SAS table each time it is executed
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What is Conditional processing? Give 3 conditional statements.

A

Conditional processing allows the programmer to control statements based on values found in the data.

  1. IF expression; Subsets the number of observations on the output table.
  2. IF expression THEN action; Performs programming
    statements when the condition in the expression is met.
  3. IF expression THEN action;
    ELSE action; Performs programming statements when the condition in the expression is met and also
    when it is not met
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

What is a “Subsetting IF” statement?

A

The concept is that, if the observation meets the condition of the ‘IF’ statement, it is
allowed into the output data:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What is an “IF-Then” statement?

A

‘IF – THEN’ statements are used to evaluate a condition and execute a SAS statement if the condition is true

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What is an “IF-Then” and “Else” statement?

A

IF – THEN and ELSE statements are used to evaluate a condition and execute one statement if the condition is true, but execute a different statement if the condition is false.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

What is an “IF-The-DO” statement?

A

Use IF – THEN – DO to conditionally execute multiple statements:
*Must start with Do block and and finish with end;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

What does the output statement in a DATA step do?

A
  • The OUTPUT statement is used to write an observation to a specific table
  • Once a programmer makes use of the OUTPUT statement within a DATA step,the implicit OUTPUT associated with the RUN is disabled.
  • Therefore if the OUTPUT
    statement is used, then whenever it is necessary to output data to a data set, this must be explicitly done with further OUTPUT statements
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Explain the Compilation phase

A
  1. Initial step where SAS scans the program code to identify any syntax errors or issues.
  2. Determines the input and output files to be used in the program.
  3. Sets up the LPDV which acts as a transient area of memory for SAS holding the variables and values as SAS reads and processes the data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

Explain the Execution phase

A
  1. In this phase, data is read into the LPDV one observation at a time. Before each read, missing values are initialised and the iteration variable N increases by 1.
  2. Additional programming statements are executed to manipulate the data and create new variables.
  3. The LPDV is then written to the output file.
  4. The execution phase loops back to the beginning and repeats until all observations have been processed.
  5. The execution phase terminates when the last observation has been processed
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

What is the LPDV?

A
  • Logical Program Data Vector (LPDV) is a temporary storage area used during data processing.
  • It holds the variables and their corresponding values as SAS reads and manipulates data.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

How are dataset options applied?

A
  • Data set options in SAS are used to modify the behavior and characteristics of the resulting SAS dataset, such as storage format and sorting requirements.
  • They are applied after the name of a SAS data set and they must be specified in brackets:

SAS-data-set (option-name=……)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

What option is used in a PROC Sort step to remove duplicate values of the BY variable(s) ?

A
  • NODUPKEY Removes rows with duplicate BY variable values. The check for duplicates is only made against the columns that are listed on the BY statement.
    *DUPOUT= When sorting a data set using the NODUPKEY option, the DUPOUT= option can be specified together with a data set name, that will be used to store the duplicate observations
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Write code to create a SAS format that could be used to display peoples ages (held as numeric integers variables) as either ‘Young’ or ‘Old’ (‘Young’ people are those under 30 and everyone else is ‘Old’)

A

OPTIONS FMTSEARCH = (AMADEUS);
Data Datasetname;
set datasetname;
proc format;
value agefmt
low -<30=”young
30-high=”old”
run;

format age agefmt;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

Explain the following concepts :

a) SAS Format
b) SAS Informat

A
  1. SAS format: A mechanism that can be used to write out data values, in a different form from the way in which they are actually stored e.g. date format or char formats. acts as a mask to make a variable look different without changing it’s format
  2. SAS informat : A rule or instruction that tells SAS how to read/interpret data values
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

After this statement is run, the NEWDATE variable is held as a SAS date. What date will it represent ?

newdate = INTNX( ‘month’ , ’10JAN2016’d , 1 ) ;

A
  • Newdate is 1 February 2016.

If you wanted the output to be exactly 10 January 2016 you’d need to add “S” after 1.

newdate = INTNX(‘month’, ‘10JAN2016’d, 1, ‘S’);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

True/False: Data sets referenced in a SET statements are input to the step

A

True

Data sets referenced in a SET statement are used as input to the data step in SAS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

True/False: Data sets referenced in a DATA step are output to the step

A

False.

  • Data sets referenced in a DATA step are not output to the step by default.
  • In a DATA step, you can manipulate and transform data, but to output the modified data to a new data set, you need to use an explicit OUTPUT statement or create a new data set using a subsequent DATA statement with the desired output data set name.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

When should you use a KEEP option?

A

Use the KEEP option after the name of a SAS data set in order to specify the variables that we need to load into memory or include in the output data set

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

When should you use a DROP option?

A

Use the DROP option after the name of a SAS data set in order to prevent some variables from being loaded into memory and included in the output data set

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q

When should you use a RENAME option?

A

Use the RENAME option after the name of a SAS data set in order to change a variable name
* RENAME option can be used to rename multiple variables with a hyphen between the first and last variable name. The variables must have a numeric suffix.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

When should you use a RENAME option?

A

Use this option to limit the number of observations that need to be read for processing, or to limit the number of observations that are to be written to an output table.
*The ‘where’ expression to evaluate must be enclosed in brackets

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

What is a KEEP statement?

A

The KEEP statement specifies a list of variables to store in all of the output tables listed in the DATA statement
* Does not limit the variables that will be loaded on the LPDV;
* Does not require an equal (=) symbol. Simply list the names of the variables to keep.

47
Q

What is a DROP statement?

A

The DROP statement specifies a list of variables to ignore when writing to the output tables.
* Does not limit the variables that will be loaded on the LPDV;
* Does not require an equal (=) symbol. Simply list the names of the variables to keep.

48
Q

What is a RENAME statement?

A

The RENAME statement allows changing the variable name on the output data sets.
* If variables are used in the program they will still have their original names;
* Does not require an equal (=) symbol after RENAME or the extra set of brackets
needed by the option. Simply specify: variable name = new name.

49
Q

What is a WHERE statement?

A

The WHERE statement behaves differently other statements.
*The WHERE statement applies to the input tables and not the output data sets created.
*Can only be used when reading data from a SAS data set;

50
Q

Briefly define functions.

A

A SAS function is a routine that returns a value resulting from zero or more arguments.

51
Q

When would you use a ROUND() function?

A

This function is useful for rounding numeric values to a nearest round-off unit.

52
Q

Match the following functions with their functions;

  1. CEIL
  2. FLOOR
  3. INT Functions

a. Returns the largest integer, less than or equal to the argument
b. Returns the integer portion of the argument
c. Returns the smallest integer, greater than or equal to the argument

A
  • CEIL : returns the smallest integer, greater than or equal to the argument
  • FLOOR: returns the largest integer, less than or equal to the argument
  • INT: returns the integer portion of the argument

E.g: Ceil=ceil(var_name);

53
Q

Name 3 statistical functions?.

A

Mean, Max and STD

54
Q

When would you use the SUM() Function?

A

*The SUM function can be used to total the values across a row.
*The SUM function handles the missing value in its calculation. All
functions behave in this way.

55
Q

Fill in the blanks for the following sentences .

a. ________ operator enables two character strings to be concatenated together.
Used as: exclamation marks (!!), two vertical bars (||) or two broken vertical bars (¦¦)
b. _______ function removes trailing blanks from a character value
c. The _____ and _____ functions give the ability to left or right align a character string

A

a. Concatenation Operator
b. Trim
c. LEFT and RIGHT

56
Q

When would you use the Cat family of functions?

A

An alternative to using the concatenation operator used to concatenate two or more character strings.

*cat, catt and cats have similar syntax :
e.g CATT(argument-1<,…,argument-n>)
*Cats slightly differs in syntax:

e.g. CATX(separator,argument-1
<,…,argument-n>)

57
Q

How would you use SUBSTR function?

A

SUBSTR can extract or replace segments from character strings and has the following
syntax:

*To extract:
newvar = SUBSTR( column , start_position, length );

*To replace:
SUBSTR( column , start_position, length ) = repvalue;

58
Q

What is the SCAN function?

A

The SCAN Function also extracts words from character strings, but is used when the starting position is not known.
* The SCAN Function has the following syntax:
newvar = SCAN( column , number_of_word, ‘optional_delimiters’ );
*The default delimiters are: space . < ( + & ! $ * ) ; ^ - / , % |

59
Q

What is an INDEX function and how can it be used?

A

*The INDEX function searches a character string for the location of a specified string.
*The INDEX function has the following syntax:
INDEX( column , ‘string to search for’ )
*The INDEX function returns the position number at which a specified string first occurs. If
the specified string is not found then a zero is returned. Therefore it is possible to use the INDEX function in a subsetting WHERE statement to return only observations containing
the specified string:

WHERE INDEX (column, ‘string to search for’) > 0;
OR
WHERE index(upcase(name),’STE’);

60
Q

Why might you face issues with the code below?;

data work.accum1;
set amadeus.overtime;
total_overtime=total_overtime +overtime;
run;

proc print data=work.accum1;
run;

How can you improve this code

A
  1. A variable called TOTAL_OVERTIME is created, but because this is a new variable its value will be initialised to missing in the LPDV during compilation phase
  2. Its value will then
    be re-initialised to missing at the beginning of each iteration of the DATA step during the execution phase
  3. The missing value will result in further missing values being generated for the remainder of the observations.

Code can be improved by adding RETAIN and SUM statements i.e

data work.accum1;
set amadeus.overtime;
total_overtime +overtime;
run;

proc print data=work.accum1;
run;
* Sum statements are beneficial to use because:
*They contain an implied retain, i.e. they will automatically retain values for the new
variable in the LDPV from one iteration of the DATA step to the next;
* They initialise the new variable to 0 during the compilation phase;
* They ignore missing values

61
Q

The following data step calculates a 5% pay raise for high grade employees. Amend the
code below using appropriate DATA step options or statements such that:
* Only the STAFFNO, GRADE, DEPT and SALARY columns and rows where the
individuals SALARY is less than 30000 are read in from the DEMOG table:
* The SALARY column is renamed to WAGE. Do this in such a way that no
changes are required to the IF or ELSE statements in the DATA step:
* The GRADE column is not included on the DEPT_EARNERS output table;

data dept_earners;
set amadeus.demog;
if upcase(grade)=”HIGH” then newsald=salary*1.05;
else newsal=salary;
run

A

data dept_earners(drop=grade);
set amadeus.demog (keep=staffno grade dept salary);
where salary<30000;
rename salary=grade;
if upcase(grade)=”HIGH” then newsal=salary*1.05;
else newsal=salary;

run;

62
Q

What are labels?

A
  • Labels are effectively column descriptions and are typically used when creating reports, so
    that a more meaningful label replaces the column name.
63
Q

One or more column labels can be applied using a LABEL statement. The following
Give an example to show the syntax assuming you’re changing the labels for amds.demog dataset :

A

proc print data=Amds.demog label;
var staffno gender height weight age;
label staffno=”employee number”
gender “sex”
height=”height in inches”
weight=”weight in lls”
age=”Age”;
run;

64
Q

Where can labels be assigned?

A

Labels can be applied both in the DATA or PROC step.

  • DATA step: subsequent procedures will automatically use the labels that it finds in the table header.
  • A LABEL statement is
    required in every PROC step that uses the table – i.e. labels assigned through a procedure stay local to that procedure.
  • Most procedures will automatically pick up labels from a data set, however Proc PRINT needs to be told to look for labels. Without the LABEL option, no labels will be used.
64
Q
A
65
Q
A
65
Q

How can you view table properties in SAS Enterprise Guide?

A

Select the library that contains the table of interest and then use a ‘right mouse click’ to activate the
pop up menu. Select the ‘Properties’ option and view the Column properties.

66
Q

Where can Formats and Informats be applied?

A

Formats and Informats can be applied in many parts of the SAS System. They are not restricted to SAS programs.
* They can be assigned in the DATA step, where they are held in the header part of the SAS table. Proc’s will automatically use the formats but In order to use the output table, the Format must always be available.

  • Formats may also be assigned in the PROC step, where they are used for that procedure only. The format does not need to be available in order to use the table but A FORMAT statement must be used in every PROC step.
66
Q

what does PPROC CONTENTS do?

A

The CONTENTS procedure may be used to view a list of columns and their associated formats, in addition to other table attributes

67
Q

Convert a Character column to a numeric column using the INPUT function and a numeric column to a character column using the PUT function.

A
  1. Char -> Num using INPUT function

data work.num_staff;
set amds.demog;
staff_id=input(staffno, 8.);
run;
2. Num ->Char using PUT function
proc sort data=work.num_staff;
by staff_id;
run;

68
Q

What is PROC FORMAT and how is it used?

A

Proc FORMAT is used to build a customised format.

  1. Creating a Character Format:
    * Their names begin with a dollar sign and the values to the left of the equals sign are quoted.
  2. Creating a Numeric Format:
    * NOT prefixed with the dollar sign and the values to the
    left of the equal sign are NOT quoted.
  • format name is followed by a dot (period)*
69
Q

General notes on creating formats.

A
  • Format names conform to SAS naming conventions, but additionally cannot end with a
    number as the last character
  • Formats of character values must start with the dollar
    currency symbol.
  • LOW, HIGH and OTHER are keywords which mean ‘the lowest value’, ‘the highest. value’
    and ‘all the rest’ respectively.
  • Ranges must not overlap:
    i.e 0-<5=’label1’
    5-<9=’label2’
70
Q

How and would you want to store a format permanently?

A
  • Formats can be stored permanently so that they only have to be defined once, by using
    the LIB= option on the Proc FORMAT statement:
  • When any of the formats are referenced, the SAS System must be told to look in this location. This is done using the FMTSEARCH option:

OPTIONS FMTSEARCH = (lib);

71
Q

Write an example PROC FORMAT code for permanently storing one character format and a numeric format

A

proc format lib=Amds;
value $gender “M”=”Male”
“F”=”Female”;

value  pay  low-<10000     = "Under 10k"
                  10000-<20000 = "Between 10k and 20k"
                  20000-high       ="Above 20k"; run;
72
Q

Why might you sort data?

A

Ordering the data stored in a SAS table is a frequent requirement within SAS programs:

  • Often, before combining data from two or more tables, it is necessary to sort the data;
  • When merging data from two or more tables by key columns, pre-sorting the data by the key variables is mandatory;
  • When using a BY statement within any procedure, the assumption is made that the data is already sorted in the required order. If this is not the case, the procedure will fail.
73
Q

Briefly explain PROC SORT with a example code

A

Proc SORT is a simple procedure that can be used for sorting data stored in a SAS table.

proc sort data=amds.carhire;
by carkey;
run;

proc print data=amds.carhire;
run;

74
Q

Can you output sorted results to a different table, if so, how?

A

Proc SORT allows a new table to be created containing the output from the procedure, whilst leaving the original table intact. This is done using the OUT= option as follows:

proc sort data=amds.carhire; out=work.cars_by_carkey;
by carkey;
run;

75
Q

Where can Formats and Informats be applied?

A

Formats and Informats can be applied in many parts of the SAS System. They are not restricted to SAS programs.
* They can be assigned in the DATA step, where they are held in the header part of the SAS table. Proc’s will automatically use the formats but In order to use the output table, the Format must always be available.

  • Formats may also be assigned in the PROC step, where they are used for that procedure only. The format does not need to be available in order to use the table but A FORMAT statement must be used in every PROC step.
75
Q

PROC SORT a table by multiple columns (dept, grade and age) using demo data from amds library.

A

proc sort data=amds.demo;
out=demo_sort;
by dept grade age ;
run;

proc print data=demo_sort;
var dept grade age;
run;

76
Q

Explain FIRST. and LAST. variables

A
  • When a DATA step containing SET and BY statements is compiled, two automatic columns called FIRST. and LAST. followed by the name of the variable in the BY statement are added to the Logical Program Data Vector.
  • The FIRST. and LAST. variables will automatically get dropped from the output table.
  • These variables are very useful, as they flag the beginning and end of each ‘by’ group
77
Q

Give example code using FIRST and LAST in BY group processing

A

proc sort data= amds.demo
out=work.sorted;
by status;
run;

data work.status (keep=status count);
set work.sorted;
by status;
if first.status=1 then count=0;
count+1;
if last.status then output;
run

proc pint data=work.status;
run;

78
Q

What is “Appending” in SAS?

A

The word ‘append’ means to supplement, when used in the context of SAS tables, it describes the process of connecting two or more tables, one on top of the other. The word ‘concatenate’ is also used in the same context.

The SET statement can be used to append two or more SAS tables, by
listing the tables to be appended on the SET statement.

data= work.once_wed;
set work.seperated
work.divorced;
run;

79
Q

What are offset column names?

A

If the column names of the tables being combined differ, then the resulting data is offset. Because of a differing column name, the resulting table is offset.

80
Q

Using an example code, explain PROC APPEND.

A
  • The APPEND procedure is a simple and very efficient way of adding observations from one table to the bottom of another table.
  • It is more efficient than using a SET statement within a DATA step, because it adds observations to the bottom of the ‘base’ table, without first having to read each observation in the base table

Proc append
base
=work.separatted
data =work.divorced
run;

  • The two tables to be combined have the same structure in terms of column names and column attributes.
81
Q

Where can Formats and Informats be applied?

A

Formats and Informats can be applied in many parts of the SAS System. They are not restricted to SAS programs.
* They can be assigned in the DATA step, where they are held in the header part of the SAS table. Proc’s will automatically use the formats but In order to use the output table, the Format must always be available.

  • Formats may also be assigned in the PROC step, where they are used for that procedure only. The format does not need to be available in order to use the table but A FORMAT statement must be used in every PROC step.
81
Q

What does the FORCE option do?

A
  • If it is necessary to maintain the different column names in the combined table, then the restriction can be overridden by using the FORCE option.

Columns that were present in the
DATA table, but were not present in
the original BASE table, are dropped
and missing values result where
there are mismatched columns

82
Q

What is Merging SAS tables?

A
  • Merging SAS tables means putting them together sideways. This means joining information from two or more tables and putting the rows together.
83
Q

What is a Match Merge? with code example

A
  • Merging tables requires that the rows have the same key field(s) and that both tables are sorted by the key field(s).

proc sort
data
=amds.demo out=work.demo;
by staffno;
run;

proc sort
data
=amds.names out=work.names;
by staffno;
run;

*data work.alldata;
merge work.demo work.names
by staffo;
run;

proc print data=work.alldata;
var staffno name grade salary;
run;

84
Q

What is the IN= Data set Option

A
  • Used on the MERGE or set statements to determine which table an observation comes from.
  • The IN= option is placed after the name of the table and assigns a value of 1 or 0 to column whose name must be supplied by the programmer. The column acts as a flag variable and may be referenced at any point within the DATA step.
85
Q

Typically, programs might only include the observation if it includes column values from both tables. How can In=Option be utilised to achieve this?

A
  • The Subsetting IF statements control which observations will be included in the output
    data set.

data work.alldata (keep=staffno name gender);;
merge work.demo(
in=in_demo)
work.names(
in*=in_names);
by staffo;
if in_demog and in_names

run;

86
Q

How are dates and times stored in SAS?

A

Dates and Times are handled in the SAS System by storing them as numbers. This makes comparisons, additions and subtractions easier.

SAS dates, times and datetimes are stored as simple integer numbers against a common reference point
* DATES are stored as the number of days that have elapsed since the first of January 1960.
* TIMES are stored as the number of seconds that have elapsed since midnight.
* DATETIMES are stored as the number of seconds that have elapsed since midnight of the first of January 1960.

87
Q

Explain date and time informats/formats.

A

INFORMATS:
*To overcome the problem of reading date values into the SAS System, a range of date and time informats are available. These can be used to read a variety of date / time notations, so that the value created in the data table is a real SAS date / time. e.g. date9, time5 datetime18

FORMATS
* To display the informatted values in a meaningful form, appropriate Formats need to be applied: e.g. weekdate29., hhmm10.2., datetime22.

88
Q

List Functions for retrieving current date and time values

A
  • DATE() Returns the current date from the system clock
  • TODAY() Returns the current date from the system clock
  • TIME() Returns the current time from the system clock
  • DATETIME() Returns the current date and time from the system clock
89
Q

Where can Formats and Informats be applied?

A

Formats and Informats can be applied in many parts of the SAS System. They are not restricted to SAS programs.
* They can be assigned in the DATA step, where they are held in the header part of the SAS table. Proc’s will automatically use the formats but In order to use the output table, the Format must always be available.

  • Formats may also be assigned in the PROC step, where they are used for that procedure only. The format does not need to be available in order to use the table but A FORMAT statement must be used in every PROC step.
89
Q

What are interval functions?

A

INTCK Function:

INTCK (“interval”, from, to)

  • INTCK determines the number of time intervals between the from and to date, time or datetime. The interval type may be YEAR, QTR, MONTH, WEEK, DAY
  • Intervals are calculated from fixed time points and partial intervals are not counted

INTNX Function:

INTNX (“interval”, from, increment, “alignment”)

  • INTNX determines the date corresponding to the number of intervals from a start
    date, time or datetime value.
  • The alignment argument controls the position of SAS dates within the interval. Values can be BEGINNING or B (default), MIDDLE or M, END or E, SAME or S
90
Q
A
  • The Proc FREQ procedure is used to generate one, two and multi-way tables of counts for both numeric and character columns.
  • Proc FREQ can be used to provide statistical information, allowing
    relationships between columns to be investigated.

The basic syntax of the procedure is as follows:

PROC FREQ data = table_name;
TABLES variables </options>;
BY variables;
RUN:

91
Q

What are Two-Way and Three-Way Frequency Tables

A
  • Two-way tables are specified with an asterisk between the column names;
    proc freq data=Amds.demog;
    tables genderstatus/list;
    tables gender
    status/crosslist;
    run;
  • Outputting three-way tables and higher is usually restricted for use with columns that only have a few levels, otherwise the output extends over many pages and is unmanageable

proc freq data=Amds.demog;
tables gendercarsstatus;
run;

92
Q

Briefly explain PROC MEANS

A
  • Proc MEANS analyses SAS tables and provides a number of descriptive summary statistics on numeric columns. The procedure can also be used to analyse sub-groups within the data and can optionally create one or more output tables.

The basic syntax of the procedure is as follows:

PROC MEANS data = table_name <options>;
VAR variables;
BY variables;
OUTPUT out = output_statistic_list;
RUN;</options>

93
Q

Where can Formats and Informats be applied?

A

Formats and Informats can be applied in many parts of the SAS System. They are not restricted to SAS programs.
* They can be assigned in the DATA step, where they are held in the header part of the SAS table. Proc’s will automatically use the formats but In order to use the output table, the Format must always be available.

  • Formats may also be assigned in the PROC step, where they are used for that procedure only. The format does not need to be available in order to use the table but A FORMAT statement must be used in every PROC step.
93
Q

How do you specify columns with PROC MEANS?

A

Usually, Proc MEANS is used to obtain certain statistics (5 summary stats by default) for a specific list of numeric columns and the number of decimal places is restricted. e.g.;

proc means data=amds.demo n mix max mean maxdec=3;
var age salary overtime;
run;

94
Q

How can PROC MEANS be used for Sub Group Processing?

A

As with other procedures, sub-group processing can be performed with the BY statement.

proc sort data=amds.demo out=work.demo
by gender;
run;

proc means data=amds.demo n mix max mean maxdec=3;
var age salary overtime;
by gender;

95
Q

Explain Proc UNIVARIATE.

A

Proc UNIVARIATE provides detailed information on the distribution of numeric columns.

The basic syntax is as follows:
PROC UNIVARIATE data =
table_name <options>;
VAR column-list;
RUN;</options>

96
Q

When using PROC UNIVARIATE what is the function of an ID statement?

A

*An ID statement can be used to identify the values of the extreme observations.

For example, with the demographic data, there may be a requirement to see the age of the people who have the highest and lowest salaries:

Proc univariate data=amds.demo;
var salary;
id age;
run;

97
Q

How can a programmer control SAS system?

A

SAS System Options
* SAS System Options are settings that influence the SAS session and determine the way in which different parts of the system operate.
* Unlike data set options, system options apply to the SAS session and remain in effect unless subsequently changed.

98
Q

Where can Formats and Informats be applied?

A

Formats and Informats can be applied in many parts of the SAS System. They are not restricted to SAS programs.
* They can be assigned in the DATA step, where they are held in the header part of the SAS table. Proc’s will automatically use the formats but In order to use the output table, the Format must always be available.

  • Formats may also be assigned in the PROC step, where they are used for that procedure only. The format does not need to be available in order to use the table but A FORMAT statement must be used in every PROC step.
98
Q

Give examples of option statements

A
  • To change a system option within a program, use the OPTIONS statement. The syntax of an options statement is:

OPTIONS option_name-1<=option_value1> <option_name-n<=option_value2»

  • The OBS Option
    The OBS system option sets the observation number that will be the last SAS will process

OPTIONS obs=10
/proc code/
OPTIONS obs=max

  • The MISSING Option
    This option is just for display purposes and does not re-code the data in any way.

OPTIONS missing=0
/proc code/
OPTIONS missing=.;

  • Titles and Footnotes
    Headers and Footers are added to a report using TITLE and FOOTNOTE global statements respectively:
99
Q

How can you control output destinations?

A

Tools -> options->preferences, and select the Results General section:

100
Q

How can you remove the observation column?

A

To remove the Obs column specify the NOOBS option on the Proc PRINT statement;

proc print data=amds.demo NOOBS;
var gender grade salary;
run;

101
Q

What type of procedure is PROC TABULATE?

A
  • Proc TABULATE is a procedure for producing tables in the SAS System. The procedure can be used to generate detailed cross tabulations of data, with a range of statistics to choose from.
  • The procedure uses a minimum of four statements;
    1. PROC TABULATE data = SAS dataset - Defines the SAS data file name.
    2. CLASS variables; Must be numeric, classify the data into rows and columns in the table
    VAR variables; *Num/Char, those which some sort of analysis is performed, and the values will form the contents of the cells in the table:
    3. TABLE all variables listed on the CLASS and VAR statements; Write a TABLE statement to create a tabulation report
    4. Run;
102
Q

Give the function of the operators below;
The SPACE operator
The COMMA operator
The ASTERISK operator

A

The SPACE operator
* This operator gives a CONCATENATION between the columns, making one table from two

The COMMA operator
*The comma operator between
column names gives a split. The left variable provides the rows and right variable provides the columns.

The ASTERISK operator
* This operator results in a nesting of columns

104
Q

Where can Formats and Informats be applied?

A

Formats and Informats can be applied in many parts of the SAS System. They are not restricted to SAS programs.
* They can be assigned in the DATA step, where they are held in the header part of the SAS table. Proc’s will automatically use the formats but In order to use the output table, the Format must always be available.

  • Formats may also be assigned in the PROC step, where they are used for that procedure only. The format does not need to be available in order to use the table but A FORMAT statement must be used in every PROC step.
109
Q

Where can Formats and Informats be applied?

A

Formats and Informats can be applied in many parts of the SAS System. They are not restricted to SAS programs.
* They can be assigned in the DATA step, where they are held in the header part of the SAS table. Proc’s will automatically use the formats but In order to use the output table, the Format must always be available.

  • Formats may also be assigned in the PROC step, where they are used for that procedure only. The format does not need to be available in order to use the table but A FORMAT statement must be used in every PROC step.
110
Q

Identify the statements that are True:
a) The DATA step contains three phases: compilation, execution and completion.
b) The DATA step processes one observation at a time.
c) An observation is written out to the output data set when the DATA statement is reached.
d) All variables available in the LPDV are written to the output data set, including the N and ERROR.
e) At the beginning of each DATA step iteration, SAS will initialise to missing variables that are created in that DATA step.

For the ones that are False, what would be the correct statement?

A

The following statements are True:
b. The DATA step processes one observation at a time
e. At the beginning of each DATA step iteration, SAS will initialise to missing variables that are created in that DATA step. – N.B.But NOT variables created with a RETAIN or SUM statement.

Below are the corrected statements:
a. The DATA step contains two phases: compilation and execution.
c. An observation is written out to the output data set when a RUN or OUTPUT statement is reached.
d. The N and ERROR automatic variables do not get written to the data set

111
Q

Select the statements that are True:
a. Formats change the underlying data values in a SAS data set.
b. Informats are used to read data values into a SAS variable using a template.
c. Formats and Labels applied inside a Procedure are temporary and are only active for that procedure.
d. Only one format can be created inside a Proc FORMAT step.
e. Character formats do not need to include a dot as they do not have a decimal part.

A

The following are True statements:
b. Informats are used to read data values into a SAS variable using a template.
c. Formats and Labels applied inside a Procedure are temporary and are only active for that procedure