SAS: Base Programming Flashcards

1
Q

Lesson 1: Getting Started with SAS

SAS Modes

A

Interactive

Batch

Noninteractive

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Lesson 1: Getting Started with SAS

Interactive Mode

A

Code runs immediately

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Lesson 1: Getting Started with SAS

Batch Mode

A

SAS runs the program in the background by submitting it to the operating environment for batch execution.

Program runs when the computer has resources available.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Lesson 1: Getting Started with SAS

Noninteractive Mode

A

Store store SAS programming statements in a file.

When you issue a SAS command referencing the file, SAS opens, execute the code immediately, and stores any log messages and output in files.

Then SAS closes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Lesson 1: Getting Started with SAS

Types of steps in a SAS program?

A

Data or Proc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Lesson 1: Getting Started with SAS

Data Step

A

A DATA step typically takes data, processes it, and creates a SAS data set, which is data in a form that SAS understands.

A DATA step can create new variables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Lesson 1: Getting Started with SAS

Proc Step

A

A PROC step typically processes a SAS data set.

PROC steps can generate reports and graphs, manage data, and sort data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Lesson 1: Getting Started with SAS

Data read by SAS

A

Oracle files

SAS data sets

Raw data files

Excel files

Other types of files

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Lesson 1: Getting Started with SAS

Tasks useful in interactive mode

A

Debugging programs

Editing programs

Testing programs

Writing programs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Lesson 2: Navigating and Using the SAS Interface

Three main programming windows

A

Editor window

Log window

Output window

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Lesson 2: Navigating and Using the SAS Interface

True statements about SAS output

A

Can create both listing and HTML output

Manage all SAS output in results window

Listing and HTML output are seperate in the results window

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Lesson 2: Navigating and Using the SAS Interface

Editor Window

A

Also known as Enhanced Editor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Lesson 2: Navigating and Using the SAS Interface

Program Editor Window

A

Default code editor in operating environments other than Windows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Lesson 2: Navigating and Using the SAS Interface

Log Window

A

Provides messages about your program

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Lesson 2: Navigating and Using the SAS Interface

Output window

A

SAS continually adds output to this window

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Lesson 2: Navigating and Using the SAS Interface

Results vs Output Windows

A

Results window displays a tree view of output that is created during a SAS session.

Output window is available in all SAS environment.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Lesson 2: Navigating and Using the SAS Interface

Are session result preferences stored in SAS program.

A

No

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Lesson 2: Navigating and Using the SAS Interface

What can be done in Explorer window?

A

Move, copy and delete SAS files

View contents of SAS libraries

Open and create new SAS files

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Lesson 3: Working with SAS Code

Diagnosing and correcting unbalanced quotation marks

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Lesson 3: Working with SAS Code

Diagnosing and correcting syntax errors

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Lesson 3: Working with SAS Code

Syntax Error

A

When programming statements don’t conform to the rules of the SAS language at compile time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Lesson 3: Working with SAS Code

Semantic Error

A

When the language element is correct, but the element might not be valid for a particular usage at compile time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Lesson 3: Working with SAS Code

Execution Time Error

A

When SAS attempts to execute a program and execution fails at execution time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Lesson 3: Working with SAS Code

Data Error

A

When data values are invalid at execution time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
**Lesson 3: Working with SAS Code** Macrorelated Error
When you use the macro facility incorrectly
26
**Lesson 3: Working with SAS Code** How SAS compiles and executes
SAS compiles and executes each DATA or PROC step independently based on step boundaries, such as a RUN statement, a QUIT statement, or the beginning of a new DATA or PROC step.
27
**Lesson 3: Working with SAS Code** Block Comment
Can appear anywhere a single blank can appear Cancontain semicolons or unmatched quotation marks Denoted with /\* COMMENT \*/
28
**Lesson 3: Working with SAS Code** Step Boundaries
Run Quit Data Statement Proc Statement
29
**Lesson 3: Working with SAS Code** Long running message at top of window indicating that a step is running
Missing run statement or unbalanced quotation marks
30
**Lesson 3: Working with SAS Code** A log message indicating that an option seems to be invalid
Missing Semicolon
31
**Lesson 3: Working with SAS Code** Log Message indicating that a word wasn't recognized
Spelling Error
32
**Lesson 4: Working with SAS Libraries and SAS Data Sets** SAS Library
Collection of one or more SAS files that are recognized by SAS and that are referenced and stored as a unit
33
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Libref
Logical name to reference a SAS library
34
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Libraries created at SAS session start
Work: Temporary library Sasuser: Permanent library
35
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Naming a SAS data set
libref. data-set-name work. data library\_name.data-set-name
36
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Libname
37
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Assigning Libref
In an interactive SAS session, a libref that you assign with a LIBNAME statement remains assigned until you cancel or change the libref or until you end your SAS session. After you end your SAS session, the contents of a permanent library still exist in their physical location. However, each time you start a new SAS session, you must resubmit a LIBNAME statement to reassign the libref to that permanent library.
38
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Proc Contents
View the contents of a SAS library **PROC CONTENTS DATA=libref.\_ALL\_ NODS;** **RUN; **
39
**Lesson 4: Working with SAS Libraries and SAS Data Sets** \_ALL\_ in the PROC CONTENTS
Displays a list of all the SAS files that are in the specified SAS library.
40
**Lesson 4: Working with SAS Libraries and SAS Data Sets** NODS in PROC CONTENTS
Default PROC CONTENTS report includes the descriptor portion of each data set in the SAS library. NODS option to suppress the descriptor portions in the report.
41
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Variable
Column in SAS data set
42
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Observation
Row in SAS data set
43
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Value vs Descriptor
The data portion of a SAS data set contains the data values. The descriptor portion of a SAS data set contains information about the attributes of the data set and information about each variable. The information in the descriptor portion is called metadata.
44
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Viewing the Descriptor Portion of SAS Data Sets
You can use a PROC CONTENTS step to view the descriptor portion of a data set. When you specify a single data set in the PROC CONTENTS step and do not include the NODS option, the step creates a report of the descriptor portion for the specified data set. **PROC CONTENTS DATA=SAS-data-set;** **RUN;**
45
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Character Variable
Default Lenth of 1 to 32,767 bytes Missing values are represented by blanks
46
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Numeric Variable
Default length of 8 bytes Missing values represented by '.'
47
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Var Statement
Control which variables SAS includes in the Proc Print report The VAR statement also controls the order in which SAS displays the variables in the report. PROC PRINT DATA=SAS-data-set; VAR variable(s); RUN;
48
**Lesson 4: Working with SAS Libraries and SAS Data Sets** NOOBS
BY default, SAS includes the observation numbers in the report. Suppress the Obs column by using the NOOBS option in the PROC PRINT statement. **PROC PRINT DATA=SAS-data-set NOOBS;** **VAR variable(s);** **RUN;**
49
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Viewing the Data Portion of SAS Data Sets
You can use the PRINT procedure to display the data portion of a SAS data set. Missing values are represented by blanks for character variables and by periods for numeric variables. **PROC PRINT DATA=SAS-data-set;** **RUN;**
50
**Lesson 4: Working with SAS Libraries and SAS Data Sets** PROC SORT
Sorting Observations in SAS Data Sets By default, replaces the original data set with the sorted data set. OUT= option to create a new output data set instead of replacing the input data set By default sorts observations in ascending order of the values for the variable or variables listed in the BY statement. You can use the DESCENDING option to sort on the variable or variables in descending order instead. SAS treats missing values as the smallest possible values. PROC SORT DATA=input-SAS-data-set; BY BY-variable(s); RUN;
51
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Defining SAS Libraries for Relational Databases
After a database is associated with a libref, you can use a SAS two-level name to specify any table in the database and then work with that table as you would work with a SAS data set. You use the CLEAR option in the LIBNAME statement to disassociate a libref that was previously assigned. Disassociating a libref disconnects the database engine from the database and closes any resources that are associated with that libref’s connection. **LIBNAME libref engine-name ;** **LIBNAME libref CLEAR;**
52
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Sample Code Assigning a Libref and Viewing the Contents of the Library
libname orion 'my-file-path'; proc contents data=orion.\_all\_nods; run;
53
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Sample Code Creating Work.NewSalesemps
data work.newsalesemps; length First\_Name $ 12 Last\_Name $ 18 Job\_Title $ 25; infile 'my-file-path\newemployees.csv' dlm=','; input First\_Name $ Last\_Name $ Job\_Title $ Salary; run;
54
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Sample Code Displaying the Descriptor Portion of a Data Set
proc contents data=work.newsalesemps; run;
55
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Sample Code Displaying the Data Portion of a Data Set
proc print data=work.newsalesemps noobs; var Last\_Name First\_Name Salary; run;
56
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Sample Code Sorting a Data Set by Two Variables and Displaying the Output Data Set
proc sort data=work.newsalesemps out=work.newsalesemps3; by Job\_Title descending Salary; run; proc print data=work.newsalesemps3; run;
57
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Steps/statements that create printed output
Proc Print Proc Contents
58
**Lesson 4: Working with SAS Libraries and SAS Data Sets** Temporary Data Set
work.data or data when specify a one level data set name SAS assumes the library is work and the data set is temp
59
**Lesson 5: Creating SAS Data Sets** $w.
writes standard character data.
60
**Lesson 5: Creating SAS Data Sets** w.d
writes standard numeric data
61
**Lesson 5: Creating SAS Data Sets** COMMAw.d
writes numeric values with a comma that separates every three digits and a period that separates every decimal fraction. COMMA7.0 = 12,354
62
**Lesson 5: Creating SAS Data Sets** COMMAXw.d
writes numeric values with a period that separates every three digits and a comma that separates the decimal fraction. COMMAX7.0 = 12.354
63
**Lesson 5: Creating SAS Data Sets** DOLLARw.d
writes numeric values with a leading dollar sign, a comma that separates every three digits, and a period that separates the decimal fraction. DOLLAR7.0 = $12,345
64
**Lesson 5: Creating SAS Data Sets** EUROXw.d
writes numeric values with a leading euro symbol (€), a period that separates every three digits, and a comma that separates the decimal fraction.
65
**Lesson 5: Creating SAS Data Sets** MMDDYY6.
010160
66
**Lesson 5: Creating SAS Data Sets** MMDDYY8.
01/01/60
67
**Lesson 5: Creating SAS Data Sets** MMDDYY10.
01/01/1960
68
**Lesson 5: Creating SAS Data Sets** DDMMYY6.
311260
69
**Lesson 5: Creating SAS Data Sets** DDMMYY8.
31/12/60
70
**Lesson 5: Creating SAS Data Sets** DDMMYY10.
31/12/1960
71
**Lesson 5: Creating SAS Data Sets** DATE7.
31DEC59
72
**Lesson 5: Creating SAS Data Sets** DATE9.
31DEC1959
73
**Lesson 5: Creating SAS Data Sets** WORDDATE.
January 1, 1960
74
**Lesson 5: Creating SAS Data Sets** WEEKDATE.
Friday, January 1, 1960
75
**Lesson 5: Creating SAS Data Sets** MONYY7.
JAN1960
76
**Lesson 5: Creating SAS Data Sets** YEAR4.
1960
77
**Lesson 5: Creating SAS Data Sets** Creating A SAS Data Set from a SAS Data Set
A basic DATA step that contains only a SET statement creates an exact copy of the input SAS data set. A DATA step can reference temporary or permanent libraries. Remember that you use a LIBNAME statement to assign a libref to a SAS library. **DATA output-SAS-data-set-name; SET input-SAS-data-set-name; RUN;**
78
**Lesson 5: Creating SAS Data Sets** Phases of DATA Step
1. Compilation Phase: SAS scans the step for syntax errors. Then, SAS creates the program data vector (PDV) and the descriptor portion of the output data set. The PDV contains two temporary variables: \_N\_, which tracks the iterations of the DATA step, and \_ERROR\_, which signals the occurrence of an error that is caused by the data during execution. 2. Execution Phase: SAS reads and processes the input data set, and creates the data portion of the output data set.
79
**Lesson 5: Creating SAS Data Sets** Subsetting Observations in the DATA Step Where Statement
Ccontrols which observations SAS writes to the output data set **WHERE where-expression;**
80
**Lesson 5: Creating SAS Data Sets** Subsetting Variables in the DATA Step Drop + Keep Statements
DROP: Names the variables to exclude from the output data set KEEP: Names the variables to include in the output data set **DROP variable-list;** **KEEP variable-list;**
81
**Lesson 5: Creating SAS Data Sets** Assigning Permanent Labels and Formats
You can assign permanent labels and format to the variables in your output data set to control the way that variables names and values appear in reports. Labels and formats don’t affect the stored names or values; they are stored in the descriptor portion of the data set. You use a LABEL statement to assign a label to a variable. To specify that SAS should display variable labels rather than names in a PROC PRINT report, you use the LABEL option in the PROC PRINT statement. You use a FORMAT statement to assign a format to a variable.
82
**Lesson 5: Creating SAS Data Sets** Sample Code Where Variable
data work.subset1; set orion.sales; where Country='AU' and Job\_Title contains 'Rep'; run;
83
**Lesson 5: Creating SAS Data Sets** Sample Code Keep Statement
data work.subset1; set orion.sales; where Country='AU' and Job\_Title contains 'Rep'; keep First\_Name Last\_Name Salary Job\_Title Hire\_Date; run; proc print data=work.subset1 label; run;
84
**Lesson 5: Creating SAS Data Sets** Sample Code Label Statement
data work.subset1; set orion.sales; where Country='AU' and Job\_Title contains 'Rep'; keep First\_Name Last\_Name Salary Job\_Title Hire\_Date; label Job\_Title='Sales Title' Hire\_Date='Date Hired'; run;
85
**Lesson 5: Creating SAS Data Sets** Sample Code Format Statement
data work.subset1; set orion.sales; where Country='AU' and Job\_Title contains 'Rep'; keep First\_Name Last\_Name Salary Job\_Title Hire\_Date; label Job\_Title='Sales Title' Hire\_Date='Date Hired'; format Salary comma8. Hire\_Date ddmmyy10.; run;
86
**Lesson 5: Creating SAS Data Sets** Sample Code Label in Proc Print
data work.subset1; set orion.sales; where Country='AU' and Job\_Title contains 'Rep'; keep First\_Name Last\_Name Salary Job\_Title Hire\_Date; label Job\_Title='Sales Title' Hire\_Date='Date Hired'; format Salary comma8. Hire\_Date ddmmyy10.; run; proc print data=work.subset1 label; run;
87
**Lesson 5: Creating SAS Data Sets** When does Data Step stop executing?
No more observations to read in the input data set.
88
**Lesson 5: Creating SAS Data Sets** SAS Base Date
January 1, 1960
89
**Lesson 5: Creating SAS Data Sets** PDV Variables
\_N\_, which tracks the iterations of the DATA step \_ERROR\_, which signals the occurrence of an error that is caused by the data during execution If there is an Error = 1
90
**Lesson 5: Creating SAS Data Sets** Operand
varianble name or constant numeric or character
91
**Lesson 5: Creating SAS Data Sets** operator
arithmetic, logical or comparison
92
**Lesson 5: Creating SAS Data Sets** Created in Compilation Phase
descriptor portion of output data set Program Data Vector: (PDV) The \_N\_ and \_ERROR\_ automatic variables
93
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Accessing and Viewing Excel Data in SAS
You can use the SAS/ACCESS LIBNAME statement to assign a libref to a Microsoft Excel workbook. Then, SAS treats each worksheet in the workbook as though it is a SAS data set. You can use the Explorer window or the CONTENTS procedure to view the worksheets, or you can reference a worksheet directly in a DATA or PROC step. In SAS, Excel worksheet names contain a dollar sign. To reference an Excel worksheet directly in a DATA or PROC step, you use a SAS name literal because a valid SAS name cannot contain a dollar sign. **LIBNAME libref 'physical-file-name';**
94
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Using an Excel Worksheet as Input in the DATA Step
You can use an Excel worksheet as input data in a DATA step to create a new SAS data set. You use a SET statement with a SAS name literal to read from an Excel worksheet. You can also use other DATA step statements such as WHERE, KEEP, DROP, LABEL, and FORMAT statements with input from an Excel worksheet.
95
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Importing an Excel Worksheet
You can use a PROC IMPORT step to read an Excel worksheet and create a SAS data set from it. The Import Wizard is a point-and-click interface that generates PROC IMPORT code for you. **PROC IMPORT OUT= output-data-set** ** DATAFILE='input=excel-workbook'** ** DBMS=EXCEL REPLACE;** ** RANGE='range-name';** **RUN;**
96
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Creating an Excel Workbook in SAS with Data Step
You can use the DATA step to create an Excel worksheet from a SAS data set if you use the SAS/ACCESS LIBNAME statement to assign a libref to the Excel workbook that contains the worksheet. You use one DATA step for each worksheet that you want to create. **LIBNAME libref 'physical-file-name';** **DATA output-excel-worksheet;** ** SET input-data-set;** **RUN;**
97
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Creating an Excel Workbook in SAS PROC COPY
You can use the SAS/ACCESS LIBNAME statement along with the COPY procedure to create an Excel workbook that contains one or more worksheets from one or more SAS data sets. The COPY procedure creates a worksheet for each SAS data set that is listed in the SELECT statement. **LIBNAME output-libref 'physical-file-name';** **PROC COPY IN=input-libref OUT=output-libref;** ** SELECT input-data-set1 input-data-set2;** **RUN;**
98
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Creating an Excel Workbook in SAS PROC EXPORT
The Export Wizard is a point-and-click interface that generates PROC EXPORT code for you. **PROC EXPORT DATA= input-data-set** ** OUTFILE='output-excel-workbook'** ** DBMS=EXCEL REPLACE;** **RUN;**
99
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Sample Code Using PROC IMPORT to Create a SAS Data Set from an Excel Worksheet
proc import out=work.subset2a; datafile='my-file-path\sales.xls'; range="Australia$"n; getnames=yes; mixed=no; scantext=yes; usedate=yes; scantime=yes; run;
100
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Sample Code Using the DATA Step to Create an Excel Workbook from SAS Data Sets
libname orionxls 'my-file-path\qtr20007a.xls'; data orionxls.qtr1\_2007; set orion.qtr2\_2007; run; data orionxls.qwtr2\_2007; set orion.qtr2\_2007; run;
101
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** Sample Code Using PROC COPY to Create an Excel Workbook from SAS Data Sets
libname orionxls 'my-file-path\qtr20007b.xls'; proc copy in=orion out=orionxls; select qtr1\_2007 qtr2\_2007; run;
102
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** By default, how does the Data Step store an Excel value that is formatted as a date?
numeric SAS date value with the DATE9. format
103
**Lesson 6: Creating SAS Data Sets from Microsoft Excel Worksheets** What do headings with spaces in Excel worksheets become
labels
104
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Data Step Processing Diagram
105
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Raw Data File
A raw data file is an external file whose records contain data values that are organized in fields. Raw data files are not software-specific.
106
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Delimited Raw Data File
File in which the data values are separated by spaces or other special characters.
107
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Creating a SAS Data Set from a Raw Data File
You can use a DATA step to read input from a raw data file and create a SAS data set from it. Instead of a SET statement, you use an INFILE statement and an INPUT statement. The INFILE statement specifies the name and location of the raw data file, and the INPUT statement provides specifications for SAS to read the data values as variables. You use the DLM= option in the INFILE statement to specify a delimiter if the raw data file uses a character other than a blank space to delimit data values. Delimited data is also known as list input. Other types of raw data files might contain column input or formatted input. Raw data files can contain either standard or nonstandard data values. A raw data file does not contain names for the data fields. You specify variable names for the values in the INPUT statement, and the names that you use must follow the rules for SAS names. **DATA output-SAS-data-set-name; INFILE 'raw-data-file-name' ; INPUT specifications; RUN;**
108
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Specifying Lengths of Variables from Raw Data Files
Default each variable with a length of 8 bytes (SAS truncates character values that are longer than 8 characters) You can specify multiple variables in one LENGTH statement. You can either specify different lengths for each variable that you list in the LENGTH statement or you can specify one length for all of the variables listed. The LENGTH statement must precede the INPUT statement in a DATA step, because SAS determines variable attributes such as length the first time it encounters a variable. **LENGTH variable(s) $ length;**
109
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Working with Nonstandard Data Delimited Raw Data Files
Nonstandard data is data that SAS cannot read without extra instructions. To read nonstandard data, you use the colon format modifier and an informat in the INPUTstatement. Informats are similar to formats. An informat is the instruction that SAS uses to read data values into a variable. An informat also provides a length for a character variable. **INPUT variable \<$\> variable \<:informat\>;**
110
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Further DATA Step Processing Delimited Raw Data Files
Can use a DROP statement or a KEEP statement Can use FORMAT and LABEL statements to add permanent attributes to your output data CANNOT use a WHERE statement
111
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Further DATA Step Processing Reading Instream Data
Instream data is data that you specify within your program You use a DATALINES statement to read instream data You use a NULL statement to indicate the end of the input data You use the INPUT statement for list input to specify that the instream data is delimited with blanks. You could also use a LENGTH statement before a DATALINES statement to specify lengths for variables within the instream data that are shorter or longer than 8 bytes. DATA output-SAS-data-set; INPUT specifications; DATALINES; instream data ; RUN;
112
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Sample Code Creating a SAS Data Set from a Raw Data File
data work.subset3; infile 'my-file-path\sales.csv' dlm=','; length First\_Name $ 12 Last\_Name $ 18 Gender $ 1 Job\_Title $ 25 Country $ 2; input Employee\_ID First\_Name $ Last\_Name $ Gender $ Salary Job\_Title $ Country $ Birth\_Date :date9. Hire\_Date :mmddyy10.; keep First\_Name Last\_Name Salary Job\_Title Hire\_Date; label Job\_Title='Sales Title' Hire\_Date='Date Hired'; format Salary dollar12. Hire\_Date monyy7.; run;
113
**Lesson 7: Creating SAS Data Sets from Delimited Raw Data Files** Sample Code Reading Instream Data
data work.subset4; input Employee\_ID First\_Name $ Last\_Name $; datalines; 120102 Tom Zhou 120103 Wilson Dawes 120121 Irenie Elvish ; run;
114
**Lesson 8: Validating and Cleaning Data** Validating Data
Process of identifying invalid values in your data. For example, you might have character values stored in a numeric variable. Or you might have invalid date values such as 99NOV1978. Invalid data values cause data errors when the program runs.
115
**Lesson 8: Validating and Cleaning Data** SAS procedures for detecting invalid data
PROC PRINT PROC FREQ PROC MEANS PROC UNIVARIATE
116
**Lesson 8: Validating and Cleaning Data** Examining Data Errors When Reading Raw Data Files PROC PRINT
You can use a PROC PRINT step with a VAR statement and a WHERE statement to create a report of observations with invalid values. For example, you can create a report that includes observations that have missing values for a particular variable. You use a SAS date constant to convert a calendar date to a SAS date value. **PROC PRINT DATA=SAS-data-set;** ** VAR variable(s);** ** WHERE where-expression;** **RUN;**
117
**Lesson 8: Validating and Cleaning Data** Examining Data Errors When Reading Raw Data Files PROC FREQ
You cannot use PROC PRINT to detect values that are not unique. Instead, you can use PROC FREQ to view a frequency table of the unique values for a variable. You can use the TABLES statement in a PROC FREQ step to specify which frequency tables to produce. You can use the NLEVELS option to display a table that provides the number of distinct values for each variable named in the TABLES statement. You can use the NOPRINT option in the TABLES statement to suppress the individual frequency tables. **PROC FREQ DATA=SAS-data-set ;** ** TABLES variable(s) ;** **RUN;**
118
**Lesson 8: Validating and Cleaning Data** Validating Data PROC MEANS
You can use PROC MEANS to create summary reports of descriptive statistics such as SUM, MEAN, and RANGE Default statistics for all numeric variables: N, MEAN, STD, MIN, and MAX PROC MEAN step, such as NMISS, which is the number of observations with missing values
119
**Lesson 8: Validating and Cleaning Data** Validating Data PROC UNIVARIATE
PROC UNIVARIATE produces summary reports of descriptive statistics. In addition, PROC UNIVARIATE produces tables of extreme observations and missing values. The NEXTROBS= option in the PROC UNIVARIATE statement specifies the number of extreme observations that PROC UNIVARIATE lists.
120
**Lesson 8: Validating and Cleaning Data** Cleaning Data Programmatically Using the DATA Step Assignment Statement
You can use an assignment statement to convert values of a variable. The assignment statement is one of the few SAS statements that don’t begin with a keyword. An assignment statement evaluates an expression and assigns the resulting value to a new or existing variable. **variable=expression**
121
**Lesson 8: Validating and Cleaning Data** Cleaning Data Programmatically Using the DATA Step Upcase
Converts all letters in an argument to uppercase **UPCASE(argument)**
122
**Lesson 8: Validating and Cleaning Data** Cleaning Data Programmatically Using the DATA Step IF-THEN
By default, an assignment statement in a DATA step applies to every observation in the data set. To apply different assignment statements to different observations, you can use conditional statements. The IF-THEN statement executes a SAS statement for observations that meet specific conditions. **IF expression THEN statement;**
123
**Lesson 8: Validating and Cleaning Data** Cleaning Data Programmatically Using PROC SORT NODUPRECS
You can use PROC SORT to remove duplicate observations from your data by specifying the NODUPRECS option in the PROC SORT statement. NODUPRECS checks only consecutive observations, so it’s a good idea to sort by all variables when you use NODUPRECS. You can use the NODUPKEY option in the PROC SORT statement to remove observations with duplicate BY values. You can use the EQUALS option along with the NODUPRECS and NODUPKEY options in the PROC SORT statement to maintain the relative order of the observations within the input data set and the output data set.
124
**Lesson 8: Validating and Cleaning Data** Sample Code Using PROC PRINT to Validate Data
proc print data=orion.nonsales; var Employee\_ID Gender Salary Job\_Title Country Birth\_Date Hire\_Date; where Employee\_ID = . or Gender not in ('F','M') or Salary not between 24000 and 500000 or Job\_Title = ' ' or Country not in ('AU','US') or Birth\_Date \> Hire\_Date or Hire\_Date \< '01JAN1974'd; run;
125
**Lesson 8: Validating and Cleaning Data** Sample Code Using PROC FREQ to Validate Data
proc freq data=orion.nonsales nlevels; table \_all\_ / noprint; run;
126
**Lesson 8: Validating and Cleaning Data** Sample Code Using PROC MEANS to Validate Data
proc means data=orion.nonsales n nmiss min max; var Salary; run;
127
**Lesson 8: Validating and Cleaning Data** Sample Code Using PROC UNIVARIATE to Validate Data
proc univariate data=orion.nonsales nextrobs=8; var Salary; run;
128
**Lesson 8: Validating and Cleaning Data** Sample Code Cleaning Data Programmatically Using Assignment Statements
data work.clean; set orion.nonsales; Country=upcase(Country); run;
129
**Lesson 8: Validating and Cleaning Data** Sample Code Cleaning Data Programmatically Using IF-THEN/ELSE Statements
data work.clean; set orion.nonsales; Country=upcase(Country); if Employee\_ID=120106 then Salary=26960; else if Employee\_ID=120115 then Salary=26500; else if Employee\_ID=120191 then Salary=24015; else if Employee\_ID=120107 then Hire\_Date='21JAN1995'd; else if Employee\_ID=12011 then Hire\_Date='01NOV1978'd; else if Employee\_ID=121011 then Hire\_Date='01JAN1998'd; run;
130
**Lesson 8: Validating and Cleaning Data** Sample Code Removing Duplicates Using PROC SORT
proc sort data=orion.nonsalesdupes out=sorted nodupkey equals; by Employee\_ID; run;
131
**Lesson 8: Validating and Cleaning Data** Integrity Contraints
Integrity constraints are data validation rules that you can specify to restrict the data values that can be stored for a variable in a SAS data set. To eliminate the possibility of invalid data in your data set, you can place integrity constraints on the data set. Integrity constraints help you preserve the validity and consistency of your data. When the values associated with an integrity constraint variable are added, updated, or deleted, SAS enforces the integrity constraints.
132
**Lesson 8: Validating and Cleaning Data** General Integrity Constraints
General integrity constraints restrict the values of variables within a single file
133
**Lesson 8: Validating and Cleaning Data** General Integrity Constraints Check
Limits the data values of variables to a specific set, range, or list ofvalues. Check constraints can also be used to ensure that the data values in one variable within an observation are contingent on the data values of another variable in the same observation.
134
**Lesson 8: Validating and Cleaning Data** General Integrity Constraints Not Null
Requires that a variable contain a data value. Null (missing) values are not allowed.
135
**Lesson 8: Validating and Cleaning Data** General Integrity Constraints Unique
Requires that the specified variable or variables contain unique data values. A null data value is allowed but is limited to a single instance.
136
**Lesson 8: Validating and Cleaning Data** General Integrity Constraints Primary Key
Requires that the specified variable or variables contain unique data values and that null data values are not allowed. Only one primary key can exist in a data file.
137
**Lesson 8: Validating and Cleaning Data** Referential Integrity Constraints
A referential integrity constraint is created when a primary key integrity constraint in one data file is referenced by a foreign key integrity constraint in another data file. The foreign key constraint links the data values of one or more variables in the foreign key data file, to corresponding variables and values in the primary key data file. Data values in the foreign key data file must have a matching value in the primary key data file, or they must be null. When data is updated or deleted in the primary key data file, the modifications are controlled by a referential action that is defined as part of the foreign key constraint. You can define separate referential actions for the update and delete operations.
138
**Lesson 8: Validating and Cleaning Data** Referential Integrity Constraints Restrict
Prevents the data values of the primary key variables from being updated or deleted if there is a matching value in one of the foreign key data file’s corresponding foreign key variables. The restrict type of action is the default action if one is not specified.
139
**Lesson 8: Validating and Cleaning Data** Referential Integrity Constraints Set Null
Enables the data values of the primary key variables to be updated or deleted, but matching data values in the foreign key data files are changed to null (missing) values.
140
**Lesson 8: Validating and Cleaning Data** Referential Integrity Constraints Cascade
Enables the data values in the primary key variables to be updated, and additionally updates matching data values in the foreign key data files to the same value. The cascade type of action is supported only for update operations.
141
**Lesson 9: Manipulating Data** OUTPUT Statement
Use in a conditional statement to write the current observation to a specific data set that is listed in the DATA statement. Tells SAS to write the current observation to a SAS data set immediately, not at the end of the DATA step. OUTPUT ;
142
**Lesson 9: Manipulating Data** Creating Multiple Data Sets OUTPUT Statement
The SAS data set Orion.Employee\_Addresses contains addresses for all Orion Star employees. Suppose you want to create three separate data sets based on the value of the variable Country. If the value of Country is US, write the observation to the USA data set. If the value of Country is AU, write the observation to the Australia data set. If the value of Country is anything else, write the observation to the Other data set. data usa australia other; set orion.employee\_addresses; if Country = 'AU' then output australia; else if Country = 'US' then output usa; else output other; run;
143
**Lesson 9: Manipulating Data** Writing Errors to a Separate Data Set OUTPUT Statement
Suppose you're reading a raw data file and want to create two data sets based on the value of the automatic temporary variable \_ERROR\_. If the observation contains errors, write the observation to the data set Errors. Otherwise, write the observation to the data set Correct. data errors correct; infile 'sales.csv' dlm=','; input Employee\_ID First $ Last $ Gender $ Salary Job\_Title $ Country $ Birth\_Date :date9. Hire\_Date :mmddyy10.; if \_ERROR\_ = 1 then output errors; else output correct; run;
144
**Lesson 9: Manipulating Data** What does SAS do when encountering a data error in a raw data record
Prints a ruler and the raw data record in the log Assigns a missing value to the variable that the invalid data affects Prints a note about the error in the log Prints the variable values in the corresponding SAS observation in the log
145
**Lesson 9: Manipulating Data** Using PROC Print to Validate Data
146
**Lesson 9: Manipulating Data** Uses of SAS Functions
calculate sample statistics create SAS date values convert U.S. ZIP codes to state postal codes round values generate random numbers extract a portion of a character value convert data from one data type to another
147
**Lesson 9: Manipulating Data** SAS Functions That Compute Sample Statistics
148
**Lesson 9: Manipulating Data** SAS Date Functions
149
**Lesson 9: Manipulating Data** IF/THEN Statement
Use to perform an action conditionally The IF/THEN statement executes a SAS statement when the condition in the IF clause is true. **IF expression THEN statement;** To assign the value 26960 to the variable Salary when the value of Employee\_ID is 120106, add the following IFTHEN statement to your DATA step. data work.clean; set orion.nonsales; if Employee\_ID=120106 then Salary=26960; run;
150
**Lesson 9: Manipulating Data** ELSE Statement Providing an Alternative Action
Suppose you want to assign a value to Salary based on the other possible values of Employee\_ID. One way to do this is to add IF/THEN statements for the other two conditions, as shown below: if Employee\_ID=120106 then Salary=26960; if Employee\_ID=120115 then Salary=26500; if Employee\_ID=120191 then Salary=24015; However, when the DATA step executes, SAS evaluates each IF statement in order, even if the first condition is true. This wastes system resources and slows the processing of your program. The ELSE statement must immediately follow the IFTHEN statement in your program. An ELSE statement executes only if the previous IF/THEN/ELSE statement is false. data work.clean; set orion.nonsales; if Employee\_ID=120106 then Salary=26960; else if Employee\_ID=120115 then Salary=26500; else if Employee\_ID=120191 then Salary=24015; run;
151
**Lesson 9: Manipulating Data** Length Statement
**LENGTH variable(s) $ length;** Variable name, a dollar sign if it is a character variable, and then the length. You can specify multiple variables in one LENGTH statement. Remember that SAS determines variable attributes such as length and type the first time it encounters a variable. So, for the LENGTH statement to define the length for the variables in the output data set, it needs to precede any other references to the variable. **data work.subset3; infile 'c:\oriondata\sales.csv'; length First\_Name $ 12 Last\_Name $ 18 Gender $ 1; input Employee\_ID First\_Name $ Last\_Name $ Gender $ Salary Job\_Title $ Country $; run;**
152
**Lesson 9: Manipulating Data** WHERE Statement
**WHERE whereexpression;** Specifies a condition for selecting observations and can be any valid SAS expression. An expression is a sequence of operands and operators that form a set of instructions. Must follow several rules: Can include variables, constants, or both Variables must exist in the input data set Character constants must be enclosed in quotation marks Numeric constants cannot be enclosed in quotation marks **data work.subset1; set orion.sales; where Country='AU' and Job\_Title contains 'Rep'; run;**
153
**Lesson 9: Manipulating Data** Creating Variables Assignment Statement
variable=expression; If the expression in an assignment statement contains a missing value, the result of the expression is also a missing value.
154
**Lesson 9: Manipulating Data** Creating Variables Sum Statement
**SUM(argument1,argument2,…)** Returns the sum of the specified arguments. If an argument contains a missing value, it ignores the missing value and returns the sum of the non-missing values.
155
**Lesson 9: Manipulating Data** Creating Variables Date Functions
**MONTH(SAS-date)** You can use SAS date functions in assignment statements to specify SAS date values as part of your expressions or to create SAS date values. The MONTH function extracts a month from a SAS date value.
156
**Lesson 9: Manipulating Data** Subsetting Variables
Remember that you can use a DROP statement or KEEP statement in a DATA step to subset variables in the output data set. When SAS processes a DROP statement, the variables listed in the DROP statement are included in the PDV. SAS reads values for those variables into the PDV for each observation, but does not write those variables to the output data set.
157
**Lesson 9: Manipulating Data** Creating Variables Conditionally DO statement within an IF-THEN/ELSE statement
The DO statement, together with the other SAS statements it contains, is known as a DO group. Each DO group must end with an END statement. SAS executes the statements withing the first DO group only if the expression in the IF-THEN statement is true. SAS executes the statements within the second DO group only if the expression in the first IF expression is false and the expression in the ELSE IF expression is true. Remember that SAS creates the attributes for a variable, including length, based on the first time it encounters that variable. You use a LENGTH statement to specify the length for a variable explicitly. I**F expression THEN** ** DO;** ** executable statements** ** END;** **ELSE IF expression THEN** ** DO;** ** executable statements** **END;**
158
**Lesson 9: Manipulating Data** Subsetting Observations Where vs If
The WHERE statement selects observations when they are read from the input data set, so you can’t use the WHERE statement to subset observations based on a variable that you create in the same DATA step. You can use the subsetting IF statement to subset observations based on the value of a variable you create. You can specify multiple expressions in a subsetting IF statement. You cannot use special WHERE operators in IF expressions. You must use the WHERE statement rather than a subsetting IF statement in a PROC step You can always use a subsetting IF statement in a DATA step. You can use a WHERE statement in a DATA step if the expression in the WHERE statement references only variables from the input data set, or from all input data sets if the step uses multiple input data sets. **IF expression;**
159
**Lesson 9: Manipulating Data** DELETE statement
You can use a DELETE statement within an IF-THEN statement as an alternative to the subsetting IF statement. The DELETE statement is especially useful if it is easier to specify a condition for excluding observations than for selecting observations. **IF expression THEN DELETE;**
160
**Lesson 9: Manipulating Data** Sample Code Creating Variables
data work.comp; set orion.sales; Bonus=500; Compensation=sum(Salary,Bonus); BonusMonth=month(Hire\_Date); run;
161
**Lesson 9: Manipulating Data** Sample Code Subsetting Variables
data work.comp; set orion.sales; drop Gender Salary Job\_Title Country Birth\_Date Hire\_Date; Bonus=500; Compensation=sum(Salary,Bonus); BonusMonth=month(Hire\_Date); run;
162
**Lesson 9: Manipulating Data** Sample Code Creating Variables Conditionally
data work.bonus; set orion.sales; length Freq $ 12; if Country='US' then do; Bonus=500; Freq='Once a Year'; end; else if Country='AU' then do; Bonus=500; Freq='Twice a Year'; end; run;
163
**Lesson 9: Manipulating Data** Sample Code Subsetting Observations
data work.december; set orion.sales; BonusMonth=month(Hire\_Date); if Country='AU' and BonusMonth=12; Bonus=500; Compensation=sum(Salary,Bonus); run;
164
**Lesson 9: Manipulating Data** YEAR(SASdate)
Value Extracted: the year Value Returned: a fourdigit year QTR(SASdate) the quarter a number from 1 to 4 MONTH(SASdate) the month a number from 1 to 12 DAY(SASdate) the day of the month a number from 1 to 31 WEEKDAY(SASdate) the day of the week a number from 1 to 7 (1=Sunday, 2=Monday, and so on)
165
**Lesson 9: Manipulating Data** QTR(SASdate)
Value Extracted: the quarter Value Returned: a number from 1 to 4
166
**Lesson 9: Manipulating Data** MONTH(SASdate)
Value Extracted: the month Value Returned: a number from 1 to 12
167
**Lesson 9: Manipulating Data** DAY(SASdate)
Value Extracted: the day of the month Value Returned: a number from 1 to 31
168
**Lesson 9: Manipulating Data** WEEKDAY(SASdate)
Value Extracted: the day of the week Value Returned: a number from 1 to 7 (1=Sunday, 2=Monday, and so on)
169
**Lesson 9: Manipulating Data** TODAY()
SAS Date Value Created: the current date
170
**Lesson 9: Manipulating Data** MDY(month,day,year)
SAS Date Value Created: a date with numeric month, day, and year
171
**Lesson 10: Combining SAS Data Sets Vertically** When to Use the FORCE Option in PROC APPEND
**DATA= data set variables are not in the BASE= data set:** Drops the variable not present in the BASE=data set. **DATA= data set variables have a different type than the variables in the BASE= data set:** Replaces all values for the variable in the DATA= data set with missing values + keeps the variable type of the variable specified in the BASE= data set. **DATA= data set variables are longer than the variables in the BASE= data set:** Truncates values from the DATA= data set to fit them into the length that is specified in the BASE= data set.
172
**Lesson 10: Combining SAS Data Sets Vertically** Comparing PROC APPEND and the SET Statement
173
**Lesson 10: Combining SAS Data Sets Vertically** Overview of Combining SAS Data Sets Vertically
Places the observations from one or more data sets below or above the observations in another data set. Three methods of combining data vertically: appending, concatenating, and interleaving. Appending and concatenating data sets both place observations from one data set after the observations from another data set in a single data set. In both methods, the observations in the combined data set are in the same order that they were in each original data set.
174
**Lesson 10: Combining SAS Data Sets Vertically** Appending Data Sets Definition
When you append data sets, you can only use two input data sets. Appending adds all of the observations from the second data set to the end of the first data set.
175
**Lesson 10: Combining SAS Data Sets Vertically** Concatenating Data Sets Definition
Concatenating copies all of the observations from the first data set and all of the observations from the second data set and writes them to a new data set.
176
**Lesson 10: Combining SAS Data Sets Vertically** Interleaving Data Sets Definition
Interleaving intersperses the observations from two or more input data sets based on the value of one or more common variables, in a new data set.
177
**Lesson 10: Combining SAS Data Sets Vertically** Appending SAS Data Sets
PROC APPEND adds the observations from one SAS data set to the end of another SAS data set. BASE= names the data set to which the observations are added, and DATA= names the data set containing observations that are added to the base data set. Can specify only two data sets. When SAS processes a PROC APPEND step, SAS does not read the observations in the base data set. SAS cannot change any variable information in the descriptor portion of the base data set. The base data set has the same number of variables before and after appending. When the DATA= data set contains variables that are not in the BASE= data set, you can use the FORCE option to force SAS to append the observations. The FORCE options causes SAS to drop the extra variables in the DATA= data set, and then to issue a warning message to the log. **PROC APPEND BASE=SAS-data-set DATA=SAS-data-set ; RUN;**
178
**Lesson 10: Combining SAS Data Sets Vertically** Concatenating SAS Data Sets
When you specify multiple data sets in the SET statement, SAS combines them into a single data set. In the combined data set, the observations appear in the order in which the data sets are listed in the SET statement. You can add additional DATA step statements, such as an assignment statement to create new variables in the output data set. Because the DATA step creates a new data set, the input data sets can contain different variables. If the data sets specified in the SET statement have a variable with the same name but different types, SAS generates a compile-time error by default. **DATA SAS-data-set; SET SAS-data-set1 SAS-data-set2 …; RUN;**
179
**Lesson 10: Combining SAS Data Sets Vertically** Concatenating SAS Data Sets RENAME=
You can use the RENAME= option to change the name of one or more variables. If the RENAME= option is associated with an input data set in the SET statement, it renames one or more variables being read from that data set. However, the RENAME= option does not rename variables in the input data set. Instead, the RENAME= option tells SAS which slot in the PDV to use when SAS is building observations from the new data set. You can use multiple RENAME= options in one SET statement if you want to rename variables from multiple data sets. **SAS-data-set (RENAME=(old-name-1 = new-name-1 old-name-2 = new-name-2 … old-name-n = new-name-n))**
180
**Lesson 10: Combining SAS Data Sets Vertically** Interleaving SAS Data Sets
Use a DATA step with a BY statement. The BY statement specifies one or more BY variables. SAS uses the values of the BY variables to arrange the observations in the output data set. Any BY variable that you specify must be common to all input data sets. By default, SAS arranges observations in ascending order unless you specify the DESCENDING option. Before you run a DATA step that contains a BY statement, your input data sets must be sorted on the BY variables. SAS outputs observations with duplicate BY values in the order in which the data sets are listed in the SET statement. If duplicate values are in the same data set, SAS outputs those observations in the order in which they appear in the data set. You could also interleave data sets by concatenating them first, and then sorting the output file. DATA SAS-data-set; SET SAS-data-set1 SAS-data-set2 …; BY BY-variable(s); RUN;
181
**Lesson 10: Combining SAS Data Sets Vertically** Sample Code Appending Data Sets
proc append base=emps data=emps2010 force; run;
182
**Lesson 10: Combining SAS Data Sets Vertically** Sample Code Concatenating Data Sets
data empsall2; set empscn empsjp(rename=(Region=Country)); run;
183
**Lesson 10: Combining SAS Data Sets Vertically** Sample Code Interleaving Data Sets
data empsname; set empscn empsjp(rename=(Region=Country)); by First; run;
184
**Lesson 11: Combining SAS Data Sets Horizontally** Combining SAS Data Sets Horizontally
When you combine data sets horizontally, you combine the observations from multiple data sets into a single observation in a new data set. To combine data sets horizontally, it’s important to understand the relationship between the input data sets. The relationships between input data sets can be one-to-one, one-to-many, many-to-one, many-to-many, or non-matching.
185
**Lesson 11: Combining SAS Data Sets Horizontally** Combining SAS Data Sets Horizontally Merging
Combines observations from two or more data sets into a single observation in a new data set.
186
**Lesson 11: Combining SAS Data Sets Horizontally** Combining SAS Data Sets Horizontally Match-merging
Merging that is based on the values of one or more common variables.
187
**Lesson 11: Combining SAS Data Sets Horizontally** Merging SAS Data Sets One to One
To match-merge using the DATA step, you use the MERGE and BY statements. The MERGE statement joins observations from two or more SAS data sets into single observations. The BY variables must be common to all input data sets, and the input data sets must be sorted on the BY variables. If you specify only one data set in the MERGE statement, SAS treats the MERGE statement like a SET statement. When you match-merge data sets that have a one-to-one relationship, the output data set has the same number of observations as the input data sets. **DATA SAS-data-set;** ** MERGE SAS-data-set1 SAS-data-set2 … ;** ** BY BY-variable(s);** ** ** **RUN;**
188
**Lesson 11: Combining SAS Data Sets Horizontally** Merging SAS Data Sets One to Many
To match-merge data sets that have one-to-many or many-to-one relationships, you use a DATA step with the MERGE and BY statements. T he DATA step is the same as the one you use to match-merge data sets that have a one-to-one relationship. A BY group is a group of observations that have the same value of the BY variable. When the input data sets have a one-to-many or many-to-one relationship, each BY group might contain more than one observation. The output data set contains all of the observations for each BY group.
189
**Lesson 11: Combining SAS Data Sets Horizontally** Merging SAS Data Sets That Have Non-Matches
Use the DATA step with the MERGE and BY statements. The DATA step is the same as the one you use for match-merging data sets with other relationships. The output data set contains both matches (observations that contain data from both input data sets) and non-matches (observations that contain data from only one input data set). When you specify the IN= data set option after an input data set in the MERGE statement, SAS creates a temporary variable that indicates whether the data set contributed data to each output observation. You can specify the IN= option for multiple input data sets in one MERGE statement. To create an output data set that contains only matches or only non-matches, you can use the IN= data set option to identify which input data sets contributed data to each observation that SAS outputs. Then, you can use the subsetting IF statement to output only those observations that contain data from all the input data sets. **SAS-data-set (IN=variable)**
190
**Lesson 11: Combining SAS Data Sets Horizontally** Merging SAS Data Sets Many to Many
You can match-merge data sets that have a many-to-many relationship in two ways: using the DATA step techniques that you’ve already learned, or using the SQL procedure. When you use a DATA step with MERGE and BY statements to match-merge data sets that have a many-to-many relationship, the output data set does not contain all possible combinations. When you use PROC SQL to join data sets that have a many-to-many relationship, the output data set does contain all possible combinations of matching observations.
191
**Lesson 11: Combining SAS Data Sets Horizontally** Sample Code Match-Merging Data Sets
data empsauc; merge empsau phonec; by EmpID; run;
192
**Lesson 11: Combining SAS Data Sets Horizontally** Sample Code Match-Merging Data Sets and Selecting Observations by Which Data Set Contributed
data empsauc; merge empsau(in=Emps) phonec(in=Cell); by EmpID; if Emps=0 or Cell=0; run;
193
**Lesson 12: Enhancing Reports** Setting SAS System Options for Procedure Output
SAS system options are instructions that affect your SAS session. All SAS system options have default settings that SAS uses unless you specify otherwise. When you create a report, the appearance of the output is determined, in part, by the settings of SAS system options. You can change the settings of SAS system options to alter the way your reports look. Many of the SAS system options for procedure output affect only listing output—not HTML output.
194
**Lesson 12: Enhancing Reports** Enhancing Reports Option Statement
Global statement to modify SAS system options. Changed settings of SAS system options remain in effect until you change the settings again or until you end your SAS session. **OPTIONS option(s);**
195
**Lesson 12: Enhancing Reports** Options Statement DATE or NODATE
Specifies whether or not the date and time appear at the top of your listing output.
196
**Lesson 12: Enhancing Reports** Options Statement NUMBER or NONUMBER
Specifies whether or not the page number appears at the top of listing output.
197
**Lesson 12: Enhancing Reports** Options Statement PAGESIZE=
Specifies the number of lines in a page of listing output. The PAGESIZE = option has the alias PS=.
198
**Lesson 12: Enhancing Reports** Options Statement LINESIZE=
Specifies the number of characters in a line. The LINESIZE= option has the alias LS=.
199
**Lesson 12: Enhancing Reports** Options Statement CENTER or NOCENTER
Specifies whether the output is centered or left-aligned.
200
**Lesson 12: Enhancing Reports** Options Statement DTRESET or NODTRESET
Whether or not SAS updates the date and time on every report you create in the same SAS session.
201
**Lesson 12: Enhancing Reports** Options Statement PAGENO=
Specifies the beginning page number for the next page of output.
202
**Lesson 12: Enhancing Reports** Specifying Titles and Footnotes
By default, SAS adds a standard title to reports. You can use the TITLE/FOOTNOTE statements to specify a title/footnote other than the standard SAS title/no footnote. They are global statements You can specify up to 10 titles and 10 footnotes. Titles and footnotes appear in both listing output and HTML output. N can be 1 to 10 and indicates the line on which the title or footnote appears. Defaults is line 1. **TITLE 'text'; FOOTNOTE 'text';**
203
**Lesson 12: Enhancing Reports** Canceling Titles and Footnotes
Titles and footnotes that you assign remain in effect until you change them, cancel them, or end your SAS session. To cancel a title or footnote, you issue a TITLE or FOOTNOTE statement with the same number as the existing title or footnote and no text; this also cancels any higher-numbered titles or footnotes. To cancel all previously defined titles or footnotes, you can specify a null TITLE or FOOTNOTE statement, which has no number and no text.
204
**Lesson 12: Enhancing Reports** Adding Labels to Reports
You can change the appearance of variable names in your reports by specifying temporary labels with the LABEL statement in your PROC steps. When you use a LABEL statement within a PROC step, the label applies only to the PROC step that it’s part of. Remember that to display labels in PROC PRINT reports, you must use the LABEL option in the PROC PRINT statement You can assign a temporary label to a variable that already has a permanent label associated with it. The temporary label always overrides the permanent label in reports.
205
**Lesson 12: Enhancing Reports** Adding Formats to Reports
Control how values appear in reports by specifying temporary SAS formats with the FORMAT statement in your PROC steps. Assign a temporary format to a variable that already has a permanent format associated with it. In reports, the temporary format always overrides the permanent format.
206
**Lesson 12: Enhancing Reports** Split Option
Another way to tell SAS to display labels is to add the SPLIT= option in the PROC PRINT statement. The SPLIT= option specifies a split-character that indicates where to wrap the label in the report. **SPLIT='split-character'**
207
**Lesson 12: Enhancing Reports** Creating and Applying User-Defined Formats
Use the PROC FORMAT step to create your own formats. You can define only one format in a VALUE statement, but you can include multiple VALUE statements in one PROC FORMAT step. The names that you specify for your formats must follow the rules for SAS names. Do not put a period after the format name in the VALUE statement, but you do put a period after the format name in the FORMAT statement. You apply your formats in a reporting PROC step, such as a PROC PRINT step, by using a FORMAT statement. You can specify both user-defined formats and SAS formats in the same FORMAT statement. **PROC FORMAT; VALUE format-name value-or-range1='formatted-value1' value-or-range2='formatted-value2' …; RUN;**
208
**Lesson 12: Enhancing Reports** Creating and Applying User-Defined Formats HIGH + LOW
You can use the keyword HIGH to specify the highest possible value of a variable. You can use the keyword LOW to specify the lowest possible value of a variable. For character variables, the LOW keyword treats missing values as the lowest possible values. For numeric values, LOW does not include missing values.
209
**Lesson 12: Enhancing Reports** Subsetting Observations
Selecting only some of the observations Use the WHERE statement in a PROC step to subset observations in a report. When you use the WHERE statement, your output contains only the observations that meet the conditions specified in the WHERE expression.
210
**Lesson 12: Enhancing Reports** Grouping Observations
Grouping observations by the values of one or more variables. You can use the BY statement in reporting procedures to group observations in reports. In a PROC step, the BY statement produces separate sections of the report for each BY group.
211
**Lesson 12: Enhancing Reports** Sample Code Setting System Options and Specifying Titles and Footnotes
options dtreset pageno=1; title 'Orion Star Sales Salaries'; footnote3 'Confidential'; proc print data=orion.sales; var Employee\_ID First\_Name Last\_Name Salary; run; title; footnote;
212
**Lesson 12: Enhancing Reports** Sample Code Specifying Temporary Labels and Formats
proc print data=orion.sales split='\*'; var Employee\_ID Job\_Title Salary; label Employee\_ID='Sales ID' Job\_Title='Job Title' Salary='Annual\*Salary'; format Salary dollar10.0; run;
213
**Lesson 12: Enhancing Reports** Sample Code Creating and Applying User-Defined Formats
proc format; value $ctryfmt 'AU' = 'Australia' 'US' = 'United States' other= 'Miscoded'; value tiers low-\<50000 = 'Tier 1' 50000-100000 = 'Tier 2' 100000-high= 'Tier 3'; run; proc print data=orion.sales label; var Employee\_ID Salary Country Hire\_Date; label Employee\_ID='Sales ID' Hire\_Date='Hire Date'; format Salary dollar10.0 Hire\_Date monyy7. Country $ctryfmt. Salary tiers.; run;
214
**Lesson 12: Enhancing Reports** Sample Code Subsetting Observations in a Report
proc print data=orion.sales; var Last\_Name Job\_Title Country Salary; format Salary dollar10.0 where Salary\>75000; title 'Sales Salaries Over $75,000'; run; title;
215
**Lesson 12: Enhancing Reports** Sample Code Grouping Observations in a Report
proc print data=orion.sales; by Country descending Gender; run;
216
**Lesson 12: Enhancing Reports** Review: LABEL Statement
Assign descriptive labels to one or more variables. Element 1: variablen which specifies the name of the variable that you want to label. Element 1: 'labeln' which specifies a label of up to 256 characters, including blanks. Enclose the label in quotation marks. If the LABEL statement appears in a PROC step, it is temporary and applies only to the PROC step in which it appears. **LABEL variable1='label1' ;**
217
**Lesson 12: Enhancing Reports** Review: FORMAT Statement
A format is an instruction that SAS uses to display data values. To make data values more understandable when they are displayed in your procedure output, you can use the FORMAT statement to associate formats with variables. Formats affect only how the values appear in the output, not the actual data values as they are stored in the SAS data set. The FORMAT statement has the following elements: Element 1: variable(s) - specifies one or more variables for SAS to associate with a format. Element 2: format specifies a SAS format or a userdefined format that is used to write the values of the variables. You can associate multiple formats with variables in the same FORMAT statement. If the FORMAT statement appears in a PROC step, it is temporary and applies only to the PROC step in which it appears. **FORMAT variable(s) format;**
218
**Lesson 12: Enhancing Reports** Review: The WHERE Statement
Select observations from SAS data sets that meet a particular condition. Specifies a condition for selecting observations and can be any valid SAS expression. An expression is a sequence of operands and operators that form a set of instructions. A whereexpression must follow several rules: Rule 1: Can include variables, constants, or both. Rule 2: Variables in the whereexpression must exist in the input data set. Rule 3: Character constants must be enclosed in quotation marks. Rule 4: Numeric constants cannot be enclosed in quotation marks. **WHERE whereexpression;**
219
**Lesson 12: Enhancing Reports** Review: BY Statement
You can use the BY statement to specify variables that are used for grouping. The BY statement has the following elements: Element 1: BYvariable(s) specifies one or more variables that the procedure uses to form BY groups. Separate multiple variables by using blanks. Element 2: the DESCENDING keyword specifies that the observations be sorted in descending order by the specified variable(s) **BY BYvariable(s);**
220
**Lesson 13: Using the Output Delivery System to Create External Files** Opening and Closing ODS Destinations
The Output Delivery System (ODS) gives you greater flexibility in generating, storing, and reproducing the output from SAS procedures and the DATA step. Each file format that you can create has a corresponding ODS destination. To create a certain type of output, you open the destination that is associated with that output type. When you are finished using a destination, you can close it again. The default ODS destination is the LISTING destination. Another ODS destination is the HTML destination. SAS provides many other ODS destinations such as PDF, RTF, and several destinations that create files that you can open in Excel: CSVALL, MSOFFICE2K, and EXCELXP. You use the ODS statement to open the destination for each type of formatted output you want to create. To close a destination, you add the keyword CLOSE to the ODS statement, after the destination. If you do not close a destination, it remains open until the end of the SAS session. **ODS destination; ODS destination CLOSE;**
221
**Lesson 13: Using the Output Delivery System to Create External Files** Using ODS to Create HTML, PDF, and RTF Files
You can view HTML files in the Results Viewer in SAS, or in Web browsers. You can view and work with PDF files in Adobe software products. You can view and work with RTF files in word processors. You use the ODS statement to specify the file that you want to create. You can open one destination at a time, or you can open multiple destinations at once by using several ODS statements. You can close all destinations in a single ODS statement by adding the \_ALL\_ keyword instead of a keyword for a specific destination. However, if you close all of the destinations, SAS cannot display output for any procedures. **ODS destination FILE='file-specification'; ODS \_ALL\_ CLOSE;**
222
**Lesson 13: Using the Output Delivery System to Create External Files** Applying Style Definitions to External Files
A style definition describes how to display the presentation aspects of SAS output such as colors and fonts. By default, the HTML destination uses the Default style definition, the PDF destination uses the Printer style destination, and the RTF destination uses the RTF style definition. You cannot apply style definitions to listing output. To specify a style definition for the output, you can add the STYLE= option to your ODS statement. **ODS destination FILE='file-specification' STYLE=style-definition;**
223
**Lesson 13: Using the Output Delivery System to Create External Files** CSVALL
Creates a CSV file that you can view in the SAS Program Editor window or another text editor as well as in Excel.
224
**Lesson 13: Using the Output Delivery System to Create External Files** MSOFFICE2K
Creates an HTML file that you can view in a Web browser, Microsoft Word, or Microsoft Excel.
225
**Lesson 13: Using the Output Delivery System to Create External Files** EXCELXP
Creates an XML file that you can open in Microsoft Excel
226
**Lesson 13: Using the Output Delivery System to Create External Files** Sample Code
ods html file='my-file-path\myreport.html' style=sasweb; ods pdf file='my-file-path\myreport.pdf'; ods rtf file='my-file-path\myreport.rtf'; ods csvall file='my-file-path\myexcel.csv'; proc freq data=orion.sales; tables Country; run; ods close \_all\_; ods listing;
227
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC FREQ to Create Summary Reports One-way frequency Tables
Four statistics for each value, by default: frequency, percent, cumulative freqency, and cumulative percent.
228
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC FREQ to Create Summary Reports Crosstabulation Tables
Summarize data for two or more categorical variables by showing the number of observations for each combination of variable values. Crosstabulation tables are also called contingency tables. The simplest crosstabulation table is a two-way frequency table. Four statistics appear by default: frequency, percent, row percentage, and column percentage. Row/observation is listed on the side Column/Variable is listed on top
229
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC FREQ to Create Summary Reports N-way crosstabulations
Produces a series of two-way tables, with a separate table for each level of the other variable To create an n-way crosstabulation table for two or more variables, you specify an asterisk instead of a space between the variable names in the TABLES statement. The first variable specifies the table rows and the second variable specifies the table columns.
230
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC FREQ to Create Summary Reports
When you use PROC FREQ to create summary reports, it’s more useful to display the distribution of values for some variables than for others. Frequency distributions work best with variables whose values are categorical and are best summarized by counts instead of averages. A PROC FREQ report contains a one-way frequency table for each variable that you specify in the TABLES statement. You can use other statements such as the WHERE statement, the LABEL statement, and the FORMAT statement, within a PROC FREQ step. Also add options to your PROC FREQ statement, such as the NLEVELS option, the PAGE option, or the COMPRESS option. Options that you can specify in the TABLES statement include the following: NOPRINT, NOCUM, NOPERCENT, NOFREQ, NOROW, NOCOL, LIST, CROSSLIST, and FORMAT=. ``` **PROC FREQ DATA=SAS-data-set ; TABLES variable(s) ; ``` RUN;**
231
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC FREQ to Create Summary Data Sets OUT= option with Tables Statement
To create a data set that contains frequencies and percentages, you add the OUT= option to the TABLES statement. The OUT= option names an output data set that contains the frequency counts and percentages for a frequency table or crosstabulation table. If the TABLES statement specifies variables for more than one table, the contents of the OUT= data set correspond to the last table specified. The output data set named in the OUT= option contains two variables that PROC FREQ creates automatically: COUNT and PERCENT. You can also specify additional options in the TABLES statement to include other frequency and percentage variables in the output, such as the OUTCUM or OUTPCT options. When you specify the OUT= option in the TABLES statement, PROC FREQ still creates a PROC FREQ report by default, unless you include the NOPRINT option in the PROC FREQ statement. **TABLES variable(s) / OUT=SAS-data-set );**
232
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC FREQ to Create Summary Data Sets OUT= option with Proc Freq Step
Another way to create output data sets is to add an OUTPUT statement that contains the OUT= option to your PROC FREQ step. There are many options available for specifying statistics in the OUTPUT statement. If your TABLES statement specifies multiple tables, the contents of the OUTPUT data set correspond to the last table. If you use multiple TABLES statements, the contents of the OUTPUT data set correspond to the last TABLES statement. To create multiple output data sets, you can add the OUT= option to both the TABLES statement and the OUTPUT statement in a PROC FREQ step. **OUTPUT OUT=SAS-data-set ;**
233
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC MEANS to Create Summary Reports
Computes descriptive statistics for continuous numeric variables across all observations and within groups of observations To calculate statistics for groups of observations use the CLASS statement. The CLASS statement specifies variables that PROC MEANS uses to group the data. The variables listed in the CLASS statement are also called classification variables or class variables. Each combination of class variable values is called a class level. You can include statistic keywords in the PROC MEANS statement to control the statistics that PROC MEANS displays. When you specify one or more statistics in the PROC MEANS statement, PROC MEANS does not produce the default statistics. You can use other options in the PROC MEANS statement, such as NONOBS, MAXDEC=, and FW=. You can use additional statements such as the WHERE statement, the FORMAT statement, and the TITLE statement. PROC MEANS DATA=SAS-data-set ; VAR analysis-variable(s); CLASS classification-variable(s); RUN;
234
**Lesson 14: Creating Summary Reports and Data Sets** Class Statement
Use to calculate statistics for groups of observations Specifies variables that PROC MEANS uses to group the data The variables listed also called classification variables or class variables. Each combination of class variable values is called a class level.
235
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC SUMMARY to Create Summary Reports
Provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations. By default PROC SUMMARY does not produce a report If you omit the VAR statement from a PROC SUMMARY step, PROC SUMMARY produces a simple count of observations. PROC SUMMARY DATA=SAS-data-set ; VAR analysis-variable(s); CLASS classification-variable(s); RUN;
236
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC MEANS to Create Summary Data Sets
To create an output data set add an OUTPUT statement that contains the OUT= option to the PROC MEANS step. ``` The output data set that PROC MEANS creates contains several types of variables, including any BY variables, class variables, and the automatic variables \_TYPE\_ and \_FREQ\_. ``` Suppress the display of the PROC MEANS report by using the NOPRINT option in the PROC MEANS step. Restrict your output data set to only the observations that have the highest \_TYPE\_ value by using the NWAY option in the PROC MEANS statement. You can sort the output data set by \_TYPE\_ values in descending order by using the DESCENDTYPES option in the PROC MEANS statement. You can use the CHARTYPE option to specify that the values of \_TYPE\_ be character values. **OUTPUT OUT=SAS-data-set ;**
237
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC TABULATE to Create Summary Reports
PROC TABULATE displays descriptive statistics in tabular format by using some or all of the variables in a data set. Reports contain one-, two-, or threedimensional tables that contain variables and statistics. Must specify at least one class variable or one analysis variable. You can also specify both. A class variable can be character or numeric and it usually has only a few distinct values. An analysis variable must be numeric and it often contains continuous values. Four statements are commonly used: #1 PROC TABULATE; #2 CLASS statement, #3 VAR statement, #4 TABLE statement. The TABLE statement contains from one to three dimension expressions that define a table The number of dimension expressions that you specify determines the dimensionality of the table in the report. You can use multiple TABLE statements in one PROC TABULATE step. Note that the TABLE statement is different from the TABLES statement in a PROC FREQ step. You can use various operators in a dimension expression. An asterisk creates categories from the combination of class variable values, which is called crossing. Blank places the output for the next element immediately after the output for the prior, which is called concatenation. Displays different statistics in a table by default, depending on whether the table contains any analysis variables. In a dimension expression, you can specify statistics instead of or in addition to the default statistics. You use the asterisk operator to attach one or more statistic keywords to a variable or combination of variables. To summarize all categories, you can specify the special class variable ALL to dimension expressions in a TABLE statement. **PROC TABULATE DATA=SAS-data-set ; CLASS classification-variable(s); VAR analysis-variable(s); TABLE page-expression, row-expression, column-expression ; RUN;**
238
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC TABULATE to Create Summary Data Sets
Create an output data set that contains summary statistics by using the OUT= option in the PROC TABULATE statement. The OUT= option creates a single data set regardless of the number of TABLE statements in the step. The output data set that PROC TABULATE creates contains any BY variables; any class variables; the automatic variables \_TYPE\_, \_PAGE\_, and \_TABLE\_; and statistics. **PROC TABULATE DATA=SAS-data-set OUT=SAS-data-set ;**
239
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using PROC FREQ to Create Summary Reports
proc format; value $ctryfmt 'AU'=;'Australia' 'US'='United States'; ods rtf file='my-file-path\myprocfreq.rtf' style=sasweb; proc freq data=orion.sales; tables Gender\*Country; where Job\_Title contains 'Rep'; format Country $ctryfmt.; title 'Sales Rep Frequency Report'; run; ods rtf close;
240
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using PROC FREQ to Create Summary Reports
proc freq data=orion.sales nlevels; tables Gender Country Hire\_Date / noprint; format Hire\_Date date10.; run;
241
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using PROC FREQ to Create Summary Data Sets
proc freq data=orion.sales noprint; tables Gender Country / out=freq1; run; proc freq data=orion.sales; tables Gender\*Country / out=freq2 outpct; run;
242
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using PROC FREQ to Create Summary Data Sets
proc freq data=orion.sales; tables Gender Country / chisq out=freq6 outcum; output out=freq5 chisq; run;
243
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using PROC MEANS to Create Summary Reports
``` proc means data=orion.sales sum mean range maxdec=0; var Salary; class Gender Country; run; ```
244
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using PROC MEANS to Create Summary Data Sets
``` proc means data=orion.sales noprint; var Salary; class Gender Country; output out=means1; run; ``` proc print data=means1; run;
245
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using PROC TABULATE to Create Summary Reports
``` proc tabulate data=orion.sales; class Job\_Title Gender Country; var Salary; table Country; table Job\_Title, Gender, Country; table Country\*Salary; run; ```
246
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using PROC TABULATE to Create Summary Data Sets
``` proc tabulate data=orion.sales out=tabulate1; class Job\_Title Gender Country; var Salary; table Country; table Job\_Title, Gender, Country; table Country\*Salary; run; ```
247
**Lesson 14: Creating Summary Reports and Data Sets** Adding Elements to Enhance PROC FREQ Reports
248
**Lesson 14: Creating Summary Reports and Data Sets** Adding Elements to Enhance PROC MEANS Reports
249
**Lesson 14: Creating Summary Reports and Data Sets** The \_TYPE\_ Variable
250
**Lesson 14: Creating Summary Reports and Data Sets** Specifying Options or Output Data Set
251
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC TABULATE to Create Summary Reports Defining Dimmensions in the TABLE STATEMENT One-Dimmensional
252
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC TABULATE to Create Summary Reports Defining Dimmensions in the TABLE STATEMENT Two-Dimmensional
253
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC TABULATE to Create Summary Reports Defining Dimmensions in the TABLE STATEMENT Three-Dimmensional
254
**Lesson 14: Creating Summary Reports and Data Sets** Using PROC TABULATE to Create Summary Reports Defining Dimmensions in the TABLE STATEMENT Specifying Statistics
255
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Combining Multiple PROC FREQ Data Sets
proc freq data=orion.sales; tables Gender / chisq out=genderfreq outcum; output out=genderchi chisq; run; proc freq data=orion.sales; tables Country / chisq out=countryfreq outcum; output out=countrychi chisq; run; data allfreq; length Value $ 7; set genderfreq(in=gf) countryfreq(in=cf) genderchi(in=gc) countrychi(in=cc); if gf then Value=Gender; else if cf then Value=Country; else if gc then Value='Gender'; else if cc then Value='Country'; label \_PCHI\_='ChiSquare' P\_PCHI='PValue'; keep Value COUNT PERCENT \_PCHI\_ P\_PCHI; run; proc print data=allfreq label; run;
256
**Lesson 14: Creating Summary Reports and Data Sets** Sample Code Using Output Data Sets Created by PROC MEANS
``` proc means data=orion.sales noprint; var Salary; class Gender Country; output out=work.means mean=aveSalary; run; ``` data gender\_summary(keep=Gender aveSalary) country\_summary(keep=Country aveSalary); set work.means; if \_type\_=1 then output country\_summary; else if \_type\_=2 then output gender\_summary; run; proc sort data=orion.sales out=sort\_country; by Country; run; data detail\_country; merge sort\_country country\_summary(rename=(aveSalary=CountrySalary)); by Country; run; proc sort data=detail\_country out=sort\_gender; by Gender; run; data detail\_country\_gender; merge sort\_gender gender\_summary(rename=(aveSalary=GenderSalary)); by Gender; if Salary\>CountrySalary then CS='Above Average'; else CS='Below Average'; if Salary\>GenderSalary then GS='Above Average'; else GS='Below Average'; label CS='Comparison\*to Country\*Salary Average' GS='Comparison\*to Gender\*Salary Average'; run; proc sort data=detail\_country\_gender; by Employee\_ID; run; proc print data=detail\_country\_gender split='\*'; var First\_Name Last\_Name Salary CS GS; run;
257
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords CLM
twosided confidence limit for the mean
258
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords CSS
corrected sum of squares
259
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords CV
coefficient of variation
260
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Kurtosis
Kurtosis
261
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords LCLM
onesided confidence limit below the mean
262
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords MAX
Max Value
263
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Mean
Average
264
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Min
Min Value
265
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords N
number of observations with nonmissing values
266
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords NMISS
number of observations with missing values
267
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Range
Range
268
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Skewness
Skewness
269
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords STDDEV / STD
standard deviation
270
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords STDERR
standard error of the mean
271
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords SUM
Sum
272
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords SUMWGT
sum of the weight variable values
273
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords UCLM
onesided confidence limit above the mean
274
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords USS
uncorrected sum of squares
275
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords VAR
Variance
276
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: MEDIAN / P50
median or 50th percentile
277
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: P1
1st percentile
278
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: P5
5th percentile
279
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: P10
10th percentile
280
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: Q1 / P25
lower quartile or 25th percentile
281
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: Q3 / P75
upper quartile or 75th percentile
282
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: P90
90th Percentile
283
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: P95
95th Percentile
284
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: P99
99th Percentile
285
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Quantile Stat: QRANGE
difference between upper and lower quartiles: Q3Q1
286
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Hypothesis Testing: PROBT
probability of a greater absolute value for the t value
287
**Lesson 14: Creating Summary Reports and Data Sets** PROC MEANS Statistic Keywords Hypothesis Testing: T
Student's t for testing the hypothesis that the population mean is 0
288
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Overview of SAS/GRAPH Software
To produce a SAS/GRAPH report, you begin by specifying a GOPTIONS statement to set any global options that you want for your graphs. Then you add other global statements such as TITLE and FOOTNOTE statements. Next, you write the PROC step for the specific type of graph you want. To end the procedure, you use a QUIT statement
289
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Specifying Global Statements in SAS/GRAPH Programs
You use the GOPTIONS statement to control graphics options that apply to all graphs. To reset all graphics options, you specify RESET=ALL in the GOPTIONS statement. You can also set specific options; for example, you can set CBAK=WHITE to set the background color to white. You can use other global statements in your SAS/GRAPH programs, such as TITLE and FOOTNOTE statements. **GOPTIONS ;**
290
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Creating Bar and Pie Charts PROC GCHART
Specify the chart form such as VBAR, VBAR3D, HBAR, HBAR3D, PIE, or PIE3D. By default, the size of the bars or pie slices in a chart represents a frequency count of the values of the chart variable. PROC GCHART creates a bar for each midpoint, which is a value that represents either one data value of the middle of a range of data values. Specify options that defines the form of the bar or pie chart, such as the NOSTATS option, the AUTOREF option, the LEVELS= option, the RANGE option, and the DISCRETE option. You can use the TYPE= option to specify a statistic other than the default statistic FREQ. For bar charts, you can specify FREQ, CFREQ, PERCENT, or CPERCENT. You can add a LABEL statement to a PROC GCHART step to specify labels for the chart. You can also specify other statistics to display above or to the right of the bars, including SUM, FREQ, and PERCENT. Other options you can use with your bar charts include the PATTERNID=MIDPOINT option, the SUBGROUP= option, and the GROUP= option. When you create pie charts, you can also use options such as SUMVAR=, TYPE=, GROUP= and SUBGROUP=. PROC GCHART DATA=SAS-data-set; chart-form chart-variable ; RUN;
291
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Bat Chart SUMVAR
SUMVAR= option to summarize a variable for each unique value of the chart variable. The SUMVAR= option specifies the variable whose values control the height or length of the bars; this variable is known as the summary variable or the analysis variable. You can use the TYPE= and SUMVAR= options together, but in that case the value of TYPE= must be either SUM or MEAN.
292
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Creating Plots
You can use the GPLOT procedure to create plots of one variable plotted on the horizontal axis and a second varaible plotted on the vertical axis. PROC GPLOT can create single plots and overlay plots. You can use other statements such as the FORMAT statement and the LABEL statement within a PROC GPLOT step. The SYMBOL statement specifies plotting symbols, plot lines, and color. You can use multiple SYMBOL statements within one PROC GPLOT step. In the SYMBOL statement, you can use the options C=, I=, CV=, and CI=. To overlay plots on the same set of axes, you use a PLOT statement that specifies two separate plot requests, and you use the OVERLAY option. You can use options in the PLOT statement such as the VREF= option, the CFRAME= option, and the HAXIS= and VAXIS= **PROC GPLOT DATA=SAS-data-set; PLOT vertical-variable\*horizontal-variable ; SYMBOL\<1…255\> ; RUN;**
293
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Enhancing SAS/GRAPH Output
Can apply styles to your charts. Remember that you use the STYLE= option in the ODS statement to specify an ODS style. You can add the FTEXT=, CTEXT=, and HTEXT= options to the GOPTIONS statement to control the appearance of all text You can add options to other global statements such as the TITLE statement and FOOTNOTE statement. You can use the FONT=, COLOR=, HEIGHT=, or JUSTIFY= options. **ODS destination STYLE=style-definition;**
294
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Sample Code Creating Bar and Pie Charts
goptions reset=all border; title 'Employees by Job Title'; proc gchart data=orion.staff; vbar3d Job\_Title / sumvar=salary type=mean mean; where Job\_Title=:'Sales Rep'; label Salary='Average Salary' Job\_Title='Job Title'; run; quit;
295
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Sample Code Creating Plots
proc gplot data=orion.budget; plot Yr2006\*Month yr2007\*Month/ overlay vref=3000000 cframe='#FFFFE0' vaxis=1000000 to 5000000 by 1000000; haxis=1 to 12; format Yr2006 dollar12.; label Yr2006='Budget'; symbol1 i=join v=dot ci=blue cv=blue; symbol2 i=join v=triangle ci=red cv=red; run;
296
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Sample Code Enhancing SAS/GRAPH Output
ods html style=gears; goptions reset=all ftext=arial ctext=dark\_blue htext=6 pct; proc gchart data=orion.staff; pie3d Job\_Title; where Job\_Title=:'Sales Rep'; label Gender='Gender' Job\_Title='Job Title'; title f=centbi h=5 pct 'Budget by Month'; footnote c=green j=left 'Data for 2007'; run;
297
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Sample Code Using RUN-Group Processing
proc gchart data=orion.staff; vbar Job\_Title; title 'Bar Chart of Job Titles'; run; pie3d Job\_Title; title 'Pie Chart of Job Titles'; run; proc gplot data=orion.budget; plot Yr2006\*Month; symbol1 i=join v=dot ci=blue cv=blue; symbol2 i=join v=triangle ci=red cv=red; run; quit;
298
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Enhancing the Bar Chart Example 1
299
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Enhancing the Bar Chart Example 2
300
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Discrete Variable
A finite number of specific numeric values. For example, a variable that contains years, such as 1984 or 2001, is a discrete variable.
301
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Continuous Variable
A range of numeric values. For example, a variable of temperature data that contains real values between 0 and 212 is a continuous variable
302
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Using Symbol Statements
303
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Enhancing Plots
304
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** Appearance Options in the GOPTIONS Staement
305
**Lesson 15: Creating Graphs Using SAS/GRAPH Software** RUN-Group Processing
306
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Outputting Multiple Observations
You can control when SAS writes an observation to a SAS data set by using an explicit OUTPUT statement in your code. After you use an explicit OUTPUT statement, there is no implicit OUTPUT statement at the bottom of the DATA step. OUTPUT ;
307
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Writing to Multiple SAS Data Sets
To create more than one data set, you specify the names of the SAS data sets you want to create in the DATA statement. Separate SAS data set names with a space. You can use OUTPUT statements with conditional logic to create multiple data sets that contain observations based on the value of a variable in the input data set. **DATA SAS-data-set-name SAS-data-set-name-n;**
308
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Writing to Multiple SAS Data Sets SELECT Statement
Use for conditional processing in a DATA step. The optional SELECT expression specifies any SAS expression that evaluates to a single value. Often a variable name is used as the SELECT expression. The WHEN statement begins with the keyword WHEN followed by at least one WHEN expression. The WHEN expression can be any SAS expression, including a compound expression. The WHEN expression is followed by a statement, which can be any executable SAS statement. The optional OTHERWISE statement specifies a statement to be executed if no WHEN condition is met. The keyword END signals the end of the select group. When a SELECT expression is specified, SAS evaluates the SELECT expression and compares that value to each WHEN expression and returns either a true or false. If no SELECT expression is specified, SAS evaluates each WHEN expression in order until it finds a true expression. If no WHEN expression is true, SAS executes the OTHERWISE statement if one is present. You can use functions in a SELECT expression. You can use DO-END groups in a SELECT group. You can execute multiple statements when a WHEN expression is true by using DO-END groups. SELECT \<(select-expression)\>; WHEN-1 (when-expression -1 \<…, when-expression-n\>) statement; WHEN-n (when-expression -1 \<…, when-expression-n\>) statement; END;
309
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Controlling Variable Input and Output
The DROP= and KEEP= data set options can be used to specify variables to drop or keep in the output data. When used in the SET statement, the variables are not processed and are not available in the program data vector. When used in the DATA statement, they affect the variables in the output data set they are associated with. Affect all output data sets listed in the DATA statement. You can use a combination of data set options and statements. If you use them together, statements are applied before data set options. If you attempt to drop and keep the same variable, you will get a warning. **SAS-data-set-name (DROP=variable(s))** **SAS-data-set-name (KEEP=variable(s))**
310
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Controlling Observation Input and Output OBS=
specifies the number of the last observation to process. It does not specify how many observations should be processed SAS-data-set-name (OBS=n)
311
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Controlling Observation Input and Output FIRSTOBS=
specifies a starting point for processing an input data set. By default, FIRSTOBS=1 **SAS-data-set-name (FIRSTOBS=n)**
312
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Controlling Observation Input and Output OBS= + FIRSTOBS=
Use the OBS= and FIRSTOBS= data set options to limit the number of observations that SAS processes. can use FIRSTOBS= and OBS= together to define a range of observations for SAS to process. FIRSTOBS= and OBS= can be used in a procedure step to limit the number of observations that are processed. If a WHERE statement is used to subset the observations, it is applied before the data set options
313
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Sample Code Outputting Multiple Observations
data forecast; set orion.growth; Year=1; Total\_Employees=Total\_Employees\*(1+Increase); output; Year=2; Total\_Employees=Total\_Employees\*(1+Increase); output; run;
314
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Sample Code Writing to Multiple SAS Data Sets (Using a SELECT Group)
data usa australia other; set orion.employee\_addresses; select (Country); when ('US') output usa; when ('AU') output australia; otherwise output other; end; run;
315
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Sample Code Writing to Multiple Data Sets (Using a SELECT Group with DO-END Group in the WHEN statement)
data usa australia other; set orion.employee\_addresses; select (upcase(Country)); when ('US') do; Benefits=1; output usa; end; when ('AU') do; Benefits=2; output australia; end; otherwise; Benefits=0; output other; end; end; run;
316
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Sample Code Controlling Variable Input and Output
data usa australia(drop=State) other; drop Country; set orion.employee\_addresses (drop=Employee\_ID); if Country='US' then output usa; else if Country='AU' then output australia; else output other; run;
317
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Sample Code Controlling Observation Input and Output
data australia; set orion.employee\_addresses (firstobs=50 obs=100); if Country='AU' then output; run; proc print data=orion.employee\_addresses (obs=10); where Country='AU'; var Employee\_Name City State Country; run;
318
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Review: IF-THEN/ELSE Statement
IF-THEN statement executes a SAS statement when the condition in the IF clause is true. The syntax for the IFTHEN statement: **IF expression THEN statement;** **data work.clean; set orion.nonsales; if Employee\_ID=120106 then Salary=26960; run;** SAS executes the assignment statement only when the condition (Employee\_ID=120106) is true. If the condition is false, SAS does not perform the action in the THEN statement Suppose you want to assign a value to Salary based on the other possible values of Employee\_ID. One way to do this is to add IF-THEN statements for the other two conditions, as shown below: **if Employee\_ID=120106 then Salary=26960;** **if Employee\_ID=120115 then Salary=26500;** **if Employee\_ID=120191 then Salary=24015;** However, when the DATA step executes, SAS evaluates each IF statement in order, even if the first condition is true. This wastes system resources and slows the processing of your program. Instead of using a series of IF-THEN statements, you can use the ELSE statment to specify an alternative action to be performed when the condition in an IF-THEN statement is false. As shown below, you can write multiple ELSE statements to specify a series of mutually exclusive conditions: **if Employee\_ID=120106 then Salary=26960;** **else if Employee\_ID=120115 then Salary=26500;** **else if Employee\_ID=120191 then Salary=24015;** The ELSE statement must immediately follow the IF-THEN statement in your program. An ELSE statement executes only if the previous IF-THEN/ELSE statement is false. Using IFTHEN statements without the ELSE statement causes SAS to evaluate all IFTHEN statements. Using IFTHEN statements with the ELSE statement causes SAS to execute IF-THEN statements until it encounters the first true statement. Subsequent ELSE statements are not evaluated.
319
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Using a SELECT Group for Conditional Processing
320
**Lesson 1: Controlling Input and Output** Data Manipulation Techniques Review: DO Groups
To specify multiple statements to be executed as a unit, you can use a DO group. Sequence of statements that starts with a simple DO statement and that ends with a corresponding END statement. **DO; executable statements END;** Suppose you want to list the amount and frequency of bonuses for employees of Orion Star Sports & Outdoors. Orion.Sales contains some information about all Orion Star employees, such as name and location, it does not contain bonus information. HR has told you that employees in the United States receive $500 once a year and employees in Australia receive $300 twice a year. The following program creates the temporary data set Bonus, which contains the information that you need. In this program, two DO groups are used within an IF-THEN/ELSE statement. For each value of Country, SAS determines which IF condition is true, and then executes the statements in the associated DO group. In this way, SAS assigns countryspecific values to the variables Bonus and Freq. **data work.bonus; set orion.sales; length Freq $ 12; if Country='US' then do; Bonus=500; Freq='Once a Year'; end; else if Country='AU' then do; Bonus=300; Freq='Twice a Year'; end; run;**
321
**Lesson 2: Summarizing Data** Data Manipulation Techniques Creating an Accumulating Variable Using the RETAIN Statement
322
**Lesson 2: Summarizing Data** Data Manipulation Techniques Creating an Accumulating Variable
An accumulating variable accumulates the value of another variable and keeps its value from one observation to the next. Because the variable is not reinitialized, it retains its value across multiple iterations of the DATA step.
323
**Lesson 2: Summarizing Data** Data Manipulation Techniques Creating an Accumulating Variable Using the RETAIN Statement
A compile-time-only statement that prevents SAS from reinitializing the variable at the top of the DATA step. Because the variable is not reinitialized, it retains its value across multiple iterations of the DATA step. The RETAIN statement starts with the keyword RETAIN followed by the name of the variable whose values you want to retain. You can optionally specify an initial value for the variable. If you don't specify an initial value, the RETAIN statement initializes the variable to missing before the first execution of the DATA step. **RETAIN variable-name …;**
324
**Lesson 2: Summarizing Data** Data Manipulation Techniques Creating an Accumulating Variable Using the Sum Statement
As an alternative to using the RETAIN statement with an assignment statement, you can use the sum statement. By default, the sum statement initializes the variable to 0 and retains the variable. It also ignores missing input values from the expression. **variable+expression;**
325
**Lesson 2: Summarizing Data** Data Manipulation Techniques Using BY-Group Processing: Summarizing Data by Groups
When you need to accumulate totals for a group of data, (for example, if you need to see total salaries allocated to special projects by department), the input data set needs to be sorted on the BY-variable. You can then use a BY statement in the DATA step to process the data in groups. **DATA output-SAS-data-set; SET input-SAS-data-set; BY BY-variable …; RUN;** The BY statement creates two temporary variables (FIRST.variable and LAST.variable) for each BY variable listed. These variables identify the first and last observation in each BY group. You can use the FIRST. and LAST. variables in a DATA step to summarize the grouped data. First, set the accumulating variable equal to 0 at the start of each BY group. Second, increment the accumulating variable with a sum statement. Third, output only the last observation of each BY group. **FIRST. BY-variable** **LAST. BY-variable**
326
**Lesson 2: Summarizing Data** Data Manipulation Techniques Using BY-Group Processing: Summarizing Data by Multiple Groups
When you need to accumulate totals for multiple groups, you specify two or more BY variables. The first variable listed becomes the primary sort variable, and the second variable listed is the secondary sort variable. The BY statement creates two temporary variables for each BY variable listed. If the last observation for a value of the primary sort variable is encountered, it sets LAST. to 1 for all subsequent BY variables.
327
**Lesson 3: Reading Raw Data** Data Manipulation Techniques Using Column Input
You can use column input to read input data that is arranged in columns or fixed fields. To use column input your data must be standard data in fixed columns. Standard data is data that SAS can read without special instructions. Nonstandard data is data that SAS needs special instructions to read. **INPUT variable \<$\> startcol-endcol . . . ;**
328
**Lesson 3: Reading Raw Data** Data Manipulation Techniques Using Formatted Input
You can use formatted input to read both standard and nonstandard data that is arranged in fixed fields. An informat is the special instruction that specifies how SAS reads raw data. The @n is an absolute pointer control that specifies the beginning location for a field. The +n is a relative pointer control that specifies how many columns to move the pointer before reading the next field. **INPUT column-pointer-control variable informat . . . ;** **INPUT @n variable informat . . . ;** **INPUT +n variable informat . . . ;**
329
**Lesson 3: Reading Raw Data** Data Manipulation Techniques Creating a Single Observation from Multiple Records
You can use multiple INPUT statements to read a group of records in a raw data file as a single observation in a new data set. As an alternative to writing multiple INPUT statements, you can write one INPUT statement that contains line pointer controls to specify the record(s) from which values are to be read. There are two line pointer controls, the forward slash (a relative line pointer control) and the #n (an absolute line pointer control). **DATA SAS-data-set; INFILE 'raw-data-file-name'; INPUT specifications; INPUT specifications; ** **DATA SAS-data-set; INFILE 'raw-data-file-name'; INPUT specifications / #n specifications; **
330
**Lesson 3: Reading Raw Data** Data Manipulation Techniques Controlling When a Record Loads
By default, each INPUT statement in a DATA step reads a new data record into the input buffer. You can use a line-hold specifier, the single trailing @, to prevent the second INPUT statement in a DATA step from moving to the second line in a raw data file. When you use the trailing @, the pointer position doesn't change and a new record isn't loaded in the input buffer when a subsequent input statement executes. The single trailing @ holds a raw data record in the input buffer until an INPUT statement without a trailing @ executes or the next iteration of the DATA step begins.
331
**Lesson 3: Reading Raw Data** Data Manipulation Techniques Reading Raw Data with Missing Values MISSOVER
By default, the DATA step looks in the next record if the end of the current record is encountered before all of the variables are assigned values. This default action is known as the FLOWOVER option. You can override the FLOWOVER option by using the MISSOVER option, which causes the DATA step to assign missing values to any variables that do not have values in the PDV when the end of a record is reached. The MISSOVER option works only for missing values that occur at the end of the record. You can use the DSD option in the INFILE statement to change how SAS treats delimiters when list input is used. The DSD option sets the default delimiter to a comma, treats consecutive delimiters as missing values, and enables SAS to read values with embedded delimiters if the value is surrounded by quotation marks.
332
**Lesson 3: Reading Raw Data** Data Manipulation Techniques Creating Multiple Observations from a Single Record
You can use the the double trailing @@, which is a line pointer control, to hold a record across iterations of the DATA step. The double trailing @@ should only be used with list input and should not be used with the MISSOVER option. A record that is being held by the double trailing @@ is not released until the input pointer moves past the end of the record or an INPUT statement that has no line-hold specifier executes.
333
**Lesson 3: Reading Raw Data** Data Manipulation Techniques Sample Code Using Formatted Input
data work.discounts; infile 'my-file-path\offers.dat'; input @1 Cust\_type 4. @5 Offer\_dt mmddyy8. @14 Item\_gp $8. @22 Discount percent3.; run; proc print data=work.discounts; run;
334
**Lesson 3: Reading Raw Data** Data Manipulation Techniques Sample Code Creating a Single Observation from Multiple Records
data contacts; infile 'my-file-path\address.dat'; input FullName $30.; input; input Address2 $25.; input Phone $8.; run; proc print data=work.contacts; run;
335
**Sesson 3: Reading Raw Data** Data Manipulation Techniques Sample Code Using a Single Trailing @
data salesQ1; infile 'my-file-path\sales.dat'; input SaleID $4. @6 Location $3. @; if Location='USA' then input @10 SaleDate mmddyy10. @20 Amount 7.; else if Location='EUR' then input @10 SaleDate date9. @20 Amount commax7.; run; proc print data=salesQ1; run;
336
**Sesson 3: Reading Raw Data** Data Manipulation Techniques Sample Code Using the MISSOVER Option
data work.contacts; length Name $ 20. Phone Mobile $ 14.; infile 'my-file-path\phone.csv' dlm=',' missover; input Name $ Phone $ Mobile $; run; proc print data=work.contacts; run;
337
**Sesson 3: Reading Raw Data** Data Manipulation Techniques Sample Code Using the DSD Option
data contacts2; length Name $ 20. Phone Mobile $ 14.; infile 'my-file-path\phone2.csv' dsd; input Name $ Phone $ Mobile $; run; proc print data=contacts2; run;
338
**Sesson 3: Reading Raw Data** Data Manipulation Techniques Sample Code Using the Double Trailing @@
data donate07; length ID $ 4; infile 'my-file-path\charity.dat'; input ID $ Amount @@; run; proc print data=work.donate07; run;
339
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Using SAS Functions
A SAS function is a routine that performs a calculation on, or a transformation of, the arguments listed in parentheses and returns a value. A target variable is a variable to which the result of the function is assigned. If the target variable is a new variable, the type/length are determined by the expression on the right side of the equals sign. If the expression uses a function whose result is numeric, then the target variable is numeric with a length of 8 bytes. If the expression uses a function whose result is character, then the target variable is character and the length is determined by the function. **function-name(argument-1\<,argument-n\>)**
340
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values SUBSTR
On the right side of an assignment statement, the SUBSTR function extracts a substring of characters from a character string, starting at a specified position in the string. **Var=SUBSTR(string,start\<,length\>);**
341
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values LENGTH
The LENGTH function returns the length of a character string, excluding trailing blanks. **LENGTH(argument) **
342
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values RIGHT
The RIGHT function right-aligns a value. If there are trailing blanks, they are moved to the beginning of the value. **RIGHT(argument) **
343
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values LEFT
The LEFT function left-aligns a character value. If there are leading blanks, they are moved to the end of the value. **LEFT(argument) **
344
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values CHAR
The CHAR function returns a single character from a specified position in a character string. **CHAR(string,position) **
345
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values PROPCASE
The PROPCASE function converts all letters in a value to proper case. **PROPCASE(argument\<,delimiter(s)\>) **
346
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values UPCASE
The UPCASE function converts all letters in a value to uppercase. **UPCASE(argument) **
347
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values LOWCASE
The LOWCASE function converts all letters in a value to lowercase. **LOWCASE(argument) **
348
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values SCAN
The SCAN function enables you to separate a character value into words and to return the nth word. **SCAN (string,n)**
349
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values CATX
The CATX function removes leading and trailing blanks, inserts separators, and returns a concatenated character string. **CATX (separator,string1,…,string-n) **
350
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values !!
The concatenation operator joins character strings. NewVar=string1 !! string2;
351
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values TRIM
The TRIM function removes trailing blanks from a character string. **TRIM(argument) **
352
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values STRIP
The STRIP function removes leading and trailing blanks from a character string. **STRIP(argument) **
353
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values CAT
Does not remove any leading or trailing blanks **CAT(string1,…,string-n)**
354
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values CATT
Trims trailing blanks. **CATT(string1,…,string-n)**
355
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values CATS
Strips leading and trailing blanks. **CATS(string1,…,string-n)**
356
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values FIND
The FIND function searches for a specific substring of characters within a character string that you specify. The function searches for the first occurrence of the substring and returns the starting position of that substring. If the substring is not found in the string, FIND returns a value of 0. **FIND (string,substring)**
357
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values SUBSTR
On the left side of the assignment statement, the SUBSTR function replaces characters at a specified position within the value. **SUBSTR(string,start\<,length\>)=value; **
358
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values TRANWRD
Replaces or removes all occurrences of a given word (or a pattern of characters) within a character string. **TRANWRD (source,target,replacement) **
359
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Extracting and Transforming Character Values COMPRESS
Removes the characters listed in the chars argument from the source. If no characters are specified, the COMPRESS function removes all blanks from the source. **COMPRESS (source\<,chars\>) **
360
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Sample Code Extracting and Transforming Character Values
data charities(drop=Len); length ID $ 5; set orion.biz\_list; Len=length(Acct\_Code); if substr(Acct\_Code,Len,1)='2'; ID=substr(Acct\_Code,1,Len-1); run; data charities(drop=Code\_Rt); length ID $ 5; set orion.biz\_list; Code\_Rt=right(Acct\_Code); if char(Code\_Rt,6)='2'; ID=left(substr(Code\_Rt,1,5)); run;
361
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Sample Code Separating and Concatenating Character Values
data labels; set orion.contacts; length FMName LName $ 15; FMName = scan(Name,2,','); LName = scan(Name,1,','); FullName=catx(' ',title,fmname,lname); run;
362
**Lesson 4: Manipulating Character Values** Data Manipulation Techniques Sample Code Finding and Modifying Character Values
data correct; set orion.clean\_up; if find(Product,'Mittens','I')\>0 then do; substr(Product\_ID,9,1) = '5'; Product=tranwrd(Product,'Luci ','Lucky '); end; Product=propcase(Product); Product\_ID=compress(Product\_ID); run;
363
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions SUM
the sum of the nonmissing arguments
364
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions MEAN
the arithmetic mean (average) of the arguments
365
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions MIN
the smallest value from the arguments
366
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions MAX
the largest value from the arguments
367
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions N
the number of nonmissing arguments
368
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions NMISS
the number of missing numeric arguments
369
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions CMISS
the number of missing numeric or character arguments
370
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions
A SAS function is a routine that performs a calculation on, or a transformation of, the arguments listed in parentheses and returns a value. You can list all the variables in the function, or you can use a variable list by preceding the first variable name in the list with the keyword OF. There are several types of variable lists including numbered ranges, name ranges, name prefixes, and special SAS names. **function-name(argument-1, argument-2,…,argument-n)**
371
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions Numbered Range
all variables x1 to xn, inclusive **Total = sum(of Qtr1-Qtr4);**
372
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions Name Range
all variables ordered as they are in the program data vector, from x to a inclusive **Total = sum(of Qtr1--Fourth);**
373
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions Name Prefix
all variables that begin with the same string **Total = sum(of Tot:);**
374
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Using Descriptive Statistics Functions Special SAS Name List
all of the variables, all of the character variables, or all numeric variables that are already defined in the current DATA step **Total = sum(of \_All\_); Total = sum(of \_Character\_); Total = sum(of \_Numeric\_);**
375
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Truncating Numeric Values ROUND
Returns a value rounded to the nearest multiple of the round-off unit. If you don't specify a round-off unit, the argument is rounded to the nearest integer. **ROUND(argument\<,round-off-unit\>)**
376
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Truncating Numeric Values CEIL
The CEIL function returns the smallest integer greater than or equal to the argument. **CEIL(argument)**
377
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Truncating Numeric Values FLOOR
The FLOOR function returns the greatest integer less than or equal to the argument. **FLOOR(argument)**
378
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Truncating Numeric Values INT
The INT function returns the integer portion of the argument. **INT(argument)**
379
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Converting Values Between Data Types
You can allow SAS to automatically convert data to a different data type for you, but it can be more efficient to use SAS functions to explicitly convert data to a different data type. By default, if you reference a character variable in a numeric context, SAS tries to convert the variable values to numeric. Automatic conversion uses the w. informat, and it produces a numeric missing value from any character value that does not conform to standard numeric notation.
380
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Converting Values Between Data Types INPUT
You can use the INPUT function to explicitly convert character values to numeric values. The INPUT function returns the value that is produced when the source is read with a specified informat. Numeric data values are automatically converted to character values whenever they are used in a character context. For example, SAS automatically converts a numeric value to a character value when you use the concatenation operator. When SAS automatically converts a numeric value to a character value, SAS writes the numeric value with the BEST12. format and right aligns the value. The resulting value might contain leading blanks. **INPUT(source, informat)**
381
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Converting Values Between Data Types PUT
You can use the PUT function to explicitly control the numeric-to-character conversion using a format. **PUT(source, format)**
382
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Sample Code Using Descriptive Statistics Functions and Truncating Numeric Values
data donation\_stats; set orion.employee\_donations; keep Employee\_ID Total AvgQT NumQT; Total = sum(of Qtr1-Qtr4); AvgQT = round(Mean(of Qtr1-Qtr4),1); NumQt = n(of Qtr1-Qtr4); run; proc print data=donation\_stats; run;
383
**Lesson 5: Manipulating Numeric Values** Data Manipulation Techniques Sample Code Converting Values Between Data Types
data hrdata; keep EmpID GrossPay Bonus Phone HireDate; set orion.convert; EmpID = ID+11000; Bonus = input(GrossPay,comma6.)\*.10; Phone = '(' !! put(Code,3.) !! ') ' !! Mobile; HireDate = input(Hired,mmddyy10.); run; proc print data=hrdata; format HireDate mmddyy10.; run;
384
**Lesson 6: Debugging Techniques** Data Manipulation Techniques Identifying Logic Errors
Syntax errors occur when programming statements don’t conform to the rules of the SAS language. When a syntax error occurs, SAS writes an error message to the log. Logic errors occur when the programming statements follow the rules but the results aren’t correct. Since the statements conform to the rules, SAS doesn't write an error message to the log. The lack of messages can make logic errors more difficult to detect and correct than syntax errors.
385
**Lesson 6: Debugging Techniques** Data Manipulation Techniques Using PUTLOG Statements
Display messages, variable names, and variable values in the log. This technique is helpful when you suspect that the value of a variable might be causing a logic error. By default, the PUTLOG statement writes character values with the standard character format $w. To use a different format, specify the name of the variable followed by the format name and width. When you're debugging a program, it's often helpful to see what SAS has stored in the program data vector. To write the current contents of the PDV to the log, use the \_ALL\_ option in the PUTLOG statement. When you use the \_ALL\_ option, the values of the automatic variables \_ERROR\_ and \_N\_ are included in the log. **PUTLOG ; PUTLOG 'text'; PUTLOG variable-name=; PUTLOG variable-name=format-namew.; PUTLOG \_ALL\_;**
386
**Lesson 6: Debugging Techniques** Data Manipulation Techniques Using PUTLOG Statements END and SET
Use the END= option in the SET statement to create and name a temporary variable that acts as an end-of-file indicator. You can also use the END= option in an INFILE statement to indicate the end of a raw data file. The value of the END= variable is initialized to 0 and is set to 1 when the SET statement reads the last observation from an input data set or when the INPUT statement reads the last observation from a raw data file. You can check the value of the END= variable in an IF statement to conditionally execute PUTLOG statements. **SET SAS-data-set END= variable ; INFILE 'raw-data-file' END= variable ;**
387
**Lesson 6: Debugging Techniques** Data Manipulation Techniques Using PUTLOG Statements Using the DATA Step Debugger
This tool consists of windows and a group of commands. By issuing commands, you can execute DATA step statements one by one and pause to display the resulting variable values in a window. By observing the results that are displayed, you can determine where the logic error occurs. The DATA Step Debugger can only be used in an interactive session. To invoke the debugger, you use the DEBUG option in the DATA statement. Because the debugger is interactive, you can repeat the process of issuing commands and observing results as many times as you need during a single debugging session. Once the DATA step comes to an end, it can’t be restarted. You can examine the final values of the variables. But, to restart the debugging process, you have to quit and then restart the DATA Step Debugger. **DATA data-set-name/DEBUG;**
388
**Lesson 6: Debugging Techniques** Data Manipulation Techniques Sample Code Using PUTLOG Statements
data us\_mailing; set orion.mailing\_list (obs=10); drop Address3; length City $ 25 State $ 2 Zip $ 5; putlog \_n\_=; putlog "Looking for country"; if find(Address3,'US') ; putlog "Found US"; Name=catx(' ',scan(Name,2,','),scan(Name,1,',')); City=scan(Address3,1,','); State=scan(address3,2,','); Zip=scan(Address3,3,','); putlog State=$quote4. Zip=$quote7.; run;
389
**Lesson 6: Debugging Techniques** Data Manipulation Techniques Sample Code Using the DATA Step Debugger
data us\_mailing /debug; set orion.mailing\_list; drop Address3; length City $ 25 State $ 2 Zip $ 5; if find(Address3,'US'); Name=catx(' ',scan(Name,2,','),scan(Name,1,',')); City=scan(Address3,1,','); State=scan(Address3,2,','); Zip=scan(Address3,3,','); run;
390
**Lesson 6: Debugging Techniques** Data Manipulation Techniques Valid Reason for using PUTLOG
391
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Constructing a Simple DO Loop
An iterative DO loop executes the statements between the DO statement and the END statement repetitively. If you do not specify an increment for a DO loop, the increment defaults to 1. If your start value is greater than your stop value, you must specify an increment that is negative. You can use an item list rather than a start value and stop value to control your DO loop. The items must either be all numeric or all character constants, or they can be variables. Can use the OUTPUT statement within to explicitly write the values out to the data set on each iteration of the DO loop. **DO index-variable=start TO stop ;** **iterated SAS statements…** **END;** **DO index-variable=item-1 ;** **iterated SAS statements…** **END;**
392
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Understanding DO Loop Logic
393
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques DO Loop Values
394
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Understanding DO Loop Logic
395
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Understanding DO Loop Logic DO Statment with Item List
396
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Understanding DO Loop Logic How Many Times Loop Executes
397
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Conditionally Executing DO Loops DO UNTIL
You can use a DO UNTIL statement instead of a simple DO statement. In a DO UNTIL statement, you specify a condition and SAS executes the loop until that condition is true. In a DO UNTIL loop, SAS evaluates the expression at the bottom of the loop after each iteration. **DO UNTIL (expression);** ** iterated SAS statements…** **END;**
398
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Conditionally Executing DO Loops DO WHILE
You can use a DO WHILE statement instead of a simple DO statement. In a DO WHILE statement, you specify a condition and SAS executes the loop while the condition is true. In a DO WHILE loop, SAS evaluates the condition at the top of the loop and executes the statements within the loop if the condition is true. **DO WHILE (expression);** **iterated SAS statements…** **END;**
399
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Conditionally Executing DO Loops DO WHILE | UNTIL
It is possible to create a DO WHILE loop that never executes; it is also possible to create a DO WHILE or a DO UNTIL loop that executes infinitely. You should write your conditions and iterated statements carefully. Combine DO UNTIL and DO WHILE statements with iterative DO statement, which is a way to avoid creating an infinite loop. In this case, the loop executes either until the value of the index variable exceeds the specified range or until the expression is true for an UNTIL clause or false for a WHILE clause.
400
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Nesting DO Loops
You can nest DO loops in a DATA step. You must use different index variables for each loop, and you must be certain that each DO statement has a corresponding END statement. DO index-variable=start TO stop ; iterated SAS statements… DO index-variable=start TO stop ; iterated SAS statements… END; iterated SAS statements… END;
401
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques DO Loop Execution
402
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Sample Code Using an Iterative DO Loop
data compound; Amount=50000; Rate=.045; do i=1 to 20; Yearly+(Yearly+Amount)\*Rate; end; do i=1 to 80; Quarterly+((Quarterly+Amount)\*Rate/4); end; run; proc print data=work.compound; run;
403
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Sample Code Using a DO Loop to Reduce Redundant Code
data forecast; set orion.growth; do Year=1 to 6; Total\_Employees= Total\_Employees\*(1+Increase); output; end; run; proc print data=forecast noobs; run;
404
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Sample Code Conditionally Executing DO Loops
data invest; do until (Capital\>1000000); Year+1; Capital +5000; Capital+(Capital\*.045); end; run; proc print data=invest noobs; run;
405
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Sample Code Using an Iterative DO Loop with a Conditional Clause
data invest; do year=1 to 30 until (Capital\>250000); Capital +5000; Capital+(Capital\*.045); end; run; data invest2; do year=1 to 30 while (Capital\<=250000); Capital +5000; Capital+(Capital\*.045); end; run; proc print data=invest; format Capital dollar14.2; run; proc print data=invest2; format Capital dollar14.2; run;
406
**Lesson 7: Using Iterative DO Loops** Data Manipulation Techniques Sample Code Nesting DO Loops
data invest (drop=Quarter); do Year=1 to 5; Capital+5000; do Quarter=1 to 4; Capital+(Capital\*(.045/4)); end; output; end; run; proc print data=invest; format Capital dollar14.2; run;
407
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Understanding SAS Arrays
A SAS array is a temporary grouping of elements that exists only for the duration of the DATA step. Unlike arrays in other programming languages, SAS arrays are not data structures. An array is identified by a single unique name. The array name must be different from any variable name in the data set you are referencing. All variables that are grouped together in an array must be the same type: either all character or all numeric. Arrays can be one-dimensional or multi-dimensional.
408
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Creating SAS Arrays
**ARRAYarray-name {dimension};** \_array-name specifies the name of the array. \_dimension describes the number and arrangement of elements in the array. The default dimension is one. The array dimension must be enclosed in braces, parentheses, or brackets. It is best to use braces or brackets so that there is no confusion with functions. You can use an asterisk to indicate the dimension of a one-dimensional array. When you use an asterisk, SAS determines the dimension of the array by counting the variables in the list of array elements. You can list each variable name separated by a space, or you can use a numbered range list or name range list specification. To create an array of character variables, type a dollar sign ($) after the dimension in the ARRAY statement. By default, all character variables that are created in an ARRAY statement are assigned a length of 8. You can assign a different length by specifying the length after the dollar sign. \_array-elements lists the variables to include in the array. Array elements can be listed in any order, but they must be either all numeric or all character. You can list each variable name separated by a space, or you can use a variable list. You can also specify the keywords \_NUMERIC\_ or \_CHARACTER\_ .
409
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Processing SAS Arrays
The syntax for an array reference is the name of the array, followed by a subscript enclosed in braces, brackets, parentheses. The subscript can be an integer, a variable, or a SAS expression. Typically, arrays are used with DO loops to process multiple variables and to perform repetitive calculations. When you use a DO loop, index variable is used as the array subscript and the DO loop references each element of the array. **ARRAYarray-name {dimension};**
410
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Processing SAS Arrays DIM
**DIM (array-name)** Another way to specify the stop value of a DO loop is to use the DIM function. The DIM function returns the number of elements in the array.
411
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Processing SAS Arrays Using SAS Arrays to Create Variables and Perform Calculations
When you do not reference existing variables in the ARRAY statement, SAS automatically creates the variables and assigns default names to them. Default variable names are created by concatenating the array name and numbers 1, 2, 3, +, up to the array dimension. When you create variables in an ARRAY statement, the default variable names will match the case used for the array name. Alternatively, you can specify the new variable names by listing them in the ARRAY statement. Variables that you create in an ARRAY statement all have the same variable type. If you want to create an array of character variables, you must add a dollar sign after the array dimension. By default, all character variables that are created in an ARRAY statement are assigned a length of 8. Assign a different length for variables by specifying length for each variable in LENGTH statement prior to ARRAY statement. You can pass an array to a function using the keyword OF. This is similar to passing a variable list to a function. **ARRAYarray-name {dimension}\<$\> ;**
412
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Processing SAS Arrays Assigning Initial Values to an Array
To assign initial values in an ARRAY statement, you place the values in an initial value list. In this list, you specify one initial value for each corresponding array element. Elements and values are matched by position, so the values must be listed in the order of the array elements. You separate each value with a comma or blank, and you enclose the values in parentheses. If you are assigning character values, each value must be enclosed in quotation marks. When you specify an initial value list, all elements behave as if they were named in a RETAIN statement. This creates a lookup table, that is, a list of values to refer to during DATA step processing. Use the keyword \_TEMPORARY\_ in an ARRAY statement to indicate that the elements are not needed in the output data set. Temporary arrays are useful when you only need the array to perform a calculation. You can improve performance time by using temporary array elements. **ARRAY array-name{dimension} \<\_TEMPORARY\_\> \<(initial-value-list)\>;**
413
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Sample Code Processing SAS Arrays
data charity; set orion.employee\_donations; keep eomployee\_id qtr1-qtr4; array contrib {\*} qtr1-qtr4; do i=1 to dim(contrib); contrib {i}=contrib{i}\*1.25 end; run;
414
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Sample Code Using SAS Arrays to Create Variables and Perform Calculations
data percent (drop=i) set orion.employee\_donations; array contrib {4} qtr1-qtr4; array Pct {4}; Total=sum(of contrib {\*}); do i=1 to 4; pct{i}=contrib{i}/Total; end; run;
415
**Lesson 8: Using SAS Arrays** Data Manipulation Techniques Sample Code Assigning Initial Values to an Array
data compare (drop=i); set orion.employee\_donations; array Contrib {4} Qtr1-Qtr4; array Diff{4}; array Goal{4} \_temporary\_ (10,20,20,15); do i=1 to 4 Diff {i}=sum(Contrib{i},-Goal{i}); end; run;
416
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques An Overview of Data Set Structure Wide Data Set
Data sets that store all the information about one entity in a single observation.
417
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques An Overview of Data Set Structure Narrow Data Set
Multiple observations per entity. Each observation typically contains a small amount of data and missing values might or might not be stored.
418
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques An Overview of Data Set Structure
When you write a program, you need to consider the data available, the output you want, and the processing required. To create your output, you might need to restructure the data. Depending on the type of analysis or report you want to run, you might need to convert a narrow data set to a wide data set, or a wide data set to a narrow data set.
419
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques Rotating with the DATA Step
Restructuring a data set is sometimes referred to as rotating a data set. You can use a DATA step to restructure a data set. Using a DATA step enables you to create multiple data sets, direct output to new data sets based on which data set contributed to the observation, use FIRST. and LAST. processing, and use complex data manipulation techniques.
420
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques Transposing a Data Set
Alternative to writing a lengthy DATA step, use the TRANSPOSE procedure to restructure the values in a SAS data set. The TRANSPOSE procedure transposes selected variables into observations . You can use a number of options and statements with PROC TRANSPOSE. PROC TRANSPOSE transposes numeric variables by default. Character variables are transposed only if they’re explicitly listed in a VAR statement. PROC TRANSPOSE doesn’t print the output data set. So, you need to use PROC PRINT, PROC REPORT, or some other SAS reporting tool if you want to print the output data. You can use a BY statement with PROC TRANSPOSE. For each BY group, PROC TRANSPOSE creates one observation for each variable that it transposes. The BY variable is not transposed. When you use a BY statement with PROC TRANPOSE, unless you specify the NOTSORTED option, the original data set must be sorted or indexed by all BY variables. Specifies that the observations are not necessarily sorted in alphabetic or numeric order. For example, the data might be grouped in some other way, such as chronological order. In the output data set, \_NAME\_ is the default name of the variable that PROC TRANSPOSE creates to identify the source of the values in each observation in the output data set. The remaining transposed variables are named COL1 through COLn. The default label for the \_NAME\_ variable is NAME OF FORMER VARIABLE. You can use the LABEL option to see this label in your PROC PRINT output. You can use the NAME= option to give the \_NAME\_ variable a more descriptive name. **PROC TRANSPOSE DATA=input-data-set ; BY variable(s) ; VAR variable(s); ID variable; RUN;**
421
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques Transposing a Data Set NAME=
The NAME= option names the column in the output data set containing the rotated variable names.
422
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques Transposing a Data Set ID Statement
Names the variable whose values become the names of the new variables.
423
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques Transposing a Data Set OUT=
Specify a new name for the output data set.
424
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques Modifying and Enhancing a Transformation
You can use an ID statement to specify the variable whose values will become the names of the new columns. When you use a numeric variable as an ID variable, PROC TRANSPOSE changes the formatted ID value to valid SAS name. You can use the PREFIX= option to specify a prefix for each new variable name.
425
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques Sample Code Rotating with the DATA Step
data rotate (keep=Employee\_ID Period Amount); set orion.employee\_donations (drop=Recipients Paid\_By); length Period $4; array contrib{4} Qtr1-Qtr4; do i=1 to 4; if contrib{i} ne . then do; Period=cats("Qtr",i); Amount=contrib{i}; output; end; end; run;
426
**Lesson 9: Restructuring a Data Set** Data Manipulation Techniques Sample Code Using the TRANSPOSE Procedure
proc transpose data=targets out=sales\_targets name=Month; by year; run; proc print data=sales\_targets noobs label; run;
427
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques The MERGE Statement
You Combine observations from two or more SAS data sets into a single observation in a new data set according to the values of one or more common variables. To perform a matchmerge, you specify at least two data sets in the MERGE statement. If you specify only one data set, SAS treats the MERGE statement like a SET statement. you use a BY statement immediately after the MERGE statement. The BY statement performs the matchmerge by matching observations on one or more BY variables. You must first make sure to sort your input data sets on the BY variable(s). The BY variable(s) must be common to all input data sets. **DATA SASdataset; MERGE SASdataset1 SASdataset2...; BY BYvariable(s); RUN;**
428
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques The IN= Data Set Option
You specify an IN= data set option after an input data set in the MERGE statement. SAS creates a temporary variable that indicates whether the data set contributed data to each output observation. The syntax of the IN= data set option is **SASdataset (IN=variable)** The IN= data set option has the following elements: * After a SAS data set name, you specify the IN= data set option in parentheses. * Following the equal sign (=) in the IN= option, you specify a valid SAS variable name. * Within the DATA step, the value of the variable is 1 if the data set contributes to the current observation. * Otherwise, the value is 0. SAS processes the IN= data set option in the DATA step as follows: * During the execution phase, SAS creates a temporary variable in the PDV for each instance of the IN= data set option in your code. * Each time SAS reads data into the PDV, SAS assigns a value to the temporary variables to * indicate whether the associated data set contributes data to the current observation. * If a data set contributes data, SAS assigns a value of 1 to the associated variable. * If a data set doesn’t contribute data, SAS assigns a value of 0. * 1 and 0 are the only possible values for the temporary variables that you create using the IN=option.
429
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Match-Merging SAS Data Sets
You can use the MERGE statement in a DATA step to combine SAS data sets with related data into a single observation in a new data set based on the values of one or more common variables. Before the merge, each input data set must first be sorted in order of the values of the BY variable(s). A match-merge produces matches (observations containing data from both input data sets) and non-matches (observations containing data from only one input data set) by default. To identify the data sets that contribute observations, you can use the IN= data set option. You specify an IN= data set option after an input data set in the MERGE statement. SAS creates a temporary variable that indicates whether the data set contributed data to each output observation. You can use the subsetting IF statement to output only those observations that contain data from all the input data sets or from just one data set. DATA SAS-data-set; MERGE SAS-data-set1 SAS-data-set2 . . . ; BY BY-variable(s); RUN; SAS-data-set (IN=variable)
430
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Using Data Manipulation Techniques with a Match-Merge
You can use an OUTPUT statement with the MERGE statement to better control your output. For example, you can direct the matches (observations containing data from both input data sets) to one data set, and the non-matches (observations containing data from only one input data set) to another data set. To control which variables appear in your output, you can use the KEEP= data set option, the DROP= data set option, the KEEP statement, or the DROP statement. You can use FIRST. and LAST. processing along with a sum statement to summarize merged data. **OUTPUT ;**
431
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Match-Merging Data Sets That Lack a Common Variable
If data sets don’t share a common variable, you can merge them using a series of merges in separate DATA steps. The data sets must be sorted by the appropriate BY variable.
432
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Match-Merging a SAS Data Set and an Excel Workbook
You can use the SAS/ACCESS LIBNAME statement to assign a libref to an Excel workbook. This way, SAS treats each worksheet within the workbook as though it is a SAS data set. When you assign a libref to an Excel workbook, SAS appends a dollar sign to the end of each Excel worksheet name. But a valid SAS data set name can’t contain a dollar sign. You use a SAS name literal to refer to an Excel worksheet in SAS code. You enclose the name of the worksheet, including the dollar sign, in quotation marks followed by the letter n. Remember to clear the libref to unlock the Excel workbook file. **LIBNAME libref 'physical-file-name';**
433
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Match-Merging Data Sets with Same-Named Variables
When you match-merge data sets that contain same-named variables (other than the BYvariables), the DATA step overwrites values of the same-named variable in the first data set in which it appears with values of the same-named variable in subsequent data sets. You can use the RENAME= data set option in the MERGE statement to rename variables in a data set. The RENAME= option doesn’t rename the variable in the input data set. Instead, it tells SAS which slot in the PDV to use when building the observation. **SAS-data-set (RENAME = (old-name-1 = new-name-1 \<…old-name-n = new-name-n\>))**
434
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Sample Code Match-Merging SAS Data Sets
data CustOrd; merge orion.customer(in=cust) work.order\_fact(in=order); by Customer\_ID; if cust and order; run;
435
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Sample Code Using Data Manipulation Techniques with a Match-Merge
data orders(keep=Customer\_Name Quantity Total\_Retail\_Price) summary(keep=Customer\_Name NumberOrders) noorders(keep=Customer\_Name Birth\_Date); merge orion.customer work.order\_fact(in=order); by Customer\_ID; if order=1 then do; output orders; if first.Customer\_ID then NumberOrders=0; NumberOrders+1; if last.Customer\_ID then output summary; end; else output noorders; run;
436
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Sample Code Match-Merging a SAS Data Set and an Excel Workbook
libname survey 'my-file-path\survey.xls'; data CustOrdProdSurv; merge CustOrdProd(in=c) survey.'Supplier$'n(in=s); by Supplier; if c=1 and s=1; run; libname survey clear;
437
**Lesson 10: Combining SAS Data Sets** Data Manipulation Techniques Sample Code Match-Merging SAS Data Sets with Same-Named Variables
data CustOrdProdSurv; merge CustOrdProd (in=c rename=(Quantity=Qty)) survey.'Supplier$'n(in=s); by Supplier; if c=1 and s=1; if Qty\>=Quantity and Total\_Retail\_Price\>=Price; run;
438
**Lesson 11: An Introduction to the SQL Procedure** Data Manipulation Techniques Understanding PROC SQL
SQL (Structured Query Language) is a standardized language that many software products use to retrieve, join, and update data in tables. Using PROC SQL, you can write ANSI standard SQL queries that retrieve and manipulate data. PROC SQL can query data that is stored in one or more SAS data sets or any other types of data files that you can access by using SAS/ACCESS engines. In SQL terminology, a SAS data set is a table, a variable is a column, and an observation is a row. You can perform many of the same tasks by using PROC SQL and the DATA step. Each technique has advantages. **PROC SQL; SELECT column-1\<, column-2\>… FROM table-1… ; QUIT;** In a PROC SQL step, the PROC SQL statement starts the SQL procedure. The SQL procedure stops running when SAS encounters one of the following step boundaries: the QUIT statement, or the beginning of another PROC step or a DATA step. A PROC SQL step can contain one or more statements. The SELECT statement (a query) retrieves data from one or more tables and creates a report by default. A PROC SQL step can contain one or more SELECT statements. The CREATE TABLE statement is one of many other statements that can appear in a PROC SQL step.
439
**Lesson 11: An Introduction to the SQL Procedure** Data Manipulation Techniques Querying a Table
SELECT column-1\<, column-2\>… FROM table-1… ; The SELECT statement is composed of clauses and ends with a semicolon. The SELECT clause and the FROM clause are the only required clauses. Optional clauses include the WHERE clause, the GROUP BY clause, the HAVING clause, and the ORDER BY clause. In the SELECT statement, the clauses must appear in the order in which they are listed here. **SELECT column-1\<, column-2\>…** **SELECT \*** The SELECT clause specifies the columns to include in the query result. After the keyword, you can specify the names of one or more columns in the table(s) you’re querying, separated by commas. To select all columns in the input table(s), you specify an asterisk instead of column names. **FROM table-1… ** The FROM specifies the table/ tables that contain the columns. After the keyword, you specify the name(s) of the table(s WHERE subsets rows by identifying a condition that must be satisfied for each row to be included in the output. You can use any valid SAS expression to specify the condition. CREATE TABLE table-name AS SELECT column-1\<, column-2\>… FROM table-1… ; To create an output table (a SAS data set) from results of query instead of a report, use the CREATE TABLE statement. After the keywords CREATE TABLE, you specify the name of the output table, followed by the keyword AS. Then, you specify the clauses that are used in a query: SELECT, FROM, and any optional clauses. As in the SELECTstatement, the query clauses must appear in the order shown here.
440
**Lesson 11: An Introduction to the SQL Procedure** Data Manipulation Techniques Joining Tables
Joining tables enables you to select data from multiple tables as if the data were contained in one table. Joins do not alter the original tables. To join tables, you can use a SELECT statement. **SELECT column-1\<, column-2\>… FROM table-1, table-2…** ** ** ** ****;** The SELECT clause specifies the columns that appear in the report. The FROM clause specifies the tables to be joined. You use commas to separate table names. The WHERE clause specifies one or more join-conditions that PROC SQL uses to combine and select rows for the result set. The join-conditions are expressed as an sql-expression, which can be any valid SAS expression. In addition to join-conditions, the WHERE clause can also specify an expression that subsets the rows. Additional clauses might also appear in a SELECT statement that joins tables. In a basic PROC SQL join, the SELECT statement does not have a WHERE clause. PROC SQL combines each row from the first table with every row from the second table to create a Cartesian product. A basic join is resource intensive and is rarely used. To join tables by matching rows based on the values of a common column, you can include a WHERE clause in the SELECT statement. Inner join is a specific join that returns only a subset of the rows from the 1st table that matches the rows from the 2nd table. If the SELECT clause, the FROM clause, or the WHERE clause references a column that has the same name in multiple tables, you must specify the table name and a period before the column name. Prefixing the table name is called qualifying a column. **FROM table-1 alias-1, table-2 alias-2 …** Qualified table name can specify an alias instead of a full table name. You can specify aliases for tables in the FROM clause. After the table name, you can optionally specify the keyword AS. Then, specify an alias, which can be any valid SAS name.
441
**Lesson 11: An Introduction to the SQL Procedure** Data Manipulation Techniques Sample Code Querying a Table to Create a Report
proc sql; select Employee\_ID, Job\_Title, Salary from orion.sales\_mgmt where Gender='M'; quit;
442
**Lesson 11: An Introduction to the SQL Procedure** Data Manipulation Techniques Sample Code Querying a Table to Create an Output Data Set
proc sql; create table direct\_reports as select Employee\_ID, Job\_Title, Salary from orion.sales\_mgmt; quit;
443
**Lesson 11: An Introduction to the SQL Procedure** Data Manipulation Techniques Sample Code Joining Tables by Using Full Table Names to Qualify Columns
proc sql; select sales\_mgmt.Employee\_ID, Employee\_Name, Job\_Title, Salary from orion.sales\_mgmt, orion.employee\_addresses where sales\_mgmt.Employee\_ID = employee\_addresses.Employee\_ID; quit;
444
**Lesson 11: An Introduction to the SQL Procedure** Data Manipulation Techniques Sample Code Joining Tables by Using Table Aliases to Qualify Columns
proc sql; select s.Employee\_ID, Employee\_Name, Job\_Title, Salary from orion.sales\_mgmt as s, orion.employee\_addresses as a where s.Employee\_ID = a.Employee\_ID; quit;
445
**Lesson 12: An Introduction to the SAS Macro Facility** Data Manipulation Techniques What Is the Macro Facility?
Tool for extending and customizing SAS and for reducing the amount of text that you must enter to complete tasks. The macro facility consists of the macro processor and the SAS macro language. You can package small amounts of text into units called macro variables, or you can package larger amounts of text into units called macro programs. You can use macro variables to substitute text into a SAS program, which makes the program easily adaptable.
446
**Lesson 12: An Introduction to the SAS Macro Facility** Data Manipulation Techniques Basic Concepts of Macro Variables
Can supply a variety of information, including operating system information, SAS session information, or text strings. There are two types of macro variables: automatic macro variables and user-defined macro variables. To use a macro variable in a program, you reference it in your code by preceding the macro variable name with &. When you submit the program, the macro processor resolves the reference before the program compiles and executes. You can reference a macro variable anywhere within a SAS program except within datalines. To reference a macro variable within quotation marks, you must use double quotation marks. **¯o-variable-name**
447
**Lesson 12: An Introduction to the SAS Macro Facility** Data Manipulation Techniques Using Automatic Macro Variables
Automatic macro variables contain system information such as the date and time that the current SAS session began. SAS creates these automatic macro variables when the SAS session begins, and they are always available. Common automatic macro variables include SYSDATE, SYSDATE9, SYSDAY, SYSLAST, SYSSCP, SYSTIME, and SYSVER.
448
**Lesson 12: An Introduction to the SAS Macro Facility** Data Manipulation Techniques Creating and Using Your Own Macro Variables
You use the %LET statement to create a macro variable and assign a value to it. The name that you assign to a macro variable must follow SAS naming rules. The value can be any text string. You don't need to enclose the value in quotation marks. SAS stores all macro variable values as text strings, even if they contain numbers. SAS doesn't evaluate mathematical expressions in macro variable values. SAS stores everything in a %LET statement between the equal sign and the semicolon, except leading and trailing blanks, as the value of the macro variable. **%LET macro-variable=value;**
449
**Lesson 12: An Introduction to the SAS Macro Facility** Data Manipulation Techniques Displaying Macro Variables in the SAS Log
Because the macro processor resolves macro variable references after a program is submitted but before it is compiled and executed, you cannot see the value that gets substituted into the program. You can use the SYMBOLGEN system option to control whether or not SAS writes messages about the resolution of macro variable references to the SAS log. The default value of this option is NOSYMBOLGEN. You can use the %PUT statement to write messages to the SAS log. If you include a macro variable reference in a %PUT statement, SAS resolves the reference before writing the message in the SAS log, so the macro variable value appears in the log. You can also use keywords such as \_USER\_, \_AUTOMATIC\_, or \_ALL\_ to list macro variable values in the SAS log. **OPTIONS SYMBOLGEN;** **%PUT text;**
450
**Lesson 12: An Introduction to the SAS Macro Facility** Data Manipulation Techniques Sample Code Using Automatic Macro Variables
proc print data=orion.customer\_type; title 'Listing of Customer\_Type Data Set'; footnote1 'Created &SYSTIME &SYSDAY, &SYSDATE9'; footnote2 'on the &SYSSCP System Using SAS &SYSVER'; run; proc print data=work.subset1 label; run;
451
**Lesson 12: An Introduction to the SAS Macro Facility** Data Manipulation Techniques Sample Code Creating and Using User-Defined Macro Variables
``` options symbolgen; %let year=2007; ``` proc print data=orion.order\_fact; where year(order\_date)=&year; title "Orders for &year"; run; proc means data=orion.order\_fact mean; where year(order\_date)=&year; class order\_type; var total\_retail\_price; title "Average Retail Price for &year"; title2 "by Order\_Type"; run;
452
**Lesson 12: An Introduction to the SAS Macro Facility** Data Manipulation Techniques Sample Code Displaying Macro Variable Values in the SAS Log
%put \_user\_;