Programa P1-3 Flashcards
Data step internals Data Handling Data step processing
What is the use of the (1) on the ODS HTML statement?
The use of the (1) on the ODS HTML statement is to ensure a new HTML destination is opened and closed
without affecting the default HTML output which the SAS Display Manager uses
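A minimal sketch of this pattern, assuming a hypothetical output file name (report.html) and the SASHELP.CLASS sample table that normally ships with SAS:

```sas
/* Open a second, independently identified HTML destination.      */
/* "report.html" is a hypothetical file name used for illustration */
ods html(1) file="report.html";

proc print data=sashelp.class;
run;

/* Close only destination (1); the default HTML destination stays open */
ods html(1) close;
```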
Describe the PUT Function.
The PUT function is used to convert numeric values to character values or character values to other character values.
targetvariable=PUT(variable,format.);
* The PUT function creates a new character variable based on the input variable in the first argument and the format in the second argument. The result is always character, whatever the type of the input.
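As an illustration, the sketch below applies PUT to a numeric value, a SAS date and a character value. The variable names and values are invented for the example; the formats are standard SAS formats.

```sas
data _null_;
  num   = 1234.5;
  visit = '15MAR2024'd;
  code  = "abc";

  num_c   = put(num, comma8.1);     /* numeric -> character           */
  visit_c = put(visit, date9.);     /* SAS date -> character          */
  code_u  = put(code, $upcase3.);   /* character -> other character   */

  put num_c= visit_c= code_u=;      /* write the results to the log   */
run;
```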
Describe the INPUT Function.
The INPUT function reads the value of a variable using a specified Informat, as opposed to writing the value of a variable having applied a
Format.
targetvariable=INPUT(variable,informat.);
* The INPUT function is most often used to convert character values to numeric values.
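A short sketch of character-to-numeric conversion with INPUT. The variable names and values are invented for the example; COMMA8. and DATE9. are standard informats.

```sas
data _null_;
  amount_c = "1,234.50";
  date_c   = "15MAR2024";

  amount = input(amount_c, comma8.);   /* character -> numeric          */
  visit  = input(date_c, date9.);      /* character -> SAS date value   */

  put amount= visit= date9.;           /* date9. formats visit only     */
run;
```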
In the layered structure of the SAS System where is the DATA step situated?
The Supervisor layer:
* This core layer is fully portable between systems.
* It defines the environment used by the SAS System
* It contains the most-used core routines, e.g.:
- DATA step processor
- Data set management
- Command parsing
In the layered structure of the SAS System where is the PROC step situated?
The Applications Layer:
* This is the ‘top’ layer, representing all the procedures and ancillary services.
* The layer is fully environment-independent
* It represents approximately 70% of the total code
What is the DATA step processor responsible for?
The compilation and execution of a DATA step is controlled by the DATA step processor. It’s responsible for:
* Checking the syntax of DATA step code;
* Compiling DATA step source code;
* Optimising the executable image;
* Executing the resulting machine code;
* Accessing data through the input / output Engine Supervisor.
In the layered structure of the SAS System where is the OS + Memory/storage situated?
The Host Layer:
* Provides the interface to the host environment and controls the following:
- Using operating system calls to perform resource allocation to each task;
- Memory management;
- Dynamic loading and unloading of programs;
- Efficient storage of SAS data sets;
- Input / Output services;
- Generation of host dependent machine code;
- Full-screen support and error handling services.
* The Host layer represents 10% of the code.
In detail, explain the compilation phase.
- Compile:
* Translation from source code to machine code.
* The first action of the DATA step is to scan the SAS statements for syntax errors. If any errors are found, messages are written to the Log and the DATA step stops.
* A token at a time is transferred to the DATA step compiler or procedure parser.
* Upon reaching a step boundary, the DATA step processor stops processing the SAS statements and the completed step is compiled, or parsed in the case of a procedure. The compiled step is then passed for execution.
- CREATE:
* Definition of Input and Output files, including variable names, their locations and attributes.
* An input buffer (an area of memory) is created for each raw data file being read. The header portion of the data set is defined.
* Creation of the Logical Program Data Vector (LPDV): it is used at execution time to hold the current observation and to determine which variables are to be initialised to missing (ITM) at each DATA step iteration.
* Optimisation of code and passing of information to the I/O Engine Supervisor, which determines the index to be used.
Briefly describe what the DATA Step Processor is.
The DATA Step Processor
* The DATA step is a basic building block of any SAS program
- Used to read in data from a file, perform calculations or manipulations on the data and then
output the observation to a SAS data set. These actions are repeated for each input data record.
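The read, calculate, output cycle might look like the sketch below. The data set name, variables and in-line records are all invented for illustration.

```sas
data work.sales;
  infile datalines dlm=",";
  input product :$10. units price;   /* read one raw record          */
  revenue = units * price;           /* calculate a new value        */
  /* no explicit OUTPUT: the observation is written automatically    */
  /* at the end of each iteration                                    */
  datalines;
Lamp,3,12.50
Desk,1,89.00
;
run;
```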
In brief, explain the compilation phase.
- COMPILE:
* Syntax scan
* SAS source code translation to machine language
* Definition of input and output files
- CREATE:
* input buffer (if reading any non-SAS data),
* Program Data Vector (PDV),
* and data set descriptor/ header information
* set variable attributes for output SAS data set
* capture variables to be initialized to missing
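One way to see that variable attributes are fixed at compile time is the small sketch below, with a variable name invented for the example. The length of a character variable is taken from its first reference, so a later, longer assignment is expected to be truncated.

```sas
data _null_;
  code = "ab";       /* first reference: the compiler sets length $2  */
  code = "abcdef";   /* at execution the value must fit that length   */
  put code=;         /* inspect the stored value in the log           */
run;
```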
List compile-time only statements
⇒ drop, keep, rename
⇒ label
⇒ retain
⇒ length
⇒ format, informat
⇒ attrib
⇒ array
⇒ by
⇒ where
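A hedged sketch of how such statements behave, assuming the SASHELP.CLASS sample table is available. The declarative statements sit at the bottom of the step, yet they still apply to the whole output data set, because they are acted on when the step is compiled rather than executed once per observation.

```sas
data work.demo;
  set sashelp.class;
  bmi = (weight / height**2) * 703;   /* executable statement         */

  /* compile-time only statements: position in the step does not      */
  /* turn them into per-iteration actions                             */
  drop height weight;
  label bmi = "Body mass index";
  format bmi 5.1;
run;
```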
Explain the Execution phase
This is where the data is read into the LPDV, usually one record at a time, and the SAS statements in the DATA step are executed.
1. BEGIN:
* The DATA statement is executed and options on the statement are processed.
* An automatic variable, _N_, is set up by the SAS supervisor. _N_ increments by 1 each time the DATA statement executes. Thus, the DATA step counts the number of times it loops.
2. ASSIGN:
* The Assign stage generates an ‘Initialise to Missing’ (ITM) instruction which sets the requisite storage areas in the LPDV to missing.
3. DATA READ:
* The machine code invokes the I/O Engine Supervisor, which selects the next observation from a data set, or the next raw data record from an external file.
* Test to see if the end-of-file marker has been reached.
* If it has, then the data set is closed and the execution phase is complete.
4. READ:
* A raw data record is read into the LPDV via the Input Buffer. A SAS observation is read directly into the LPDV.
* Special variables are given their values and control is then passed to the next executable statement following the read statement.
5. EXECUTE:
* With values in the LPDV, other program statements are executed.
* New values are calculated and existing values are manipulated.
6. WRITE:
* At the end of the DATA step or when an OUTPUT statement is reached, the machine code instructs the I/O Engine Supervisor to copy the values of variables to be kept from the LPDV to the output data set(s).
7. RETURN:
* At the end of the DATA step or when a RETURN statement is reached, the process flow loops back to the top of the DATA step and step 1 begins again.
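The loop described above can be observed with a small sketch like the following, in which the data set, variables and records are invented for illustration. The PUT statement writes the LPDV contents to the log on each iteration, so the incrementing _N_, the reset of the non-retained variable flag, and the carried-over running_total can be compared.

```sas
data work.loop_demo;
  input x;                            /* read the next raw record       */
  retain running_total 0;             /* excluded from ITM              */
  if x > 2 then flag = "big";         /* flag is reset to missing at    */
                                      /* the top of every iteration     */
  running_total = running_total + x;
  put _n_= x= flag= running_total=;   /* one log line per iteration     */
  datalines;
1
3
2
;
run;
```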
What is the DATA step debugger in SAS?
The SAS System provides the DATA step debugger as a means of rooting out logical errors during the execution phase of a DATA step.
What is the DATA step debugger in SAS?
A means of rooting out logical errors during the execution phase of a DATA step, allowing step-by-step examination of the DATA step execution.
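As a sketch, the debugger is requested with the DEBUG option on the DATA statement. It is interactive, so this assumes a session that supports it, such as the SAS windowing environment; the data set and variable names are invented for the example.

```sas
/* The DEBUG option opens the DATA step debugger when the step runs */
data work.check / debug;
  set sashelp.class;
  ratio = weight / height;
run;
```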
In summary, what happens during the Execution phase?
The Execution Phase then loops through the DATA Step performing the following operations:
* Increments _N_ by 1 and initially executes the DATA statement, evaluating any data set options;
* Sends an Initialise To Missing instruction which resets all variables which are not retained to missing;
* Checks if there is a record to be read into the LPDV;
* If there is a record, this is then read into the LPDV. If a record is not present and the end-of-file marker is detected, the DATA Step completes execution;
* Further statements are executed sequentially within the DATA step;
* On encountering the RUN statement, or an OUTPUT statement, a record is generated in the output table(s);
* On encountering the RUN statement, or a RETURN statement, the DATA step loops and performs the same instructions again.
What is the difference between a function and a call routine?
CALL routines are similar to functions, and there is often a function with a CALL routine of the same name.
For example, there is the CATS function and the CALL CATS routine. However, the main difference is that a CALL routine cannot be used in an assignment statement: it is invoked with a CALL statement and typically passes its result back by updating one of its arguments.
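A minimal sketch contrasting the two forms, with variable names invented for the example. The function returns its value to an assignment statement, while the CALL routine writes its result into its first argument.

```sas
data _null_;
  length joined_fn joined_call $20;
  a = "  SAS ";
  b = " 9.4  ";

  joined_fn = cats(a, b);         /* function: used in an assignment   */
  call cats(joined_call, a, b);   /* CALL routine: result is placed    */
                                  /* in the first argument             */
  put joined_fn= joined_call=;
run;
```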
Define what a function is.
A SAS function is a routine that returns a single value resulting from zero or more arguments passed to that function.
What is the LOWCASE function?
The LOWCASE function provides a simple method of making the input character argument lower case.
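For example, with an invented value:

```sas
data _null_;
  name  = "SMITH, John";
  lower = lowcase(name);   /* every letter becomes lower case */
  put lower=;              /* lower=smith, john               */
run;
```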
What is the PROPCASE function?
The PROPCASE function takes a character argument and capitalises the first character of each word, then lowercases the remaining characters of each word.
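For example, with an invented value:

```sas
data _null_;
  proper = propcase("the QUICK brown fox");
  put proper=;   /* proper=The Quick Brown Fox */
run;
```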
What is the FIND function?
The FIND function can be used to locate a specified set of characters within a string. It returns the position at which the specified substring first occurs.
* Similar to the INDEX function, except that it also allows modifiers and a starting position to be specified.
Target_var=FIND(string,substring<,modifiers><,startpos>);
e.g. (the "i" modifier ignores case):
find(proddesc, "Light", "i")
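A short sketch of the modifier and start-position arguments. The proddesc value and the result variable names are invented for the example.

```sas
data _null_;
  proddesc = "Desk lamp with LED light";

  pos_exact  = find(proddesc, "Light");         /* case-sensitive: no match, returns 0 */
  pos_nocase = find(proddesc, "Light", "i");    /* "i" ignores case                    */
  pos_from   = find(proddesc, "l", "i", 10);    /* search starts at position 10        */

  put pos_exact= pos_nocase= pos_from=;
run;
```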
What is the COMPRESS Function?
The COMPRESS function removes specific characters from a string.
Target_var = COMPRESS(str<,chars><,modifiers>);
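A brief sketch of the three arguments, using an invented phone value. With no second argument COMPRESS removes blanks; the "k" modifier keeps the listed characters instead of removing them, and "d" adds digits to the list.

```sas
data _null_;
  phone = "(555) 123-4567";

  no_blanks   = compress(phone);            /* default: remove blanks        */
  no_punct    = compress(phone, "()- ");    /* remove the listed characters  */
  digits_only = compress(phone, , "kd");    /* keep (k) only digits (d)      */

  put no_blanks= no_punct= digits_only=;
run;
```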