Programa P1-3 Flashcards

Question

What is the TRANSLATE function?

Answer 1

The TRANSLATE function replaces specific characters within a string.
Target_var = TRANSLATE(str, to, from);
e.g.
**
data _null_;
  length first_15 $ 15 last_4 $ 4 showcard $ 19;
  actcard="0987/6543/2100/4456";
  first_15=substr(actcard,1,15);
  last_4=substr(actcard,16,4);
  first_15=translate(first_15, "**********","0123456789/");
  showcard=cat(first_15,last_4);
  put showcard=;
run;
**

Answer 2

The TRANWRD function removes or replaces all occurrences of a string pattern within a string.
Target_var = TRANWRD(str, from, to);
**
data correction;
  length new_email new_postcode $ 50;
  set pdata.customer(keep=email postcode);
  new_email=tranwrd(email, "hotmail.co.uk", "hotmail.com");
  new_postcode=tranwrd(postcode, "not central", "");
run;
proc print data=correction;
  var email new_email postcode new_postcode;
  where email contains "hotmail.co.uk";
run;
**
* If the target variable has not already been given a length, the variable returned from the TRANWRD function defaults to a length of 200 bytes.

Answer 3

These functions return the length of a character string, but have subtle differences, which are summarised below:
Function   Trailing blanks   Length if string is blank
LENGTH     Excluded          1
LENGTHN    Excluded          0
LENGTHC    Included          1 for a single blank (LENGTHC returns the full stored length)
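The differences can be checked with a short DATA step. This is a minimal sketch; the variables `name` and `empty` are hypothetical and both are deliberately stored in 10 bytes.

```sas
data _null_;
  length name empty $ 10;         /* both variables occupy 10 bytes */
  name  = "SAS";
  empty = " ";
  len_name   = length(name);      /* trailing blanks excluded */
  lenn_name  = lengthn(name);     /* trailing blanks excluded */
  lenc_name  = lengthc(name);     /* trailing blanks included */
  len_empty  = length(empty);     /* blank string counts as 1 */
  lenn_empty = lengthn(empty);    /* blank string counts as 0 */
  lenc_empty = lengthc(empty);    /* full stored length */
  put len_name= lenn_name= lenc_name=;
  put len_empty= lenn_empty= lenc_empty=;
run;
```

With these lengths the log should show 3, 3 and 10 for `name`, and 1, 0 and 10 for `empty`.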

Answer 4

The MISSING function checks its argument for a missing value and returns 1 if it detects a missing value; otherwise it returns 0.
Syntax: missing(argument)
e.g. where missing(overtime)=1;

Answer 5

The LAG function creates a queue (in memory), which stores the value of the variable from previous observations. Multiple LAG functions can be used, each maintaining a separate queue.
e.g.
**
data epi_summary;
  set pdata.epidemic_summary;
  diff=weektot-lag(weektot);
  pct_diff=diff/lag(weektot);
run;
proc print data=epi_summary;
  format pct_diff percent8.2;
run;
**

Answer 6

* Issue: The LAG function does not look back at the previous observation. It returns the value stored the last time that LAG call was executed, so a call that is skipped on some observations returns a stale value.
* Solution: Execute the LAG functions at the beginning of the DATA step and make sure that none of them sit inside conditional blocks or statements. The LAG function then records every previous value.
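The effect can be seen by comparing an unconditional and a conditional LAG call side by side. This is a minimal sketch with made-up data; the names `lag_demo`, `prev_x` and `prev_even` are hypothetical.

```sas
data lag_demo;
  input x @@;
  /* Unconditional call: the queue is updated on every observation. */
  prev_x = lag(x);
  /* Conditional call: the queue is updated only when x is even,
     so it returns the previous even value, not the previous row. */
  if mod(x, 2) = 0 then prev_even = lag(x);
  datalines;
1 2 3 4 5 6
;
run;

proc print data=lag_demo noobs;
run;
```

For x=4 the unconditional call returns 3, while the conditional call returns 2, the value seen the last time that call ran.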

Answer 7

When using FIRST. and LAST. BY-group processing. A conditional LAG call then updates its queue only at the beginning or end of each BY group, depending on how it is coded, so it returns the value from the previous BY group.
e.g.
**
proc sort data=pdata.orders out=orders;
  by orddate;
run;
data order_summary (keep=orddate totord ordchange);
  set orders;
  by orddate;
  if first.orddate then totord=0;
  totord+ordprice;
  if last.orddate then do;
    prev_totord=lag(totord);
    ordchange=totord-prev_totord;
    output;
  end;
run;
**

Answer 8

PROC FORMAT is used to create user-defined formats in SAS.

Answer 9

For reporting purposes, missing character and numeric values can be handled using formats.
**
proc format;
  value missing_num_fmt .="Unknown";
  value $missing_char_fmt " "="Unknown";
run;
proc print data=pdata.results;
  format age missing_num_fmt. class $missing_char_fmt.;
run;
**
* The MISSING= system option can also be used to display missing numeric values differently.
**
options missing=X;
proc print data=pdata.results;
  format age missing_num_fmt. class $missing_char_fmt.;
run;
options missing=.;
**

Answer 10

1. Commas are used where a variable contains multiple coded values that are non-consecutive.
2. A hyphen is placed between the start and end values to define a consecutive range.
3. A combination of commas and hyphens can be used to define a series of values that are collectively covered by a single format label.
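The three forms can be sketched in one VALUE statement. The format name `codefmt`, its labels and the test values are hypothetical and chosen only for illustration.

```sas
proc format;
  value codefmt
    1, 3, 5     = "Listed codes"          /* commas: non-consecutive values */
    10 - 19     = "Teens"                 /* hyphen: consecutive range */
    20 - 29, 99 = "Twenties or code 99"   /* commas and hyphens combined */
    other       = "Unclassified";
run;

data _null_;
  do code = 1, 12, 25, 99, 7;
    put code= code codefmt.;   /* raw value followed by its formatted label */
  end;
run;
```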

Answer 11

To create or modify a format definition using a data set, specify the CNTLIN= option on the PROC FORMAT statement. Note that the data set must contain certain key columns such as FMTNAME, START, END (with numeric formats), LABEL and TYPE.
e.g.
**
data prods (keep=fmtname type start label);
  set pdata.products(rename=(prodno=start proddesc=label));
  fmtname="prodfmt";
  type="C";
run;
proc format cntlin=prods;
run;
proc print data=pdata.orders;
  var orddate ordno prodno ordprice;
  format orddate date9. prodno $prodfmt.;
run;
**

Answer 12

User-defined formats are often used to add columns to a data set.
**
data prods (keep=orddate ordno quantity price prodno proddesc);
  set pdata.orders;
  proddesc=put(prodno, $prodfmt.);
run;
**

Answer 13

* The ability to output the definition of one or more formats to a data set.
* This provides an effective way of modifying a format definition.
* Exporting the definition is done by using the CNTLOUT= option on the PROC FORMAT statement.
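An export, edit and re-import round trip might look like the following. This is a sketch under assumptions: the format `agegrp`, its labels and the data set `agegrp_def` are hypothetical, and the replacement label is kept the same length as the original so that it fits the exported LABEL column.

```sas
proc format;
  value agegrp
    low -< 18 = "Child"
    18 - high = "Adult";
run;

/* Export the definition of AGEGRP to a data set. */
proc format library=work cntlout=agegrp_def;
  select agegrp;
run;

/* Change a label in the exported definition. */
data agegrp_def;
  set agegrp_def;
  if label = "Child" then label = "Minor";
run;

/* Rebuild the format from the edited data set. */
proc format library=work cntlin=agegrp_def;
run;
```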

Answer 14

* Picture formats display numeric values and date values using a predefined template, and are created with PROC FORMAT.
* This can be particularly useful for including currency symbols and making large numbers more readable.
e.g.
**
data overtime;
  set pdata.employee(where=(overtime ne .) keep=empno overtime budget);
  otbalnce=budget-overtime;
  label otbalnce="Balance";
run;
proc print data=overtime label;
  format otbalnce debcred.;
run;
**
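The card does not show how the `debcred` format is defined. One possible definition, offered purely as an assumption, marks negative balances as DR and the rest as CR; the templates and test values below are hypothetical.

```sas
proc format;
  picture debcred
    low -< 0 = "000,009.99 DR"    /* negative values: sign dropped, DR appended */
    0 - high = "000,009.99 CR";   /* zero and positive values */
run;

data _null_;
  do otbalnce = -1234.5, 0, 250;
    put otbalnce= otbalnce debcred.;   /* raw value, then the picture-formatted value */
  end;
run;
```

In a picture template, `0` is a digit selector that suppresses leading zeros and `9` forces a digit to print.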

Answer 15

Using the DATATYPE= option, date, time or datetime formats can be created. The directives are enclosed in single quotes so that the macro processor does not try to interpret the % signs.
**
proc format;
  picture dmy (default=10) other='%d-%m %Y' (datatype=date);
run;
proc print data=pdata.employee noobs label;
  var empno dob date_joined;
  format dob date_joined dmy.;
run;
**

Answer 16

* The user must specify the LIBRARY= option on the PROC FORMAT statement, i.e.
** proc format library=pdata; **
* To access the permanently stored format, specify the FMTSEARCH= global system option, i.e.
** options fmtsearch=(pdata); **
* The FMTSEARCH= option can be used to list more than one library. The order in which the format libraries are listed determines the order in which they are searched.

Answer 17

Wherever a data set name is used in a SAS program, data set options can be included. Some are used to override SAS system option values for a particular data set, but many are of use in a data management context.
* Data set options are enclosed in parentheses after the name of the data set that they apply to.

Answer 18

The DROP=, KEEP=, RENAME= and WHERE= options are always applied in alphabetical order, regardless of the order in which they are written.
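A small sketch using the SASHELP.CLASS sample data set shows the consequence of that ordering. The output data set `class_tall` and the new variable name `height_in` are hypothetical.

```sas
data class_tall;
  /* Written as WHERE, RENAME, KEEP but applied as KEEP, RENAME, WHERE:
     KEEP= refers to the original name (height),
     WHERE= refers to the new name (height_in). */
  set sashelp.class(where=(height_in > 60)
                    rename=(height=height_in)
                    keep=name height);
run;
```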

Answer 19

BETWEEN-AND
* Selects rows where the value of a variable lies within a range of values. This includes the range boundary values.
CONTAINS (?)
* Selects rows where a specified string of characters exists within a character value. The position is irrelevant, but it is case sensitive. “?” is a synonym for “CONTAINS”.
LIKE
* Selects rows where the values of a character variable match a specified pattern. The pattern is defined using the percent and underscore characters, where % allows any number of characters in that position, whilst the underscore denotes any single character in that position.
IS MISSING or IS NULL
* Selects rows where the value of the specified variable is missing or null.
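The operators above can be tried against the SASHELP.CLASS sample data set. This is a minimal sketch; the search strings are arbitrary choices.

```sas
proc print data=sashelp.class;
  where age between 12 and 14;   /* ages 12 and 14 are included */
run;

proc print data=sashelp.class;
  where name contains 'an';      /* case sensitive; same as: name ? 'an' */
run;

proc print data=sashelp.class;
  where name like '_a%';         /* any one character, then "a", then anything */
run;

proc print data=sashelp.class;
  where height is missing;       /* same as: height is null */
run;
```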

Answer 20

Wildcard operators can be used with data set options such as DROP= and KEEP= in order to list ranges or groups of variables, e.g.
* Hyphen wildcard: Lists a continuous range of variables that have a common prefix and a numeric suffix.
* Double hyphen: Lists a continuous range of variables by specifying the name of the variable at the start of the range and the name of the variable at the end of the range. The range is based on the order of the columns.
* Colon wildcard: Specifies all variables that begin with a particular prefix.
* The _NUMERIC_ wildcard refers to all numeric variables within a table.
* The _CHARACTER_ wildcard refers to all character variables within a table.
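Each list type is sketched below against a small made-up table. The data set `scores` and all of its variable names are hypothetical.

```sas
data scores;
  input id name $ q1 q2 q3 q4 total_raw total_adj;
  datalines;
1 Ann 5 6 7 8 26 27
2 Bob 4 4 5 6 19 20
;
run;

/* Hyphen: q1, q2, q3 and q4 */
data numbered;
  set scores(keep=id q1-q4);
run;

/* Double hyphen: name, q1 and q2, following column order */
data positional;
  set scores(keep=name--q2);
run;

/* Colon: total_raw and total_adj */
data prefixed;
  set scores(keep=id total:);
run;

/* All numeric variables */
data all_num;
  set scores(keep=_numeric_);
run;

/* All character variables */
data all_char;
  set scores(keep=_character_);
run;
```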

Answer 21

The number of times the loop will iterate is fixed, even in cases where the start, stop and increment values are determined by variable values.

Answer 22

Nesting DO loops within DO loops is very useful when factors need to be varied at different rates relative to one another.
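As an illustration, the sketch below varies an interest rate slowly in the outer loop and the number of years quickly in the inner loop. The data set `growth`, the variable names and the starting amount of 1000 are hypothetical.

```sas
data growth;
  do rate_pct = 2 to 4;      /* outer loop: changes once every three rows */
    do years = 1 to 3;       /* inner loop: changes on every row */
      value = 1000 * (1 + rate_pct / 100) ** years;
      output;                /* one row per rate and year combination */
    end;
  end;
run;
```

The loops run 3 x 3 times, so `growth` contains nine observations.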

Answer 23

Bounded and nested loops have a fixed number of iterations defined by their start, stop and by values. Suppose that it is not known how many times the loop needs to execute, and the requirement is for the loop to continue until a condition is met, or while a condition remains true. This requires the use of the DO WHILE or DO UNTIL structures.
* DO WHILE: The loop continues while the condition is true.
  - The expression is evaluated at the top of the loop, so the code in the loop is not necessarily executed at all.
* DO UNTIL: The loop executes until the condition becomes true.
  - The expression is evaluated at the bottom of the loop, so the code within the loop is always executed at least once.
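The difference in where the condition is tested shows up when the condition is already satisfied before the loop starts. This is a minimal sketch; the variables `balance`, `n_while` and `n_until` are hypothetical.

```sas
data _null_;
  /* DO WHILE: tested at the top, so the body never runs here. */
  balance = 100;
  n_while = 0;
  do while (balance < 100);
    balance = balance + 10;
    n_while = n_while + 1;
  end;

  /* DO UNTIL: tested at the bottom, so the body runs once. */
  balance = 100;
  n_until = 0;
  do until (balance >= 100);
    balance = balance + 10;
    n_until = n_until + 1;
  end;

  put n_while= n_until=;
run;
```

The log should show n_while=0 and n_until=1.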

Answer 24

An array is a simple way of referring to many variables by a single name.
* Arrays can only be defined in a DATA step, and the ARRAY statement is a compile-time-only statement.
* An array can reference ONLY numeric variables or ONLY character variables.
ARRAY arrayname {n} $ length array-elements (initial-values);

Answer 25

The OBS= data set option is used to specify the number of the last observation to be processed.

Answer 26

* Create similar variables;
* Help read certain data structures;
* Repeat actions for variables;
* Perform table look-ups.
* The array will create variables in the PDV (program data vector) if they do not already exist.

Answer 27

* The collection of SELECT, WHEN and OTHERWISE statements is known as a SELECT group, which must be closed using an END statement.
* SELECT groups are an alternative to using IF/THEN/ELSE statements, but are generally considered to be a better choice when there are multiple conditions to evaluate, as the code is easier to read.
* SELECT groups represent a more defensive way of evaluating conditions, as they force the use of the OTHERWISE clause where there is no match on any of the WHEN statements.

Answer 28

* SELECT statements are an effective way of evaluating a group of mutually exclusive conditions, which are based on a common expression. Each condition is specified using a WHEN statement, which defines one or more SAS statements that are only executed when the condition is true.
* SELECT statements can be followed by one or more WHEN statements and can optionally include an OTHERWISE statement to handle the cases not catered for by the WHEN expressions.
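A SELECT group based on a common expression might look like this, using the SASHELP.CLASS sample data set. The output data set `class_band`, the variable `band` and its labels are hypothetical.

```sas
data class_band;
  set sashelp.class;
  length band $ 8;
  select (age);                        /* common expression */
    when (11, 12) band = "Junior";     /* a WHEN can list several values */
    when (13, 14) band = "Middle";
    otherwise     band = "Senior";     /* handles every remaining age */
  end;
run;
```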

Answer 29

* This function returns the number of elements in an array. The syntax is:
num = dim(arrayname);
* It is often used as the upper bound of a DO loop and avoids having to re-code the loop when the number of elements in the array changes.
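Used as a loop bound, it could be sketched as follows. The data set `scaled`, the arrays `q` and `pct` and the input values are hypothetical.

```sas
data scaled;
  input q1 q2 q3;
  array q{3} q1-q3;        /* refers to existing variables */
  array pct{3};            /* creates pct1-pct3, as they do not exist yet */
  do i = 1 to dim(q);      /* dim(q) is 3 here */
    pct{i} = q{i} / 10;
  end;
  drop i;
  datalines;
5 6 7
8 9 10
;
run;
```

If the array later grows to, say, five elements, only the ARRAY statements need to change; the loop bound follows automatically.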

Answer 30

A SAS function is a routine that returns a single value resulting from zero or more arguments passed to that function.

Answer 31

They are similar to functions and there is often a function with a call routine of the same name. For example there is the CATS function and the CALL CATS() routine. However, the main difference is that the call routine cannot be used in an assignment statement.
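The two forms can be compared directly. This is a minimal sketch; the variables `a`, `b`, `joined1` and `joined2` are hypothetical.

```sas
data _null_;
  length a b $ 5 joined1 joined2 $ 10;
  a = "ab";
  b = "cd";
  joined1 = cats(a, b);       /* function: the value is returned and assigned */
  call cats(joined2, a, b);   /* CALL routine: the result is written to the first argument */
  put joined1= joined2=;
run;
```

Both variables should end up holding abcd; the CALL routine updates `joined2` in place instead of returning a value.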

Answer 32

Arrays provide a convenient way of processing a group of variables. Functions such as SUM, MEAN etc. can be used with arrays, where the arguments for the function take the form:
variable=function(OF variable-list);
* The variable list can be an array reference, such as arrayname{*} for every element, or a list of array elements.
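For instance, a sketch with a hypothetical data set `totals` and array `q`:

```sas
data totals;
  input q1 q2 q3;
  array q{*} q1-q3;
  total   = sum(of q{*});      /* every element of the array */
  average = mean(of q1-q3);    /* an ordinary variable list works too */
  datalines;
5 6 7
8 . 10
;
run;
```

SUM and MEAN ignore missing values, so the second row is totalled and averaged over its two non-missing scores.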


