Playing with Data Flashcards
The SUM function can be used within what two statements?
PROC PRINT and DATA
SUM function
SUM(argument1, argument2);
argument = numeric constant, variable, or arithmetic expression
If an argument in a sum function has a missing value, what will be the result of the sum? eg. sum(salary, bonus) where salary is missing value.
The missing value will be read as a zero and the sum will equal to bonus
What’s the difference between:
compensation=sum(salary, bonus)
and
compensation= salary + bonus
Hint: think missing value
The sum function will treat missing value as zero and return a value, while using + when there’s a missing value will return a missing value
what do date functions, like the following, do?
YEAR(SAS-date)
MONTH(SAS-date)
DAY(SAS-date)
WEEKDAY(SAS-date)
QTR(SAS-date)
They return a more familiar value from a SAS date
4 digit year
number 1-12
number 1-31
number 1-7 where 1=sunday
number 1-4
current date functions
TODAY()
DATE()
return current date as a SAS date Value
MDY(month,day,year)
returns SAS Date Value from a specific date
If you drop a variable before a sum function, will the sum function work?
YES, the DROP statement is a compile-only statement. SAS sets a drop flag for the dropped variables, but the variables are in the PDV and are therefore available for processing
IF, THEN statement
IF expression THEN statement;
- expression =* character, numeric, or date constants, character or numeric variables
- statement =* symbols that represent an arithmetic calculation, SAS functions
Why use the ELSE statement rather than several IF statements?
The system would process every single IF statement. The ELSE statement saves processing resources.
ELSE statement
IF expression THEN statement;
ELSE IF expression THEN statement;
What operator can you use if either of 2 conditions must be true to process a statement?
What operator can you use to encompass the “rest” of the observations (multiple) at the end of the IF THEN saga?
OR
ELSE statement;
What can you use if you want to have multiple THEN statements for an IF, THEN condition?
Use a DO group!
IF expression THEN
DO;
- statement1;*
- statement2;*
END;
ELSE IF expression THEN
DO;
- statement1;*
- statement2;*
END;
In the following program, what length is the variable Freq assigned? How can you fix a truncated value?
data work.bonus;
set orion.sales;
if Country=’US’ then
do;
Bonus=500;
Freq=’Once a Year’
end;
else if Country=’AU’ then
do;
Bonus=300;
Freq=’Twice a Year’;
end;
run;
It takes the length of the first assigment statement (i.e. ‘Once a Year’)
Use length statement to fix truncated values (length Freq $ 12;). You could also “pad” the first assignment statement with a blank at the end.
How to you concatenate two data sets (i.e. merge vertically)?
DATA newdataset;
SET SASDataSet1 SASDataSet2….;
RUN;
This will show DataSet1 following by data from DataSet2
How to you concatenate two datasets with differently named variables that hold the same information?
RENAME!
DATA NewDataSet;
SET SASDataSet1
SASDataSet2 (RENAME=(oldname1=newname1 oldname2=newname2));
RUN;
How do you merge two data sets horizontally?
You can merge the two data sets based on a common variable
DATA newdataset;
MERGE SASdataset1 SASdataset2;
BY (descending)variable;
RUN;
Note: Data sets must be sorted by the BY variable
How do determine which data set(s) contributed to values in an observation?
Use IN= Data Set Option
data newdataset;
merge sasdataset1 (IN=tempvariable1) sasdataset2 (IN=tempvariable2);
by commonvariable;
run;
SAS will create a temporary variable with value of 0 if data set did not contribute to obs and value of 1 if it did contribute. These variables will NOT appear in the output data set. They’re only available during execution phase
How do you do a match-merge where you omit non-matching obs between data sets?
Use IN= Data Set Option to see which data set contribute info to each observation. Then use IF statement to select only obs where both data sets contribute:
DATA newdataset;
MERGE sasdataset1 (IN=tempvar1) sasdataset2 (IN=tempvar2);
BY commonvar;
IF tempvar1= 1 AND tempvar2=1;
RUN;