Playing with Data Flashcards
The SUM function can be used within what two statements?
PROC PRINT and DATA
SUM function
SUM(argument1, argument2);
argument = numeric constant, variable, or arithmetic expression
If an argument in a sum function has a missing value, what will be the result of the sum? eg. sum(salary, bonus) where salary is missing value.
The missing value will be read as a zero and the sum will equal to bonus
What’s the difference between:
compensation=sum(salary, bonus)
and
compensation= salary + bonus
Hint: think missing value
The sum function will treat missing value as zero and return a value, while using + when there’s a missing value will return a missing value
what do date functions, like the following, do?
YEAR(SAS-date)
MONTH(SAS-date)
DAY(SAS-date)
WEEKDAY(SAS-date)
QTR(SAS-date)
They return a more familiar value from a SAS date
4 digit year
number 1-12
number 1-31
number 1-7 where 1=sunday
number 1-4
current date functions
TODAY()
DATE()
return current date as a SAS date Value
MDY(month,day,year)
returns SAS Date Value from a specific date
If you drop a variable before a sum function, will the sum function work?
YES, the DROP statement is a compile-only statement. SAS sets a drop flag for the dropped variables, but the variables are in the PDV and are therefore available for processing
IF, THEN statement
IF expression THEN statement;
- expression =* character, numeric, or date constants, character or numeric variables
- statement =* symbols that represent an arithmetic calculation, SAS functions
Why use the ELSE statement rather than several IF statements?
The system would process every single IF statement. The ELSE statement saves processing resources.
ELSE statement
IF expression THEN statement;
ELSE IF expression THEN statement;
What operator can you use if either of 2 conditions must be true to process a statement?
What operator can you use to encompass the “rest” of the observations (multiple) at the end of the IF THEN saga?
OR
ELSE statement;
What can you use if you want to have multiple THEN statements for an IF, THEN condition?
Use a DO group!
IF expression THEN
DO;
- statement1;*
- statement2;*
END;
ELSE IF expression THEN
DO;
- statement1;*
- statement2;*
END;
In the following program, what length is the variable Freq assigned? How can you fix a truncated value?
data work.bonus;
set orion.sales;
if Country=’US’ then
do;
Bonus=500;
Freq=’Once a Year’
end;
else if Country=’AU’ then
do;
Bonus=300;
Freq=’Twice a Year’;
end;
run;
It takes the length of the first assigment statement (i.e. ‘Once a Year’)
Use length statement to fix truncated values (length Freq $ 12;). You could also “pad” the first assignment statement with a blank at the end.
How to you concatenate two data sets (i.e. merge vertically)?
DATA newdataset;
SET SASDataSet1 SASDataSet2….;
RUN;
This will show DataSet1 following by data from DataSet2
How to you concatenate two datasets with differently named variables that hold the same information?
RENAME!
DATA NewDataSet;
SET SASDataSet1
SASDataSet2 (RENAME=(oldname1=newname1 oldname2=newname2));
RUN;
How do you merge two data sets horizontally?
You can merge the two data sets based on a common variable
DATA newdataset;
MERGE SASdataset1 SASdataset2;
BY (descending)variable;
RUN;
Note: Data sets must be sorted by the BY variable
How do determine which data set(s) contributed to values in an observation?
Use IN= Data Set Option
data newdataset;
merge sasdataset1 (IN=tempvariable1) sasdataset2 (IN=tempvariable2);
by commonvariable;
run;
SAS will create a temporary variable with value of 0 if data set did not contribute to obs and value of 1 if it did contribute. These variables will NOT appear in the output data set. They’re only available during execution phase
How do you do a match-merge where you omit non-matching obs between data sets?
Use IN= Data Set Option to see which data set contribute info to each observation. Then use IF statement to select only obs where both data sets contribute:
DATA newdataset;
MERGE sasdataset1 (IN=tempvar1) sasdataset2 (IN=tempvar2);
BY commonvar;
IF tempvar1= 1 AND tempvar2=1;
RUN;
Alternate syntax for IF statement to select data sets contributing/not contributing data to an observation
IF tempvar1=1 and tempvar2=1 = IF tempvar1 and tempvar2
IF tempvar1=0 or tempvar2=0 = IF not tempvar1 or not tempvar2
How do you count the frequency of certain observations?
use PROC FREQ
PROC FREQ data=sasdataset;
TABLES variable1 variable2;
RUN;
If you omit the table statement, SAS produces a one-way frequency table for every variable in the data set.
Note: PROC FREQ outputs automatically - doesn’t need proc print step
PROC FREQ automatically displays frequency, percent, cumulative frequency, cumulative percent. How do you suppress some of these statistics?
PROC FREQ data=sasdataset;
TABLES variable1/option1 option2;
RUN;
options:
nocum => suppresses both cumulative frequency and percent
nopercent => supresses both percent and cumulative percent
If do both, the table will only display frequencies
How do you make a frequency table of a continuous variable?
Use PROC FORMAT to define ranges
proc format;
value tiers low-25000=’Tier1’
25000<-50000=’Tier2’;
run;
proc freq data=sasdataset;
tables Salary;
format Salary tiers.;
run;
How do you make a frequency tables across 2 variables? E.g. determine the number of male and female employess in each the US and Australia
PROC FREQ data=sasdataset;
TABLES Gender;
BY Country;
RUN;
(Remember to sort by the “by” variable first)
This produces two tables - one for gender frequency in the US and one for gender frequency in Australia.
how do you make a crosstabulation table?
Use * between variables that want to crosstabulate
proc freq data=sasdataset;
tables gender*country;
run;
What are two display options for a crosstabulation table other than the default?
list and crosslist
Using proc freq, how do you arrange the frequency in the table in descending order?
add order=freq;
proc freq data=orion.nonsales2 order=freq;
tables Employee_ID/nocum nopercent;
run;
How do you make a table that displays the levels, or distinct values, for each variable.
use nlevels
proc freq data=orion.nonsales2 nlevels;
tables Gender Country Employee_ID/nocum nopercent;
run;
Useful to find duplicate values
PROC MEANS statement
proc means data=Data-Set <statistics>;</statistics>
var variable1 variable 2;
run;
Using PROC MEANS, how do you create descriptive statististics for different groups (e.g. male vs female salaries)?
Use CLASS statement
proc means data=Data-Set <statistics>;</statistics>
var variable1 variable2;
class classification-variable(s);
run;
How do you display the number of missing values using PROC MEANS?
use NMISS
proc means data=Data-Set nmiss;
var variable1;
run;
Which procedure is helpful to detect outliers?
PROC UNIVARIATE
PROC UNIVARIATE statement
proc univariate data=Data-Set;
var variable1;
id variable2;
run;
Using ID, makes it display the corresponding value of that variable in the extreme obs table.
SAS Output Delivery System statement (used to make file types outside SAS)
ODS filetype options ;
ODS filetype CLOSE;
e.g.
ODS pdf FILE= ‘filename.pdf’;
ODS pdf CLOSE;
A cross-tabulation table shows Frequency, Percent, Row Pct, and Col Pct. How do you suppress some of these options?
proc freq data=dataset;
tables variable1*variable2/(options);
run;
OPTIONS:
nofreq suppresses frequency
no row suppresses row percentages
no col suppresses column percentages
nopercent suppreses percent
In PROC MEANS, how do you display only a subset of the default statistics or display them in a different order?
proc means data=dataset stat1 stat2 stat3;
var var1;
run;
OPTIONS: sum mean n stddev range min max sum nobs (and others)
Difference between N and NOBS (in proc means)?
N = number of nonmissing obs
NOBS = total number of obs
How do you control the number of decimal places that appear in PROC MEANS output?
use MAXDEC=
proc means data=dataset maxdec=#;
var var1;
run;
Which ODS destination is open by default?
HTML
Most other ODS destinations are closed by default, meaning you need to open them at the beginning of the program and close them at the end.
Which keyword can you use to close more than one open ODS destination?
_ALL_
ODS rtf file=’filename’;
ODS pdf file=’filename’;
proc print..
ODS _ALL_ CLOSE;
General form of the ODS HTML statement when you want to specify options
ODS HTML BODY=file-specification;
ODS HTML CLOSE;
file-specification = identifies the file that contains the HTML output.
*Can also use FILE= instead of BODY=
What does PROC MEANS print by default (without adding any options)?
N, mean, std dev, min, max
What is the default format for numeric values in the ouput of a PROC MEANS step?
BESTw.
What is the difference between BY and CLASS statements (used in a PROC MEANS step)?
- BY requires that the data be sorted
- BY produces separate tables for each BY group. CLASS produces only 1 table.
How do you create an ouput SAS data set from the ouput in a PROC MEANS step?
Use OUTPUT statement in PROC MEANS
OUTPUT OUT=dataset statistic=variable(s);
dataset = new dataset name
statistic = statistics you want to print
variables = names of the variables to create
e.g.
proc means data=dataset1;
var age height weight;
class sex;
output out=dataset2
mean=AvgAge AvgHeight AvgWeight
min=MinAge MinHeight MinWeight
run;
Note: variable names must be in same order as var variables
How do you suppress the output from a PROC MEANS statement?
use NOPRINT
proc means data=dataset NOPRINT;
var variable1
run;
Difference between PROC MEANS and PROC SUMMARY
You can get the same output, but PROC MEANS produces a report by default. With PROC SUMMARY you need to add a PRINT option to produce the report.
PROC SUMMARY data=dataset PRINT;
var var1 var2;
class var3;
output out=dataset2
mean= AvgVar1 AvgVar2;
run;
In a crosstabulation table made using PROC FREQ, which variable forms the rows and which forms the columns?
in the tables statement:
tables variable1*variable2
variable1 will form the table rows and variable2 will form the columns
IN PROC FREQ, where do you specify the list and crosslist options?
TABLES var1 var2 / list
TABLES var1 var2 / crosslist