Playing with Data Flashcards

1
Q

The SUM function can be used within what two statements?

A

PROC PRINT and DATA

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

SUM function

A

SUM(argument1, argument2);

argument = numeric constant, variable, or arithmetic expression

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

If an argument in a sum function has a missing value, what will be the result of the sum? eg. sum(salary, bonus) where salary is missing value.

A

The missing value will be read as a zero and the sum will equal to bonus

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What’s the difference between:

compensation=sum(salary, bonus)

and

compensation= salary + bonus

Hint: think missing value

A

The sum function will treat missing value as zero and return a value, while using + when there’s a missing value will return a missing value

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

what do date functions, like the following, do?

YEAR(SAS-date)

MONTH(SAS-date)

DAY(SAS-date)

WEEKDAY(SAS-date)

QTR(SAS-date)

A

They return a more familiar value from a SAS date

4 digit year

number 1-12

number 1-31

number 1-7 where 1=sunday

number 1-4

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

current date functions

A

TODAY()

DATE()

return current date as a SAS date Value

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

MDY(month,day,year)

A

returns SAS Date Value from a specific date

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

If you drop a variable before a sum function, will the sum function work?

A

YES, the DROP statement is a compile-only statement. SAS sets a drop flag for the dropped variables, but the variables are in the PDV and are therefore available for processing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

IF, THEN statement

A

IF expression THEN statement;

  • expression =* character, numeric, or date constants, character or numeric variables
  • statement =* symbols that represent an arithmetic calculation, SAS functions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Why use the ELSE statement rather than several IF statements?

A

The system would process every single IF statement. The ELSE statement saves processing resources.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

ELSE statement

A

IF expression THEN statement;

ELSE IF expression THEN statement;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What operator can you use if either of 2 conditions must be true to process a statement?

What operator can you use to encompass the “rest” of the observations (multiple) at the end of the IF THEN saga?

A

OR

ELSE statement;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What can you use if you want to have multiple THEN statements for an IF, THEN condition?

A

Use a DO group!

IF expression THEN

DO;

  • statement1;*
  • statement2;*

END;

ELSE IF expression THEN

DO;

  • statement1;*
  • statement2;*

END;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

In the following program, what length is the variable Freq assigned? How can you fix a truncated value?

data work.bonus;

set orion.sales;

if Country=’US’ then

do;

Bonus=500;

Freq=’Once a Year’

end;

else if Country=’AU’ then

do;

Bonus=300;

Freq=’Twice a Year’;

end;

run;

A

It takes the length of the first assigment statement (i.e. ‘Once a Year’)

Use length statement to fix truncated values (length Freq $ 12;). You could also “pad” the first assignment statement with a blank at the end.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How to you concatenate two data sets (i.e. merge vertically)?

A

DATA newdataset;

SET SASDataSet1 SASDataSet2….;

RUN;

This will show DataSet1 following by data from DataSet2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How to you concatenate two datasets with differently named variables that hold the same information?

A

RENAME!

DATA NewDataSet;

SET SASDataSet1

SASDataSet2 (RENAME=(oldname1=newname1 oldname2=newname2));

RUN;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How do you merge two data sets horizontally?

A

You can merge the two data sets based on a common variable

DATA newdataset;

MERGE SASdataset1 SASdataset2;

BY (descending)variable;

RUN;

Note: Data sets must be sorted by the BY variable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

How do determine which data set(s) contributed to values in an observation?

A

Use IN= Data Set Option

data newdataset;

merge sasdataset1 (IN=tempvariable1) sasdataset2 (IN=tempvariable2);

by commonvariable;

run;

SAS will create a temporary variable with value of 0 if data set did not contribute to obs and value of 1 if it did contribute. These variables will NOT appear in the output data set. They’re only available during execution phase

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How do you do a match-merge where you omit non-matching obs between data sets?

A

Use IN= Data Set Option to see which data set contribute info to each observation. Then use IF statement to select only obs where both data sets contribute:

DATA newdataset;

MERGE sasdataset1 (IN=tempvar1) sasdataset2 (IN=tempvar2);

BY commonvar;

IF tempvar1= 1 AND tempvar2=1;

RUN;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Alternate syntax for IF statement to select data sets contributing/not contributing data to an observation

A

IF tempvar1=1 and tempvar2=1 = IF tempvar1 and tempvar2

IF tempvar1=0 or tempvar2=0 = IF not tempvar1 or not tempvar2

21
Q

How do you count the frequency of certain observations?

A

use PROC FREQ

PROC FREQ data=sasdataset;

TABLES variable1 variable2;

RUN;

If you omit the table statement, SAS produces a one-way frequency table for every variable in the data set.

Note: PROC FREQ outputs automatically - doesn’t need proc print step

22
Q

PROC FREQ automatically displays frequency, percent, cumulative frequency, cumulative percent. How do you suppress some of these statistics?

A

PROC FREQ data=sasdataset;

TABLES variable1/option1 option2;

RUN;

options:

nocum => suppresses both cumulative frequency and percent

nopercent => supresses both percent and cumulative percent

If do both, the table will only display frequencies

23
Q

How do you make a frequency table of a continuous variable?

A

Use PROC FORMAT to define ranges

proc format;

value tiers low-25000=’Tier1’

25000<-50000=’Tier2’;

run;

proc freq data=sasdataset;

tables Salary;

format Salary tiers.;

run;

24
Q

How do you make a frequency tables across 2 variables? E.g. determine the number of male and female employess in each the US and Australia

A

PROC FREQ data=sasdataset;

TABLES Gender;

BY Country;

RUN;

(Remember to sort by the “by” variable first)

This produces two tables - one for gender frequency in the US and one for gender frequency in Australia.

25
Q

how do you make a crosstabulation table?

A

Use * between variables that want to crosstabulate

proc freq data=sasdataset;

tables gender*country;

run;

26
Q

What are two display options for a crosstabulation table other than the default?

A

list and crosslist

27
Q

Using proc freq, how do you arrange the frequency in the table in descending order?

A

add order=freq;

proc freq data=orion.nonsales2 order=freq;

tables Employee_ID/nocum nopercent;

run;

28
Q

How do you make a table that displays the levels, or distinct values, for each variable.

A

use nlevels

proc freq data=orion.nonsales2 nlevels;

tables Gender Country Employee_ID/nocum nopercent;

run;

Useful to find duplicate values

29
Q

PROC MEANS statement

A

proc means data=Data-Set <statistics>;</statistics>

var variable1 variable 2;

run;

30
Q

Using PROC MEANS, how do you create descriptive statististics for different groups (e.g. male vs female salaries)?

A

Use CLASS statement

proc means data=Data-Set <statistics>;</statistics>

var variable1 variable2;

class classification-variable(s);

run;

31
Q

How do you display the number of missing values using PROC MEANS?

A

use NMISS

proc means data=Data-Set nmiss;

var variable1;

run;

32
Q

Which procedure is helpful to detect outliers?

A

PROC UNIVARIATE

33
Q

PROC UNIVARIATE statement

A

proc univariate data=Data-Set;

var variable1;

id variable2;

run;

Using ID, makes it display the corresponding value of that variable in the extreme obs table.

34
Q

SAS Output Delivery System statement (used to make file types outside SAS)

A

ODS filetype options ;

ODS filetype CLOSE;

e.g.

ODS pdf FILE= ‘filename.pdf’;

ODS pdf CLOSE;

35
Q

A cross-tabulation table shows Frequency, Percent, Row Pct, and Col Pct. How do you suppress some of these options?

A

proc freq data=dataset;

tables variable1*variable2/(options);

run;

OPTIONS:

nofreq suppresses frequency

no row suppresses row percentages

no col suppresses column percentages

nopercent suppreses percent

36
Q

In PROC MEANS, how do you display only a subset of the default statistics or display them in a different order?

A

proc means data=dataset stat1 stat2 stat3;

var var1;

run;

OPTIONS: sum mean n stddev range min max sum nobs (and others)

37
Q

Difference between N and NOBS (in proc means)?

A

N = number of nonmissing obs

NOBS = total number of obs

38
Q

How do you control the number of decimal places that appear in PROC MEANS output?

A

use MAXDEC=

proc means data=dataset maxdec=#;

var var1;

run;

39
Q

Which ODS destination is open by default?

A

HTML

Most other ODS destinations are closed by default, meaning you need to open them at the beginning of the program and close them at the end.

40
Q

Which keyword can you use to close more than one open ODS destination?

A

_ALL_

ODS rtf file=’filename’;

ODS pdf file=’filename’;

proc print..

ODS _ALL_ CLOSE;

41
Q

General form of the ODS HTML statement when you want to specify options

A

ODS HTML BODY=file-specification;

ODS HTML CLOSE;

file-specification = identifies the file that contains the HTML output.

*Can also use FILE= instead of BODY=

42
Q

What does PROC MEANS print by default (without adding any options)?

A

N, mean, std dev, min, max

43
Q

What is the default format for numeric values in the ouput of a PROC MEANS step?

A

BESTw.

44
Q

What is the difference between BY and CLASS statements (used in a PROC MEANS step)?

A
  • BY requires that the data be sorted
  • BY produces separate tables for each BY group. CLASS produces only 1 table.
45
Q

How do you create an ouput SAS data set from the ouput in a PROC MEANS step?

A

Use OUTPUT statement in PROC MEANS

OUTPUT OUT=dataset statistic=variable(s);

dataset = new dataset name

statistic = statistics you want to print

variables = names of the variables to create

e.g.

proc means data=dataset1;

var age height weight;

class sex;

output out=dataset2

mean=AvgAge AvgHeight AvgWeight

min=MinAge MinHeight MinWeight

run;

Note: variable names must be in same order as var variables

46
Q

How do you suppress the output from a PROC MEANS statement?

A

use NOPRINT

proc means data=dataset NOPRINT;

var variable1

run;

47
Q

Difference between PROC MEANS and PROC SUMMARY

A

You can get the same output, but PROC MEANS produces a report by default. With PROC SUMMARY you need to add a PRINT option to produce the report.

PROC SUMMARY data=dataset PRINT;

var var1 var2;

class var3;

output out=dataset2

mean= AvgVar1 AvgVar2;

run;

48
Q

In a crosstabulation table made using PROC FREQ, which variable forms the rows and which forms the columns?

A

in the tables statement:

tables variable1*variable2

variable1 will form the table rows and variable2 will form the columns

49
Q

IN PROC FREQ, where do you specify the list and crosslist options?

A

TABLES var1 var2 / list

TABLES var1 var2 / crosslist