SAS Flashcards

1
Q

Code to import data

A
data work.datasetname;
input age weight height;
label age = 'Patient Age';
cards;
24 130 65
30 150 70
;
run;
2
Q

printing data

A

proc print data = work.datasetname;
run;

3
Q

UNIVARIATE procedure does what

A

for each variable prints summary statistics
extreme observations, stem and leaf, basic stats (mean/median/mode/deviation/stdev/range/IQR), quartiles, t-test, sign, signed rank
USE TO EXPLORE NEW DATASET

4
Q

proc UNIVARIATE code

A
proc univariate data = setName plot; /* plot option is not necessary */
var weight;
histogram weight; /* not necessary */
title1 'Age study';
run;
5
Q

what does the CORR procedure do

A

shows simple statistics for each variable (N, mean, standard deviation, sum, min, max, label)
Correlations between variables

6
Q

proc CORR code

A

proc corr data = work.setName;
var age height;
run;

7
Q

code to create new dataset

A

data work.newSet;
set oldSet;
run;

8
Q

create new dataset and add new variable and fill it (code)

A
data work.cleanedSet;
set oldSet;
bp = .;
IF x = 6 THEN bp = 1;
IF x = 1 OR x = 2 OR x = 3 OR x = 4 OR x = 5 THEN bp = 0;
run;
9
Q

what does cross-tabulation with the FREQ procedure do?

A

shows two tables: one way freq and cross tabulations of two variables
can see who used what treatment
percentages
frequency tables for variables in analysis

10
Q

code to show frequencies of dataset proc FREQ

A

proc freq data = setName;
tables age age*weight;
run;

11
Q

what to add to FREQ procedure to see how many missing values

A

tables age age*weight/MISSING;

12
Q

proc TTEST code

A
proc ttest data = setName;
class group; /* groups we want to compare */
var height; /* compare groups on this variable */
run;
13
Q

what does proc TTEST do to missing values

A

excludes them

14
Q

proc TTEST equality of variance results

A

if the Folded F p-value is > 0.05, assume equal variances; otherwise treat the variances as unequal

15
Q

what test to use when unequal variances for proc TTEST

A

Satterthwaite

16
Q

What test to use when equal variances TTEST

A

Pooled

17
Q

proc TTEST if the p-value is below the significance level

A

Reject null and say there is a difference in height between groups

18
Q

proc TTEST F-test null and alternative

A

Null is equal variances, alternative is unequal variances

19
Q

How to use Cochran with proc TTEST

A

proc ttest data = setName COCHRAN;

20
Q

When to use cochran

A

produces a p-value for the unequal-variance case

use when the Folded F test indicates unequal variances

21
Q

How to compare our mean height to mean value under the null of 60 (code)

A

proc ttest data = setName H0=60;
var height;
run;

22
Q

how to check normality of data

A

proc univariate data = work.setName plot;
var height;
histogram height;
run;

shows box plot, histogram, etc

23
Q

how to do before and after TTEST

A

proc ttest data = setName;
paired before*after;
run;

null is before-after=0

24
Q

ANOVA code

A
proc anova data = work.setName;
class food; /* 7 different groups ate different food */
model height = food; /* compare heights of people who ate different food */
run;
25
Q

ANOVA F-statistic calculation

A

variance between groups / variance within groups
should be near 1 if null correct
F statistic 6.67 suggests difference in means of groups
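
The F ratio can be written out as follows. This is a sketch using symbols that are not on the card: k is the number of groups, N the total number of observations, and SS and MS are the usual sums of squares and mean squares.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```

If all group means are equal, both mean squares estimate the same variance, which is why F is expected to be near 1 under the null.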

26
Q

When to use Tukey

A

see which anova means are different.

multiple comparison

27
Q

Tukey code

A

proc anova data = work.setName;
class food; /* 7 different groups ate different food */
model height = food; /* compare heights of people who ate different food */
means food/tukey;
run;

28
Q

how to interpret Tukey output

A

comparisons significant at the 0.05 level are indicated by ***; those groups differ

29
Q

Why use ANOVA over t-test

A

faster. running many t-tests increases the chance that some result looks significant purely by chance (inflated Type I error)

30
Q

recode to put heights into 4 groups

A

data work.heightsgrouped;
set work.heights;
group=.; (initiate variable and handle missingness)
if height >= 10 AND height
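
The card's answer breaks off above. A minimal sketch of this kind of recode is shown here; the cutoffs 50, 60 and 70 are purely hypothetical and are not the card's actual values.

```sas
/* Sketch only: the cutoffs 50, 60 and 70 are assumed for illustration. */
data work.heightsgrouped;
    set work.heights;
    group = .;   /* stays missing when height is missing or below 10 */
    if height >= 10 and height < 50 then group = 1;
    else if height >= 50 and height < 60 then group = 2;
    else if height >= 60 and height < 70 then group = 3;
    else if height >= 70 then group = 4;
run;
```

Because SAS treats a missing numeric value as smaller than any number, a missing height fails every condition and group keeps its initial missing value.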

31
Q

Write ANOVA code to test effect of group on age

A
proc anova data = work.setname; 
class group; 
model age = group; 
run;
32
Q

what kind of comparisons do we do in anova

A

one continuous variable (age) to one categorical variable (group)

33
Q

where do we write continuous and categorical in anova

A
continuous always on the left and categorical on the right of the = in the model statement.
class variable will always be categorical
34
Q

anova using proc GLM code

A
proc glm data = work.setname;
class group;
model age = group;
run;

same output but more options for GLM

35
Q

how to sort data in increasing order (Code)

A

proc sort data = work.setname;
by group;
run;

36
Q

how to create box plot for data (code)

A

proc boxplot data = work.setname;
plot age*group;
run;
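
PROC BOXPLOT expects its input to be sorted by the group variable, so a sort step usually comes first. A minimal sketch combining the two steps, reusing the placeholder names work.setname, age and group from these cards:

```sas
/* Sort by the grouping variable before plotting. */
proc sort data = work.setname;
    by group;
run;

/* One box of age per level of group. */
proc boxplot data = work.setname;
    plot age*group;
run;
```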

37
Q

how to interpret box plot

A

horizontal line is median
plus is the mean
top and bottom of the box are the 3rd and 1st quartiles
center shaded box made of 50% of data

38
Q

how do we get all the attributes of our dataset and their characteristics (code)

A

proc contents data = work.dataset;

run;

39
Q

what does CONTENTS procedure display

A

alphabetic list of variables and attributes
variable name, type, length, etc
length of variable important if merging datasets
number is the position in the dataset

40
Q

how to correlate things together

A

the CORR procedure

41
Q

the CORR procedure code

A

proc corr data = dataset;
var carb ener etohn fat sugaraw sugaref;
with lexpectbirth;
run;

42
Q

what does CORR procedure output

A

simple statistics of each variable listed
pearson correlation coefficients
top number is correlation and bottom number is p-value

43
Q

how to create correlation matrix with all the variables (code)

A

proc corr data = setname;
var age height weight blah blah3 blah2;
run;

44
Q

why would we do a correlation matrix

A

to find out which predictors may be informative when predicting response
understand how predictors are related because that affects how they jointly model the response variable

45
Q

How to make scatterplot (code)

A
ods graphics on; 
proc gplot data = work.set; 
plot age*height;
run; 
ods graphics off;
46
Q

reg procedure code

A

proc reg data = work.set;
model lebirth=ener;
run;

47
Q

how to structure the model statement of REG procedure

A

variable on the left of the = is the response and the variable on the right is the predictor

48
Q

what is root MSE

A

provided by REG procedure

estimate of the standard deviation of the error term (spread of Y, the response variable, around the fitted line)

49
Q

what is R-square

A

percent of the variation that the model explains

__% of total variation is explained by the model

50
Q

what does it mean when regression intercept 27

A

When energy is 0, life expectancy after birth is 27

51
Q

what does it mean when slope is 0.014

A

for every unit increase in energy consumption, life expectancy after birth goes up 0.014

52
Q

what is null hypothesis

proc reg data = work.set;
model lebirth=ener;
run;

A

energy consumption not related to life expectancy

53
Q

how to transform data to look at squared life expectancy? (code)

A

data work.nutrition2; set work.nutrition;
le2= lebirth*lebirth;
run;
proc reg data = work.nutrition2; model le2=ener; run;

54
Q

Why square data

A

logarithmic trend taken away - more linear now

assume linear relationship, so if look at graph want to see linearity

55
Q

what if transformed data and root MSE went up and R square didnt improve

A

not best transformation

when do transformation hope to improve linearity and our explanation of the variation

56
Q

How to look if multicollinearity may exist (code)

A
ODS graphics on;
PROC CORR DATA=work.nutrition nomiss plots=matrix(histogram);
VAR carbs ener etohn fat sugraw sugref;
RUN;
ODS graphics off;
57
Q

if put 2 predictor variables in model that are correlated with each other

A

will get inflated standard errors and it is hard to decipher which variable is giving you which info

58
Q

how to decide which variable not to use

A

context of model (study obesity more sense to include BMI than weight), include 1, run model, see fit, try another

59
Q

bad model if can explain a variable through

A

linear combo

60
Q

if had model of solid predictors DF would say

A

1

otherwise says 0 or B

61
Q

code for backward selection

A

in model line at end add / SELECTION = BACKWARD
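
A minimal sketch of the option in a full PROC REG call. The dataset and variable names are borrowed from other cards in this deck and are assumptions about how they fit together; SLSTAY is written out explicitly at the backward-selection default of 0.10.

```sas
/* Sketch: names taken from other cards; adjust to the real dataset. */
proc reg data = work.nutrition;
    model lebirth = carbs ener etohn fat sugraw sugref
          / selection = backward slstay = 0.10;
run;
quit;
```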

62
Q

what is backward selection

A

put all variables in, consider one and if doesn’t add any info to model remove
keep going backward until finds model that gives most info
removes variable that’s least significant (contributing less)

63
Q

what is forward selection

A

start with no variables and consider one that adds most info
moves on until no info added after adding variable

64
Q

what is stepwise selection

A

biased. forward, add next var (added 2), steps back and asks if addition of 2nd makes 1st less significant
at each step consider removing or adding variable
can make diff decision at each step

65
Q

what does selection kick off before the start

A

variables that are linear combinations of other variables

66
Q

which things are kicked off models using selection first

A

non-significant p value

67
Q

default significance for backward selection

A

0.1

68
Q

default significance of stepwise selection

A

0.15

69
Q

if data miner use ____ selection because want best model

A

stepwise

70
Q

what complexity model do we want

A

simpler

71
Q

how to see residuals

A

run model with ods graphics
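
A minimal sketch of what that might look like for the regression used in earlier cards. The dataset and variable names are assumed from those cards; PLOTS=DIAGNOSTICS asks PROC REG for its panel of residual diagnostics.

```sas
ods graphics on;

/* Sketch: work.nutrition, lebirth and ener are assumed from earlier cards. */
proc reg data = work.nutrition plots = diagnostics;
    model lebirth = ener;
run;
quit;

ods graphics off;
```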

72
Q

what should we not see from residuals

A

trend between residuals and fitted values or residuals and any variable

73
Q

residual vs. predicted values

A

want random - no patterns

74
Q

standardized residuals

A

expect 95% between +-2

75
Q

leverage

A

what will happen if you pull that observation out of the model.

points with high leverage = big influence on the line

76
Q

Q-Q plot

A

quantiles of the residuals vs. theoretical normal quantiles

77
Q

how do we expect residuals to be distributed

A

normally

78
Q

how do we transform age square

A

data agetransform; set age; age2= age**2; run;

double star is power

79
Q

why do proc means

A

exploratory analysis

tells how many people in which category

80
Q

how to perform exploratory analysis on age categories and a dichotomous variable

A
proc means data = name;
class age; 
var dichotomous; 
run;
81
Q
PROC MEANS DATA=ear;
	class antibo;
	var clear1;
RUN;

what does output mean (N obs, N, mean)

A

N Obs: How many people were on that antibiotic
N: How many had first ear problem
Mean: Percentage that had a recovery in the 14 day period

82
Q

how to look at model of antibiotic one vs antibiotic 2 as reference (compare from 1 to 2)

A

PROC LOGISTIC DATA=ear DESCENDING;
CLASS antibo;
MODEL clear1 = antibo / LACKFIT;
run;

83
Q

PROC LOGISTIC DATA=ear DESCENDING;
CLASS antibo;
MODEL clear1 = antibo / LACKFIT;
run;

how to interpret OR 2.247

A

OR of 1 vs 2

Odds of recovery are 2.247 times greater for antibiotic 1 vs antibiotic 2

84
Q

how to check if OR significant

A

make sure CI doesnt contain 1

look at p values

85
Q

PROC LOGISTIC DATA=ear DESCENDING;
CLASS antibo;
MODEL clear1 = antibo / LACKFIT;
run;

what if don’t put class variable

A

SAS looks at it as continuous variable

ALWAYS USE CLASS STATEMENT FOR CATEGORICAL PREDICTORS IN LOGISTIC REGRESSION TO MAKE MORE INTERPRETABLE

86
Q

Odds = (write out standard model example)

A

e^-1.864 (constant) (e^1stBvalue)^X1 (e^2ndBvalue)^X2 etc
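
Written out in full, the same model looks like this. The notation is an assumption: b_0 is the constant (-1.864 in this example) and b_1, b_2, ... are the coefficients on X_1, X_2, ...

```latex
\text{odds} = \frac{P(Y=1)}{1 - P(Y=1)}
            = e^{\,b_0 + b_1 X_1 + b_2 X_2 + \cdots}
            = e^{b_0}\,\bigl(e^{b_1}\bigr)^{X_1}\bigl(e^{b_2}\bigr)^{X_2}\cdots
```

Each predictor therefore acts multiplicatively on the odds, which is why the exponentiated coefficients are the quantities that get interpreted.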

87
Q

e^-1.864 (constant) (e^1stBvalue)^X1 (e^2ndBvalue)^X2 etc

every unit increase in X multiplies the odds of Y being 1 by

A

e^b

88
Q

male e^B = 2.454

interpret

A

if subtract 1 from value get % increase or decrease in odds caused by being male
odds of owning a gun increase by 145%

89
Q

educ b=-0.056

exp(b) = 0.946

A

each additional year of education decreases the odds by 5.4%

90
Q

effect of a 10 year increase in age on the odds

A

Exp(B)^10
1.008^10 = 1.083
odds go up
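
The arithmetic behind these interpretations can be laid out as a short worked sketch, using only the numbers quoted on the cards:

```latex
\begin{align*}
\%\ \text{change in odds per unit} &= \bigl(e^{b} - 1\bigr) \times 100\% \\
e^{b} = 2.454 &\;\Rightarrow\; (2.454 - 1) \times 100\% \approx +145\% \\
e^{b} = 0.946 &\;\Rightarrow\; (0.946 - 1) \times 100\% = -5.4\% \\
\text{10-unit change: } \bigl(e^{b}\bigr)^{10} = e^{10b} &\;\Rightarrow\; 1.008^{10} \approx 1.083 \quad (\approx +8.3\%)
\end{align*}
```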

91
Q

how to format variable (code)

A

proc format;
value hospformat 1 = 'Hospitalized' 2 = 'Not Hospitalized';
run;

92
Q

what do formats do

A

formats are stored in their own catalog and can then be called in procedures such as proc freq

93
Q

how to do chi square test of independence (code)

A
proc freq data=h1n1;
format Hospitalization hospformat. 
		Age ageformat.;
tables Hospitalization * Age / chisq;
weight Count;
run;
94
Q

what do we put after format

A

the format name followed by a dot; the dot tells SAS that the name is a format

95
Q
proc freq data=h1n1;
format Hospitalization hospformat. 
		Age ageformat.;
tables Hospitalization * Age / chisq;
weight Count;
run;
what does * mean
A

Star says hospitalization versus age.

96
Q

/chisq tells sas

A

specific analysis I want done on frequency table is chi square

97
Q

what does chisq show

A

frequency table

chi square value

98
Q

rule of thumb for doing chi square

A

need an expected count of at least 5 in each cell

SAS will warn if cells have expected counts less than 5 - USE FISHERS instead
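
The expected counts that this rule refers to, and the statistic built from them, can be sketched as follows. The symbols are introduced here for illustration: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, and N is the grand total.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i}\sum_{j} \frac{\bigl(O_{ij} - E_{ij}\bigr)^2}{E_{ij}}
```

Small E_ij values sit in the denominator, which is why cells with low expected counts make the chi-square approximation unreliable.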

99
Q

if chi square p value is below the significance level

A

reject null hypothesis that age is independent of hospitalization

100
Q

chi square likelihood ratio based on

A

regression analysis

101
Q

mantel-haenszel chi square

A

ordinal test of association

  • Good if have ordinal categories
  • Looking for association b/w rows and columns assuming there is order for the columns
102
Q

Phi coefficient

A

Usually just for 2x2 tables, for which -1 <= phi <= 1

103
Q

Contingency coefficient

A

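C = sqrt(χ²/(N + χ²)), where χ² is the chi-square statistic and N is the total count; equivalently C = sqrt(ϕ²/(1 + ϕ²))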

104
Q

Cramer’s V

A

measure of association

105
Q

expected chi square and fishers exact code

A

ods graphics on;
proc freq data=h1n1;
format Hospitalization hospformat. Age ageformat.;
tables Hospitalization * Age / expected chisq;
weight Count;
exact fisher pchi;
run;

106
Q

when to use exact chi square

A

Use exact chi square when we don’t have this assumption covered – expected counts >5

107
Q

Fishers exact

A

don’t have to worry about restrictions chi square has

108
Q

poisson regression

A

Different style of regression based on Poisson distribution

109
Q

Poisson distribution

A

common distribution for counts

110
Q

log odds can’t span

A

0

111
Q

format questions to be answered as yes or no (code)

A

proc format;
value qaformat 1='Yes' 2='No' 3='Dunno';
run;

112
Q

code to check agreement between self questionnaire and interview

A
proc freq data=cough;
	format saq qaformat. int qaformat.;
	tables saq * int /agree;
	weight count;
run;
113
Q
proc freq data=cough;
	format saq qaformat. int qaformat.;
	tables saq * int /agree;
	weight count;
run;

what does two sided test mean

A

testing whether kappa equals zero

114
Q
proc freq data=cough;
	format saq qaformat. int qaformat.;
	tables saq * int /agree;
	weight count;
run;

what does one sided testing mean

A

null is kappa = 0, alternative is kappa > 0

115
Q

which kappa to report

A

ONE SIDED because want positive kappa
don’t look at weighted kappa WANT BASIC KAPPA
dont report exact p value

116
Q

CI for kappa

A

shouldn’t include zero

117
Q

negative kappa value means

A

agreement worse than expected by chance (no agreement)
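
The sign of kappa follows from its definition. As a sketch, with p_o the observed proportion of agreement and p_e the proportion of agreement expected by chance (symbols not on the card):

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

Kappa is 1 for perfect agreement, 0 when observed agreement equals chance agreement, and negative when observed agreement falls below chance.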

118
Q

kappa breakdowns

A

poor (

119
Q

code to produce exact test for kappa

A
proc format;
	value physformat 1='Minimal' 2='Moderate' 3='Large' 4='Excessive';
run;
proc freq data=phys;
	format phys1 physformat. phys2 physformat.;
	tables phys1* phys2/agree;
	weight count;
	test agree;
	exact agree;
run;
120
Q

McNemar’s test tests for

A

Symmetry
shown when doing agreement

is the probability that one physician rates it a 1 and another a 3 the same as the probability that the first rates it a 3 and the other rates it a 1
If the table is not symmetric, it indicates that one physician tends to say the ectopies are larger than the other physician does (bias)
27 minimal by one physician and only 15 minimal for the other: bias toward saying things are smaller (want 15 and 27 to be closer to each other)

121
Q

H0 and Ha of McNemar’s

A

null is symmetric, alternative is asymmetric

122
Q

How to interpret McNemar’s

A

if p value is below the significance level, reject the null of symmetry

123
Q

Parametric test: paired t-test

Nonparametric

A

Wilcoxon signed rank

124
Q

How to get nonparametric correlation along with parametric (code)

A

proc corr data=oc pearson spearman;
var before after;
run;

125
Q

difference between spearman and pearson

A

Pearson only looking at linear association

Spearman works off of ranks. ranks the data and compares the ranks. looking at monotonic (possibly non-linear) association

126
Q

code for matched pairs t-test

A
ODS GRAPHICS ON;
PROC TTEST DATA=work.contraceptives;
	PAIRED before * after;
		TITLE 'Example of Matched Pairs';
RUN;
ODS GRAPHICS OFF;
127
Q
ODS GRAPHICS ON;
PROC TTEST DATA=work.contraceptives;
	PAIRED before * after;
		TITLE 'Example of Matched Pairs';
RUN;
ODS GRAPHICS OFF;

what does this do

A

match observations before and after to determine if the mean is the same

assume the before-after differences are normally distributed

128
Q

paired t-test H0 and Ha

A

H0: before = after
H1: before != after

129
Q

what if reject null of paired t-test

A

say the mean before differs from the mean after

130
Q

Wilcoxon signed rank used when

A

nonparametric alternative to the matched pairs t-test

131
Q

wilcoxon signed rank code

A

PROC UNIVARIATE DATA=oc;
VAR diff;
RUN;

look at Signed rank
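
The code above assumes a variable diff already exists. A minimal sketch of creating it first, assuming the dataset oc holds the before and after variables used in the other cards (the output name work.oc_diff is hypothetical):

```sas
/* Sketch: compute the paired difference, then test it. */
data work.oc_diff;
    set oc;
    diff = before - after;
run;

proc univariate data = work.oc_diff;
    var diff;   /* the Tests for Location table includes the Signed Rank test */
run;
```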

132
Q

when to do signed rank vs t-test

A

based on normality. often people do non-parametric test so don’t have to assume normality

nonparametric tests generally have less power, so a result that is significant in the nonparametric test will usually also be significant in the parametric test

133
Q

independent samples
parametric: t-test
non-parametric

A

Mann Whitney U (Wilcoxon Rank Sum)

134
Q

t-test independent samples code

A

PROC TTEST DATA=pain;
CLASS Physiotherapy;
VAR pri;
RUN;

135
Q

PROC TTEST DATA=pain;
CLASS Physiotherapy;
VAR pri;
RUN;

class statement tells SAS

A

where to get 2 independent samples
Physiotherapy classified into 2 groups

trying to see if pain rating same in 2 groups

136
Q

Mann Whitney U code

A

nonparam test for indep samples

PROC NPAR1WAY DATA=pain WILCOXON;
CLASS Physiotherapy;
VAR pri;
RUN;

137
Q

What to look at when doing Wilcoxon Rank sum (Mann Whitney U)

A

T-approximation

doing 2-sided test to determine if differences are zero

138
Q

Kruskal Wallis Test code

A

PROC NPAR1WAY DATA=pain WILCOXON;
CLASS analg;
VAR pri;
RUN;