SAS Flashcards
Code to import data
data work.datasetname;
input age weight height;
label age = 'Patient Age';
cards;
24 130 65
30 150 70
;
run;
printing data
proc print data = work.datasetname; run;
UNIVARIATE procedure does what
for each variable prints summary statistics
extreme observations, stem-and-leaf plot (with PLOT), basic statistical measures (mean, median, mode, standard deviation, variance, range, IQR), quartiles, and tests for location (Student's t, sign, signed rank)
USE TO EXPLORE NEW DATASET
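A minimal Python sketch of a few of the basic measures listed above, computed on hypothetical weights. It is not SAS. It skips quartiles and the IQR because SAS's default percentile definition may not match simple textbook rules.

```python
import statistics


def basic_measures(values):
    """A few of the basic measures PROC UNIVARIATE reports (illustrative sketch)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),      # sample SD, n - 1 denominator
        "variance": statistics.variance(values),  # sample variance
        "range": max(values) - min(values),
    }


weights = [130, 150, 150, 160, 170]  # hypothetical data
print(basic_measures(weights))
```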
proc UNIVARIATE code
proc univariate data = setName plot;  /* PLOT is optional */
var weight;
histogram weight;                     /* optional */
title1 'Age study';
run;
what does the CORR procedure do
shows simple statistics for each variable (N, mean, standard deviation, sum, minimum, maximum, label)
Pearson correlations between the variables, with p-values
proc CORR code
proc corr data = work.setName;
var age height;
run;
code to create new dataset
data work.newSet;
set oldSet;
run;
create new dataset and add new variable and fill it (code)
data work.cleanedSet;
set oldSet;
bp = .;                               /* start the new variable as missing */
if x = 6 then bp = 1;
if x in (1, 2, 3, 4, 5) then bp = 0;
run;
what does cross-tabulation with the FREQ procedure do?
shows two tables: one way freq and cross tabulations of two variables
can see who used what treatment
percentages
frequency tables for variables in analysis
code to show frequencies in a dataset (proc FREQ)
proc freq data = setName;
tables age age*weight;
run;
what to add to the FREQ procedure to see how many values are missing
tables age age*weight/MISSING;
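A rough Python analogue of the one-way and two-way tables described above, using hypothetical treatment and outcome lists in which None stands for a missing value. The `include_missing` flag is an invented parameter that loosely mimics the /MISSING option by counting missing as its own level. This is not SAS.

```python
from collections import Counter


def one_way(values, include_missing=False):
    """One-way frequency table: level -> (count, percent). None marks a missing value."""
    if not include_missing:
        # By default, missing values are left out of the counts and percentages
        values = [v for v in values if v is not None]
    counts = Counter(values)
    total = sum(counts.values())
    return {level: (n, 100.0 * n / total) for level, n in counts.items()}


def two_way(rows, cols, include_missing=False):
    """Cross-tabulation: (row level, column level) -> count."""
    pairs = list(zip(rows, cols))
    if not include_missing:
        pairs = [(r, c) for r, c in pairs if r is not None and c is not None]
    return Counter(pairs)


treatment = ["A", "A", "B", "B", None]        # hypothetical; None = missing
outcome = ["yes", "no", "yes", "yes", "no"]
print(one_way(treatment))                        # missing excluded
print(one_way(treatment, include_missing=True))  # missing counted as a level
print(two_way(treatment, outcome))
```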
proc TTEST code
proc ttest data = setName;
class group;   /* the groups we want to compare */
var height;    /* compare the groups on this variable */
run;
what does proc TTEST do to missing values
excludes them
proc TTEST equality of variance results
if the Folded F p-value (Pr > F) is > 0.05, assume equal variances; otherwise conclude the variances are unequal
what test to use when unequal variances for proc TTEST
Satterthwaite
What test to use when equal variances TTEST
Pooled
proc TTEST if p-value < alpha (the significance level)
reject the null and say there is a difference in height between the groups
proc TTEST F-test null and alternative
Null is equal variances, alternative is unequal variances
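The Folded F, pooled, and Satterthwaite quantities from the cards above can be sketched in Python as follows, on hypothetical heights. It is an illustration of the textbook formulas, not SAS output. It stops at the test statistics and degrees of freedom; SAS reports the p-values from the F and t distributions.

```python
import math
import statistics


def two_sample_t(a, b):
    """Folded F, pooled t, and Satterthwaite t for two independent samples (no p-values)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances

    # Folded F: larger variance over smaller variance; H0 is equal variances
    folded_f = max(v1, v2) / min(v1, v2)

    # Pooled: one shared variance estimate, df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    pooled_t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Satterthwaite: separate variances, approximate (fractional) df
    q1, q2 = v1 / n1, v2 / n2
    satt_t = (m1 - m2) / math.sqrt(q1 + q2)
    satt_df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))

    return {
        "folded_f": folded_f,
        "pooled_t": pooled_t,
        "pooled_df": n1 + n2 - 2,
        "satterthwaite_t": satt_t,
        "satterthwaite_df": satt_df,
    }


group1 = [60, 62, 64, 66, 68]  # hypothetical heights
group2 = [55, 60, 65, 70, 75]
print(two_sample_t(group1, group2))
```

With equal group sizes the two t statistics coincide, and only the df differs (8 pooled vs about 5.25 Satterthwaite). The df is what shifts the p-value.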
How to use Cochran with proc TTEST
proc ttest data = setName COCHRAN;
When to use cochran
produces p-value for unequal variances
use it when the Folded F test says the variances are unequal
How to compare our mean height to mean value under the null of 60 (code)
proc ttest data = setName H0=60;
var height;
run;
how to check normality of data
proc univariate data = work.setName plot;
var height;
histogram height;
run;
shows box plot, histogram, etc
how to do a before-and-after (paired) TTEST
proc ttest data = setName;
paired before*after;
run;
null is mean(before - after) = 0
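The one-sample test against H0=60 and the paired test share one formula, sketched here in Python on hypothetical numbers: t = (mean - mu0) / (sd / sqrt(n)) with n - 1 df. A paired t is a one-sample t on the differences before - after with mu0 = 0. This is an illustration only; SAS also reports the p-value.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """t statistic and df for H0: mean = mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1


def paired_t(before, after):
    """Paired t: one-sample t on the differences before - after, H0: mean difference = 0."""
    diffs = [x - y for x, y in zip(before, after)]
    return one_sample_t(diffs, 0.0)


heights = [58, 62, 63, 65, 67]            # hypothetical data
print(one_sample_t(heights, mu0=60))      # compare the mean height to 60
print(paired_t([200, 190, 210, 180], [190, 185, 200, 178]))
```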
ANOVA code
proc anova data = work.setName;
class food;            /* 7 groups ate different food */
model height = food;   /* compare heights of people who ate different food */
run;
ANOVA F-statistic calculation
variance between groups / variance within groups (MS between / MS within)
should be near 1 if the null is correct
a large F (e.g. 6.67, with a small p-value) suggests the group means differ
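The F ratio on this card is between-group variance over within-group variance. A minimal Python sketch of that calculation follows, on hypothetical groups. It returns only F and its degrees of freedom; SAS also gives the p-value.

```python
def one_way_anova_f(groups):
    """F = MS between / MS within for a one-way ANOVA; returns (F, df_between, df_within)."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]

    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

Groups with identical means give F = 0. Well-separated means give an F far above 1.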
When to use Tukey
see which anova means are different.
multiple comparison
Tukey code
proc anova data = work.setName;
class food;            /* 7 groups ate different food */
model height = food;   /* compare heights of people who ate different food */
means food/tukey;
run;
how to interpret Tukey output
comparisons significant at the 0.05 level are marked with ***; those pairs of groups differ
Why use ANOVA over t-test
faster, and running many pairwise t-tests inflates the chance of finding a "significant" difference purely by chance (Type I error)
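Here is a rough sense of how fast the chance of a false positive grows, sketched in Python. It assumes the pairwise tests are independent, which they are not, so treat the number as a ballpark only. With the 7 food groups from the ANOVA cards there are 21 pairwise t-tests.

```python
from math import comb


def familywise_error(num_groups, alpha=0.05):
    """Number of pairwise t-tests and the chance of at least one false positive,
    assuming (unrealistically) that the tests are independent."""
    tests = comb(num_groups, 2)
    return tests, 1 - (1 - alpha) ** tests


print(familywise_error(7))  # 7 food groups -> 21 pairwise tests
```

Under that assumption, 21 tests at alpha = 0.05 give roughly a 66% chance of at least one spurious "difference".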
recode to put heights into 4 groups
data work.heightsgrouped;
set work.heights;
group = .;   /* initialize the variable and handle missingness */
if height >= 10 AND height
Write ANOVA code to test the effect of group on age
proc anova data = work.setname; class group; model age = group; run;
what kind of comparisons do we do in anova
one continuous variable (age) to one categorical variable (group)
where do we write continuous and categorical in anova
continuous always on the left and categorical on the right; the CLASS variable will always be categorical
anova using proc GLM code
proc glm data = work.setname; class group; model age = group; run;
same output, but GLM has more options
how to sort data in increasing order (Code)
proc sort data = work.setname;
by group;
run;
how to create box plot for data (code)
proc boxplot data = work.setname;   /* data sorted by group first (proc sort) */
plot age*group;
run;
how to interpret box plot
horizontal line inside the box is the median
plus sign is the mean
bottom and top of the box are the 1st and 3rd quartiles
the box holds the middle 50% of the data
how do we get all the attributes of our dataset and their characteristics (code)
proc contents data = work.dataset;
run;
what does CONTENTS procedure display
alphabetic list of variables and attributes
variable name, type, length, etc
length of variable important if merging datasets
number is the position in the dataset
how to correlate things together
the CORR procedure
the CORR procedure code
proc corr data = dataset;
var carb ener etohn fat sugaraw sugaref;
with lexpectbirth;
run;
what does CORR procedure output
simple statistics of each variable listed
Pearson correlation coefficients
top number is the correlation and bottom number is its p-value
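A minimal Python sketch of the two numbers in each cell of the CORR output, on hypothetical data. The top number is the Pearson r. The bottom number is a p-value that comes from the t statistic r * sqrt((n - 2) / (1 - r^2)) with n - 2 df. The sketch stops at the t statistic.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic (df = n - 2) behind the p-value for H0: correlation = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])  # hypothetical data
print(r, correlation_t(r, 4))
```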
how to create correlation matrix with all the variables (code)
proc corr data = setname;
var age height weight blah blah3 blah2;
run;
why would we do a correlation matrix
to find out which predictors may be informative when predicting response
understand how predictors are related because that affects how they jointly model the response variable
How to make scatterplot (code)
ods graphics on;
proc gplot data = work.set;
plot age*height;
run;
quit;
ods graphics off;
reg procedure code
proc reg data = work.set;
model lebirth=ener;
run;
how to structure the model statement of REG procedure
the variable on the left of = is the response; the variable(s) on the right are the predictors
what is root MSE
provided by REG procedure
estimate of the standard deviation of the error term, i.e. the typical spread of Y (the response variable) around the fitted line
what is R-square
proportion of the variation in the response that the model explains
__% of total variation is explained by the model
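For a model with one predictor, root MSE is sqrt(SSE / (n - 2)) and R-square is 1 - SSE / SST. The Python sketch below fits a least-squares line to hypothetical data and computes both. It illustrates the formulas and is not PROC REG itself.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x, plus root MSE and R-square."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    sst = sum((b - my) ** 2 for b in y)                          # total variation
    root_mse = math.sqrt(sse / (n - 2))  # n - 2: intercept and slope were estimated
    r_square = 1 - sse / sst
    return b0, b1, root_mse, r_square


print(simple_regression([1, 2, 3, 4], [2, 4, 5, 8]))  # hypothetical data
```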
what does it mean when regression intercept 27
When energy is 0, predicted life expectancy at birth is 27
what does it mean when slope is 0.014
for every one-unit increase in energy consumption, predicted life expectancy at birth goes up 0.014
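Here is the intercept and slope interpretation as plain arithmetic, using the fitted line from these cards (intercept 27, slope 0.014). The energy values are hypothetical, and the units are whatever the original nutrition data used.

```python
def predicted_life_expectancy(energy, intercept=27.0, slope=0.014):
    """Fitted line from the cards: predicted life expectancy at birth for a given energy value."""
    return intercept + slope * energy


for energy in (0, 1, 1000):  # hypothetical energy values
    print(energy, predicted_life_expectancy(energy))
```

At energy 0 the prediction is the intercept, 27. Each extra unit adds 0.014, so 1000 extra units add about 14.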
what is the null hypothesis here
proc reg data = work.set;
model lebirth=ener;
run;
the slope on energy is 0: energy consumption is not (linearly) related to life expectancy
how to transform data to look at squared life expectancy? (code)
data work.nutrition2; set work.nutrition;
le2= lebirth*lebirth;
run;
proc reg data = work.nutrition2; model le2=ener; run;
Why square the data
removes the logarithmic (curved) trend, so the relationship looks more linear
regression assumes a linear relationship, so when we look at the graph we want to see linearity
what if we transformed the data and root MSE went up and R-square didn't improve
not the best transformation
when we transform, we hope to improve linearity and how much of the variation the model explains
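A toy Python sketch of judging a transformation by R-square. The data are hypothetical and chosen so the response levels off as energy grows; squaring the response makes the relationship exactly linear here. Real data will not straighten this neatly. The sketch uses the fact that, with one predictor, R-square equals the squared Pearson correlation.

```python
def r_square(x, y):
    """R-square of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


energy = [1, 4, 9, 16, 25]  # hypothetical; response levels off as energy grows
le = [1, 2, 3, 4, 5]
le2 = [v * v for v in le]   # squared response, as in le2 = lebirth*lebirth

print(r_square(energy, le))   # curved relationship: R-square below 1
print(r_square(energy, le2))  # squared response: exactly linear here
```

If the transformed model's R-square did not rise (and root MSE did not fall), this comparison would point to trying a different transformation.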