BIO 300 Lab Quiz 3 Flashcards
correlation
strength of a linear association between 2 numerical variables
correlation uses
correlation coefficient
correlation coefficient
r
[-1,1]
unitless
negative r
as one variable increases, the other decreases
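The sign and range of r can be sketched in Python (not part of the lab's Minitab workflow; the numbers below are made up):

```python
import numpy as np

# made-up data where y tends to fall as x rises (negative association)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.0, 6.5, 5.0, 3.0])

# Pearson correlation coefficient, read off the 2x2 correlation matrix
r = np.corrcoef(x, y)[0, 1]
```

r is unitless and always falls in [-1, 1]; here it comes out negative because the variables move in opposite directions.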
inferences from correlation
cannot infer causality
regression
assumes a causal direction between the 2 variables (explanatory affects response)
used to predict value of response variable from explanatory variable
can determine how much of variability is due to relationship w/ explanatory variable
regression statistic
R^2 = SS_regression / SS_total
[0,1]
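The formula R^2 = SS_regression / SS_total can be checked directly with a small least-squares fit (a numpy-only sketch with made-up numbers):

```python
import numpy as np

# made-up, nearly linear data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# fit the least-squares line and get fitted values
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# R^2 = SS_regression / SS_total
ss_total = np.sum((y - y.mean()) ** 2)
ss_regression = np.sum((y_hat - y.mean()) ** 2)
r_squared = ss_regression / ss_total
```

Because it is a ratio of sums of squares, r_squared always lands in [0, 1].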
linear regression assumptions
- relationship between response (Y) and explanatory (X) is linear
- Y values at each value of X are normally distributed
- variance of Y values is same at all values of X
- Y measurements are sampled randomly from the population at each value of X
are there any outliers?
yes; they can be detected with a boxplot
how to do regression with multiple groups (N/S)
“regression with groups”
then add an overall regression line by right-clicking the plot and choosing the regression fit option
do the data need to be transformed
are the data clumped in one corner of the scatterplot
is there greater spread in one section of the scatterplot
are there different orders of magnitude spanned in the variables
how to SLR
stat– regression– regression– fit regression model
options to check for regression
responses- Chl-a
continuous predictor- Log P
graphs- residuals vs. fits
results- everything but Durbin-Watson
storage- residuals
SSregression
variation in the response variable accounted for by the regression
SSresidual
variation unexplained by the regression
MS
measure of variance, average of sums of squares: SS/df
F
F-ratio
MSregression/MSresidual
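The ANOVA quantities on these cards (SS, df, MS, F) fit together as sketched below (numpy only, made-up data; Minitab reports the same pieces in its ANOVA table):

```python
import numpy as np

# made-up simple linear regression data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.8, 8.2, 9.9, 12.1])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# partition the variation
ss_regression = np.sum((y_hat - y.mean()) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)

# MS = SS / df; F = MS_regression / MS_residual
df_regression = 1            # one predictor
df_residual = len(x) - 2     # n - 2 for simple linear regression
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual
```

Note that SS_regression + SS_residual adds back up to SS_total.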
constant
y-intercept
note that this value will appear in the equation
Log P coefficient
the slope
will appear in equation
better predictor of response variable
higher R^2
multiple regression
stat– regression– fit regression model
responses- chl-a
continuous predictors- Log P, Log N
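A multiple regression analogous to the Minitab steps above can be sketched with numpy's least-squares solver; the chl-a, Log P, and Log N values here are made up:

```python
import numpy as np

# made-up data: chl-a predicted from Log P and Log N
log_p = np.array([0.5, 0.8, 1.1, 1.4, 1.7, 2.0])
log_n = np.array([1.0, 1.2, 1.5, 1.9, 2.1, 2.4])
chl_a = np.array([1.2, 2.0, 3.1, 4.2, 4.9, 6.1])

# design matrix: intercept column plus the two predictors
X = np.column_stack([np.ones_like(log_p), log_p, log_n])
coeffs, *_ = np.linalg.lstsq(X, chl_a, rcond=None)
intercept, b_log_p, b_log_n = coeffs

# R^2 of the multiple regression
y_hat = X @ coeffs
r_squared = 1 - np.sum((chl_a - y_hat) ** 2) / np.sum((chl_a - chl_a.mean()) ** 2)
```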
don't forget
- label residual columns
- check for normality
- check for equal variance
- check if assumptions are met
- test residuals for normality
when R^2 of SLR ≈ R^2 of MR
possibly the 2 explanatory variables are correlated
if two explanatory variables are correlated
collinearity
why did Log N lose its significance in MR
the variation explained by Log N is already accounted for by Log P; not much variation is left for Log N to describe
test for correlation
stat– basic stat– correlation– variables: logP, log N— ok
stepwise multiple regression
looks at all combinations of explanatory variables to retain the ones that explain the most variation
eliminates explanatory variables that do not add any new explanatory power
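The idea behind stepwise selection, and why a collinear predictor adds little, can be sketched by comparing R^2 gains (made-up data; the variable names x1, x2, x3 are hypothetical):

```python
import numpy as np

def r_squared(X, y):
    # R^2 of the least-squares fit of y on X (X must include an intercept column)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coeffs
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 2, 30)
x2 = x1 + rng.normal(0, 0.05, 30)    # nearly collinear with x1
x3 = rng.uniform(0, 2, 30)           # independent predictor
y = 2.0 * x1 + 0.5 * x3 + rng.normal(0, 0.1, 30)

ones = np.ones_like(y)
base = r_squared(np.column_stack([ones, x1]), y)
gain_x2 = r_squared(np.column_stack([ones, x1, x2]), y) - base  # ~0: redundant
gain_x3 = r_squared(np.column_stack([ones, x1, x3]), y) - base  # real new variation
```

Stepwise procedures keep predictors like x3 (a real R^2 gain) and drop predictors like x2 (almost no new explanatory power).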
do you need different predictive equations for the 2 sampling locations
are the intercepts and slopes of the regression equations significantly different?
test for significant differences in y-intercept of regression lines
ANCOVA
ANCOVA
analysis of covariance
ANCOVA assumes
equal slopes (parallel lines)
testing for equal slopes
are the lines parallel? if so, the interaction term is not significant
we’ll assume that they are
if y-intercepts are not significantly different (p-value ≥ alpha)
free to use one regression equation for both locations
running an ANCOVA
stat– anova– GLM– fit general linear model
options in ANCOVA
responses- chl-a
factors- location
covariates- log P
model- location, log P, location*log P
results- only ANOVA
storage- residuals
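The ANCOVA model terms (factor, covariate, interaction) correspond to columns of a design matrix; a minimal numpy sketch with made-up data and a 0/1 dummy code for location:

```python
import numpy as np

# made-up data for two sampling locations (dummy-coded 0 and 1)
log_p = np.array([0.5, 1.0, 1.5, 2.0, 0.5, 1.0, 1.5, 2.0])
location = np.array([0, 0, 0, 0, 1, 1, 1, 1])
chl_a = np.array([1.1, 2.0, 3.2, 4.1, 1.9, 2.8, 4.0, 5.1])

# design matrix for chl-a ~ location + log P + location*log P
X = np.column_stack([
    np.ones_like(log_p),   # intercept
    location,              # location effect (shifts the intercept)
    log_p,                 # covariate slope
    location * log_p,      # interaction: non-zero means unequal slopes
])
coeffs, *_ = np.linalg.lstsq(X, chl_a, rcond=None)
```

If the interaction coefficient is negligible, the lines are parallel and the model can be re-fit without that column.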
ANCOVA output
- if interaction is not significant re-build model without interaction
- determine if the effect of sampling location is important
- decide if you need 2 equations or not
if sampling location has significant effect
need 2 equations
how to get separate equations for sampling location
separate data by sampling location and start again from beginning for each location separately