Quiz 2 Flashcards
As X increases by one unit, Y increases by?
The slope, B1
Simple linear regression model in words
response = predictor + error
What is a signal
Predictor
What is noise
Error
Formal statistical model:
response = intercept + slope(p) + error (where p = the predictor variable)
Describe the linear model when pages is the response variable and words is the predictor
pages = words + error, or more formally: pages = B0 + B1(words) + error
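A minimal sketch of fitting this model in Python, assuming the statsmodels library; the page and word counts below are made up for illustration:
```python
# Minimal sketch: fitting pages = B0 + B1(words) + error by ordinary
# least squares. The data below are hypothetical.
import numpy as np
import statsmodels.api as sm

words = np.array([50000, 80000, 120000, 60000, 95000])  # predictor X
pages = np.array([180, 290, 450, 220, 340])             # response Y

X = sm.add_constant(words)      # adds the intercept column (B0)
fit = sm.OLS(pages, X).fit()    # OLS estimates B0 and B1
print(fit.params)               # [B0, B1]: intercept and slope
```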
Multiple linear regression model
response = predictor 1 + predictor 2 + error
Simple linear regression is:
Linear regression with one continuous response variable Y and ONE continuous predictor variable X
Multiple linear regression is:
Linear regression with one continuous response variable Y, and MORE THAN ONE continuous predictor.
What are the basic assumptions of linear regression?
A linear relationship, with normally distributed residuals that have homogeneous variances
How does B1 quantify different things in simple versus multiple regression?
In multiple regression, the effect of X1 on Y controls for the effect of X2: B1 isolates the influence of X1 independent of X2, because it is estimated holding X2 constant.
This does not allow X2 to interfere when assessing the effect of X1.
Explain what B1 is in a multiple regression model
For every one-unit increase in X1 (the predictor), Y (the response) increases by b1, holding X2 constant.
Main difference between b1 in linear regression and multiple regression
b1 in simple linear regression is the regression slope, while in multiple regression b1 and b2 are partial regression slopes
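A short sketch of the difference, assuming statsmodels and made-up data: when X1 and X2 are correlated, the simple slope of X1 absorbs part of X2's effect, while the partial slope holds X2 constant.
```python
# Minimal sketch: simple slope vs partial regression slope (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
x1 = rng.normal(size=300)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=300)   # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=300)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

simple = smf.ols("y ~ x1", data=df).fit()          # slope absorbs x2's effect
multiple = smf.ols("y ~ x1 + x2", data=df).fit()   # b1 holds x2 constant
print(simple.params["x1"], multiple.params["x1"])  # roughly 4.4 vs 2.0
```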
What is the second complication in multiple linear regression?
Multiple predictors can interact in their effect on the response variable.
What is the regression model for interaction? Multiplicative model
response = B0 + B1X1 + B2X2 + B3(X1 × X2) + error
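A hedged sketch of fitting the multiplicative model, assuming statsmodels' formula interface and simulated data; 'x1 * x2' expands to x1 + x2 plus the x1:x2 interaction term:
```python
# Minimal sketch: response = B0 + B1X1 + B2X2 + B3(X1 × X2) + error.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2 + 1.5 * df.x1 - 0.5 * df.x2 + 0.8 * df.x1 * df.x2 + rng.normal(size=100)

fit = smf.ols("y ~ x1 * x2", data=df).fit()  # x1 * x2 -> x1 + x2 + x1:x2
print(fit.params)  # Intercept (B0), x1 (B1), x2 (B2), x1:x2 (B3)
```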
What is the third complication in multiple regression models?
Predictor variables can themselves be correlated
What are the assumptions of multiple regression models?
- Linear relationship between the predictors and the response variable
- Equal variance of residuals around the regression line
- Normally distributed residuals
- Predictors should not be strongly correlated (i.e. no collinearity)
How do you detect collinearity?
- Think about which predictor variables are likely to be collinear before building model
- Plot predictor variables against each other
- Calculate the TOLERANCE associated with each predictor.
Tolerance
Tolerance = 1 - R² from regressing that predictor on all the other predictors.
Lower tolerance is bad.
Tolerance < 0.1 is really bad
VIF
Variance inflation factor
VIF = 1/tolerance
Higher VIF is bad
VIF >10 is really bad.
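A minimal sketch of both diagnostics in Python, assuming statsmodels (which provides variance_inflation_factor) and deliberately collinear simulated predictors:
```python
# Minimal sketch: tolerance and VIF per predictor. Tolerance = 1 - R²
# from regressing one predictor on the others; VIF = 1 / tolerance.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=200)  # deliberately collinear with x1
x3 = rng.normal(size=200)                        # independent of the others

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
for i, name in enumerate(X.columns):
    if name == "const":
        continue  # skip the intercept column
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")  # VIF > 10 is really bad
```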
Method 1 of writing multiple linear regression
Method 2 of writing multiple linear regression
Types of linear models
Simple
Y = B0 + B1X1 + error
Multiple linear
Y = B0 + B1X1 + B2X2 + error
(More than one continuous predictor variable)
ANOVA model
Y = B0 + B1X1a + B2X1b + error
(One or more categorical predictor variables that have more than one level [eg. a and b])
ANCOVA model
Y = B0 + B1X1a + B2X1b + B3X2 + error
(One or more categorical predictor variables that have more than one level AND one or more continuous predictor variables)
Linear statistical model with one categorical predictor variable
Yij = u + B1xaij + B2xbij + B3xcij + errorij
Where:
j represents a single observation (from a single organism) and i represents the level of the predictor
u = Mean of all observations across all levels of all factors
B1 = difference between the mean of ‘a’ (level) and ‘u’ (mean)
B2 = difference between the mean of ‘b’ (level) and ‘u’ (mean)
B3 = difference between the mean of ‘c’ (level) and ‘u’ (mean)
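A hedged sketch of this model in Python, assuming statsmodels with sum-to-zero (effects) coding so the intercept estimates u and each coefficient estimates a level's deviation from it; the data and level names are made up:
```python
# Minimal sketch: one categorical predictor with levels a, b, c.
# Sum (effects) coding: intercept ~ u (grand mean), coefficients ~ B1, B2
# (the deviation for the last level is minus their sum). Hypothetical data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "level": np.repeat(["a", "b", "c"], 20),
    "y": np.concatenate([rng.normal(10, 2, 20),
                         rng.normal(12, 2, 20),
                         rng.normal(9, 2, 20)]),
})

fit = smf.ols("y ~ C(level, Sum)", data=df).fit()
print(fit.params)  # Intercept ~ u; C(level, Sum)[S.a] ~ B1; [S.b] ~ B2
```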
What is the sum of all the squared deviations (SS) in analysis of variance (ANOVA)?
SS = Σij (Yij - Ȳj)²
Where Yij = one observation i within group j
Ȳj = mean of all observations in group j
DFresidual = ?
n-k
where n = total number of observations (true replicates)
k = number of levels within the predictor variable (number of groups or factor levels)
MS =?
MS = SS / df
The average squared deviation of the data from the group means
Write out an ANOVA table
Source of variation | SS | df | MS | F-value
Groups | SSgroups | k - 1 | SSgroups / (k - 1) | MSgroups / MSresidual (signal/noise)
Residuals | SSresidual | n - k | SSresidual / (n - k) |
Total | SStotal | n - 1 | |
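A minimal sketch that builds this table by hand and cross-checks it against statsmodels' anova_lm; the three groups are simulated:
```python
# Minimal sketch: SS, df, MS, and F for a one-way ANOVA (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
groups = {"a": rng.normal(10, 2, 15),
          "b": rng.normal(13, 2, 15),
          "c": rng.normal(11, 2, 15)}
y = np.concatenate(list(groups.values()))
n, k = y.size, len(groups)
grand_mean = y.mean()

# SSgroups: squared deviations of the group means from the grand mean
ss_groups = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
# SSresidual: squared deviations of observations from their group mean
ss_resid = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

ms_groups = ss_groups / (k - 1)     # MS = SS / df
ms_resid = ss_resid / (n - k)       # DFresidual = n - k
print("F =", ms_groups / ms_resid)  # signal / noise

# Cross-check against statsmodels
df = pd.DataFrame({"y": y, "g": np.repeat(list(groups), 15)})
print(anova_lm(smf.ols("y ~ g", data=df).fit()))
```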
What is an observational study
Cannot isolate causal drivers from the effects of potentially confounding variables.
(A confounder is a variable that influences both the dependent variable and the independent variable)
What is an experimental study
Can potentially isolate causal drivers from the effects of confounding variables.
Lurking variables examples
Lurking variables can make experiments useless and their influence can only be neutralized with good experimental design.
Z -> X -> Y (Z can influence Y through X)
Z -> Y (Z can directly influence Y)
The key strength of experiments is that they allow us to explicitly ISOLATE the effect of X on Y without the lurking variable Z interfering with the experiment.
Ways to neutralize lurking variables
- Replication
- Randomized design
- Blocking
What are the benefits of replication?
Any apparent causal relationship between variables may actually be caused by lurking variables we are unaware of.
This is also called the sampling effect; small samples are more vulnerable, so we increase replication to minimize the influence of lurking variables.
It is essential to replicate the correct thing and avoid pseudoreplication, as pseudoreplication inflates the F-ratio, which decreases the p-value, which increases the chance we incorrectly reject the null hypothesis.
What are the benefits of randomization?
Reducing bias: Randomization helps to reduce the impact of selection bias and confounding variables, which can affect the validity and generalizability of study results.
Improving statistical power: Randomization helps to increase the statistical power of a study, which refers to the ability of a study to detect a true effect if it exists.
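A minimal sketch of a completely randomized design in Python; the unit IDs and treatment names are hypothetical:
```python
# Minimal sketch: randomly assigning experimental units to treatments.
import numpy as np

rng = np.random.default_rng(42)
units = np.arange(24)                                  # 24 experimental units
treatments = np.repeat(["control", "low", "high"], 8)  # balanced design
rng.shuffle(treatments)                                # randomize the assignment

for unit, trt in zip(units, treatments):
    print(f"unit {unit:2d} -> {trt}")
```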
What are the benefits of blocking?