Linear Model Assumptions Flashcards
What values must the leverage of each observation in a linear model fall between?
Leverage must be greater than or equal to 1/n and less than or equal to 1, where n = number of observations.
What value must the sum of leverages be?
The sum of all leverages in a set of observations must equal p+1, where p = number of explanatory variables and the +1 accounts for the intercept.
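Both leverage facts above can be checked numerically. This is an illustrative sketch with made-up data; the hat-matrix computation via numpy is my own choice, not part of the flashcards.

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 2 explanatory variables.
rng = np.random.default_rng(0)
n, p = 6, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept

# Leverages are the diagonal of the hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)

# Each leverage lies in [1/n, 1], and together they sum to p + 1.
assert leverages.min() >= 1 / n - 1e-12
assert leverages.max() <= 1 + 1e-12
assert abs(leverages.sum() - (p + 1)) < 1e-8
```

The sum-to-p+1 property is just the trace of the hat matrix, which equals the rank of the design matrix.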
What is the Variance Inflation Factor (VIF) value if the corresponding explanatory variable is uncorrelated with all other explanatory variables?
VIF of that explanatory variable would be equal to 1.
What is tolerance? And how is this value calculated?
Tolerance is the reciprocal of VIF; equivalently, it equals 1 - R^2, where R^2 comes from regressing that explanatory variable on all the other explanatory variables. Tolerance falls between 0 and 1: low tolerance suggests high multicollinearity, and high tolerance suggests low multicollinearity.
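A minimal sketch of the tolerance/VIF calculation, using invented data where x2 is built to correlate with x1 while x3 is independent. The helper function and all variable names are illustrative, not from the flashcards.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)  # deliberately correlated with x1
x3 = rng.normal(size=n)                        # roughly uncorrelated with x1, x2

def tolerance(target, others):
    # Regress one predictor on the rest; tolerance = 1 - R^2 of that regression.
    Z = np.column_stack([np.ones(len(target))] + others)
    fitted = Z @ np.linalg.lstsq(Z, target, rcond=None)[0]
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1 - (1 - ss_res / ss_tot)  # 1 - R^2

vif_x1 = 1 / tolerance(x1, [x2, x3])  # well above 1: x1 is correlated with x2
vif_x3 = 1 / tolerance(x3, [x1, x2])  # close to 1: x3 is nearly uncorrelated
```

This also illustrates the earlier card: a predictor uncorrelated with the others has tolerance near 1 and therefore VIF near 1.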
According to Frees' rule of thumb, what is the cutoff to determine whether an explanatory variable is highly correlated with others?
High multicollinearity is suggested if the VIF of an explanatory variable is greater than or equal to 10.
What are three drawbacks to the linear model when the response variable is binary?
- Potential mismatch between the range of the fitted values from the linear model and the range of the mean of a Bernoulli distribution, which must lie in [0, 1].
- Heteroscedasticity: the mean and variance are related in a Bernoulli distribution, which violates the homoscedasticity assumption of the LM.
- Meaningless residual analysis, because the Bernoulli distribution produces only discrete values while the LM assumes a continuous response.
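The first drawback can be seen with a tiny invented dataset: an ordinary least-squares fit to a 0/1 response happily produces fitted means below 0 and above 1.

```python
import numpy as np

# Hypothetical binary response: y switches from 0 to 1 as x increases.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least-squares fit of a straight line to the binary response.
X = np.column_stack([np.ones(len(x)), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta

# The fitted means escape the [0, 1] range a Bernoulli mean must live in.
assert fitted.min() < 0
assert fitted.max() > 1
```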
What is an outlier?
Outliers are observations for which the response y is unusual given the predictor x (or the predicted outcome given x).
What is the difference between an outlier vs an observation with high leverage? Which has a bigger impact?
An observation with high leverage means it has an unusual value for x - as opposed to an outlier, which is an unusual value for y. Removing an observation with high leverage is more impactful than removing an outlier.
Why is it more difficult to detect high leverage in an MLR setting?
It is possible to have an observation that is well within the range of each individual predictor’s values, but that is unusual in terms of the full set of predictors.
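A sketch of that idea with invented data: two strongly correlated predictors, plus one added point whose coordinates are each well within range individually but whose combination (high x1 with low x2) breaks the joint pattern, giving it the largest leverage.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x1 = rng.uniform(-1, 1, size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # strongly correlated with x1

# Add a point that is in range for each predictor but unusual jointly.
x1 = np.append(x1, 0.9)
x2 = np.append(x2, -0.9)

X = np.column_stack([np.ones(n + 1), x1, x2])
lev = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# The added point has the highest leverage of all observations.
assert lev[-1] > lev[:-1].max()
```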
Is removing an outlier impactful in regression results?
While removing an outlier may not have much impact on the final regression line (a high-leverage observation has more impact there), it can change the MSE enough to affect confidence intervals and p-values.
What is a commonality between a studentized residual vs. a standardized residual? What is the difference?
Both are unitless measures obtained by dividing the residual by its estimated standard error. The difference lies in the MSE term: the studentized version uses the MSE computed with the i-th observation excluded, whereas the standardized version uses the MSE with all observations included. As a result, the studentized residual follows a t-distribution, while the standardized residual does not.
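A sketch computing both residual types on invented data. The deleted-MSE shortcut formula below avoids refitting the model n times; the data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta                               # raw residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages
k = X.shape[1]                                 # number of parameters (p + 1)
mse = np.sum(e ** 2) / (n - k)                 # MSE with all observations

# Standardized residual: raw residual over its estimated standard error.
standardized = e / np.sqrt(mse * (1 - h))

# Studentized residual: same form, but with the MSE computed excluding
# observation i (the standard leave-one-out shortcut, no refitting needed).
mse_del = ((n - k) * mse - e ** 2 / (1 - h)) / (n - k - 1)
studentized = e / np.sqrt(mse_del * (1 - h))
```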
What is the purpose of Cook’s Distance?
It combines leverage and residuals into a single measure of how much influence an observation has on the fitted regression.
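A sketch of Cook's distance on invented data, using the standard closed form D_i = (r_i^2 / k) * h_i / (1 - h_i), where r_i is the standardized residual and k is the number of parameters; it shows explicitly how leverage and residual size combine.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 15
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
k = X.shape[1]                     # number of parameters (p + 1)
mse = np.sum(e ** 2) / (n - k)

# Cook's distance: squared standardized residual scaled by a leverage factor.
r = e / np.sqrt(mse * (1 - h))
cooks_d = (r ** 2 / k) * h / (1 - h)
```

The same values can be obtained by refitting the model with each observation deleted and measuring how far all fitted values move, which is what the closed form shortcuts.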