SU5 - Further Issues In Linear Regression: Modelling And Inference Flashcards
What is R squared?
A goodness-of-fit statistic: the fraction of the sample variation in Y that is explained by the model, R squared = SSE/SST = 1 - SSR/SST
4 Properties of R squared?
1) R squared cannot be negative.
2) R squared is bounded between 0 and 1
3) R squared = 0 if SSE = 0
4) R squared is non-decreasing as you add more explanatory variables into the model
Why R squared cannot be negative?
Because R squared = SSE/SST, and SSE and SST are both sums of squared terms, which cannot be negative
Why R squared = 1 if SSR = 0?
SST = SSE
Variation in Y is completely accounted for by variation in Y(hat). In this case, Y(hat) fits the data perfectly.
Why R squared = 0 if SSE = 0?
R squared = 0 if Y(hat) has no variations. If Y(hat) has no variations, it does not explain Y at all.
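The decomposition behind these cards can be checked numerically. The following is a minimal sketch, assuming a simple regression with an intercept and a small made-up data set (the numbers are hypothetical, chosen only for illustration). It uses the same notation as the cards: SST is total variation, SSE is explained variation, SSR is residual variation.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for a simple regression with an intercept.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation in Y
sse = sum((fi - y_bar) ** 2 for fi in y_hat)            # variation in Y(hat)
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual variation

r_squared = sse / sst
print(f"SST={sst:.4f} SSE={sse:.4f} SSR={ssr:.4f} R squared={r_squared:.4f}")
```

With an intercept in the model, SST = SSE + SSR, so SSE/SST and 1 - SSR/SST give the same number and it must lie between 0 and 1.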
Can R squared decrease when a variable is added?
No. It never decreases, and usually increases, when another independent variable is added to a regression. Thus it is a poor tool for deciding whether one variable or several variables should be added to a model.
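To see this property in action, here is a small simulation sketch. It assumes a made-up data-generating process in which y depends only on x1, while x2 is pure noise; the variable names, seed, and sample size are all hypothetical. The two-regressor fit uses the textbook closed form on demeaned data rather than a regression library.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 50
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]           # pure noise, unrelated to y
y = [1.0 + 2.0 * a + rng.gauss(0, 1) for a in x1]  # assumed true model


def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


d1, d2, dy = demean(x1), demean(x2), demean(y)
s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
s1y, s2y, sst = dot(d1, dy), dot(d2, dy), dot(dy, dy)

# Model A: y on an intercept and x1. R squared = SSE / SST.
r2_one = (s1y ** 2 / s11) / sst

# Model B: y on an intercept, x1 and x2 (closed-form two-regressor OLS).
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
r2_two = (b1 * s1y + b2 * s2y) / sst

print(f"R squared with x1 only:   {r2_one:.6f}")
print(f"R squared with x1 and x2: {r2_two:.6f}")
```

Even though x2 carries no information about y, the second R squared is at least as large as the first, which is why a rise in R squared alone is not evidence that a variable belongs in the model.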
What is a regression through the origin?
A regression without an intercept term
Example: ln(1 + tax) = β1 ln(1 + income) + e, where "1" is added to tax and income to prevent taking the log of zero.
Why is regression through the origin a bad idea?
1) If the true intercept is zero, there is little harm in including an intercept anyway. If the true intercept is not zero, omitting it biases the slope estimates
2) With regression through the origin, R squared computed as 1 - SSR/SST can be negative, even though R squared from a regression with an intercept always lies between 0 and 1
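The second point can be illustrated with a tiny example. This is a sketch on hypothetical data that lie exactly on the line y = 11 - x, so a regression with an intercept would fit perfectly; the fit below forces the intercept to zero and computes R squared as 1 - SSR/SST.

```python
# Hypothetical data with a large intercept and a negative slope (y = 11 - x).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 9.0, 8.0, 7.0, 6.0]

# OLS slope when the intercept is forced to zero: sum(x*y) / sum(x^2).
slope_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
y_hat = [slope_origin * xi for xi in x]

y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in Y
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual variation

r_squared = 1.0 - ssr / sst
print(f"slope={slope_origin} SST={sst} SSR={ssr} R squared={r_squared}")
```

Here the slope through the origin is 2, SSR is 110 and SST is 10, so R squared is -10: the fitted line does worse than simply predicting the sample mean of Y.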
What happens if we include an irrelevant variable?
The OLS estimators remain unbiased, but their variances may increase
What happens if we omit a relevant variable?
Generally causes the OLS estimators to be biased (omitted variable bias) or worse, inconsistent
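Omitted variable bias can be made concrete with a simulation. The sketch below assumes a made-up true model y = 1 + 2*x1 + 3*x2 + error, with x2 = 0.5*x1 + noise; all coefficients, the seed, and the sample size are hypothetical. It compares the "long" regression (both regressors) with the "short" regression that omits x2.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]  # x2 is correlated with x1
y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]


def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


d1, d2, dy = demean(x1), demean(x2), demean(y)
s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
s1y, s2y = dot(d1, dy), dot(d2, dy)

# Long regression: y on an intercept, x1 and x2.
det = s11 * s22 - s12 ** 2
b1_long = (s22 * s1y - s12 * s2y) / det
b2_long = (s11 * s2y - s12 * s1y) / det

# Short regression: y on an intercept and x1 only (x2 omitted).
b1_short = s1y / s11

# Slope from regressing the omitted x2 on the included x1.
delta = s12 / s11

print(f"long:  b1={b1_long:.3f}, b2={b2_long:.3f}")
print(f"short: b1={b1_short:.3f}")
print(f"b1_long + b2_long * delta = {b1_long + b2_long * delta:.3f}")
```

In this setup the short-regression slope settles near 2 + 3*0.5 = 3.5 rather than the true 2, and a larger sample does not remove the gap. The bias disappears only if the omitted variable has a zero coefficient or is uncorrelated with the included regressor.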
How to deal with multicollinearity?
1) Increase sample size
2) drop some variables that are highly collinear
What does multicollinearity affect?
It does not inflate the variances of all OLS slope estimators. Only the estimators for the highly correlated regressors are affected
What happens if Xj is highly correlated with one or more regressors in the model?
Rj squared (the R squared from regressing Xj on the other regressors) will be close to 1, so Var(βj hat) will be large, causing the estimate of βj to be imprecise
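The link between Rj squared and the variance can be sketched with the standard formula under homoskedastic errors, Var(βj hat) = sigma squared / (SSTj * (1 - Rj squared)), where SSTj is the total sample variation in Xj. The function names and input values below are hypothetical and serve only to show how quickly the variance grows as Rj squared approaches 1.

```python
def slope_variance(sigma2, sst_j, r2_j):
    """Var(beta_j hat) = sigma^2 / (SST_j * (1 - R_j^2)), assuming homoskedasticity."""
    return sigma2 / (sst_j * (1.0 - r2_j))


def vif(r2_j):
    """Variance inflation factor: how much 1 / (1 - R_j^2) scales the variance."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical inputs: error variance 1 and total variation in Xj of 100.
sigma2, sst_j = 1.0, 100.0
for r2_j in (0.0, 0.5, 0.9, 0.99):
    print(
        f"Rj squared={r2_j:<4} VIF={vif(r2_j):8.2f} "
        f"Var(beta_j hat)={slope_variance(sigma2, sst_j, r2_j):.4f}"
    )
```

The same formula shows why the two remedies on the earlier card help: a larger sample raises SSTj, and dropping a collinear regressor lowers Rj squared.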
What is the error normality assumption?
The population error e is independent of the explanatory variables and is normally distributed with zero mean and constant variance (sigma squared).
What is the linear model called when it is under six assumptions?
The classical linear model (CLM)
It consists of linear regression assumptions 1-5 plus the error normality assumption