Lecture 5 Flashcards
Formula for the total prediction error
- what does the prediction interval account for?
ŷ⁰ − y⁰ = (θ̂ − θ) − u⁰, where θ = E[y⁰|x⁰] is the true conditional mean, θ̂ = ŷ⁰ is the OLS prediction, and u⁰ is the error of the new observation
- the prediction interval accounts for the uncertainty in u⁰ in addition to the estimation error in θ̂
- we know the estimation error is normally distributed, and if MLR.6 applies, then u⁰ is also normally distributed
ŷ⁰ − y⁰ ~ N(0, sd(θ̂)² + σ²)
So the prediction intervals use the normal distribution above to create the intervals
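A minimal sketch of the CI-vs-PI construction on simulated data (variable names and numbers are made up for illustration); statsmodels' get_prediction reports both the interval for the conditional mean and the wider interval for a single new observation:

```python
# Minimal sketch (simulated data): confidence vs prediction intervals with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)           # e.g. hours studied
y = 50 + 2.5 * x + rng.normal(0, 5, 200)   # e.g. exam score, with error u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Predict for a new observation with x = 10
new_X = sm.add_constant(np.array([10.0]), has_constant='add')
pred = res.get_prediction(new_X).summary_frame(alpha=0.05)

# mean_ci_*: CI for E[y|x=10] (estimation error in theta-hat only)
# obs_ci_*:  PI for one new y (adds the variance of u, so it is wider)
print(pred[['mean', 'mean_ci_lower', 'mean_ci_upper',
            'obs_ci_lower', 'obs_ci_upper']])
```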
If all regression models are wrong, two natural questions arise:
- Why bother running regressions in the first place? If the model is wrong, most conclusions drawn from it will also be wrong
- Why have we spent 5 weeks analysing OLS, if the estimators appear to be useless anyway?
Answers to the natural questions
- if we carefully build a model, we can reduce/minimise the degree of misspecification
- in which case our estimators may still be useful
- they might be biased due to omitted variable bias (OVB), but if the bias is small they are still informative
- all inference tools still apply even if biases are present
What's the point of quadratic terms?
They can account for diminishing marginal effects; at worst, we can run a hypothesis test to see whether the quadratic term is even significant (see the sketch below)
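A minimal sketch on simulated data (the data-generating process is made up): fit a quadratic term and read off its t-test/p-value to check whether it is significant:

```python
# Minimal sketch (simulated data): add a quadratic term and test its significance.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
x = rng.uniform(0, 20, 300)
y = 1 + 0.8 * x - 0.02 * x**2 + rng.normal(0, 1, 300)   # diminishing effect of x
df = pd.DataFrame({'y': y, 'x': x})

res = smf.ols('y ~ x + I(x**2)', data=df).fit()
print(res.summary())            # look at the row for the squared term
print(res.pvalues.iloc[-1])     # p-value of the quadratic coefficient
```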
What's the issue with adding outcome variables as controls?
This can lead to biased estimates due to simultaneity or endogeneity issues
- simultaneity occurs when the independent and control variables are jointly determined, or influenced by the same unobserved factors
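A minimal simulation sketch (made-up data-generating process) of how controlling for a variable that is itself an outcome of y biases the coefficient of interest:

```python
# Minimal sketch (simulated data): a "bad control" that is an outcome of y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 2.0 * x + u                 # true effect of x on y is 2
m = y + rng.normal(size=n)      # m is an outcome of y, not a valid control

good = sm.OLS(y, sm.add_constant(x)).fit()
bad = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()

print(good.params[1])   # close to 2
print(bad.params[1])    # biased well below 2: conditioning on m absorbs part of x's effect
```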
What happens if we change our units of measurement?
No issue: if we rescale any of our variables, the coefficients change correspondingly, so the interpretation is unchanged
Nothing happens to the R squared, t statistics or F statistics
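A minimal sketch on simulated data: rescaling x by 1000 rescales its coefficient by 1/1000, while R^2 and the t-statistics are unchanged:

```python
# Minimal sketch (simulated data): rescaling a regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(50, 10, 400)               # e.g. income in thousands
y = 3 + 0.5 * x + rng.normal(0, 2, 400)

res1 = sm.OLS(y, sm.add_constant(x)).fit()           # x in thousands
res2 = sm.OLS(y, sm.add_constant(x * 1000)).fit()    # same x in single units

print(res1.params[1], res2.params[1])      # slope shrinks by a factor of 1000
print(res1.rsquared, res2.rsquared)        # identical
print(res1.tvalues[1], res2.tvalues[1])    # identical
```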
How can coefficients in log models be interpreted?
- for log-log
- for log-linear
As elasticities or approximate % changes
- for log-log models, coefficients represent elasticities
- for log-linear models (log y, level x), 100 × the coefficient is the approximate % change in y for a one-unit change in x
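A minimal log-log sketch on simulated data (the true elasticity is set to 0.7 purely for illustration): the slope on log(x) recovers the elasticity:

```python
# Minimal sketch (simulated data): the log-log slope is an elasticity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
x = rng.uniform(1, 100, 500)
y = 5 * x**0.7 * np.exp(rng.normal(0, 0.1, 500))   # true elasticity 0.7
df = pd.DataFrame({'y': y, 'x': x})

loglog = smf.ols('np.log(y) ~ np.log(x)', data=df).fit()
print(loglog.params)   # slope ~ 0.7: a 1% rise in x raises y by about 0.7%
```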
Reasons for using the Natural Log:
Closer adherence to OLS assumptions:
- linearity
- homoskedasticity - models with log(y) tend to exhibit less heteroskedasticity
- normality - residuals are often closer to normally distributed after taking logs
- log transformations reduce the range of a variable, especially if it spans several orders of magnitude - like income.
Limitations of using Natural Log
- if y ≥ 0 but y = 0 is possible, we cannot use log(y)
- it's harder to predict y when we have estimated a model for log(y)
- in cases where y is a fraction and close to 0 for many observations, log(y) can have more variability than y
Log vs quadratic models for capturing diminishing effects
Log models capture diminishing effects, but assume the effect diminishes continuously and monotonically
Quadratic models allow for increasing and decreasing effects depending on the value of x, and can also identify turning points (see the sketch below).
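A tiny sketch of the turning-point calculation with hypothetical coefficients: for y = b0 + b1*x + b2*x^2, the effect of x switches sign where the derivative b1 + 2*b2*x equals zero:

```python
# Minimal sketch (hypothetical coefficients): turning point of a quadratic model.
b1, b2 = 0.8, -0.02            # assumed values for illustration only
x_star = -b1 / (2 * b2)        # dy/dx = b1 + 2*b2*x = 0  =>  x* = -b1/(2*b2)
print(x_star)                  # the effect of x changes sign at x = 20
```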
Why model with interaction terms?
Sometimes it is natural to think the partial effect of one variable, like education, could depend on the level of another variable, like intelligence.
- so the partial effect of one variable now depends on the other: with y = β0 + β1·x1 + β2·x2 + β3·x1·x2, the first derivative with respect to x1 is β1 + β3·x2 (see the sketch below)
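A minimal interaction sketch on simulated data (x1, x2 and all coefficients are made up): the partial effect of x1 is β1 + β3·x2, evaluated here at the mean of x2:

```python
# Minimal sketch (simulated data): partial effect of x1 depends on x2 via the interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
x1 = rng.uniform(0, 16, 400)               # e.g. years of education
x2 = rng.normal(100, 15, 400)              # e.g. ability score
y = 2 + 0.5 * x1 + 0.1 * x2 + 0.02 * x1 * x2 + rng.normal(0, 1, 400)
df = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2})

res = smf.ols('y ~ x1 * x2', data=df).fit()   # expands to x1 + x2 + x1:x2
b1, b3 = res.params['x1'], res.params['x1:x2']
print(b1 + b3 * df['x2'].mean())              # partial effect of x1 at the average x2
```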
Population R^2
R^2 = 1 - Var(u)/Var(y)
Aka the proportion of the population variation in y explained by x1,…,xk
Adjusted R^2
R^2 = 1 - SSR/SST = 1 - (SSR/n)/(SST/n)
- SSR/n and SST/n are consistent, but not unbiased, estimators of the error and outcome variances
- to account for finite-sample bias, we replace them with the degrees-of-freedom adjusted versions:
SSR/(n-k-1) and SST/(n-1)
Plugging these in, we get the adjusted R^2 = 1 - [SSR/(n-k-1)]/[SST/(n-1)]
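A minimal sketch on simulated data: computing the adjusted R^2 by hand from SSR and SST and checking it against statsmodels' rsquared_adj:

```python
# Minimal sketch (simulated data): adjusted R^2 from its degrees-of-freedom formula.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
ssr = np.sum(res.resid**2)
sst = np.sum((y - y.mean())**2)

adj_r2 = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))
print(adj_r2, res.rsquared_adj)   # the two should agree
```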
What does SST measure?
The total variation in the dependent variable, SST = Σ(y_i − ȳ)². It is fixed for a given dataset, depending only on the observed y values and their mean
What does SSR measure?
Measures the unexplained variation left after fitting the regression model, SSR = Σ û_i². Adding regressors increases model complexity, so the adjusted version divides by the degrees of freedom n - k - 1 to account for this
Key difference between adjusted and non-adjusted R^2
Key thing here is that for the ordinary R^2, adding regressors means it always (weakly) increases.
For the adjusted R^2, adding a regressor lowers SSR but also lowers df = n - k - 1, so whether it rises depends on whether the new variable improves model fit enough (see the sketch below).
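A minimal sketch on simulated data: adding a pure-noise regressor never lowers R^2 but will usually lower the adjusted R^2:

```python
# Minimal sketch (simulated data): noise regressor vs R^2 and adjusted R^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
noise = rng.normal(size=n)                 # unrelated to y

base = sm.OLS(y, sm.add_constant(x)).fit()
extra = sm.OLS(y, sm.add_constant(np.column_stack([x, noise]))).fit()

print(base.rsquared, extra.rsquared)            # R^2 never falls when a regressor is added
print(base.rsquared_adj, extra.rsquared_adj)    # adjusted R^2 penalises the lost df
```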
What is a predicted value and therefore what’s a prediction interval?
The predicted value of the dependent variable, ŷ, is the estimate of E[y|x]:
- ŷ = β̂0 + β̂1·x1 + … + β̂k·xk
A prediction interval gives a range of values within which we expect a single future observation to fall, with a certain level of confidence. For example, if a student with 10 hours of study is estimated to score 75 marks:
- CI might be [73,77], we’re 95% confident the average score of all students who study 10 hours is between 73 and 77
- PI might be [65,85], we’re 95% confident that one particular student will score in this range
Why is the PI wider?
Includes:
1. Uncertainty about the estimated mean
2. Extra variation from individual differences