Lecture 5 Flashcards
Formula for the total prediction error
- what does the prediction interval account for?
ŷ⁰ − y⁰ = (θ̂ − θ) − u⁰, where θ = E[y⁰|x⁰] is the true conditional mean, θ̂ = ŷ⁰ is the OLS prediction, and u⁰ is the error of the new observation
- the prediction interval accounts for the uncertainty in u⁰ in addition to the estimation error in θ̂
- we know the estimation error is normally distributed, and if MLR.6 applies, then u⁰ is also normally distributed
ŷ⁰ − y⁰ ~ N(0, sd(θ̂)² + σ²)
So the prediction intervals use the normal distribution above to create the intervals
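A minimal sketch of the CI-vs-PI construction on simulated data (variable names and numbers are made up for illustration); statsmodels' get_prediction reports both the interval for the conditional mean and the wider interval for a single new observation:

```python
# Minimal sketch (simulated data): confidence vs prediction intervals with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)           # e.g. hours studied
y = 50 + 2.5 * x + rng.normal(0, 5, 200)   # e.g. exam score, with error u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Predict for a new observation with x = 10
new_X = sm.add_constant(np.array([10.0]), has_constant='add')
pred = res.get_prediction(new_X).summary_frame(alpha=0.05)

# mean_ci_*: CI for E[y|x=10] (estimation error in theta-hat only)
# obs_ci_*:  PI for one new y (adds the variance of u, so it is wider)
print(pred[['mean', 'mean_ci_lower', 'mean_ci_upper',
            'obs_ci_lower', 'obs_ci_upper']])
```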
If all regression models are wrong, two natural questions arise:
- Why bother running regressions in the first place? If the model is wrong, most conclusions drawn from it will also be wrong
- Why have we spent 5 weeks analysing OLS, if the estimators appear to be useless anyway?
Answers to the natural questions
- if we carefully build a model, we can reduce/minimise the degree of misspecification
- in which case our estimators may still be useful
- they might be biased due to omitted variable bias (OVB), but if the bias is small they are still informative
- all inference tools still apply even if biases are present
What's the point of quadratic terms?
They can account for diminishing marginal effects; at worst, we can run a hypothesis test to see whether the quadratic term is even significant (see the sketch below)
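A minimal sketch on simulated data (the data-generating process is made up): fit a quadratic term and read off its t-test/p-value to check whether it is significant:

```python
# Minimal sketch (simulated data): add a quadratic term and test its significance.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
x = rng.uniform(0, 20, 300)
y = 1 + 0.8 * x - 0.02 * x**2 + rng.normal(0, 1, 300)   # diminishing effect of x
df = pd.DataFrame({'y': y, 'x': x})

res = smf.ols('y ~ x + I(x**2)', data=df).fit()
print(res.summary())            # look at the row for the squared term
print(res.pvalues.iloc[-1])     # p-value of the quadratic coefficient
```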
What's the issue with adding outcome variables as controls?
This can lead to biased estimates due to simultaneity or endogeneity issues
- simultaneity occurs when the independent and control variables are jointly determined, or influenced by the same unobserved factors
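A minimal simulation sketch (made-up data-generating process) of how controlling for a variable that is itself an outcome of y biases the coefficient of interest:

```python
# Minimal sketch (simulated data): a "bad control" that is an outcome of y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 2.0 * x + u                 # true effect of x on y is 2
m = y + rng.normal(size=n)      # m is an outcome of y, not a valid control

good = sm.OLS(y, sm.add_constant(x)).fit()
bad = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()

print(good.params[1])   # close to 2
print(bad.params[1])    # biased well below 2: conditioning on m absorbs part of x's effect
```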
What happens if we change our units of measurement?
No issue: if we rescale any of our variables, the coefficients change correspondingly, so the interpretation is unchanged
Nothing happens to the R squared, t statistics or F statistics
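A minimal sketch on simulated data: rescaling x by 1000 rescales its coefficient by 1/1000, while R^2 and the t-statistics are unchanged:

```python
# Minimal sketch (simulated data): rescaling a regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(50, 10, 400)               # e.g. income in thousands
y = 3 + 0.5 * x + rng.normal(0, 2, 400)

res1 = sm.OLS(y, sm.add_constant(x)).fit()           # x in thousands
res2 = sm.OLS(y, sm.add_constant(x * 1000)).fit()    # same x in single units

print(res1.params[1], res2.params[1])      # slope shrinks by a factor of 1000
print(res1.rsquared, res2.rsquared)        # identical
print(res1.tvalues[1], res2.tvalues[1])    # identical
```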
How can coefficients in log models be interpreted?
- for log-log
- for log-linear
As elasticities or approximate % changes
- for log-log models, coefficients represent elasticities
- for log-linear models (log y, level x), 100 × the coefficient is the approximate % change in y for a one-unit change in x
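A minimal log-log sketch on simulated data (the true elasticity is set to 0.7 purely for illustration): the slope on log(x) recovers the elasticity:

```python
# Minimal sketch (simulated data): the log-log slope is an elasticity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
x = rng.uniform(1, 100, 500)
y = 5 * x**0.7 * np.exp(rng.normal(0, 0.1, 500))   # true elasticity 0.7
df = pd.DataFrame({'y': y, 'x': x})

loglog = smf.ols('np.log(y) ~ np.log(x)', data=df).fit()
print(loglog.params)   # slope ~ 0.7: a 1% rise in x raises y by about 0.7%
```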
Reasons for using the Natural Log:
Closer adherence to OLS assumptions:
- linearity
- homoskedasticity - models with log(y) tend to exhibit less heteroskedasticity
- normality - residuals are often closer to normally distributed after taking logs
- log transformations reduce the range of a variable, especially if it spans several orders of magnitude - like income.
Limitations of using Natural Log
- if y ≥ 0 but y = 0 is possible, we cannot use log(y)
- it's harder to predict y when we have estimated a model for log(y)
- in cases where y is a fraction and close to 0 for many observations, log(y) can have more variability than y
Log vs quadratic models for capturing diminishing effects
Log models capture diminishing effects, but assume the effect diminishes continuously and monotonically
Quadratic models allow for increasing and decreasing effects depending on the value of x, and can also identify turning points (see the sketch below).
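A tiny sketch of the turning-point calculation with hypothetical coefficients: for y = b0 + b1*x + b2*x^2, the effect of x switches sign where the derivative b1 + 2*b2*x equals zero:

```python
# Minimal sketch (hypothetical coefficients): turning point of a quadratic model.
b1, b2 = 0.8, -0.02            # assumed values for illustration only
x_star = -b1 / (2 * b2)        # dy/dx = b1 + 2*b2*x = 0  =>  x* = -b1/(2*b2)
print(x_star)                  # the effect of x changes sign at x = 20
```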
Why model with interaction terms?
Sometimes it is natural to think the partial effect of one variable, like education, could depend on the level of another variable, like intelligence.
- so the partial effect of one variable now depends on the other: with y = β0 + β1·x1 + β2·x2 + β3·x1·x2, the first derivative with respect to x1 is β1 + β3·x2 (see the sketch below)
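A minimal interaction sketch on simulated data (x1, x2 and all coefficients are made up): the partial effect of x1 is β1 + β3·x2, evaluated here at the mean of x2:

```python
# Minimal sketch (simulated data): partial effect of x1 depends on x2 via the interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
x1 = rng.uniform(0, 16, 400)               # e.g. years of education
x2 = rng.normal(100, 15, 400)              # e.g. ability score
y = 2 + 0.5 * x1 + 0.1 * x2 + 0.02 * x1 * x2 + rng.normal(0, 1, 400)
df = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2})

res = smf.ols('y ~ x1 * x2', data=df).fit()   # expands to x1 + x2 + x1:x2
b1, b3 = res.params['x1'], res.params['x1:x2']
print(b1 + b3 * df['x2'].mean())              # partial effect of x1 at the average x2
```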
Population R^2
R^2 = 1 - Var(u)/Var(y)
Aka the proportion of the population variation in y explained by x1,…,xk
Adjusted R^2
R^2 = 1 - SSR/SST = 1 - (SSR/n)/(SST/n)
- SSR/n and SST/n are consistent, but not unbiased, estimators of the error and outcome variances
- to account for finite-sample bias, we replace them with the degrees-of-freedom adjusted versions:
SSR/(n-k-1) and SST/(n-1)
Plugging these in, we get the adjusted R^2 = 1 - [SSR/(n-k-1)]/[SST/(n-1)]
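A minimal sketch on simulated data: computing the adjusted R^2 by hand from SSR and SST and checking it against statsmodels' rsquared_adj:

```python
# Minimal sketch (simulated data): adjusted R^2 from its degrees-of-freedom formula.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
ssr = np.sum(res.resid**2)
sst = np.sum((y - y.mean())**2)

adj_r2 = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))
print(adj_r2, res.rsquared_adj)   # the two should agree
```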
What does SST measure?
The total variation in the dependent variable, SST = Σ(y_i − ȳ)². It is fixed for a given dataset, depending only on the observed y values and their mean
What does SSR measure?
Measures the unexplained variation left after fitting the regression model, SSR = Σ û_i². Adding regressors increases model complexity, so the adjusted version divides by the degrees of freedom n - k - 1 to account for this
Key difference between adjusted and non-adjusted R^2
Key thing here is that for the ordinary R^2, adding regressors means it always (weakly) increases.
For the adjusted R^2, adding a regressor lowers SSR but also lowers df = n - k - 1, so whether it rises depends on whether the new variable improves model fit enough (see the sketch below).
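A minimal sketch on simulated data: adding a pure-noise regressor never lowers R^2 but will usually lower the adjusted R^2:

```python
# Minimal sketch (simulated data): noise regressor vs R^2 and adjusted R^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
noise = rng.normal(size=n)                 # unrelated to y

base = sm.OLS(y, sm.add_constant(x)).fit()
extra = sm.OLS(y, sm.add_constant(np.column_stack([x, noise]))).fit()

print(base.rsquared, extra.rsquared)            # R^2 never falls when a regressor is added
print(base.rsquared_adj, extra.rsquared_adj)    # adjusted R^2 penalises the lost df
```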
What is a predicted value and therefore what’s a prediction interval?
The predicted value of the dependent variable, ŷ, is the estimate of E[y|x]:
- ŷ = β̂0 + β̂1·x1 + … + β̂k·xk
A prediction interval gives a range of values within which we expect a single future observation to fall, with a certain level of confidence. For example, if a student with 10 hours of study is estimated to score 75 marks:
- CI might be [73,77], we’re 95% confident the average score of all students who study 10 hours is between 73 and 77
- PI might be [65,85], we’re 95% confident that one particular student will score in this range
Why is the PI wider?
Includes:
1. Uncertainty about the estimated mean
2. Extra variation from individual differences