Econometrics Flashcards
Explain the effect heteroskedasticity has on regression results?
- OLS assumes homoskedasticity: Var(u|x) = sigma^2, a constant
- Heteroskedasticity does not affect the coefficient estimates, only their standard errors
- OLS estimator is still linear and unbiased
- However, it will now be inefficient and the estimates of the variances will be biased
- This will affect inference, i.e. the F-tests and t-statistics can no longer be trusted
What are the two ways to check whether heteroskedasticity is an issue in the regression?
- Run a regression with and without robust standard errors. Compare the standard errors and discuss the extent of heteroskedasticity
- Run a heteroskedasticity test
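A minimal sketch of the first check in Python (statsmodels), using simulated data; the data-generating process and variable names are illustrative, not from a specific question:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative simulated data where the error variance grows with x
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, x)   # Var(u|x) increases with x

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()             # usual (homoskedastic) SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust SEs

# The coefficient estimates are identical; only the standard errors differ
print(classical.params, classical.bse)
print(robust.params, robust.bse)
```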
What are the 4 steps of the Breusch-Pagan test for heteroskedasticity?
- Estimate the equation y = B0 + B1x1 + … + u via OLS and obtain the residuals uhat
- Compute the squared residuals uhat^2.
- Regress uhat^2 on all of the explanatory variables and compute the F-test of joint significance of the explanatory variables.
- Reject the null hypothesis of homoskedasticity (H0: Var(u|x) = sigma^2, i.e. no relationship between the squared residuals and the explanatory variables) if the F-statistic is sufficiently large. If you reject the null hypothesis, there is evidence of heteroskedasticity in the regression.
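A sketch of these four steps on the same kind of simulated data (the DGP is illustrative); statsmodels also packages the test as het_breuschpagan, which reports the same F-statistic:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Illustrative heteroskedastic data
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, x)
X = sm.add_constant(x)

# Step 1: estimate the equation by OLS and obtain the residuals
uhat = sm.OLS(y, X).fit().resid
# Steps 2-3: regress the squared residuals on the explanatory variables
aux = sm.OLS(uhat ** 2, X).fit()
# Step 4: F-test of joint significance of the explanatory variables
print(aux.fvalue, aux.f_pvalue)

# The packaged version returns the same F-statistic (plus an LM variant)
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(uhat, X)
print(f_stat, f_p)
```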
What are 4 things that cause a regression to be biased?
- Omitted variable bias
- Simultaneity
- Measurement error
- Sample selection bias
What are the 5 Gauss-Markov assumptions?
- Linear in parameters
- Random sampling
- No perfect collinearity (no explanatory variable is an exact linear combination of the others)
- Zero conditional mean: E(u|x1, … xk) = 0
- Homoskedasticity
What are the two conditions for a valid instrumental variable?
- Relevance: corr(z,x) not equal to 0
- Exogeneity: corr(z,u) = 0
What are the steps in 2SLS?
- First Stage:
- Regress the endogenous variable on the instrument and all the other relevant controls to obtain fitted values Xhat
- Second Stage:
- Run the original regression using Xhat instead of X (i.e. replace the endogenous variable with its fitted values)
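A minimal sketch of the two stages on simulated data; the DGP, the coefficients, and the use of a single instrument with no extra controls are all illustrative. As the next card explains, the second-stage standard errors printed by this manual approach should not be trusted:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative DGP: x is endogenous (shares the component v with u), z is an instrument
rng = np.random.default_rng(1)
n = 1000
z = rng.normal(size=n)
v = rng.normal(size=n)
u = rng.normal(size=n) + v   # error term correlated with x
x = 1 + 0.8 * z + v          # relevance: corr(z, x) != 0
y = 2 + 0.5 * x + u

# First stage: regress the endogenous variable on the instrument
first = sm.OLS(x, sm.add_constant(z)).fit()
xhat = first.fittedvalues

# Second stage: run the original regression using xhat instead of x
second = sm.OLS(y, sm.add_constant(xhat)).fit()
print(second.params)  # the 2SLS point estimate (close to 0.5)
# NOTE: second.bse is wrong here -- see the next card; in practice use a
# dedicated IV routine (e.g. IV2SLS in the linearmodels package).
```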
Why should 2SLS not be done manually?
The standard errors will be incorrect, as you have to take into account the fact that the fitted value of X used in the second stage has itself been estimated and is thus subject to sampling error.
What 2 things should be done after the first stage in 2SLS?
- The first stage regression output should be used to judge how good the proposed instrument is
- You should check whether the coefficient on the IV is statistically significant
What bad thing does 2SLS cause?
2SLS produces standard errors that are larger than those of the original OLS estimator
Explain the relevance condition for a valid instrument?
The instrument needs to be relevant; this means that it ought to be correlated with the instrumented (endogenous) variable. This condition can be tested, e.g. by checking the significance of the instrument in the first-stage regression.
Explain the exclusion restriction/exogeneity condition for a valid instrument?
The instrument should not be correlated with any unobserved characteristics which may affect both the instrumented variable (X) and the dependent variable (Y) and thus be hidden in the error term of the regression. In other words, Z should affect Y solely through its correlation with X
If the instrumental variable regression produces a coefficient that differs little from the original OLS coefficient, what 2 things could this mean?
- The endogeneity issue is not substantial
- The instrument is bad
Why does 2SLS cause standard errors on the coefficient to be higher?
- IV is less efficient
- Standard error increases as a result of first-stage estimate being used in the second-stage regression
- Size of the change depends on the strength of the correlation between the endogenous variable and the instrument
- The more relevant the instrument, the smaller the increase in standard error
How do we test for differences in the determination of an outcome between two groups, e.g. male and female?
- We would include a female dummy variable and all the interaction terms of the dummy with the remaining control variables included in the original regression.
- This would look like: BMI = B1 + B2 income + B3 age + B4 age^2 + B5 female + B6 education + B7 smoke + B8 income x female + B9 age x female + … + B12 smoke x female + u
- We would then test for any differences across genders by undertaking an F-test with 6 restrictions (H0: B5 = B8 = B9 = B10 = B11 = B12 = 0; H1: at least one of these restrictions does not hold)
- If the F-statistic is greater than the relevant critical value then one can reject the null hypothesis and conclude that there are statistically significant differences across genders
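A sketch of this test with statsmodels on simulated data mirroring the BMI example; the variable names and coefficients are illustrative. The F-test compares the unrestricted model (all interactions) with the restricted one (none):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative simulated data following the BMI example above
rng = np.random.default_rng(2)
n = 800
df = pd.DataFrame({
    "income": rng.normal(50, 10, n),
    "age": rng.uniform(20, 60, n),
    "female": rng.integers(0, 2, n),
    "education": rng.integers(8, 20, n),
    "smoke": rng.integers(0, 2, n),
})
df["BMI"] = 25 + 0.02 * df["income"] + 0.05 * df["age"] + rng.normal(0, 2, n)

# Unrestricted model: female dummy plus all interactions with the controls
unrestricted = smf.ols(
    "BMI ~ (income + age + I(age**2) + education + smoke) * female", data=df
).fit()
# Restricted model: no gender differences at all
restricted = smf.ols(
    "BMI ~ income + age + I(age**2) + education + smoke", data=df
).fit()

# F-test of the 6 restrictions (dummy + 5 interaction terms)
f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f_stat, p_value, df_diff)
```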
What can the inclusion of an interaction term lead to?
- Since the variables and the interaction terms are by construction correlated with each other, there may be higher standard errors
- Therefore, the coefficients may be imprecisely estimated and individually statistically insignificant
- They may, however, be jointly statistically significant
What is the algebra behind the Linear Probability Model?
- E(Y|X) = 1 x P(Y=1|X) + 0 x P(Y=0|X) = P(Y=1|X)
- Since OLS assumes that E(u|X) = 0:
- E(Y|X) = E(B0 + B1X + u|X) = B0 + B1X
- So P(Y=1|X) = B0 + B1X
What are the 2 advantages of the Linear Probability Model?
- Simple to estimate and interpret
- Inference is the same as for multiple regression
What are the 3 disadvantages of the Linear Probability Model?
- Constant Estimated Partial Effects: assumes constant change in P(Y) given a unit change in X for all values of X. This is a strong assumption when Y can only take two values (0 and 1)
- In some instances it is possible for the LPM to predict probabilities which are smaller than 0 or greater than 1
- Heteroskedasticity will be present since Y is a binary variable and so has a Bernoulli distribution
What are the 2 alternative estimators instead of LPM, which are better suited for a regression with a binary dependent variable?
Logit and Probit
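A minimal sketch contrasting the three estimators on simulated data (the logistic DGP and coefficients are illustrative); it also shows the LPM's out-of-range predictions and its need for robust standard errors:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative binary-outcome data generated from a logistic model
rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 2 * x))))
X = sm.add_constant(x)

# LPM: OLS on a binary outcome (robust SEs, since the LPM is heteroskedastic)
lpm = sm.OLS(y, X).fit(cov_type="HC1")
logit = sm.Logit(y, X).fit(disp=0)
probit = sm.Probit(y, X).fit(disp=0)

# LPM fitted values can leave [0, 1]; logit/probit predictions cannot
print((lpm.fittedvalues < 0).sum(), (lpm.fittedvalues > 1).sum())
print(logit.predict(X).min(), logit.predict(X).max())
```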
With regard to measurement error, how can we show that the OLS regression is unbiased?
If the covariance between the mismeasured (observed) explanatory variable x and the measurement error e is equal to 0, we can show that OLS is still unbiased
What is the algebra describing measurement error?
- The true model is y = B0 + B1x* + u, but we observe x, not x*, so the measurement error is: e = x - x*
- This implies x* = x - e, and so: y = B0 + B1(x - e) + u
- Which is equal to: y = B0 + B1x + (u - B1e)
- The classical errors-in-variables (EIV) assumptions are: Cov(x*,e)=0 and Cov(u,e)=0
- For OLS to be unbiased and consistent, we would need: Cov(x, u - B1e) = 0
- However, Cov(x, u - B1e) = Cov(x,u) - B1Cov(x, e)
- Cov(x, u) = Cov(x* + e, u) = Cov(x*, u) + Cov(e, u) = 0 + 0 = 0
- Cov(x, e) = Cov(x* + e, e) = Cov(x*, e) + Var(e) = 0 + Var(e)
- Therefore, Cov(x, u - B1e) = 0 - B1Var(e), which is not equal to 0 (so long as B1 is not 0 and Var(e) > 0)
- So OLS is biased and inconsistent
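A quick simulation of this result under the classical EIV assumptions (the DGP and coefficient values are illustrative); the observed-x regression shows the attenuation implied by the algebra above:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative simulation of classical errors-in-variables: x = x* + e
rng = np.random.default_rng(4)
n = 10_000
x_star = rng.normal(size=n)             # true regressor x*
e = rng.normal(size=n)                  # measurement error, Cov(x*, e) = 0
x = x_star + e                          # observed, mismeasured regressor
y = 1 + 2 * x_star + rng.normal(size=n)

print(sm.OLS(y, sm.add_constant(x_star)).fit().params[1])  # ~2 (true B1)
# Attenuation: the OLS slope converges to B1 * Var(x*) / (Var(x*) + Var(e)) = 1
print(sm.OLS(y, sm.add_constant(x)).fit().params[1])       # ~1, biased toward 0
```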
For non-independent variables, how can Var(X+Y) be expressed?
Var(X) + Var(Y) + 2Cov(X,Y)
- Often the question will state that Cov(X,Y) = 0 (or Cov(ui, uj) = 0), in which case the covariance term drops out
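A quick numerical check of the identity on simulated (illustrative) correlated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)   # X and Y are correlated (not independent)

# Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y), using matching ddof throughout
lhs = np.var(X + Y, ddof=1)
rhs = np.var(X, ddof=1) + np.var(Y, ddof=1) + 2 * np.cov(X, Y)[0, 1]
print(lhs, rhs)  # identical up to floating-point error
```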