Econometrics I Revision Flashcards
Explain and derive the OLS estimator
Choose β to minimise the sum of squared deviations between the actual values of y and the fitted values Xβ, the linear approximation to the conditional expectation of y given X.
S(β) = (y-Xβ)’(y-Xβ) = y’y - 2β’X’y + β’X’Xβ
First-order condition, dS/dβ = 0:
-2X’y + 2X’Xβhat = 0
βhat = (X’X)^-1 X’y
Second-order condition, d2S/dβdβ’:
2X’X is positive definite when X has full column rank, so βhat is a minimum
Assumptions:
Explain and derive MM estimator
The MM estimator chooses the value of β for which the sample counterpart of the population moment condition E(X’e)=0 holds exactly:
1/n X’(y-Xβhat) = 1/n (X’y - X’Xβhat) = 0
βhat = (X’X)^-1 X’y
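The normal equations and the sample moment condition above can be checked numerically. This is a minimal sketch on simulated data; the sample size, coefficient values, and variable names are illustrative assumptions, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Design matrix: intercept plus two simulated regressors (hypothetical data)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# First-order condition: X'X beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Second-order condition: all eigenvalues of 2X'X are positive
eigenvalues = np.linalg.eigvalsh(2 * X.T @ X)

# MM view: the sample moment (1/n) X'(y - X beta_hat) is zero
sample_moment = X.T @ (y - X @ beta_hat) / n

print(beta_hat, eigenvalues.min(), sample_moment)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is usually the numerically safer choice.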
What does (X’X)^-1 X’X equal?
I - the identity matrix
Explain the law of iterated expectations
E[E[X1|X2]] = E[X1]: the expectation of the conditional expectation equals the unconditional expectation
Explain what random sampling means
The population model has been specified and an independent, identically distributed (iid) sample can be drawn
Explain what an unbiased estimator means
E(βhat) = β
Explain zero conditional mean assumption
E(e|X)=0: the error has mean zero at every value of the regressors. This implies the population orthogonality condition E(X’e)=0
Explain when X’X is nonsingular
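X’X is nonsingular when X has full column rank, i.e. no column of X is an exact linear combination of the others. If it is nonsingular, the linear projection of y on X always exists and is unique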
Show the equation when an estimator is consistent
plim(βhat) = β
What are the assumptions of the linear regression model?
- Population orthogonality condition: E(e|x)=0
- Full rank: X is an n x (k+1) matrix with rank k+1
- Linearity: the true model is y=Xβ+e
- Spherical disturbances: homoscedasticity and non-autocorrelation E(ee’|X) = σ^2 I
Explain the BLUE properties
Best: βhat has the smallest variance of any linear unbiased estimator. For any other linear unbiased estimator β~, V(β~) - V(βhat) is a positive semidefinite matrix
Linear: βhat is a linear function of y
Unbiased: E(βhat) = β
Explain multicollinearity
Perfect multicollinearity means the columns of X are linearly dependent. X’X is then not invertible and the OLS parameters are not identified
Explain asymptotic inference
If the small sample distribution of an estimator is unknown we can use an asymptotic approximation
Explain the Central Limit Theorem
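If we have a sequence of iid random variables with finite mean μ and variance σ^2, then no matter what their distribution, the standardised sample mean sqrt(n)(Xbar - μ)/σ converges in distribution to a standard normal as n goes to infinity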
Show that βhat is an unbiased estimator of β
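βhat = (X’X)^-1 X’y
Substitute y = Xβ + e and expand:
βhat = (X’X)^-1 X’(Xβ+e) = (X’X)^-1 X’Xβ + (X’X)^-1 X’e
Since (X’X)^-1 X’X = I, the identity matrix:
βhat = β + (X’X)^-1 X’e
E(βhat) = β + E((X’X)^-1 X’e)
By the law of iterated expectations, E(E(X1|X2)) = E(X1):
E((X’X)^-1 X’e) = E((X’X)^-1 X’E(e|X)) = 0, because E(e|X) = 0
E(βhat) = β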
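The unbiasedness result above can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed settings (fixed X, standard normal errors, 5000 replications, hypothetical coefficient values): the average of βhat across replications should be close to β.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 5000
beta_true = np.array([1.0, 2.0])

# X is held fixed across replications, so the expectation is conditional on X
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A = np.linalg.solve(X.T @ X, X.T)  # (X'X)^-1 X'

estimates = np.empty((reps, 2))
for r in range(reps):
    e = rng.normal(size=n)          # errors drawn with E(e|X) = 0
    y = X @ beta_true + e
    estimates[r] = A @ y            # beta_hat = (X'X)^-1 X'y

mean_estimate = estimates.mean(axis=0)
print(mean_estimate)
```

Each individual βhat differs from β, but the sampling errors (X’X)^-1 X’e average out across replications.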
Show that βhat is a consistent estimator of β
An estimator βhat is consistent if and only if plimβhat = β
βhat = (X’X)^-1 X’y = β + (X’X)^-1 X’e = β + (X’X/n)^-1 (X’e/n)
plim(βhat) = β + [plim(X’X/n)]^-1 plim(X’e/n)
Using the law of large numbers:
plim(X’X/n) = Q = E(xi xi’), assumed nonsingular
plim(X’e/n) = E(xi ei) = 0
plim(βhat) = β + Q^-1 x 0 = β
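Consistency can be illustrated by letting the sample size grow. In this sketch the errors are drawn from a t distribution with 5 degrees of freedom (an assumption chosen to show that normality is not needed, only mean zero and finite variance); the sample sizes and coefficients are also illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true = np.array([1.0, 2.0])

def ols_error(n):
    """Largest absolute estimation error of OLS in one simulated sample of size n."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    e = rng.standard_t(df=5, size=n)   # non-normal, mean zero, finite variance
    y = X @ beta_true + e
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return np.max(np.abs(beta_hat - beta_true))

errors = {n: ols_error(n) for n in (100, 10_000, 1_000_000)}
for n, err in errors.items():
    print(n, err)
```

The estimation error is driven by X’e/n, which the law of large numbers pushes towards zero as n grows.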
Explain the assumptions of unbiasedness and consistency of a linear estimator
- Linearity in parameters
- Full rank of X: X is an n x k matrix with rank k
- Random sampling
- Expected value of errors conditional on X is zero: E(e|X) = 0
What are the consequences for OLS estimation if the actual variance covariance matrix is σ2Ω instead of σ2I?
The new variance is: (see booklet)
Whilst βhat remains unbiased and consistent, it is no longer best (BLUE) or asymptotically efficient: there is an alternative estimator, GLS, with a smaller variance.
The usual variance expression σ^2(X’X)^-1 is no longer correct, so standard t, F, or Wald tests based on it give invalid inference.
Provide an illustration of what the matrix Ω would look like if the errors are heteroscedastic. How would you proceed?
- see diagram in booklet.
How to correct for it:
- Correct the standard errors. Use White’s heteroscedasticity-robust standard errors, which estimate the variance of βhat from the squared OLS residuals without requiring the form of heteroskedasticity to be known
- If you know the precise form of heteroskedasticity, you can correct the estimator itself using GLS/WLS:
- OLS weights all observations equally. With GLS we assume the form of heteroskedasticity is known, Var(ei|xi) = σ^2 h_i. Dividing the equation through by sqrt(h_i) gives a transformed model with homoskedastic errors. Least squares on the transformed equation is weighted least squares: observations with a larger error variance get less weight.
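Both corrections can be sketched in a few lines of NumPy. The data are simulated with Var(e_i|x_i) = σ^2 h_i and h_i = x_i^2, which is an assumption made purely for illustration; the robust variance shown is the basic White (HC0) sandwich form.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
h = x**2                                   # assumed known variance function h_i
y = X @ np.array([1.0, 2.0]) + np.sqrt(h) * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y
u = y - X @ beta_ols                       # OLS residuals

# Conventional standard errors: valid only under homoskedasticity
sigma2 = u @ u / (n - X.shape[1])
se_conventional = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) robust standard errors: (X'X)^-1 [sum u_i^2 x_i x_i'] (X'X)^-1
meat = (X * u[:, None] ** 2).T @ X
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# WLS: divide every variable by sqrt(h_i), then run OLS on the transformed data
w = 1.0 / np.sqrt(h)
Xw, yw = X * w[:, None], y * w
beta_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

print(se_conventional, se_white, beta_wls)
```

Robust standard errors leave βhat unchanged and only repair inference, whereas WLS changes the estimator itself and relies on h_i being correctly specified.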
What are the implications of statistical inference if the error terms are correlated within clusters in data? Give an empirical example. How would you proceed?
Error terms are no longer independent. OLS is no longer efficient and, when errors are positively correlated within clusters, the conventional standard errors are biased downward.
e.g.) test scores and parental income. If there are school or teacher effects, there will be some correlation within a school or a class.
You can proceed by clustering standard errors at the level at which you expect correlation (school/class level).
You can measure the size of the bias via the Moulton factor:
- See booklet for equation: How much larger is the clustered SE compared with the normal one?
As ρu (the within-cluster correlation of the errors) and the number of observations per group rise, so does the magnitude of the bias.
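As a rough illustration, the sketch below uses the simplest textbook special case of the Moulton factor, which may differ from the booklet's equation: equal-sized clusters and a regressor that is constant within each cluster. Under those assumptions the true variance of the OLS coefficient is 1 + (cluster_size - 1) x rho times the conventional one. The function name and example values are hypothetical.

```python
import math

def moulton_factor(cluster_size, rho):
    """Variance inflation 1 + (cluster_size - 1) * rho.

    Assumes equal-sized clusters and a regressor fixed within clusters;
    rho is the within-cluster correlation of the errors.
    """
    return 1.0 + (cluster_size - 1) * rho

# Example: 30 pupils per class, within-class error correlation of 0.1
variance_inflation = moulton_factor(cluster_size=30, rho=0.1)
se_inflation = math.sqrt(variance_inflation)  # ratio of clustered to conventional SE
print(variance_inflation, se_inflation)
```

Even a modest within-cluster correlation produces a large understatement of the standard error once clusters are big, which matches the comparative statics stated above.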
Assume z, a nxm matrix of valid instruments can be chosen for x. What conditions must the instruments in z satisfy?
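- Exogeneity (exclusion restriction). Z is uncorrelated with the error term: E(Z’e)=0
- Relevance. Z has to be partially correlated with the endogenous regressor X, i.e. correlated with it once the other exogenous regressors have been partialled out.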
- see booklet for equation
Derive a consistent estimator for β using 2SLS
Suppose we have m instrumental variables Z1…Zm for the endogenous regressor Xk, each satisfying E(Zh’e)=0 and each partially correlated with Xk. Any linear combination of the instruments is also a valid instrument, so there are many possible IV estimators.
Of all possible linear combinations, 2SLS chooses the one most highly correlated with Xk. This is the linear projection of Xk on the exogenous regressors and the instruments (the reduced form):
Xk = δ0 + δ1X1 + δ2X2 + θ1Z1 + … + θmZm + rk
Since rk has zero mean and is uncorrelated with all the exogenous variables, the projection itself is exogenous:
Xk* = δ0 + δ1X1 + δ2X2 + θ1Z1 + … + θmZm
We can consistently estimate the reduced form by OLS and take the fitted values:
Xkhat = δhat0 + δhat1X1 + δhat2X2 + θhat1Z1 + … + θhatmZm
Let Xhat be X with Xk replaced by Xkhat. The IV estimator solves the sample analogue of E(Xhat’e) = 0:
1/n Xhat’ehat = 1/n Xhat’(y-Xβhat) = 0
βhat = (Xhat’X)^-1 Xhat’y
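The two stages above can be sketched directly in matrix form. The data-generating process here (two instruments, an unobserved confounder called `ability`, a true slope of 2) is a hypothetical example chosen so that OLS is visibly biased while 2SLS is not.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)          # instruments
ability = rng.normal(size=n)                             # unobserved confounder
x = 0.8 * z1 + 0.5 * z2 + ability + rng.normal(size=n)   # endogenous regressor
e = ability + rng.normal(size=n)                         # error correlated with x
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(n), x])        # regressors, including the intercept
Z = np.column_stack([np.ones(n), z1, z2])   # exogenous variables and instruments

# First stage: fitted values from regressing each column of X on Z
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Second stage in closed form: beta_hat = (X_hat'X)^-1 X_hat'y
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# OLS for comparison: inconsistent because Cov(x, e) != 0
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_2sls, beta_ols)
```

The intercept column is projected onto itself in the first stage, so only the endogenous column is actually replaced by fitted values.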
Derive a consistent estimator for β using Generalised Method of Moments
We have m population moment conditions
E(Z’e) = (…) = 0
With sample analogues:
- See booklet for equation
Solve for β by minimizing the quadratic form:
Q(β) = (…)
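The efficient choice of the weighting matrix W is the inverse of the asymptotic covariance matrix of the sample moment conditions
If the ei are i.i.d. with constant variance σ^2, this covariance matrix is proportional to Z’Z/n, so σ^2 is only a common scale factor on every moment and GMM reduces to 2SLS.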
Sub into quadratic form:
- See booklet
Give an example of an application where you need to develop a consistent β estimator when there is an instrumental variable. Explain the identification problem and propose two potential instruments that could be used
The effect of education on earnings. Both are affected by underlying ability, which you cannot measure.
Ability is therefore an omitted variable correlated with education, so the OLS estimates are biased and inconsistent (OVB).
You can solve this with an instrumental variable that affects education but affects earnings only through education:
- Years of compulsory schooling. This is set by law and is not based on ability
- Distance from school. Someone close to school may undergo more education.
Explain what is meant by the weak instruments problem. What are the consequences for instrumental variables estimation
The instruments Z are only weakly correlated with the endogenous regressor X. As a result, the 2SLS estimator is biased in finite samples, towards the OLS estimate.
The weaker the relationship between X and Z, the closer 2SLS is to OLS. To gauge the size of the bias:
Use an F-test of the joint significance of the instruments in the first-stage regression. A low F implies a high level of bias. Adding useless instruments will increase the bias.
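The first-stage F statistic can be computed by comparing the first-stage regression with and without the excluded instruments. This is a sketch for a single endogenous regressor; the function name, arguments, and simulated data are hypothetical. A common rule of thumb treats a first-stage F below about 10 as a sign of weak instruments.

```python
import numpy as np

def first_stage_f(x, Z_excluded, W_included):
    """F statistic for the joint significance of Z_excluded in a regression
    of the endogenous regressor x on W_included and Z_excluded."""
    n = len(x)

    def rss(M):
        coef, *_ = np.linalg.lstsq(M, x, rcond=None)
        resid = x - M @ coef
        return resid @ resid

    unrestricted = np.column_stack([W_included, Z_excluded])
    rss_r, rss_u = rss(W_included), rss(unrestricted)
    m = Z_excluded.shape[1]                       # number of restrictions
    return ((rss_r - rss_u) / m) / (rss_u / (n - unrestricted.shape[1]))

rng = np.random.default_rng(5)
n = 1000
const = np.ones((n, 1))                           # included exogenous variable
z = rng.normal(size=(n, 1))                       # one excluded instrument
noise = rng.normal(size=n)

f_strong = first_stage_f(0.5 * z[:, 0] + noise, z, const)
f_weak = first_stage_f(0.01 * z[:, 0] + noise, z, const)
print(f_strong, f_weak)
```

The only difference between the two calls is the first-stage coefficient on the instrument, which is what drives the gap between the two F statistics.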
Outline how you would estimate panel data if you believe the individual effects are random and independent of X. How do you know if parameters are identified?
RE estimator. It assumes the ci are random draws from the population, like the Xs, and that they are uncorrelated with the regressors: Cov(xik,ci)=0.
Both FE and RE assume strict exogeneity of X conditional on ci.
Under RE, pooled OLS is still unbiased and consistent but no longer the most efficient estimator, because the composite error ci + eit is correlated over time within each unit. Use feasible GLS to obtain the RE estimator, which is asymptotically efficient.
Feasible GLS:
OLS weights all observations equally. GLS instead uses the known form of the error covariance. The easiest example is heteroskedasticity of known form h_i: divide the equation through by sqrt(h_i) to get a model with homoskedastic errors, then take least squares of the transformed equation. This is now weighted.
In practice h_i is unknown and has to be estimated, so we use hhat_i in its place. This is FGLS.
Outline how you would estimate panel data if you believe the effects are correlated with the regressors?
Use an FE estimator. This allows correlation between ci and Xi (eq.), and treats the ci as fixed parameters to be estimated.
Both FE and RE assume strict exogeneity of X conditional on ci.
The FE estimator can be computed by OLS with a dummy variable for each cross-sectional unit (least squares dummy variables). This is equivalent to OLS on data demeaned within each unit
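The dummy-variable approach can be compared with the within (demeaning) transformation on simulated data. This is a sketch under assumed settings: a balanced panel with 100 units and 5 periods, a single regressor that is correlated with ci by construction, and a true slope of 2. All names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 100, 5
unit = np.repeat(np.arange(N), T)                   # unit index for each row
c = rng.normal(size=N)                              # unobserved unit effects
x = c[unit] + rng.normal(size=N * T)                # regressor correlated with c_i
y = 2.0 * x + c[unit] + rng.normal(size=N * T)

# LSDV: OLS of y on x and one dummy per cross-sectional unit
D = (unit[:, None] == np.arange(N)).astype(float)
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)
beta_lsdv = coef_lsdv[0]

# Within estimator: subtract unit means, then OLS on the demeaned data
def demean(v):
    means = np.bincount(unit, weights=v) / T
    return v - means[unit]

x_w, y_w = demean(x), demean(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Pooled OLS ignores c_i and is biased here because Cov(x, c) != 0
X_pooled = np.column_stack([np.ones(N * T), x])
beta_pooled = np.linalg.solve(X_pooled.T @ X_pooled, X_pooled.T @ y)[1]

print(beta_lsdv, beta_within, beta_pooled)
```

The two FE computations give the same slope; demeaning is usually preferred in practice because it avoids building a dummy matrix with one column per unit.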
Outline a test that would allow you to choose between random and fixed effects approaches?
Hausman test. The null hypothesis is that ci is uncorrelated with the regressors, so RE is consistent and efficient. A low p-value rejects the null: RE is inconsistent, so use FE.