Proxy Variables And Measurement Error Flashcards
Why are proxy variables used?
They are used in place of explanatory variables we cannot observe and/or don’t have sufficient data on
What would we expect from a proxy variable?
We would expect the proxy variable to be highly correlated with the omitted (unobserved) variable
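Assumptions necessary for the proxy variable method?
Setup: y = b0 + b1x1 + b2x2 + b3x3* + u, where x3* is unobserved, x3 is its proxy, and x3* = d0 + d3x3 + v3
- u uncorrelated with x1, x2 and x3*
- corr(x3,u)=0 (u is also uncorrelated with the proxy)
- corr(x1,v3)=corr(x2,v3)=0 (v3 is uncorrelated with the other explanatory variables)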
What does it mean if corr(x3,u) is not equal to 0?
If the original error u and the proxy variable x3 are correlated, the proxy variable should be included in the original regression in its own right
- corr(x3,u)=0 means that if x3* were available, x3 would not belong in the regression
- I.e. x3 should have no separate influence on y once x1, x2 and x3* are controlled for
What does it mean if the error term in the x3* regression (v3) is correlated with x1, x2?
If v3 is correlated with x1 and x2, they would have to be included in the regression for x3* (the omitted variable)
- Note: v3 also has to be uncorrelated with x3 to satisfy the zero conditional mean (ZCM) assumption
- if v3 is correlated with x1 and x2, the OLS estimators will be biased
- hopefully this bias is smaller than the omitted variable bias (OVB) from leaving the proxy out
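If all additional assumptions hold, what does this mean for the regression model?
- combine the two regressions (original + x3* regression) by substituting the x3* regression into the original one
- in the new regression model the error term is uncorrelated with all explanatory variables
- OLS consistently estimates the coefficients of the combined regression
- the coefficients on x1 and x2 are correctly identified; the coefficient on x3 estimates b3 scaled by the slope of the x3* regression, not b3 itself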
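The combined regression can be checked with a small simulation. This is only a sketch under assumed values: the coefficients (b1 = 2, b2 = -1, b3 = 1.5), the proxy relation x3* = 1 + 2x3 + v3, the sample size, and the helper names `ols` and `simulate_proxy` are all illustrative choices, not part of the original notes.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate_proxy(n=100_000, seed=0):
    """Compare omitting x3* with plugging in the proxy x3 (assumed DGP)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = 0.5 * x1 + rng.normal(size=n)   # observed proxy, correlated with x1
    v3 = rng.normal(size=n)              # uncorrelated with x1, x2 and x3
    x3_star = 1.0 + 2.0 * x3 + v3        # unobserved variable (d0 = 1, d3 = 2)
    u = rng.normal(size=n)               # uncorrelated with x1, x2, x3* and x3
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x3_star + u
    omit = ols(y, x1, x2)                # x3* left out: omitted variable bias
    proxy = ols(y, x1, x2, x3)           # x3 used in place of x3*
    return omit, proxy

omit, proxy = simulate_proxy()
print("coefficient on x1, x3* omitted:", omit[1])
print("coefficient on x1, proxy used: ", proxy[1])
print("coefficient on x3 (b3 * d3):   ", proxy[3])
```

Under these assumptions the coefficient on x1 should land near the true value 2 when the proxy is used and near 3.5 when x3* is simply omitted, while the coefficient on x3 recovers b3 * d3 = 3 rather than b3 = 1.5.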
What else can be used as proxy variables?
- omitted unobserved factors may be proxied by the value of the dependent variable from an earlier time period
- E.g. including the crime rate from the previous period as an explanatory variable in a regression for crime in the present period
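A stylised simulation of the crime example, offered as a rough sketch only. The unobserved city factor `a`, the spending variable `x`, every coefficient, and the function name `simulate_lagged_proxy` are hypothetical. Earlier-period crime is an imperfect proxy here, so it is expected to shrink the bias rather than remove it.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate_lagged_proxy(n=100_000, seed=1):
    """Slope on x with and without earlier-period crime as a control."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)                            # unobserved city factor
    x = 0.8 * a + rng.normal(size=n)                  # spending, correlated with a
    crime_prev = 2.0 + a + 0.5 * rng.normal(size=n)   # earlier crime reflects a
    crime = 1.0 + 0.5 * x + 1.5 * a + rng.normal(size=n)  # true slope on x: 0.5
    naive = ols(crime, x)[1]                 # a omitted entirely
    with_lag = ols(crime, x, crime_prev)[1]  # earlier crime proxies for a
    return naive, with_lag

naive, with_lag = simulate_lagged_proxy()
print("slope on x without earlier crime:", naive)
print("slope on x with earlier crime:   ", with_lag)
```

In this assumed setup the slope with the earlier crime rate included should sit closer to 0.5 than the naive slope, but not exactly on it.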
What does measurement error in dep. variable equal?
e0 = y - y*
I.e. measurement error = mismeasured (observed) value - actual value
Note: we do not know the actual value; we only see the observed (mismeasured) value and make assumptions about the measurement error
What happens to regression when measurement error is included?
y = y* + e0
y = b0 + b1x1 + b2x2 + ... + bkxk + (u + e0)
Along with MLR1-4 what else must hold in order for estimated regression with measurement error to give unbiased and consistent estimators?
- cov(e0,xj)=0 for every j: measurement error must be uncorrelated with all explanatory variables
- cov(e0,u)=0: measurement error uncorrelated with the error term
- E(e0)=0: expected value of measurement error is 0
What happens to overall variance in estimated regression with measurement error?
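- the error variance increases: with cov(e0,u)=0, Var(u + e0) = Var(u) + Var(e0) > Var(u), so the OLS estimators have larger variances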
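A minimal simulation of measurement error in the dependent variable, assuming a single regressor, a true slope of 2, Var(u) = 1 and Var(e0) = 4. These numbers and the function name `simulate_dep_var_error` are illustrative assumptions.

```python
import numpy as np

def simulate_dep_var_error(n=100_000, seed=2):
    """Return {name: (slope, residual variance)} for true and observed y."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    u = rng.normal(size=n)            # Var(u) = 1
    e0 = 2.0 * rng.normal(size=n)     # Var(e0) = 4, independent of x1 and u
    y_star = 1.0 + 2.0 * x1 + u       # actual (unobserved) dependent variable
    y = y_star + e0                   # observed, mismeasured dependent variable
    X = np.column_stack([np.ones(n), x1])
    results = {}
    for name, dep in (("true", y_star), ("observed", y)):
        coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
        resid = dep - X @ coef
        results[name] = (coef[1], resid.var(ddof=2))  # ddof=2: two parameters
    return results

for name, (slope, s2) in simulate_dep_var_error().items():
    print(name, "slope:", slope, "residual variance:", s2)
```

Both regressions should give a slope close to 2, while the residual variance with the mismeasured y should be close to Var(u) + Var(e0) = 5 instead of 1.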
What does mismeasured explanatory variable equal?
x1=x1*+e1
- mismeasured explanatory variable = actual value + measurement error
What happens to regression equation when measurement error for an explanatory variable is included?
y = b0 + b1x1* + ... + bkxk + u
x1* = x1 - e1
y = b0 + b1x1 + ... + bkxk + (u - b1e1)
What must be assumed in order for the estimators to be unbiased and consistent when measurement error in an explanatory variable is included in the regression?
- MLR 1-4 must hold
- E(e1)=0
- E(u)=0
- cov(x1,u)=0
- cov(x1*,u)=0
- cov(x1,e1)=0: measurement error uncorrelated with the observed (mismeasured) variable
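The role of cov(x1,e1)=0 can be illustrated with a short simulation. This is a sketch under assumed values (true slope 2, unit variances, and the function name is hypothetical): the measurement error is drawn independently of the observed x1, so the composite error u - b1e1 is uncorrelated with the regressor.

```python
import numpy as np

def simulate_error_uncorrelated_with_observed(n=100_000, seed=3):
    """Slope from regressing y on observed x1 when cov(x1, e1) = 0."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)         # observed (mismeasured) variable
    e1 = rng.normal(size=n)         # measurement error, independent of x1
    x1_star = x1 - e1               # actual value implied by x1 = x1* + e1
    u = rng.normal(size=n)          # independent of x1 and x1*
    y = 1.0 + 2.0 * x1_star + u     # y depends on the actual value
    X = np.column_stack([np.ones(n), x1])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

print("slope on observed x1:", simulate_error_uncorrelated_with_observed())
```

With these assumptions the estimated slope should be close to the true value 2; the measurement error only inflates the error variance.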
What is the Classical Errors-in-Variables (CEV) Assumption?
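Cov(e1,x1*)=0: the measurement error is uncorrelated with the actual (unobserved) value of the explanatory variable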
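Under CEV the observed x1 = x1* + e1 is necessarily correlated with e1, and in a simple regression the OLS slope is pulled toward zero by the factor Var(x1*) / (Var(x1*) + Var(e1)), the standard attenuation result. The sketch below assumes a true slope of 2 and equal variances for x1* and e1; the values and the function name are illustrative.

```python
import numpy as np

def simulate_cev(n=100_000, seed=4, b1=2.0, sd_star=1.0, sd_e=1.0):
    """Return (estimated slope, attenuated slope implied by CEV)."""
    rng = np.random.default_rng(seed)
    x1_star = sd_star * rng.normal(size=n)   # actual value, unobserved
    e1 = sd_e * rng.normal(size=n)           # independent of x1* (CEV)
    x1 = x1_star + e1                        # observed, mismeasured variable
    u = rng.normal(size=n)
    y = 1.0 + b1 * x1_star + u
    X = np.column_stack([np.ones(n), x1])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    implied = b1 * sd_star**2 / (sd_star**2 + sd_e**2)
    return coef[1], implied

slope, implied = simulate_cev()
print("estimated slope:", slope, "implied by attenuation formula:", implied)
```

With equal variances the attenuation factor is 0.5, so the estimated slope should be near 1 rather than 2.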