Lecture 4 (Regression fundamentals) Flashcards
What is the conditional expectation function?
The CEF is $E(Y_i|X_i)$, the population average of $Y_i$ for a given value of $X_i$. Any $Y_i$ can be decomposed as

Y_i = E(Y_i|X_i) + e_i

where the residual $e_i$ has conditional mean zero, $E(e_i|X_i) = 0$. The CEF is thus the part of $Y_i$ that is explained by $X_i$.
Derive beta in the bivariate case: yi = b0+b1xi+ei
Minimize the sum of squared residuals; the first-order conditions give $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ and $\hat\beta_1 = \widehat{Cov}(x_i, y_i)/\widehat{Var}(x_i)$. See the sketch below.
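A sketch of the standard derivation: choose $(\hat\beta_0, \hat\beta_1)$ to minimize $\sum_i (y_i - b_0 - b_1 x_i)^2$. The first-order conditions are

\sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \quad\Rightarrow\quad \hat\beta_0 = \bar y - \hat\beta_1 \bar x

\sum_i x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0

Substituting the first condition into the second gives

\hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} = \frac{\widehat{Cov}(x_i, y_i)}{\widehat{Var}(x_i)}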
What does beta_1 measure in the multivariate regression?
In the multivariate regression, $\beta_1$ measures the partial effect of $x_1$ on $Y$: the effect of $x_1$ holding all other regressors $x_k$ fixed (controlling for their influence).
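This can be stated precisely with the standard regression-anatomy (partialling-out) formula:

\beta_1 = \frac{Cov(y_i, \tilde x_{1i})}{Var(\tilde x_{1i})}

where $\tilde x_{1i}$ is the residual from regressing $x_{1i}$ on all the other covariates, i.e. the variation in $x_1$ that remains after the other $x_k$ have been partialled out.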
What is a saturated model and what problems might it create?
A saturated model is a model that includes a dummy variable for every value of the covariates and all of their interactions (it is fully parameterized). The reason for running a saturated model is to avoid depending on a functional form assumption. That is, you get a non-parametric regression: the fitted values are simply the conditional means within each covariate cell.
However, a saturated regression runs into the “curse of dimensionality” when you have many covariates: the number of cells grows exponentially, and many cells end up with few or no observations.
Note that it is NOT possible to have a saturated model with continuous independent variables, since we cannot include a dummy for every value such a variable can take.
Johan claims that saturation is a binary property: a model is either fully saturated or it is not. A sketch of a saturated model follows below.
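A minimal sketch (my own illustration, not from the lecture) with two binary covariates, where the saturated regression exactly reproduces the cell means:

```python
import numpy as np

# Saturated model with two binary covariates x1, x2: a dummy for each
# of the 4 cells (constant, x1, x2, x1*x2) imposes no functional form
# and reproduces the conditional cell means of y exactly.
rng = np.random.default_rng(0)
n = 1_000
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)

# Fully parameterized design matrix.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Fitted values coincide with the cell means of y.
for a in (0, 1):
    for b in (0, 1):
        cell_mean = y[(x1 == a) & (x2 == b)].mean()
        fitted = beta @ [1, a, b, a * b]
        print(f"x1={a}, x2={b}: cell mean {cell_mean:.3f}, fitted {fitted:.3f}")
```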
In matrix notation, show the OVB and discuss what we can say about the direction of the bias.
See PS.1 Exercise 2.1, and the sketch below.
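A sketch of the standard result: let the true model be $y = X_1\beta_1 + X_2\beta_2 + u$ with $E(u|X_1, X_2) = 0$, and suppose we omit $X_2$ and regress $y$ on $X_1$ alone. Then

\hat\beta_1 = (X_1'X_1)^{-1}X_1'y = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\,\beta_2 + (X_1'X_1)^{-1}X_1'u

so that

E(\hat\beta_1) = \beta_1 + \delta\beta_2, \quad \delta = (X_1'X_1)^{-1}X_1'X_2

where $\delta$ is the coefficient from regressing the omitted $X_2$ on the included $X_1$. The bias has the sign of $\delta\beta_2$: it is zero only if $\beta_2 = 0$ or $X_1'X_2 = 0$, upward when $\delta$ and $\beta_2$ have the same sign, and downward when they have opposite signs.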
What problem do we get with measurement errors in the dependent variable?
Measurement error in $Y$ is not a problem for consistency if it is uncorrelated with the explanatory variable $X$: it is simply absorbed into the error term. It does, however, lead to larger standard errors and less precision. Measurement error in an explanatory variable is more problematic.
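To see this in one step (the notation $y_i = y_i^* + w_i$ for the mismeasured outcome is mine): if the true model is $y_i^* = \beta_0 + \beta_1 x_i + u_i$, then the estimating equation becomes

y_i = \beta_0 + \beta_1 x_i + (u_i + w_i)

When $w_i$ is uncorrelated with $x_i$, the composite error $u_i + w_i$ still has conditional mean zero, so $\hat\beta_1$ remains consistent, but its variance $\sigma^2_u + \sigma^2_w$ is larger, which inflates the standard errors.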
What problem do we get with measurement errors in the independent variable? Formally (do not derive), what does our beta estimate converge to in the limit?
\plim \hat\beta_1 = \beta_1 \frac{Var(x_1^*)}{Var(x_1^*) + Var(v)} = \lambda \beta_1
The regression coefficient is biased towards zero by the “attenuation factor”, $\lambda$, also known as the reliability ratio. Equivalently, $\plim \hat\beta_1 = \beta_1 / (1 + \sigma^2_v/\sigma^2_{x_1^*})$, where $\sigma^2_{x_1^*}/\sigma^2_v$ is the so-called “signal-to-noise ratio”. Since variances are ALWAYS positive, $\beta_1$ is divided by a number greater than 1, so the estimate is always biased towards zero. The attenuation bias is increasing in $\sigma^2_v$ and decreasing in $\sigma^2_{x_1^*}$. We can then note that it is possible to decrease this bias by having a sample with a large variance in $x_1^*$.
The attenuation bias is likely to be present in almost all studies. The good thing is that it makes the estimates more conservative and preserves the sign of $\beta_1$. One way to correct this error is to use IV: we use an instrument $Z$ that is correlated with the variable that has the measurement error. Ashenfelter, for example, used siblings’ reports of an individual’s years of schooling as an instrument for own years of schooling in his Mincer regression. A simulation sketch follows below.
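A quick simulation sketch (my own illustration, not from the lecture) showing that the OLS slope converges to $\lambda\beta_1$ under classical measurement error:

```python
import numpy as np

# Classical measurement error in the regressor attenuates the OLS slope
# by lambda = var(x*) / (var(x*) + var(v)).
rng = np.random.default_rng(1)
n = 200_000
beta1 = 2.0
var_xstar, var_v = 1.0, 0.5

x_star = rng.normal(scale=np.sqrt(var_xstar), size=n)  # true regressor
v = rng.normal(scale=np.sqrt(var_v), size=n)           # measurement error
x = x_star + v                                         # observed regressor
y = beta1 * x_star + rng.normal(size=n)

slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)         # bivariate OLS slope
lam = var_xstar / (var_xstar + var_v)                  # attenuation factor
print(f"OLS slope: {slope:.3f}, lambda * beta1: {lam * beta1:.3f}")
```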
Derive the attenuation bias formula.
See PS.2 Exercise 1 (the case without OVB), and the sketch below.
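A sketch of the derivation under classical measurement error: suppose the true model is $y = \beta_0 + \beta_1 x_1^* + u$, but we observe $x_1 = x_1^* + v$, where $v$ is uncorrelated with $x_1^*$ and $u$. Substituting $x_1^* = x_1 - v$ gives

y = \beta_0 + \beta_1 x_1 + (u - \beta_1 v)

The observed regressor $x_1$ is correlated with the error through $v$, and the bivariate OLS slope converges to

\plim \hat\beta_1 = \frac{Cov(x_1, y)}{Var(x_1)} = \frac{\beta_1 Var(x_1^*)}{Var(x_1^*) + Var(v)} = \lambda \beta_1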
What happens with measurement error in a multivariate setting?
The main takeaway is that additional controls in the regression exacerbate the attenuation bias, since $\lambda' < \lambda$. An additional control reduces the signal, since the variation due to $x_2^*$ is partialled out, but it does not reduce the noise. See the sketch below.
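A sketch of why this happens (assuming the measurement error $v$ is uncorrelated with the control $x_2$): with $x_2$ in the regression,

\plim \hat\beta_1 = \lambda' \beta_1, \quad \lambda' = \frac{Var(\tilde x_1^*)}{Var(\tilde x_1^*) + Var(v)}

where $\tilde x_1^*$ is the residual from regressing $x_1^*$ on $x_2$. Partialling out $x_2$ shrinks the signal, $Var(\tilde x_1^*) \le Var(x_1^*)$, while the noise $Var(v)$ is unchanged, so $\lambda' \le \lambda$.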
What is our identifying assumption in OLS with controls?
When we have additional explanatory variables (controls/covariates) in our regression, the assumption is “conditional mean independence”.
E(u_i|x_{1i}, x_{2i}, \ldots, x_{ki}) = E(u_i|x_{2i}, \ldots, x_{ki})

That is, conditional on the controls $x_{2i}, \ldots, x_{ki}$, the error does not depend on the variable of interest $x_{1i}$; it is as good as randomly assigned.