07 Multiple Regression Model Flashcards

1
Q

Omitted variable bias

A

The bias in the OLS estimator that occurs as a result of an omitted factor, or variable, is called omitted variable bias. For omitted variable bias to occur, the omitted variable Z must satisfy two conditions:
• The omitted variable is correlated with the included regressor (i.e. corr(Z, X) ≠ 0)
• The omitted variable is a determinant of the dependent variable (i.e. Z is part of u)

$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu} \cdot \frac{\sigma_u}{\sigma_X}$, where $\rho_{Xu} = \mathrm{corr}(X_i, u_i)$

The formula indicates that:
• Omitted variable bias exists even when n is large: the OLS estimator is not only biased but also inconsistent.
• The larger the correlation between X and the error term, the larger the bias.
• The direction of the bias depends on whether X and u are negatively or positively correlated.
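
A minimal simulation sketch (illustrative, not from the source; all names and parameter values are made up) of the formula above: Z affects Y and is correlated with X, so omitting it shifts the OLS slope away from the true value even in a very large sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                       # large n: the bias does not vanish

# Z is a determinant of Y and is correlated with X -> both OVB conditions hold.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)                   # corr(Z, X) != 0
y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)   # true beta_1 = 2.0

# Short regression of Y on X only; Z is absorbed into the error term u.
X = np.column_stack([np.ones(n), x])
beta_short = np.linalg.lstsq(X, y, rcond=None)[0]
print("slope with Z omitted:", beta_short[1])  # well above 2.0 (upward bias here)
```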

2
Q

How to overcome omitted variable bias

A
  • Ideal controlled experiment
  • Include the omitted variable in the regression (see the sketch below)
  • Use cross-tabulation (divide the data into groups within which the omitted factor is roughly constant)
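
Continuing the hypothetical simulation from the previous card: once z is included as a regressor it is no longer part of the error term, and the slope on x is estimated without bias (a sketch under the same made-up data-generating process).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)   # true beta_1 = 2.0

# Long regression: include Z as a control variable.
XZ = np.column_stack([np.ones(n), x, z])
beta_long = np.linalg.lstsq(XZ, y, rcond=None)[0]
print("slope on x with Z included:", beta_long[1])  # close to 2.0
```
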
3
Q

Advantages of the MLRM over the SLRM:

A

Advantages of the MLRM over the SLRM:
• By adding more independent variables (control variables) we can explicitly control for other factors affecting y.
• More likely that the zero conditional mean assumption holds and thus more likely that we are able to infer causality.
• By controlling for more factors, we can explain more of the variation in y, and thus make better predictions.
• Can incorporate more general functional forms.

4
Q

Assumptions of the MLRM

A

Assumptions of the MLRM

  • Random sampling
  • Large outliers are unlikely
  • Zero conditional mean
  • There is sampling variation in X, and there are no exact linear relationships among the independent variables (no perfect collinearity).
  • The model is linear in parameters.

Under these assumptions the OLS estimators are unbiased estimators of the population parameters. In addition, there is the homoskedasticity assumption, which is needed for OLS to be BLUE.
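
A small numeric sketch (illustrative, not from the source) of why perfect collinearity must be ruled out: if one regressor is an exact linear function of another, the design matrix loses rank, X'X is effectively singular, and the OLS coefficients are not uniquely determined.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 3.0 * x1 - 1.0          # exact linear function of x1 -> perfect collinearity

X = np.column_stack([np.ones(n), x1, x2])
print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))  # rank 2 < 3

# X'X is effectively singular, so the normal equations have no unique solution;
# the coefficients on x1 and x2 cannot be separately identified.
print("condition number of X'X:", np.linalg.cond(X.T @ X))  # astronomically large
```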

5
Q

Important properties of the OLS fitted values and residuals

A

The OLS fitted values and residuals have the same important properties as in the simple linear regression:
• The sample average of the residuals is zero, and so $\bar{Y} = \bar{\hat{Y}}$ (the sample average of Y equals the sample average of the fitted values).
• The sample covariance between each independent variable and the OLS residuals is zero. Consequently, the sample covariance between the OLS fitted values and the OLS residuals is zero.
• The point $(\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k, \bar{Y})$ is always on the OLS regression line.
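
These properties are easy to verify numerically. A sketch with made-up data (any dataset and any OLS routine would do, as long as the regression includes an intercept):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta
resid = y - fitted

print(np.mean(resid))               # ~0: residuals average to zero
print(np.mean(fitted), np.mean(y))  # equal: mean of fitted values = mean of y
print(np.cov(x1, resid)[0, 1])      # ~0: regressor uncorrelated with residuals
print(np.cov(fitted, resid)[0, 1])  # ~0: fitted values uncorrelated with residuals

# The point (mean(x1), mean(x2), mean(y)) lies on the regression line:
print(beta @ [1.0, x1.mean(), x2.mean()], y.mean())  # equal
```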

6
Q

Consequences of heteroskedasticity on the quality of the OLS estimators

A
  • Under the OLS assumptions, including homoskedasticity, the OLS estimators $\hat{\beta}_j$ are the best linear unbiased estimators (BLUE) of the population parameters $\beta_j$.
  • Under heteroskedasticity the OLS estimators remain unbiased and consistent, but they are no longer necessarily the linear unbiased estimators with the smallest variance, and the usual OLS standard errors are no longer valid.
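
One standard response, sketched below with simulated heteroskedastic data: keep the OLS point estimates but use heteroskedasticity-robust (White/HC) standard errors for inference. This assumes the statsmodels package; names and numbers are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1_000
x = rng.uniform(0, 2, size=n)
u = rng.normal(size=n) * x        # error variance grows with x: heteroskedastic
y = 1.0 + 2.0 * x + u

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.bse)                                        # usual (homoskedasticity-only) SEs
print(fit.get_robustcov_results(cov_type="HC1").bse)  # robust SEs, valid here
```
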
7
Q

When two or more of the regressors are highly correlated (but not perfectly correlated), ___

A

When two or more of the regressors are highly correlated (but not perfectly correlated), it is hard to estimate the effect of one variable while holding the others constant.

The higher the correlation between $X_1$ and $X_2$, the higher the variance of $\hat{\beta}_1$. Thus, when multiple regressors are imperfectly collinear, the coefficients on one or more of these regressors will be imprecisely estimated.
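
A simulation sketch (made-up data-generating process) of this variance inflation: the sampling variability of $\hat{\beta}_1$ grows as corr(X1, X2) approaches 1.

```python
import numpy as np

def slope_sd(corr, n=200, reps=2_000, rng=np.random.default_rng(4)):
    """Empirical std. dev. of the OLS slope on x1 for a given corr(x1, x2)."""
    slopes = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = corr * x1 + np.sqrt(1 - corr**2) * rng.normal(size=n)
        y = 1.0 + x1 + x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.std(slopes)

print(slope_sd(0.0))   # baseline sampling variability
print(slope_sd(0.95))  # roughly 3x larger: imprecise estimates
```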

8
Q

overspecification

A

A model that includes irrelevant variables is called an overspecified model. Overspecification does not bias the OLS estimators, but it can inflate the variances of the estimated coefficients.

9
Q

The OLS estimators are inconsistent if ___

A

The OLS estimators are inconsistent if the error is correlated with any of the independent variables.

10
Q

Why adjusted R-squared?

A

The adjusted R-squared is introduced in the MLRM because the ordinary R-squared never decreases when another regressor is added, even if that regressor is irrelevant. The adjusted R-squared imposes a penalty for additional regressors, so it can fall when a variable adds little explanatory power.

11
Q

Adjusted R-squared =

A

Adjusted R-squared =

$\bar{R}^2 = 1 - \dfrac{n - 1}{n - k - 1} \cdot \dfrac{SSR}{TSS}$
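
A small sketch of the formula (hypothetical SSR/TSS numbers) showing the penalty at work: holding the fit fixed, a larger k lowers the adjusted R-squared.

```python
def adjusted_r2(ssr, tss, n, k):
    """Adjusted R^2 = 1 - (n - 1)/(n - k - 1) * SSR/TSS, k = number of regressors."""
    return 1.0 - (n - 1) / (n - k - 1) * ssr / tss

# Hypothetical numbers: same SSR/TSS, more regressors -> lower adjusted R^2.
print(adjusted_r2(ssr=40.0, tss=100.0, n=50, k=2))  # ~0.5830
print(adjusted_r2(ssr=40.0, tss=100.0, n=50, k=8))  # ~0.5220
```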

12
Q

SER in MLRM

A

$SER = s_{\hat{u}} = \sqrt{\dfrac{SSR}{n - k - 1}}$ (note the square root: the SER estimates the standard deviation of the error term, with a degrees-of-freedom correction for the k + 1 estimated coefficients).
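
A one-function sketch of the (corrected) formula, with a tiny made-up residual vector:

```python
import numpy as np

def ser(resid, k):
    """Standard error of the regression: sqrt(SSR / (n - k - 1)), k = number of regressors."""
    resid = np.asarray(resid, dtype=float)
    n = resid.shape[0]
    return np.sqrt(np.sum(resid**2) / (n - k - 1))

print(ser([1.0, -2.0, 0.5, 1.5], k=1))  # sqrt(7.5 / 2) ~ 1.94
```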

13
Q

Pure vs impure heteroskedasticity

A
  • Pure heteroskedasticity is caused by the error term of a correctly specified equation.
  • Heteroskedasticity is likely to occur in data sets in which there is a wide disparity between the largest and smallest observed values.
  • Impure heteroskedasticity is heteroskedasticity caused by an error in specification, such as an omitted variable.
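
Before asking whether heteroskedasticity is pure or impure, one first has to detect it; a common diagnostic is the Breusch-Pagan test. A sketch with simulated data, assuming the statsmodels package:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, size=n)                # wide disparity in observed values
y = 2.0 + 0.5 * x + rng.normal(size=n) * x    # error spread grows with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")  # small -> reject homoskedasticity
```
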
14
Q

Dummy variable

A

A binary (0/1) indicator variable that equals 1 if some qualitative characteristic is present and 0 otherwise.
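
A short sketch (hypothetical wage data, illustrative names) of a dummy in use: the coefficient on the 0/1 regressor measures the difference in the dependent variable between the two groups, holding the other regressors fixed.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 300
union = rng.integers(0, 2, size=n)        # dummy: 1 = union member, 0 = not
experience = rng.uniform(0, 30, size=n)
wage = 10 + 0.3 * experience + 2.0 * union + rng.normal(size=n)

X = sm.add_constant(np.column_stack([experience, union]))
fit = sm.OLS(wage, X).fit()
print(fit.params)  # last coefficient ~2.0: the union wage differential
```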
