07 Multiple Regression Model Flashcards
Omitted variable bias
The bias in the OLS estimator that arises from an omitted factor, or variable, is called omitted variable bias. For omitted variable bias to occur, the omitted variable Z must satisfy two conditions:
• The omitted variable is correlated with the included regressor (i.e. corr(Z, X) ≠ 0)
• The omitted variable is a determinant of the dependent variable (i.e. Z is part of u)
$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$, where $\rho_{Xu} = \operatorname{corr}(X_i, u_i)$
The formula indicates that:
• Omitted variable bias exists even when n is large; it does not disappear as the sample size grows.
• The larger the correlation between X and the error term, the larger the bias.
• The direction of the bias depends on whether X and u are positively or negatively correlated (see the simulation sketch below).
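As a quick numerical illustration (a minimal simulation sketch; the data-generating process, variable names, and coefficient values below are hypothetical, not taken from the flashcards), omitting a regressor Z that is correlated with X and affects Y leaves the slope estimate biased even with a very large sample, while including Z removes the bias:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                                   # large n: the bias does not vanish

# Hypothetical DGP: Y = 1 + 2*X + 1.5*Z + e, with Z positively correlated with X,
# so omitting Z makes X pick up part of Z's effect (upward bias here).
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)              # corr(X, Z) > 0
y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)

# "Short" regression with Z omitted: slope = cov(X, Y) / var(X)
beta1_short = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# "Long" regression with Z included: OLS on [1, X, Z]
X_long = np.column_stack([np.ones(n), x, z])
beta1_long = np.linalg.lstsq(X_long, y, rcond=None)[0][1]

print(f"beta_1 with Z omitted:  {beta1_short:.3f}")   # noticeably above 2
print(f"beta_1 with Z included: {beta1_long:.3f}")    # close to the true value 2
```

Flipping the sign of the correlation between X and Z in this sketch (e.g. x = -0.8 * z + ...) flips the direction of the bias, matching the last bullet above.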
How to overcome omitted variable bias
- Ideal controlled experiment
- Include the variable in the regression
- Use cross tabulation
Advantages of the MLRM over the SLRM:
• By adding more independent variables (control variables) we can explicitly control for other factors affecting y.
• More likely that the zero conditional mean assumption holds and thus more likely that we are able to infer causality.
• By controlling for more factors, we can explain more of the variation in y and thus obtain better predictions.
• Can incorporate more general functional forms.
Assumptions of the MLRM
- Random sampling
- Large outliers are unlikely
- Zero conditional mean
- (There is sampling variation in X) and there are no exact linear relationships among the independent variables (No perfect collinearity).
- (The model is linear in parameters)
Under these assumptions, the OLS estimators are unbiased estimators of the population parameters. In addition, there is the homoskedasticity assumption, which is necessary for OLS to be BLUE.
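The no-perfect-collinearity assumption can be illustrated with a small sketch (hypothetical data): if one regressor is an exact linear function of another, the design matrix loses rank and the OLS coefficients are not uniquely determined.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 3.0 * x1 + 2.0                        # exact linear function of x1 (and the constant)
X = np.column_stack([np.ones(n), x1, x2])  # perfectly collinear columns

print("columns:", X.shape[1], " rank:", np.linalg.matrix_rank(X))         # 3 vs 2
print("smallest singular value:", np.linalg.svd(X, compute_uv=False)[-1]) # ~0

# Dropping the redundant regressor restores a unique OLS solution.
y = 1.0 + 0.5 * x1 + rng.normal(size=n)
beta = np.linalg.lstsq(X[:, :2], y, rcond=None)[0]
print("OLS after dropping the redundant column:", beta)
```

Statistical packages typically respond to perfect collinearity by dropping one of the offending regressors or by refusing to estimate the model.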
Important properties of the OLS fitted values and residuals
The OLS fitted values and residuals have the same important properties as in the simple linear regression:
• The sample average of the residuals is zero, and so the sample average of the fitted values equals the sample average of Y (avg(Ŷ) = avg(Y)).
• The sample covariance between each independent variable and the OLS residuals is zero. Consequently, the sample covariance between the OLS fitted values and the OLS residuals is zero.
• The point (X̄₁, X̄₂, …, X̄ₖ, Ȳ) is always on the OLS regression line.
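These algebraic properties are easy to verify numerically; the sketch below uses simulated, hypothetical data and an OLS fit that includes an intercept (the properties rely on the intercept being included):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])          # intercept included
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta
resid = y - y_hat

print("mean residual:          ", resid.mean())               # ~0
print("mean(Y) - mean(Y_hat):  ", y.mean() - y_hat.mean())    # ~0
print("cov(x1, residuals):     ", np.cov(x1, resid)[0, 1])    # ~0
print("cov(fitted, residuals): ", np.cov(y_hat, resid)[0, 1]) # ~0
# The point of sample means lies on the fitted regression surface:
print("Y_bar - fit at means:   ", y.mean() - np.array([1.0, x1.mean(), x2.mean()]) @ beta)
```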
Consequences of heteroskedasticity on the quality of the OLS estimators
- Under the OLS assumptions, including homoskedasticity, the OLS estimators β̂ⱼ are the best linear unbiased estimators (BLUE) of the population parameters βⱼ.
- Under heteroskedasticity the OLS estimators remain unbiased and consistent, but they are no longer necessarily the ones with the smallest variance, and the usual homoskedasticity-only standard errors are no longer valid.
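In practice, a common response (sketched below with statsmodels on simulated, hypothetical data) is to keep the OLS point estimates but report heteroskedasticity-robust standard errors; when the error variance depends on X, the conventional and robust standard errors can differ noticeably:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2_000
x = rng.uniform(0, 10, size=n)
u = rng.normal(scale=0.5 + 0.5 * x, size=n)    # error variance grows with x
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
fit_conv = sm.OLS(y, X).fit()                  # conventional (homoskedasticity-only) SEs
fit_rob = sm.OLS(y, X).fit(cov_type="HC1")     # heteroskedasticity-robust SEs

print("conventional SE of the slope:", fit_conv.bse[1])
print("robust (HC1) SE of the slope:", fit_rob.bse[1])
```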
When two or more of the regressors are highly correlated (but not perfectly correlated), ___
When two or more of the regressors are highly correlated (but not perfectly correlated), it is hard to estimate the effect of the one variable holding the other constant.
The higher the correlation between X₁ and X₂, the higher the variance of β̂₁. Thus, when multiple regressors are imperfectly collinear, the coefficients on one or more of these regressors will be imprecisely estimated.
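A small Monte Carlo sketch of this point (hypothetical data-generating process with true coefficients set to 1): as corr(X₁, X₂) rises, the sampling standard deviation of β̂₁ rises, even though the estimator remains unbiased.

```python
import numpy as np

rng = np.random.default_rng(4)

def sd_of_beta1(rho, n=200, reps=2_000):
    """Monte Carlo standard deviation of beta_1_hat when corr(X1, X2) = rho."""
    estimates = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)  # corr(x1, x2) = rho
        y = 1.0 + x1 + x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.std(estimates)

for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"corr(X1, X2) = {rho:4.2f}  ->  sd(beta_1_hat) ≈ {sd_of_beta1(rho):.3f}")
```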
overspecification
A model that includes irrelevant variables is called an overspecified model.
The OLS estimators are inconsistent if ___
The OLS estimators are inconsistent if the error is correlated with any of the independent variables.
Why adjusted R-squared?
The adjusted R-squared is introduced in the MLRM because the ordinary R-squared never decreases (and typically increases) when another regressor is added, even if that regressor is irrelevant; the adjusted R-squared imposes a penalty for adding regressors.
Adjusted R-squared =
1 − [(n − 1) / (n − k − 1)] × (SSR / TSS), where n is the sample size and k is the number of regressors
SER in MLRM
SER = √( SSR / (n − k − 1) )
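To make the two formulas concrete, the sketch below (simulated, hypothetical data) computes R², adjusted R², and the SER from SSR and TSS, and shows what happens when an irrelevant regressor is added: R² can only rise, while adjusted R² and SER penalize the lost degree of freedom.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
x1 = rng.normal(size=n)
x_irrelevant = rng.normal(size=n)                  # has no effect on y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def fit_stats(X, y):
    """Return (R^2, adjusted R^2, SER); X must include the intercept column."""
    k = X.shape[1] - 1                             # number of slope coefficients
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (len(y) - 1) / (len(y) - k - 1) * ssr / tss
    ser = np.sqrt(ssr / (len(y) - k - 1))
    return r2, adj_r2, ser

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x_irrelevant])

print("x1 only:           R2 = %.4f  adj R2 = %.4f  SER = %.4f" % fit_stats(X_small, y))
print("x1 + irrelevant x: R2 = %.4f  adj R2 = %.4f  SER = %.4f" % fit_stats(X_big, y))
```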
Pure vs impure heteroskedasticity
- Pure heteroskedasticity is heteroskedasticity in the error term of a correctly specified equation.
- Heteroskedasticity is likely to occur in data sets in which there is a wide disparity between the largest and smallest observed values.
- Impure heteroskedasticity is heteroskedasticity caused by an error in specification, such as an omitted variable.
Dummy variable
A binary (0/1) indicator variable that equals 1 when a qualitative characteristic is present and 0 otherwise, allowing qualitative factors to enter the regression.
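A minimal sketch (hypothetical data and variable names) of constructing a dummy variable and using it as a regressor; with a single dummy and an intercept, the coefficient on the dummy equals the difference in group means of Y:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
female = (rng.uniform(size=n) < 0.5).astype(int)     # dummy: 1 if "female", 0 otherwise
wage = 20.0 - 2.0 * female + rng.normal(size=n)      # hypothetical wage equation

X = np.column_stack([np.ones(n), female])
beta = np.linalg.lstsq(X, wage, rcond=None)[0]

print("coefficient on the dummy: ", beta[1])                                        # ~ -2
print("difference in group means:", wage[female == 1].mean() - wage[female == 0].mean())
```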