Week 4 Flashcards

1
Q

Define Multicollinearity

A

High correlation between at least two independent variables.

2
Q

When a (multiple) regression model has a multicollinearity issue, what happens? (3)

A

The goal of a multiple regression model is to measure the marginal effect of each independent variable on the dependent variable under the ceteris paribus assumption (all other variables held constant).
When two independent variables are highly correlated (such as age and experience), they move together.
This means there is no opportunity to disentangle their effects: the individual effect of each variable becomes obscured.

3
Q

How do you detect a multicollinearity problem? (2)

A
  • Check correlation coefficients: use a correlation matrix for all independent variables (to detect whether the independent variables are correlated with each other)
    ○ If a correlation coefficient > 0.7 –> signals multicollinearity (more cautious threshold: 0.5)
  • Variance inflation factor (VIF)
    ○ Measures the linear association between one IV and all the other IVs.
4
Q

Describe the Variance Inflation Factor (VIF)

A

Variance inflation factor (VIF):
Measures the linear association between one IV and all the other IVs:
  ○ Quantifies the severity of multicollinearity
  ○ VIF takes a value of 1 or above (no upper limit)
  ○ The VIF value shows by what percentage the variance of each coefficient is inflated
  –> e.g. a VIF of 1.7 means the coefficient's variance is 70% larger than it would be with no multicollinearity.
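For reference, the standard formula behind the VIF (textbook econometrics, not spelled out on the card): regress IV j on all the other IVs, take the R^2 of that auxiliary regression, and compute:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

So R_j^2 = 0 gives a VIF of 1 (no multicollinearity), and the VIF grows without bound as R_j^2 approaches 1.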

5
Q

How do you detect multicollinearity with a correlation matrix?

A

○ If a correlation coefficient > 0.7 –> signals multicollinearity (more cautious threshold: 0.5)
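A minimal pandas sketch of this check (the variable names and data are hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical independent variables; names and values are illustrative.
df = pd.DataFrame({
    "age":        [23, 35, 41, 52, 29, 47],
    "experience": [1, 10, 18, 30, 5, 22],
    "education":  [16, 12, 14, 18, 16, 12],
})

# Pairwise correlation matrix of the IVs
corr = df.corr()
print(corr.round(2))

# Flag off-diagonal pairs above the 0.7 threshold
flags = corr.abs().gt(0.7) & corr.abs().lt(1.0)
print(flags)
```

With collinear variables such as age and experience, the age/experience cell of `flags` would come out True.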

6
Q

How do you detect multicollinearity with the results from the Variance Inflation Factor (VIF)? (3)

A
  1. No multicollinearity:
    • VIF = 1 –> No correlation between a given IV and the other IVs in the model.
  2. Moderate multicollinearity:
    • VIF between 1 and 5 –> Not severe, no need to pay special attention.
  3. Severe multicollinearity:
    • Cautious: VIF > 5
    • Less cautious: VIF > 10
    –> Multicollinearity is likely a problem in the regression model and can affect your estimates.
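A sketch of computing VIFs with statsmodels (the data and variable names are made up; statsmodels' `variance_inflation_factor` expects a design matrix that includes a constant):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical IVs; names and values are illustrative.
df = pd.DataFrame({
    "age":        [23, 35, 41, 52, 29, 47, 33, 58],
    "experience": [1, 10, 18, 30, 5, 22, 8, 35],
    "education":  [16, 12, 14, 18, 16, 12, 14, 10],
})

# Add the constant column that the VIF computation assumes is present
X = sm.add_constant(df)

# One VIF per IV (skip the constant itself)
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))
```

Compare each printed value against the thresholds above (1, 5, 10).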
7
Q

How can you deal with multicollinearity? (3)

A
  1. Increase the sample size - however, this may not be feasible
  2. Drop one of the variables that causes the problem
    • First estimate the model with one variable, then with the other –> important to consider how dropping the variable may impact the study.
  3. Transform the highly correlated IVs (see the sketch below)
    • Log transformation (if the collinear relationship is non-linear or exponential)
      ○ Can be checked with the VIF before and after
    • Create a composite variable combining the collinear IVs.
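A minimal sketch of both transformations (variable names are hypothetical; the standardise-and-average construction is one common way to build a composite, not the only one):

```python
import numpy as np
import pandas as pd

# Hypothetical collinear IVs; names and values are illustrative.
df = pd.DataFrame({
    "age":        [23.0, 35.0, 41.0, 52.0, 29.0, 47.0],
    "experience": [1.0, 10.0, 18.0, 30.0, 5.0, 22.0],
})

# Option 1: log-transform one IV (values must be positive);
# check the VIFs before and after the transformation.
df["log_experience"] = np.log(df["experience"])

# Option 2: composite variable: standardise the collinear IVs
# to z-scores, then average them into a single regressor.
cols = ["age", "experience"]
z = (df[cols] - df[cols].mean()) / df[cols].std()
df["seniority"] = z.mean(axis=1)
print(df)
```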
8
Q

Describe what a composite variable is:

A
  • A composite variable is one made up of two or more variables that are highly correlated, conceptually or statistically.
9
Q

OLS assumes homoscedasticity. Define homoscedasticity (2)

A
  • Variance of the error term is constant over various values of the IVs.
  • Dispersion of the error remains the same over the range of observations
10
Q

Define heteroscedasticity (3)

A
  • The error term does not have a constant variance.
  • Variance changes in response to a change in the value of the IVs.
  • Dispersion of the error changes over the range of observations.
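In symbols (standard notation, not on the cards), the contrast between the two assumptions is:

```latex
\text{Homoscedasticity: } \operatorname{Var}(\varepsilon_i \mid X) = \sigma^2 \text{ for all } i
\qquad\text{vs.}\qquad
\text{Heteroscedasticity: } \operatorname{Var}(\varepsilon_i \mid X) = \sigma_i^2
```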
11
Q

What are some issues when using OLS with heteroscedasticity? (4)

A
  • OLS assumption (of constant error variance) violated.
  • Biased standard errors
  • Unreliable t-statistics
  • Unreliable significance tests
    –> misleading conclusions about significance.
12
Q

How can we detect heteroscedasticity? (3)

A
  1. Breusch-Pagan test
  2. White test
  3. Scatterplot of residuals (for each independent variable)
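A sketch of the Breusch-Pagan test with statsmodels (simulated data, so the heteroscedasticity is built in on purpose):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulated data where the error spread grows with x (heteroscedastic)
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Breusch-Pagan regresses the squared residuals on the regressors;
# a small p-value rejects the null of homoscedasticity.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, res.model.exog)
print("Breusch-Pagan LM p-value:", lm_pvalue)
```

statsmodels also offers `het_white` for the White test, and a plain scatterplot of `res.resid` against each IV covers the third option.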
13
Q

How can you deal with heteroscedasticity? (3)

A
  • Transform the dependent variable (doesn't always work)
  • Use weighted regression
    ○ Each observation is weighted
    ○ Observations with a higher variance get a lower weight in determining the regression coefficients.
  • Use 'robust' standard errors (see the sketch below)
    ○ Adjusts the OLS standard errors for heteroscedasticity.
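A sketch of the last two remedies with statsmodels, reusing the simulated heteroscedastic data from the detection card (the 1/x^2 weights encode the assumption that the error variance grows with x):

```python
import numpy as np
import statsmodels.api as sm

# Same simulated heteroscedastic data as in the detection sketch
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x)
X = sm.add_constant(x)

# Robust (heteroscedasticity-consistent) standard errors:
# the coefficients are unchanged, only the standard errors are adjusted.
robust = sm.OLS(y, X).fit(cov_type="HC1")
print(robust.bse)

# Weighted least squares: observations assumed to have higher error
# variance get a lower weight in determining the coefficients.
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.params)
```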
14
Q

What is Reverse Causality?

A

When pursuing a study, we usually assume that changes in the dependent variable are caused by changes in the independent variable(s). However, reverse causality occurs when the dependent variable also causes a change in the independent variable.
(This is a form of endogeneity.)

15
Q

How can you account for Reverse Causality? (5)

A
  • Have a model that is well-grounded in theory
  • Explain the mechanism with strong reasoning behind the “how” and “why”
  • Acknowledge possible endogeneity issues
  • Use lagged independent variables (see the sketch below)
  • Use advanced econometric techniques to mitigate endogeneity.
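A minimal pandas sketch of the lagged-IV idea (the firm-year panel and variable names are hypothetical):

```python
import pandas as pd

# Hypothetical firm-year panel; names and values are illustrative.
df = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B", "B"],
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "rnd_spend": [10, 12, 15, 7, 8, 11],
})

# Lag the IV by one period within each firm, so this year's DV is
# explained by last year's regressor: this year's DV cannot affect
# last year's value of the regressor.
df = df.sort_values(["firm", "year"])
df["rnd_spend_lag1"] = df.groupby("firm")["rnd_spend"].shift(1)
print(df)
```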
16
Q

Define omitted variable bias (3)

A

This is where a relevant variable that influences both the dependent variable (1) and one or more independent variables (2) is left out of the model (3).
This can create misleading results, as the omitted variable's effect gets "mixed into" the effects of the included variables, distorting the estimated influence of the variables that are included.
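The standard omitted variable bias formula (textbook econometrics, not stated on the card) makes this "mixing in" precise. If the true model is y = β0 + β1·x + β2·z + ε and z is omitted, then:

```latex
E[\hat{\beta}_1] = \beta_1 + \beta_2\,\delta,
\qquad \text{where } \delta \text{ is the slope from regressing } z \text{ on } x.
```

The estimate is biased whenever z matters (β2 ≠ 0) and z is correlated with x (δ ≠ 0), which is exactly conditions (1) and (2) on the card.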

17
Q

How can you deal with the Omitted Variable Bias? (2)

A
  • Avoid simple regression models (i.e. models with only one independent variable)
  • Include the variables that are most likely to be theoretically important in explaining the DV.
18
Q

What does the value of the ‘mean’ represent in the case of dummy variables?

A

This represents the % of cases where the dummy variable takes the value 1. For example, if 40 of 200 observations are coded 1, the mean of the dummy is 0.20 (20%).

19
Q

When correcting for heteroscedasticity, what happens to the results? (3)

A
  • Robust standard errors adjust the standard errors for heteroscedasticity.
    –> Standard errors and t-values will differ, so significance might change.
  • The regression coefficients and R^2 will remain the same.
  • The F-statistic will also differ, because it is a test of the overall significance of the model and is computed from the (now adjusted) standard errors.
20
Q

Define Causal Inference and challenges associated with not having a time-lag

A

This is the assumption that the independent variable affects the dependent variable.
Without a time lag, there is a possibility of reverse causality: the dependent variable may instead influence the independent variable.