coochie balls (week 9-13) Flashcards

1
Q

What is heteroskedasticity?

A

Assumption MLR.5 is the homoskedasticity assumption: the error variance is the same for all values of the explanatory variables. If MLR.5 fails, we have heteroskedasticity, where the error variance differs across observations.

2
Q

With heteroskedastic errors, the OLS coefficients are still unbiased and consistent, true or false and why?

A

True because we only need MLR.1-4 to establish unbiasedness and consistency.

3
Q

Why should we not use the usual formulas for OLS standard errors when there is heteroskedasticity?

A

Because the usual formulas rely on homoskedasticity. Under heteroskedasticity they give biased estimates of the standard errors.

4
Q
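Biased standard errors invalidate…
A) t-statistics
B) F-statistics
C) Confidence intervals
D) A and B
E) All of the above
A

E) All of the above. Confidence intervals are built from the same standard errors as the t- and F-statistics, so they are invalid as well.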

5
Q

Which of the following is true about heteroskedasticity-robust inferences after OLS estimation?

A) All formulas are only valid in large samples
B) Using these formulas, the usual t-test is valid asymptotically
C) All of the above
D) None of the above

A

C) All of the above

6
Q

What is true about heteroskedasticity-robust standard errors compared to the usual OLS standard errors?

A

They may be smaller or larger than OLS standard errors, but the differences are often small in practice
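
A minimal sketch of this comparison, assuming simulated data where the error spread grows with x. The function name and the choice of the HC0 (White) sandwich formula are illustrative assumptions, not something the deck specifies.

```python
import numpy as np

def ols_with_standard_errors(X, y):
    """OLS coefficients with usual and HC0 (White) robust standard errors.

    X must already contain a column of ones for the intercept.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Usual formula: sigma^2 (X'X)^-1, valid only under homoskedasticity.
    sigma2 = resid @ resid / (n - p)
    se_usual = np.sqrt(np.diag(sigma2 * XtX_inv))

    # HC0 sandwich: (X'X)^-1 X' diag(u_hat^2) X (X'X)^-1.
    meat = X.T @ (X * resid[:, None] ** 2)
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_usual, se_robust

# Simulated data (an assumption): the error standard deviation equals x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * x
X = np.column_stack([np.ones(n), x])

beta, se_usual, se_robust = ols_with_standard_errors(X, y)
print("coefficients:", beta)
print("usual SE:    ", se_usual)
print("robust SE:   ", se_robust)
```

The coefficients are the same under both approaches; only the standard errors, and therefore the t-statistics and confidence intervals, change.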

7
Q

If heteroskedasticity is present, OLS is no longer BLUE

True
False

A

True

8
Q

With small samples, if MLR.5 holds, what does that mean for the t-statistics in terms of their distribution as opposed to if MLR.5 doesn’t hold?

A

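If MLR.5 holds (together with the other classical assumptions), the usual t-statistics have exact t-distributions. If MLR.5 does not hold, the usual t-statistics no longer follow a t-distribution, and the heteroskedasticity-robust t-statistics are only valid in large samples.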

9
Q

What are the 4 ways by which we can detect heteroskedasticity?

A
  1. Use the economic nature of the topic, and prior research to understand what may be expected in the data
  2. Plot the residuals
  3. Breusch-Pagan test
  4. White test
10
Q

How can we plot residuals to detect heteroskedasticity?

A
  1. Plot the residuals û_i or the squared residuals û_i² against the fitted values to see if û_i² is related to the mean value of y
  2. Plot û_i or û_i² against each explanatory variable x_j to see which x’s are related to the residuals
11
Q

What do the null and alternative hypotheses look like to test for heteroskedasticity?

A

H0: the errors are homoskedastic (constant error variance)
Ha: the errors are heteroskedastic (error variance depends on the x’s)

12
Q

What is the Breusch-Pagan test?

A

Regress the squared OLS residuals on all the x’s (the auxiliary regression), then use the R^2 from that regression to compute an F or LM statistic for the joint significance of the x’s.
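
A minimal sketch of the Breusch-Pagan test in NumPy, assuming simulated data. The helper names `ols_fit` and `breusch_pagan` are hypothetical. The LM statistic is n times the R^2 of the auxiliary regression, and 3.84 is the 5% critical value of the chi-square distribution with 1 degree of freedom.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS residuals and R^2; X must include a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return resid, r2

def breusch_pagan(X, y):
    """F and LM statistics from regressing squared residuals on the x's."""
    n, p = X.shape
    k = p - 1                              # number of x's, excluding the constant
    resid, _ = ols_fit(X, y)
    _, r2_aux = ols_fit(X, resid ** 2)     # auxiliary regression
    f_stat = (r2_aux / k) / ((1.0 - r2_aux) / (n - k - 1))
    lm_stat = n * r2_aux                   # approx. chi-square with k df under H0
    return r2_aux, f_stat, lm_stat

# Simulated data (an assumption): one regressor, same shocks in both cases.
rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(1.0, 5.0, n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(0.0, 1.0, n)

y_het = 1.0 + 2.0 * x + e * x   # error spread grows with x
y_hom = 1.0 + 2.0 * x + e       # constant error variance

r2_het, f_het, lm_het = breusch_pagan(X, y_het)
r2_hom, f_hom, lm_hom = breusch_pagan(X, y_hom)
print("heteroskedastic: R2 = %.3f, F = %.1f, LM = %.1f" % (r2_het, f_het, lm_het))
print("homoskedastic:   R2 = %.3f, F = %.1f, LM = %.1f" % (r2_hom, f_hom, lm_hom))
```

With one regressor, homoskedasticity is rejected at the 5% level when LM exceeds 3.84.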

13
Q

In a Breusch-Pagan test, what is the significance of having a high R^2?

A

A larger R^2 in the auxiliary regression gives a larger test statistic, which means stronger evidence against the null hypothesis of homoskedasticity.

14
Q

What is an alternative to the F-statistic for the Breusch-Pagan test?

A

The Lagrange Multiplier (LM) statistic, LM = n × R^2, where R^2 comes from the auxiliary regression

15
Q

What is a limitation of the Breusch-Pagan test?

A

It will only detect linear forms of heteroskedasticity

16
Q

What is the White test?

A

The White test allows for non-linear forms of heteroskedasticity by adding the squares and cross-products of all the x’s to the auxiliary regression and testing all the regressors for joint significance.

17
Q

What are the weaknesses of the White test?

A
  1. In a model with 6 x variables, the White regression could have 27 regressors
  2. Large number of regressors uses up degrees of freedom.
  3. Difficult to carry out with smaller n
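
The count of 27 comes from 6 levels, 6 squares and 15 distinct cross-products. A short sketch that reproduces it for any number of x variables (the function name is a hypothetical helper):

```python
from itertools import combinations_with_replacement

def white_regressor_count(k):
    """Number of regressors in the full White auxiliary regression.

    k levels, plus k squares and k*(k-1)/2 distinct cross-products.
    """
    levels = k
    # Pairs (i, j) with i <= j: i == j gives a square, i < j a cross-product.
    squares_and_cross = len(list(combinations_with_replacement(range(k), 2)))
    return levels + squares_and_cross

for k in (2, 3, 6):
    print(k, white_regressor_count(k))
# 2 -> 5, 3 -> 9, 6 -> 27
```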
18
Q

What is the modified White test?

A

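A version of the White test that uses fewer degrees of freedom: the squared residuals are regressed on the fitted values and the squared fitted values in an auxiliary regression, and that regression is used to test for heteroskedasticity.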

19
Q

What are the steps in conducting a modified White test?

A
  1. Estimate the original regression by OLS and obtain the residuals û and fitted values ŷ
  2. Use the squared residuals from step 1 as the dependent variable in an auxiliary regression on ŷ and ŷ²
  3. Obtain the R^2 from this auxiliary regression
  4. Compute the F or LM statistic and test the null hypothesis of homoskedasticity
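
The four steps can be sketched as follows, assuming simulated data with two regressors. The function names are hypothetical, and the auxiliary regression uses the fitted values and their squares, so there are 2 restrictions. 5.99 is the 5% critical value of the chi-square distribution with 2 degrees of freedom.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def modified_white(X, y):
    n = len(y)
    # Step 1: original regression, residuals and fitted values.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    # Step 2: auxiliary regression of u_hat^2 on y_hat and y_hat^2.
    Z = np.column_stack([np.ones(n), fitted, fitted ** 2])
    # Step 3: R^2 of the auxiliary regression.
    r2 = r_squared(Z, resid ** 2)
    # Step 4: F and LM statistics with 2 restrictions.
    f_stat = (r2 / 2) / ((1.0 - r2) / (n - 3))
    lm_stat = n * r2
    return r2, f_stat, lm_stat

# Simulated data (an assumption): error spread grows with x1.
rng = np.random.default_rng(7)
n = 1000
x1 = rng.uniform(1.0, 5.0, n)
x2 = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0.0, 1.0, n) * x1
X = np.column_stack([np.ones(n), x1, x2])

r2_aux, f_stat, lm_stat = modified_white(X, y)
print("R2 = %.3f, F = %.1f, LM = %.1f" % (r2_aux, f_stat, lm_stat))
```

However many x variables the original model has, this auxiliary regression always has only two regressors plus the constant.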
20
Q

How do you resolve issues of heteroskedasticity?

A
  1. Estimate the model by OLS and calculate robust standard errors
  2. Use an alternative estimator like GLS
21
Q

What is wrong with estimating the model using OLS to calculate robust standard errors to resolve heteroskedasticity?

A

OLS is no longer efficient (it is not BLUE), but it is still unbiased and consistent.

22
Q

What is Generalised Least Squares (GLS)?

A

GLS is the BLUE estimator in the presence of heteroskedasticity of a known form. When it is used to correct for heteroskedasticity, it is also known as weighted least squares (WLS).
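
A minimal simulation sketch of the efficiency gain, assuming the variance function is known to be h(x) = x^2. The helper names and the simulated design are assumptions for illustration. WLS is implemented as OLS on data divided by the square root of h(x).

```python
import numpy as np

def fit(X, y):
    """OLS coefficients; X must include a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def wls(X, y, h):
    """WLS assuming Var(u|x) = sigma^2 * h: OLS after dividing by sqrt(h)."""
    w = 1.0 / np.sqrt(h)
    return fit(X * w[:, None], y * w)

rng = np.random.default_rng(2)
n, reps = 200, 1000
ols_slopes = np.empty(reps)
wls_slopes = np.empty(reps)
for r in range(reps):
    x = rng.uniform(1.0, 5.0, n)
    u = rng.normal(0.0, 1.0, n) * x      # Var(u|x) = x^2, so h(x) = x^2
    y = 1.0 + 2.0 * x + u
    X = np.column_stack([np.ones(n), x])
    ols_slopes[r] = fit(X, y)[1]
    wls_slopes[r] = wls(X, y, h=x ** 2)[1]

print("mean OLS slope:", ols_slopes.mean(), " sd:", ols_slopes.std())
print("mean WLS slope:", wls_slopes.mean(), " sd:", wls_slopes.std())
```

Both estimators are centred on the true slope of 2, but the WLS estimates vary less across replications.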

23
Q

Does the misspecification of h(x) cause the WLS estimator to be biased or inconsistent?

A

No. WLS is still unbiased and consistent under MLR.4, but the usual WLS standard errors will be invalid.

24
Q

Usual WLS standard errors and test statistics are no longer valid if there is a misspecification of h(x), so it’s better to use robust standard errors

True
False

A

True

25
Q

What does endogeneity violate?

A

MLR.4 - Zero Conditional Mean

26
Q

Given omitted variable bias (or unobserved heterogeneity), x and u will be correlated, so OLS is biased and inconsistent

True
False

A

True

27
Q

What is an instrumental variable?

A

A variable z that is correlated with the endogenous explanatory variable x but uncorrelated with the error term u. It is used to estimate the effect of x consistently when there is endogeneity.

28
Q

What is the IV approach?

A
  1. Leaves the unobserved variable in the error term
  2. Estimates the model parameters consistently using some additional information: a new variable, the instrument
  3. Uses the component of the explanatory variable of interest (x) that is uncorrelated with the error term (u)
29
Q

What are the requirements for an IV?

A

The IV has to be highly correlated with the endogenous variable → instrument relevance
The IV has to be uncorrelated with u (i.e. it is exogenous) → instrument validity

30
Q

OLS will be consistent ONLY if exogeneity holds

True
False

A

True

31
Q

IV estimators can be consistent and asymptotically normal

True
False

A

True

32
Q

IV estimators can have substantial bias in small samples, so we prefer large samples

True
False

A

True

33
Q

How do you test IV condition 1 (correlation between the IV and the endogenous variable)?

A
  1. Run the regression with the endogenous explanatory variable as the dependent variable: x = π0 + π1z + v, where z is the IV
  2. Test H0: π1 = 0 against Ha: π1 ≠ 0
  3. We want π1 to be large and highly significant for instrument relevance
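
A minimal sketch of this first-stage check, assuming simulated data and a hypothetical helper `first_stage`. It regresses x on a constant and z and reports the slope on z with its usual t-statistic.

```python
import numpy as np

def first_stage(z, x):
    """Regress x on a constant and z; return the slope, its SE and t-stat."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    se = np.sqrt(cov[1, 1])
    return coef[1], se, coef[1] / se

# Simulated data (an assumption): z shifts x with a true slope of 0.8.
rng = np.random.default_rng(8)
n = 500
z = rng.normal(0.0, 1.0, n)
x = 0.5 + 0.8 * z + rng.normal(0.0, 1.0, n)

slope_z, se_z, t_z = first_stage(z, x)
print("slope on z = %.3f, SE = %.3f, t = %.1f" % (slope_z, se_z, t_z))
```

A t-statistic far above the usual critical values is evidence of instrument relevance. Instrument validity (no correlation with u) cannot be checked this way.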
34
Q

What happens if the IV is invalid or weak?

A

The IV estimator may be even more inconsistent (have a larger asymptotic bias) than OLS

35
Q

If the R^2 from regressing x on z is small, so that z is a weak instrument, what does this mean for the correlation between x and z and for the IV standard errors, and why?

A

x and z are only weakly correlated, and this leads to large IV standard errors, because the variance of the IV estimator is inversely proportional to the R^2 from regressing x on z

36
Q

What is a structural model?

A

A model whose equation is meant to describe a causal relationship. In it, the y variables are endogenous and the z variables are exogenous.

37
Q

What test do you use for endogeneity?

A

Hausman test

38
Q

With a measurement error in an explanatory variable, what are the consequences?

A
  1. OLS is biased and inconsistent because the mismeasured variable is endogenous
  2. Attenuation bias, meaning the magnitude of the effect will be attenuated towards zero
  3. Effect of the other explanatory variables will be biased
39
Q

If a second measurement of the mismeasured variable is available, this can be used as an IV for the mismeasured variable.

True
False

A

True
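
A minimal simulation sketch of the two measurement-error cards above, under assumed values: a true slope of 2, and measurement errors with the same variance as the true regressor, so that OLS on the mismeasured variable should be attenuated to about half the true slope. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
beta = 2.0

x_true = rng.normal(0.0, 1.0, n)
y = 1.0 + beta * x_true + rng.normal(0.0, 1.0, n)

# Two independent noisy measurements of the same underlying variable.
x_obs = x_true + rng.normal(0.0, 1.0, n)
x_obs2 = x_true + rng.normal(0.0, 1.0, n)

def slope(x, y):
    """Simple-regression OLS slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_true = slope(x_true, y)    # infeasible benchmark
slope_obs = slope(x_obs, y)      # attenuated towards zero

# IV estimate using the second measurement as the instrument.
slope_iv = np.cov(x_obs2, y)[0, 1] / np.cov(x_obs2, x_obs)[0, 1]

print("OLS on true x:        %.3f" % slope_true)
print("OLS on mismeasured x: %.3f" % slope_obs)
print("IV with 2nd measure:  %.3f" % slope_iv)
```

The second measurement works as an instrument here because its measurement error is independent of both the first measurement error and the regression error.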

40
Q

What is 2SLS?

A
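  1. First stage: regress the endogenous explanatory variable on all the exogenous variables (the reduced form) by OLS and form its fitted values
  2. Second stage: substitute the fitted values for the endogenous explanatory variable in the structural model and estimate it by OLS to get the IV coefficient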
41
Q

If there is one endogenous variable and one instrument then…

A

2SLS = IV
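
A minimal sketch of this equivalence on simulated data, assuming one endogenous regressor x, one instrument z, and an unobserved variable that enters both x and the error term. All names and parameter values are assumptions for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients; X must include a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)                       # instrument
a = rng.normal(size=n)                       # unobserved, left in the error term
x = 0.7 * z + a + rng.normal(size=n)         # endogenous regressor
y = 1.0 + 2.0 * x + a + rng.normal(size=n)   # true slope is 2

const = np.ones(n)

# OLS is inconsistent because x is correlated with the error (through a).
b_ols = ols(np.column_stack([const, x]), y)[1]

# Simple IV estimator: cov(z, y) / cov(z, x).
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# 2SLS: first stage fitted values, then OLS of y on those fitted values.
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)
b_2sls = ols(np.column_stack([const, x_hat]), y)[1]

print("OLS:  %.4f" % b_ols)
print("IV:   %.4f" % b_iv)
print("2SLS: %.4f" % b_2sls)
```

The IV and 2SLS slopes agree up to floating-point error. Note that the standard errors reported by a manual second-stage OLS are not the correct 2SLS standard errors.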

42
Q

Why does 2SLS work?

A
  1. All variables in the second-stage regression are exogenous because the endogenous explanatory variable (the dependent variable in the reduced form regression) was replaced by a prediction based only on exogenous information
  2. By using that prediction, the explanatory variable is rid of its endogenous part
43
Q

2SLS/IV is typically much less precise because there is more multicollinearity and less explanatory variation in the second stage regression

True
False

A

True

44
Q

Which of the following are true:

A) After 2SLS, it is not possible to test for heteroskedasticity
B) After 2SLS, you can test and correct for heteroskedasticity
C) 2SLS works with time series, pooled cross sections and panel data
D) B and C are correct
E) None of the above are correct

A

D) B and C are correct

45
Q

What is pooled cross-sectional data? And what are they used for?

A
  1. Two or more cross sections are combined in one data set
  2. Cross sections are drawn independently of each other
  3. Often, they are used to evaluate policy changes
46
Q

What is panel data and what is it used for?

A
  1. The same cross-sectional units are followed over time
  2. Panel data have both cross-sectional and time series dimensions
  3. Panel data is used to account for time-invariant unobservables and to model lagged responses
47
Q

What are the advantages of using pooled cross sectional data?

A
  1. Increases sample size
  2. Improves the accuracy of estimators and the performance of tests

48
Q

What are the disadvantages of using pooled cross sectional data?

A
  1. Need to use a slightly more complicated model to allow for the possibility that the population distribution may have changed over time
  2. Error variance may change over time
49
Q

If there is heteroskedasticity in the error term in pooled cross sectional data, what does this mean for error variance?

A

Error variance may change over time even if it does not change with the other explanatory variables

50
Q

What is Difference in Differences (DID)?

A

Quasi-experimental approach that compares the changes in outcomes over time between a population enrolled in a program (the treatment group) and a population that is not (the comparison group). This is done to estimate the treatment effect.

51
Q

What does the process of DID with pooled cross sections look like?

A
  1. Allocate the control group C and treatment group T
  2. Create a dummy for GroupT=1 (=0 for control group)
  3. Create a dummy for the ‘before’ period 1 and ‘after’ period 2, such that D2=1 for period 2 (=0 for period 1)
  4. The impact of the policy change on the outcome variable y is then given by Delta1 in:

y = B0 + B1*GroupT + Delta0*D2 + Delta1*(GroupT*D2) + u

52
Q

How is the impact of the policy determined in terms of the averages of control and treatment group?

A

The impact of the policy is equal to the difference in the average outcome for the T and C groups in the after period, after subtracting the difference in average outcomes for the two groups prior to the policy change
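
A minimal sketch on simulated data showing that the OLS coefficient on the GroupT*D2 interaction equals the difference-in-differences of the four group means. The data-generating values (including a true effect of 2) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
group_t = rng.integers(0, 2, n)   # 1 = treatment group, 0 = control group
d2 = rng.integers(0, 2, n)        # 1 = after period, 0 = before period
y = (10.0 + 1.5 * group_t + 0.5 * d2
     + 2.0 * group_t * d2 + rng.normal(size=n))

# Regression version: coefficient on the interaction term.
X = np.column_stack([np.ones(n), group_t, d2, group_t * d2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta1_ols = coef[3]

# Means version: (T - C) after, minus (T - C) before.
def cell_mean(g, t):
    return y[(group_t == g) & (d2 == t)].mean()

delta1_means = ((cell_mean(1, 1) - cell_mean(0, 1))
                - (cell_mean(1, 0) - cell_mean(0, 0)))

print("interaction coefficient: %.4f" % delta1_ols)
print("difference in means:     %.4f" % delta1_means)
```

The two numbers coincide because the regression with both dummies and their interaction fits each of the four group-period means exactly.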

53
Q

In a DID for pooled cross sectional data analysis, what is the impact of adding more explanatory variables to the regression?

A

Allows for the populations sampled to differ over the two time periods. Interpretation of Delta1 is the same, but the simple expression for the DID in means no longer applies.

54
Q

What happens if omitted variables are constant over time?

A

They form the time-constant part of a composite error term, alongside the part of the error that varies over time.

55
Q

How does panel data help fix endogeneity issues?

A

When the endogeneity is due to unobservable time-invariant individual-specific effects, differencing the data across time periods removes them: the unobservable time-constant heterogeneity is ‘differenced’ away.
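
A minimal two-period simulation sketch of first differencing, assuming an unobserved time-constant effect `a` that is correlated with x in both periods. All names and parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
a = rng.normal(size=n)                          # time-constant unobserved effect

# x is correlated with a in both periods, so pooled OLS is inconsistent.
x1 = a + rng.normal(size=n)
x2 = a + rng.normal(size=n)
y1 = 1.0 + 2.0 * x1 + a + rng.normal(size=n)    # period 1
y2 = 1.5 + 2.0 * x2 + a + rng.normal(size=n)    # period 2, true slope still 2

def slope(x, y):
    """Simple-regression OLS slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Pooled OLS ignores a, which stays in the error term.
slope_pooled = slope(np.concatenate([x1, x2]), np.concatenate([y1, y2]))

# First differences: a drops out of y2 - y1.
slope_fd = slope(x2 - x1, y2 - y1)

print("pooled OLS slope:       %.3f" % slope_pooled)
print("first-difference slope: %.3f" % slope_fd)
```

Differencing only removes unobservables that are constant over time. Anything that varies over time and is correlated with x remains a problem.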