Lecture 4 (Chapter 5) Flashcards

1
Q

What are the five assumptions underlying the Classical Linear Regression Model? (Explain each in brief detail)

A

E(ut) = 0, Errors have zero mean. Some error terms will be slightly above zero and some slightly below, so they should average out to 0.

Var(ut) = σ^2 < ∞, The variance of the errors is constant and finite. This means that regardless of whether X=2 or X=8, we expect the spread of the error terms (distances from the regression line) to be roughly the same.

Cov(ui,uj) = 0 (i≠j), The errors are statistically independent of one another, i.e. not correlated with each other. If they are correlated, they are said to be autocorrelated.

Cov(ut,Xt) = 0, There is no relationship between the errors and the explanatory variable X.

ut is normally distributed. This assumption is needed so that the usual test statistics follow their assumed distributions for hypothesis testing.

2
Q

What happens if one or more of the CLRM assumptions are violated?

A
  • The coefficient estimates may be wrong
  • The associated standard errors may be wrong
  • The distributions that are assumed for the test statistics may be inappropriate
3
Q

Var(ut) = σ^2

Please explain the meaning of this assumption and what it means if it is violated

A

This assumption is known as the assumption of homoscedasticity. It means that the variance of the error terms is constant for all observations, i.e. the errors have a roughly constant spread around the regression line.

If this is not the case, the error terms are said to be heteroscedastic.

4
Q

Detection of heteroscedasticity:
Goldfeld-Quandt test

A

GQ test:
-Split the sample into two sub-samples of lengths T1 and T2.

Null hypothesis is, H0: σ1^2 = σ2^2

The test statistic is the ratio of the two variances with the larger of the two variances placed in the numerator

GQ = S1^2 / S2^2 (typically you are given s)

The test statistic is distributed as an F(T1-k, T2-k) under the null of homoscedasticity.

(A potential problem is knowing where to split the sample.)

5
Q

EXAMPLE GQ TEST:
You are testing for heteroscedasticity of a simple linear reg model. y=b0 + b1x + u.

Your sample consists of 20 observations. T=20

After splitting the sample into two subgroups (10 each) you are given:
s1^2 = 4
s2^2 = 2

Perform the GQ test with a 5% significance level under the null of homoscedasticity (H0: σ1^2 = σ2^2)

A

Test statistic:
GQ = S1^2 / S2^2 = 4/2 = 2

Degrees of freedom:
F(T1-k, T2-k)
T1 = 10, T2 = 10, k = 2 (as 2 parameters)

Therefore the critical value is F(8,8) at the 5% significance level = 3.44

Because 2 is lower than 3.44, we fail to reject the null hypothesis of homoscedasticity at the 5% level of significance.
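
A minimal Python sketch of this calculation (assuming scipy is available; the numbers are the ones given on this card):

from scipy.stats import f

s1_sq, s2_sq = 4.0, 2.0              # sub-sample residual variances, larger on top
T1, T2, k = 10, 10, 2                # sub-sample sizes and number of parameters

gq = s1_sq / s2_sq                   # GQ test statistic = 2.0
crit = f.ppf(0.95, T1 - k, T2 - k)   # upper 5% point of F(8, 8), about 3.44

print(gq > crit)                     # False -> fail to reject homoscedasticity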

6
Q

Other detection of heteroscedasticity method:
White’s general test

A

Run the regression:
yt = B1 + B2X2t + B3X3t + ut

Then obtain the residuals ût and run the auxiliary regression:
ût^2 = a1 + a2x2t + a3x3t + a4x2t^2 + a5x3t^2 + a6x2tx3t + vt

The test statistic is R^2 from the auxiliary regression multiplied by the number of observations T.

TR^2 ~ Chi-squared(m), where m is the number of regressors in the auxiliary regression excluding the constant term.

If the test statistic is greater than the critical value from the chi-squared table, you reject the null hypothesis of homoscedasticity.
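
A rough Python sketch of these steps, assuming statsmodels and scipy are available; the variable names (y, x2, x3) are illustrative and x2, x3 are taken to be NumPy arrays:

import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def white_test(y, x2, x3, alpha=0.05):
    # 1) Original regression: y on a constant, x2 and x3
    X = sm.add_constant(np.column_stack([x2, x3]))
    u_hat = sm.OLS(y, X).fit().resid

    # 2) Auxiliary regression: squared residuals on levels, squares and the cross-product
    Z = sm.add_constant(np.column_stack([x2, x3, x2**2, x3**2, x2 * x3]))
    aux = sm.OLS(u_hat**2, Z).fit()

    # 3) Test statistic T*R^2, compared with the chi-squared(5) critical value
    stat = len(y) * aux.rsquared
    crit = chi2.ppf(1 - alpha, 5)
    return stat, crit, stat > crit   # True in the last slot -> reject homoscedasticity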

7
Q

EXAMPLE WHITE’S TEST:
You are estimating the regression model:
yt= B1 + B2X2t + B3X3t + ut, with 50 observations (T=50)

You run the auxiliary regression: ût^2 = a1 + a2x2t + a3x3t + a4x2t^2 + a5x3t^2 + a6x2tx3t + vt.

For aux reg: R^2 = 0.25

Use White’s test to determine heteroscedasticity at 5% significance level.

A

Test statistic is:
T = 50
R^2 = 0.25
TR^2 = 12.5

The aux. reg includes 5 regressors, so m = 5

The critical value from the chi-squared table, Chi-squared(5) at the 5% level, is 11.07.

Because 12.5 > 11.07, we reject the null hypothesis of homoscedasticity.
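
As a quick check of the arithmetic on this card in Python (scipy assumed):

from scipy.stats import chi2

stat = 50 * 0.25            # T * R^2 = 12.5
crit = chi2.ppf(0.95, 5)    # chi-squared(5) at the 5% level, about 11.07

print(stat > crit)          # True -> reject the null of homoscedasticity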

8
Q

What are the consequences of heteroscedasticity?

A

-Unconditional heteroscedasticity does not pose any serious problems for OLS regression, because it is heteroscedasticity that is unrelated to the explanatory variables.

-Conditional heteroscedasticity means the variance of the errors depends on the values of the explanatory variables. If ignored:
OLS coefficient estimates remain accurate on average (unbiased and consistent).
However, these estimates are no longer the most precise (they do not have the smallest possible variance, so they are not BLUE).
In short, the estimates are still correct on average, but they are less reliable because they are not as precise as they could be.

-Standard errors may be misestimated:
Intercept's standard error: too large, making it harder to detect statistical significance.
Slope's standard error: too low if the error variance increases with the size of an explanatory variable.
This underestimation increases the risk of a Type I error (falsely rejecting the null hypothesis).

-With incorrect standard errors, hypothesis tests and confidence intervals become unreliable, potentially leading to invalid conclusions.

In summary, heteroscedasticity doesn’t bias coefficient estimates but compromises efficiency (not BLUE) and makes inferential statistics (like p-values) less reliable.

9
Q

Assumption 3:
Cov(ui,uj) = 0

Please describe what this assumption means

A

This assumption relates to the error terms: it assumes that the errors are uncorrelated with one another, hence Cov(ui,uj) = 0.

If this is not the case, the errors are said to be autocorrelated (or serially correlated), meaning that the error terms are related to one another.

10
Q

What is a lagged value?

A

The lagged value of a variable is the value that the variable took during a previous period.

e.g. Yt-1 denotes the value of Yt lagged one period

11
Q

How is the first difference of Y calculated?

A

∆yt = yt – yt-1
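
A small pandas sketch (pandas assumed) showing both a one-period lag and the first difference of a toy series:

import pandas as pd

y = pd.Series([10, 12, 15, 14])

y_lag1 = y.shift(1)   # yt-1: NaN, 10, 12, 15
dy = y.diff()         # ∆yt = yt - yt-1: NaN, 2, 3, -1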

12
Q

What is the Durbin-Watson test? Also, provide its formula.

A

The DW test is a test for first-order autocorrelation - i.e. it tests only for a relationship between an error term and its immediately previous value.

ut = 𝜌ut-1 + vt, (vt being a random error term)

H0: 𝜌 = 0, H1: 𝜌 ≠ 0

Test statistic:
DW = Σ(ût - ût-1)^2 / Σût^2, with the sums running from t = 2 to T

Also, DW is approximately equal to
DW ≈ 2(1 - 𝜌), where 𝜌 here is the sample correlation between consecutive residuals

If DW = 2, zero autocorrelation, do not reject the null hypothesis

DW = 0, perfect positive autocorrelation, reject the null hypothesis

DW = 4, perfect negative autocorrelation, reject the null hypothesis
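
A minimal sketch of computing the DW statistic in Python, assuming statsmodels is available and using simulated data with illustrative names:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 0.5 * x + rng.normal(size=100)   # simulated data with no built-in autocorrelation

res = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(res.resid)              # values near 2 suggest no first-order autocorrelation
print(dw)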

13
Q

Which conditions must be met in order to conduct a DW test?

A

Three conditions must be met for the DW test to be valid:

-There must be a constant term in the regression
-The regressors must be non-stochastic
-There must be no lags of the dependent variable in the regression

*Remember that the DW test is only a first-order autocorrelation test - it cannot test for higher orders

14
Q

Please outline the Breusch-Godfrey Test

A

This is a test for higher-order autocorrelation, up to order r:

H0: 𝜌1 = 0 and 𝜌2 = 0 and ... and 𝜌r = 0
H1: at least one of 𝜌1, 𝜌2, ..., 𝜌r ≠ 0

The auxiliary regression regresses the residuals ût on the original regressors together with r lags of the residuals (ût-1, ..., ût-r).

Test statistic:
(T-r)R^2 ~ Chi-squared(r)

T: number of observations.

r: number of lags considered.

R^2: the coefficient of determination from the auxiliary regression.
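
A sketch of running the BG test with statsmodels (assumed available) on simulated data; note that the library reports the LM form of the statistic, which can differ slightly from (T-r)R^2:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 0.3 * x + rng.normal(size=100)   # simulated data

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_pval)                             # a small p-value -> evidence of autocorrelation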

15
Q

Breusch-Godfrey test example:

Suppose you are analysing the regression model Yt = B0 + B1X1t + ut

You have 50 observations in your sample.

You choose r = 2, to test for autocorrelation up to second order.

R^2 from auxiliary regression is = 0.2

Do we have autocorrelation?

A

H0: 𝜌1 = 𝜌2 = 0   H1: at least one of 𝜌1, 𝜌2 ≠ 0

Test statistic:
(T-r)R^2
(50-2)*0.2 = 9.6

Compare with the critical value χ^2(r) from the chi-squared table

χ^2(2) at 5% significance = 5.991

As 9.6 > 5.991, we reject the null hypothesis: there is evidence of autocorrelation.
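
The same arithmetic in Python (scipy assumed), using the numbers from this card:

from scipy.stats import chi2

stat = (50 - 2) * 0.2       # (T - r) * R^2 = 9.6
crit = chi2.ppf(0.95, 2)    # chi-squared(2) at the 5% level, about 5.991

print(stat > crit)          # True -> reject H0, evidence of autocorrelation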

16
Q

What are the consequences of ignoring autocorrelation?

A
  1. Unbiased but inefficient coefficient estimates
    - This means that the OLS method will still produce unbiased estimates (meaning they are correct on average)
    - They are, however, not efficient, meaning they do not have the minimum possible variance
  2. Incorrect standard errors:
    - Autocorrelation affects the calculation of standard errors. If these are underestimated or overestimated, hypothesis testing (e.g., t-tests and F-tests) becomes unreliable.
    - (This leads to point 4)
  3. Inflated R^2 with positive autocorrelation
    - Positive autocorrelation (where errors are positively correlated with previous errors) artificially increases the R^2 making the model appear better at explaining variability in the dependent variable than it actually is
  4. Increased type I error
    - Positive autocorrelation leads to underestimated standard errors. Smaller standard errors make it more likely to incorrectly reject the null hypothesis (Type I error), concluding that an effect is significant when it is not.
  5. Type II error in case of Negative autocorrelation
    - Negative autocorrelation (where errors are negatively related to previous ones) results in overestimated standard errors. Larger standard errors make it harder to reject the null hypothesis, even if it is false, increasing the likelihood of Type II error.