Lecture 4 (Chapter 5) Flashcards
What are the five assumptions underlying the Classical Linear Regression Model? (Explain each in brief detail)
E(ut) = 0. The errors have zero mean. Since some error terms will be slightly above zero and some slightly below, they should average out to 0.
Var(ut) = σ^2. The variance of the errors is constant and finite. This means that regardless of whether X = 2 or X = 8, we expect the variance of the error terms (their spread around the regression line) to be about the same.
Cov(ui,uj) = 0 (i ≠ j). The errors are statistically independent of, i.e. uncorrelated with, one another. If they are correlated, they are said to be autocorrelated.
Cov(ut,Xt) = 0. There is no relationship between the error term and the corresponding value of the explanatory variable X.
ut ~ N(0, σ^2), i.e. the errors are normally distributed.
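To make these assumptions concrete, here is a minimal simulation sketch in Python (not from the lecture; the model y = 1 + 0.5x + u, the seed, and all numbers are illustrative) generating errors that satisfy them:

```python
import numpy as np

rng = np.random.default_rng(42)

T = 100_000
x = rng.uniform(0, 10, size=T)    # explanatory variable
u = rng.normal(0.0, 2.0, size=T)  # errors: normal, mean 0, constant variance 4
y = 1.0 + 0.5 * x + u             # data generated under the CLRM

print(round(u.mean(), 3))                          # ~0 -> E(ut) = 0
print(round(u.var(), 3))                           # ~4 -> Var(ut) = sigma^2, constant
print(round(np.corrcoef(u[:-1], u[1:])[0, 1], 3))  # ~0 -> Cov(ui, uj) = 0
print(round(np.corrcoef(u, x)[0, 1], 3))           # ~0 -> Cov(ut, Xt) = 0
```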
What happens if one or more of the CLRM assumptions are violated?
- The coefficient estimates may be wrong
- The associated standard errors may be wrong
- The distributions assumed for the test statistics may be inappropriate
Var(ut) = σ^2
Please explain the meaning of this assumption and what it means if it is violated
This assumption is known as the assumption of homoscedasticity. It means that the variance of the error terms is constant, i.e. the errors have a roughly constant spread around the regression line.
If this is not the case, the error terms are said to be heteroscedastic.
Detection of heteroscedasticity:
Goldfeld-Quandt test
GQ test:
-Split the sample into two sub-samples of lengths T1 and T2 (usually of equal length).
-Run the regression on each sub-sample and obtain the residual variances s1^2 and s2^2.
The null hypothesis is H0: σ1^2 = σ2^2
The test statistic is the ratio of the two residual variances, with the larger of the two placed in the numerator:
GQ = s1^2 / s2^2 (typically you are given the s values)
The test statistic is distributed as F(T1−k, T2−k) under the null of homoscedasticity.
(A practical problem with the test is knowing where to split the sample.)
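As a rough illustration of the mechanics, here is a minimal Python sketch (the function name and interface are my own, not from the lecture; it assumes each sub-sample's y vector and regressor matrix X are passed in, with X already containing a constant column):

```python
import numpy as np
from scipy import stats


def goldfeld_quandt(y1, X1, y2, X2, alpha=0.05):
    """GQ test from two sub-samples; each X must include a constant column."""

    def resid_var(y, X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS on the sub-sample
        resid = y - X @ beta
        T, k = X.shape
        return resid @ resid / (T - k), T - k         # s^2 and its degrees of freedom

    s2_1, df1 = resid_var(y1, X1)
    s2_2, df2 = resid_var(y2, X2)
    if s2_1 >= s2_2:                                  # larger variance in the numerator
        gq, df_num, df_den = s2_1 / s2_2, df1, df2
    else:
        gq, df_num, df_den = s2_2 / s2_1, df2, df1
    crit = stats.f.ppf(1 - alpha, df_num, df_den)     # F(T1-k, T2-k) critical value
    return gq, crit, gq > crit                        # True -> reject homoscedasticity
```

(statsmodels also ships a ready-made version of this test, het_goldfeldquandt, if you would rather not roll your own.)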
EXAMPLE GQ TEST:
You are testing for heteroscedasticity in a simple linear regression model: y = b0 + b1x + u.
Your sample consists of 20 observations. T=20
After splitting the sample into two subgroups (10 each) you are given:
s1^2 = 4
s2^2 = 2
Perform the GQ test with a 5% significance level under the null of homoscedasticity (H0: σ1^2 = σ2^2)
Test statistic:
GQ = s1^2 / s2^2 = 4/2 = 2
Degrees of freedom:
F(T1-k, T2-k)
T1 = 10, T2 = 10, k = 2 (as 2 parameters)
Therefore the test statistic is distributed as F(8, 8) under the null. The critical value, F(8, 8) at the 5% significance level, is 3.44.
Because 2 < 3.44, we fail to reject the null hypothesis of homoscedasticity at the 5% significance level.
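To double-check the arithmetic and the table value with scipy (a quick sanity-check sketch, not part of the original example):

```python
from scipy import stats

gq = 4 / 2                            # s1^2 / s2^2 = 2.0
crit = stats.f.ppf(0.95, 8, 8)        # F(8, 8) upper 5% point, ~3.44
print(gq, round(crit, 2), gq > crit)  # 2.0 3.44 False -> fail to reject H0
```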
Another method for detecting heteroscedasticity:
White’s general test
Run the regression:
yt = B1 + B2x2t + B3x3t + ut
Then run auxiliary regression:
ût^2 = a1 + a2x2t + a3x3t + a4x2t^2 + a5x3t^2 + a6x2tx3t + vt
(ût denotes the residuals from the first regression; note that it is the squared residuals that are regressed on the levels, squares, and cross-products of the regressors.)
The test statistic is R^2 from the auxiliary regression multiplied by the number of observations T.
TR^2 ~ χ^2(m) asymptotically, where m is the number of regressors in the auxiliary regression, excluding the constant term.
If the test statistic is greater than the critical value from the chi-square table, reject the null hypothesis of homoscedasticity.
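A minimal sketch of these two steps in Python (the helper name and interface are my own; resid holds the OLS residuals from the first regression, and x2, x3 are the two regressors as arrays):

```python
import numpy as np
from scipy import stats


def whites_test(resid, x2, x3, alpha=0.05):
    """White's test for the two-regressor model above."""
    T = len(resid)
    # Auxiliary regressors: constant, levels, squares, and the cross-product
    Z = np.column_stack([np.ones(T), x2, x3, x2**2, x3**2, x2 * x3])
    u2 = resid**2                               # squared residuals
    a, *_ = np.linalg.lstsq(Z, u2, rcond=None)  # auxiliary OLS regression
    r2 = 1 - np.sum((u2 - Z @ a) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    stat = T * r2                               # TR^2
    m = Z.shape[1] - 1                          # regressors excluding the constant
    crit = stats.chi2.ppf(1 - alpha, m)         # chi-squared(m) critical value
    return stat, crit, stat > crit              # True -> reject homoscedasticity
```

(statsmodels provides a ready-made het_white function that computes the same LM statistic.)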
EXAMPLE WHITE’S TEST:
You are estimating the regression model:
yt = B1 + B2x2t + B3x3t + ut, with 50 observations (T = 50)
You run the auxiliary regression: ût^2 = a1 + a2x2t + a3x3t + a4x2t^2 + a5x3t^2 + a6x2tx3t + vt.
For aux reg: R^2 = 0.25
Use White’s test to determine heteroscedasticity at 5% significance level.
Test statistic is:
T = 50
R^2 = 0.25
TR^2 = 12.5
The auxiliary regression includes 5 regressors (excluding the constant), so m = 5
Look up χ^2(5) at the 5% significance level in the chi-square table: critical value = 11.07
Because 12.5 > 11.07, we reject the null hypothesis of homoscedasticity.
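The same check in scipy (a sanity-check sketch):

```python
from scipy import stats

stat = 50 * 0.25                          # TR^2 = 12.5
crit = stats.chi2.ppf(0.95, 5)            # chi-squared(5) upper 5% point, ~11.07
print(stat, round(crit, 2), stat > crit)  # 12.5 11.07 True -> reject H0
```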
What are the consequences of heteroscedasticity
-Unconditional heteroscedasticity does not pose any serious problems for OLS regression, because it is heteroscedasticity unrelated to the explanatory variables.
-Conditional heteroscedasticity means the variance of errors depends on the values of explanatory variables. If ignored:
OLS estimates for coefficients remain accurate on average (unbiased and consistent).
However, these estimates are no longer the most precise (they don’t have the smallest possible variability, so they’re not BLUE).
In short, the estimates are still correct on average, but they're less reliable because they aren't as precise as they could be.
-Standard errors may be misestimated:
Intercept’s standard error: Too large, making it harder to detect statistical significance.
Slope’s standard error: Too low if the error variance increases with the size of an explanatory variable.
This underestimation inflates t-statistics and so increases the risk of a Type I error (falsely rejecting a true null hypothesis).
-With incorrect standard errors, hypothesis tests and confidence intervals become unreliable, potentially leading to invalid conclusions.
In summary, heteroscedasticity doesn’t bias coefficient estimates but compromises efficiency (not BLUE) and makes inferential statistics (like p-values) less reliable.
Assumption 3:
Cov(ui,uj) = 0
Please describe what this assumption means
This assumption relates to the error terms: it requires that the errors are uncorrelated with one another, hence Cov(ui,uj) = 0 for i ≠ j.
If this is not the case, the errors are said to be autocorrelated, meaning the error terms are related to one another.
What is a lagged value?
The lagged value of a variable is the value that the variable took during a previous period.
e.g. Yt-1 denotes the value of Yt lagged one period
How is the first difference of Y calculated?
∆yt = yt – yt-1
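For instance, with pandas (the series below is made up for illustration):

```python
import pandas as pd

y = pd.Series([10.0, 12.0, 11.0, 15.0])
lagged = y.shift(1)  # y lagged one period: the previous period's value (NaN in period 1)
dy = y.diff()        # first difference: y_t - y_{t-1}
print(pd.DataFrame({"y": y, "y lagged": lagged, "delta y": dy}))
```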
What is the Durbin-Watson test? Also, provide its formula.
The DW test is a test for first-order autocorrelation, i.e. it tests only for a relationship between an error term and its immediately preceding value.
ut = 𝜌ut-1 + vt, (vt being a random error term)
H0: 𝜌 = 0, H1: 𝜌 ≠ 0
Test statistic: DW = Σ(ût − ût−1)^2 / Σût^2, where the numerator is summed over t = 2,…,T and the denominator over t = 1,…,T
DW is also approximately equal to:
DW ≈ 2(1 − 𝜌), where 𝜌 is estimated by the sample first-order autocorrelation of the residuals
If DW = 2, zero autocorrelation, do not reject the null hypothesis
DW = 0, perfect positive autocorrelation, reject the null hypothesis
DW = 4, perfect negative autocorrelation, reject the null hypothesis
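A minimal sketch of the statistic in Python (the sanity-check residual series are made up for illustration):

```python
import numpy as np


def durbin_watson(resid):
    """DW = sum of squared first differences of residuals / sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid**2)


rng = np.random.default_rng(0)
print(durbin_watson(rng.normal(size=200)))       # ~2: no first-order autocorrelation
print(durbin_watson(np.ones(200)))               # 0: perfect positive autocorrelation
print(durbin_watson(np.tile([1.0, -1.0], 100)))  # ~3.98: strong negative autocorrelation
```

(statsmodels exposes the same computation as statsmodels.stats.stattools.durbin_watson.)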
Which conditions must be met in order to conduct a DW test?
Three conditions must be met in order to conduct a DW test:
-There must be a constant term in the regression
-The regressors must be non-stochastic
-There must be no lags of the dependent variable in the regression
*Remember that the DW test is only a first-order autocorrelation test; it cannot detect autocorrelation at higher lags.
Please outline the Breusch-Godfrey Test
This is a test for higher order autocorrelation:
H0: 𝜌1 = 0 and 𝜌2 = 0 and … and 𝜌r = 0
H1: at least one of 𝜌1, 𝜌2, …, 𝜌r ≠ 0
Test statistic:
(T − r)R^2 ~ χ^2(r) asymptotically
T: number of observations.
r: number of lags considered
R^2: the coefficient of determination from the auxiliary regression.
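A minimal sketch of the procedure in Python (the helper name and interface are my own; X is the original regressor matrix including a constant column, resid the OLS residuals; the auxiliary regression regresses the residuals on X plus r of their own lags, with pre-sample lag values set to zero):

```python
import numpy as np
from scipy import stats


def breusch_godfrey(resid, X, r, alpha=0.05):
    """BG test for autocorrelation up to order r."""
    resid = np.asarray(resid, dtype=float)
    T = len(resid)
    # r columns of lagged residuals; missing pre-sample values padded with zeros
    lags = np.column_stack(
        [np.concatenate([np.zeros(i), resid[:-i]]) for i in range(1, r + 1)]
    )
    Z = np.column_stack([X, lags])                 # auxiliary regressors
    a, *_ = np.linalg.lstsq(Z, resid, rcond=None)  # auxiliary OLS regression
    r2 = 1 - np.sum((resid - Z @ a) ** 2) / np.sum((resid - resid.mean()) ** 2)
    stat = (T - r) * r2                            # (T - r) R^2
    crit = stats.chi2.ppf(1 - alpha, r)            # chi-squared(r) critical value
    return stat, crit, stat > crit                 # True -> reject "no autocorrelation"
```

(statsmodels also offers acorr_breusch_godfrey, which works on a fitted OLS results object.)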
Breusch-Godfrey test example:
Suppose you are analyzing the model yt = B0 + B1x1t + ut.
You have 50 observations in your sample.
You set r = 2, to test for autocorrelation up to second order
R^2 from auxiliary regression is = 0.2
Do we have autocorrelation?
H0: 𝜌1 = 𝜌2 = 0, H1: at least one of 𝜌1, 𝜌2 ≠ 0
Test statistic:
(T-r)R^2
(50-2)*0.2 = 9.6
Look up χ^2(r) in the chi-square table:
χ^2(2) at 5% significance = 5.991
As 9.6 > 5.991, we reject the null hypothesis: there is evidence of autocorrelation.
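The same check in scipy (a sanity-check sketch):

```python
from scipy import stats

stat = (50 - 2) * 0.2                      # (T - r) R^2 = 9.6
crit = stats.chi2.ppf(0.95, 2)             # chi-squared(2) upper 5% point, ~5.991
print(stat, round(crit, 3), stat > crit)   # 9.6 5.991 True -> reject H0
```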