Lecture 5 (Chapter 5.6-) Flashcards
What does the assumption:
Cov(ut, xt) = 0 mean?
This assumption means that the error term (ut) and the explanatory variables (xt) are uncorrelated.
In essence, the variable xt used to predict yt should not be related to any of the factors captured by the error term ut.
This ensures that the estimated relationship between xt and yt is not contaminated by outside factors.
What is endogeneity?
This means xt is not completely "external" or exogenous; instead, it is influenced by factors that also affect the dependent variable yt, so that Cov(ut, xt) ≠ 0.
For instance:
If xt and ut are positively correlated, the regression model will overstate the effect of xt on yt.
And vice versa: if xt and ut are negatively correlated, the effect will be understated.
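A quick simulation makes this bias visible. The sketch below is illustrative only: the common factor z, the true coefficient of 2.0, and all other numbers are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 10_000
true_beta = 2.0

# Both x and u load on the common factor z, so Cov(u, x) > 0
z = rng.normal(size=T)
x = z + rng.normal(size=T)
u = z + rng.normal(size=T)
y = 1.0 + true_beta * x + u

# Simple OLS slope: Cov(x, y) / Var(x)
beta_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"true beta = {true_beta}, OLS estimate = {beta_hat:.3f}")
# Prints an estimate near 2.5: the positive Cov(u, x) makes OLS
# overstate the effect of x on y.
```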
What is multicollinearity?
This is when the explanatory variables are highly correlated with each other
What if variables are orthogonal?
Explanatory variables are orthogonal if there is no relationship between them.
This means that adding or removing a variable from the regression equation would not cause the estimated coefficients on the other variables to change, as the sketch below illustrates.
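A minimal numeric sketch of this property (all data simulated; names hypothetical): with orthogonal regressors, dropping one barely moves the coefficient on the other.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50_000
x1 = rng.normal(size=T)          # independent draws, so x1 and x2
x2 = rng.normal(size=T)          # are (approximately) orthogonal
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=T)

X_full = np.column_stack([np.ones(T), x1, x2])
X_drop = np.column_stack([np.ones(T), x1])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
b_drop, *_ = np.linalg.lstsq(X_drop, y, rcond=None)
print(f"coef on x1 with x2 included: {b_full[1]:.4f}")
print(f"coef on x1 with x2 dropped:  {b_drop[1]:.4f}")
# Both estimates are ~0.5: removing x2 barely moves the coefficient on x1.
```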
What are the problems if ‘near-multicollinearity’ is present but ignored?
R^2 will be high, but individual coefficients will have high standard errors
- This is because the overall model might explain a lot of the variation in the dependent variable, but the estimated effects of individual predictors will be unreliable because their standard errors are inflated.
Regression becomes very sensitive to small changes in the specification
- If we add or remove a variable or slightly change the dataset, we may drastically change the estimated coefficients, making the model unstable or difficult to interpret.
Confidence intervals for parameters will be very wide and significance tests might give inappropriate conclusions
- Wide confidence intervals indicate high uncertainty about the true values of the coefficients, and as a result, statistical significance tests may incorrectly label some predictors as insignificant.
It makes it difficult to draw concrete inferences:
- Due to the inflated standard errors and instability of the coefficients, it’s hard to determine which parameters truly influence the dependent variable
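These symptoms are easy to reproduce. Below is a hedged sketch on simulated data (the noise scale of 0.05 is an arbitrary choice used to force near-collinearity): the coefficient standard errors inflate while the overall fit barely changes.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200

def ols(X, y):
    """Return OLS coefficients, standard errors, and R^2."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, r2

x1 = rng.normal(size=T)
for noise, label in [(1.0, "nearly orthogonal"), (0.05, "near-collinear")]:
    x2 = x1 + noise * rng.normal(size=T)   # smaller noise -> x2 tracks x1
    y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=T)
    X = np.column_stack([np.ones(T), x1, x2])
    beta, se, r2 = ols(X, y)
    print(f"{label}: b1 = {beta[1]: .3f} (se {se[1]:.3f}), R^2 = {r2:.3f}")
# In the near-collinear case the standard errors blow up,
# even though R^2 barely changes.
```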
What are some possible solutions to multicollinearity?
- Ignore the problem, as OLS will still be BLUE
- Drop one of the collinear variables
- Transform the highly correlated variables into a ratio
- Use a longer run of data or a higher frequency of data
- Use panel data
What is the meaning of the assumption:
ut ∼ N(0, σ^2)?
This means that the error terms of the regression are normally distributed, with zero mean and constant variance σ^2.
What is the Bera-Jarque test and how is it conducted?
The Bera-Jarque test checks whether the disturbances are normally distributed. It is conducted by computing the skewness and kurtosis of the residuals and comparing the resulting test statistic against a chi-square(2) distribution.
Please provide the formula for the BJ test and explain the elements
Bera-Jarque tests for skewness and kurtosis amongst the error terms.
Skewness:
b1 = E[u^3] / (σ^2)^(3/2)
Kurtosis:
b2 = E[u^4] / (σ^2)^2
The test statistic is given by:
W = T * [ (b1^2)/6 + ((b2 - 3)^2)/24 ], where T = sample size
The test statistic follows chi-square(2) under H0: the distribution is symmetric and mesokurtic.
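Below is a sketch of computing W directly from these formulas; the residual series is simulated (fat-tailed t-distributed draws) purely to exercise the test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
u = rng.standard_t(df=4, size=252)   # fat-tailed "residuals" for illustration
u = u - u.mean()                     # OLS residuals have zero mean by construction

T = len(u)
sigma2 = np.mean(u**2)
b1 = np.mean(u**3) / sigma2**1.5     # coefficient of skewness
b2 = np.mean(u**4) / sigma2**2       # coefficient of kurtosis
W = T * (b1**2 / 6 + (b2 - 3)**2 / 24)

p_value = stats.chi2.sf(W, df=2)     # W ~ chi-square(2) under H0
print(f"skew = {b1:.3f}, kurt = {b2:.3f}, W = {W:.2f}, p = {p_value:.4f}")
# scipy.stats.jarque_bera(u) computes the same statistic.
```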
You are given the following BJ test results, please describe what you see:
Observations: 252
Skewness: -2.38462
Kurtosis: 11.2452
Jarque-Bera: 1004.518
Probability: 0.00000
The BJ test is conducted under the null hypothesis:
H0:
Skewness = 0
Kurtosis = 3
We see from these results that the distribution of the disturbances is negatively skewed due to the negative skew value and leptokurtic as the kurtosis exceeds 3.
In order NOT to reject the null hypothesis, the results would have to be insignificant, with a p-value exceeding 5%. Given the p-value of 0.0000, we reject the null hypothesis of normality.
Because the residuals are so heavily skewed and leptokurtic, there is a chance that our inferences could be wrong; however, with a large sample of 252 observations this is unlikely, as the central limit theorem means the usual test statistics remain approximately valid.
What should we do if the assumption around normally distributed error terms is violated?
A good alternative is to use dummy variables to remove the observations causing the non-normality, which are often a small number of extreme outliers.
How can we test whether the CLRM is linear?
We use the RESET test
What is the RESET test?
The RESET test is used to test whether the CLRM is in fact linear. We conduct it by estimating the original regression, obtaining the fitted values, and then running an auxiliary regression of the residuals on the original regressors plus higher powers of the fitted values (ŷ^2, ŷ^3, ..., ŷ^p).
We obtain R^2 from the auxiliary regression and use it to form the test statistic:
TR^2, where T = sample size
We compare this test statistic against chi-square(p-1), where p is the highest power of the fitted values included in the auxiliary regression.
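A hand-rolled sketch of this procedure (the function name and the simulated data are my own; this follows the TR^2 form described above rather than any particular library's implementation):

```python
import numpy as np
from scipy import stats

def reset_test(X, y, p=3):
    """RESET: regress residuals on X plus powers 2..p of the fitted values;
    under a correct linear form, TR^2 from that regression is chi2(p-1)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    Z = np.column_stack([X] + [fitted**i for i in range(2, p + 1)])
    gamma, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    aux_resid = resid - Z @ gamma
    r2 = 1 - aux_resid @ aux_resid / (resid @ resid)  # residuals have zero mean
    stat = len(y) * r2
    return stat, stats.chi2.sf(stat, df=p - 1)

# Simulated example: the true relationship is quadratic but we fit a line
rng = np.random.default_rng(3)
T = 100
x = rng.normal(size=T)
y = 1 + x + 0.5 * x**2 + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
print(reset_test(X, y))   # large TR^2, tiny p-value -> reject linearity
```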
Suppose you are modeling house prices (yt) based on square meters (x1t) and number of bedrooms (x2t), using a sample of 100 observations.
You run the auxiliary regression and obtain:
p = 3
R^2 = 0.08
Please run the RESET test and interpret the results.
H0: Functional form is correct
H1: Functional form incorrect
T = 100
R^2 = 0.08
TR^2 = 8
5% critical value from chi-square(p-1) = chi-square(2): 5.991
Test statistic (8) > 5.991, so we reject the null hypothesis: there is evidence that the functional form is incorrect.
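For reference, the critical value and decision can be checked in a few lines (assuming scipy is available):

```python
from scipy import stats

T, R2, p = 100, 0.08, 3
stat = T * R2                             # TR^2 = 8.0
crit = stats.chi2.ppf(0.95, df=p - 1)     # 5% critical value of chi-square(2)
print(stat, round(crit, 3), stat > crit)  # 8.0 5.991 True -> reject H0
```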
What is a limitation of the RESET test if the functional form is found to be inappropriate?
If we decide to switch to a non-linear model, the RESET test provides no guide for what a better specification might be