Lecture 5 (Chapter 5.6-) Flashcards

Question 1

Q

What does the assumption:
Cov(ut, xt) = 0 mean?

Answer

A

This assumption means that the error term (ut) and the explanatory variable (xt) are uncorrelated.

In essence, the variable x being used to predict y should not be related to any of the factors captured by the error term (ut).

This ensures that the estimated relationship between x and y is not contaminated by outside factors.

Question 2

Q

What is endogeneity?

Answer

A

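Endogeneity arises when the assumption Cov(ut, xt) = 0 fails. It means xt is not completely "external" or independent; instead, it is influenced by factors that also affect the dependent variable yt and that end up in the error term ut.

For instance, if xt and ut are positively correlated, the estimated slope is biased upwards, so the regression model will overstate the effect of xt on yt.

The reverse holds if xt and ut are negatively correlated: the estimated slope is biased downwards.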
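
As a sketch of why the direction of the bias follows the sign of the correlation, assume the simple regression y_t = α + β x_t + u_t (the symbols α and β are not on the card and are introduced here only for illustration). The OLS slope estimator then converges in probability to:

```latex
\hat{\beta}
  = \frac{\widehat{\mathrm{Cov}}(x_t, y_t)}{\widehat{\mathrm{Var}}(x_t)}
  \;\xrightarrow{\;p\;}\;
  \beta + \frac{\mathrm{Cov}(x_t, u_t)}{\mathrm{Var}(x_t)}
```

The second term vanishes only when Cov(xt, ut) = 0; otherwise it carries the same sign as the covariance.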

Question 3

Q

What is multicollinearity?

Answer

A

This is when the explanatory variables are highly correlated with each other

Question 4

Q

What if variables are orthogonal?

Answer

A

Explanatory variables are orthogonal if there is no relationship between them.

This means that adding or removing a variable from the regression equation would not cause the value of coefficients to change

Question 5

Q

What are the problems if ‘near-multicollinearity’ is present but ignored?

Answer

A

R^2 will be high, but individual coefficients will have high standard errors
- This is because the overall model might explain a lot of the variation in the dependent variable, but the estimated effects of individual predictors will be unreliable because the standard errors are inflated.

Regression becomes very sensitive to small changes in the specification
- If we add or remove a variable or slightly change the dataset, we may drastically change the estimated coefficients, making the model unstable or difficult to interpret.

Confidence intervals for parameters will be very wide and significance tests might give inappropriate conclusions
- Wide confidence intervals indicate high uncertainty about the true values of the coefficients, and as a result, statistical significance tests may incorrectly label some predictors as insignificant.

It makes it difficult to draw concrete inferences:
- Due to the inflated standard errors and instability of the coefficients, it’s hard to determine which parameters truly influence the dependent variable
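
To give a feel for how quickly the standard errors inflate, the sketch below uses a standard result for a regression with two explanatory variables: the sampling variance of each slope is proportional to 1/(1 - r^2), where r is the correlation between the two regressors. The function name `variance_inflation` and the chosen values of r are illustrative assumptions, not part of the cards.

```python
def variance_inflation(r):
    """Factor by which the slope variance grows when the two
    regressors have correlation r (two-regressor case only)."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical correlations, from orthogonal to nearly collinear.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r = {r:4.2f}  variance inflation = {variance_inflation(r):6.2f}")
```

With r = 0 (orthogonal regressors) the factor is 1, and it grows without bound as r approaches 1, which is why near-multicollinearity produces wide confidence intervals.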

Question 6

Q

What are some possible solutions to multicollinearity?

Answer

A

- Ignore the problem, as OLS will still be BLUE
- Drop one of the collinear variables
- Transform the highly correlated variables into a ratio
- Collect a longer run of data or switch to a higher frequency of data
- Use panel data

Question 7

Q

What is the meaning of the assumption
ut ∼ N(0, σ^2)?

Answer

A

This means that the error terms of the regression are normally distributed with a mean of zero and a constant variance σ^2

Question 8

Q

What is the Bera-Jarque test and how is it conducted?

Answer

A

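The Bera-Jarque test checks whether the disturbances are normally distributed.

It is conducted on the regression residuals: their skewness and kurtosis are computed and combined into a single test statistic, which follows a chi-square(2) distribution under the null hypothesis of normality.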

Question 9

Q

Please provide the formula for the BJ test and explain the elements

Answer

A

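The Bera-Jarque test checks whether the skewness and excess kurtosis of the residuals are jointly zero, as they would be under normality

Skewness:
b1 = E[u^3]/(σ^2)^(3/2)

Kurtosis:
b2 = E[u^4]/(σ^2)^2

Test statistic is given by:
W = T * [(b1^2)/6 + ((b2-3)^2)/24]

where T is the sample size, b1 is the coefficient of skewness and b2 is the coefficient of kurtosis (b2 - 3 is the excess kurtosis).

Test statistic follows chi-square(2) under H0: the distribution is symmetric and mesokurtic (b1 = 0 and b2 = 3)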
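
The calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes the residuals are available as a plain list; the function name `bera_jarque` and the sample residuals are hypothetical. Skewness is normalised by the variance raised to the power 3/2, and the p-value uses the closed form exp(-W/2), which is valid only for a chi-square distribution with 2 degrees of freedom.

```python
import math


def bera_jarque(residuals):
    """Return (skewness b1, kurtosis b2, statistic W, p-value)."""
    T = len(residuals)
    mean = sum(residuals) / T
    dev = [u - mean for u in residuals]
    var = sum(d ** 2 for d in dev) / T                  # sigma^2
    b1 = (sum(d ** 3 for d in dev) / T) / var ** 1.5    # E[u^3] / (sigma^2)^(3/2)
    b2 = (sum(d ** 4 for d in dev) / T) / var ** 2      # E[u^4] / (sigma^2)^2
    w = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
    p_value = math.exp(-w / 2)                          # chi-square(2) upper tail
    return b1, b2, w, p_value


# Hypothetical residuals, symmetric around zero, so skewness is 0.
b1, b2, w, p_value = bera_jarque([-2, -1, 0, 1, 2])
print(b1, b2, w, p_value)
```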

Question 10

Q

You are given the following BJ test results, please describe what you see:

Observations:
252

Skewness:
-2.38462

Kurtosis:
11.2452

Jarque-Bera:
1004.518

Probability:
0.00000

Answer

A

The BJ test has the null hypothesis of normality:

H0:
Skewness = 0
Kurtosis = 3

We see from these results that the distribution of the disturbances is negatively skewed (the skewness value is negative) and leptokurtic (the kurtosis exceeds 3).

In order to NOT reject the null hypothesis, the result would have to be insignificant, with a p-value exceeding 5%. With a p-value of 0.000 we reject the null hypothesis of normality.

Non-normal disturbances mean there is a chance that our inferences could be wrong. However, with a sample size as large as 252 this is unlikely to matter much, because in large samples the test statistics approximately follow their usual distributions even without normality.
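
The decision rule can be sketched as follows, taking the reported values on the card as given rather than recomputing them. The 5% critical value of 5.991 for chi-square(2) is the same one used in the RESET card; the variable names are illustrative assumptions.

```python
import math

skewness = -2.38462
kurtosis = 11.2452
jb_stat = 1004.518        # Jarque-Bera statistic as reported on the card
critical_5pct = 5.991     # chi-square(2) critical value at 5%

# Describe the shape implied by the two moments.
shape = ("negatively skewed" if skewness < 0 else "not negatively skewed",
         "leptokurtic" if kurtosis > 3 else "not leptokurtic")

# chi-square(2) upper-tail probability has the closed form exp(-x/2).
p_value = math.exp(-jb_stat / 2)
reject_normality = jb_stat > critical_5pct

print(shape, reject_normality, p_value)
```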

Question 11

Q

What should we do if the assumption around normally distributed error terms is violated?

Answer

A

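Non-normality is often caused by one or two very extreme residuals. A good alternative is to use dummy variables to remove the effect of these outlying observations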

Question 12

Q

How can we test whether the linear functional form assumed by the CLRM is appropriate?

Answer

A

We use the RESET test

Question 13

Q

What is the RESET test?

Answer

A

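The RESET test is used to test whether the linear functional form of the regression is in fact appropriate. We conduct it by running an auxiliary regression of the residuals from the original regression on higher powers of its fitted values (the squared fitted values, the cubed fitted values, and so on up to power p).

We obtain R^2 from the auxiliary regression and use it to find the test statistic:

TR^2, where T = sample size

We compare this test statistic against the Chi-square(p-1) critical value, where p is the highest power of the fitted values included in the auxiliary regression.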

Question 14

Q

Suppose you are modeling house prices (Yt) based on square meters (X1t) and number of bedrooms (X2t), using a sample of 100 observations.

You run the auxiliary regression and obtain:

p = 3

R^2 = 0.08

Please run the RESET test and interpret the results

Answer

A

H0: Functional form is correct

H1: Functional form incorrect

T = 100
R^2 = 0.08

TR^2 = 100 * 0.08 = 8

Critical value at 5%: Chi-square(p-1) = Chi-square(2) = 5.991

Test statistic (8) > 5.991, so we reject the null hypothesis: there is evidence that the functional form is incorrect
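
The same arithmetic as a short Python sketch. The function name `reset_test` is hypothetical, and the critical value is passed in by hand rather than looked up from a statistics library; the p-value line relies on the chi-square(2) closed form, so it applies only because p - 1 = 2 here.

```python
import math


def reset_test(T, r_squared, p, critical_value):
    """Return (TR^2 statistic, degrees of freedom, reject H0?)."""
    stat = T * r_squared
    df = p - 1
    return stat, df, stat > critical_value


stat, df, reject = reset_test(T=100, r_squared=0.08, p=3, critical_value=5.991)
p_value = math.exp(-stat / 2)   # upper tail of chi-square(2) only
print(stat, df, reject, p_value)
```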

Question 15

Q

What is a limitation of the RESET test if the functional form is found to be inappropriate?

Answer

A

If we decide to switch to a non-linear model, the RESET test provides no guide for what a better specification might be

Question 16

Q

What are the consequences of omitting an important variable or including an irrelevant one?

Answer

A

If we omit an important variable, the estimated coefficients on all the other variables will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables. Even if this condition is satisfied, the estimate of the constant term will be biased, and the standard errors will be biased upwards. Thus, we would obtain biased forecasts from the model.

If we include an irrelevant variable, the coefficient estimates will still be consistent and unbiased, but the estimators will be inefficient.
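
The role of the correlation condition can be seen in a small worked case. Assume, purely for illustration, that the true model is y_t = β1 + β2 x_2t + β3 x_3t + u_t but x_3t is left out of the estimated regression (these symbols are not on the card). The slope on x_2t then converges to:

```latex
\hat{\beta}_2
  \;\xrightarrow{\;p\;}\;
  \beta_2 + \beta_3 \,
  \frac{\mathrm{Cov}(x_{2t}, x_{3t})}{\mathrm{Var}(x_{2t})}
```

The extra term disappears only if the omitted variable is irrelevant (β3 = 0) or uncorrelated with the included one.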

Question 17

Q

Conduct a Chow test on the following data:

Y=β0+β1X1+β2X2+u

RSS = 500
RSS1 = 250
RSS2 = 200

Number of sample observations (T) = 100

Number of parameters (k) = 3

Answer

A

H0: The coefficients are constant across the two subsamples (each of β0, β1 and β2 takes the same value in both)

H1: The coefficients differ across the subsamples

Compute the test statistic:
Test statistic =
(RSS - (RSS1 + RSS2))/(RSS1 + RSS2) * (T - 2k)/k

(500 - 450)/450 * 94/3 = 3.481

Compare against the 5% critical value of F(k, T-2k)
F(3, 94): ~2.7

As 3.481 > 2.7 we reject the null hypothesis and conclude that the coefficients are not stable across the subsamples
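
A minimal Python sketch of the same calculation. The function name `chow_test` is hypothetical, and the approximate critical value of 2.7 is taken from the card rather than computed.

```python
def chow_test(rss, rss1, rss2, T, k):
    """Return the Chow F-statistic and its degrees of freedom (k, T - 2k)."""
    unrestricted = rss1 + rss2   # RSS when each subsample has its own coefficients
    stat = (rss - unrestricted) / unrestricted * (T - 2 * k) / k
    return stat, (k, T - 2 * k)


stat, df = chow_test(rss=500, rss1=250, rss2=200, T=100, k=3)
reject = stat > 2.7              # approximate 5% critical value of F(3, 94)
print(round(stat, 3), df, reject)
```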

Question 18

Q

Please describe the predictive failure test and why you would use this over the Chow test

Answer

A

The Chow test has the problem that there may not be enough available data (number of observations) to split it into two subsamples that can each be estimated separately.

In the predictive failure test, we split the data into a 'long' sub-period, i.e. most of the data (T1 observations), and a short sub-period containing the remaining T2 observations.

To calculate the test:
Run the regression for the whole period and obtain RSS

Run the regression for the large sub-period and obtain RSS1

Use the number of observations in the large sub-period (T1) and in the remaining smaller sub-sample (T2) to calculate the test statistic:

Test statistic = (RSS - RSS1)/RSS1 * (T1 - k)/T2

Test statistic will follow: F(T2, T1-k)

Question 19

Q

Please conduct the predictive failure test on the following output:

Whole sample:
T = 144
RSS = 0.0434

Large sub-sample:
T1= 120
RSS1 = 0.0420

Answer

A

H0: Coefficients remain constant across the two periods

H1: Coefficients differ across the two periods

Test statistic (with T2 = 144 - 120 = 24 and k = 2):
(0.0434 - 0.0420)/0.0420 * (120 - 2)/24 = 0.1639

Compare with the 5% critical value:
F(24, 118) ~ 1.6

As 0.1639 < 1.6 we fail to reject the null hypothesis and conclude that the model did not suffer from predictive failure
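
The calculation can be reproduced with a short Python sketch. The function name `predictive_failure_test` is hypothetical; k = 2 is assumed because the card's own arithmetic uses (120 - 2), and the approximate critical value of 1.6 is taken from the card rather than computed.

```python
def predictive_failure_test(rss, rss1, T, T1, k):
    """Return the F-statistic and its degrees of freedom (T2, T1 - k)."""
    T2 = T - T1                                   # size of the short sub-period
    stat = (rss - rss1) / rss1 * (T1 - k) / T2
    return stat, (T2, T1 - k)


stat, df = predictive_failure_test(rss=0.0434, rss1=0.0420, T=144, T1=120, k=2)
reject = stat > 1.6                               # approximate 5% critical value of F(24, 118)
print(round(stat, 4), df, reject)
```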

(19 cards)