Chapter 5 - Classical Linear Regression Assumptions and Diagnostic Tests Flashcards

1
Q

Recall the assumptions we make in the classical linear regression model (CLRM).

A

E[u_t] = 0 (the errors have zero mean)
var(u_t) = sigma^2 < infinity (homoscedasticity: constant, finite variance)
cov(u_i, u_j) = 0 for i != j (no autocorrelation between errors)
cov(u_t, x_t) = 0 (the regressors are non-stochastic, or at least uncorrelated with the errors)
u_t is normally distributed, N(0, sigma^2)

2
Q

Why do we need these assumptions?

A

They serve two primary purposes:
1) They are needed to show that the OLS estimators have a set of desirable properties (e.g. that they are BLUE).

2) They allow us to carry out hypothesis tests on the estimated coefficients.

3
Q

What sort of questions are we interested in in this chapter?

A

How can we detect violations of the CLRM assumptions?

What are the most likely causes of violations in practice?

What happens if we choose to ignore a certain assumption, and continue with the model nonetheless?

4
Q

Name the two test-statistic approaches that we use.

A

The Lagrange multiplier (LM) test

The Wald test

5
Q

What do we need to know regarding the distributions of the LM and Wald test statistics?

A

The LM test statistic in the context of diagnostic tests follows a chi-squared distribution with m degrees of freedom, where m is the number of constraints/restrictions placed on the model.

The Wald version of the test follows an F-distribution with (m, T-k) degrees of freedom.

6
Q

What can we say about comparing the LM and Wald tests?

A

Asymptotically, their behaviour is the same, but in smaller samples there will be differences.

This follows from how the F-distribution is related to the chi-squared distribution: an F(m, T-k) variable is a ratio of chi-squared variables, and as the number of observations T increases, m times the F(m, T-k) variable converges to a chi-squared variable with m degrees of freedom.
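A quick numerical illustration of this convergence (a sketch using scipy; the values of m and k are made up): the 5% critical value of m·F(m, T-k) approaches the 5% critical value of a chi-squared(m) as T grows.

```python
from scipy import stats

m, k = 3, 5                       # number of restrictions and regressors (illustrative)
for T in (20, 50, 200, 5000):     # increasing sample sizes
    f_crit = stats.f.ppf(0.95, dfn=m, dfd=T - k)    # 5% critical value of F(m, T-k)
    chi2_crit = stats.chi2.ppf(0.95, df=m)          # 5% critical value of chi2(m)
    print(T, m * f_crit, chi2_crit)                 # m*F critical value -> chi2 critical value
```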

7
Q

What actually is a diagnostic test?

A

A test concerning the validity of a model.

8
Q

Elaborate on the first assumption of the CLRM.

A

E[u_t] = 0

We assume that the average error in our model is zero.
Is this reasonable?
For OLS it is, provided a constant (intercept) term is included in the regression: regardless of the slope, the constant can always adjust so that the average residual is zero.

Possible problems occur when we force the regression line through the origin, i.e. when we remove the intercept for reasons other than zero genuinely being the intercept that gives mean error equal to zero.

So if we for some reason remove the intercept term from our model, these problematic cases may occur.

The biggest issue is perhaps that the sample mean may then explain the variation about the mean better than the regression line does. This can make R^2 negative, which is an undesirable result.

9
Q

Elaborate on the problematic cases that may occur as a result of omitting the intercept term in the CLRM.

A

First of all, recall that R^2 can be defined as ESS/TSS, the explained sum of squares divided by the total sum of squares. TSS is the total variation of y around its sample mean; ESS is the part of that variation that the model captures. The difference, TSS minus ESS, is the residual (unexplained) sum of squares.

If we remove the intercept term from the regression, R^2 can be negative. An interpretation of this is that the sample average explains more of the variation in y than the explanatory variables are able to.

An even more serious consequence of not including the intercept is that we risk severe bias in the slope estimates. The actual relationship may be perfectly linear but offset from the origin in y; without an intercept we cannot model this correctly.

10
Q

Elaborate on homoscedasticity.

A

The constant variance assumption: the error terms all have the same finite variance, var(u_t) = sigma^2 < infinity.

11
Q

What do we say if the second assumption (constant variance of the residuals) is violated?

A

We say that the errors are heteroscedastic.

12
Q

How can we test for heteroscedasticity?

A

The simplest method is the Goldfeld-Quandt test.

Roughly speaking, the Goldfeld-Quandt test divides the sample of length T into two subsamples, fits the regression on each, and computes the residual variance of each subsample. Appropriately scaled, these variance estimates are chi-squared distributed random variables.

Therefore, we can use the F-distribution to test whether the two variances are equal. The null hypothesis is that the variance is the same in both subsamples.

As with the regular F-test, the test statistic is the ratio of the two variances. If they really are the same, the ratio should be close to one and should not fall in the rejection region.
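A minimal sketch of the Goldfeld-Quandt test using statsmodels (the data and variable names are made-up illustrations):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

# Illustrative data where the error variance grows with x, so the test should reject
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = 1.0 + 0.5 * x + rng.normal(0, 0.2 * x + 0.1)   # heteroscedastic errors
X = sm.add_constant(x)

# Splits the sample in two, runs OLS on each half and compares the residual variances
f_stat, p_value, ordering = het_goldfeldquandt(y, X)
print(f_stat, p_value)   # small p-value -> reject the null of equal variances
```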

13
Q

Is the Goldfeld-Quandt test good?

A

It is decent, but it has weaknesses. Most of the weaknesses relate to its dependence on the chosen splitting point.

Some things can be done about this. For instance, the observations can be ordered not by time but by the values of a third variable that is thought to affect the variance, before splitting.

A second approach is to omit a number of observations in the centre of the original sample. This creates a separation between the two subsamples that can be beneficial.

14
Q

Do we have alternatives to the Goldfeld-Quandt test for testing heteroscedasticity?

A

“White’s test”

15
Q

Elaborate on White's test.

A

We are testing for heteroscedasticity. We therefore want to see whether the variance of the residuals can be explained by the variables we use as explanatory variables, i.e. whether it varies systematically with them.

We first obtain the residuals from the ordinary regression.
Then we check whether the squared residuals (from our sample, essentially) can be explained, beyond a constant, by our explanatory variables, the squares of our explanatory variables, and the cross products of the explanatory variables. Each of these terms has a coefficient, and these coefficients are what White's test is really interested in.

To understand why the squared residuals are used as the dependent variable in the auxiliary regression, consider the formula for the variance of the residuals:

var(u_t) = E[(u_t - E[u_t])^2]

var(u_t) = E[(u_t - 0)^2]

var(u_t) = E[u_t^2]

The variance is the expectation of the squared residual, which is why the squared residuals serve as the dependent variable in the auxiliary regression.

After we perform the auxiliary regression, we have a choice of testing method. We could use the regular F-test framework, where we find the RSS from restricted and unrestricted models: the restricted model is a regression on only a constant (no variables), while the unrestricted model is the full auxiliary regression.

However, it is perhaps easier to use the Lagrange multiplier approach, which centres on the value of R^2. Recall that R^2 is explained variation divided by total variation. The idea is that if the auxiliary regression has a high R^2, the chosen terms are good predictors of the squared residuals. This is "bad" because it indicates that the variance is not constant but correlates with the values of the variables. So we obtain the R^2 value, multiply it by T, and it can be shown that T·R^2 follows a chi-squared distribution with m degrees of freedom, where m is the number of regressors in the auxiliary regression, excluding the constant.
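A minimal sketch of White's test via statsmodels (illustrative data; het_white builds the auxiliary regression with squares and cross products internally):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

# Illustrative data where the error variance depends on the first regressor
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=(300, 2))
y = 2 + x @ np.array([0.5, -0.3]) + rng.normal(0, x[:, 0])

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Auxiliary regression of squared residuals on levels, squares and cross products
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(lm_pvalue)   # LM version: T*R^2 ~ chi2(m); a small p-value indicates heteroscedasticity
```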

16
Q

What happens if the errors are heteroscedastic but we carry on nonetheless?

A

The estimators are still unbiased, but they are no longer the best we can get (not BLUE). The standard error estimates will be off, so inference based on them can be misleading.

17
Q

Elaborate on dealing with heteroscedasticity

A

If the form of the heteroscedasticity is known, we can use GLS.

GLS can be considered weighted least squares: GLS minimises a weighted sum of squared residuals, whereas OLS minimises an unweighted sum of squared residuals.

Regarding "the form of heteroscedasticity": we mean knowing the variance as a function of something observable. For instance, if the variance of the residuals is sigma^2 times z_t^2 for some variable z_t, we can remove the heteroscedasticity by dividing every term in the regression by z_t. This turns the minimisation into a weighted sum.

Another way of dealing with heteroscedasticity is to transform the data into something that makes the variance more nearly constant, for example a log transform. Logging has the effect of pulling in extreme values, and thus reducing the variance.
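A minimal sketch of the "divide through by z_t" idea using statsmodels WLS (z_t and the data are assumptions for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
z = rng.uniform(1, 5, 400)
x = rng.normal(size=400)
y = 1 + 2 * x + rng.normal(0, 1, 400) * z          # var(u_t) = sigma^2 * z_t^2

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()                       # unbiased but inefficient, unreliable std errors
wls_res = sm.WLS(y, X, weights=1.0 / z**2).fit()   # GLS/WLS: weight each observation by 1/z_t^2
print(ols_res.bse, wls_res.bse)                    # the WLS standard errors are the trustworthy ones here
```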

18
Q

What do we typically say if the residuals do not satisfy cov(u_i, u_j) = 0?

A

They are not uncorrelated, which means that they are autocorrelated.

Another term for autocorrelated is "serially correlated".

19
Q

If we use differences in variable values, ∆y_t, instead of the original values, what is important to remember?

A

We lose the first observation. The first difference uses the first and second values, and the final difference uses the final value and the second-to-last value, so the differenced series is one observation shorter than the original.

20
Q

What is the first thing we can do to check the validity of the cov(u_i, u_j) = 0 assumption?

A

We fit the model and obtain the residuals. Then we plot the residuals against their own lagged values.

21
Q

When we plot the residuals’ lag components against each other, what do we wish to see?

A

A random scatter plot with no pattern

22
Q

In practice, do we use the residual plot to make any decisions?

A

The plot is a first-phase tool; it provides intuition about the data rather than formal evidence.

The simplest, more formal thing we can do is the Durbin-Watson test.

23
Q

Elaborate on the Durbin-Watson test.

A

The Durbin-Watson (DW) test is a test for first-order autocorrelation.

First-order autocorrelation means that the test only considers the relationship between a residual and its first (immediately preceding) lag.

The null hypothesis is that the autocorrelation between the residual at time t and the residual at time t-1 is zero. The alternative hypothesis is the two-tailed option that the lag-1 autocorrelation is not zero.

The DW test statistic is defined as:

DW = ∑_{t=2}^{T} (û_t - û_{t-1})^2 / ∑_{t=2}^{T} û_t^2

So DW is the ratio of the sum of squared differences of consecutive residuals to the sum of squared residuals. Squared normally distributed variables are chi-squared, so the ratio resembles an F-type statistic, although DW does not follow a standard tabulated distribution.

IMPORTANT: all of the residuals here are the estimates that we get from fitting a model and computing the residuals.

Recall that the denominator is essentially the variance of the residuals, because their mean is expected to be zero.
The numerator basically keeps track of how much correlation there is between consecutive residuals.

DW is approximately equal to 2(1 - p), where p is the estimated correlation coefficient from the regression u_t = p u_{t-1} + v_t.

The DW statistic is unusual in that it is not compared against a single standard distribution; instead it works with tabulated critical values (a lower and an upper bound, with an inconclusive region between them).
Using DW ≈ 2(1 - p):
if p = 1, DW = 0
if p = 0, DW = 2
if p = -1, DW = 4

Thus the case of no correlation lies in the middle of the range.

The critical values for DW are listed in the book's appendix.
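A minimal sketch of computing the DW statistic with statsmodels (the AR(1) errors are simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulate positively autocorrelated errors so the statistic has something to detect
rng = np.random.default_rng(3)
x = rng.normal(size=300)
u = np.zeros(300)
for t in range(1, 300):
    u[t] = 0.7 * u[t - 1] + rng.normal()   # AR(1) errors with p = 0.7
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(res.resid)
print(dw)   # DW ≈ 2(1 - p); a value well below 2 signals positive first-order autocorrelation
```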

24
Q

Elaborate on the Breusch-Godfrey test.

A

The Breusch-Godfrey test is a more general test that tests for autocorrelation up to the r'th order (lag).

We pick an order r, and use the following model as the baseline for the test:

u_t = p_1 u_{t-1} + p_2 u_{t-2} + ... + p_r u_{t-r} + v_t, where v_t is normally distributed N(0, sigma_v^2).

The null hypothesis is that all of the correlations (the p's) are 0. Thus, this test is trying to answer the question "Is there any autocorrelation among the first r lags?".

The alternative hypothesis is that at least one of the p's is not zero.

NOTE: when performing the test below, we include the x's as well because this keeps the test valid even when the regressors are not strictly exogenous. Thus, this test ONLY answers: is there autocorrelation?
This means that we still need to check for correlation between the residuals and the explanatory variables separately.

The test proceeds as follows:

Step 1)
Perform the regular regression to obtain the residuals.

Step 2)
Perform a new regression in which the dependent variable is û_t, and the regressors are the original x's from step 1 together with the lagged residuals û_{t-1}, ..., û_{t-r} (each with its own correlation coefficient).

Then we obtain R^2 from this auxiliary regression.

Step 3)
If T is the number of observations, the test statistic (T - r)·R^2 is chi-squared distributed with r degrees of freedom under the null, so we can carry out a simple chi-squared test.
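A minimal sketch of the Breusch-Godfrey test via statsmodels (simulated data; r = 4 lags is an arbitrary illustrative choice):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Simulate AR(1) errors so that autocorrelation is actually present
rng = np.random.default_rng(4)
x = rng.normal(size=300)
u = np.zeros(300)
for t in range(1, 300):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()

# Auxiliary regression of residuals on the x's and r lagged residuals; LM statistic ~ chi2(r)
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=4)
print(lm_pvalue)   # small p-value -> autocorrelation among the first 4 lags
```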

24
Q

When is the DW test valid?

A

There must be a constant term in the regression.

The regressors must be non-stochastic.

There must be no lags of the dependent variable among the regressors.

24
Q

What can we do to remove the effect of outliers?

A

Firstly, outliers are a difficult issue because they very easily cause the assumption of normally distributed error terms to be breached.

We can knock them out by adding dummy variables, a single dummy per outlier. We then name each dummy variable something related to the outlier, so it is easy to keep track of.

Of course, doing this also potentially distorts the data and the estimation.

24
Q

Do we have alternative tests to the DW test?

A

Yes, the Breusch-Godfrey test.

24
Q

Why might lags be required in a regression?

A

Two primary reasons:
1) Inertia: sometimes it takes time to react to certain events, so lagged values carry information about the current value. Time series models are well suited to this, as they allow for correlation across time.

2) Overreactions: perhaps the market has a tendency to overreact to good news.

24
Q

What could happen if we disregard the fact that a model actually violates the no-autocorrelation assumption?

A

Much the same as with heteroscedasticity.

The estimators will still be unbiased, but they are no longer particularly efficient (they are not BLUE).

As a result, the standard error estimates are not necessarily correct, and we can draw the wrong inferences about whether a variable is important or not.

24
Q

How can we deal with autocorrelation issues?

A

If we know the form of the autocorrelation, we can use GLS procedures. One such procedure is the Cochrane-Orcutt procedure.

We assume that the autocorrelation is produced by a specific process, typically an AR process. So we get something like this:

We have the regular regression as always:

y_t = b_1 + b_2 x_2t + b_3 x_3t + u_t

but we also have the assumption:

u_t = p u_{t-1} + v_t

We first obtain the residuals, ignoring the assumption.
Then we run the regression û_t = p û_{t-1} + v_t to obtain an estimate p̂ of p.

The estimate p̂ is then used to quasi-difference the data (for example replacing y_t by y_t - p̂ y_{t-1}, and likewise for the x's), the regression is re-run on the transformed data, and the steps are iterated until the estimate of p converges.
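A minimal sketch of a Cochrane-Orcutt-style iteration using statsmodels GLSAR (simulated data; GLSAR is one convenient implementation, not necessarily the book's exact procedure):

```python
import numpy as np
import statsmodels.api as sm

# Simulate a regression with AR(1) errors
rng = np.random.default_rng(5)
x = rng.normal(size=400)
u = np.zeros(400)
for t in range(1, 400):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u

X = sm.add_constant(x)
# iterative_fit alternates between estimating p from the residuals,
# quasi-differencing the data and re-fitting, until p settles down
model = sm.GLSAR(y, X, rho=1)          # rho=1 -> AR(1) error structure
res = model.iterative_fit(maxiter=10)
print(model.rho, res.params)           # estimated p and the GLS coefficient estimates
```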

25
Q

How do we test for violation of assumption 5 (normality of the errors)?

A

With the Bera-Jarque (BJ) normality test, covered in a later card.

25
Q

Elaborate on orthogonality in regard to OLS.

A

If the explanatory (independent) variables really are unrelated to one another, there is no relationship between them and they are said to be orthogonal.

If orthogonality holds, adding or removing variables from the regression will not change the estimated coefficients of the other explanatory variables.

26
Q

Elaborate on orthogonality in linear regression in practice.

A

Even though we assume independence among the explanatory variables, in practice there is almost always a certain amount of correlation. This is usually fine, and does not result in too much loss of precision.

However, we need to keep it under control, or at least make sure that the dependence among the so-called independent variables is not too large. If it is very large, undesirable consequences follow.

27
Q

What do we call it when explanatory variables are related to each other?

A

multicollinearity

28
Q

Elaborate on multicollinearity.

A

We distinguish between two main types of multicollinearity:

1) Perfect multicollinearity
2) Near multicollinearity

Perfect multicollinearity
There is an exact relationship between two or more variables. In such a case, it is not possible to estimate all the coefficients in the model.
Perfect multicollinearity is typically a modelling mistake where we are essentially including the same variable twice.
For instance, if one variable is proportional to another, including both in the regression means attempting to estimate two parameters when we only have information enough for one.
Technically, the issue lies in inverting the X'X matrix: with linearly dependent columns in X, the matrix is singular and cannot be inverted.

Near multicollinearity
Near multicollinearity is much more likely to occur in practice, and arises when there are non-negligible relationships between the explanatory variables.
Note that relationships between the dependent and independent variables are not what multicollinearity refers to.

29
Q

How do we test for multicollinearity?

A

Testing for it formally is surprisingly difficult. Therefore, we usually only investigate whether it appears to be present.

30
Q

How do we explore multicollinearity in a regression model?

A

We take the explanatory variables and compute their correlation matrix.
High correlation is an indication of multicollinearity.

NOTE: this only detects relationships between pairs of explanatory variables. Cases where one variable is related to a combination of several others will not be picked up.

31
Q

What is a more formal way of exploring multicollinearity than using the correlation matrix?

A

VIFs: variance inflation factors.
The idea is to measure how much larger the variance of an explanatory variable's coefficient estimate is as a result of that variable being correlated with the other explanatory variables.

There is more detail on the VIF in a later card.

32
Q

Elaborate on what happens if we observe that the model has high multicollinearity, but we ignore it.

A

Individual coefficients will have high standard errors, while the R^2 of the regression as a whole will look good. The regression may therefore look good overall, even though the individual variables appear insignificant.

33
Q

Reminder to truly learn the topic of standard errors, and how they relate to the coefficients etc.

34
Q

How do we deal with the issue of multicollinearity?

A

One can attempt methods like principal component analysis (PCA). However, such methods are complex and might produce models that are difficult to interpret and use.
As a result, multicollinearity tends to be treated as a data problem rather than a modelling problem.

35
Q

What do we do if we suspect the model may be using the wrong functional form?

A

We need to perform a test that can give us an idea of whether the relationships are linear or not.

Ramsey's RESET test will do.

36
Q

Elaborate on Ramsey's RESET test.

A

We run the regular linear regression. Then we extend it by including powers of the fitted values as additional regressors.

This works because the power terms contain a wide variety of cross products and powers of the various x variables (since the entire fitted regression is raised to a power). This allows us to capture many different kinds of non-linear functional forms.

Then we test the null hypothesis that the coefficients alpha_2 to alpha_p on the power terms are all zero at the same time. If this is the case, the linear specification is adequate; the underlying process is most likely linear.

Then we find the R^2 of the extended regression and multiply by T. The TR^2 test statistic follows a chi-squared distribution, X^2(p-1), where p is the highest power used; the degrees of freedom are (p-1).
The reason for p-1 rather than p is that although we speak of p powers, only p-1 power terms are actually added, because the power-of-one case is not included in the new model.
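A minimal sketch using statsmodels' RESET implementation (the data are simulated with a quadratic term, so the linear fit should be rejected):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

# Illustrative data generated by a quadratic relationship
rng = np.random.default_rng(6)
x = rng.uniform(0, 5, 300)
y = 1 + 0.5 * x + 0.8 * x**2 + rng.normal(size=300)

res = sm.OLS(y, sm.add_constant(x)).fit()

# Adds powers of the fitted values (here up to the 3rd power) and tests that
# their coefficients are jointly zero
reset = linear_reset(res, power=3, use_f=True)
print(reset.pvalue)   # small p-value -> reject the linear functional form
```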

37
Q

Assume we have done Ramsey's RESET test and we find that the model is significantly non-linear. What should we do?

A

The first thing to note is that the RESET test only says that something is wrong; it says nothing about what is wrong. Therefore we do not really get a direction as to what kind of non-linear specification we should pursue.

There are things one can try. One can try different non-linear specifications, or one can transform the data (for instance with a log transformation) so that the relationship becomes closer to linear.

38
Q

In general, what happens if we omit an important variable from the regression? For instance, say we believe the true data generating process follows a certain linear relationship, but we have not seen the effect of an important variable, so we never add it. What happens?

A

The estimates of the parameters on the variables we have included will be biased and inconsistent, unless the omitted variable is completely uncorrelated with the included variables.

Even if the omitted variable is completely uncorrelated with the included ones, the intercept term will still be biased.

The result is that the model will make inferences that are not completely true. They may be close, but if our goal is to understand and learn about the process, we are clearly not learning the whole truth.

39
Q

Elaborate on what happens if we include irrelevant variables that do not actually belong in the model.

A

The estimators of all the coefficients remain consistent and unbiased.
However, their standard errors are inflated. In other words, the estimators become inefficient, meaning that their variances increase; an increase in variance is the same as an increase in standard error.

This is a problem because larger standard errors make it more difficult to detect significance when testing hypotheses.
For instance, variables that might otherwise have been marginally significant (meaning they do contribute to predicting the dependent variable) may be dismissed as insignificant because of the inflated standard errors. If this happens, we will likely remove the variable, even though it actually contributes.

The book suggests that it is better to include marginally significant variables than to risk losing important ones. Omitted variable bias is usually the more dangerous issue.

40
Q

What are parameter stability tests all about?

A

Given a fitted model, we want to see whether it is equally appropriate across the whole domain (across the entire sample).

The overall idea is to split the sample of data into subsamples and then estimate up to three models:
one model on all the data, and one model per subsample. If we divide the original sample into two pieces, we get three models in total.

Then we compare the RSS of the models.

41
Q

What options do we consider for stability testing?

A

The Chow test (analysis of variance),

and predictive failure tests.

42
Q

Elaborate on the Chow test.

A

It is essentially a basic F-test, but with different terms in the numerator and denominator, and different degrees of freedom.

The overall idea is to figure out whether the combined RSS of the subsample regressions differs from the RSS of the whole-sample regression. For stability, we want them to be close together: the test statistic is [(RSS - (RSS1 + RSS2))/(RSS1 + RSS2)] · (T - 2k)/k, which follows an F(k, T - 2k) distribution under the null.

The null hypothesis is that the parameters of the two subsample models are the same.
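A minimal sketch of the Chow test computed by hand from the three regressions (the data and the mid-sample break point are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
T, k = 200, 2                       # observations and parameters (constant + slope)
x = rng.normal(size=T)
# Simulated structural break: the slope changes halfway through the sample
y = np.where(np.arange(T) < T // 2, 1 + 0.5 * x, 1 + 1.5 * x) + rng.normal(size=T)
X = sm.add_constant(x)

rss = sm.OLS(y, X).fit().ssr                           # whole sample
rss1 = sm.OLS(y[:T // 2], X[:T // 2]).fit().ssr        # first subsample
rss2 = sm.OLS(y[T // 2:], X[T // 2:]).fit().ssr        # second subsample

# Chow statistic: [(RSS - (RSS1 + RSS2)) / (RSS1 + RSS2)] * (T - 2k) / k ~ F(k, T - 2k)
chow = ((rss - (rss1 + rss2)) / (rss1 + rss2)) * (T - 2 * k) / k
p_value = 1 - stats.f.cdf(chow, k, T - 2 * k)
print(chow, p_value)   # small p-value -> reject parameter stability
```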

43
Q

Potential issues with the Chow test?

A

It requires a lot of data: we need enough observations to estimate two separate regressions with confidence.

This is amplified in uneven cases. For instance, if we want to place the break point close to the beginning or the end of the dataset, one of the subsamples will hardly have enough data for a regression.

44
Q

What is an alternative to the Chow test?

A

Predictive failure test.

45
Q

Elaborate on the predictive failure test.

A

It is more robust than the Chow test in this respect: it only requires us to fit a regression to the entire original sample and to one of the subsamples, not to both. We can therefore use the larger subsample and avoid having to fit a regression on very little data.

We run the regression over the whole period and find its RSS.

Then we run the regression over the large subsample only (T1 observations) and find that RSS (call it RSS1).

Then we combine them in the F-statistic [(RSS - RSS1)/RSS1] · (T1 - k)/T2, which follows an F(T2, T1 - k) distribution, where T2 is the number of observations left out.
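A minimal sketch of the predictive failure test computed by hand (sample sizes, data and break point are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(8)
T, k = 200, 2
T1, T2 = 180, 20                             # large estimation sample and small hold-out sample
x = rng.normal(size=T)
y = 1 + 0.5 * x + rng.normal(size=T)
X = sm.add_constant(x)

rss = sm.OLS(y, X).fit().ssr                 # regression over the whole period
rss1 = sm.OLS(y[:T1], X[:T1]).fit().ssr      # regression over the large subsample only

# Predictive failure statistic: [(RSS - RSS1)/RSS1] * (T1 - k)/T2 ~ F(T2, T1 - k)
stat = ((rss - rss1) / rss1) * (T1 - k) / T2
p_value = 1 - stats.f.cdf(stat, T2, T1 - k)
print(stat, p_value)   # a large p-value here -> no evidence of predictive failure
```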

46
Q

When considering the predictive failure stability test, how do we determine where to place the break point?

A

We can plot the data and look for structural changes.

We can also split according to known historical events.

47
Q

Recall the main topics of this chapter.

A

Explore the assumptions of OLS and the CLRM.

We want to identify whether some of the assumptions are violated, because of how the model will behave if they are.

Then we want to understand what happens if we carry on despite a violation.

Then we want to find some remedy.

48
Q

There are commonalities in what we can expect to happen if some of the assumptions are violated. Name them.

A

1) The coefficient estimates may simply not be correct; they can, for instance, be biased.

2) The standard errors corresponding to the coefficients may be wrong.

3) The hypothesis tests we use rely on particular distributions, which may no longer be appropriate.

49
Q

Another term for a diagnostic test?

A

Misspecification test.

50
Q

The chapter mentions the LM and Wald tests. Why?

A

Mainly to introduce the fact that we use the chi-squared and F-distributions for most of our testing.

The chapter starts by noting that the two approaches differ in smaller samples, but converge asymptotically.

51
Q

Do we need the assumption E[u_t] = 0?

A

Yes, but no.

There is a stronger one, E[u_t | x_t] = 0, that is used in the lectures. It is slightly different, and I believe it can replace both this first assumption and the covariance assumption cov(x_t, u_t) = 0.

52
Q

What can we say about plots and heteroscedasticity?

A

A plot is likely to reveal very little; we need a formal test.

53
Q

When thinking about testing for heteroscedasticity, what should pop into our head?

A

Goldfeld-Quandt.

It is an F-statistic.
Split the sample into two parts, of lengths T1 and T2.

Compute the residual variances according to the formulas in the book. Again: residual variances, not parameter variances. We find the residuals from each fitted regression model and compute their variances.

These two residual variances will have different degrees of freedom, unless the subsamples are identically sized.

Then we divide one by the other, creating a ratio of (scaled) chi-squared variables, which follows the F-distribution.

The null hypothesis is simply that the residual variances are the same.
The F-statistic is F-distributed with (T1 - k, T2 - k) degrees of freedom.

54
Q

What else can we do to test for heteroscedasticity, other than the Goldfeld-Quandt test?

A

White's test.

A benefit of White's test is that it makes few assumptions about the form of the heteroscedasticity. Recall that the Goldfeld-Quandt test assumes that first half vs. second half is a good split; it can of course split differently, but in either case we must make an assumption about what makes a good splitting point.
This is not an issue with White's test.

55
Q

Elaborate on White's test.

A

A test for heteroscedasticity, i.e. of the null that var(u_t) = sigma^2 for all t.

We obtain the residuals from our original model; then we use these residuals (squared) as the dependent variable in a new model (the auxiliary regression); we thereby try to explain fluctuations in the squared residuals around their mean by linear and some non-linear functions of the explanatory variables. If we find that these fluctuations are explained by one or more of these relationships (indicated by a non-zero coefficient), then we know that the variance of the residuals is not constant.

Regarding the auxiliary regression, the terms are chosen to capture the basic and most common cases: we include all the linear terms, their squared terms, and their interaction (cross-product) terms. We could add more terms, for instance exponentials, but this also brings a risk of overfitting, which would increase the likelihood of "seeing" things in the variance that are not actually there.

Note also the importance of the constant term in the auxiliary regression. It is extremely important.

To perform the test, we run the auxiliary regression. Then we run it again with only a constant term. This is the crux: we are checking whether our explanatory variables can explain fluctuations in the variance of the residuals to a higher degree than the single constant can. Our hope is that this is not the case, and that the constant-only regression captures just as much as the auxiliary regression does.

We find the RSS from both models and use them in an F-statistic.
ALTERNATIVELY, we use the LM approach. In that case, we use the fact that the R^2 of the auxiliary regression will be large if one or more of its coefficients are statistically significant (this approach uses only the auxiliary regression, not the constant-only one). If none of the coefficients are significant, R^2 will be relatively low. We obtain R^2, multiply by the sample size T, and use the fact that T·R^2 is a chi-squared distributed variable with m degrees of freedom (m = the number of regressors, excluding the constant).

The null hypothesis is that all coefficients in the auxiliary regression, except for the constant, are zero. We hope not to reject it.

56
Q

What should we do about autocorrelation between the residuals?

A

Breusch-Godfrey is the better option. Recall that DW is also a possibility, but it only checks a single lag.

Breusch-Godfrey creates a new regression with the residuals as the dependent variable, using r lagged residuals, each with a corresponding coefficient.
We also include the explanatory variables in the new regression. The aim is to explore whether any of the lag coefficients are significant; we want them to be close to zero.

The crux is that we are testing whether the fluctuations of the residuals around their mean can be explained by the explanatory variables and the lagged residuals together in a linear way.

Finally, we obtain R^2 and use the fact that (T - r)·R^2 is chi-squared distributed with r degrees of freedom, where T is the sample size and r is the number of lags we are looking at.

57
Q

One of the assumptions is that x_t is non-stochastic. The book actually says it is fine for x_t to be stochastic, provided the covariance between x_t and u_t is zero. Why?

A

If a variable is correlated with the error term, the estimator is not even consistent. The reason is that when the residual and some variable swing together, a large error comes with a large value of that explanatory variable, and the model will credit the variable with the resulting high value of y when in reality it was only the error that was high.

58
Q

What do we do to test for normality of the residuals?

A

The Bera-Jarque (BJ) test. It is a chi-squared test built from the skewness and excess kurtosis of the residuals; follow the book for the formula.
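A minimal sketch using the statsmodels implementation (the regression and data are illustrative; here the errors are drawn normal, so we expect not to reject):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

# Illustrative regression whose errors really are normal
rng = np.random.default_rng(9)
x = rng.normal(size=500)
y = 1 + 0.5 * x + rng.normal(size=500)
res = sm.OLS(y, sm.add_constant(x)).fit()

# The BJ/JB statistic is built from the skewness and excess kurtosis of the residuals, ~ chi2(2)
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(res.resid)
print(jb_stat, jb_pvalue)   # a large p-value -> no evidence against normality
```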

59
Q

What do we do if normality of the residuals is violated?

A

If the sample is large enough, nothing much needs to be done: appealing to asymptotic arguments (the central limit theorem), the test statistics still approximately follow their assumed distributions, so the violation is virtually inconsequential.

60
Q

In finance and econometrics, what typically causes the normality assumption to fail?

A

Outliers. A few extreme observations can make the residuals look non-normally distributed.

We can remove an outlier, in the sense of its residual, by including one dummy variable per outlier and assigning it the value 1 for that observation (0 elsewhere). The dummy fits that point exactly, so its residual is removed. This obviously comes at the risk of losing potentially important data.
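A minimal sketch of the dummy variable trick (the position of the outlier and the data are assumptions for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data with one extreme observation at position 42
rng = np.random.default_rng(10)
x = rng.normal(size=100)
y = 1 + 0.5 * x + rng.normal(size=100)
y[42] += 15                                   # the outlier

X = sm.add_constant(x)
dummy = np.zeros(100)
dummy[42] = 1.0                               # one dummy per outlier: 1 for that observation, 0 otherwise
X_dummy = np.column_stack([X, dummy])

res = sm.OLS(y, X_dummy).fit()
print(res.resid[42])                          # essentially zero: the dummy absorbs the outlier
```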

61
Q

Elaborate on VIF.

A

The aim of the VIF is to provide a measure we can use to see whether there is multicollinearity in the data or not.

The way it works is by checking how much the variance of a parameter estimate increases as a result of correlation among the explanatory variables.

We run an auxiliary regression in which x_i, one of the original explanatory variables, is the dependent variable, and the regressors are a constant (intercept) plus all the other explanatory variables. We then find the R^2 of this auxiliary regression; it captures the degree to which the other explanatory variables explain the fluctuations of x_i around its mean. If there is correlation, we expect this R^2 to be large, as a large R^2 represents a large share of explained variation.

We could use R^2 itself, but instead we subtract it from 1, giving the share of variation that is not explained, and use this as the denominator of a fraction: VIF_i = 1/(1 - R^2). This is a multiplier representing how much more variance the coefficient of x_i has compared to what it would have if all the explanatory variables were independent.
If R^2 is very large, the other explanatory variables are good predictors of x_i, which violates our assumption of zero covariance among explanatory variables; then (1 - R^2) is small and 1/(1 - R^2) is large, and vice versa.

RULE OF THUMB: if the VIF is below 5, we typically neglect the multicollinearity.
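A minimal sketch using statsmodels' VIF helper (the design matrix is simulated so that two columns are strongly related):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative regressors where x2 is almost a copy of x1
rng = np.random.default_rng(11)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)     # near-multicollinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF_i = 1 / (1 - R^2) from regressing column i on all the other columns
for i in range(1, X.shape[1]):                 # skip the constant column
    print(i, variance_inflation_factor(X, i))  # values well above 5 flag x1 and x2
```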

62
Q

What can we do if we observe high multicollinearity?

A

Advanced: PCA.
Not advanced: ignore it, drop one of the correlated variables, or transform the correlated variables into a ratio.

63
Q

What do we mean by "wrong functional form"?

A

Wrong functional form refers to assuming that the true form is, for instance, linear, when the relationship actually has a non-linear shape, such as a polynomial.

This is naturally something we want to test for.

64
Q

Elaborate on Ramsey's RESET test.

A

Perform the regular regression. Then use the predicted y-values in a new regression that still has the very same y-variable as the dependent variable, but now includes more terms: higher powers of the fitted values from the first regression. The ultimate goal is to see whether the coefficients on these higher-powered fitted-value terms are significant or not. If they are significant, we typically have a case of non-linearity.

65
Q

What is parameter stability testing about?

A

Figuring out whether the parameters are stable throughout the entire sample, or if some subsets are significantly different from others.

66
Q

The book mentions an alternative way to perform both the Chow test and the predictive failure test. What is it? Elaborate.

A

The dummy variable approach.

For instance, if we split a sample of size T into parts of sizes T1 and T2, we can use dummy variables to identify which observation belongs to which part.

The book sets this out as shown in the image (not reproduced here).

67
Q

Elaborate on the Chow test using the dummy variable approach.

A

The goal is to understand whether the parameters of the model are the same for the two subsets.

We use dummy variables to distinguish between the subsets.

Then we use a simple F-test to see whether the coefficients on the dummy terms are jointly zero or not.

The dummy variables only kick in when their subset is active, and provide a possible "correction". If no correction is needed, the first and second subsets behave alike and the parameters are stable.