2.1 Why Multiple Regression Isn't As Easy As It Looks Flashcards
What is heteroskedasticity?
Define unconditional and conditional heteroskedasticity.
Recall that one of the assumptions of multiple regression is that the variance of the residuals is constant across observations.
Heteroskedasticity occurs when the variance of the residuals is not the same across all observations in the sample.
- unconditional heteroskedasticity: occurs when the heteroskedasticity is not related to the level of the independent variable, which means that it doesn’t systematically increase or decrease with changes in the value of the independent variable(s). While this is a violation of the equal variance assumption, it usually causes NO MAJOR PROBLEMS.
- conditional heteroskedasticity: is heteroskedasticity that is related to the level of (i.e., conditional on) the independent variables. For example, conditional heteroskedasticity exists if the variance of the residual term increases as the value of the independent variable increases. Conditional heteroskedasticity does CREATE SIGNIFICANT PROBLEMS.
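A minimal sketch (in Python with numpy, which is not part of the curriculum) of what conditional heteroskedasticity looks like: the error variance is made to grow with the independent variable, so the residual spread is systematically larger for larger x. All numbers are hypothetical.

# Sketch: simulate conditional heteroskedasticity (illustrative only).
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1.0, 10.0, size=n)

# Error variance grows with x, so the heteroskedasticity is "conditional on" x.
errors = rng.normal(loc=0.0, scale=0.5 * x)
y = 2.0 + 3.0 * x + errors

# Residual spread for large x should be visibly larger than for small x.
low, high = x < np.median(x), x >= np.median(x)
print("std of errors, low x: ", errors[low].std())
print("std of errors, high x:", errors[high].std())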
What is the effect of heteroskedasticity on regression analysis?
There are four effects of heteroskedasticity you need to be aware of:
- The standard errors are usually unreliable estimates
- The coefficient estimates (the bj) aren't affected
- If the standard errors are too small, while the coefficient estimates themselves are not affected, the t-statistics will be too large and the null hypothesis of no statistical significance will be rejected too often. The opposite is true if the standard errors are too large
- The F-test is also unreliable
How do we detect heteroskedasticity?
There are two methods to detect heteroskedasticity:
- Examining scatter plots of the residuals (e.g., plotted against the independent variable), and
- Breusch-Pagan chi-square test:
The test calls for the regression of the squared residuals on the independent variables. If conditional heteroskedasticity is present, the independent variables will significantly contribute to the explanation of the squared residuals:
BP chi-square statistic = n * R^2, where R^2 is from a second regression of the squared residuals on the independent variables
with k degrees of freedom (k = the number of independent variables)
ONE TAILED TEST because heteroskedasticity is only a problem if the R^2 and the BP test statistic are too large.
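A minimal sketch of the BP calculation described above, using made-up data and Python/statsmodels (statsmodels also ships a ready-made het_breuschpagan routine, but the point here is the n * R^2 formula):

# Sketch: Breusch-Pagan statistic = n * R^2 from regressing squared residuals on X.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, k = 250, 2
X = sm.add_constant(rng.normal(size=(n, k)))        # intercept + k regressors
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=1 + np.abs(X[:, 1]))

resid = sm.OLS(y, X).fit().resid                    # first (original) regression
aux = sm.OLS(resid ** 2, X).fit()                   # squared residuals on the X's
bp_stat = n * aux.rsquared                          # BP = n * R^2, df = k
p_value = chi2.sf(bp_stat, df=k)                    # one-tailed chi-square test
print(bp_stat, p_value)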
How do we correct for heteroskedasticity?
The most common remedy and the one recommended in the CFA curriculum is to calculate ‘robust standard errors’, also called White-corrected standard errors.
In the questions that require it, the White-corrected standard error will be provided.
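A minimal sketch, assuming Python/statsmodels, of requesting White-corrected standard errors; statsmodels' "HC0" covariance type corresponds to White's heteroskedasticity-consistent estimator, and the data below are hypothetical:

# Sketch: White-corrected (heteroskedasticity-consistent) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=np.abs(X[:, 1]) + 0.5)

ols = sm.OLS(y, X).fit()                   # ordinary standard errors
white = sm.OLS(y, X).fit(cov_type="HC0")   # White-corrected standard errors

# Coefficients are identical; only the standard errors (and t-stats) change.
print(ols.bse)
print(white.bse)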
What is serial correlation or autocorrelation? Explain positive and negative serial correlation?
Serial correlation, or autocorrelation, refers to the situation in which the residual terms are correlated with one another. Serial correlation is a relatively common problem with time-series data.
- Positive Serial Correlation: when a positive regression error in one time period increases the probability of observing a positive regression error for the next period.
- Negative Serial Correlation: when a positive error in one period increases the probability of observing a negative error in the next period.
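A minimal sketch (Python/numpy, hypothetical parameters) of positively serially correlated errors generated as an AR(1) process; a negative rho would produce negative serial correlation instead:

# Sketch: positively serially correlated (AR(1)) errors for a time-series regression.
import numpy as np

rng = np.random.default_rng(7)
n, rho = 200, 0.8                 # rho > 0 -> positive serial correlation
shocks = rng.normal(size=n)

errors = np.zeros(n)
for t in range(1, n):
    errors[t] = rho * errors[t - 1] + shocks[t]

# Correlation between consecutive errors should be close to rho.
print(np.corrcoef(errors[1:], errors[:-1])[0, 1])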
What is the effect of serial correlation on regression analysis?
Because of the tendency of the data to cluster together from observation to observation, positive serial correlation typically results in coefficient standard errors that are too small, even though the estimated coefficients are consistent. These small standard errors will cause the computed t-statistics to be larger than they should be, which will cause too many Type I errors: rejection of the null hypothesis when it is actually true. The F-test will also be unreliable because the MSE will be underestimated, leading again to too many Type I errors.
- Positive serial correlation is much more common in economic and financial data, so we focus our attention on its effects. Additionally, serial correlation in a time series regression may make parameter estimates inconsistent.
How do we detect serial correlation?
There are two methods:
- Residual plots
- Durbin-Watson (DW) statistic
If the sample size is very large, DW ≈ 2(1 - r)
r = correlation coefficient between residuals from one period and those from the previous period
So, the DW statistic is approximately equal to 2 if the error terms are homoskedastic and not serially correlated (r = 0). (A numerical sketch of DW follows this answer.)
DW < 2 if the error terms are positively serially correlated (r > 0)
DW > 2 if the error terms are negatively serially correlated (r < 0)
• But how much below the magic number 2 is statistically significant enough to reject the null hypothesis of no positive serial correlation?
There are tables of DW statistics that provide upper (du) and lower (dl) critical DW values:
- If DW < dl (the lower critical value): reject the null hypothesis and conclude that the error terms are positively serially correlated
- If dl ≤ DW ≤ du: the test is inconclusive
- If DW > du (the upper critical value): there is no evidence that the error terms are positively correlated
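The numerical sketch referenced above (Python/statsmodels, hypothetical data): residuals from a regression with positively autocorrelated errors give a DW statistic well below 2 and close to 2(1 - r):

# Sketch: Durbin-Watson statistic on regression residuals (hypothetical data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n, rho = 250, 0.7
x = rng.normal(size=n)

# Build positively autocorrelated errors, then a y that depends on x.
e = np.zeros(n)
shocks = rng.normal(size=n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + shocks[t]
y = 1.0 + 2.0 * x + e

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
dw = durbin_watson(resid)   # sum of squared changes in residuals / sum of squared residuals
r = np.corrcoef(resid[1:], resid[:-1])[0, 1]
print(dw, 2 * (1 - r))      # DW should be well below 2 and close to 2(1 - r)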
How do we correct for serial correlation?
Two ways:
- The Hansen method, which adjusts the coefficient standard errors for serial correlation (no further explanation is given in these notes); a related robust-standard-error sketch follows this list
- Improve the specification of the model
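The curriculum does not spell out the Hansen calculation; as a rough stand-in, the sketch below (Python/statsmodels, hypothetical data) uses HAC (Newey-West-style) standard errors, which are robust to serial correlation and heteroskedasticity. This is an illustration, not necessarily the exact Hansen formula:

# Sketch: serial-correlation-robust (HAC) standard errors as a stand-in
# for the Hansen-type adjustment described in the curriculum.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, rho = 300, 0.6
x = rng.normal(size=n)

e = np.zeros(n)
shocks = rng.normal(size=n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + shocks[t]
y = 0.5 + 1.5 * x + e

X = sm.add_constant(x)
plain = sm.OLS(y, X).fit()                                          # standard errors too small
robust = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # serial-correlation-robust

print(plain.bse)
print(robust.bse)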
What is multicollinearity?
Multicollinearity refers to the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other. This condition distorts the standard error of estimate and the coefficient standard errors, leading to problems when conducting t-tests for statistical significance of parameters.
What is the effect of multicollinearity on regression analysis?
Even though multicollinearity does not affect the consistency of the slope coefficients, the coefficient estimates themselves tend to be unreliable.
Additionally, the standard errors of the slope coefficients are artificially inflated. Hence, there is a greater probability that we will incorrectly conclude that a variable is not statistically significant (Type II error).
How do we detect multicollinearity?
The most common sign of multicollinearity is the situation where t-tests indicate that none of the individual coefficients is significantly different from zero, while the F-test is statistically significant and the R^2 is high.
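A minimal sketch (Python/statsmodels, made-up data) that reproduces this symptom: two nearly identical regressors yield a high R^2 and a significant F-test, while the individual t-tests are typically insignificant:

# Sketch: the classic multicollinearity symptom -- high R^2 / significant F-test,
# but no individually significant slope coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # x2 is almost a copy of x1
y = 1.0 + 2.0 * (x1 + x2) / 2 + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(res.rsquared, res.f_pvalue)   # high R^2, very small F-test p-value
print(res.pvalues[1:])              # individual slope p-values are typically large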
How do we correct for multicollinearity?
The most common method to correct for multicollinearity is to omit one or more of the correlated independent variables. Unfortunately, it is not always an easy task to identify the variable(s) that are the source of the multicollinearity. There are statistical procedures that may help in this effort, like stepwise regression, which systematically removes variables from the regression until multicollinearity is minimized.
Explain Breusch-Pagan test.
The Breusch-Pagan test is a way to detect conditional heteroskedasticity.
The test calls for a regression of the squared residuals on the independent variables. If conditional heteroskedasticity is present, the independent variables will significantly contribute to the explanation of the squared residuals:
BP chi-square statistic = n * R^2, where R^2 is from a second regression of the squared residuals on the independent variables
with k degrees of freedom (k = the number of independent variables)
ONE-TAILED TEST because heteroskedasticity is only a problem if the R^2 and the BP test statistic are too large.