Reading 2: Multiple Regression Flashcards
Adjusted R2
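Adjusted R2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R2), where n = number of observations and k = number of independent variables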
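- R2 never decreases as variables are added to the model, even when they add little explanatory power, so it tends to overstate the fit. Adjusted R2 corrects for this “overestimation of the regression”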
- R2a will always be less than or equal to R2, and it can be negative
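A minimal sketch of the adjusted R2 formula above. The function name and the sample inputs (R2 = 0.60, n = 50, k = 4) are hypothetical, chosen only to show the penalty for adding variables.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R2).

    n = number of observations, k = number of independent variables.
    """
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)


# Hypothetical inputs: R2 = 0.60, n = 50 observations, k = 4 independent variables
print(round(adjusted_r2(0.60, 50, 4), 4))  # 0.5644
```

With a low R2 and many variables relative to observations (for example R2 = 0.05, n = 20, k = 5), the same function returns a negative value.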
Dummy Variables
- Binary: take a value of 1 (on) or 0 (off)
- n classes require n - 1 dummy variables; the omitted class is the base case
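A small sketch of the n - 1 rule, assuming a hypothetical quarterly variable with four classes where Q1 is chosen as the omitted base class. The helper name `make_dummies` is illustrative only.

```python
from typing import List, Sequence


def make_dummies(value: str, classes: Sequence[str]) -> List[int]:
    """Encode one observation with len(classes) - 1 dummy variables.

    The first entry of `classes` is the omitted base class, represented by all
    dummies equal to 0. Adding a dummy for the base class too would make the
    dummies sum to 1 in every row, which is perfectly collinear with the intercept.
    """
    if value not in classes:
        raise ValueError(f"unknown class: {value!r}")
    return [1 if value == c else 0 for c in classes[1:]]


quarters = ["Q1", "Q2", "Q3", "Q4"]  # hypothetical classes; Q1 is the base class
for q in quarters:
    print(q, make_dummies(q, quarters))
# Q1 [0, 0, 0]
# Q2 [1, 0, 0]
# Q3 [0, 1, 0]
# Q4 [0, 0, 1]
```

Each dummy coefficient is then read as the difference between that quarter and the base quarter.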
Heteroskedasticity
- Occurs when the variance of the residuals is not constant across all observations
- Unconditional: not related to the level of the independent variables (causes no major problems)
- Conditional: related to the level of the independent variables (causes problems)
Effects of heteroskedasticity on regression analysis
1) Standard errors are unreliable
2) Coefficients are unaffected
3) t-stats will be too big or too small
4) F-test is unreliable
Detecting Heteroskedasticity
- Examine scatter plots of the residuals against the independent variables
- Breusch-Pagan chi-square test: BP = n * R2resid, where R2resid is the R2 from a second regression of the squared residuals from the first regression on the independent variables; BP is chi-square distributed with k degrees of freedom
- *One-tailed test, because conditional heteroskedasticity is only a problem if R2resid (and therefore the BP test statistic) is too large
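A stdlib-only sketch of the Breusch-Pagan statistic for the simplest case of one independent variable (k = 1). With a single regressor, the R2 of the second regression equals the squared correlation between x and the squared residuals. The residuals below are hypothetical, and 3.841 is the standard 5% chi-square critical value for 1 degree of freedom; for k > 1 you would need a multiple regression for the second step.

```python
from typing import Sequence


def r_squared_simple(x: Sequence[float], y: Sequence[float]) -> float:
    """R2 of a simple regression of y on x (with intercept) = squared correlation."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # nothing to explain (or no variation in x)
    return sxy * sxy / (sxx * syy)


def breusch_pagan(x: Sequence[float], residuals: Sequence[float]) -> float:
    """BP = n * R2 from regressing the squared residuals on the independent variable."""
    squared = [e * e for e in residuals]
    return len(residuals) * r_squared_simple(x, squared)


CHI2_CRIT_5PCT_DF1 = 3.841  # one-tailed 5% critical value, chi-square with 1 df

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
resid = [0.1, -0.2, 0.4, -0.5, 0.9, -1.1]  # hypothetical residuals that fan out as x grows
bp = breusch_pagan(x, resid)
print(round(bp, 3), "reject H0" if bp > CHI2_CRIT_5PCT_DF1 else "fail to reject H0")
```

Because the squared residuals rise steadily with x here, R2resid is large and the statistic lands above the critical value. Six observations is far too few for a real test; the numbers only illustrate the mechanics.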
Correcting Heteroskedasticity
Option 1: Calculate robust standard errors (White-corrected standard errors)
Option 2: Generalized least squares: eliminates heteroskedasticity by modifying the original equation
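A sketch of Option 1 for a one-variable regression, comparing the usual OLS standard error of the slope with the White-corrected (HC0) version, Var(b1) = sum((x_i - xbar)^2 * e_i^2) / Sxx^2. The data are hypothetical, and real software may apply small-sample adjustments (HC1 and later variants) that this sketch omits.

```python
import math
from typing import Sequence, Tuple


def slope_standard_errors(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = b0 + b1*x by OLS; return (b1, usual SE of b1, White SE of b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # Usual OLS standard error assumes a constant error variance
    sse = sum(e * e for e in resid)
    usual_se = math.sqrt(sse / (n - 2) / sxx)

    # White (HC0) standard error weights each squared residual by its own (x_i - xbar)^2
    white_se = math.sqrt(sum((xi - x_bar) ** 2 * e * e for xi, e in zip(x, resid))) / sxx
    return b1, usual_se, white_se


# Hypothetical data whose scatter around the line widens as x grows
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.8, 6.3, 7.5, 10.9, 11.2, 15.8, 14.9]
b1, usual_se, white_se = slope_standard_errors(x, y)
print(f"b1 = {b1:.3f}, usual SE = {usual_se:.3f}, White SE = {white_se:.3f}")
```

The slope b1 is the same either way; only the standard error (and therefore the t-stat) changes, matching the point that heteroskedasticity leaves the coefficients unaffected.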
Serial Correlation (autocorrelation)
- Residual terms are correlated with one another
Positive: a positive regression error in one time period increases the probability of observing a positive regression error in the next time period.
Negative: a positive regression error in one time period increases the probability of observing a negative regression error in the next time period.
Effect of Serial Correlation on Regression Analysis
- Positive serial correlation results in coefficient standard errors that are too small
- Small standard errors will cause computed t-stats to be larger than they should be, which will cause too many Type I errors (rejection of null when it is actually true)
- F-test will also be unreliable because the MSE will be underestimated leading to too many Type I errors
Detecting Serial Correlation
- Residual plots
- Durbin-Watson statistic: DW ≈ 2(1 - r)
- r = correlation coefficient between residuals from one period and those from the previous period
- DW ≈ 2 (homoskedastic and not serially correlated, r = 0)
- DW < 2 (positively serially correlated)
- DW > 2 (negatively serially correlated)
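A sketch of computing DW from a series of residuals using the standard definition DW = sum over t = 2..T of (e_t - e_(t-1))^2 divided by sum over t = 1..T of e_t^2, plus the inversion r ≈ 1 - DW/2 of the large-sample approximation above. The residual series is hypothetical.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """DW = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def implied_r(dw: float) -> float:
    """Invert the large-sample approximation DW ~ 2(1 - r), giving r ~ 1 - DW / 2."""
    return 1 - dw / 2


# Hypothetical residuals that tend to keep the same sign from one period to the next
resid = [0.5, 0.7, 0.4, 0.1, -0.3, -0.6, -0.4, -0.1, 0.2, 0.6]
dw = durbin_watson(resid)
print(f"DW = {dw:.3f}, implied r = {implied_r(dw):.3f}")
```

Residuals that stay on one side of zero for several periods give small successive differences and a DW well below 2; residuals that flip sign every period push DW above 2.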
Durbin Watson decision rule
H0: The regression has no positive serial correlation
There are lower (dl) and upper (du) critical DW values:
- If DW < dl, the error terms are positively serially correlated (reject the null)
- If dl < DW < du, the test is inconclusive
- If DW > du, there is no evidence that the error terms are positively serially correlated (fail to reject the null)
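The decision rule above as a small function. The critical values dl and du depend on the sample size, the number of independent variables, and the significance level and must be looked up in a DW table; the values 1.20 and 1.41 used below are hypothetical placeholders. Treating a DW exactly equal to dl or du as inconclusive is an assumption, since the rule is stated with strict inequalities.

```python
def dw_decision(dw: float, d_lower: float, d_upper: float) -> str:
    """Durbin-Watson test of H0: no positive serial correlation."""
    if dw < d_lower:
        return "reject H0: positive serial correlation"
    if dw <= d_upper:  # boundary values treated as inconclusive (assumption)
        return "inconclusive"
    return "fail to reject H0: no evidence of positive serial correlation"


D_L, D_U = 1.20, 1.41  # hypothetical critical values; use a DW table in practice
for dw in (0.95, 1.30, 1.90):
    print(dw, dw_decision(dw, D_L, D_U))
```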
Correcting Serial Correlation
- Adjust the coefficient standard errors: Hansen method
- The Hansen method also corrects for conditional heteroskedasticity (use it if both problems are present)
- Improve the specification of the model
Multicollinearity
- Refers to the condition when two or more independent variables, or linear combinations of independent variables, are highly correlated with each other
Effects of multicollinearity on regression analysis
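- Coefficient estimates are unreliable
- Standard errors are artificially inflated
- t-stats are artificially small, so there is a greater probability of Type II errors (failing to reject a null that is actually false)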
Detecting Multicollinearity
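- t-tests indicate that none of the individual coefficients is significantly different from zero, while the F-test is significant and R2 is high
- As a rule of thumb, multicollinearity is an issue when the absolute value of the sample correlation between any two independent variables exceeds 0.7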
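A sketch of the 0.7 rule of thumb: compute the pairwise correlations between independent variables and flag any pair whose absolute correlation exceeds the threshold. The variable names and values are hypothetical. Keep in mind that a pairwise check is most informative with two independent variables; with more, multicollinearity can be present even when every pairwise correlation is low.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def flag_correlated_pairs(
    columns: Dict[str, Sequence[float]], threshold: float = 0.7
) -> List[Tuple[str, str, float]]:
    """Return pairs of independent variables whose |correlation| exceeds threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = correlation(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical independent variables: x2 is exactly 2 * x1, x3 is nearly unrelated
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "x3": [2.0, 5.0, 1.0, 4.0, 3.0],
}
print(flag_correlated_pairs(data))  # [('x1', 'x2', 1.0)]
```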
Types of Misspecification
1) Functional form can be misspecified
- important variables are omitted
- variables should be transformed
- Data are improperly pooled (e.g., the wrong time period is chosen)
2) Explanatory variables are correlated with the error term in time series models
- a lagged dependent variable is used as independent variable
- a function of the dependent variable is used as an independent variable (“forecasting the past”)
- Independent variables are measured with error
3) Other time-series misspecifications that result in nonstationarity
Unbiased estimator
- The expected value of the estimator equals the parameter you are trying to estimate.
Consistent estimator
- accuracy of the parameter estimate increases as the sample size increases.
- as sample size approaches infinity, standard error approaches zero
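A simple illustration of the last point, assuming the estimator is the sample mean of independent observations with a known standard deviation sigma (a hypothetical value of 10 here). Its standard error, sigma / sqrt(n), halves each time the sample size is quadrupled and approaches zero as n grows.

```python
import math


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation of 10
for n in (25, 100, 400, 10_000):
    print(n, standard_error_of_mean(10.0, n))
# 25 2.0
# 100 1.0
# 400 0.5
# 10000 0.1
```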