Multicollinearity Flashcards
Perfect Multicollinearity
An exact linear relationship among the explanatory variables; the R^2 from regressing one explanatory variable on the others equals 1.
What happens to the estimators when multicollinearity exists?
With perfect multicollinearity we cannot identify unique estimates of the parameters and therefore cannot draw any statistical inferences (i.e., hypothesis tests) from the sample.
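A minimal Python sketch (synthetic data; the variable names and numbers are illustrative only) of why unique estimates do not exist: with an exact linear relationship between the regressors, the X'X matrix is singular and cannot be inverted.

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = 3.0 * x1                                # x2 is an exact linear function of x1
X = np.column_stack([np.ones(20), x1, x2])   # design matrix with intercept

# The design matrix has rank 2 rather than 3, so X'X is singular and
# (X'X)^(-1) -- needed for the OLS estimates -- does not exist.
print(np.linalg.matrix_rank(X))    # 2
print(np.linalg.det(X.T @ X))      # ~0, up to rounding error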
Near (Imperfect) Multicollinearity
Two or more explanatory variables are highly, but not exactly, linearly related.
Inferior Good
A good whose demand declines as income increases.
r
Coefficient of correlation; used to measure the strength or degree of collinearity between two variables. May not be adequate when more than two explanatory variables are involved.
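A small Python sketch (hypothetical income and wealth figures) of computing the pairwise correlation coefficient r between two explanatory variables:

import numpy as np

income = np.array([40, 45, 50, 60, 65, 70, 80, 90], dtype=float)
wealth = np.array([200, 220, 260, 290, 330, 350, 400, 450], dtype=float)

r = np.corrcoef(income, wealth)[0, 1]   # pairwise correlation coefficient
print(f"r = {r:.3f}")                   # a value near 1 signals strong collinearity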
Ordinary Least Squares (OLS)
OLS produces estimators with the smallest variances among linear unbiased estimators; they are BLUE: Best Linear Unbiased Estimators. OLS estimators remain BLUE even when one or more of the partial regression coefficients is statistically insignificant.
Unbiasedness
In repeated sampling, the expected value of the estimator equals the true parameter; unbiasedness is a repeated-sampling property, so the average of the estimates across samples converges to the true population value.
Consequences of Multicollinearity
- Multicollinearity does not destroy the OLS minimum-variance property; however, minimum variance does not mean the numerical value of the variance will be small.
- Large variances and standard errors, and therefore wider confidence intervals (see the sketch after this list).
- Small t values; insignificant t ratios.
- Failure to reject the null hypothesis, resulting in a Type II error.
- The individual influence of each X on Y cannot be estimated precisely.
- High R^2 but few statistically significant t ratios.
- OLS estimators and their standard errors become more sensitive to small changes in the data.
- Regression coefficients may have the wrong (unexpected) signs.
- Difficulty in assessing the individual contributions of the explanatory variables to the explained sum of squares or R^2, because the collinear variables move together and their separate effects cannot be disentangled.
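A Python sketch of several of these consequences at once (synthetic data; statsmodels is assumed to be available): two nearly identical regressors give a high overall R^2 yet large standard errors and small t ratios on the individual coefficients.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)     # x2 is nearly identical to x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
result = sm.OLS(y, X).fit()
print(result.summary())   # expect high R^2, wide confidence intervals, small t ratios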
What kind of problem is Multicollinearity?
It is a sample (regression) phenomenon: in a particular sample the Xs may be so collinear that reliable regression analysis is impossible, even though the Xs may not be linearly related in the population. The problem arises because data are typically nonexperimental, observed as they occur.
t critical vs t value
The critical t values are the cutoffs of the t distribution that define the rejection region; if the computed statistic falls beyond them, we reject the null hypothesis at the chosen significance level. The t value is the test statistic computed from the sample: the estimated coefficient divided by its standard error.
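A short Python sketch (the coefficient, standard error, and degrees of freedom are made-up numbers) comparing a computed t value with the critical t value from the t distribution:

from scipy import stats

beta_hat = 0.8    # estimated coefficient (hypothetical)
se = 0.5          # its standard error (hypothetical)
df = 25           # residual degrees of freedom (hypothetical)

t_value = beta_hat / se                   # test statistic for H0: beta = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df)    # 5% two-tailed critical value
print(t_value, t_crit)                    # reject H0 only if |t_value| > t_crit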
Sample Specific
- Multicollinearity is a matter of degree, not of presence or absence.
- It is a condition of the (non-stochastic) X variables; it is a feature of the sample, not of the population.
Indicators of Multicollinearity
- High R^2 but few significant t ratios
- High pairwise correlations among the explanatory variables; a correlation above 0.8 suggests possible multicollinearity
- Examination of partial correlations
- Subsidiary or auxiliary regressions
- Variance Inflation Factor
Partial correlation coefficient
The correlation between two variables, holding the influence of the other X variables constant.
Auxiliary Regression
Regress each X variable on the remaining Xs and compute the corresponding R^2; these regressions are subsidiary or auxiliary to the main regression. The goal is to find each coefficient of determination and then test whether it is statistically significant with an F test. A high auxiliary R^2 is a surface indicator that the variable is collinear with the others.
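A Python sketch of the auxiliary regressions (synthetic data; statsmodels assumed): each X is regressed on the remaining Xs, and the auxiliary R^2 and F-test p-value are reported.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.2, size=n)   # x2 is highly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)                       # the remaining Xs
    aux = sm.OLS(X[:, j], sm.add_constant(others)).fit()   # auxiliary regression
    print(f"x{j + 1}: auxiliary R^2 = {aux.rsquared:.3f}, F p-value = {aux.f_pvalue:.3g}")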
Variance Inflation Factor
VIF = 1 / (1 - R^2), where R^2 comes from the auxiliary regression of one X on the others. As that R^2 increases, the variance and standard error of the coefficient are inflated. VIF is undefined under perfect collinearity (R^2 = 1) and equals 1 when there is no collinearity (R^2 = 0).
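A Python sketch computing VIF directly from the auxiliary R^2 values (synthetic data; statsmodels assumed). statsmodels also provides a variance_inflation_factor helper in statsmodels.stats.outliers_influence that performs essentially the same calculation.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=n)   # strongly collinear with x1
X = np.column_stack([x1, x2])

for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    r2_j = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    vif_j = 1.0 / (1.0 - r2_j)                   # VIF = 1 / (1 - auxiliary R^2)
    print(f"VIF for x{j + 1}: {vif_j:.1f}")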