Data Analysis IIb: Advanced Regression Topics (Week 7) Flashcards
What is multicollinearity?
One assumption of Ordinary Least Squares (OLS) is that the IVs are INDEPENDENT of each other
-> When this assumption is violated, we have multicollinearity
- Multicollinearity has nothing to do with the DV; it is only about the IVs being highly related to one another
How do we detect multicollinearity?
From the correlation matrix of the IVs
Very high correlations indicate multicollinearity
The cutoff is arbitrary: e.g. |r| > 0.9 (the sign does not matter)
Limitation: Only BIVARIATE relationships
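The correlation-matrix check above can be sketched in a few lines. This is a minimal illustration with simulated data (the variable names and the 0.9 cutoff are just the example values from the card, not a fixed rule):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly a copy of x1 -> collinear
x3 = rng.normal(size=200)                   # unrelated to the others

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)         # 3x3 correlation matrix of the IVs

# Flag any PAIR whose |r| exceeds the (arbitrary) 0.9 cutoff;
# the upper triangle avoids the diagonal and duplicate pairs
high = np.argwhere(np.triu(np.abs(corr) > 0.9, k=1))
print(high)  # -> [[0 1]]: x1 and x2 are highly correlated
```

Note that this only ever inspects pairs of IVs, which is exactly the bivariate limitation the card mentions.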
Apart from the correlation matrix, what is a better approach to detect multicollinearity?
Variance Inflation Factors (VIFs)
- Looks at multiple IVs at once (instead of only pairs)
How do we obtain VIFs?
Estimate auxiliary regression models using each IV in turn as the DV, with all the other IVs as predictors
Example (k IVs):
X1i = β0 + β1 X2i + β2 X3i + … + β(k-1) Xki + εi
X2i = β0 + β1 X1i + β2 X3i + … + β(k-1) Xki + εi
How do we determine the VIF of models?
VIF = 1 / (1 - R^2), where R^2 comes from the auxiliary regression for that IV
Cutoff arbitrary: e.g. 10
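Putting the two steps together, each VIF is just 1/(1 - R²) from one auxiliary regression. A minimal sketch using only numpy least squares (simulated data; the helper name `vif` is mine, not from the card):

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress X[:, j] on the remaining columns
    (plus an intercept) and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()                   # R^2 of the auxiliary model
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.2, size=500)   # collinear with x1
x3 = rng.normal(size=500)                   # unrelated
X = np.column_stack([x1, x2, x3])

print([round(vif(X, j), 1) for j in range(3)])
# x1 and x2 get large VIFs (well above 10); x3 stays near 1
```

Note how the collinear pair triggers the cutoff even though x3 is harmless, matching the interpretation on the next card.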
What does having a high VIF score mean?
Most of that variable's variance is captured (explained) by the other IV(s)
What do we do when VIF>10?
If VIF > 10, we need to drop some variables from the model because they are highly correlated with each other.
Either leave out the variable with the highest VIF score, OR combine the variables into one variable (e.g. using factor analysis)
What are dummy variables?
Binary (0/1) variables that represent the categories of a categorical variable in data analysis
What are the advantages of using dummy variables in data analysis?
Using age as a continuous variable in a regression model assumes a linear relationship
- Very restrictive assumption: every one-unit increment is assumed to have the same (constant/equal) effect on the DV
- Dummy variables overcome this by giving each category its own coefficient
What are moderators?
Variables that affect the relationship between the IV and the DV
They do not directly affect the DV
How do we estimate moderation?
Include complaints × apology (i.e. an interaction term) in the regression equation, together with the main effects of complaints and apology
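The moderation setup above can be sketched with simulated data. This is an illustration, not the course's example: I simulate a DV where apology weakens the complaints effect (true interaction coefficient = -0.5) and recover it by OLS:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
complaints = rng.normal(size=n)
apology = rng.integers(0, 2, size=n).astype(float)   # 0/1 dummy IV

# Simulated DV: apology moderates (weakens) the complaints effect
y = (1.0 + 0.8 * complaints + 0.3 * apology
     - 0.5 * complaints * apology
     + rng.normal(scale=0.1, size=n))

# Design matrix: intercept, BOTH main effects, and the interaction term
X = np.column_stack([np.ones(n), complaints, apology, complaints * apology])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))  # estimates close to the true [1.0, 0.8, 0.3, -0.5]
```

A non-zero coefficient on the interaction term is the evidence of moderation; the main effects must stay in the model so the interaction is interpretable.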