3. MLR Flashcards
Purpose of Multiple Linear Regression
Explain the variation in y using several explanatory variables.
( Each coefficient is interpreted as the estimated change in y for a one-unit change in its explanatory variable, with all other variables held constant. )
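A minimal sketch of fitting an MLR by ordinary least squares and reading one coefficient. It assumes NumPy is available and uses hypothetical, noise-free data, so the fitted coefficients reproduce the generating values. Raising x1 by one unit while x2 stays fixed moves the prediction by exactly the x1 coefficient.

```python
import numpy as np

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: intercept column of ones, then one column per explanatory variable.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: choose b to minimise the sum of squared residuals.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 6))  # intercept, coefficient of x1, coefficient of x2

# Coefficient interpretation: raise x1 by one unit while holding x2 fixed.
base = np.array([1.0, 3.0, 2.0])    # [intercept term, x1 = 3, x2 = 2]
bumped = np.array([1.0, 4.0, 2.0])  # x1 + 1, x2 unchanged
change = bumped @ b - base @ b
print(change)  # approximately b[1]
```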
Assumptions for error terms
- Normally distributed with mean = 0 ( otherwise the estimators are no longer guaranteed efficient, and small-sample t- and F-tests are only approximate )
- Constant variance ( fix heteroscedasticity with weighted least squares )
- Error terms are independent of each other
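A minimal sketch of the weighted least squares fix for non-constant variance. It assumes NumPy and assumes the error variance is proportional to x squared, so each observation gets weight 1/x^2. Both the data and the weighting rule are hypothetical. WLS minimises the weighted sum of squared residuals, which is the same as running OLS on rows scaled by the square root of each weight.

```python
import numpy as np

# Hypothetical data: the size of the error grows with x (heteroscedasticity).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
noise = np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8])
y = 2.0 + 0.5 * x + noise

X = np.column_stack([np.ones_like(x), x])  # intercept + one explanatory variable

# Assumption: Var(error_i) is proportional to x_i**2, so weight_i = 1 / x_i**2.
w = 1.0 / x**2

# WLS minimises sum(w_i * residual_i**2); this equals OLS on rows scaled by sqrt(w_i).
sw = np.sqrt(w)
b_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Plain OLS for comparison (every observation weighted equally).
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS:", b_ols, "WLS:", b_wls)
```

Noisy observations (large x here) pull less on the WLS line than on the OLS line.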
Common issues of MLR
- Overfitting ( non-contributing variables )
- Multicollinearity ( redundant variables )
Testing for Overfitting
Fit a simple linear regression of y on each variable separately. A weak correlation signals a possibly non-contributing variable.
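A minimal sketch of this screening step, assuming NumPy and hypothetical data in which only x1 drives y. Each variable gets its own simple regression and Pearson correlation with y. A weak marginal correlation is a hint, not proof: a variable can still contribute once the others are in the model.

```python
import numpy as np

# Hypothetical data: x1 drives y, x2 is unrelated to y.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([3.0, -1.0, 2.0, 0.0, -2.0, 1.0, -3.0, 4.0])
noise = np.array([0.2, -0.1, 0.0, 0.1, -0.2, 0.1, 0.0, -0.1])
y = 4.0 + 1.5 * x1 + noise

correlations = {}
for name, x in [("x1", x1), ("x2", x2)]:
    slope, intercept = np.polyfit(x, y, 1)  # simple linear regression of y on x alone
    r = np.corrcoef(x, y)[0, 1]             # Pearson correlation between x and y
    correlations[name] = r
    print(f"{name}: slope = {slope:.3f}, r = {r:.3f}")
```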
Testing for Multicollinearity
( After checking significance of each variable )
- High correlation between explanatory variables
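- Coefficient estimates shift sharply ( even change sign ) when an explanatory variable is added or removed
- Inflated standard errors of the coefficients; adding the variable barely raises, or even lowers, adjusted R^2
- Variance Inflation Factor (VIF) > 10
( VIF_j = 1/(1 - R_j^2), where R_j^2 is the R^2 from regressing x_j on the other explanatory variables )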
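A minimal sketch of computing VIF_j = 1/(1 - R_j^2) by regressing each explanatory variable on the others. It assumes NumPy; the helper name `vif` and the data are hypothetical. x3 is built as almost exactly x1 + x2, so all three VIFs exceed 10. Once x3 is dropped, the VIFs fall back close to 1.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (explanatory variables only, no intercept column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # regress x_j on the other variables
        coef, *_ = np.linalg.lstsq(A, xj, rcond=None)
        resid = xj - A @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

# Hypothetical data: x3 is nearly x1 + x2 (redundant variable).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([3.0, -1.0, 2.0, 0.0, -2.0, 1.0, -3.0, 4.0])
x3 = x1 + x2 + np.array([0.01, -0.02, 0.01, 0.0, -0.01, 0.02, -0.01, 0.0])

vif_with_x3 = vif(np.column_stack([x1, x2, x3]))
vif_without_x3 = vif(np.column_stack([x1, x2]))
print(np.round(vif_with_x3, 1), np.round(vif_without_x3, 3))
```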
Testing significance
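- ANOVA (Analysis of Variance) / F-Test ( MODEL ): p < alpha means the model explains some of the variation in y with statistical significance, i.e. at least one coefficient is nonzero
- t-Test ( INDIVIDUAL VARIABLES ): p < alpha means that variable's coefficient differs significantly from zero, given the other variables in the model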
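A minimal sketch of the individual t-test statistics, assuming NumPy and hypothetical data where y depends on x1 but not on x2. Each t is the coefficient divided by its standard error, and the standard errors come from MSE times the diagonal of (X'X)^-1. The comparison uses a t distribution with n - k - 1 degrees of freedom. If SciPy is installed, `2 * scipy.stats.t.sf(abs(t), n - p)` gives two-sided p-values. That step is left out here so the sketch needs only NumPy.

```python
import numpy as np

# Hypothetical data: y depends on x1; x2 is irrelevant.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
x2 = np.array([0.5, -1.0, 1.5, 0.0, -0.5, 1.0, -1.5, 0.5, 0.0, -0.5])
e = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1])
y = 1.0 + 2.0 * x1 + e

X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape                       # p = k + 1 parameters (intercept + k slopes)
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
mse = (resid @ resid) / (n - p)      # SSE / (n - k - 1)
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t = b / se                           # H0: coefficient = 0, df = n - k - 1
print(np.round(t, 2))
```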
Goodness-of-fit measures
- R^2
- Adjusted R^2
- MSE
- F-Test (ANOVA)
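A minimal sketch of R^2, adjusted R^2 and MSE for an OLS fit, assuming NumPy. It defines MSE as SSE/(n - k - 1), matching the F-test denominator; some texts use SSE/n instead. The helper `fit_measures` and the data are hypothetical. x2 is constructed to carry no information about y, so adding it leaves R^2 unchanged while adjusted R^2 drops and MSE rises.

```python
import numpy as np

def fit_measures(X, y):
    """R^2, adjusted R^2 and MSE for OLS; X holds the k explanatory variables (no intercept)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])       # add intercept column
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ b) ** 2)             # error sum of squares
    sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    mse = sse / (n - k - 1)
    return r2, adj_r2, mse

# Hypothetical data: x2 is orthogonal to x1, the intercept and the noise.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
noise = np.array([0.2, -0.1, 0.0, 0.1, -0.2, 0.1, 0.0, -0.1])
y = 4.0 + 1.5 * x1 + noise

r2_a, adj_a, mse_a = fit_measures(x1[:, None], y)               # x1 only
r2_b, adj_b, mse_b = fit_measures(np.column_stack([x1, x2]), y)  # x1 and x2
print(r2_a, adj_a, mse_a)
print(r2_b, adj_b, mse_b)
```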
Formula for F-Test
F = MSR / MSE = (SSR/k) / (SSE/(n - k - 1)), where n = number of observations and k = number of explanatory variables
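A worked sketch of the F formula using hypothetical ANOVA values (n = 30, k = 3). It also shows the equivalent form written with R^2. That equivalence assumes an OLS fit with an intercept, so that SST = SSR + SSE and R^2 = SSR/SST.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F statistic: (SSR/k) / (SSE/(n - k - 1))."""
    msr = ssr / k             # mean square for regression
    mse = sse / (n - k - 1)   # mean square error
    return msr / mse

# Hypothetical ANOVA values: n = 30 observations, k = 3 explanatory variables.
ssr, sse, n, k = 120.0, 52.0, 30, 3
f = f_statistic(ssr, sse, n, k)

# Same statistic via R^2 = SSR / SST, with SST = SSR + SSE.
r2 = ssr / (ssr + sse)
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f, 2), round(f_from_r2, 2))  # 20.0 20.0
```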
Formula for R^2
R^2 = 1 - (SSE/SST)