Linear Regression Flashcards
Potential Problem with Linear Regression
- Non Linearity 2.Correlation of Error terms 3.Non Constant variance 4.Outliers, 5.High-Leverage 6. Colinearity
Residual Plots in Linear Regression
(1) if there is a shape of the residual that is non-linear, suggests a non linear relationship in the data. (2) Also, can you residual plots to detect hetroscedasticity in the dataset, which will look like a funnel (3) Also look for tracking in the residuals, which may occur if error terms are correlated - think time series data
2 assumptions of linear model
(1) Additive and (2) Linear
Outliers in Linear Regression
Can skew the model, may want to remove these. Also there is a statistic called stdentized residual, for which a value greater than 3 suggests possible outliers.
Levarge in Linear Regression
When there is an extreme X value, can compute a leverage statistic.
Will Correlation Matrix Detect Colinearity
Yes, but not all cases. It is possible for combinations of vars to have colinearity. There is a statistic called the Variance Inflation Factor VIF. VIF > 5 indicates colinearity.
Variance Inflation Factor
VIF > 5 indicates colinearity
Studentized Residual
helps detect outliers in Y, value > 3 suggests outliers.
Leverage Statistic
If exceeds (p + 1) / n then the corresponding point has high leverage.
Possible Solutions for colinearity
(1) drop one of the vars (2) combine the two, like take the average
Bias vs Variance Tradeoff
Variance - tendency to overfit. Bias - Accuracy of Model
Parametric vs. Non Paramteric Model Accuracy
Parametric will tend to outpeform NP when there is small N/P because of high dimensionality. Think about what high dimensionality does to KNN
When order doesn’t matter permutations or combinations? Formula for each
Order doesnt' matter = combination Combination: P choose K = P! / K! (P-K)! Permutation: P choose K = P! / (P-K)!
Confusion Matrix
Y axis (left): Predicted Status X axis (top): Actual or True Status
Sensitivity vs Specificity
Sensitivity: % of true positives caught.
Specificty: % of non-positives caught.