Week 3 & 4 Flashcards
RSE
Residual Standard Error
RSE = √(RSS / (n − 2))
R-squared
R^2 = (TSS − RSS) / TSS = 1 − RSS / TSS
TSS
Total sum of squares:
TSS = Σ_{i=1}^n (y_i − ȳ)^2
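To make these three quantities concrete, here is a minimal numpy sketch (the data and variable names are illustrative, not from the course) that fits a simple regression and computes RSS, TSS, RSE, and R^2:

```python
import numpy as np

# Illustrative data (assumed for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(y)

# Simple least squares fit: y_hat = b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

rss = np.sum((y - y_hat) ** 2)     # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
rse = np.sqrt(rss / (n - 2))       # residual standard error
r2 = 1 - rss / tss                 # R-squared
print(rse, r2)
```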
βj
The average effect on the response Y of a one-unit increase in Xj, holding all other predictors fixed.
Interpreting regression coefficients
- The ideal scenario is when the predictors are uncorrelated — a balanced design
- Correlations amongst predictors cause problems: the variance of all coefficients tends to increase, and interpretations become hazardous.
- Claims of causality should be avoided for observational data.
Estimation and Prediction for Multiple Regression
• Given estimates β̂0, β̂1, . . . , β̂p, we can make predictions using the formula
ŷ = β̂0 + β̂1 x1 + β̂2 x2 + · · · + β̂p xp
• We estimate β0, β1, . . . , βp as the values that minimize the sum of squared residuals
RSS = Σ_{i=1}^n (y_i − ŷ_i)^2 = Σ_{i=1}^n (y_i − β̂0 − β̂1 x_{i1} − · · · − β̂p x_{ip})^2
This is done using standard statistical software. The values β̂0, β̂1, . . . , β̂p that minimize RSS are the multiple least squares regression coefficient estimates.
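As a sketch of how such estimates can be computed, np.linalg.lstsq returns the coefficient vector minimizing RSS; the data below are illustrative assumptions:

```python
import numpy as np

# Illustrative data: n = 6 observations, p = 2 predictors (assumed)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.5, 4.0, 8.2, 9.1, 12.8, 13.5])

# Prepend a column of ones so the first coefficient is the intercept
X1 = np.column_stack([np.ones(len(y)), X])

# beta_hat minimizes RSS = ||y - X1 @ beta||^2
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)       # [intercept, beta_1, beta_2]
print(X1 @ beta_hat)  # fitted values y_hat
```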
F-statistic
F = [(TSS − RSS)/p] / [RSS/(n − p − 1)] ∼ F_{p, n−p−1}
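A minimal sketch of computing this F-statistic (for the null hypothesis that all p slope coefficients are zero) on simulated data; the data-generating model here is an assumption for illustration:

```python
import numpy as np
from scipy import stats

# Simulated data (assumed): n observations, p predictors
n, p = 50, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

# Fit by least squares and compute RSS and TSS
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
rss = np.sum((y - X1 @ beta_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

# F-statistic and its upper-tail p-value under F_{p, n-p-1}
f = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(f, p, n - p - 1)
print(f, p_value)
```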
Forward selection
• Begin with the null model — a model that contains an intercept but no predictors.
• Fit p simple linear regressions and add to the null model the variable that results in the lowest RSS.
• Add to that model the variable that results in the lowest RSS amongst all two-variable models.
• Continue until some stopping rule is satisfied, for example when all remaining variables have a p-value above some threshold.
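A minimal sketch of the greedy lowest-RSS search described above; for simplicity it stops after a fixed number of variables rather than using a p-value rule, and the data are simulated assumptions:

```python
import numpy as np

def rss(X1, y):
    """RSS of the least squares fit of y on X1 (intercept column included)."""
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.sum((y - X1 @ beta) ** 2)

def forward_selection(X, y, max_vars):
    """Greedily add the variable that most reduces RSS at each step."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    while remaining and len(chosen) < max_vars:
        # Try each remaining variable; keep the one giving the lowest RSS
        scores = [(rss(np.column_stack([np.ones(n), X[:, chosen + [j]]]), y), j)
                  for j in remaining]
        _, best_j = min(scores)
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

# Illustrative use on simulated data (assumed)
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=40)
print(forward_selection(X, y, max_vars=2))  # likely picks columns 0 and 3
```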
Backward selection
- Start with all variables in the model.
- Remove the variable with the largest p-value — that is, the variable that is the least statistically significant.
- The new (p − 1)-variable model is fit, and the variable with the largest p-value is removed.
- Continue until a stopping rule is reached, for instance when all remaining variables have a p-value below some significance threshold.
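A sketch of backward selection with the largest-p-value rule, using statsmodels for the fits; the threshold alpha and the simulated data are illustrative assumptions:

```python
import numpy as np
import statsmodels.api as sm

def backward_selection(X, y, alpha=0.05):
    """Repeatedly drop the least significant predictor until all p-values <= alpha."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = fit.pvalues[1:]        # slope p-values (skip the intercept)
        worst = int(np.argmax(pvals))  # least significant remaining variable
        if pvals[worst] <= alpha:      # stopping rule: everything significant
            break
        cols.pop(worst)
    return cols

# Illustrative use on simulated data (assumed)
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
y = 3.0 * X[:, 1] + rng.normal(size=60)
print(backward_selection(X, y))        # likely keeps only column 1
```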
Qualitative predictors
These are also called categorical predictors or factor variables.
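Qualitative predictors enter a regression through dummy (indicator) variables; a minimal pandas sketch, with the 'region' factor assumed for illustration:

```python
import pandas as pd

# Illustrative data (assumed): a three-level qualitative predictor
df = pd.DataFrame({"region": ["East", "West", "South", "East", "West"],
                   "sales": [10.2, 8.7, 12.1, 9.9, 8.3]})

# One dummy per level except a baseline (drop_first avoids redundancy
# with the intercept); coefficients are then contrasts with the baseline
X = pd.get_dummies(df[["region"]], drop_first=True)
print(X)  # columns region_South, region_West; East is the baseline
```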
Extensions of the Linear Model
Removing the additive assumption: interactions and nonlinearity
Hierarchy principle:
If we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant.
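A minimal sketch of a fit that respects the hierarchy principle: the x1*x2 interaction column is included together with both main effects (data simulated for illustration):

```python
import numpy as np

# Simulated data with a genuine interaction (assumed for illustration)
rng = np.random.default_rng(3)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(size=50)

# Hierarchy principle: keep the main effects x1 and x2 whenever
# the interaction x1 * x2 is in the model
X1 = np.column_stack([np.ones(50), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)  # roughly [1.0, 2.0, 0.5, 1.5]
```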
Why consider alternatives to least squares?
- Prediction Accuracy: especially when p > n, to control the variance.
- Model Interpretability: by removing irrelevant features, that is, by setting the corresponding coefficient estimates to zero, we obtain a model that is more easily interpreted.
Subset Selection
We identify a subset of the p predictors that we believe to be related to the response. We then fit a model using least squares on the reduced set of variables.
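A minimal sketch of exhaustive (best) subset selection for a fixed subset size k, scoring each candidate subset by its RSS; the data and the choice of k are illustrative assumptions:

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Search all size-k subsets of predictors; return (RSS, best subset)."""
    n, p = X.shape
    best = (np.inf, ())
    for subset in itertools.combinations(range(p), k):
        X1 = np.column_stack([np.ones(n), X[:, list(subset)]])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        rss = np.sum((y - X1 @ beta) ** 2)
        best = min(best, (rss, subset))
    return best

# Illustrative use on simulated data (assumed)
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 6))
y = X[:, 2] - 2.0 * X[:, 5] + rng.normal(size=30, scale=0.5)
print(best_subset(X, y, k=2))  # likely selects columns (2, 5)
```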