Week 3 & 4 Flashcards

1
Q

RSE

A

Residual Standard Error

= √( RSS / (n − 2) )
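
A minimal NumPy sketch of this calculation for a simple linear regression; the arrays x and y below are made-up example data.

import numpy as np

# Made-up example data: one predictor, one response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.2, 5.8, 8.1, 9.9, 12.2])

# Fit y = b0 + b1*x by least squares (polyfit returns [slope, intercept]).
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

rss = np.sum(residuals ** 2)          # residual sum of squares
rse = np.sqrt(rss / (len(y) - 2))     # RSE = sqrt(RSS / (n - 2))
print(rse)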

2
Q

R-squared

A

R² = (TSS − RSS) / TSS

= 1 − RSS / TSS
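
A small NumPy sketch computing TSS, RSS, and R² for the same kind of simple fit; the data are again made up for illustration.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.2, 5.8, 8.1, 9.9, 12.2])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
r2 = 1 - rss / tss                    # R² = 1 - RSS/TSS = (TSS - RSS)/TSS
print(r2)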

3
Q

TSS

A

total sum of squares.

= Σᵢ₌₁ⁿ (yi − ȳ)²

4
Q

βj

A

The average effect on Y of a one-unit increase in Xj, holding all other predictors fixed.

5
Q

Xj

A

The jth predictor; its coefficient βj is interpreted holding all other predictors fixed.

6
Q

Interpreting regression coefficients

A
  • The ideal scenario is when the predictors are uncorrelated — a balanced design
  • Correlations amongst predictors cause problems: the variance of the coefficient estimates tends to increase, and interpretations become hazardous
  • Claims of causality should be avoided for observational data.
7
Q

Estimation and Prediction for Multiple Regression

A

• Given estimates βˆ0, βˆ1, . . . , βˆp, we can make predictions using the formula
  yˆ = βˆ0 + βˆ1 x1 + βˆ2 x2 + · · · + βˆp xp
• We estimate β0, β1, . . . , βp as the values that minimize the sum of squared residuals.
This is done using standard statistical software. The values βˆ0, βˆ1, . . . , βˆp that minimize RSS are the multiple least squares regression coefficient estimates.
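
A short scikit-learn sketch of estimating the coefficients by least squares and predicting from them; the feature matrix X, response y, and "true" coefficients are simulated purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                        # simulated predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

# The fitted intercept and coefficients minimize RSS.
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)

# Prediction: y_hat = b0 + b1*x1 + ... + bp*xp for a new observation.
print(model.predict(np.array([[0.2, -0.4, 1.0]])))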

8
Q

F-statistic

A

F = [(TSS − RSS)/p] / [RSS/(n − p − 1)] ∼ F(p, n−p−1)
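
A sketch of computing the overall F-statistic and its p-value by hand, using NumPy for the least squares fit and scipy.stats for the F distribution; the data are simulated.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([1.5, 0.0, -0.8]) + rng.normal(size=n)

# Least squares fit with an intercept column.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta_hat

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)

# F = [(TSS - RSS)/p] / [RSS/(n - p - 1)], compared to an F(p, n-p-1) distribution.
F = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)
print(F, p_value)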

9
Q

Forward selection

A

• Begin with the null model — a model that contains an intercept but no predictors.
• Fit p simple linear regressions and add to the null model the variable that results in the lowest RSS.
• Add to that model the variable that results in the lowest RSS amongst all two-variable models.
• Continue until some stopping rule is satisfied, for example when all remaining variables have a p-value above some threshold.
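
A rough sketch of forward selection by RSS with NumPy. The stopping rule here is just a fixed number of variables rather than a p-value threshold, and all data and names are illustrative.

import numpy as np

def rss_of_fit(X, y):
    """RSS of a least squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.sum((y - X1 @ beta) ** 2)

def forward_select(X, y, max_vars):
    """Greedily add the predictor that lowers RSS the most at each step."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_vars:
        best_j = min(remaining, key=lambda j: rss_of_fit(X[:, selected + [j]], y))
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Illustrative use on simulated data: predictors 0 and 3 actually matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)
print(forward_select(X, y, max_vars=2))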

10
Q

Backward selection

A
  • Start with all variables in the model.
  • Remove the variable with the largest p-value — that is, the variable that is the least statistically significant.
  • The new (p − 1)-variable model is fit, and the variable with the largest p-value is removed.
  • Continue until a stopping rule is reached. For instance, we may stop when all remaining variables have a significant p-value defined by some significance threshold.
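
One way to sketch backward selection with a p-value stopping rule, using statsmodels for the OLS fits; the 0.05 threshold, column names, and data are all just examples.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_select(X, y, threshold=0.05):
    """Repeatedly drop the predictor with the largest p-value above the threshold."""
    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = fit.pvalues.drop("const")     # p-values of the predictors only
        worst = pvals.idxmax()
        if pvals[worst] <= threshold:         # all remaining predictors significant
            break
        cols.remove(worst)
    return cols

# Illustrative use on simulated data: only 'a' and 'c' matter.
rng = np.random.default_rng(3)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
y = 2 * X["a"] - X["c"] + rng.normal(size=200)
print(backward_select(X, y))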
11
Q

qualitative predictors

A

These are also called categorical predictors or factor variables.
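
A tiny pandas sketch of turning a qualitative (factor) variable into dummy variables before fitting a linear model; the data frame is made up.

import pandas as pd

# Hypothetical data with one qualitative predictor.
df = pd.DataFrame({
    "income": [40, 55, 30, 62],
    "region": ["East", "West", "South", "West"],   # categorical / factor variable
})

# One-hot encode the factor, dropping one level to serve as the baseline.
X = pd.get_dummies(df, columns=["region"], drop_first=True)
print(X)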

12
Q

Extensions of the Linear Model

A

Removing the additive assumption: interactions and nonlinearity
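
A brief sketch of relaxing the additive and linear assumptions by constructing an interaction term and a quadratic term by hand before fitting; column names and data are illustrative.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
df = pd.DataFrame({"tv": rng.uniform(0, 100, 200), "radio": rng.uniform(0, 50, 200)})
y = (2 + 0.05 * df["tv"] + 0.1 * df["radio"]
     + 0.01 * df["tv"] * df["radio"] + rng.normal(size=200))

# Interaction: tv x radio.  Nonlinearity: tv squared.
X = df.assign(tv_x_radio=df["tv"] * df["radio"], tv_sq=df["tv"] ** 2)
print(LinearRegression().fit(X, y).coef_)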

13
Q

hierarchy principle:

A

If we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant.

14
Q

Why consider alternatives to least squares?

A
  • Prediction Accuracy: especially when p > n, to control the variance.
  • Model Interpretability: by removing irrelevant features, that is, by setting the corresponding coefficient estimates to zero, we obtain a model that is more easily interpreted.
15
Q

Subset Selection.

A

We identify a subset of the p predictors that we believe to be related to the response. We then fit a model using least squares on the reduced set of variables.

16
Q

Shrinkage

A

We fit a model involving all p predictors, but the estimated coefficients are shrunken towards zero
relative to the least squares estimates. This shrinkage (also known as regularization) has the effect of reducing variance
and can also perform variable selection.
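
A short scikit-learn sketch of shrinkage/regularization with ridge and lasso; the alpha values are arbitrary and the data are simulated so that only two predictors matter.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(5)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # can set some coefficients exactly to zero

print(ridge.coef_)
print(lasso.coef_)   # the exact zeros are how the lasso performs variable selection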

17
Q

Dimension Reduction.

A

We project the p predictors into an M-dimensional subspace, where M < p. This is achieved by
computing M different linear combinations, or projections, of the variables. Then these M projections are used as predictors to fit a linear regression model by least squares.
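
A sketch in the style of principal components regression: project the p predictors onto M principal components and regress the response on those M projections; M = 2 and the data are arbitrary.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n, p, M = 150, 8, 2                      # M < p components
X = rng.normal(size=(n, p))
y = X[:, :3].sum(axis=1) + rng.normal(size=n)

# Scale, compute M linear combinations (principal components), then least squares.
pcr = make_pipeline(StandardScaler(), PCA(n_components=M), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))   # training R² of the PCR fit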

18
Q

Best subset and stepwise model selection procedures

A
  1. Let M0 denote the null model, which contains no
    predictors. This model simply predicts the sample mean for each observation.
  2. For k = 1, 2, . . . , p:
    (a) Fit all (p choose k) models that contain exactly k predictors.
    (b) Pick the best among these (p choose k) models, and call it Mk. Here best is defined as having the smallest RSS, or equivalently the largest R².
  3. Select a single best model from among M0, . . . , Mp using cross-validated prediction error, Cp (AIC), BIC, or adjusted R².
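
A brute-force sketch of step 2 with itertools: for each size k, fit all (p choose k) models and keep the lowest-RSS one. This is feasible only for small p; choosing among M0, . . . , Mp would then use cross-validation, Cp, BIC, or adjusted R². Data and names are illustrative.

import numpy as np
from itertools import combinations

def rss_of_fit(X, y):
    """RSS of a least squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.sum((y - X1 @ beta) ** 2)

def best_subsets(X, y):
    """Return the lowest-RSS subset M_k for each size k = 1..p."""
    p = X.shape[1]
    best = {}
    for k in range(1, p + 1):
        candidates = combinations(range(p), k)        # all (p choose k) subsets
        best[k] = min(candidates, key=lambda s: rss_of_fit(X[:, list(s)], y))
    return best

# Illustrative use: predictors 1 and 2 drive the response.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 1] - 3 * X[:, 2] + rng.normal(size=100)
print(best_subsets(X, y))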
19
Q

deviance

A

Negative two times the maximized log-likelihood; it plays the role of RSS for a broader class of models.
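
A sketch of computing the deviance of a fitted logistic regression via scikit-learn's log_loss (unnormalized, this is the total negative log-likelihood); the large C value approximates an unpenalized maximum likelihood fit, and the data are simulated.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

# Large C means almost no regularization, so the fit is close to maximum likelihood.
fit = LogisticRegression(C=1e6).fit(X, y)
p_hat = fit.predict_proba(X)[:, 1]

# deviance = -2 * maximized log-likelihood = 2 * total negative log-likelihood
deviance = 2 * log_loss(y, p_hat, normalize=False)
print(deviance)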

20
Q

Forward Stepwise Selection

A
  1. Let M0 denote the null model, which contains no
    predictors.
  2. For k = 0, . . . , p − 1:
    2.1 Consider all p − k models that augment the predictors in Mk with one additional predictor.
    2.2 Choose the best among these p − k models, and call it Mk+1. Here best is defined as having the smallest RSS or highest R².
21
Q

Backward Stepwise Selection

A
  1. Let Mp denote the full model, which contains all p
    predictors.
  2. For k = p, p − 1, . . . , 1:
    2.1 Consider all k models that contain all but one of the
    predictors in Mk, for a total of k − 1 predictors.
    2.2 Choose the best among these k models, and call it Mk−1. Here best is defined as having the smallest RSS or highest R².
22
Q

Estimating test error: two approaches

A

  • We can indirectly estimate the test error by making an adjustment to the training error to account for the bias due to overfitting.
  • We can directly estimate the test error, using either a validation set approach or a cross-validation approach, as discussed in previous lectures.
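
A short scikit-learn sketch of the two direct approaches: a single validation-set split and 5-fold cross-validation; the split size, number of folds, and data are arbitrary.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)

# Validation set approach: hold out part of the data and measure MSE there.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
fit = LinearRegression().fit(X_tr, y_tr)
val_mse = np.mean((y_val - fit.predict(X_val)) ** 2)

# Cross-validation approach: average held-out MSE over 5 folds.
cv_mse = -cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=5).mean()
print(val_mse, cv_mse)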