Week 3 & 4 Flashcards

1
Q

RSE

A

Residual Standard Error

= √( RSS / (n − 2) )
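
A minimal NumPy sketch of this calculation for a simple linear regression; the arrays x and y below are made-up example data.

import numpy as np

# Made-up example data: one predictor, one response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.2, 5.8, 8.1, 9.9, 12.2])

# Fit y = b0 + b1*x by least squares (polyfit returns [slope, intercept]).
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

rss = np.sum(residuals ** 2)          # residual sum of squares
rse = np.sqrt(rss / (len(y) - 2))     # RSE = sqrt(RSS / (n - 2))
print(rse)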

2
Q

R-squared

A

R² = (TSS − RSS) / TSS

= 1 − RSS / TSS
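
A small NumPy sketch computing TSS, RSS, and R² for the same kind of simple fit; the data are again made up for illustration.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.2, 5.8, 8.1, 9.9, 12.2])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
r2 = 1 - rss / tss                    # R² = 1 - RSS/TSS = (TSS - RSS)/TSS
print(r2)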

3
Q

TSS

A

total sum of squares.

= Σᵢ₌₁ⁿ (yi − ȳ)²

4
Q

βj

A

The average effect on Y of a one-unit increase in Xj, holding all other predictors fixed.

5
Q

Xj

A

The jth predictor; its coefficient βj is interpreted holding all other predictors fixed.

6
Q

Interpreting regression coefficients

A
  • The ideal scenario is when the predictors are uncorrelated — a balanced design
  • Correlations amongst predictors cause problems: the variance of the coefficient estimates tends to increase, and interpretations become hazardous
  • Claims of causality should be avoided for observational data.
7
Q

Estimation and Prediction for Multiple Regression

A

• Given estimates βˆ0, βˆ1, . . . , βˆp, we can make predictions using the formula
  yˆ = βˆ0 + βˆ1 x1 + βˆ2 x2 + · · · + βˆp xp
• We estimate β0, β1, . . . , βp as the values that minimize the sum of squared residuals.
This is done using standard statistical software. The values βˆ0, βˆ1, . . . , βˆp that minimize RSS are the multiple least squares regression coefficient estimates.
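
A short scikit-learn sketch of estimating the coefficients by least squares and predicting from them; the feature matrix X, response y, and "true" coefficients are simulated purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                        # simulated predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

# The fitted intercept and coefficients minimize RSS.
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)

# Prediction: y_hat = b0 + b1*x1 + ... + bp*xp for a new observation.
print(model.predict(np.array([[0.2, -0.4, 1.0]])))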

8
Q

F-statistic

A

F = [(TSS − RSS)/p] / [RSS/(n − p − 1)] ∼ F(p, n−p−1)
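
A sketch of computing the overall F-statistic and its p-value by hand, using NumPy for the least squares fit and scipy.stats for the F distribution; the data are simulated.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([1.5, 0.0, -0.8]) + rng.normal(size=n)

# Least squares fit with an intercept column.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta_hat

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)

# F = [(TSS - RSS)/p] / [RSS/(n - p - 1)], compared to an F(p, n-p-1) distribution.
F = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)
print(F, p_value)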

9
Q

Forward selection

A

• Begin with the null model — a model that contains an intercept but no predictors.
• Fit p simple linear regressions and add to the null model the variable that results in the lowest RSS.
• Add to that model the variable that results in the lowest RSS amongst all two-variable models.
• Continue until some stopping rule is satisfied, for example when all remaining variables have a p-value above some threshold.
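
A rough sketch of forward selection by RSS with NumPy. The stopping rule here is just a fixed number of variables rather than a p-value threshold, and all data and names are illustrative.

import numpy as np

def rss_of_fit(X, y):
    """RSS of a least squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.sum((y - X1 @ beta) ** 2)

def forward_select(X, y, max_vars):
    """Greedily add the predictor that lowers RSS the most at each step."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_vars:
        best_j = min(remaining, key=lambda j: rss_of_fit(X[:, selected + [j]], y))
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Illustrative use on simulated data: predictors 0 and 3 actually matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)
print(forward_select(X, y, max_vars=2))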

10
Q

Backward selection

A
  • Start with all variables in the model.
  • Remove the variable with the largest p-value — that is, the variable that is the least statistically significant.
  • The new (p − 1)-variable model is fit, and the variable with the largest p-value is removed.
  • Continue until a stopping rule is reached. For instance, we may stop when all remaining variables have a significant p-value defined by some significance threshold.
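
One way to sketch backward selection with a p-value stopping rule, using statsmodels for the OLS fits; the 0.05 threshold, column names, and data are all just examples.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_select(X, y, threshold=0.05):
    """Repeatedly drop the predictor with the largest p-value above the threshold."""
    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = fit.pvalues.drop("const")     # p-values of the predictors only
        worst = pvals.idxmax()
        if pvals[worst] <= threshold:         # all remaining predictors significant
            break
        cols.remove(worst)
    return cols

# Illustrative use on simulated data: only 'a' and 'c' matter.
rng = np.random.default_rng(3)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
y = 2 * X["a"] - X["c"] + rng.normal(size=200)
print(backward_select(X, y))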
11
Q

qualitative predictors

A

These are also called categorical predictors or factor variables.
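
A tiny pandas sketch of turning a qualitative (factor) variable into dummy variables before fitting a linear model; the data frame is made up.

import pandas as pd

# Hypothetical data with one qualitative predictor.
df = pd.DataFrame({
    "income": [40, 55, 30, 62],
    "region": ["East", "West", "South", "West"],   # categorical / factor variable
})

# One-hot encode the factor, dropping one level to serve as the baseline.
X = pd.get_dummies(df, columns=["region"], drop_first=True)
print(X)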

12
Q

Extensions of the Linear Model

A

Removing the additive assumption: interactions and nonlinearity
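
A brief sketch of relaxing the additive and linear assumptions by constructing an interaction term and a quadratic term by hand before fitting; column names and data are illustrative.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
df = pd.DataFrame({"tv": rng.uniform(0, 100, 200), "radio": rng.uniform(0, 50, 200)})
y = (2 + 0.05 * df["tv"] + 0.1 * df["radio"]
     + 0.01 * df["tv"] * df["radio"] + rng.normal(size=200))

# Interaction: tv x radio.  Nonlinearity: tv squared.
X = df.assign(tv_x_radio=df["tv"] * df["radio"], tv_sq=df["tv"] ** 2)
print(LinearRegression().fit(X, y).coef_)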

13
Q

hierarchy principle:

A

If we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant.

14
Q

Why consider alternatives to least squares?

A
  • Prediction Accuracy: especially when p > n, to control the variance.
  • Model Interpretability: by removing irrelevant features, that is, by setting the corresponding coefficient estimates to zero, we obtain a model that is more easily interpreted.
15
Q

Subset Selection.

A

We identify a subset of the p predictors that we believe to be related to the response. We then fit a model using least squares on the reduced set of variables.

16
Q

Shrinkage

A

We fit a model involving all p predictors, but the estimated coefficients are shrunken towards zero
relative to the least squares estimates. This shrinkage (also known as regularization) has the effect of reducing variance
and can also perform variable selection.
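
A short scikit-learn sketch of shrinkage/regularization with ridge and lasso; the alpha values are arbitrary and the data are simulated so that only two predictors matter.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(5)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # can set some coefficients exactly to zero

print(ridge.coef_)
print(lasso.coef_)   # the exact zeros are how the lasso performs variable selection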

17
Q

Dimension Reduction.

A

We project the p predictors into an M-dimensional subspace, where M < p. This is achieved by
computing M different linear combinations, or projections, of the variables. Then these M projections are used as predictors to fit a linear regression model by least squares.
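
A sketch in the style of principal components regression: project the p predictors onto M principal components and regress the response on those M projections; M = 2 and the data are arbitrary.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n, p, M = 150, 8, 2                      # M < p components
X = rng.normal(size=(n, p))
y = X[:, :3].sum(axis=1) + rng.normal(size=n)

# Scale, compute M linear combinations (principal components), then least squares.
pcr = make_pipeline(StandardScaler(), PCA(n_components=M), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))   # training R² of the PCR fit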

18
Q

Best subset and stepwise model selection procedures

A
  1. Let M0 denote the null model, which contains no
    predictors. This model simply predicts the sample mean for each observation.
  2. For k = 1, 2, . . . , p:
    (a) Fit all (p choose k) models that contain exactly k predictors.
    (b) Pick the best among these (p choose k) models, and call it Mk. Here best is defined as having the smallest RSS, or equivalently the largest R².
  3. Select a single best model from among M0, . . . , Mp using cross-validated prediction error, Cp (AIC), BIC, or adjusted R².
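
A brute-force sketch of step 2 with itertools: for each size k, fit all (p choose k) models and keep the lowest-RSS one. This is feasible only for small p; choosing among M0, . . . , Mp would then use cross-validation, Cp, BIC, or adjusted R². Data and names are illustrative.

import numpy as np
from itertools import combinations

def rss_of_fit(X, y):
    """RSS of a least squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.sum((y - X1 @ beta) ** 2)

def best_subsets(X, y):
    """Return the lowest-RSS subset M_k for each size k = 1..p."""
    p = X.shape[1]
    best = {}
    for k in range(1, p + 1):
        candidates = combinations(range(p), k)        # all (p choose k) subsets
        best[k] = min(candidates, key=lambda s: rss_of_fit(X[:, list(s)], y))
    return best

# Illustrative use: predictors 1 and 2 drive the response.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 1] - 3 * X[:, 2] + rng.normal(size=100)
print(best_subsets(X, y))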
19
Q

deviance

A

Negative two times the maximized log-likelihood; it plays the role of RSS for a broader class of models.
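
A sketch of computing the deviance of a fitted logistic regression via scikit-learn's log_loss (unnormalized, this is the total negative log-likelihood); the large C value approximates an unpenalized maximum likelihood fit, and the data are simulated.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

# Large C means almost no regularization, so the fit is close to maximum likelihood.
fit = LogisticRegression(C=1e6).fit(X, y)
p_hat = fit.predict_proba(X)[:, 1]

# deviance = -2 * maximized log-likelihood = 2 * total negative log-likelihood
deviance = 2 * log_loss(y, p_hat, normalize=False)
print(deviance)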

20
Q

Forward Stepwise Selection

A
  1. Let M0 denote the null model, which contains no
    predictors.
  2. For k = 0, . . . , p − 1:
    2.1 Consider all p − k models that augment the predictors in Mk with one additional predictor.
    2.2 Choose the best among these p − k models, and call it Mk+1. Here best is defined as having the smallest RSS or highest R².
21
Q

Backward Stepwise Selection

A
  1. Let Mp denote the full model, which contains all p
    predictors.
  2. For k = p, p − 1, . . . , 1:
    2.1 Consider all k models that contain all but one of the
    predictors in Mk, for a total of k − 1 predictors.
    2.2 Choose the best among these k models, and call it Mk−1. Here best is defined as having the smallest RSS or highest R².
22
Q

Estimating test error: two approaches

A

  • We can indirectly estimate the test error by making an adjustment to the training error to account for the bias due to overfitting.
  • We can directly estimate the test error, using either a validation set approach or a cross-validation approach, as discussed in previous lectures.
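
A short scikit-learn sketch of the two direct approaches: a single validation-set split and 5-fold cross-validation; the split size, number of folds, and data are arbitrary.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)

# Validation set approach: hold out part of the data and measure MSE there.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
fit = LinearRegression().fit(X_tr, y_tr)
val_mse = np.mean((y_val - fit.predict(X_val)) ** 2)

# Cross-validation approach: average held-out MSE over 5 folds.
cv_mse = -cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=5).mean()
print(val_mse, cv_mse)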