Chapter 6 - Dimensionality Reduction Flashcards
Dimensionality Reduction - idea, types
define a small set of M < p new predictors, each a linear combination of the original p predictors, that summarize the information in all p, then fit a linear model on those M. Two methods: principal components regression (PCR), partial least squares (PLS)
Principal Components Regression - algorithm and 3 facts
1) replace the original p predictors with the first M principal component score vectors Z_1, ..., Z_M
2) fit the linear model y = theta_0 + sum_m[theta_m * Z_m] by least squares to obtain the theta_m (see the sketch after this card)
the implied coefficient on X_j has the form beta_j = sum_m[theta_m * phi_jm], where the phi_jm are the principal component loadings
this constraint on beta_j introduces bias but can reduce the variance
the coefficients shrink more as we decrease M (one of the similarities between PCR and ridge)
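A minimal sketch of the two steps above in scikit-learn (PCA for the score vectors, then least squares on them); the toy data, the standardization step, and M = 5 are illustrative assumptions, not part of the card.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# toy data standing in for the original p predictors
X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)
M = 5  # number of score vectors to keep; chosen by cross-validation in practice

# step 1: standardize, then replace the p predictors with the scores Z_1..Z_M
# step 2: least squares of y on Z_1..Z_M gives theta_1..theta_M
pcr = make_pipeline(StandardScaler(), PCA(n_components=M), LinearRegression())
pcr.fit(X, y)

# implied coefficients on the (standardized) original predictors: beta_j = sum_m theta_m * phi_jm
phi = pcr.named_steps["pca"].components_.T           # p x M matrix of loadings phi_jm
theta = pcr.named_steps["linearregression"].coef_    # the M least squares coefficients
beta = phi @ theta
print(beta.shape)                                     # (20,) -- one constrained beta_j per predictor
```

Lowering M constrains beta more strongly (more shrinkage and bias, less variance), matching the facts above.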
Relationship between PCR and Ridge
- write out the forms of the least squares, ridge, and PCR fits - (one way to answer is sketched below)
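One standard way to fill this in (the SVD-based comparison from Elements of Statistical Learning, not notation used elsewhere in these notes): write the centered predictor matrix as X = U D V^T, with columns u_j of U and singular values d_j. Then the fitted values are

```latex
\hat{y}^{\,\mathrm{ls}}    = \sum_{j=1}^{p} u_j\, u_j^{\top} y ,
\qquad
\hat{y}^{\,\mathrm{ridge}} = \sum_{j=1}^{p} u_j\, \frac{d_j^{2}}{d_j^{2}+\lambda}\, u_j^{\top} y ,
\qquad
\hat{y}^{\,\mathrm{pcr}}   = \sum_{j=1}^{M} u_j\, u_j^{\top} y .
```

Ridge shrinks the contribution of each principal component direction by the factor d_j^2 / (d_j^2 + lambda), shrinking low-variance directions the most; PCR keeps the first M directions untouched and drops the rest, so it acts like a discrete version of the same shrinkage.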
3 Simulated Examples - all predictors, 2 predictors, 5 predictors
all predictors - PCR doesn’t do so well
2 predictors - moderately ok
5 predictors - bias appears only below M = 5; PCR and ridge do better than the lasso
Partial Least Squares Regression - algorithm
in contrast with PCR, the response Y is used when constructing the directions Z_m
1) Z_1 = sum_j[phi_j1 * X_j], where phi_j1 is the coefficient from the simple regression of Y onto X_j
2) X_j(2) = residual from regressing X_j onto Z_1
3) Z_2 = sum_j[phi_j2 * X_j(2)], where phi_j2 is the coefficient from the simple regression of Y onto X_j(2)
4) X_j(3) = residual from regressing X_j(2) onto Z_2
...
stop at Z_M with M < p, then choose M through CV (sketch below)
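A bare-bones numpy sketch of the direction-building loop above, assuming the columns of X are centered (and ideally standardized) and y is centered; the function and variable names are illustrative only.

```python
import numpy as np

def pls_directions(X, y, M):
    """Build the first M partial least squares directions Z_1..Z_M.

    X: n x p matrix with centered (ideally standardized) columns; y: centered response.
    Returns Z (n x M scores) and Phi (p x M weights).
    """
    n, p = X.shape
    Xk = X.copy()                                  # X_j(1) = X_j
    Z = np.zeros((n, M))
    Phi = np.zeros((p, M))
    for m in range(M):
        # phi_jm = coefficient from the simple regression of y onto the current X_j(m)
        phi = Xk.T @ y / np.sum(Xk ** 2, axis=0)
        z = Xk @ phi                               # Z_m = sum_j phi_jm * X_j(m)
        Z[:, m], Phi[:, m] = z, phi
        # deflate: X_j(m+1) = residual from regressing X_j(m) onto Z_m
        Xk = Xk - np.outer(z, z @ Xk) / (z @ z)
    return Z, Phi
```

The final model is then least squares of y on Z_1..Z_M; in practice sklearn's PLSRegression wraps the whole procedure, with M chosen by cross-validation as in the sketch further down.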
Partial Least Squares Regression - theory (3)
1) at each step we find the linear combination of the current predictors most correlated with the response
2) after each step we replace each predictor with its residual on the latest direction, so the information already captured is removed and successive directions are orthogonal
3) compared to PCR it can have less bias but more variance
- how would you do CV for this method? (cf. the right way / wrong way to do cross-validation) - see the sketch below
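One hedged answer to the CV question: keep the whole PLS fit (direction construction plus regression) inside the cross-validation loop, so each training fold rebuilds Z_1..Z_M from scratch; computing the directions on all of the data first and cross-validating only the final regression would be the "wrong way". A scikit-learn sketch with made-up data and an illustrative grid of M values:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_regression(n_samples=100, n_features=30, noise=10.0, random_state=1)

# the full PLS fit is re-run on each training fold, so the held-out fold
# never leaks into how the directions Z_1..Z_M were constructed
search = GridSearchCV(
    PLSRegression(),
    param_grid={"n_components": range(1, 11)},
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("chosen M:", search.best_params_["n_components"])
```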
2 Problems with High Dimension Data
1) p >> n is now very common, and least squares will not work well here, so we turn to regularization (shrinkage) methods
2) when p >= n (even p = n), least squares can find a fit that goes through every training point: training error is a useless measure, the noise variance sigma^2 becomes difficult to estimate, and measures of model fit such as Cp, AIC, and BIC fail
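A tiny sketch of problem 2): with p = n pure-noise predictors, least squares interpolates the training data, so training error (and any noise estimate built from it) tells you nothing. Purely illustrative data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = p = 20
X = rng.normal(size=(n, p))    # predictors that have nothing to do with y
y = rng.normal(size=n)

fit = LinearRegression().fit(X, y)
print(mean_squared_error(y, fit.predict(X)))   # essentially 0: the fit passes through every point
# with zero training error there is no residual left to estimate sigma^2, so Cp/AIC/BIC break down
```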
2 Takeaways from working with high dimensional data
1) adding predictors that are uncorrelated with the response can hurt the performance of a regression (higher test error) - see the sketch after this list
2) when p > n there is extreme multicollinearity: many different subsets of predictors will produce similarly good results, so don't overstate the importance of any one subset
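A quick simulation sketch of takeaway 1): padding a handful of useful predictors with pure-noise features makes the test MSE of least squares grow. All sizes and seeds are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p_signal = 100, 5
X_signal = rng.normal(size=(n, p_signal))
y = X_signal @ rng.normal(size=p_signal) + rng.normal(size=n)

for p_noise in (0, 20, 60):
    # append predictors that are uncorrelated with the response
    X = np.hstack([X_signal, rng.normal(size=(n, p_noise))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    fit = LinearRegression().fit(X_tr, y_tr)
    print(p_noise, "noise predictors -> test MSE:", round(mean_squared_error(y_te, fit.predict(X_te)), 2))
```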