Chapter 6 - Dimensionality Reduction Flashcards
Dimensionality Reduction - idea, types
define a small set of M < p new predictors, each a linear combination of the original p predictors, that summarize the information in all p, then fit a linear model on those M. Two methods: principal components regression (PCR), partial least squares (PLS)
Principal Components Regression - algorithm and 3 facts
1) replace the original p predictors with the first M principal component score vectors Z_1, ..., Z_M
2) fit the linear model y = theta_0 + sum_m[theta_m * Z_m] by least squares to obtain the theta_m (see the sketch after this card)
the implied coefficient on X_j has the form beta_j = sum_m[theta_m * phi_jm], where the phi_jm are the principal component loadings
this constraint on beta_j introduces bias but can reduce the variance
the coefficients shrink more as we decrease M (one of the similarities between PCR and ridge)
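A minimal sketch of the two steps above in scikit-learn (PCA for the score vectors, then least squares on them); the toy data, the standardization step, and M = 5 are illustrative assumptions, not part of the card.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# toy data standing in for the original p predictors
X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)
M = 5  # number of score vectors to keep; chosen by cross-validation in practice

# step 1: standardize, then replace the p predictors with the scores Z_1..Z_M
# step 2: least squares of y on Z_1..Z_M gives theta_1..theta_M
pcr = make_pipeline(StandardScaler(), PCA(n_components=M), LinearRegression())
pcr.fit(X, y)

# implied coefficients on the (standardized) original predictors: beta_j = sum_m theta_m * phi_jm
phi = pcr.named_steps["pca"].components_.T           # p x M matrix of loadings phi_jm
theta = pcr.named_steps["linearregression"].coef_    # the M least squares coefficients
beta = phi @ theta
print(beta.shape)                                     # (20,) -- one constrained beta_j per predictor
```

Lowering M constrains beta more strongly (more shrinkage and bias, less variance), matching the facts above.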
Relationship between PCR and Ridge
- write out the forms of the least squares, ridge, and PCR fits - (one way to answer is sketched below)
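One standard way to fill this in (the SVD-based comparison from Elements of Statistical Learning, not notation used elsewhere in these notes): write the centered predictor matrix as X = U D V^T, with columns u_j of U and singular values d_j. Then the fitted values are

```latex
\hat{y}^{\,\mathrm{ls}}    = \sum_{j=1}^{p} u_j\, u_j^{\top} y ,
\qquad
\hat{y}^{\,\mathrm{ridge}} = \sum_{j=1}^{p} u_j\, \frac{d_j^{2}}{d_j^{2}+\lambda}\, u_j^{\top} y ,
\qquad
\hat{y}^{\,\mathrm{pcr}}   = \sum_{j=1}^{M} u_j\, u_j^{\top} y .
```

Ridge shrinks the contribution of each principal component direction by the factor d_j^2 / (d_j^2 + lambda), shrinking low-variance directions the most; PCR keeps the first M directions untouched and drops the rest, so it acts like a discrete version of the same shrinkage.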
3 Simulated Examples - all predictors, 2 predictors, 5 predictors
all predictors - PCR doesn’t do so well
2 predictors - moderately ok
5 predictors - bias appears only below M = 5; PCR and ridge do better than the lasso
Partial Least Squares Regression - algorithm
in contrast with PCR, the response Y is used when constructing the directions Z_m
1) Z_1 = sum_j[phi_j1 * X_j], where phi_j1 is the coefficient from the simple regression of Y onto X_j
2) X_j(2) = residual from regressing X_j onto Z_1
3) Z_2 = sum_j[phi_j2 * X_j(2)], where phi_j2 is the coefficient from the simple regression of Y onto X_j(2)
4) X_j(3) = residual from regressing X_j(2) onto Z_2
...
stop at Z_M with M < p, then choose M through CV (sketch below)
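A bare-bones numpy sketch of the direction-building loop above, assuming the columns of X are centered (and ideally standardized) and y is centered; the function and variable names are illustrative only.

```python
import numpy as np

def pls_directions(X, y, M):
    """Build the first M partial least squares directions Z_1..Z_M.

    X: n x p matrix with centered (ideally standardized) columns; y: centered response.
    Returns Z (n x M scores) and Phi (p x M weights).
    """
    n, p = X.shape
    Xk = X.copy()                                  # X_j(1) = X_j
    Z = np.zeros((n, M))
    Phi = np.zeros((p, M))
    for m in range(M):
        # phi_jm = coefficient from the simple regression of y onto the current X_j(m)
        phi = Xk.T @ y / np.sum(Xk ** 2, axis=0)
        z = Xk @ phi                               # Z_m = sum_j phi_jm * X_j(m)
        Z[:, m], Phi[:, m] = z, phi
        # deflate: X_j(m+1) = residual from regressing X_j(m) onto Z_m
        Xk = Xk - np.outer(z, z @ Xk) / (z @ z)
    return Z, Phi
```

The final model is then least squares of y on Z_1..Z_M; in practice sklearn's PLSRegression wraps the whole procedure, with M chosen by cross-validation as in the sketch further down.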
Partial Least Squares Regression - theory (3)
1) at each step we find the linear combination of the current predictors most correlated with the response
2) after each step we replace each predictor with its residual on the latest direction, so the information already captured is removed and successive directions are orthogonal
3) compared to PCR it can have less bias but more variance
- how would you do CV for this method? (cf. the right way / wrong way to do cross-validation) - see the sketch below
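One hedged answer to the CV question: keep the whole PLS fit (direction construction plus regression) inside the cross-validation loop, so each training fold rebuilds Z_1..Z_M from scratch; computing the directions on all of the data first and cross-validating only the final regression would be the "wrong way". A scikit-learn sketch with made-up data and an illustrative grid of M values:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_regression(n_samples=100, n_features=30, noise=10.0, random_state=1)

# the full PLS fit is re-run on each training fold, so the held-out fold
# never leaks into how the directions Z_1..Z_M were constructed
search = GridSearchCV(
    PLSRegression(),
    param_grid={"n_components": range(1, 11)},
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("chosen M:", search.best_params_["n_components"])
```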
2 Problems with High Dimension Data
1) p >> n is now very common, and least squares will not work well here, so we turn to regularization (shrinkage) methods
2) when p >= n (even p = n), least squares can find a fit that goes through every training point: training error is a useless measure, the noise variance sigma^2 becomes difficult to estimate, and measures of model fit such as Cp, AIC, and BIC fail
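A tiny sketch of problem 2): with p = n pure-noise predictors, least squares interpolates the training data, so training error (and any noise estimate built from it) tells you nothing. Purely illustrative data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = p = 20
X = rng.normal(size=(n, p))    # predictors that have nothing to do with y
y = rng.normal(size=n)

fit = LinearRegression().fit(X, y)
print(mean_squared_error(y, fit.predict(X)))   # essentially 0: the fit passes through every point
# with zero training error there is no residual left to estimate sigma^2, so Cp/AIC/BIC break down
```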
2 Takeaways from working with high dimensional data
1) adding predictors that are uncorrelated with the response can hurt the performance of a regression (higher test error) - see the sketch after this list
2) when p > n there is extreme multicollinearity: many different subsets of predictors will produce similarly good results, so don't overstate the importance of any one subset
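A quick simulation sketch of takeaway 1): padding a handful of useful predictors with pure-noise features makes the test MSE of least squares grow. All sizes and seeds are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p_signal = 100, 5
X_signal = rng.normal(size=(n, p_signal))
y = X_signal @ rng.normal(size=p_signal) + rng.normal(size=n)

for p_noise in (0, 20, 60):
    # append predictors that are uncorrelated with the response
    X = np.hstack([X_signal, rng.normal(size=(n, p_noise))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    fit = LinearRegression().fit(X_tr, y_tr)
    print(p_noise, "noise predictors -> test MSE:", round(mean_squared_error(y_te, fit.predict(X_te)), 2))
```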