Week 4: Dimension Reduction Flashcards
What does projection refer to in dimension reduction techniques?
The fact that we project high-dimensional data onto lower dimensions
What are the two main reasons for dimension reduction?
1) To make parameter estimation easier (the number of parameters grows with the number of dimensions), 2) to make the data easier to visualize.
What is the learning task in PCA?
To choose how many dimensions we want to project into (D) and then pick a projection vector, w_d, for each.
How can dimension reduction prevent overfitting?
It reduces the number of model parameters and thus reduces the risk of overfitting.
How do we notate the new predictors in dimension reduction (where M = number of new dimensions, p = number of original predictors, and M < p)?
Z_1, Z_2, …, Z_M
How do we select M, the number of new predictors?
Via cross-validation.
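One way to picture this selection is the minimal sketch below, which runs K-fold cross-validation for principal components regression using NumPy only. The function name `pcr_cv_error`, the five folds, and the simulated data are assumptions made for illustration, not part of the course material.

```python
import numpy as np

def pcr_cv_error(X, y, M, n_folds=5, seed=0):
    """Mean squared validation error of PCR with M components (illustrative helper)."""
    N = X.shape[0]
    idx = np.random.default_rng(seed).permutation(N)
    folds = np.array_split(idx, n_folds)
    errors = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Standardize with training statistics only, so the validation fold stays unseen.
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        Xtr, Xva = (X[train] - mu) / sd, (X[val] - mu) / sd
        y_mean = y[train].mean()
        # Directions v_1..v_M: first M right singular vectors of the training predictors.
        _, _, Vt = np.linalg.svd(Xtr, full_matrices=False)
        V = Vt[:M].T
        Ztr, Zva = Xtr @ V, Xva @ V
        # Least-squares regression of the centered response on the new predictors.
        theta, *_ = np.linalg.lstsq(Ztr, y[train] - y_mean, rcond=None)
        pred = y_mean + Zva @ theta
        errors.append(np.mean((y[val] - pred) ** 2))
    return float(np.mean(errors))

# Hypothetical data with p = 5 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=120)

# Evaluate every candidate M and keep the one with the lowest CV error.
cv_errors = {M: pcr_cv_error(X, y, M) for M in range(1, 6)}
best_M = min(cv_errors, key=cv_errors.get)
```

Here `best_M` is simply the M with the smallest average validation error. In practice one might prefer the smallest M whose error is close to the minimum.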
What does scale-invariant mean?
What does it imply for the method that PCR and PLS are not scale-invariant?
That we have to standardize each original input xj to have mean 0 and variance 1.
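As a small illustration of the standardization step, the sketch below centers and scales each column of a predictor matrix. The helper name `standardize` and the example numbers are assumptions. It uses the population standard deviation (ddof=0); dividing by the sample standard deviation is another common convention.

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale it to variance 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population std (ddof=0), an assumed convention
    return (X - mean) / std

# Hypothetical data: two predictors measured on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0],
              [4.0, 6000.0]])
X_std = standardize(X)
```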
When don’t we have to standardize the predictors X before creating the new, constructed predictors z?
If all predictors x are measured in the same units.
What are the dimensions of the X matrix of the p original variables?
N x p (each of the N observations x_i has p values, one for each predictor)
What can we decompose each x into?
Direction vectors (e_1,…,e_p) and coefficients (x_i1,…,x_ip), so that x_i = x_i1 e_1 + … + x_ip e_p.
Where will the new observed values lie when we reduce 2-dim data to 1-dim by projecting onto the direction e_1?
On the x-axis only.
What does it mean in terms of information when projecting on e_1, and what can we do about this?
That we lose all information about x_i2. Instead, we can find a new direction v_1 that is neither of the two original axes (the x-axis and the y-axis).
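The information loss can be made concrete with a small sketch that compares projecting onto e_1 with projecting onto a direction v_1 that mixes both axes. The data and the particular choice v_1 = (1, 1)/sqrt(2) are assumptions picked for illustration. The reconstruction error measures how much of the original 2-dim points is lost.

```python
import numpy as np

# Hypothetical 2-dim data lying close to the diagonal.
X = np.array([[-2.0, -1.9],
              [-1.0, -1.1],
              [ 0.0,  0.1],
              [ 1.0,  0.9],
              [ 2.0,  2.0]])

e_1 = np.array([1.0, 0.0])                # original first axis
v_1 = np.array([1.0, 1.0]) / np.sqrt(2)   # a unit direction mixing both axes

z_e = X @ e_1   # keeps only x_i1; x_i2 is discarded
z_v = X @ v_1   # uses both coordinates

# Map each 1-dim projection back into 2-dim space.
X_from_e = np.outer(z_e, e_1)
X_from_v = np.outer(z_v, v_1)

# Total squared distance between the original points and their projections.
err_e = np.sum((X - X_from_e) ** 2)
err_v = np.sum((X - X_from_v) ** 2)
```

For this data `err_v` is much smaller than `err_e`, because the points vary mostly along the diagonal rather than along the x-axis alone.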
How can we write the new 1-dim inputs, Z_11,…,Z_1N, in terms of the matrix of the original (standardized) variables and the new direction vector?
Z_1 = Xv_1
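Written out element by element, the matrix product could look like the following, where the component notation v_{1j} for the entries of v_1 is an assumption introduced here:

```latex
\[
  Z_1 = X v_1,
  \qquad
  Z_{1i} = x_i^{\top} v_1 = \sum_{j=1}^{p} x_{ij}\, v_{1j},
  \quad i = 1, \dots, N,
\]
% X is N x p, v_1 is p x 1, so Z_1 is N x 1: one new value per observation.
```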
How do PCR and PLS, respectively, choose the direction vector v_1 (and hence Z_1)?
PCR: chooses the v_1 that minimizes the distance between X and its projection onto v_1,
PLS: chooses the v_1 for which Z_1 explains y well.
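The contrast can be sketched in a few lines of NumPy. The sketch assumes the common textbook formulations: the first PCR direction is the first right singular vector of the standardized X, and the first PLS direction is proportional to X^T y. The simulated data, in which y depends only on a low-variance direction, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
t = rng.normal(size=N)
# x1 and x2 are strongly correlated; x3 is independent and drives y.
x1 = t + 0.1 * rng.normal(size=N)
x2 = t + 0.1 * rng.normal(size=N)
x3 = rng.normal(size=N)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x3 + 0.1 * rng.normal(size=N)

# Standardize the predictors and center the response.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

# PCR: direction of largest variance in X (ignores y).
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
v_pcr = Vt[0]

# PLS (assumed formulation): direction proportional to X^T y, normalized.
v_pls = Xs.T @ yc
v_pls = v_pls / np.linalg.norm(v_pls)

z_pcr = Xs @ v_pcr
z_pls = Xs @ v_pls

# How well does each new predictor track y?
corr_pcr = abs(np.corrcoef(z_pcr, y)[0, 1])
corr_pls = abs(np.corrcoef(z_pls, y)[0, 1])
```

In this setup `z_pcr` has the larger variance but is nearly unrelated to y, while `z_pls` puts most of its weight on x3 and correlates strongly with y.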