Week 4: Dimension Reduction Flashcards

1
Q

What does projection refer to in dimension reduction techniques?

A

The fact that we project high-dimensional data onto lower dimensions

2
Q

What are the two main reasons for dimension reduction?

A

1) To make parameter estimation easier (parameters increase w/ dimensions), 2) to make visualization of data easier.

3
Q

What is the learning task in PCA?

A

To choose how many dimensions we want to project into (D) and then pick a projection vector, w_d, for each.

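A minimal sketch of this learning task in code (numpy; the simulated data matrix and the names D, W, Z are illustrative, not from the course):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # N x p data matrix (simulated)
X = X - X.mean(axis=0)                   # center each column

D = 2                                    # chosen number of new dimensions
# the top-D right singular vectors of X give the projection vectors w_1, ..., w_D
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:D].T                             # p x D matrix of projection vectors
Z = X @ W                                # N x D matrix of projected data
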
4
Q

How can dimension reduction prevent overfitting?

A

It reduces the number of model parameters and thus reduces the risk of overfitting.

5
Q

How do we notate the new predictors (where M = new dimensions ≪ p = original predictors) in dimension reduction?

A

Z_1, Z_2, …, Z_M

6
Q

How do we select M, the number of new predictors?

A

Via cross-validation.

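A sketch of how M can be chosen by cross-validation for PCR (scikit-learn; the pipeline, the candidate range 1-10, and the names X_train, y_train are assumptions for illustration):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

pcr = Pipeline([
    ("scale", StandardScaler()),         # standardize the original predictors
    ("pca", PCA()),                      # construct Z_1, ..., Z_M
    ("reg", LinearRegression()),         # regress y on the new predictors
])
grid = GridSearchCV(pcr, {"pca__n_components": range(1, 11)}, cv=5)
grid.fit(X_train, y_train)               # X_train, y_train assumed to exist
M = grid.best_params_["pca__n_components"]
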
7
Q

What does scale-invariant mean?

A

A method is scale-invariant if its results do not change when the original inputs are rescaled (e.g. measured in different units).
8
Q

What does it imply for the method that PCR and PLS are not scale-invariant?

A

That we have to standardize each original input x_j to have mean 0 and variance 1.

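A one-line sketch of that standardization (numpy; X assumed to be the N x p array of original inputs):

import numpy as np

X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # each column now has mean 0 and variance 1
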
9
Q

When don’t we have to standardize the predictors X before creating the new, constructed predictors z?

A

If all the x's are measured in the same units.

10
Q

What are the dimensions of the X matrix of the p original variables?

A

N x p (each of the N observations x_i has p values, one for each predictor).

11
Q

What can we decompose each x into?

A

Basis direction vectors (e_1, …, e_p) and coefficients (x_i1, …, x_ip): each x_i is a weighted sum of the directions.

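In symbols, the standard basis expansion behind this card:

x_i = x_{i1} e_1 + x_{i2} e_2 + \dots + x_{ip} e_p = \sum_{j=1}^{p} x_{ij} e_j
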
12
Q

Where will the new observed values lie in a 2-dim to 1-dim reduction when projecting onto the direction e_1?

A

On the x-axis only.

13
Q

What does projecting onto e_1 mean in terms of information, and what can we do about this?

A

That we lose all information about x_i2. Instead, we can find a new direction v_1 that lies along neither of the two original axes (the X- and Y-axis).

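A tiny numeric illustration of this (numpy; the two 2-dim observations and the diagonal direction are made up for the example):

import numpy as np

x = np.array([[2.0, 3.0],
              [1.0, 1.0]])               # two 2-dim observations
e1 = np.array([1.0, 0.0])                # first original axis
v1 = np.array([1.0, 1.0]) / np.sqrt(2)   # unit-length diagonal direction

print(x @ e1)   # [2. 1.]        -> only the first coordinate survives, x_i2 is lost
print(x @ v1)   # [3.54 1.41]    -> both coordinates contribute, so less information is lost
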
14
Q

How can we write the new 1-dim inputs, Z_11, …, Z_N1, in terms of the matrix of the (standardized) original variables and the new direction vector?

A

Z_1 = Xv_1

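The same relation in code (numpy; X is the standardized N x p matrix and v1 an arbitrary unit-length direction, both assumed here):

import numpy as np

v1 = np.ones(X.shape[1]) / np.sqrt(X.shape[1])   # an arbitrary unit-length direction
Z1 = X @ v1                                      # N-vector of new 1-dim inputs Z_11, ..., Z_N1
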
15
Q

How do PCR and PLS respectively choose the direction vector v_1 (and hence Z_1)?

A

PCR: choose the v_1 that minimizes the distance from the observations x_i to their projections (uses X only),

PLS: choose the v_1 that explains y well (uses the response when constructing the direction).

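A sketch of the contrast using scikit-learn (X and y assumed to be standardized predictors and the response; PCA and PLSRegression are scikit-learn classes):

from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

v1_pcr = PCA(n_components=1).fit(X).components_[0]                  # ignores y: direction of maximal variance in X
v1_pls = PLSRegression(n_components=1).fit(X, y).x_weights_[:, 0]   # uses y: direction with high covariance with y
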
16
Q

Which constraint do we need to consider when choosing an arbitrary direction vector, v_1?

A

v_1^T v_1 = 1 (i.e., v_1 has unit length)

17
Q

How can we divide the PCR minimization problem (and what is it?) into two different optimization problems? Why does a maximization appear?

A

The PCR problem minimizes the squared distance between each x_i and its projection onto v_1. This splits into a term that does not depend on v_1 (the squared norms of the x's) minus a term we then want to maximize.

The maximization part follows from the fact that minimizing the reconstruction error is the same as finding the v_1 for which the variance of the projections of the x_i onto the direction vector is maximized, which preserves as much information as possible.
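The identity behind this split (my rendering, using that v_1 has unit length):

\sum_{i=1}^{N} \| x_i - (x_i^\top v_1) v_1 \|^2 = \sum_{i=1}^{N} \|x_i\|^2 - \sum_{i=1}^{N} (x_i^\top v_1)^2

so minimizing the reconstruction error over v_1 is equivalent to maximizing \sum_i (x_i^\top v_1)^2, the (scaled) sample variance of the projections.
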

18
Q

What does it mean for v_1 to minimize squared loss, in terms of the new variable?

A

That the sample variance of the new variable, Z_i1 (1-dim), is maximized.

19
Q

Explain the PLS algorithm.

A
20
Q

Out-of-sample

A
21
Q

How are PCR and ridge regression similar?

A

Both operate via the principal components of the input matrix.
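A standard way to make the similarity precise (not from the cards; stated for centered inputs, with u_j and d_j the left singular vectors and singular values of X):

X\hat{\beta}^{\text{ridge}} = \sum_{j=1}^{p} u_j \frac{d_j^2}{d_j^2 + \lambda} u_j^\top y, \qquad X\hat{\beta}^{\text{pcr}} = \sum_{j=1}^{M} u_j u_j^\top y

Ridge shrinks every principal-component direction smoothly by d_j^2 / (d_j^2 + \lambda), while PCR keeps the first M directions unshrunk and discards the rest.
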

22
Q

When is a square matrix orthogonal?

A

A square matrix A is orthogonal if A^T = A^(-1), or equivalently if AA^T = A^T A = I.

23
Q

What is the OOS R-sq. and how can it be interpreted?

A

Interpreted as the fraction of out-of-sample variation in the output variable explained by our fitted model.
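One common way to write it (a standard definition; conventions for the baseline mean vary):

R^2_{\text{OOS}} = 1 - \frac{\sum_{i \in \text{test}} (y_i - \hat{y}_i)^2}{\sum_{i \in \text{test}} (y_i - \bar{y})^2}

where the sums run over the out-of-sample (test) observations and \hat{y}_i are the model's predictions.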