Week 4: Dimension Reduction Flashcards

1
Q

What does projection refer to in dimension reduction techniques?

A

The fact that we project high-dimensional data onto lower dimensions

2
Q

What are the two main reasons for dimension reduction?

A

1) To make parameter estimation easier (parameters increase w/ dimensions), 2) to make visualization of data easier.

3
Q

What is the learning task in PCA?

A

To choose how many dimensions we want to project into (D) and then pick a projection vector, w_d, for each.

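A minimal sketch of this learning task in code (numpy; the simulated data matrix and the names D, W, Z are illustrative, not from the course):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # N x p data matrix (simulated)
X = X - X.mean(axis=0)                   # center each column

D = 2                                    # chosen number of new dimensions
# the top-D right singular vectors of X give the projection vectors w_1, ..., w_D
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:D].T                             # p x D matrix of projection vectors
Z = X @ W                                # N x D matrix of projected data
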
4
Q

How can dimension reduction prevent overfitting?

A

It reduces the number of model parameters and thus reduces the risk of overfitting.

5
Q

How do we notate the new predictors (where M = new dimensions ≪ p = original predictors) in dimension reduction?

A

Z_1, Z_2, …, Z_M

6
Q

How do we select M, the number of new predictors?

A

Via cross-validation.

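A sketch of how M can be chosen by cross-validation for PCR (scikit-learn; the pipeline, the candidate range 1-10, and the names X_train, y_train are assumptions for illustration):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

pcr = Pipeline([
    ("scale", StandardScaler()),         # standardize the original predictors
    ("pca", PCA()),                      # construct Z_1, ..., Z_M
    ("reg", LinearRegression()),         # regress y on the new predictors
])
grid = GridSearchCV(pcr, {"pca__n_components": range(1, 11)}, cv=5)
grid.fit(X_train, y_train)               # X_train, y_train assumed to exist
M = grid.best_params_["pca__n_components"]
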
7
Q

What does scale-invariant mean?

A

A method is scale-invariant if its results do not change when the original inputs are rescaled (e.g. measured in different units).
8
Q

What does it imply for the method that PCR and PLS are not scale-invariant?

A

That we have to standardize each original input x_j to have mean 0 and variance 1.

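A one-line sketch of that standardization (numpy; X assumed to be the N x p array of original inputs):

import numpy as np

X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # each column now has mean 0 and variance 1
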
9
Q

When don’t we have to standardize the predictors X before creating the new, constructed predictors z?

A

If all the x's are measured in the same units.

10
Q

What are the dimensions of the X matrix of the p original variables?

A

N x p (each of the N observations x_i has p values, one for each predictor).

11
Q

What can we decompose each x into?

A

Basis direction vectors (e_1, …, e_p) and coefficients (x_i1, …, x_ip): each x_i is a weighted sum of the directions.

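In symbols, the standard basis expansion behind this card:

x_i = x_{i1} e_1 + x_{i2} e_2 + \dots + x_{ip} e_p = \sum_{j=1}^{p} x_{ij} e_j
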
12
Q

Where will the new observed values lie in a 2-dim to 1-dim reduction when projecting onto the direction e_1?

A

On the x-axis only.

13
Q

What does projecting onto e_1 mean in terms of information, and what can we do about this?

A

That we lose all information about x_i2. Instead, we can find a new direction v_1 that lies along neither of the two original axes (the X- and Y-axis).

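A tiny numeric illustration of this (numpy; the two 2-dim observations and the diagonal direction are made up for the example):

import numpy as np

x = np.array([[2.0, 3.0],
              [1.0, 1.0]])               # two 2-dim observations
e1 = np.array([1.0, 0.0])                # first original axis
v1 = np.array([1.0, 1.0]) / np.sqrt(2)   # unit-length diagonal direction

print(x @ e1)   # [2. 1.]        -> only the first coordinate survives, x_i2 is lost
print(x @ v1)   # [3.54 1.41]    -> both coordinates contribute, so less information is lost
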
14
Q

How can we write the new 1-dim inputs, Z_11, …, Z_N1, in terms of the matrix of the (standardized) original variables and the new direction vector?

A

Z_1 = Xv_1

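The same relation in code (numpy; X is the standardized N x p matrix and v1 an arbitrary unit-length direction, both assumed here):

import numpy as np

v1 = np.ones(X.shape[1]) / np.sqrt(X.shape[1])   # an arbitrary unit-length direction
Z1 = X @ v1                                      # N-vector of new 1-dim inputs Z_11, ..., Z_N1
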
15
Q

How do PCR and PLS respectively choose the direction vector v_1 (and hence Z_1)?

A

PCR: choose the v_1 that minimizes the distance from the observations x_i to their projections (uses X only),

PLS: choose the v_1 that explains y well (uses the response when constructing the direction).

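A sketch of the contrast using scikit-learn (X and y assumed to be standardized predictors and the response; PCA and PLSRegression are scikit-learn classes):

from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

v1_pcr = PCA(n_components=1).fit(X).components_[0]                  # ignores y: direction of maximal variance in X
v1_pls = PLSRegression(n_components=1).fit(X, y).x_weights_[:, 0]   # uses y: direction with high covariance with y
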
16
Q

Which constraint do we need to consider when choosing an arbitrary direction vector, v_1?

A

v_1^T v_1 = 1 (i.e., v_1 has unit length)

17
Q

How can we divide the PCR minimization problem (and what is it?) into two different optimization problems? Why does a maximization appear?

A

The PCR problem minimizes the squared distance between each x_i and its projection onto v_1. This splits into a term that does not depend on v_1 (the squared norms of the x's) minus a term we then want to maximize.

The maximization part follows from the fact that minimizing the reconstruction error is the same as finding the v_1 for which the variance of the projections of the x_i onto the direction vector is maximized, which preserves as much information as possible.
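The identity behind this split (my rendering, using that v_1 has unit length):

\sum_{i=1}^{N} \| x_i - (x_i^\top v_1) v_1 \|^2 = \sum_{i=1}^{N} \|x_i\|^2 - \sum_{i=1}^{N} (x_i^\top v_1)^2

so minimizing the reconstruction error over v_1 is equivalent to maximizing \sum_i (x_i^\top v_1)^2, the (scaled) sample variance of the projections.
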

18
Q

What does it mean for v_1 to minimize squared loss, in terms of the new variable?

A

That the sample variance of the new variable, Z_i1 (1-dim), is maximized.

19
Q

Explain the PLS algorithm.

A
20
Q

Out-of-sample

A
21
Q

How are PCR and ridge regression similar?

A

Both operate via the principal components of the input matrix.
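A standard way to make the similarity precise (not from the cards; stated for centered inputs, with u_j and d_j the left singular vectors and singular values of X):

X\hat{\beta}^{\text{ridge}} = \sum_{j=1}^{p} u_j \frac{d_j^2}{d_j^2 + \lambda} u_j^\top y, \qquad X\hat{\beta}^{\text{pcr}} = \sum_{j=1}^{M} u_j u_j^\top y

Ridge shrinks every principal-component direction smoothly by d_j^2 / (d_j^2 + \lambda), while PCR keeps the first M directions unshrunk and discards the rest.
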

22
Q

When is a square matrix orthogonal?

A

A square matrix A is orthogonal if A^T = A^(-1), or equivalently if AA^T = A^T A = I.

23
Q

What is the OOS R-sq. and how can it be interpreted?

A

Interpreted as the fraction of out-of-sample variation in the output variable explained by our fitted model.
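One common way to write it (a standard definition; conventions for the baseline mean vary):

R^2_{\text{OOS}} = 1 - \frac{\sum_{i \in \text{test}} (y_i - \hat{y}_i)^2}{\sum_{i \in \text{test}} (y_i - \bar{y})^2}

where the sums run over the out-of-sample (test) observations and \hat{y}_i are the model's predictions.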