Week 12 Flashcards

1
Q

What is principal component analysis (PCA)? What kind of data is it used on?

A

A dimension-reduction method: we transform the original X's and work with M transformed variables, where M < p.

An unsupervised learning method used to summarize a large set of correlated variables.

2
Q

What are loadings?

A

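The linear weights φ_jm applied to each original variable x_j to form principal component Z_m. Each loading vector φ_m = (φ_1m, …, φ_pm) is normalized so that its squared entries sum to 1.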

3
Q

What can PCA be used for?

A

As inputs for supervised learning methods, or for data visualization.

To extract variables corresponding to the directions along which the data vary the most, by choosing the loadings φ_jm.

4
Q

What is the primary purpose of PCA?

A

To extract variables corresponding to the directions along which the data vary the most.

5
Q

How many components can we decompose X into?

A

At most min(p, n - 1) principal components, which are mutually uncorrelated.
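A quick numerical check of the min(p, n - 1) bound, sketched in NumPy (our choice; the deck names no software, and `count_nonzero_components` is a hypothetical helper). Centering the columns makes the rows sum to zero, so with n = 4 observations of p = 6 variables at most n - 1 = 3 directions carry any variance. The data are random and purely illustrative.

```python
import numpy as np

def count_nonzero_components(X, tol=1e-10):
    """Number of principal components with nonzero variance for an n x p data matrix X."""
    Xc = X - X.mean(axis=0)                   # center each variable
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values of the centered data
    return int(np.sum(s > tol * s.max()))     # count the non-negligible ones

rng = np.random.default_rng(0)
n, p = 4, 6                                   # fewer observations than variables
X = rng.normal(size=(n, p))
print(count_nonzero_components(X), min(p, n - 1))
```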

6
Q

How are the principal components obtained?

A

As linear combinations of all the x's, with linear weights called loadings.
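A minimal NumPy sketch of how the components can be built (NumPy and the helper name `pca` are our choices, not the deck's; the data are synthetic). The loading vectors are taken as the eigenvectors of the sample covariance matrix of the centered x's, and each score column is a linear combination of all the x's.

```python
import numpy as np

def pca(X):
    """PCA of an n x p data matrix via eigendecomposition of the sample covariance.

    Returns (phi, Z): phi is p x p with the loading vectors in its columns, and
    Z = Xc @ phi holds the component scores z_im = sum_j phi_jm * x_ij (x centered).
    """
    Xc = X - X.mean(axis=0)                    # center each variable
    cov = np.cov(Xc, rowvar=False)             # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # largest-variance component first
    phi = eigvecs[:, order]
    Z = Xc @ phi                               # each column is one principal component
    return phi, Z

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
phi, Z = pca(X)
print(np.linalg.norm(phi, axis=0))   # length of each loading vector
print(Z.var(axis=0, ddof=1))         # variance of each component's scores
```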

7
Q

The _____________ and __________ are unique up to ______

A

principal component
loadings vector
sign
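One way to see the sign ambiguity, sketched in NumPy (our choice of tool; the data are synthetic with distinct variances so each component is well defined): compute the loadings by two valid routes, eigendecomposition of the covariance and SVD of the centered data. Each column agrees up to a factor of -1, and flipping the sign leaves the component's variance unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])  # distinct variances
Xc = X - X.mean(axis=0)

# Route 1: eigenvectors of the sample covariance, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
phi_eig = eigvecs[:, np.argsort(eigvals)[::-1]]

# Route 2: right singular vectors of the centered data (already sorted by variance)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
phi_svd = Vt.T

for m in range(X.shape[1]):
    same = np.allclose(phi_eig[:, m], phi_svd[:, m], atol=1e-8)
    flipped = np.allclose(phi_eig[:, m], -phi_svd[:, m], atol=1e-8)
    print(f"PC{m + 1}: identical={same}, sign-flipped={flipped}")
```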

8
Q

What are principal components ordered by?

A

Their share of the total variance of X explained. This share equals each component's variance divided by the sum of the variances of all PCs.
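The share described above can be computed directly from the singular values of the centered data; a NumPy sketch (the helper `variance_shares` is hypothetical and the data are synthetic):

```python
import numpy as np

def variance_shares(X):
    """Share of the total variance of X explained by each principal component, largest first."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, sorted descending
    pc_var = s**2 / (X.shape[0] - 1)          # variance of each component's scores
    return pc_var / pc_var.sum()              # component variance / sum of all PC variances

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5)) * np.array([4.0, 2.0, 1.0, 1.0, 0.5])
shares = variance_shares(X)
print(shares.round(3))            # share per component
print(shares.cumsum().round(3))   # cumulative share
```

Because the components only rotate the centered data, the PC variances add up to the total variance of the original variables, so the shares always sum to 1.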

9
Q

If you have fewer x variables than observations (p < n), …

A

you will obtain all p components.

10
Q

How can you tell whether two principal components are uncorrelated?

A

Their loading vectors are perpendicular (orthogonal): the dot product of the two loading vectors is zero.
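A NumPy sketch of the orthogonality check on synthetic, deliberately correlated data (the data-generating choices are ours): the matrix of dot products between loading vectors should come out as the identity, and so should the correlation matrix of the component scores.

```python
import numpy as np

rng = np.random.default_rng(3)
# Correlated data: the second variable partly copies the first
x1 = rng.normal(size=500)
X = np.column_stack([x1, 0.8 * x1 + 0.6 * rng.normal(size=500), rng.normal(size=500)])
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
phi = Vt.T                    # loading vectors in the columns
Z = Xc @ phi                  # component scores

print(np.round(phi.T @ phi, 10))                    # dot products between loading vectors
print(np.round(np.corrcoef(Z, rowvar=False), 10))   # correlations between component scores
```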

11
Q

Are principal components scale-variant? Why?

A

Yes. Scaling up one variable by a constant blows up its variance and changes its loading.

You have to standardize the data unless all variables are measured in the same units.
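A NumPy sketch of the scaling problem (the helper `first_loading` and the synthetic data are hypothetical). Multiplying one column by 1000, as if the same variable were recorded in much smaller units, makes the first loading vector point almost entirely at that column; after standardizing each column to unit variance, the rescaling no longer matters.

```python
import numpy as np

def first_loading(X, standardize=False):
    """First principal component loading vector of X, optionally after standardizing columns."""
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=1)        # give every column unit variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))                  # three variables on comparable scales
X_rescaled = X.copy()
X_rescaled[:, 2] *= 1000                       # same variable, different units

print(first_loading(X).round(3))
print(first_loading(X_rescaled).round(3))                     # dominated by the rescaled column
print(first_loading(X, standardize=True).round(3))
print(first_loading(X_rescaled, standardize=True).round(3))   # matches the line above up to sign
```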

12
Q

When does PCA work best? What kind of data must it be used on?

A

On highly correlated data (as a rule of thumb, correlations above 0.5).

On continuous variables. It cannot handle categorical variables; for those, use categorical PCA (CATPCA).

13
Q

What is a component score?

A

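The value of the component variable z_m for a given observation: z_im = φ_1m x_i1 + … + φ_pm x_ip (with the x's centered).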

14
Q

What are the axes in a PCA biplot?

A

Left and bottom: component scores (z_m).
Top and right: loadings.

15
Q

What is a scree plot? How do you use it to choose the number of components?

A

A plot of the proportion of the total variance of X explained by each successive component.

Look for an “elbow” in the plot, where the contribution to variance drops sharply and then flattens. Retain components up to the elbow.

16
Q

Why is it best to use PCA on highly correlated variables?

A

Highly correlated variables carry redundant information about one another, so it should be possible to extract a small set of uncorrelated factors that explains the bulk of their variation over time.
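A NumPy sketch contrasting the two situations on synthetic data (the one-factor setup and the helper `first_pc_share` are our own illustration): when four columns share a common factor, the first component captures most of the total variance; when the columns are unrelated, each component gets roughly 1/p of it.

```python
import numpy as np

def first_pc_share(X):
    """Share of the total variance captured by the first principal component."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return s[0]**2 / np.sum(s**2)

rng = np.random.default_rng(5)
n, p = 500, 4
common = rng.normal(size=(n, 1))                  # one shared factor
X_corr = common + 0.3 * rng.normal(size=(n, p))   # highly correlated columns
X_indep = rng.normal(size=(n, p))                 # unrelated columns

print(round(first_pc_share(X_corr), 3))    # one factor carries most of the variance
print(round(first_pc_share(X_indep), 3))   # no redundancy to exploit
```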

17
Q

What are the level, slope, and curvature components?

A

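Level: the loadings are similar across all maturities, so the component shifts the whole yield curve up or down in parallel.

Slope: the loadings change steadily across maturities (downward sloping all the way past 0), so the component tilts the curve, moving short and long yields in opposite directions.

Curvature: the loadings flip sign twice (a U-shaped loading plot), inducing a butterfly-like movement in the yield curve.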

18
Q

What is the problem with constraining the betas? How does PCA offset this flaw?

A

Constraints on the β's may introduce bias. However, dimension reduction can greatly reduce variance, which can more than offset the added bias.
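Where the constraint comes from, written in the usual textbook notation for PCR (θ_m for the regression coefficients on the components is our labeling; φ_jm are the loadings from the earlier cards). Substituting the component scores back into the regression gives a linear model in the original x's:

```latex
\hat{y}_i = \theta_0 + \sum_{m=1}^{M} \theta_m z_{im}
          = \theta_0 + \sum_{m=1}^{M} \theta_m \sum_{j=1}^{p} \phi_{jm} x_{ij}
          = \theta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
\beta_j = \sum_{m=1}^{M} \theta_m \phi_{jm}.
```

So the p coefficients β_j are driven by only M free parameters θ_m. When M < p this restriction is the source of the bias, and estimating M rather than p coefficients is the source of the variance reduction.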

19
Q

What can we do once we have the principal components?

A

We can run a principal components regression (PCR).

20
Q

What are the steps for running PCR?

A

Run PCA on the predictor matrix X.

Construct the sample principal components Z.

Keep only the first M < p principal components.

Regress y on z1, …, zM by least squares.
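A minimal NumPy sketch of these steps (the helpers `pcr_fit` and `pcr_predict` are hypothetical names, the data are synthetic, and standardization is skipped because the simulated predictors already share a scale). PCA is done by SVD of the centered predictors, and least squares on the retained scores gives the θ coefficients.

```python
import numpy as np

def pcr_fit(X, y, M):
    """Principal components regression following the steps above.

    Returns (x_mean, phi, y_mean, theta): column means of X, p x M loadings,
    mean of y, and the M regression coefficients on the component scores.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    # Run PCA on the centered predictors; rows of Vt are the loading vectors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Keep only the first M components and construct their scores
    phi = Vt[:M].T
    Z = Xc @ phi
    # Least-squares regression of y on z_1..z_M
    # (the scores have mean zero, so the intercept is simply the mean of y)
    y_mean = y.mean()
    theta, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    return x_mean, phi, y_mean, theta

def pcr_predict(model, X_new):
    """Predict y for new rows by projecting them onto the retained loading vectors."""
    x_mean, phi, y_mean, theta = model
    return y_mean + (X_new - x_mean) @ phi @ theta

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=100)
model = pcr_fit(X, y, M=2)
print(pcr_predict(model, X[:3]))
```

As a sanity check on the design, keeping all M = p components reproduces ordinary least squares, and `phi @ theta` gives the implied coefficients on the original x's.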

21
Q

How do you choose M for unsupervised PCA and for PCR?

A

For unsupervised PCA there is no clear rule: some recommend plotting the proportion of variance explained on a scree plot and retaining PCs until the plot starts to drop off (at the “elbow”).

For PCR, we can simply use K-fold cross-validation.
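A NumPy sketch of choosing M by K-fold cross-validation (all helper names are hypothetical and the data are synthetic). It assumes each training fold has more rows than there are predictors, and it skips standardization because the simulated predictors share a scale. For each M it refits PCR on K - 1 folds, scores the held-out fold, and keeps the M with the lowest average test MSE.

```python
import numpy as np

def pcr_fit(X, y, M):
    """PCR: PCA on centered X, keep the first M components, regress y on their scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    phi = Vt[:M].T
    theta, *_ = np.linalg.lstsq((X - x_mean) @ phi, y - y_mean, rcond=None)
    return x_mean, phi, y_mean, theta

def pcr_predict(model, X_new):
    x_mean, phi, y_mean, theta = model
    return y_mean + (X_new - x_mean) @ phi @ theta

def choose_m_by_cv(X, y, K=5, seed=0):
    """Return the M in 1..p with the lowest K-fold CV mean squared error, plus all CV MSEs."""
    n, p = X.shape
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    cv_mse = []
    for M in range(1, p + 1):
        fold_mse = []
        for k in range(K):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(K) if j != k])
            model = pcr_fit(X[train], y[train], M)
            resid = y[test] - pcr_predict(model, X[test])
            fold_mse.append(np.mean(resid ** 2))
        cv_mse.append(np.mean(fold_mse))
    return int(np.argmin(cv_mse)) + 1, cv_mse

rng = np.random.default_rng(8)
X = rng.normal(size=(120, 6))
y = X[:, 0] - X[:, 1] + rng.normal(size=120)
best_M, cv_mse = choose_m_by_cv(X, y, K=5)
print(best_M, np.round(cv_mse, 3))
```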

22
Q

What is the underlying assumption in PCR?

A

The directions of highest variability in X are those most associated with y.

23
Q

When does the assumption fail?

A

When useful signal “hides” in low-variance components that would be discarded.
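A synthetic NumPy example we constructed to show the failure (the helper `pcr_r2` is hypothetical). One predictor has large variance but no relation to y; the other has small variance and carries all the signal. Keeping only the first component leaves almost nothing explained, and adding the low-variance component recovers the fit.

```python
import numpy as np

def pcr_r2(X, y, M):
    """In-sample R^2 of principal components regression using the first M components."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:M].T                          # scores of the first M components
    theta, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    resid = yc - Z @ theta
    return 1 - resid @ resid / (yc @ yc)

rng = np.random.default_rng(7)
n = 200
x_big = 10 * rng.normal(size=n)      # high variance, unrelated to y
x_small = rng.normal(size=n)         # low variance, carries all the signal
X = np.column_stack([x_big, x_small])
y = 3 * x_small + 0.1 * rng.normal(size=n)

print(round(pcr_r2(X, y, M=1), 3))   # first PC points along x_big
print(round(pcr_r2(X, y, M=2), 3))   # low-variance PC added back
```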

24
Q

Does PCR perform variable selection?

A

No.

All the original variables are used to form each principal component.
