PCA Final Flashcards

1
Q

PCA

A

Principal Component Analysis

2
Q

PCA is a

A

dimensionality reduction technique

3
Q

Big idea 1: Take a dataset in a high-dimensional space

A

and transform it so it can be represented in a lower-dimensional space, with minimal or no loss of information

4
Q

Big idea 2: Extract

A

latent information from the data

5
Q

The PCA transformation results in

A

a smaller number of principal components that capture as much of the original dataset's variation as possible, but in a lower-dimensional space

6
Q

These principal components are

A

linear combinations of the original variables, and become the new axes of the dataset in the lower-dimensional space
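
To make "linear combination" concrete, here is a minimal sketch in Python with NumPy. The loading weights and the observation are made-up values chosen for illustration, not taken from any dataset in these cards.

```python
import numpy as np

# Hypothetical loadings for one principal component (a unit-length vector).
loadings = np.array([0.6, 0.8])

# One hypothetical observation with two original variables.
observation = np.array([3.0, 4.0])

# The score on this component is a weighted sum of the original variables:
# 0.6 * 3.0 + 0.8 * 4.0
score = loadings @ observation
```

Each principal component has its own weight vector, so every new axis is built from all of the original variables.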

7
Q

3 goals of PCA

A

Feature reduction: reduce the number of features used to represent the data
The reduced feature set should explain a large amount of information (or maximize variance)
Make visible the latent information in the data

8
Q

PCA creates

A

projections (principal components) in the directions that capture the most variance

9
Q

Sparser data has

A

greater variance (spread out)

10
Q

Denser data has

A

lesser variance (clustered together)

11
Q

The projections will always be

A

orthogonal to each other
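
The orthogonality claim can be checked numerically. This is a sketch in Python with NumPy on randomly generated data (an assumption made purely for illustration): the principal directions are taken as the eigenvectors of the covariance matrix, and their pairwise dot products should form the identity matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # made-up data: 50 rows, 3 variables

cov = np.cov(X, rowvar=False)       # 3 x 3 covariance matrix
_, V = np.linalg.eigh(cov)          # columns of V are the principal directions

# Entry (i, j) is the dot product of direction i with direction j.
# Orthogonal unit vectors give 1 on the diagonal and 0 elsewhere.
gram = V.T @ V
```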

12
Q

Mathematics behind PCA

A

Eigenvalues and Eigenvectors

13
Q

Mathematics equation

A

Ax = λx: matrix A times eigenvector x equals eigenvalue λ times the same eigenvector x
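
A quick numerical check of this equation, sketched in Python with NumPy. The 2 x 2 matrix is a made-up symmetric example, and `np.linalg.eigh` is used because it is intended for symmetric matrices such as covariance matrices.

```python
import numpy as np

# Small symmetric matrix with made-up values.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order; column i of `eigenvectors`
# is the eigenvector that belongs to eigenvalues[i].
eigenvalues, eigenvectors = np.linalg.eigh(A)

# For every pair, multiplying by A only rescales the vector by its eigenvalue.
checks = [np.allclose(A @ x, lam * x)
          for lam, x in zip(eigenvalues, eigenvectors.T)]
```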

14
Q

Eigenvalue and Eigenvector meaning

A

An eigenvector of a matrix is a nonzero vector that, when it is multiplied by the matrix, does not change its direction. Instead, the vector is simply scaled by some factor, and that factor is the eigenvalue

15
Q

Eigenvectors are vectors that

A

remain unchanged when multiplied by A, except for a change in magnitude. Their direction remains unchanged when the linear transformation is applied to them

16
Q

When we eigendecompose a matrix

A

If the matrix has n columns (n dimensions), we get n eigenvalues and n eigenvectors

17
Q

Our matrix/dataset gets decomposed into

A

Eigenvectors
Eigenvalues
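
As a sketch of what the decomposition produces, the Python/NumPy example below eigendecomposes the covariance matrix of a made-up dataset with 4 columns. It assumes the decomposition is applied to the covariance matrix, which is the usual setup for PCA. It yields 4 eigenvalues and 4 eigenvectors, and multiplying the pieces back together recovers the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))            # made-up data: 40 rows, 4 columns

cov = np.cov(X, rowvar=False)           # 4 x 4 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Rebuild the matrix from its parts: V * diag(eigenvalues) * V^T
reconstructed = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
```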

18
Q

Should we standardize for PCA?

A

Yes, always standardize, so that variables measured on large scales do not dominate the components
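
The sketch below, in Python with NumPy, illustrates why scale matters. The two variables are made up: they are correlated, but one is recorded in units 1000 times larger. The helper `first_loading` is a hypothetical name introduced here, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# Two made-up, correlated variables on very different scales.
big = 1000.0 * (base + 0.5 * rng.normal(size=200))
small = base + 0.5 * rng.normal(size=200)
A = np.column_stack([big, small])

def first_loading(data):
    """Eigenvector of the covariance matrix with the largest eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
    return eigenvectors[:, np.argmax(eigenvalues)]

# Without standardizing, the first component is almost entirely `big`.
raw = first_loading(A)

# After centering and scaling each column, both variables contribute equally.
standardized = first_loading((A - A.mean(axis=0)) / A.std(axis=0, ddof=1))
```

On the raw data the first loading points almost exactly along the large-scale variable, so the component mostly reflects the choice of units rather than the shared structure.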

19
Q

five fields returned from prcomp(A,…)

A

sdev
rotation
center
scale
x

20
Q

sdev

A

Square root of the eigenvalues, ordered from largest eigenvalue to the smallest

21
Q

rotation

A

Matrix whose columns contain the eigenvectors (also called principal loadings)

22
Q

center

A

Mean of the columns of the matrix A

23
Q

scale

A

Standard deviations of the columns of the matrix A

24
Q

x

A

Data from matrix A in rotated space (also called principal component scores)

25
Q

How is the data in rotated space computed

A

Dot product (matrix multiplication) of the centered and scaled data with the rotation matrix
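
The five fields can be reproduced by hand. The following is a sketch in Python with NumPy rather than R, on made-up data, and the variable names simply mirror the prcomp field names. It is an approximation of the idea, not prcomp's actual implementation, and the signs of the eigenvectors may differ from what R reports.

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up data: 30 observations of 3 correlated variables.
A = rng.normal(size=(30, 3)) @ np.array([[2.0, 0.5, 0.0],
                                         [0.0, 1.0, 0.3],
                                         [0.0, 0.0, 0.5]])

center = A.mean(axis=0)                  # column means
scale = A.std(axis=0, ddof=1)            # column standard deviations
Z = (A - center) / scale                 # standardized data

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigenvalues)[::-1]    # largest eigenvalue first

sdev = np.sqrt(eigenvalues[order])       # square roots of the eigenvalues
rotation = eigenvectors[:, order]        # columns are the loadings

x = Z @ rotation                         # scores: data times rotation matrix
```

The standard deviation of each column of `x` equals the matching entry of `sdev`, which is why `sdev` is described as the square root of the eigenvalues.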

26
Q

On a biplot, the top and right axes indicate

A

The scale for the loading vectors: they tell you where those vectors fall

27
Q

On a biplot, the bottom and left axes indicate

A

The scores that place each data point in the new rotated space

28
Q

How many principal components do we need?

A

As many as it takes to explain most of the variance, stopping once adding more to the model yields diminishing gains in explained variance

29
Q

Key idea: What is the proportion of variance

A

contributed by each principal component loading?

30
Q

Total Variation

A

Sum of the variances of all principal components

31
Q

Proportion of variance explained by ith principal component loading

A

Variance of PCi / Total Variation

32
Q

variance is

A

The squared standard deviation (sdev squared)
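
Putting the variance cards together, here is a sketch in Python with NumPy. The `sdev` values are hypothetical, and the 80% cutoff is an arbitrary threshold chosen only to illustrate picking the number of components.

```python
import numpy as np

# Hypothetical standard deviations of three principal components.
sdev = np.array([2.0, 1.0, 1.0])

variance = sdev ** 2                     # variance is the squared std dev
total_variation = variance.sum()         # sum over all components
proportion = variance / total_variation  # share explained by each component
cumulative = np.cumsum(proportion)       # running total of explained variance

# Smallest number of components whose cumulative share reaches the cutoff.
threshold = 0.8                          # arbitrary cutoff for illustration
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
```

Plotting `proportion` against the component index gives the usual scree plot, where the point of diminishing gains shows up as an elbow.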

33
Q

What do you have to do before attempting to use observations in any model?

A

Transform all of your observations (in-sample and out-of-sample) from their natural representation to principal component scores
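
A minimal sketch of that transformation in Python with NumPy, on made-up training and test sets. It assumes the center, scale, and rotation are estimated from the in-sample data only and then reused unchanged for out-of-sample data. `fit_pca` and `to_scores` are hypothetical helper names introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
# Made-up in-sample and out-of-sample observations with 3 variables.
train = rng.normal(loc=10.0, scale=[1.0, 5.0, 0.2], size=(100, 3))
test = rng.normal(loc=10.0, scale=[1.0, 5.0, 0.2], size=(20, 3))

def fit_pca(data):
    """Estimate center, scale, and rotation from the given data."""
    center = data.mean(axis=0)
    scale = data.std(axis=0, ddof=1)
    z = (data - center) / scale
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]        # largest eigenvalue first
    return center, scale, eigenvectors[:, order]

def to_scores(data, center, scale, rotation):
    """Standardize with the stored center/scale, then rotate."""
    return ((data - center) / scale) @ rotation

center, scale, rotation = fit_pca(train)         # fitted on in-sample data only
train_scores = to_scores(train, center, scale, rotation)
test_scores = to_scores(test, center, scale, rotation)
```

In R, calling `predict()` on a prcomp object with new data serves the same purpose.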