PCA Final Flashcards
PCA
Principal Component Analysis
PCA is a
dimensionality reduction technique
Big idea 1: Take a dataset in high-dimensional space
and transform it so it can be represented in low-dimensional space, with minimal or no loss of information
Big idea 2: Extract
latent information from the data
The PCA transformation results in
a smaller number of principal components that capture the maximum amount of the original dataset's variation, but in low-dimensional space
These principal components are
linear combinations of the original variables, and become the new axes of the dataset in the low-dimensional space
3 goals of PCA
Feature reduction: reduce the number of features used to represent the data
The reduced feature set should explain a large amount of information (or maximize variance)
Make visible the latent information in the data
PCA creates
projections (principal components) in the directions that capture the most variance
Sparser data has
greater variance (spread out)
Denser data has
less variance (clustered together)
The projections will always be
orthogonal to each other
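A minimal R sketch of this orthogonality, using prcomp on the built-in USArrests dataset (chosen here purely as a convenient example):

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# The principal component directions (columns of rotation) are orthonormal:
# their cross-product is, up to rounding, the identity matrix
round(crossprod(pca$rotation), 10)

# Consequently the scores are uncorrelated: off-diagonal covariances are ~0
round(cov(pca$x), 10)
```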
Mathematics behind PCA
Eigenvalues and Eigenvectors
Mathematics equation
A x = λ x: matrix A times eigenvector x equals eigenvalue λ times the same eigenvector
Eigenvalue and Eigenvector meaning
An eigenvector of a matrix is a nonzero vector that, when it is multiplied by the matrix, does not change its direction. Instead, the vector is simply scaled by some factor, and that scaling factor is the eigenvalue
Eigenvectors are vectors that
remain unchanged when multiplied by A, except for a change in magnitude. Their direction remains unchanged when the linear transformation is applied to them
When we eigendecompose a matrix (perform an eigendecomposition)
if the matrix has n columns (n dimensions), we get n eigenvalues and n eigenvectors
Our matrix/dataset gets decomposed into
Eigenvectors
Eigenvalues
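A short R sketch of an eigendecomposition using the base eigen() function; the 3-column matrix A below is made up purely for illustration:

```r
# Hypothetical 3-column (3-dimensional) data matrix; values are for illustration only
set.seed(1)
A <- matrix(rnorm(30), ncol = 3)

# Eigendecomposition of the covariance matrix of A
decomp <- eigen(cov(A))

decomp$values   # 3 eigenvalues, ordered from largest to smallest
decomp$vectors  # 3 eigenvectors, stored as the columns of this matrix
```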
Should we standardize for PCA?
Yes, always standardize (center and scale), so that variables measured on different scales do not dominate the principal components
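A minimal sketch of how standardization is typically requested in a prcomp call (center and scale. are the standardization arguments; USArrests is used only as an example dataset):

```r
# center = TRUE subtracts the column means; scale. = TRUE divides by the column
# standard deviations, so all variables contribute on a comparable scale
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
```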
five fields returned from prcomp(A,…)
sdev
rotation
center
scale
x
sdev
Square roots of the eigenvalues (the standard deviations of the principal components), ordered from largest to smallest
rotation
Matrix whose columns contain the eigenvectors (also called principal loadings)
center
Means of the columns of the matrix A (used for centering)
scale
Standard deviations of the columns of the matrix A (used for scaling)
x
Data from matrix A in rotated space (also called principal component scores)
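A sketch that fits a standardized PCA and inspects the five fields, again assuming USArrests as an example dataset:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

pca$sdev      # square roots of the eigenvalues, largest first
pca$rotation  # eigenvectors / principal component loadings, one column per PC
pca$center    # column means of the original data
pca$scale     # column standard deviations of the original data
pca$x         # principal component scores (the data in rotated space)
```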
How is the data in rotated space computed?
By a dot product: the centered (and scaled) data matrix is multiplied by the rotation matrix of eigenvectors, as sketched below
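A minimal check of that dot-product relationship, assuming the same standardized prcomp fit on USArrests:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Scores = (centered and scaled data) %*% rotation matrix
scores_by_hand <- scale(as.matrix(USArrests),
                        center = pca$center, scale = pca$scale) %*% pca$rotation

all.equal(scores_by_hand, pca$x)  # TRUE, up to floating-point differences
```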
Top and right axes indicate
where the loading vectors fall, i.e., the scale used to read the variable arrows
Bottom and left axes indicate
the scale of the principal component scores that situate each data point in the rotated space
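These two axis pairs match what R's biplot() draws for a prcomp fit; a minimal sketch, using USArrests only as an example:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Observations are plotted against the bottom/left axes (their PC scores);
# the loading arrows for the variables are read against the top/right axes
biplot(pca)
```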
How many principal components do we need?
As many as explain most of the variance; beyond that point, adding more components yields only diminishing gains in explained variance
Key idea: What is the proportion of variance
contributed by each principal component loading?
Total Variation
the sum of the variances of all the principal components (equivalently, the sum of the eigenvalues)
Proportion of variance explained by the ith principal component
Var(PCi) / Total Variation
variance is
the square of the standard deviation (for prcomp, the variance of the ith component is sdev[i]^2)
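A short sketch computing the proportion of variance explained from sdev, assuming the USArrests example fit:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

pc_var <- pca$sdev^2            # variance of each PC = squared standard deviation
pve    <- pc_var / sum(pc_var)  # proportion of variance explained by each PC

pve          # individual proportions
cumsum(pve)  # cumulative proportion: keep enough PCs to cover "most" of the variance
summary(pca) # reports the same standard deviations and proportions
```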
What do you have to do before attempting to use observations in any model?
Transform all of your observations (in-sample and out-of-sample) from their natural representation to principal component scores
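A minimal sketch of scoring an out-of-sample observation with predict(); the new observation's values are made up for illustration and assume the USArrests columns:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# A hypothetical new (out-of-sample) observation with the same columns as USArrests
new_obs <- data.frame(Murder = 10, Assault = 200, UrbanPop = 65, Rape = 25)

# predict() reuses the stored center, scale, and rotation to return
# the observation as principal component scores
predict(pca, newdata = new_obs)
```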