SRM Chapter 6 - Unsupervised Learning Flashcards
6.1 Principal Components Analysis 6.2 Cluster Analysis
PCA (Principal Components Analysis)
- Reduces complexity by transforming the variables into a smaller number of principal components that highlight the most important features of the data (i.e., explain a sufficient amount of its variability).
- Often applied before supervised models
k-means Clustering
- Divides the data into a predetermined number of clusters
- Such that the within-cluster variation (sum of squared distances to each cluster's center) is minimized
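To make this concrete, here is a minimal sketch of Lloyd's algorithm (the standard k-means procedure) in numpy — illustrative only, not from the source; the function and variable names are my own:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then recompute centroids, until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid -> (n, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center (keep the old one if a cluster empties)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels, centroids = kmeans(X, k=2)
```

Note that k requires choosing the number of clusters upfront, in contrast to hierarchical clustering below.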
Hierarchical Clustering
- Don’t have to specify number of clusters upfront
- Dendrogram -> tree-based representation of the observations; cutting it at different heights yields different numbers of clusters, allowing flexible cluster analysis
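A small scipy sketch (illustrative, not from the source): build the dendrogram once, then cut it afterwards at whatever number of clusters you want.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# two tight, well-separated groups of 5 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (5, 2)), rng.normal(3, 0.2, (5, 2))])

# agglomerative clustering with complete linkage; Z encodes the dendrogram:
# each row records a merge of two clusters and the height at which it occurs
Z = linkage(X, method="complete")

# no number of clusters was specified upfront -- cut the same tree twice
labels_2 = fcluster(Z, t=2, criterion="maxclust")
labels_3 = fcluster(Z, t=3, criterion="maxclust")
```

`scipy.cluster.hierarchy.dendrogram(Z)` would draw the tree itself.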
What is a principal component? Its features?
- Each principal component is a linear combination of ALL features in the dataset
- Features in the dataset are assumed to have a mean of 0 (centered)
Loadings
- The coefficients (weights) applied to each feature in a principal component's linear combination
- Collected into a loading vector, which is scaled to have unit length
First principal component
- Explains the largest portion of variance in a dataset
- i.e. PCA extracts components in decreasing order of variance explained, adding components until a sufficient amount of variability is explained (the goal is to use as few components as possible)
How are values for the first principal component loadings determined?
By maximizing the sample variance of the first principal component
- Note: the loadings are constrained to a NORMALIZED (unit-length) linear combination of the features — without this constraint, the variance could be inflated arbitrarily just by scaling the loadings up
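A quick numpy check of why the unit-norm constraint matters (illustrative sketch; the loading vector `w` here is arbitrary, not an actual PC):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                 # center the features (mean 0)

w = np.array([1.0, 2.0, 2.0])          # some hypothetical loading vector
z = X @ w                              # scores under w
z_unit = X @ (w / np.linalg.norm(w))   # scores under the normalized w

# doubling the loadings quadruples the variance, so "maximize the
# variance" is unbounded unless we fix the loading vector's length
var_w = np.var(z)
var_2w = np.var(X @ (2 * w))
```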
Second principal component
Linear combination of the features that captures the largest share of the variability in the dataset not already captured by the 1st principal component
What is the dot product of the loading vectors for PC1 and PC2? Why?
0, because they are orthogonal
How can we solve for loading vectors?
Eigen decomposition (not tested) of the covariance matrix
- This produces eigenvalues (the variances of each PC) and eigenvectors (the loading vectors)
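The decomposition can be carried out directly in numpy (a sketch, not tested material — the SRM exam does not require computing it). Note it also confirms the orthogonality of PC1 and PC2 from the previous card:

```python
import numpy as np

# correlated synthetic data, centered
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.array([[2.0, 0.5, 0.0],
                                         [0.0, 1.0, 0.3],
                                         [0.0, 0.0, 0.5]])
X = X - X.mean(axis=0)

S = np.cov(X, rowvar=False)            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]      # sort by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1, pc2 = eigvecs[:, 0], eigvecs[:, 1]  # loading vectors
scores1 = X @ pc1                        # first PC scores
```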
What is the max number of distinct principal components that can be created?
For a dataset with n observations and p features, the max number of PCs is
min(n-1,p)
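This bound can be verified numerically (illustrative sketch): with n = 4 centered observations on p = 6 features, only min(4-1, 6) = 3 eigenvalues of the covariance matrix are non-zero.

```python
import numpy as np

# more features than observations: n = 4, p = 6
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
X = X - X.mean(axis=0)                 # centering uses up one degree of freedom

S = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(S)
n_distinct = int(np.sum(eigvals > 1e-10))  # PCs with non-zero variance
```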
Distinct principal component
A component is distinct if its variance is non-zero (i.e., adding it still captures some of the remaining variance in the dataset).
Biplot
- Plots PC1 against PC2 (scores on the bottom x-axis and left y-axis; the top and right axes show the loadings)
- Can only visualize two PCs at a time
- Each predictor is drawn as a vector whose x-component is its PC1 loading and whose y-component is its PC2 loading: if the x-component is larger in magnitude, PC1 places more weight on that predictor; if the y-component is larger, PC2 does
- The weight is given by the magnitude of each component of the vector
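The two ingredients of a biplot can be computed as follows (a sketch, not from the source; plotting itself is omitted): the points are the PC1/PC2 scores of each observation, and the arrows are the (PC1 loading, PC2 loading) pairs of each predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
X = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
V = eigvecs[:, order]            # columns = loading vectors, sorted

scores = X @ V[:, :2]            # biplot points: (PC1 score, PC2 score)
arrows = V[:, :2]                # biplot arrows: row j = predictor j's
                                 # (PC1 loading, PC2 loading)
```

For predictor j, comparing `abs(arrows[j, 0])` with `abs(arrows[j, 1])` shows whether PC1 or PC2 weights it more heavily, matching the reading rule above.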