SRM Chapter 6 - Unsupervised Learning Flashcards

6.1 Principal Components Analysis 6.2 Cluster Analysis

1
Q

PCA (Principal Components Analysis)

A
  • Reduces complexity by transforming the variables into a smaller number of principal components that highlight the most important features of the data (i.e., explain a sufficient amount of its variability).
  • Often applied as a preprocessing step before fitting supervised models.
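
A minimal sketch of this workflow in Python using scikit-learn's PCA; the toy dataset and the choice of two components are illustrative assumptions, not from the cards:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy dataset (illustrative): 100 observations, 5 features, some correlation.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)

# Replace 5 correlated features with 2 principal components.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)  # PC scores, shape (100, 2); could feed a supervised model

# Proportion of total variance each component explains.
print(pca.explained_variance_ratio_)
```
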
2
Q

k-means Clustering

A
  • Divides the data into a predetermined number of clusters (k)
  • Such that the total within-cluster variation (the sum of squared distances from each point to its cluster's centroid) is minimized
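
A minimal sketch in Python using scikit-learn's KMeans; the two-blob dataset and k = 2 are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two illustrative blobs of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])

# k must be chosen upfront; k=2 here because we built two blobs.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.labels_[:5])  # cluster assignment of the first few observations
print(km.inertia_)     # total within-cluster sum of squares (the minimized objective)
```
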
3
Q

Hierarchical Clustering

A
  • Don't have to specify the number of clusters upfront
  • Produces a dendrogram: a tree diagram that can be cut at different heights to yield different numbers of clusters, allowing for flexible cluster analysis
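
A minimal sketch in Python using SciPy; complete linkage and the cut at two clusters are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(20, 2)), rng.normal(5, 1, size=(20, 2))])

# Build the full tree (dendrogram) bottom-up; no number of clusters specified.
Z = linkage(X, method="complete")

# Cut the tree afterwards to obtain any desired number of clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```
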
4
Q

What is a principal component? What are its features?

A
  • Each principal component is a linear combination of ALL features in the dataset
  • Features in the dataset are assumed to have a mean of 0 (centered)
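
In the standard notation (a textbook fact, not specific to these cards), the score of observation i on the first principal component is

    z_{i1} = \phi_{11} x_{i1} + \phi_{21} x_{i2} + \cdots + \phi_{p1} x_{ip}

where each x_{ij} is a centered feature value and the \phi_{j1} are the loadings of the first component.
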
5
Q

Loadings

A

The coefficients (weights) applied to each feature in the linear combination that forms a principal component. For each component, the loadings are normalized so that their squares sum to 1.
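
A minimal sketch of inspecting loadings in Python; scikit-learn stores them as the rows of components_ (the toy data are an illustrative assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

pca = PCA(n_components=2).fit(X)
print(pca.components_)                     # rows = loading vectors, one per PC
print((pca.components_ ** 2).sum(axis=1))  # squared loadings of each PC sum to 1
```
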

6
Q

First principal component

A
  • Explains the largest portion of variance in a dataset
  • i.e., PCA constructs components in decreasing order of variance explained; components are added until a sufficient amount of variability is explained (the goal is to use the smallest number of components for which this holds)
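
A minimal sketch of picking the smallest number of components in Python; the 90% variance threshold is an illustrative assumption:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 3:] = X[:, :3] + 0.2 * rng.normal(size=(100, 3))  # add correlated features

pca = PCA().fit(X)
cum_var = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of PCs explaining at least 90% of total variance.
k = int(np.searchsorted(cum_var, 0.90)) + 1
print(k, cum_var)
```
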
7
Q

How are values for the first principal component loadings determined?

A

By maximizing the sample variance of the first principal component

  • Note: the loadings are constrained to form a NORMALIZED vector (squared loadings sum to 1); without this constraint the variance could be inflated arbitrarily just by scaling the loadings up
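
As an optimization problem, in the same notation as the PC formula above (textbook statement; assumes centered features, so the mean score is 0 and the objective is the sample variance):

    \max_{\phi_{11}, \ldots, \phi_{p1}} \; \frac{1}{n} \sum_{i=1}^{n} \Big( \sum_{j=1}^{p} \phi_{j1} x_{ij} \Big)^{2} \quad \text{subject to} \quad \sum_{j=1}^{p} \phi_{j1}^{2} = 1

Without the unit-norm constraint, scaling every \phi_{j1} by a constant c would multiply the variance by c^2, so it could be made arbitrarily large.
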
8
Q

Second principal component

A

Linear combination of the features that captures the largest share of the variability not already captured by the 1st principal component, subject to its loading vector being orthogonal to (uncorrelated with) that of the 1st component

9
Q

What is the dot product of the loading vectors for PC1 and PC2? Why?

A

0, because they are orthogonal
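
A quick numerical check in Python (the toy data are an illustrative assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

pca = PCA(n_components=2).fit(X)
phi1, phi2 = pca.components_  # loading vectors of PC1 and PC2

print(np.dot(phi1, phi2))     # ~0 up to floating-point error: orthogonal
```
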

10
Q

How can we solve for loading vectors?

A

Eigendecomposition (not tested) of the covariance matrix.
  • This produces eigenvalues (the variance of each PC) and eigenvectors (the loading vectors).
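
A minimal sketch in Python using NumPy (the toy data are an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)          # center the features

cov = np.cov(Xc, rowvar=False)   # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort in decreasing order: largest eigenvalue = variance of PC1, etc.
order = np.argsort(eigvals)[::-1]
print(eigvals[order])      # variances of the PCs
print(eigvecs[:, order])   # columns are the loading vectors
```
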

11
Q

What is the max number of distinct principal components that can be created?

A

For a dataset with n observations and p features, the max number of distinct PCs is

min(n − 1, p)

(centering the features costs one degree of freedom, hence n − 1 rather than n)

12
Q

Distinct principal component

A

A PC is distinct if its variance (eigenvalue) is non-zero, meaning the component still captures some of the variance in the dataset.

13
Q

Biplot

A
  • Plots the PC1 and PC2 scores of the observations against each other (bottom x-axis and left y-axis, respectively), with each predictor's loading vector overlaid (top and right axes)
  • Can only visualize two PCs at a time
  • Look at the x and y components of each predictor's vector: if the x component is larger in magnitude, PC1 places more weight on that predictor; if the y component is larger, PC2 places more weight on it
  • The amount of weight is given by the magnitude of the vector's components (the loadings)
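
A minimal sketch of a biplot in Python with matplotlib; the data, the predictor names, and the arrow scaling are illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
names = ["X1", "X2", "X3", "X4"]  # hypothetical predictor names

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], s=10)  # observations: PC1 vs PC2 scores
for j, name in enumerate(names):
    # Each predictor's loading vector, scaled up for visibility.
    x, y = pca.components_[0, j], pca.components_[1, j]
    ax.arrow(0, 0, 3 * x, 3 * y, head_width=0.05, color="red")
    ax.annotate(name, (3.2 * x, 3.2 * y))
ax.set_xlabel("PC1 score")
ax.set_ylabel("PC2 score")
plt.show()
```
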