SRM Chapter 6 - Unsupervised Learning Flashcards
6.1 Principal Components Analysis 6.2 Cluster Analysis
PCA (Principal Components Analysis)
- Reduces complexity by transforming the variables into a smaller number of principal components that highlight the most important features of the data (i.e., explain a sufficient amount of its variability).
- Often applied before supervised models
k-means Clustering
- Divides the data into a predetermined number of clusters
- Such that the within-cluster variation (sum of squared distances to each cluster's center) is minimized
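To make this concrete, here is a minimal sketch of Lloyd's algorithm (the standard k-means procedure) in numpy — illustrative only, not from the source; the function and variable names are my own:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then recompute centroids, until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid -> (n, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center (keep the old one if a cluster empties)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels, centroids = kmeans(X, k=2)
```

Note that k requires choosing the number of clusters upfront, in contrast to hierarchical clustering below.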
Hierarchical Clustering
- Don’t have to specify number of clusters upfront
- Dendrogram -> tree-based representation of the observations; cutting it at different heights yields different numbers of clusters, allowing flexible cluster analysis
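A small scipy sketch (illustrative, not from the source): build the dendrogram once, then cut it afterwards at whatever number of clusters you want.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# two tight, well-separated groups of 5 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (5, 2)), rng.normal(3, 0.2, (5, 2))])

# agglomerative clustering with complete linkage; Z encodes the dendrogram:
# each row records a merge of two clusters and the height at which it occurs
Z = linkage(X, method="complete")

# no number of clusters was specified upfront -- cut the same tree twice
labels_2 = fcluster(Z, t=2, criterion="maxclust")
labels_3 = fcluster(Z, t=3, criterion="maxclust")
```

`scipy.cluster.hierarchy.dendrogram(Z)` would draw the tree itself.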
What is a principal component? Its features?
- Each principal component is a linear combination of ALL features in the dataset
- Features in the dataset are assumed to have a mean of 0 (centered)
Loadings
- The coefficients (weights) applied to each feature in a principal component's linear combination
- Collected into a loading vector, which is scaled to have unit length
First principal component
- Explains the largest portion of variance in a dataset
- i.e. PCA extracts components in decreasing order of variance explained, adding components until a sufficient amount of variability is explained (the goal is to use as few components as possible)
How are values for the first principal component loadings determined?
By maximizing the sample variance of the first principal component
- Note: the loadings are constrained to a NORMALIZED (unit-length) linear combination of the features — without this constraint, the variance could be inflated arbitrarily just by scaling the loadings up
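A quick numpy check of why the unit-norm constraint matters (illustrative sketch; the loading vector `w` here is arbitrary, not an actual PC):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                 # center the features (mean 0)

w = np.array([1.0, 2.0, 2.0])          # some hypothetical loading vector
z = X @ w                              # scores under w
z_unit = X @ (w / np.linalg.norm(w))   # scores under the normalized w

# doubling the loadings quadruples the variance, so "maximize the
# variance" is unbounded unless we fix the loading vector's length
var_w = np.var(z)
var_2w = np.var(X @ (2 * w))
```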
Second principal component
Linear combination of the features that captures the largest share of the variability in the dataset not already captured by the 1st principal component
What is the dot product of the loading vectors for PC1 and PC2? Why?
0, because they are orthogonal
How can we solve for loading vectors?
Eigen decomposition (not tested) of the covariance matrix
- This produces eigenvalues (the variances of each PC) and eigenvectors (the loading vectors)
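The decomposition can be carried out directly in numpy (a sketch, not tested material — the SRM exam does not require computing it). Note it also confirms the orthogonality of PC1 and PC2 from the previous card:

```python
import numpy as np

# correlated synthetic data, centered
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.array([[2.0, 0.5, 0.0],
                                         [0.0, 1.0, 0.3],
                                         [0.0, 0.0, 0.5]])
X = X - X.mean(axis=0)

S = np.cov(X, rowvar=False)            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]      # sort by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1, pc2 = eigvecs[:, 0], eigvecs[:, 1]  # loading vectors
scores1 = X @ pc1                        # first PC scores
```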
What is the max number of distinct principal components that can be created?
For a dataset with n observations and p features, the max number of PCs is
min(n-1,p)
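This bound can be verified numerically (illustrative sketch): with n = 4 centered observations on p = 6 features, only min(4-1, 6) = 3 eigenvalues of the covariance matrix are non-zero.

```python
import numpy as np

# more features than observations: n = 4, p = 6
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
X = X - X.mean(axis=0)                 # centering uses up one degree of freedom

S = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(S)
n_distinct = int(np.sum(eigvals > 1e-10))  # PCs with non-zero variance
```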
Distinct principal component
A component is distinct if its variance is non-zero (i.e., adding it still captures some of the remaining variance in the dataset).
Biplot
- Plots PC1 against PC2 (scores on the bottom x-axis and left y-axis; the top and right axes show the loadings)
- Can only visualize two PCs at a time
- Each predictor is drawn as a vector whose x-component is its PC1 loading and whose y-component is its PC2 loading: if the x-component is larger in magnitude, PC1 places more weight on that predictor; if the y-component is larger, PC2 does
- The weight is given by the magnitude of each component of the vector
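The two ingredients of a biplot can be computed as follows (a sketch, not from the source; plotting itself is omitted): the points are the PC1/PC2 scores of each observation, and the arrows are the (PC1 loading, PC2 loading) pairs of each predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
X = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
V = eigvecs[:, order]            # columns = loading vectors, sorted

scores = X @ V[:, :2]            # biplot points: (PC1 score, PC2 score)
arrows = V[:, :2]                # biplot arrows: row j = predictor j's
                                 # (PC1 loading, PC2 loading)
```

For predictor j, comparing `abs(arrows[j, 0])` with `abs(arrows[j, 1])` shows whether PC1 or PC2 weights it more heavily, matching the reading rule above.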