Unsupervised Learning Flashcards

1
Q

PCA vs. Principal Components Regression

A

PCR just refers to performing a regression on the scores obtained by transforming the feature set into its principal components. PCA itself is the UNSUPERVISED analysis of a data set, used to better understand its structure.
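A minimal sketch of the distinction using scikit-learn; the data set and the choice of 2 components here are made up for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

# PCA alone: unsupervised, only describes the structure of X (y is never used)
pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

# PCR: the same transformation, followed by a regression of y on the scores
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
```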

2
Q

Preprocessing steps for PCA

A

You have to center and scale your variables prior to PCA.
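A quick sketch of this step with scikit-learn (the toy matrix is made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# center to mean 0 and scale to unit variance, column by column
Xs = StandardScaler().fit_transform(X)
```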

3
Q

Things you can do with PCA

A

You can look at the loading vectors output by PCA to understand which underlying variables are most important for each principal component. If multiple variables are similarly important inside a principal component, they are highly correlated with each other. Also, variables that belong to different principal components are generally uncorrelated with one another. It is kind of like clustering your variables. See p. 377 of Intro to Stat Learning.
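A sketch of reading loadings this way, on made-up data where columns 0 and 1 share a latent signal and column 2 is independent:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
z = rng.normal(size=(200, 1))
# columns 0 and 1 are nearly the same latent signal; column 2 is independent
X = np.hstack([z + 0.1 * rng.normal(size=(200, 1)),
               z + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

pca = PCA().fit(StandardScaler().fit_transform(X))
loadings = pca.components_  # rows = principal components, cols = variables
# columns 0 and 1 load heavily (and about equally) on PC1, revealing their
# correlation; the independent column 2 barely loads on PC1 at all
```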

4
Q

How to know how many principal components you need in PCA?

A

Use the Proportion of Variance Explained (PVE) by each principal component. It can also be computed cumulatively for the first M principal components. You can plot the cumulative PVE against the # of principal components to see if there is an “elbow”.
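A minimal sketch of computing PVE with scikit-learn (the data here is made up):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))            # made-up data, 4 variables

pca = PCA().fit(StandardScaler().fit_transform(X))
pve = pca.explained_variance_ratio_      # PVE of each component
cum_pve = np.cumsum(pve)                 # cumulative PVE for the first M
# plotting cum_pve against range(1, 5) gives the elbow plot described above
```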

5
Q

How to interpret hierarchical clustering

A

You are presented with a diagram called a dendrogram. The heights at which branches fuse together are what matter: the higher up the vertical axis two branches fuse, the less similar they are, even though they do fuse. Observations that fuse together towards the bottom are more similar to each other. Position on the vertical axis is of the utmost importance; observations that fuse together high on the vertical axis may not be similar to anything at all!
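A sketch of reading fusion heights with scipy, on made-up data with two tight pairs of points far from each other:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# made-up data: two tight pairs, far apart from each other
X = np.array([[0.0, 0.0], [0.1, 0.0],
              [5.0, 5.0], [5.1, 5.0]])

Z = linkage(X, method="complete")
heights = Z[:, 2]   # fusion heights shown on the dendrogram's vertical axis
# the two within-pair fusions happen low on the axis (similar points);
# the final fusion joining the dissimilar pairs happens far higher up
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
```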

7
Q

Parameters of Hierarchical Clustering

A
  1. Distance measure (Euclidean, correlation-based, Manhattan)
  2. Linkage - use “complete” linkage, or second best, “average” linkage
  3. The height at which to “cut” the dendrogram, analogous to choosing the # of clusters
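The three parameters above can be sketched with scipy; the data and the cut height of 3.0 are made up for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)
# made-up data: two well-separated groups of 5 observations
X = np.vstack([rng.normal(0.0, 0.2, (5, 3)),
               rng.normal(4.0, 0.2, (5, 3))])

# 1. distance measure: "euclidean" here; "cityblock" gives Manhattan,
#    and 1 - correlation gives correlation-based distance
d = pdist(X, metric="euclidean")

# 2. linkage: "complete" (preferred) or "average"
Z = linkage(d, method="complete")

# 3. cut height: cutting at t=3.0 here determines the # of clusters
labels = fcluster(Z, t=3.0, criterion="distance")
```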
9
Q

Things you always want to do in clustering

A
  1. Center and scale all variables
10
Q

When would you want to use correlation-based distance?

A

When the features are highly correlated. It will give you a different result than plain Euclidean distance, because the observed values may be far apart even though their profiles move together. WARNING: you must have at least 3 features for this; with only 2 features it won’t work.
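A sketch of the contrast with scipy, on made-up observations over 4 features:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# made-up observations over 4 features (>= 3, as the warning requires)
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [11.0, 12.0, 13.0, 14.0],   # same profile as row 0, shifted up
              [4.0, 3.0, 2.0, 1.0]])      # opposite profile

D_corr = squareform(pdist(X, metric="correlation"))  # 1 - Pearson r
D_eucl = squareform(pdist(X, metric="euclidean"))
# rows 0 and 1: far apart in Euclidean distance, but their correlation
# distance is ~0 because their profiles move together
```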

11
Q

What to do in k-means or hierarchical clustering when there are lots of dimensions

A

You really have to perform PCA first and cluster on the leading principal component scores; you won’t get good results otherwise.
