Unsupervised Learning Flashcards
PCA vs Principal Components Regression (PCR)
PCR just refers to performing a regression on the principal components obtained by transforming the feature set. PCA itself is the UNSUPERVISED analysis: it is run on a data set by itself to understand the structure of that data.
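A minimal PCR sketch with scikit-learn (the data is synthetic and n_components=3 is an arbitrary placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # toy feature matrix
y = X[:, 0] + rng.normal(size=100)  # toy response

# PCR: transform the features into principal components, then regress on them.
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr.fit(X, y)
```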
Pre processing steps to PCA
you have to center and scale your variables prior to PCA; otherwise the variables with the largest variance dominate the components.
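For example (a sketch using scikit-learn's StandardScaler; the toy matrix is illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_scaled = StandardScaler().fit_transform(X)  # each column now has mean 0, std 1
# Without this step, the second column's huge variance would dominate the PCs.
```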
Things you can do with PCA
you can look at the loading vectors output by PCA to understand which underlying variables are most important for each principal component. If multiple variables have similarly large loadings within a principal component, they are highly correlated with each other. Conversely, variables that load mainly on different principal components are generally uncorrelated. It is kind of like clustering your variables. See pg. 377 of Intro to Statistical Learning.
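A sketch of inspecting loadings, assuming scikit-learn and its built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)
pca = PCA().fit(X)

# Each row of components_ is a loading vector; large-magnitude entries mark
# the variables that drive that principal component.
for name, weight in zip(data.feature_names, pca.components_[0]):
    print(f"{name}: {weight:.2f}")
```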
How to know how many principal components you need in PCA?
Use the Proportion of Variance Explained (PVE) by each principal component; you can also compute the cumulative PVE for the first M components. Plot the cumulative PVE against the number of principal components and look for an “elbow”.
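A sketch of a cumulative PVE (“scree”) plot, assuming scikit-learn and matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

pve = pca.explained_variance_ratio_  # PVE of each component
plt.plot(np.arange(1, len(pve) + 1), np.cumsum(pve), marker="o")
plt.xlabel("Number of principal components")
plt.ylabel("Cumulative PVE")
plt.show()  # look for an elbow in this curve
```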
How to interpret hierarchical clustering
You are presented with a diagram called a dendrogram. The height at which branches fuse together is what matters: the higher up the vertical axis two branches fuse, the less similar they are, even though they do fuse. Observations that fuse together toward the bottom are more similar to each other. Position on the vertical axis is of the utmost importance. Observations that only fuse together high on the vertical axis may not be similar to anything at all!
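A sketch of producing a dendrogram with scipy (the random data is purely illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))       # toy observations

Z = linkage(X, method="complete")  # complete linkage, Euclidean distance
dendrogram(Z)
plt.ylabel("Fusion height (dissimilarity)")
plt.show()  # low fusions = similar observations; high fusions = dissimilar
```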
Parameters of Hierarchical Clustering
- Distance measure (Euclidean, correlation-based, Manhattan)
- Linkage - use “complete” linkage, or as a second choice “average” linkage
- What height to “cut” the dendrogram, analogous to choosing the number of clusters (see the sketch below)
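A sketch showing all three knobs with scipy (the random data and the cut height t=4.0 are placeholders):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

d = pdist(X, metric="euclidean")   # 1. distance measure
Z = linkage(d, method="complete")  # 2. linkage
labels = fcluster(Z, t=4.0, criterion="distance")  # 3. cut at height 4.0
# Alternatively, ask for a fixed number of clusters:
# labels = fcluster(Z, t=3, criterion="maxclust")
```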
Things you always want to do in clustering
- Center and scale all variables
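A sketch of why, assuming scikit-learn; the exaggerated scale on the last column is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * [1, 1, 1, 1, 100]  # one variable on a huge scale

# Unscaled, the last column would dominate every distance computation.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_scaled)
```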
When would you want to use correlation based distance
when you want observations whose feature profiles are highly correlated to be treated as similar. This gives a different result than plain Euclidean distance, because two observations can have strongly correlated profiles even though their observed values are far apart. ## WARNING: you must have at least 3 features for this; with only 2 features the correlation between any pair of observations is always ±1, so it won’t work.
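A sketch contrasting the two distances with scipy (random data, purely illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))  # 10 observations, 5 features (>= 3, as required)

D_euc = squareform(pdist(X, metric="euclidean"))
D_cor = squareform(pdist(X, metric="correlation"))  # 1 - correlation between rows
# Two observations with the same profile "shape" but different magnitudes are
# close under correlation distance yet can be far apart under Euclidean distance.
```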
What to do in k-means or hierarchical clustering when there are lots of dimensions
you really have to perform PCA first and cluster on the first few principal component score vectors; in high dimensions, distances between observations become less meaningful, so you won’t get good results clustering the raw features.
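A sketch of the PCA-then-cluster workflow, assuming scikit-learn (the data shape and n_components=10 are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))  # far more features than is comfortable

X_scaled = StandardScaler().fit_transform(X)
scores = PCA(n_components=10).fit_transform(X_scaled)  # first 10 score vectors
labels = KMeans(n_clusters=3, n_init=10).fit_predict(scores)
```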