11 - Clustering Flashcards
1
Q
Clustering is a form of…
A
unsupervised learning
2
Q
What are the 3 types of clustering algorithms?
A
- Hierarchical
- Partitioning
- Mixture Models
3
Q
4 Steps for Hierarchical Clustering
A
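1. Calculate the distance between every pair of observations (the distance matrix)
2. Merge the two closest observations (or groups) into one group
3. Recalculate the distances between the new group and all the others
4. Repeat 2 & 3 until everything is in one group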
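A minimal sketch of bottom-up (agglomerative) hierarchical clustering, for intuition only. It assumes Euclidean distance and single linkage, and it recomputes distances on every pass instead of maintaining a distance matrix. The function name `agglomerate` and the `n_clusters` stopping parameter are made up for this example.

```python
from itertools import combinations, product
from math import dist


def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]  # every observation starts as its own group
    merges = []                       # record of which groups were merged, in order
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest single-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(p, q)
                for p, q in product(clusters[ij[0]], clusters[ij[1]])
            ),
        )
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j always)
        del clusters[j]
    return clusters, merges


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
clusters, merges = agglomerate(points, n_clusters=2)
print(clusters)
```

Stopping at `n_clusters=2` is like cutting the tree at the level where two groups remain. Running to `n_clusters=1` gives the full merge history in `merges`.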
5
Q
3 ways to recalculate distances … aka linkages
A
- Single Linkage
- Complete Linkage
- Average Linkage
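The three linkages differ only in how they summarize the pairwise distances between two groups. A small sketch, assuming Euclidean distance. The helper name `linkage_distance` is made up for this example.

```python
from itertools import product
from math import dist


def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters of points under the chosen linkage."""
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":    # distance between the closest pair
        return min(pairwise)
    if method == "complete":  # distance between the farthest pair
        return max(pairwise)
    if method == "average":   # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {method}")


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]
print(linkage_distance(a, b, "single"))    # 3.0
print(linkage_distance(a, b, "complete"))  # 6.0
print(linkage_distance(a, b, "average"))   # 4.5
```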
6
Q
Cons of Hierarchical Clustering
A
- Distance matrices must be calculated (can be time-consuming for large samples)
- Results are often sensitive to which distance measure and which linkage method are used
7
Q
What is K-Means Clustering?
A
A clustering method that requires the user to provide the number of groups they are looking for
8
Q
What are the 4 steps of K-means Clustering?
A
1. Randomly select k (# of groups) points in your data, aka the centroids
2. Assign all observations to their closest centroid (you now have k groups)
3. Calculate the mean of each group (these are the new centroids)
4. Repeat 2 & 3 until nothing changes anymore
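The four steps can be sketched in plain Python. This is a minimal illustration, not a production implementation. It assumes Euclidean distance, and the function name `k_means` and the `seed` and `max_iter` parameters are made up for this example.

```python
import random
from math import dist


def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, labels) after running the four k-means steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: pick k data points at random
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every observation to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # step 4: stop once nothing changes
            break
        labels = new_labels
        # Step 3: the mean of each group becomes its new centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```

On this toy data the first three points end up in one group and the last three in the other. Which group gets label 0 depends on the random start.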
9
Q
Pros of K-means
A
- computationally efficient
- straightforward concept
- often provides clearer groups than hierarchical clustering
10
Q
Cons of K-means
A
- where the algorithm (randomly) starts can affect the results
- groups will be found no matter what, even if there are no groups present in the data
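A small sketch of the first con, using four points on the corners of a rectangle. Here the starting centroids are passed in by hand instead of being drawn at random, so the effect is reproducible. The function name `k_means_from` is made up for this example, and Euclidean distance is assumed.

```python
from math import dist


def k_means_from(points, centroids, max_iter=100):
    """Run k-means from explicitly chosen starting centroids; return labels."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Move each centroid to the mean of its group.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
left_right = k_means_from(corners, [(0, 0), (4, 0)])
bottom_top = k_means_from(corners, [(0, 0), (0, 1)])
print(left_right)  # [0, 0, 1, 1]
print(bottom_top)  # [0, 1, 0, 1]
```

Both runs converge, but to different groupings of the same data. The second run pairs points that are 4 units apart even though each has a neighbor only 1 unit away.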