11 - Clustering Flashcards

1
Q

Clustering is a form of…

A

unsupervised learning

2
Q

What are the 3 types of clustering algorithms?

A
  1. Hierarchical
  2. Partitioning
  3. Mixture Models
3
Q

4 Steps for Hierarchical Clustering

A

1.
2.
3.
4.

5
Q

3 ways to recalculate distances … aka linkages

A
  1. Single Linkage
  2. Complete Linkage
  3. Average Linkage
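
The three linkages can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and the usual textbook definitions (single = closest pair, complete = farthest pair, average = mean over all pairs); the function names and the sample clusters are made up for the example.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(a, b):
    """Distance between the two closest points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the two farthest points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean distance over every cross-cluster pair of points."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


# Hypothetical clusters, chosen only to make the three results differ.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

print("single  :", single_linkage(cluster_a, cluster_b))
print("complete:", complete_linkage(cluster_a, cluster_b))
print("average :", average_linkage(cluster_a, cluster_b))
```

For any two clusters the single-linkage distance is never larger than the average, and the average is never larger than the complete-linkage distance.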
6
Q

Cons of Hierarchical Clustering

A
  • A distance matrix must be calculated (can be time-consuming for large samples)
  • Results are often sensitive to the choice of distance measure and linkage method
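
To see why the distance matrix gets expensive, here is a rough sketch assuming Euclidean distance; `distance_matrix` and `pair_count` are illustrative names, not from any library. The number of pairwise distances grows roughly with the square of the sample size.

```python
from math import dist  # Euclidean distance, Python 3.8+


def distance_matrix(points):
    """Full symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each unordered pair is computed once and mirrored.
            matrix[i][j] = matrix[j][i] = dist(points[i], points[j])
    return matrix


def pair_count(n):
    """Number of distinct pairs among n observations: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical sample points.
sample = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in distance_matrix(sample):
    print(row)

for n in (10, 1_000, 100_000):
    print(n, "observations ->", pair_count(n), "distances")
```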
7
Q

What is K-Means Clustering?

A

A clustering method that requires the user to specify the number of groups (k) they are looking for

8
Q

What are the 4 steps of K-Means Clustering?

A
  1. Randomly select k (# of groups) points in your data; these are the initial centroids
  2. Assign every observation to its closest centroid (you now have k groups)
  3. Calculate the mean of each group (these are the new centroids)
  4. Repeat steps 2 & 3 until the assignments no longer change
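
The four steps map onto code fairly directly. Below is a minimal sketch, assuming Euclidean distance and points stored as tuples; the function name `k_means`, the `seed` and `max_iter` parameters, and the sample data are assumptions made for the example, not part of the card.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, assignments) following the four steps above."""
    rng = random.Random(seed)
    # Step 1: pick k observations at random as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each observation to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: the mean of each group becomes its new centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty group keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignments


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

Which group ends up labelled 0 and which 1 depends on the random start; only the grouping itself is meaningful.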
9
Q

Pros of K-means

A
  • computationally efficient
  • straightforward concept
  • often provides clearer groups than hierarchical clustering (HC)
10
Q

Cons of K-means

A
  • where the algorithm (randomly) starts can affect the results
  • groups will be found no matter what, even if there are no groups present in the data
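
Both drawbacks can be seen in a small experiment. The sketch below is illustrative only: it assumes Euclidean distance, uses a compact hand-written `k_means` (not a library call), and runs it on uniformly random points that contain no real groups. The seeds, sample size, and helper `within_ss` are arbitrary choices for the example.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, seed, max_iter=100):
    """Compact k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the random start
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


def within_ss(points, centroids, labels):
    """Total squared distance from each point to its assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Uniform noise in the unit square: no true groups exist here.
data_rng = random.Random(42)
uniform_points = [(data_rng.random(), data_rng.random()) for _ in range(200)]

results = {}
for seed in range(5):
    centroids, labels = k_means(uniform_points, k=3, seed=seed)
    sizes = [labels.count(c) for c in range(3)]
    results[seed] = (sizes, within_ss(uniform_points, centroids, labels))
    print("seed", seed, "group sizes", sizes, "within-SS", round(results[seed][1], 3))
```

Every run reports three groups even though the data are pure noise, and comparing the group sizes and within-group sums of squares across seeds shows whether the starting points changed the outcome. A common mitigation is to run several random starts and keep the solution with the smallest within-group sum of squares.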