Clustering Flashcards

1
Q

What is clustering?

A

Unsupervised learning
A form of data reduction (many points summarised by a few clusters)
Finds hidden structure in unlabelled data

2
Q

Why cluster?

A

Detect outliers
Simplify data
Visualise data

3
Q

What is the goal of clustering?

A

Maximise intra-cluster similarity

Minimise inter-cluster similarity

4
Q

Clustering vs Classification

A

Classification: discriminate between known groups based on their attributes (labels are given)
Clustering: discover the groups, and the attributes that distinguish them, from unlabelled data

5
Q

What are the 2 types of clustering?

A

1. Partitional
2. Hierarchical

6
Q

Partitional?

A

Division of data into non-overlapping clusters

Example: K-means clustering

7
Q

Hierarchical?

A

Division of data into nested clusters, where clusters contain sub-clusters
Visualised as a dendrogram
Agglomerative: bottom-up
Divisive: top-down
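
The bottom-up (agglomerative) idea can be sketched in a few lines. This is only an illustration, assuming 1-D numeric data and single linkage (distance between clusters = distance between their closest members); the function name `agglomerate` is hypothetical, not from any library.

```python
def agglomerate(values):
    """Single-linkage agglomerative clustering of 1-D numbers.

    Returns the merges, in order, as (cluster_a, cluster_b) pairs.
    """
    clusters = [(v,) for v in values]  # bottom-up: every point starts alone
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: min(
                abs(x - y) for x in clusters[ab[0]] for y in clusters[ab[1]]
            ),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges
```

Read top to bottom, the returned merge list is the information a dendrogram draws: which clusters joined, and in what order.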

8
Q

What is K-means clustering?

A

Partitional clustering
The number of clusters K must be chosen in advance (disadvantage)
Easy to implement and quick (advantage)

9
Q

What is the K-means algorithm?

A

Choose K initial centroids
Repeat until convergence:
  For each point x:
    Find the nearest centroid c (Euclidean distance)
    Assign x to c
  For each cluster c:
    Recalculate its centroid as the mean of all points assigned to it
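
A minimal Python sketch of these steps, for revision purposes only. Assumptions not stated on the card: the initial centroids are K points sampled at random, the loop stops when no point changes cluster, and an empty cluster keeps its old centroid. The name `kmeans` and its parameters are illustrative.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are K distinct points sampled at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments
```

Calling `kmeans(points, 2)` on two well-separated blobs returns one centroid per blob and a cluster index for each point.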

10
Q

What are the convergence criteria?

A

No (or minimal) point reassignments
No (or minimal) change in centroids
No (or minimal) change in SSE
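
The three criteria can be checked between two consecutive iterations roughly like this. It is a sketch: the function name `has_converged`, the `max_moved` allowance, and the tolerance `tol` are assumptions chosen for illustration, and real implementations usually pick just one criterion.

```python
import math


def has_converged(old_assign, new_assign, old_centroids, new_centroids,
                  old_sse, new_sse, max_moved=0, tol=1e-6):
    """Return True if any of the three stopping criteria is satisfied."""
    # 1. No (or at most max_moved) points changed cluster.
    moved = sum(a != b for a, b in zip(old_assign, new_assign))
    # 2. The largest centroid shift is within tol.
    shift = max(math.dist(c, d) for c, d in zip(old_centroids, new_centroids))
    # 3. The SSE changed by at most tol.
    sse_change = abs(old_sse - new_sse)
    return moved <= max_moved or shift <= tol or sse_change <= tol
```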

11
Q

What is SSE?

A

Sum of squared errors

Calculates the sum of squared distances between points in a cluster and the centroid of said cluster
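
A small worked example of that calculation, assuming points and centroids are numeric tuples and `assignments[i]` holds the cluster index of point `i` (these names are illustrative):

```python
def sse(points, centroids, assignments):
    """Sum of squared Euclidean distances from each point to its centroid."""
    total = 0.0
    for p, a in zip(points, assignments):
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[a]))
    return total


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (14.0, 0.0)]
centroids = [(1.0, 0.0), (12.0, 0.0)]
assignments = [0, 0, 1, 1]
print(sse(points, centroids, assignments))  # 1 + 1 + 4 + 4 = 10.0
```

A lower SSE means points sit closer to their centroids, i.e. tighter clusters.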

12
Q

Benefits of K-means?

A

Simple
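Fast: O(TKN) for T iterations, K clusters, and N points
Always converges, though possibly to a local optimum rather than the global one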

13
Q

Disadvantages of K-means?

A

Need to specify k in advance
Only applicable if the mean is defined
(For categorical data, centroids can be represented by the mode instead)
Sensitive to outliers
Not suitable for clusters that are not hyper-ellipsoids/hyper-spheres
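
Why outliers matter: the centroid is a mean, and a single extreme point drags a mean a long way. A quick illustration with made-up 1-D values (the median is shown only for contrast; it is not part of K-means):

```python
from statistics import mean, median

cluster = [1.0, 2.0, 3.0, 4.0]
with_outlier = cluster + [100.0]

print(mean(cluster))         # 2.5
print(mean(with_outlier))    # 22.0
print(median(with_outlier))  # 3.0
```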
