Unsupervised Learning Flashcards

1
Q

What is clustering?

A

The task of grouping objects so that objects in the same group (cluster) are more similar to each other than to objects in other groups.

2
Q

What are some things to consider when clustering?

A
  1. The distance (similarity) measure used
  2. The number of clusters
  3. The clustering algorithm
  4. How to evaluate the clustering output
3
Q

What is the Single Link Measure?

A

The smallest distance between a point in one cluster and a point in another cluster

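A minimal sketch of the single-link measure, assuming SciPy is available; the two toy clusters A and B are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters of 2-D points.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [5.0, 1.0]])

# Single link: the smallest of all pairwise distances between the clusters.
single_link = cdist(A, B).min()
print(single_link)  # 3.0 (distance between [1, 0] and [4, 0])
```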
4
Q

What is the Complete Link Measure?

A

The maximum distance between a point in one cluster and a point in another cluster

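A minimal sketch of the complete-link measure, assuming SciPy is available; the toy clusters are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters of 2-D points.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [5.0, 1.0]])

# Complete link: the largest of all pairwise distances between the clusters.
complete_link = cdist(A, B).max()
print(complete_link)  # sqrt(26) ~ 5.10 (distance between [0, 0] and [5, 1])
```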
5
Q

What is the Average Link Measure?

A

The average distance between points in two different clusters

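A minimal sketch of the average-link measure, again assuming SciPy and the same illustrative toy clusters.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters of 2-D points.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [5.0, 1.0]])

# Average link: the mean of all pairwise distances between the clusters.
average_link = cdist(A, B).mean()
print(average_link)
```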
6
Q

What are some issues with K-Means Clustering?

A
  1. Need to specify the number of clusters in advance
  2. Different initialisations of the cluster centres give different solutions (see the sketch below)
  3. Unable to handle noisy data and outliers
  4. Not suitable for complex (non-convex) cluster shapes
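A minimal sketch of issue 2, assuming scikit-learn is installed: with a single random initialisation (n_init=1), different random seeds can converge to different solutions. The toy data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four hypothetical blobs of points in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2))
               for c in ((0, 0), (3, 0), (0, 3), (3, 3))])

for seed in (0, 1, 2):
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X)
    print(seed, round(km.inertia_, 2))  # the final inertia may differ across seeds
```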
7
Q

What are the steps involved in K-Means Clustering?

A

  1. Compute the distance between each point and each cluster centre
  2. Assign each point to the cluster centre it is closest to
  3. Move each cluster centre to the average of all the points assigned to it
  4. Repeat until convergence (a sketch of these steps is shown below)
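A minimal from-scratch sketch of these four steps using only NumPy; the toy data, the value of k, and the function name are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-Means following the four steps above."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]  # initial cluster centres
    for _ in range(n_iters):
        # Step 1: distance from every point to every centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        # Step 2: assign each point to its closest centre.
        labels = dists.argmin(axis=1)
        # Step 3: move each centre to the mean of the points assigned to it
        # (keep the old centre if a cluster ends up empty).
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        # Step 4: repeat until the centres stop moving (convergence).
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

# Illustrative toy data: two well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in ((0.0, 0.0), (2.0, 2.0))])
print(kmeans(X, k=2)[0])
```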
8
Q

Will K-Means Clustering always converge?

A

Yes. Each iteration never increases the within-cluster sum of squared distances, and there are only finitely many possible assignments of points to clusters, so the algorithm always converges (although possibly only to a local optimum).

9
Q

What is one advantage of hierarchical clustering?

A

We do not have to specify the number of groups in advance

10
Q

What are the two types of hierarchical clustering?

A
  1. Agglomerative
  2. Divisive
11
Q

What is Agglomerative clustering?

A

Treat each data point as a cluster

Merge the atomic clusters into larger and larger clusters

12
Q

What is Divisive Clustering?

A

Treat all data points as one single cluster

Then divide this cluster into smaller and smaller clusters

13
Q

What are the steps of Agglomerative Clustering?

A
  1. Assume each point is its own cluster
  2. Compute the distances (single-link, complete-link or average-link) between each pair of clusters
  3. Merge the two closest clusters
  4. Repeat steps 2 and 3 until we have one single cluster
  5. Create a dendrogram to show the clusters formed at each iteration (see the sketch below)
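A minimal sketch of these steps, assuming SciPy is available; scipy.cluster.hierarchy implements the merge loop, with single-, complete- or average-link as the cluster distance. The toy data are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative toy data: three loose groups of 2-D points.
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [8.0, 0.0]])

# Steps 1-4: linkage() starts with every point as its own cluster and
# repeatedly merges the two closest clusters (here using average-link).
Z = linkage(X, method="average")

# Cut the resulting hierarchy into 3 flat clusters.
print(fcluster(Z, t=3, criterion="maxclust"))

# Step 5: scipy.cluster.hierarchy.dendrogram(Z) draws the merge tree
# (displaying it requires matplotlib).
```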
14
Q

What is the lifetime of a cluster?

A

The distance at which the cluster is merged into a larger cluster

MINUS

The distance at which it was created

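A minimal sketch, assuming SciPy, of reading cluster lifetimes off a linkage matrix; the toy data are illustrative, and the root cluster (which is never merged into anything) is skipped.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical 1-D toy data; in practice this would be your own feature matrix.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [10.0]])
n = len(X)

Z = linkage(X, method="single")  # each row: [child_a, child_b, merge_distance, cluster_size]

# Creation distance: 0 for the original points, the merge distance for merged clusters.
creation = {i: 0.0 for i in range(n)}
for i, row in enumerate(Z):
    creation[n + i] = row[2]

# A cluster's lifetime ends when it appears as a child of a later merge.
lifetimes = {}
for row in Z:
    for child in (int(row[0]), int(row[1])):
        lifetimes[child] = row[2] - creation[child]  # merge distance minus creation distance

print(lifetimes)
```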
15
Q

What are some drawbacks of Agglomerative Clustering?

A
  1. Sensitive to the cluster distance measure used
  2. Sensitive to noise
  3. Computationally less efficient than K-Means (the pairwise distances make it scale at least quadratically with the number of points)
16
Q

What are some different clustering methods?

A

Centroid Based - K-Means

Hierarchical - Agglomerative / Divisive

Distribution Based - Assumes objects in the same cluster follow the same probability distribution (e.g. Gaussian Mixture Models)

Density Based - Defines clusters as areas of higher density (e.g. DBSCAN)
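A minimal sketch, assuming scikit-learn is installed, that runs one representative of each family above on the same illustrative toy data.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture

# Hypothetical toy data: two 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, size=(50, 2)) for c in ((0, 0), (3, 3))])

print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))   # centroid based
print(AgglomerativeClustering(n_clusters=2).fit_predict(X))             # hierarchical
print(GaussianMixture(n_components=2, random_state=0).fit_predict(X))   # distribution based
print(DBSCAN(eps=0.5, min_samples=5).fit_predict(X))                    # density based
```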

17
Q

How might we test clustering models?

A

Using the within-cluster variance and the between-cluster variance, and then computing the F-Ratio:

F = K * Within-Cluster Variance / Between-Cluster Variance
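A minimal NumPy sketch of the F-Ratio exactly as defined on this card (the precise normalisation of the variance terms differs between textbooks); the toy data and labels are illustrative.

```python
import numpy as np

def f_ratio(X, labels):
    K = len(np.unique(labels))
    overall_mean = X.mean(axis=0)
    within, between = 0.0, 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        centre = Xk.mean(axis=0)
        within += ((Xk - centre) ** 2).sum()                         # spread around own centre
        between += len(Xk) * ((centre - overall_mean) ** 2).sum()    # spread of centres around overall mean
    return K * within / between   # the card's definition: smaller is better

# Hypothetical clustered data and assignments.
X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.2, 2.9]])
labels = np.array([0, 0, 1, 1])
print(f_ratio(X, labels))
```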

18
Q

What is External Validation used for?

A

This is used to check whether we have actually assigned the objects to the right groups, i.e. it compares the clustering against externally known (ground-truth) labels.
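As one concrete example (not named on the card), a common external validation index is the Adjusted Rand Index; a minimal sketch, assuming scikit-learn is installed, with illustrative labels.

```python
from sklearn.metrics import adjusted_rand_score

labels_true = [0, 0, 0, 1, 1, 1]   # the "right" groups, known externally
labels_pred = [1, 1, 0, 0, 0, 0]   # what the clustering algorithm produced

# 1.0 = perfect agreement, values near 0.0 = no better than random assignment.
print(adjusted_rand_score(labels_true, labels_pred))
```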