Unsupervised Learning Flashcards

Question 1

Q

What is clustering?

Answer

A

The task of grouping objects so that objects in the same group are more similar to each other than to objects in other groups.

Question 2

Q

What are some things to consider when clustering?

Answer

A

The distance (similarity measure) used
The number of clusters
The clustering algorithm
How to evaluate the clustering output

Question 3

Q

What is the Single Link Measure?

Answer

A

The smallest distance between a point in one cluster and a point in another cluster

Question 4

Q

What is the Complete Link Measure?

Answer

A

The maximum distance between a point in one cluster and a point in another cluster

Question 5

Q

What is the Average Link Measure?

Answer

A

The average distance over all pairs of points, taking one point from each of the two clusters
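
The three link measures on the cards above can be compared side by side. The following is a minimal sketch that assumes Euclidean distance and points stored as coordinate tuples; the function names are illustrative and do not come from the flashcards.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    # Smallest distance between the two clusters.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    # Largest distance between the two clusters.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    # Mean distance over all cross-cluster pairs.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))  # 2.0 5.0 3.5
```

On the same pair of clusters the three measures give different values, which is why the choice of measure can change which clusters get merged first.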

Question 6

Q

What are some issues with K-Means Clustering?

Answer

A

Need to specify the number of clusters in advance
Different initialisations of the cluster centres can give different solutions
Sensitive to noisy data and outliers
Not suitable for complex cluster shapes (e.g. non-convex clusters)

Question 7

Q

What are the steps involved in K-Means Clustering?

Answer

A

1. Choose K initial cluster centres
2. Compute the distance between each point and each cluster centre
3. Assign each point to the cluster centre that it is closest to
4. Move each cluster centre to the average of all the points that have been assigned to it
5. Repeat steps 2 to 4 until convergence (the assignments no longer change)
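
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes Euclidean distance, initial centres supplied by the caller, and "convergence" meaning that no point changes cluster. The names `k_means` and `max_iters` are hypothetical.

```python
from math import dist


def k_means(points, centres, max_iters=100):
    """Basic K-Means on a list of coordinate tuples.

    `centres` holds the initial cluster centres chosen by the caller.
    Returns (final_centres, labels).
    """
    centres = [tuple(c) for c in centres]
    labels = None
    for _ in range(max_iters):
        # Compute distances and assign each point to its closest centre.
        new_labels = [
            min(range(len(centres)), key=lambda k: dist(p, centres[k]))
            for p in points
        ]
        # Converged: no point changed cluster since the last pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centre to the average of the points assigned to it.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster ends up empty
                centres[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return centres, labels


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centres, labels = k_means(points, centres=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Running it again with different initial centres is a quick way to see how the result can depend on initialisation.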

Question 8

Q

Will K-Means Clustering always converge?

Answer

A

Yes. Neither the assignment step nor the update step can increase the total within-cluster squared distance, and there are only finitely many ways to assign the points, so the algorithm stops after a finite number of iterations. It may stop at a local optimum rather than the best possible clustering.

Question 9

Q

What is one advantage of hierarchical clustering?

Answer

A

We do not have to specify the number of groups in advance

Question 10

Q

What are the two types of hierarchical clustering?

Answer

A

Agglomerative
Divisive

Question 11

Q

What is Agglomerative clustering?

Answer

A

A bottom-up approach: treat each data point as its own cluster

Then repeatedly merge the closest clusters into larger and larger clusters

Question 12

Q

What is Divisive Clustering?

Answer

A

A top-down approach: treat all data points as one single cluster

Then repeatedly divide this cluster into smaller and smaller clusters

Question 13

Q

What are the steps of Agglomerative Clustering?

Answer

A

1. Assume each point is its own cluster
2. Compute the distance (single-link, complete-link or average-link) between every pair of clusters
3. Merge the two closest clusters
4. Repeat steps 2 and 3 until we have one single cluster
5. Create a dendrogram to show the clusters at each iteration
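
As a rough illustration of the procedure above, the sketch below records the merge history that a dendrogram would draw. It assumes single-link distance with Euclidean points and uses a simple, inefficient search over all cluster pairs; the function names are hypothetical.

```python
from itertools import combinations, product
from math import dist


def single_link(a, b):
    """Smallest Euclidean distance between a point in a and a point in b."""
    return min(dist(p, q) for p, q in product(a, b))


def agglomerative(points, linkage=single_link):
    """Return the merge history as (distance, cluster_a, cluster_b) tuples."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = linkage(clusters[i], clusters[j])
        merges.append((d, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two clusters with their union (i < j, so pop j first).
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(merged)
    return merges


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
merges = agglomerative(points)
for d, a, b in merges:
    print(f"merge at distance {d:.1f}: {a} + {b}")
```

Each recorded distance corresponds to the height of one join in the dendrogram. Cutting the dendrogram at a chosen height gives the clusters that exist at that distance.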

Question 14

Q

What is the lifetime of a cluster?

Answer

A

The distance at which the cluster is merged into a larger cluster

MINUS

The distance at which the cluster was created

Question 15

Q

What are some drawbacks of Agglomerative Clustering?

Answer

A

Sensitive to cluster distance measure
Sensitive to noise
Less efficient than K-Means (the cost grows at least quadratically with the number of points)

Question 16

Q

What are some different clustering methods?

Answer

A

Centroid Based - K-Means

Hierarchical - Agglomerative / Divisive

Distribution - Assume objects from the same cluster follow the same distribution

Density Based - Define clusters as areas of higher density

Question 17

Q

How might we test clustering models?

Answer

A

Using the within-cluster variance and the between-cluster variance, and then computing the F-ratio:

F = K * Within_Cluster_Variance / Between_Cluster_Variance

where K is the number of clusters. A lower F indicates tighter, better-separated clusters.
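
A small sketch of how the F-ratio could be computed. The card does not define the two variances, so this example assumes that the within-cluster variance is the sum of squared distances from each point to its own cluster centroid, and that the between-cluster variance is the size-weighted sum of squared distances from each centroid to the overall mean. The function names are illustrative.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of coordinate tuples."""
    return tuple(sum(x) / len(points) for x in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def f_ratio(clusters):
    """clusters: list of clusters, each a non-empty list of coordinate tuples."""
    k = len(clusters)
    overall = mean_point([p for c in clusters for p in c])
    centroids = [mean_point(c) for c in clusters]
    # Assumption: within-cluster variance = squared distances to own centroid.
    within = sum(sq_dist(p, m) for c, m in zip(clusters, centroids) for p in c)
    # Assumption: between-cluster variance = size-weighted squared distances
    # from each centroid to the overall mean.
    between = sum(len(c) * sq_dist(m, overall) for c, m in zip(clusters, centroids))
    return k * within / between


tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(f_ratio(tight) < f_ratio(mixed))  # True
```

Both groupings use the same four points. The grouping that keeps nearby points together gets the much smaller F. Note that the ratio is undefined when all centroids coincide, because the between-cluster term is then zero.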

Question 18

Q

What is External Validation used for?

Answer

A

This is used to check whether we have actually assigned the objects to the right groups, by comparing the clustering against known (ground-truth) labels
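
One common way to do this comparison is the Rand index, shown here as an assumed example since the card does not name a specific measure. It counts the fraction of point pairs on which the clustering and the true labels agree (both put the pair together, or both keep it apart), so the actual label names do not need to match.

```python
from itertools import combinations


def rand_index(true_labels, predicted_labels):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j])
        == (predicted_labels[i] == predicted_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


truth = ["a", "a", "a", "b", "b", "b"]
predicted = [1, 1, 0, 0, 0, 0]
print(round(rand_index(truth, predicted), 3))  # 0.667
```

A value of 1.0 means the clustering reproduces the true grouping exactly.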

(18 cards)