Chapter 4 Flashcards

1
Q

Why is clustering regarded as unsupervised learning?(1)

A

There are no labelled observations which can be used to learn about the relationship between observations and group labels.

2
Q

What are the properties of a dissimilarity measure, and where are the dissimilarities stored?(2)

A
Stored in the distance matrix D.
Properties:
1. $d_{ij} \ge 0$ for all (i, j);
2. $d_{ii} = 0$ for all i;
3. if $d_{ij} = 0$ then $x_i = x_j$.
In addition, if $x_i$ and $x_j$ are more “similar” than $x_k$ and $x_l$, then $d_{ij}$ should be smaller than $d_{kl}$. Many clustering algorithms also require that the dissimilarities are symmetric, i.e. $d_{ij} = d_{ji}$.
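
A minimal sketch (not from the notes) of building such a distance matrix with SciPy; the data and variable names are illustrative:

```python
# Build an n x n distance matrix D of pairwise Euclidean dissimilarities.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                   # n = 6 observations, p = 3 variables

D = squareform(pdist(X, metric="euclidean"))  # symmetric matrix with zero diagonal

# Check the stated properties: d_ij >= 0, d_ii = 0, and symmetry d_ij = d_ji.
assert (D >= 0).all()
assert np.allclose(np.diag(D), 0)
assert np.allclose(D, D.T)
```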
3
Q

What is Euclidean distance? When may this be inappropriate to use?(2)

A

$d_{ij} = \left\{ \sum_{k=1}^{p} (x_{ik} - x_{jk})^2 \right\}^{1/2}$

The Euclidean distance might not be appropriate when some variables vary on very different scales or are highly correlated. Standardising the variables helps with the differing scales; the Mahalanobis distance addresses both the differing scales and the correlation.
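
An illustrative sketch, with made-up data, of how differing scales distort Euclidean distance and how standardising each variable helps:

```python
# Euclidean distance before and after standardising the variables.
import numpy as np

# Two variables on very different scales (e.g. metres vs. grams).
X = np.array([[1.0, 5000.0],
              [1.2, 4000.0],
              [9.0, 4100.0]])

def euclid(a, b):
    # d_ij = {sum_k (x_ik - x_jk)^2}^(1/2)
    return np.sqrt(np.sum((a - b) ** 2))

print(euclid(X[0], X[1]), euclid(X[0], X[2]))   # dominated by the second variable

# Standardise each column to mean 0 and standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(euclid(Z[0], Z[1]), euclid(Z[0], Z[2]))   # both variables now contribute comparably
```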

4
Q

What is the Mahalanobis distance?(1)

A

$d_{ij} = \sqrt{(x_i - x_j)^T S^{-1} (x_i - x_j)}$

where S is the sample variance matrix of X.
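
A minimal NumPy sketch of this formula, assuming X is an n × p data matrix (names and data are illustrative):

```python
# Mahalanobis distance d_ij = sqrt((x_i - x_j)^T S^{-1} (x_i - x_j)),
# where S is the sample variance matrix of X.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))

S_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse of the p x p sample variance matrix

def mahalanobis(xi, xj, S_inv):
    d = xi - xj
    return np.sqrt(d @ S_inv @ d)

print(mahalanobis(X[0], X[1], S_inv))
```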

5
Q

When is a correlation-based distance measure appropriate? How would you calculate this?(2)

A

When absolute differences aren’t important but the proportions/patterns across variables are. It is based on the correlation matrix between data vectors (note: not between variables). This matrix has n rows and n columns and its (i, j)-th entry, denoted $r_{ij}^{obs}$, is the sample correlation between $x_i$ and $x_j$. The distance is $d_{ij} = 1 - r_{ij}^{obs}$, so a small $d_{ij}$ denotes greater similarity between the pair of observations.
To compute this we need the correlations between observations, not variables, and so we need to transpose the data matrix.
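
An illustrative NumPy sketch; note that np.corrcoef correlates rows by default, which here plays the role of transposing the data matrix:

```python
# Correlation-based distance d_ij = 1 - r_ij^obs between observations (rows of X).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 5))   # 10 observations, 5 variables

# np.corrcoef correlates *rows* by default, i.e. the observations here
# (the same role as transposing the data matrix before a column-wise correlation).
R_obs = np.corrcoef(X)         # 10 x 10 matrix of r_ij^obs
D = 1.0 - R_obs                # small d_ij <=> similar pattern across variables

assert np.allclose(np.diag(D), 0)
```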

6
Q

What is the Manhattan (taxi-cab) distance measure?(1)

A

$d_{ij} = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$

Compared with Euclidean distance, this assigns less weight to large differences in one or more components, and hence is more robust to outliers.
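
A small illustrative sketch comparing the two distances on made-up vectors:

```python
# Manhattan (taxi-cab) distance d_ij = sum_k |x_ik - x_jk| versus Euclidean.
import numpy as np

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([4.0, 0.0, 3.5])

d_manhattan = np.sum(np.abs(xi - xj))           # 3 + 2 + 0.5 = 5.5
d_euclidean = np.sqrt(np.sum((xi - xj) ** 2))   # squaring inflates the largest component
print(d_manhattan, d_euclidean)
```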

7
Q

What is the k-means algorithm?(3)

A
  1. Initialisation: choose K “means” $m_1^{[0]}, m_2^{[0]}, \ldots, m_K^{[0]}$ randomly, e.g. by randomly choosing K of the n observations, without replacement.
  2. Iteration: repeat for j = 1, 2, … until a stopping criterion has been satisfied:
    (a) Reassignment: compute the cluster partitions $C_1^{[j]}, C_2^{[j]}, \ldots, C_K^{[j]}$ by going through each observation $x_i$ in turn and allocating i to $C_k^{[j]}$ if $m_k^{[j-1]}$ is the closest mean in terms of Euclidean distance, i.e. if $\|x_i - m_k^{[j-1]}\| \le \|x_i - m_l^{[j-1]}\|$ for all l.
    (b) Update: recompute each mean $m_k^{[j]}$ as the sample mean of the observations in $C_k^{[j]}$.
  A common stopping criterion is that the cluster allocations do not change between successive iterations.
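
A from-scratch sketch of these steps (illustrative only; the function and variable names are my own, and the stopping rule assumed is "allocations unchanged"):

```python
# Minimal K-means sketch following the steps above.
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # 1. Initialisation: choose K of the n observations (without replacement) as starting means.
    means = X[rng.choice(n, size=K, replace=False)].copy()
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # 2(a) Reassignment: allocate each x_i to the closest mean in Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # n x K
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                               # stopping criterion: allocations unchanged
        labels = new_labels
        # 2(b) Update: recompute each mean as the average of the observations in its cluster.
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return labels, means

# Example usage on synthetic data with two well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
labels, means = kmeans(X, K=2)
```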
8
Q

Is K-means guaranteed to reach a global minimum?(1)

A

No! It is only guaranteed to reach a local minimum of $SS_W$, not the global minimum. For this reason it is suggested that the algorithm is run several times from different initial means, and the solution with the smallest $SS_W$ is selected.
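
As a sketch, scikit-learn's KMeans can do these restarts itself via n_init, keeping the run with the smallest within-cluster sum of squares (its inertia_ attribute); the data here are synthetic:

```python
# Run K-means from several random initialisations and keep the solution
# with the smallest within-cluster sum of squares (SS_W).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.inertia_)   # SS_W of the best of the 10 runs
```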

9
Q

What is a way to visualise the K-means cluster structure?(1)

A

Plot the data on the first two principal components, with points labelled according to their cluster.
**Also do this for hierarchical clustering; it is useful for spotting chaining problems (one long, thin group will dominate the plot).
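
An illustrative sketch using scikit-learn and matplotlib (synthetic data; the standardisation step is an assumption, not prescribed by the card):

```python
# Visualise a K-means clustering on the first two principal components.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (40, 5))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(scores[:, 0], scores[:, 1], c=labels)   # points labelled by cluster
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```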

10
Q

What are some practical implications of K-means?(2)

A
  • Choosing K may be difficult; one way around this is to plot $SS_W$ against K and look for a “kink” (elbow) where we appear to get diminishing returns (see the sketch after this list).
  • K-means is not scale invariant and favours the identification of hyperspherical clusters, even if the true clusters have shapes which are very different from this. The Mahalanobis transformation of the data can help with this.
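
A sketch of the elbow plot described in the first bullet, using scikit-learn's inertia_ as $SS_W$ on synthetic data:

```python
# Plot SS_W (inertia) against K and look for a "kink" (elbow).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1, (30, 2)) for c in (0, 5, 10)])   # three true groups

Ks = range(1, 9)
ssw = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in Ks]

plt.plot(list(Ks), ssw, marker="o")
plt.xlabel("K")
plt.ylabel("SS_W")
plt.show()
```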
11
Q

What advantages does hierarchical agglomerative clustering have over K-means?(2)

A
  • The number of clusters K does not need to be specified in advance.
  • It gives a more visual representation of the cluster structure through the construction of a dendrogram.

12
Q

What is the hierarchical agglomerative clustering algorithm?(3)

A
  1. Initialisation: start with n clusters, $C_1, C_2, \ldots, C_n$, with $C_i$ containing the single observation $x_i$. The distance matrix for the n clusters is simply the distance matrix for the n observations, i.e. the matrix of pairwise dissimilarities.
  2. Iteration: for i = n, n − 1, …, 2:
    (a) Fusion: find the minimum distance $d_{jk}$ between the i clusters. (If there is more than one minimum, pick one uniformly at random.) Fuse clusters $C_j$ and $C_k$ into a single cluster and remove the jth and kth rows and columns from the distance matrix. The distance $d_{jk}$ indicates the height in the dendrogram at which the fusion should be placed.
    (b) Recomputation: use the chosen linkage method to calculate the distances between the new cluster and the remaining clusters. Record the i − 1 distances in a new row and column added to the distance matrix.
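
An illustrative sketch of the same procedure using SciPy's hierarchical clustering routines (synthetic data; complete linkage chosen arbitrarily):

```python
# Agglomerative clustering via SciPy: compute pairwise dissimilarities, fuse
# clusters according to a linkage method, and draw the dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])

d = pdist(X, metric="euclidean")    # pairwise dissimilarities
Z = linkage(d, method="complete")   # fusion order and fusion distances

dendrogram(Z)                       # heights correspond to the fusion distances d_jk
plt.show()
```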
13
Q

What are linkage methods? List the 3 most common types.(4)

A

-In agglomerative clustering you start with each observation in its own cluster and successively fuse the pair of clusters with the smallest distance; a linkage method defines the distance between two clusters and hence determines which pair gets fused. Three common methods are:
-single: the minimum distance between an observation in cluster A and an observation in cluster B.
-complete: the maximum distance between an observation in cluster A and an observation in cluster B.
-average: the mean distance between an observation in cluster A and an observation in cluster B.
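
A small sketch computing the three linkage distances directly from the pairwise distances between two illustrative clusters:

```python
# The three linkage rules as summaries of the pairwise distances between
# an observation in cluster A and an observation in cluster B.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
A = rng.normal(0, 1, (5, 2))   # observations in cluster A
B = rng.normal(3, 1, (4, 2))   # observations in cluster B

between = cdist(A, B)          # all pairwise distances between A and B

single   = between.min()       # minimum pairwise distance
complete = between.max()       # maximum pairwise distance
average  = between.mean()      # mean pairwise distance
print(single, complete, average)
```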

14
Q

Which of the linkage methods is not invariant under a monotone transformation?(1)

A

Note that single and complete linkage are invariant under any monotone (i.e. order preserving) transformation of the original distances between observations. This is not true of average linkage.

15
Q

What does cutting the dendrogram do?(1)

A

Cutting the dendrogram at a chosen height partitions the observations into non-overlapping clusters.
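
An illustrative SciPy sketch: fcluster cuts the tree either to a chosen number of clusters or at a chosen height (both thresholds here are arbitrary):

```python
# Cut the dendrogram to obtain a flat partition into non-overlapping clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])

Z = linkage(X, method="complete")

labels_k = fcluster(Z, t=2, criterion="maxclust")    # cut to give exactly 2 clusters
labels_h = fcluster(Z, t=3.0, criterion="distance")  # or cut at a chosen height
print(labels_k, labels_h)
```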

16
Q

What are some practical implications of hierarchical clustering?(2)

A
  • Choice of cut: a heuristic is to look for a large change in the fusion distance (see the sketch after this list).
  • Choice of dissimilarity measure and linkage method: the resulting clustering can be sensitive to both choices, e.g. chaining problems with single linkage.
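
A sketch of the cut heuristic from the first bullet: inspect the fusion heights stored in the linkage matrix and look for the largest jump (synthetic data):

```python
# Heuristic choice of cut: look for a large jump in the fusion distances,
# which SciPy records in the third column of the linkage matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(6, 1, (15, 2))])

Z = linkage(X, method="complete")
heights = Z[:, 2]                   # fusion distances, in increasing order
jumps = np.diff(heights)
print(heights[np.argmax(jumps)])    # cut just above this height
```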
17
Q

What is chaining?(1)

A

Common with single linkage, whereby nearby observations are fused one by one, leading to long, thin clusters.

18
Q

Which linkage method is the best?(1)

A

Complete linkage tends to give the most balanced dendrograms and so is generally preferred; average linkage interpolates between complete and single; single linkage can suffer from chaining. However, it is always useful to try different linkage methods (and dissimilarity measures) before deciding, as the hierarchical structure obtained is partly an artefact of the clustering method.