Chapter 4 Flashcards
Why is clustering regarded as unsupervised learning?(1)
There are no labelled observations which can be used to learn about the relationship between observations and group labels.
Dissimilarity measure properties. Where are they stored?(2)
Stored in the distance matrix D. Properties:
1. d_ij ≥ 0 for all (i, j);
2. d_ii = 0 for all i;
3. if d_ij = 0 then x_i = x_j.
In addition, if x_i and x_j are more “similar” than x_k and x_l, then d_ij should be smaller than d_kl. Many clustering algorithms also require that the dissimilarities are symmetric, i.e. d_ij = d_ji.
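A minimal sketch checking these properties on toy data (assuming numpy/scipy; the data here is illustrative only):
```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.random.default_rng(0).normal(size=(5, 3))   # toy data: 5 observations, 3 variables
D = squareform(pdist(X))             # n x n matrix of pairwise Euclidean distances

assert (D >= 0).all()                # 1. d_ij >= 0 for all (i, j)
assert np.allclose(np.diag(D), 0)    # 2. d_ii = 0 for all i
assert np.allclose(D, D.T)           # symmetry: d_ij = d_ji
```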
What is Euclidean distance? When may this be inappropriate to use?(2)
d_ij = { Σ_{k=1}^{p} (x_ik − x_jk)^2 }^(1/2)
The Euclidean distance may be inappropriate when some variables are highly correlated or vary on very different scales. Standardising the variables addresses the scale problem; the Mahalanobis distance addresses both the scale and the correlation problems.
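For example (a numpy sketch; the vectors are illustrative):
```python
import numpy as np

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([2.0, 0.0, 3.0])

# {sum_{k=1}^{p} (x_ik - x_jk)^2}^(1/2)
d_ij = np.sqrt(np.sum((xi - xj) ** 2))
# equivalently: np.linalg.norm(xi - xj)
```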
What is Mahalanobis distance?(1)
d_ij = sqrt{ (x_i − x_j)^T S^{−1} (x_i − x_j) }
where S is the sample variance (covariance) matrix of X.
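A sketch of the formula in numpy (toy data; scipy also provides an equivalent helper):
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # toy data: 20 observations, 3 variables
S = np.cov(X, rowvar=False)           # sample variance (covariance) matrix of X
S_inv = np.linalg.inv(S)

xi, xj = X[0], X[1]
diff = xi - xj
d_ij = np.sqrt(diff @ S_inv @ diff)   # sqrt{(x_i - x_j)^T S^-1 (x_i - x_j)}
# scipy.spatial.distance.mahalanobis(xi, xj, S_inv) gives the same value
```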
When is a correlation-based distance measure appropriate? How would you calculate this?(2)
When absolute differences aren’t important but proportions/patterns are. It is based on the correlation matrix between data vectors (note: not between variables). This matrix has n rows and n columns and its (i, j)-th entry, denoted r_ij^obs, is the sample correlation between x_i and x_j. Then d_ij = 1 − r_ij^obs, so small d_ij denotes greater similarity between the pair of observations.
To compute this we need the correlations between observations, not variables, and so need to transpose the data matrix.
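A sketch (toy data; note np.corrcoef correlates rows by default, so the transpose is only needed for tools that correlate columns):
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))   # 10 observations (rows), 4 variables (columns)

# We want correlations between OBSERVATIONS, not variables. np.corrcoef
# treats rows as the things being correlated, so X is passed as-is here;
# a column-wise tool would need X.T instead.
R_obs = np.corrcoef(X)         # n x n, entry (i, j) is r_ij^obs
D = 1 - R_obs                  # d_ij = 1 - r_ij^obs: small d_ij => similar pattern
```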
What is the Manhattan (taxi-cab) distance measure?(1)
d_ij = Σ_{k=1}^{p} |x_ik − x_jk|
Compared with the Euclidean distance, this assigns less weight to observations with a large difference in one or more components (no squaring), hence it is useful when there are outliers.
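For example (same illustrative vectors as above):
```python
import numpy as np

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([2.0, 0.0, 3.0])

d_ij = np.sum(np.abs(xi - xj))   # sum_{k=1}^{p} |x_ik - x_jk|
```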
What is the k-means algorithm?(3)
- Initialisation: choose K “means” m_1^[0], m_2^[0], …, m_K^[0] randomly, e.g. by randomly choosing K of the n observations, without replacement.
- Iteration: repeat for j = 1, 2, … until a stopping criterion has been satisfied:
(a) Reassignment: compute the cluster partitions C_1^[j], C_2^[j], …, C_K^[j] by going through each observation x_i in turn and allocating i to C_k^[j] if m_k^[j−1] is the closest mean in terms of Euclidean distance, i.e. if ||x_i − m_k^[j−1]|| ≤ ||x_i − m_l^[j−1]|| for all l ≠ k.
(b) Update: recompute each mean m_k^[j] as the sample mean of the observations in C_k^[j].
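A minimal from-scratch sketch of these steps (illustrative only; the function name, data, and stopping criterion are assumptions, and in practice a library routine such as sklearn.cluster.KMeans would be used):
```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialisation: choose K of the n observations, without replacement
    means = X[rng.choice(n, size=K, replace=False)].copy()
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # (a) Reassignment: allocate each x_i to the closest mean (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # stop when no assignment changes
            break
        labels = new_labels
        # (b) Update: each mean becomes the centroid of its current cluster
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return labels, means
```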
Is K-means guaranteed to reach a global minimum?(1)
No! It is only guaranteed to converge to a local minimum of SS_W, not the global minimum. For this reason it is suggested that the algorithm is run several times from different initial means, and the solution with the smallest SS_W is selected.
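For instance, scikit-learn’s KMeans handles the restarts automatically (a sketch with toy data):
```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 4))   # toy data

# n_init restarts from different initial means; the run with the smallest
# within-cluster sum of squares (reported as inertia_) is kept.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)   # SS_W of the best of the 10 runs
```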
What is a way to visualise K-means structure?(1)
Plot the data on the first 2 PCs, with points labelled according to cluster.
**Also do this for hierarchical clustering; it is useful for detecting chaining problems (one cluster will dominate).
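A sketch of this plot (toy data; cluster labels from any clustering method would work):
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 5))    # toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

Z = PCA(n_components=2).fit_transform(X)              # scores on the first 2 PCs
plt.scatter(Z[:, 0], Z[:, 1], c=labels)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```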
What are some practical implications of K-means?(2)
- Choosing K may be difficult; a way around this is to plot K against SS_W and look for a kink (“elbow”) where we appear to get diminishing returns (see the sketch below).
- Not scale invariant, and can impose a spherical structure on the data, favouring the identification of hyperspherical clusters even if the clusters have shapes which are very different from this. Can get around this through the Mahalanobis transformation.
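A sketch of the elbow plot mentioned in the first point (toy data):
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 4))   # toy data
Ks = range(1, 11)
ssw = [KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).inertia_
       for K in Ks]

plt.plot(Ks, ssw, marker="o")   # look for a kink ("elbow")
plt.xlabel("K")
plt.ylabel("SS_W")
plt.show()
```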
What advantage does hierarchical agglomerative clustering have over K-means?(2)
- Don’t need to specify K
- More visual way with dendrogram construction
What is the hierarchical agglomerative clustering algorithm?(3)
- Initialisation: start with n clusters, C_1, C_2, …, C_n, with C_i containing the single observation x_i. The distance matrix for the n clusters is simply the distance matrix for the n observations, i.e. the matrix of pairwise dissimilarities.
- Iteration: for i = n, n − 1, …, 2:
(a) Fusion: find the minimum distance d_jk between the i clusters. (If there is more than one minimum, pick one uniformly at random.) Fuse clusters C_j and C_k into a single cluster and remove the j-th and k-th rows and columns from the distance matrix. The distance d_jk indicates the height in the dendrogram at which the fusion should be placed.
(b) Recomputation: use the chosen linkage method to calculate the distances between the new cluster and the remaining clusters. Record the i − 1 distances in a new row and column added to the distance matrix.
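In practice scipy implements this whole loop; a minimal sketch with toy data:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(0).normal(size=(15, 3))   # toy data
Z = linkage(X, method="single")    # agglomerative fusions and their heights
dendrogram(Z)                      # fusion distances become dendrogram heights
plt.show()
```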
What are linkage methods? List the 3 most common types.(4)
- In agglomerative clustering you repeatedly fuse the pair of clusters with the smallest distance; linkage methods are the rules that define the distance between clusters, and so determine which pair to fuse. Three common ones:
- Single: the minimum distance between an observation in cluster A and an observation in cluster B.
- Complete: the maximum distance between an observation in cluster A and an observation in cluster B.
- Average: the mean distance between an observation in cluster A and an observation in cluster B.
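A sketch computing all three directly from the pairwise distances between two toy clusters:
```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))        # observations in cluster A (toy data)
B = rng.normal(size=(6, 2))        # observations in cluster B (toy data)

pair = cdist(A, B)                 # all pairwise distances between A and B
d_single = pair.min()              # single linkage: minimum pairwise distance
d_complete = pair.max()            # complete linkage: maximum pairwise distance
d_average = pair.mean()            # average linkage: mean pairwise distance
```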
Which of the linkage methods is not invariant under a monotone transformation?(1)
Average linkage. Single and complete linkage are invariant under any monotone (i.e. order-preserving) transformation of the original distances between observations; this is not true of average linkage.
What does cutting the dendrogram do?(1)
Cutting the dendrogram at a given height partitions the observations into non-overlapping clusters; the number of clusters depends on the height of the cut.
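A sketch of both ways of cutting (by height or by a target number of clusters; toy data):
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(15, 3))   # toy data
Z = linkage(X, method="complete")

# Cut at a chosen height: fusions above the cut are undone, leaving
# non-overlapping clusters.
labels = fcluster(Z, t=1.5, criterion="distance")
# Or cut so as to obtain a fixed number of clusters:
labels4 = fcluster(Z, t=4, criterion="maxclust")
```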