Chapter 9: Cluster Analysis Flashcards

1
Q

what is cluster analysis

A

an unsupervised learning task

grouping a set of objects so that objects in the same group are more similar to each other than to objects in other groups

2
Q

give the 3 similarity measures

A

Euclidean distance

cosine similarity

Minkowski distance (Euclidean distance is the special case p = 2)
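
The three measures on this card can be sketched in plain Python as below. This is a minimal illustration, not a reference implementation: the function names are made up for this example, and points are assumed to be equal-length sequences of numbers.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def euclidean(x, y):
    """Euclidean distance, i.e. the Minkowski distance with p = 2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (1 means same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

u, v = [0.0, 0.0], [3.0, 4.0]
print(euclidean(u, v))                            # 5.0
print(minkowski(u, v, 1))                         # 7.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Note that the two distances get smaller as points become more similar, while cosine similarity gets larger.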

3
Q

what are the key tasks of cluster analysis

A

define distance measure

identify cluster number

perform grouping

evaluate

4
Q

what are the hyperparameters in cluster analysis

A

distance measure

cluster number

5
Q

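what is k-means clustering

A

a clustering algorithm that calculates the distance from each point to each centre point

assigns each point to its nearest centre

updates each centre to the average of the points in its cluster

repeats until the assignments stop changing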
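
The assign-then-update loop described on this card can be sketched as follows. This is a minimal sketch under stated assumptions: Euclidean distance is used, the initial centres are passed in by the caller (how to pick them is not covered by the card), and the names `kmeans` and `max_iter` are illustrative only.

```python
import math

def kmeans(points, centres, max_iter=100):
    """Lloyd-style k-means; `centres` are the caller's initial centre points."""
    centres = [list(c) for c in centres]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centres are stable
        labels = new_labels
        # Update step: move each centre to the mean of its assigned points.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster is empty
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

pts = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centres = kmeans(pts, centres=[(1.0, 1.0), (9.0, 8.0)])
print(labels)   # [0, 0, 1, 1]
print(centres)  # [[1.0, 1.5], [8.5, 8.0]]
```

Each iteration compares every point with every centre, which is where a per-iteration cost proportional to (number of clusters) x (number of points) comes from.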

6
Q

within each cluster, what is minimised

A

the sum of squared distances from each point to its cluster centre

7
Q

what is the running time of k means clustering

A

O(TKN), where T = number of iterations, K = number of clusters, N = number of data points

8
Q

what are the drawbacks of k means clustering (3)

A

doesn’t cope well with noise or outliers

need to decide number of clusters

not suitable for complex patterns

9
Q

what does the distance between clusters tell us

A

how similar two clusters are (a smaller distance means more similar clusters)

10
Q

what is single link measure

A

distance between clusters = minimum distance between any two points, one from each cluster

11
Q

what is complete link (multi link) measure

A

distance between clusters = maximum distance between any two points, one from each cluster

12
Q

what is average link measure

A

distance between clusters = average distance over all pairs of points, one from each cluster
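
The minimum, maximum and average measures from the last three cards differ only in how they summarise the same set of pairwise distances. A small sketch, assuming Euclidean distance between points; the function names are illustrative, and the maximum-distance measure is named `complete_link` here because that is the term most references use for it.

```python
import math
from itertools import product

def pair_distances(cluster_a, cluster_b):
    """All point-to-point distances with one point taken from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_link(cluster_a, cluster_b):
    return min(pair_distances(cluster_a, cluster_b))   # closest pair

def complete_link(cluster_a, cluster_b):
    return max(pair_distances(cluster_a, cluster_b))   # farthest pair

def average_link(cluster_a, cluster_b):
    d = pair_distances(cluster_a, cluster_b)
    return sum(d) / len(d)                              # mean over all pairs

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))  # 2.0 5.0 3.5
```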

13
Q

describe hierarchical clustering

A

objects grouped in a tree structure

14
Q

what is agglomerative clustering

A

start with atomic clusters and merge until you get one big cluster

15
Q

what is divisive clustering

A

start as one big cluster and separate out to atomic clusters

16
Q

what is a dendrogram

A

a tree diagram with the data points as leaves; the height of each join shows the distance at which the two clusters were merged

17
Q

what is the lifetime of a cluster

A

difference between the distance at which the cluster was created and the distance at which it was merged into another cluster

18
Q

how do we get k clusters from hierarchical clustering

A

cut the tree at the level where it has k branches
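
For agglomerative clustering, cutting the tree to get k clusters gives the same result as stopping the merging once k clusters remain. The sketch below illustrates that idea; it assumes single link with Euclidean distance, uses a simple brute-force search, and the names `agglomerate` and `merge_heights` are made up for this example.

```python
import math

def agglomerate(points, k):
    """Merge the two closest clusters (single link) until k clusters remain."""
    clusters = [[p] for p in points]   # start with atomic clusters
    merge_heights = []                 # distance at which each merge happened
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into cluster i
        del clusters[j]
        merge_heights.append(d)
    return clusters, merge_heights

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 0.0)]
clusters, heights = agglomerate(pts, k=3)
print(clusters)  # [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)], [(20.0, 0.0)]]
print(heights)   # [1.0, 1.0]
```

Calling the same function with k=2 gives merge heights [1.0, 1.0, 5.0]: the cluster containing (0, 0) and (0, 1) is created at distance 1.0 and merged at 5.0, so its lifetime is 4.0.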

19
Q

what is cluster validation

A

assessing the quality of a clustering result, i.e. checking that the clusters make logical sense

20
Q

what are the two methods of cluster validation

A

internal and external criteria

21
Q

what pattern of variation do we want

A

a good clustering should have small within-cluster variation and large between-cluster variation

22
Q

how do we calculate within cluster variance

A

sum for each point in cluster: d^2(point, cluster centre)

23
Q

how do we calculate between cluster variance

A

sum for each cluster: no_points_in_cluster * d^2(cluster centre, data centre)
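
The two sums from this card and the previous one can be computed as in the sketch below. It assumes squared Euclidean distance for both quantities and takes each centre to be the mean of its points; all function names are illustrative.

```python
def mean(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def sq_dist(p, q):
    """Squared Euclidean distance d^2(p, q)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def within_cluster_variation(clusters):
    # For every cluster, add up d^2(point, cluster centre) over its points.
    return sum(sq_dist(p, mean(c)) for c in clusters for p in c)

def between_cluster_variation(clusters):
    # For every cluster: number of points * d^2(cluster centre, data centre).
    data_centre = mean([p for c in clusters for p in c])
    return sum(len(c) * sq_dist(mean(c), data_centre) for c in clusters)

clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
print(within_cluster_variation(clusters))   # 4.0
print(between_cluster_variation(clusters))  # 100.0
```

In this example the within-cluster value is small relative to the between-cluster value, which is the pattern a good clustering should show.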

24
Q

what is external validation

A

validate against ground truth labels

25
Q

how do we evaluate against ground truth labels

A

Rand index

26
Q

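describe the Rand index

A

compare cluster ID to class ID for every pair of points

agreement / disagreement table: a = pairs in the same cluster and the same class, d = pairs in different clusters and different classes

rand = (a + d) / (total number of pairs)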
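
A pair-counting sketch of this formula is shown below. It assumes the usual convention that a counts pairs placed together in both the clustering and the ground truth and d counts pairs separated in both; the function name `rand_index` is illustrative.

```python
from itertools import combinations

def rand_index(cluster_ids, class_ids):
    """Fraction of point pairs on which the clustering and the classes agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(cluster_ids)), 2):
        same_cluster = cluster_ids[i] == cluster_ids[j]
        same_class = class_ids[i] == class_ids[j]
        if same_cluster == same_class:
            agree += 1   # counts a (together in both) and d (apart in both)
        total += 1
    return agree / total

print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
print(rand_index([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5
```

Because only pairs are compared, the actual ID values do not matter: a clustering that matches the classes under different label numbers still scores 1.0.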