Clustering Flashcards

1
Q

Hard clustering

A

Every object is a member of exactly one cluster.

2
Q

Soft clustering

A

Objects can be members of more than one cluster.

3
Q

Hierarchical Clustering

A

The most similar pair of clusters is repeatedly merged until all objects are linked into a single hierarchy (a tree of clusters).
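
A minimal sketch of the bottom-up merging loop described above, assuming a caller-supplied pairwise similarity function and using single link to compare clusters. The names `agglomerate` and `neg_distance` are hypothetical, and the quadratic search per step is deliberately naive.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def agglomerate(objects: Sequence,
                sim: Callable[[object, object], float]) -> List[Tuple[frozenset, frozenset]]:
    """Naive agglomerative clustering; returns the merges in order (indices into objects)."""
    # Start with every object in its own singleton cluster.
    clusters = [frozenset([i]) for i in range(len(objects))]
    merges = []
    while len(clusters) > 1:
        # Pick the most similar pair of clusters. Assumption: single link, i.e.
        # the similarity of the best-matching member pair across the two clusters.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(sim(objects[a], objects[b])
                              for a in clusters[p[0]] for b in clusters[p[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        # Replace the two clusters with their union.
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def neg_distance(a: float, b: float) -> float:
    # Hypothetical similarity for points on a line: closer means more similar.
    return -abs(a - b)


for left, right in agglomerate([0.0, 1.0, 5.0, 6.0], neg_distance):
    print(sorted(left), sorted(right))
```

The recorded merge order is exactly the hierarchy: cutting it after k merges leaves N - k clusters.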

4
Q

Non-hierarchical clustering

A

Results in flat clusters of “similar” documents

5
Q

Proximity Measure

A

Gives the proximity (similarity or distance) between two documents. Because the measures are symmetric, the pairwise proximity matrix is symmetric, so only one triangle of it needs to be stored.
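
Since a symmetric measure gives prox(a, b) = prox(b, a), only the lower triangle of the matrix carries information. A minimal sketch, assuming a caller-supplied `proximity` function; `lower_triangle` is a hypothetical name.

```python
from typing import Callable, List, Sequence


def lower_triangle(docs: Sequence,
                   proximity: Callable[[object, object], float]) -> List[List[float]]:
    """Row i holds proximity(docs[i], docs[j]) for every j < i."""
    # The upper triangle would only repeat these values, and the diagonal
    # (a document compared with itself) is not needed, so N(N-1)/2 entries suffice.
    return [[proximity(docs[i], docs[j]) for j in range(i)] for i in range(len(docs))]


# Documents as sets of terms, with raw overlap as the (assumed) proximity.
print(lower_triangle([{"a"}, {"a", "b"}, {"c"}], lambda x, y: len(x & y)))  # [[], [1], [0, 0]]
```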

6
Q

Raw Overlap

A

|X ∩ Y|
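
A one-line sketch, assuming X and Y are Python sets (for example, the sets of terms in two documents); `raw_overlap` is a hypothetical name.

```python
def raw_overlap(x: set, y: set) -> int:
    """Raw overlap |X ∩ Y|: the number of elements the two sets share."""
    return len(x & y)


print(raw_overlap({"a", "b", "c"}, {"b", "c", "d", "e"}))  # 2
```

Raw overlap is not normalized, so large documents tend to score higher simply because they contain more terms.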

7
Q

Dice’s coefficient

A

2|X ∩ Y| / (|X| + |Y|)
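
A minimal sketch of the formula for Python sets; `dice` is a hypothetical name, and returning 0.0 for two empty sets is an assumption, since the formula is undefined there.

```python
def dice(x: set, y: set) -> float:
    """Dice's coefficient: 2|X ∩ Y| / (|X| + |Y|)."""
    total = len(x) + len(y)
    # Assumption: define the coefficient as 0.0 when both sets are empty.
    return 2 * len(x & y) / total if total else 0.0


print(dice({"a", "b", "c"}, {"b", "c", "d", "e"}))  # 4/7 ≈ 0.571
```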

8
Q

Jaccard’s Coefficient

A

|X ∩ Y| / |X ∪ Y|
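
A minimal sketch for Python sets; `jaccard` is a hypothetical name, and the 0.0 result for two empty sets is an assumption.

```python
def jaccard(x: set, y: set) -> float:
    """Jaccard's coefficient: |X ∩ Y| / |X ∪ Y|."""
    union = len(x | y)
    # Assumption: define the coefficient as 0.0 when both sets are empty.
    return len(x & y) / union if union else 0.0


print(jaccard({"a", "b", "c"}, {"b", "c", "d", "e"}))  # 0.4
```

On the same pair of sets Dice gives 4/7 while Jaccard gives 2/5: Jaccard penalizes unshared elements more heavily.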

9
Q

Overlap coefficient

A

|X ∩ Y| / min(|X|, |Y|)
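
A minimal sketch for Python sets; `overlap_coefficient` is a hypothetical name, and the 0.0 result when either set is empty is an assumption.

```python
def overlap_coefficient(x: set, y: set) -> float:
    """Overlap coefficient: |X ∩ Y| / min(|X|, |Y|)."""
    smaller = min(len(x), len(y))
    # Assumption: define the coefficient as 0.0 when either set is empty.
    return len(x & y) / smaller if smaller else 0.0


print(overlap_coefficient({"a", "b", "c"}, {"b", "c", "d", "e"}))  # 2/3 ≈ 0.667
```

Dividing by the smaller set means that a set fully contained in the other scores 1.0, however large the other set is.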

10
Q

Cosine overlap

A

|X ∩ Y| / (sqrt(|X|) * sqrt(|Y|))
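
A minimal sketch for Python sets; `cosine_overlap` is a hypothetical name, and the 0.0 result when either set is empty is an assumption.

```python
import math


def cosine_overlap(x: set, y: set) -> float:
    """Cosine overlap: |X ∩ Y| / (sqrt(|X|) * sqrt(|Y|))."""
    # Assumption: define the coefficient as 0.0 when either set is empty.
    if not x or not y:
        return 0.0
    return len(x & y) / (math.sqrt(len(x)) * math.sqrt(len(y)))


print(cosine_overlap({"a", "b", "c"}, {"b", "c", "d", "e"}))  # 2/sqrt(12) ≈ 0.577
```

This equals the cosine of the angle between the two sets' binary (0/1) membership vectors.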

11
Q

Single Link Function

A

Similarity of the two most similar members, one from each cluster.
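
A minimal sketch, assuming clusters are sequences of members and `sim` is a caller-supplied pairwise similarity; `single_link` and `neg_distance` are hypothetical names.

```python
from typing import Callable, Sequence


def single_link(c1: Sequence, c2: Sequence,
                sim: Callable[[object, object], float]) -> float:
    """Cluster similarity = similarity of the most similar cross-cluster pair."""
    return max(sim(a, b) for a in c1 for b in c2)


def neg_distance(a: float, b: float) -> float:
    # Hypothetical similarity for points on a line: closer means more similar.
    return -abs(a - b)


print(single_link([0.0, 1.0], [3.0, 10.0], neg_distance))  # -2.0 (the pair 1.0 and 3.0)
```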

12
Q

Complete Link Function

A

Similarity of the two least similar members, one from each cluster.
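
The same sketch with `min` in place of `max`, under the same assumptions (caller-supplied `sim`; `complete_link` and `neg_distance` are hypothetical names).

```python
from typing import Callable, Sequence


def complete_link(c1: Sequence, c2: Sequence,
                  sim: Callable[[object, object], float]) -> float:
    """Cluster similarity = similarity of the least similar cross-cluster pair."""
    return min(sim(a, b) for a in c1 for b in c2)


def neg_distance(a: float, b: float) -> float:
    # Hypothetical similarity for points on a line: closer means more similar.
    return -abs(a - b)


print(complete_link([0.0, 1.0], [3.0, 10.0], neg_distance))  # -10.0 (the pair 0.0 and 10.0)
```

Because it is driven by the worst pair, complete link favours compact clusters, whereas single link can chain distant points together through intermediate neighbours.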

13
Q

Group Average Function

A

Average similarity over all pairs of group members.
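
A minimal sketch, assuming "group" means the merged cluster and averaging over all distinct member pairs in it. This is one common reading; another variant averages only over cross-cluster pairs. `group_average` and `neg_distance` are hypothetical names.

```python
from itertools import combinations
from typing import Callable, Sequence


def group_average(c1: Sequence, c2: Sequence,
                  sim: Callable[[object, object], float]) -> float:
    """Average similarity over all distinct pairs in the merged group c1 + c2."""
    merged = list(c1) + list(c2)
    # With two non-empty clusters there is always at least one pair.
    pairs = list(combinations(merged, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def neg_distance(a: float, b: float) -> float:
    # Hypothetical similarity for points on a line: closer means more similar.
    return -abs(a - b)


print(group_average([0.0, 1.0], [3.0, 10.0], neg_distance))  # -32/6 ≈ -5.33
```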

14
Q

Single Link vs. Complete Link

A

Time complexity: O(N^2) for single link vs. O(N^3) for complete link.

15
Q

Centroid

A

Average (mean) vector of the cluster's members
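
A minimal sketch computing the component-wise mean of a cluster's member vectors, assuming vectors are equal-length lists of floats; `centroid` is a hypothetical name.

```python
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the member vectors."""
    n = len(vectors)
    # zip(*vectors) groups the i-th components of all vectors together.
    return [sum(component) / n for component in zip(*vectors)]


print(centroid([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]))  # [1.0, 1.0]
```

The centroid need not coincide with any actual member of the cluster.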

16
Q

Medoid

A

Representative vector: the actual cluster member closest to the centroid
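
A minimal sketch that picks the member nearest the centroid. It assumes Euclidean distance (the card does not name a measure) and needs Python 3.8+ for `math.dist`; `medoid` is a hypothetical name.

```python
import math
from typing import Sequence


def medoid(vectors: Sequence[Sequence[float]]) -> Sequence[float]:
    """Return the actual member vector closest to the cluster centroid."""
    n = len(vectors)
    # Centroid: component-wise mean of the members.
    c = [sum(component) / n for component in zip(*vectors)]
    # Assumption: "closest" means smallest Euclidean distance.
    return min(vectors, key=lambda v: math.dist(v, c))


print(medoid([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0], [1.0, 0.5]]))  # [1.0, 0.5]
```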

17
Q

Measure of cluster quality

A

Mean squared distance from each data point to its nearest centre; the better the clustering, the smaller this value.
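
A minimal sketch of this quality measure, assuming squared Euclidean distance and points and centres given as equal-length lists of floats; `mean_squared_distance` is a hypothetical name.

```python
from typing import Sequence


def mean_squared_distance(points: Sequence[Sequence[float]],
                          centres: Sequence[Sequence[float]]) -> float:
    """Average over points of the squared Euclidean distance to the nearest centre."""
    def sq_dist(p: Sequence[float], c: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, c))

    # Each point is charged only for its closest centre.
    return sum(min(sq_dist(p, c) for c in centres) for p in points) / len(points)


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]]
print(mean_squared_distance(points, [[0.0, 1.0], [10.0, 0.0]]))  # 2/3 ≈ 0.667
```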

18
Q

Other Algorithms

A

Too complex and mathematical for a flashcard; see the notes, pages 118-123.