Session 6.2 Flashcards

1
Q

Clustering - what is it?

A

• Finding groups in data.
• Organising data into groups such that there is:
(1) high similarity within each group,
(2) low similarity across the groups.

2
Q

Is clustering the same as classification?

A

No

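• In classification, class labels can be found directly in the data (e.g., blood type); in clustering, the groups are not given and must be discovered.

• Different goals: clustering aims to “understand” the data better (explore) and to organise the information we have.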

3
Q

Distance measures

A
  1. Euclidean distance
  2. Manhattan distance
  3. Jaccard distance
  4. Cosine distance
  5. Edit distance (Levenshtein metric)
4
Q

Jaccard distance is used when

A

The possession of a common characteristic between two items is important, but the common absence of a characteristic is not.

• Especially useful when dealing with problems that involve (large) sets of
characteristics that may not be ‘symmetrically’ important.

• Text mining: compare whether two documents contain the same words.
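
As a minimal sketch of the idea above, assuming each document is represented as a set of words: the Jaccard distance is 1 minus the size of the intersection divided by the size of the union, so words absent from both documents never affect the result. The function name `jaccard_distance` and the sample documents are illustrative, not part of the original cards.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical documents, each reduced to its set of words.
doc1 = {"data", "mining", "clusters"}
doc2 = {"data", "mining", "labels"}

# 2 shared words out of 4 distinct words overall.
print(jaccard_distance(doc1, doc2))  # 0.5
```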

5
Q

Cosine distance is often encountered in

A

text mining or recommendation engines

6
Q

Edit distance (Levenshtein metric)

A
  • Text mining applications.
  • Applications: autocorrect (spelling mistakes).
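
A minimal sketch of how edit distance could support autocorrect, assuming the standard Levenshtein definition (the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another). The function name `levenshtein`, the word list, and the typo are illustrative assumptions.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance between s and t via row-by-row dynamic programming."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3

# Toy autocorrect: suggest the dictionary word closest to the typo.
words = ["clustering", "classification", "distance"]
typo = "clusterng"
suggestion = min(words, key=lambda w: levenshtein(typo, w))
print(suggestion)  # clustering
```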

7
Q

Euclidean distance

A
  • The most common geometric distance measure.
  • Suited to numeric datasets whose attributes are similar in measurement type (similar scale) and units.
  • Can be understood as physical distance between two data points.
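
A minimal sketch of the straight-line distance described above, assuming two points given as equal-length sequences of numbers. The function name `euclidean_distance` is illustrative.

```python
import math


def euclidean_distance(p, q) -> float:
    """Square root of the sum of squared differences between attributes."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# The classic 3-4-5 right triangle.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```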
8
Q

Manhattan distance

A
  • The sum of the absolute differences between pairwise attributes.
  • The “taxicab” distance.
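
A minimal sketch of the definition above, assuming two points given as equal-length sequences of numbers. The function name `manhattan_distance` is illustrative.

```python
def manhattan_distance(p, q):
    """Sum of absolute differences between corresponding attributes."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of attributes")
    return sum(abs(a - b) for a, b in zip(p, q))


# A taxi restricted to a street grid travels 3 blocks one way and 4 the other.
print(manhattan_distance((0, 0), (3, 4)))  # 7
```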
9
Q

Cosine distance

A

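The name refers to the method of measurement: the cosine of the angle between two vectors. The distance is commonly taken as 1 minus that cosine, so vectors pointing in the same direction have distance 0.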
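
A minimal sketch, assuming the common convention that cosine distance is 1 minus the cosine of the angle between two vectors. The function name `cosine_distance` is illustrative.

```python
import math


def cosine_distance(u, v) -> float:
    """1 - cos(angle between u and v); depends on direction, not length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Perpendicular vectors: the cosine of the angle is 0, so the distance is 1.
print(cosine_distance((1, 0), (0, 1)))  # 1.0

# Same direction, different length: the distance is approximately 0.
print(cosine_distance((1, 2), (2, 4)))
```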
