Chapter 2.4: Measures of Similarity and Dissimilarity Flashcards

1
Q

similarity

A

The similarity between two objects is a numerical measure of the degree to which the two objects are alike.

Similarities are higher for pairs of objects that are more alike.

Similarities are usually non-negative and often lie between 0 (no similarity) and 1 (complete similarity).

2
Q

dissimilarity

A

The dissimilarity between two objects is a numerical measure of the degree to which the two objects are different.

Dissimilarities are lower for more similar pairs of objects.

3
Q

3 Properties of distance metrics

A
  • Positivity
  • Symmetry
  • Triangle inequality
4
Q

3 Properties of distance metrics

Positivity

A
  • d(x, y) ≥ 0 for all x and y
  • d(x, y) = 0 if and only if x = y
5
Q

3 Properties of distance metrics

Symmetry

A

d(x, y) = d(y, x) for all x and y

6
Q

3 Properties of distance metrics

Triangle inequality

A

d(x, z) ≤ d(x, y) + d(y, z)

for all points x, y and z
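
The three properties can be spot-checked numerically. The sketch below is only an illustration: the helper names `euclidean` and `satisfies_metric_properties` and the sample points are assumptions, not part of the flashcards, and passing on a finite sample does not prove that a function is a metric.

```python
import itertools
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length tuples of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def satisfies_metric_properties(points, d, tol=1e-12):
    """Check positivity, symmetry, and the triangle inequality on a sample."""
    for x, y in itertools.product(points, repeat=2):
        # Positivity: d(x, y) >= 0, and d(x, y) = 0 exactly when x = y.
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False
        # Symmetry: d(x, y) = d(y, x).
        if abs(d(x, y) - d(y, x)) > tol:
            return False
    for x, y, z in itertools.product(points, repeat=3):
        # Triangle inequality: d(x, z) <= d(x, y) + d(y, z).
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False
    return True


# Hypothetical sample points, chosen only for illustration.
points = [(0, 0), (3, 4), (6, 8), (1, -2)]
print(satisfies_metric_properties(points, euclidean))
```

On the same sample, squared Euclidean distance fails the check, because it violates the triangle inequality for the collinear points (0, 0), (3, 4), and (6, 8).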

7
Q

2 typical properties of similarities

A

If s(x, y) is the similarity between points x and y:

  1. s(x, y) = 1 only if x = y. (0 ≤ s ≤ 1)
  2. s(x, y) = s(y, x) for all x and y (symmetry)
8
Q

Similarity coefficients

A

Similarity measures between objects that contain only binary attributes are called similarity coefficients.

9
Q

Simple Matching Coefficient

A

A similarity coefficient defined as:

SMC = number of matching attribute values / number of attributes
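
A minimal sketch of the SMC for two binary vectors. The function name `simple_matching_coefficient` and the example vectors are illustrative assumptions, and the inputs are assumed to be non-empty and of equal length.

```python
def simple_matching_coefficient(x, y):
    """Fraction of attributes on which x and y take the same value."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


# Hypothetical binary vectors: 7 of the 10 positions agree (all are 0-0 matches).
x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(simple_matching_coefficient(x, y))  # 0.7
```

Note that 0-0 matches count toward the SMC just as 1-1 matches do.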

10
Q

Jaccard Coefficient

A

Used when we have a binary dataset (all attributes are either 0 or 1).

J = number of matching presences / number of attributes not involved in 00 matches

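= f11 / (f11 + f10 + f01)

where f11 is the number of attributes that are 1 in both objects, f10 the number that are 1 in x and 0 in y, and f01 the number that are 0 in x and 1 in y.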
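
A minimal sketch of the Jaccard coefficient for two binary vectors. The function name `jaccard_coefficient` and the example vectors are illustrative assumptions. Returning 0.0 when neither vector has a 1 is also an assumed convention, since the formula is undefined in that case.

```python
def jaccard_coefficient(x, y):
    """Jaccard coefficient f11 / (f11 + f10 + f01) for binary vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    denominator = f11 + f10 + f01
    if denominator == 0:
        # Assumed convention: no presences in either vector gives 0.0.
        return 0.0
    return f11 / denominator


# Hypothetical vectors: f11 = 2, f10 = 1, f01 = 1, and one 0-0 match is ignored.
x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
print(jaccard_coefficient(x, y))  # 0.5
```

Unlike the SMC, the 0-0 match in the third position does not affect the result.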

11
Q

Cosine Similarity

A

cos(x, y) = (x’y) / (||x|| ||y||)
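
A minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the example vectors are illustrative assumptions. The formula is undefined for a zero vector, so the sketch raises an error in that case.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their lengths."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical term-count vectors: dot = 5, ||x|| = sqrt(42), ||y|| = sqrt(6).
x = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
y = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(x, y), 3))  # 0.315
```

Because both vectors are divided by their lengths, scaling either one by a positive constant leaves the result unchanged.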

12
Q

Extended Jaccard Coefficient

A

EJ = (x’y) / (x’x + y’y - x’y)
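
A minimal sketch of the extended Jaccard coefficient, computed as the dot product of x and y divided by (x·x + y·y - x·y). The function name `extended_jaccard` and the example vectors are illustrative assumptions.

```python
def extended_jaccard(x, y):
    """Extended Jaccard coefficient: x.y / (x.x + y.y - x.y)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot_xy = sum(a * b for a, b in zip(x, y))
    dot_xx = sum(a * a for a in x)
    dot_yy = sum(b * b for b in y)
    denominator = dot_xx + dot_yy - dot_xy
    if denominator == 0:
        # This happens only when both vectors are all zeros.
        raise ValueError("undefined when both vectors are zero")
    return dot_xy / denominator


# Hypothetical binary vectors: x.y = 2, x.x = 3, y.y = 3.
x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
print(extended_jaccard(x, y))  # 0.5
```

For binary vectors this reduces to the ordinary Jaccard coefficient, since x·y = f11, x·x = f11 + f10, and y·y = f11 + f01.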

13
Q

Correlation

A

corr(x, y) = covariance(x, y) / [standard_dev(x) × standard_dev(y)]
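
A minimal sketch of this correlation in plain Python. The function name `pearson_correlation` and the example vectors are illustrative assumptions. The sketch uses the sample (n - 1) versions of covariance and standard deviation, and it assumes at least two values and non-constant inputs.

```python
import math


def pearson_correlation(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return covariance / (std_x * std_y)


# Hypothetical vectors with a perfect negative linear relationship (x = -3y).
x = [-3, 6, 0, 3, -6]
y = [1, -2, 0, -1, 2]
print(pearson_correlation(x, y))  # close to -1, up to floating-point rounding
```

The (n - 1) factors cancel in the ratio, so dividing by n instead would give the same correlation.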
