294 - 333: Similarity Search in High Dimensions Flashcards

1
Q

What is the Curse of Dimensionality?

A

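It refers to the phenomenon where indexing and search become increasingly inefficient as the number of dimensions grows. Distances between points tend to concentrate, so a query's nearest and farthest neighbors become nearly indistinguishable, and many index structures degrade toward a linear scan.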
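A small experiment shows the distance concentration. It is a sketch that assumes points drawn uniformly from the unit hypercube and Euclidean distance, both chosen here only for illustration. It measures the relative contrast (d_max - d_min) / d_min between a random query's farthest and nearest neighbor. The helper name `relative_contrast` is hypothetical.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the distances from one random query
    to n_points random points, all drawn uniformly from the unit hypercube."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension grows: every point looks about equally far away.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```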

2
Q

What is the Jaccard Coefficient?

A

The Jaccard Coefficient measures the similarity between two sets A and B. It is calculated as J(A,B) = |A ∩ B| / |A ∪ B| and ranges from 0 (disjoint sets) to 1 (identical sets).
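The formula translates directly into set operations. In this minimal sketch, the function name `jaccard` is hypothetical. Returning 1.0 for two empty sets is a convention assumed here, because the formula itself is undefined in that case.

```python
def jaccard(a: set, b: set) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)

print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 2 shared / 4 total = 0.5
```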

3
Q

What is Locality Sensitive Hashing (LSH)?

A

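LSH is an approximate technique for similarity search in high-dimensional spaces. It uses hash functions under which similar items collide (land in the same hash bucket) with high probability and dissimilar items collide with low probability. As a result, only the items that share a query's bucket need to be compared.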
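The example below uses bit sampling for 0/1 vectors under Hamming distance. This is one classic LSH family, chosen here only for illustration. Each key is the vector's values at k coordinates chosen at random in advance. Two vectors that differ in a single bit therefore share a key with probability (d - k) / d, which is 56/64 for the parameters below. The helper names `make_bit_sampling_hash` and `build_table` are hypothetical.

```python
import random
from collections import defaultdict

def make_bit_sampling_hash(dim, k, seed=0):
    """Bit-sampling LSH: the key of a 0/1 vector is the tuple of its values
    at k coordinates that are chosen at random once, up front."""
    rng = random.Random(seed)
    coords = rng.sample(range(dim), k)  # k distinct coordinate indices
    return lambda v: tuple(v[i] for i in coords)

def build_table(vectors, h):
    """Group vector ids by hash key; each key identifies one bucket."""
    table = defaultdict(list)
    for vid, v in vectors.items():
        table[h(v)].append(vid)
    return table

rng = random.Random(42)
dim = 64
base = [rng.randint(0, 1) for _ in range(dim)]
near = base.copy()
near[0] ^= 1                  # differs from base in exactly one bit
far = [1 - b for b in base]   # differs from base in every bit

h = make_bit_sampling_hash(dim, k=8, seed=1)
table = build_table({"base": base, "near": near, "far": far}, h)
# The candidates for a query are the ids stored in the query's bucket.
print(table[h(base)])
```

The vector `far` can never share a bucket with `base`, because every sampled bit differs. The vector `near` lands there unless coordinate 0 happened to be sampled.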

4
Q

In Min Hashing, the probability that two sets have the same minimum hash value is equal to which similarity measure?
a) Euclidean Distance
b) Manhattan Distance
c) Jaccard Coefficient
d) Cosine Similarity

A

c) Jaccard Coefficient
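Here is why the answer holds. Under a random permutation of A ∪ B, every element is equally likely to come first. The two sets' minimums coincide exactly when that first element lies in A ∩ B, which happens with probability |A ∩ B| / |A ∪ B|. The simulation below checks this empirically. It is a sketch, and the helper name `minhash_collision_rate` is hypothetical.

```python
import random

def minhash_collision_rate(a, b, trials=5000, seed=0):
    """Estimate P[min_pi(A) == min_pi(B)] over random permutations pi of A ∪ B."""
    rng = random.Random(seed)
    universe = sorted(a | b)  # elements outside A ∪ B cannot affect either minimum
    hits = 0
    for _ in range(trials):
        order = universe[:]
        rng.shuffle(order)
        rank = {x: i for i, x in enumerate(order)}  # pi(x) = position of x
        # Permutations are bijective, so equal minimum values mean the same element.
        if min(a, key=rank.__getitem__) == min(b, key=rank.__getitem__):
            hits += 1
    return hits / trials

A = set(range(0, 10))
B = set(range(5, 15))
exact = len(A & B) / len(A | B)  # 5 / 15 = 1/3
print(exact, minhash_collision_rate(A, B))
```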

5
Q

How does Min Hashing approximate the Jaccard Coefficient?

A

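Min Hashing applies k independent hash functions (or random permutations) to every element of each set. For each function, it keeps the smallest hash value, and these k minimums form the set's signature. The Jaccard Coefficient is then estimated as the fraction of the k positions at which the two signatures agree.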
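The procedure can be sketched as follows. The k hash functions here are an assumption made for illustration: function i is a 64-bit BLAKE2b digest of the string "i:x", standing in for a random hash function. The names `h`, `signature`, and `estimate_jaccard` are hypothetical.

```python
import hashlib

def h(i: int, x) -> int:
    """Assumed i-th hash function: 64-bit BLAKE2b digest of 'i:x' as an integer."""
    data = f"{i}:{x}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def signature(s, k):
    """MinHash signature: for each of k hash functions, the minimum hash over s."""
    return [min(h(i, x) for x in s) for i in range(k)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minimums agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

A = {f"w{i}" for i in range(0, 100)}
B = {f"w{i}" for i in range(50, 150)}
k = 512
print(estimate_jaccard(signature(A, k), signature(B, k)))  # true J = 50/150 ≈ 0.333
```

A larger k gives a tighter estimate, because the standard error shrinks roughly as 1/√k. The cost is one hash evaluation per element per function.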

6
Q

Why might Locality Sensitive Hashing (LSH) use multiple hash tables?

A

To increase the likelihood that similar objects collide in a bucket in at least one table. This reduces missed near neighbors (higher recall) at the cost of more memory and more candidates to check.
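The benefit can be quantified under an assumed, commonly used amplification scheme. Each of L tables keys items on k concatenated hash functions, and each function agrees for a given pair with probability p (for MinHash, p equals the Jaccard Coefficient). A pair collides in one table with probability p^k, and in at least one of the L tables with probability 1 - (1 - p^k)^L. The symbols k, L, and p and the function name below are introduced here for illustration.

```python
def collision_probability(p: float, k: int, L: int) -> float:
    """P[collide in at least one of L tables], where each table's key concatenates
    k hash functions that each agree for the pair with probability p."""
    return 1.0 - (1.0 - p ** k) ** L

# A similar pair (p = 0.8) vs. a dissimilar pair (p = 0.3), with k = 5 per table.
for L in (1, 5, 20):
    print(L,
          round(collision_probability(0.8, 5, L), 4),
          round(collision_probability(0.3, 5, L), 4))
```

Adding tables pushes the similar pair's collision probability toward 1, while the dissimilar pair's probability stays small. This is the recall gain that the card describes, paid for with L times the table memory.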
