294 - 333: Similarity Search in High Dimensions Flashcards
What is the Curse of Dimensionality?
It refers to the phenomenon where increasing the number of dimensions in data makes indexing and search increasingly inefficient, as distances between points tend to become indistinguishable.
What is the Jaccard Coefficient?
Jaccard Coefficient measures the similarity between two sets
A and B, calculated as J(A,B) = |A ∩ B / A ∪ B|
What is Locality Sensitive Hashing (LSH)?
LSH is an approximate hashing technique used to process similarity searches in high-dimensional spaces by grouping similar items into the same hash buckets.
In Min Hashing, the probability that the minimum hash value of two sets is the same is equivalent to which metric?
a) Euclidean Distance
b) Manhattan Distance
c) Jaccard Coefficient
d) Cosine Similarity
c) Jaccard Coefficient
How does Min Hashing approximate the Jaccard Coefficient?
By computing the smallest hash values of elements in sets and estimating similarity based on the proportion of identical hash values.
Why might Locality Sensitive Hashing (LSH) use multiple hash tables?
To increase the likelihood that similar objects will collide in at least one bucket, improving search accuracy.