Quiz 4 Flashcards
1
Q
- Which of the following examples can be expressed in high-dimensional space and cast as a “finding similar items” problem?
A. Pages with similar words
B. Customers with similar purchase history
C. Images with similar features
D. All the above
A
D. All the above
2
Q
- What role does hashing play in the similarity search pipeline?
A. To efficiently compute pairwise similarities
B. To recommend a list of potential matches to a query document
C. To reduce the dimensionality of feature vectors
D. To efficiently identify/group near-duplicate documents
A
D. To efficiently identify/group near-duplicate documents
3
Q
- Which step in the similarity search process focuses on converting large sets to short signatures?
A. Min-Hashing
B. Shingling
C. Jaccard Similarity Calculation
D. Locality Sensitive Hashing
A
A. Min-Hashing
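Note: a minimal sketch of how Min-Hashing can turn a large set of shingles into a short signature. The function name `minhash_signature`, the linear hash family `(a*x + b) mod prime`, and the use of `zlib.crc32` to map shingles to integers are illustrative assumptions, not something the quiz specifies.

```python
import random
import zlib

def minhash_signature(items, num_hashes=100, seed=0):
    """Return a list of num_hashes min-hash values for a non-empty set of strings."""
    rng = random.Random(seed)
    prime = 2_147_483_647  # 2^31 - 1, used as the modulus for the hash family
    # Assumption: random linear hash functions stand in for random permutations.
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_hashes)]
    # crc32 gives a stable integer per shingle (built-in hash() is salted per run).
    ids = [zlib.crc32(item.encode("utf-8")) for item in items]
    # For each hash function, keep only the minimum hashed value over the set.
    return [min((a * x + b) % prime for x in ids) for a, b in params]

shingles = {"abc", "bcd", "cde", "def"}
signature = minhash_signature(shingles, num_hashes=16)
```

However many shingles the set contains, the signature has exactly `num_hashes` entries.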
4
Q
- Which of the following is not a step in finding similar documents?
A. Min-Hashing
B. Shingling
C. Pairwise comparison
D. Locality Sensitive Hashing
A
C. Pairwise comparison
5
Q
- Which of the following sequences cannot be a k-shingle for the string “3162 Introduction to Data Mining” for any k?
A. 3162 Introduction
B. Introduction to Data Mining
C. Data Mining
D. 3162 Mining
A
D. 3162 Mining
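Note: a minimal sketch of why option D fails, assuming character-level shingles (the quiz does not say whether shingles are characters or words; the contiguity argument is the same either way). A k-shingle must be a contiguous run of the string, so a candidate qualifies only if it appears among the shingles of its own length. The helper names are hypothetical.

```python
def char_shingles(text, k):
    """Return the set of all contiguous length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def can_be_shingle(candidate, text):
    """True if candidate is a k-shingle of text for k = len(candidate)."""
    return candidate in char_shingles(text, len(candidate))

title = "3162 Introduction to Data Mining"
options = [
    "3162 Introduction",
    "Introduction to Data Mining",
    "Data Mining",
    "3162 Mining",
]
# Map each answer option to whether it can be a shingle of the title.
results = {option: can_be_shingle(option, title) for option in options}
```

Only "3162 Mining" maps to `False`, because it skips the words between "3162" and "Mining".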
6
Q
- True | False Min-Hashing produces short signatures while preserving the similarity of the original document.
A
True
7
Q
- True | False The most efficient algorithm for computing document similarity requires at least O(N^2) space.
A
False
8
Q
- True | False The similarity of two signatures is the fraction of hash functions in which they agree.
A
True
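Note: a minimal sketch of the signature similarity described in this card, computed as the fraction of positions (one per hash function) where two signatures hold the same value. The function name and the example signatures are made up for illustration.

```python
def signature_similarity(sig_a, sig_b):
    """Fraction of hash functions on which two equal-length signatures agree."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

# Hypothetical signatures built from 4 hash functions; they agree on 3 of them.
similarity = signature_similarity([1, 3, 0, 2], [1, 3, 1, 2])  # 0.75
```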
9
Q
- True | False The probability that the min-hash values of two columns, C1 and C2, are equal under a random permutation p is equal to their Jaccard Similarity.
A
True
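Note: a minimal sketch that checks this property exactly on a toy example by enumerating every permutation of a small universe of rows. The universe, the two columns, and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def minhash_agreement_probability(c1, c2, universe):
    """Exact probability, over all row permutations, that the min-hashes of c1 and c2 match."""
    agree = 0
    total = 0
    for perm in permutations(universe):
        rank = {row: position for position, row in enumerate(perm)}
        # The min-hash of a column is the position of its first row in permuted order.
        if min(rank[r] for r in c1) == min(rank[r] for r in c2):
            agree += 1
        total += 1
    return Fraction(agree, total)

def jaccard(c1, c2):
    """Jaccard similarity: size of the intersection over size of the union."""
    return Fraction(len(c1 & c2), len(c1 | c2))

universe = ["a", "b", "c", "d", "e"]
c1 = {"a", "b", "c"}
c2 = {"b", "c", "d"}
probability = minhash_agreement_probability(c1, c2, universe)
similarity = jaccard(c1, c2)
```

Both values come out to 1/2 here: the min-hashes match exactly when the first row of the union in permuted order belongs to the intersection.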
10
Q
- True | False Locality-sensitive hashing is primarily used to find exact matches in a large dataset.
A
False
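Note: a minimal sketch of why LSH targets similar items rather than exact matches, assuming the common banding technique (the quiz does not name a specific scheme). Signatures are split into bands, and two documents become a candidate pair if they agree on all rows of at least one band. The function name and the sample signatures are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def lsh_candidate_pairs(signatures, bands, rows):
    """Return pairs of document ids whose signatures match in at least one band."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            # Hash the slice of the signature that belongs to this band.
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets[key].append(doc_id)
        # Any two documents sharing a bucket in this band become candidates.
        for docs in buckets.values():
            for pair in combinations(sorted(docs), 2):
                candidates.add(pair)
    return candidates

signatures = {
    "d1": [1, 2, 3, 4],
    "d2": [1, 2, 9, 9],  # agrees with d1 on the first band only
    "d3": [7, 7, 7, 7],
}
pairs = lsh_candidate_pairs(signatures, bands=2, rows=2)
```

Here `d1` and `d2` are flagged as candidates even though their signatures are not identical.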
11
Q
- True | False Documents that are potentially similar will have many shingles in common.
A
True