Week 5 Flashcards
1
Q
Is the input to every reducer sorted?
A
yes, in MapReduce the shuffle phase sorts each reducer's input by key
2
Q
What does the Jaccard coefficient measure?
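A
the similarity of two sets P and Q: the size of their intersection divided by the size of their union, |P ∩ Q| / |P ∪ Q|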
3
Q
what does a higher Jaccard similarity mean?
A
a larger fraction of the elements is shared between P and Q
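A minimal sketch of the idea, assuming P and Q are plain Python sets. The function name `jaccard` and the example sets are made up for illustration, and returning 1.0 for two empty sets is a convention chosen here, not something the deck states.

```python
def jaccard(p: set, q: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = p | q
    if not union:
        return 1.0  # assumed convention for two empty sets
    return len(p & q) / len(union)


# Hypothetical example sets
P = {"a", "b", "c", "d"}
Q = {"c", "d", "e"}
R = {"b", "c", "d"}

print(jaccard(P, Q))  # 2 shared out of 5 distinct elements -> 0.4
print(jaccard(P, R))  # 3 shared out of 4 distinct elements -> 0.75
```

R shares more of its elements with P than Q does, so its Jaccard similarity to P is higher.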
4
Q
A
5
Q
what is a simple deficiency of Jaccard similarity?
A
6
Q
in cosine similarity, what are v and w
A
the two vectors being compared, e.g. the token-weight vectors of two documents
7
Q
how do you measure the weight of a token
A
term frequency-inverse document frequency (tf-idf)
8
Q
term frequency equation
A
9
Q
A
1
10
Q
A
11
Q
what 2 things can cause the tf-idf of a word to go up?
A
more occurrences of the word within a document
the word being rarer across the collection of documents
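Both effects can be checked with a small sketch. The deck's own tf and idf equations are not shown, so the variants below are assumptions: tf is the count of the term divided by the document length, and idf is log(N / number of documents containing the term). The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf(term, doc):
    # Assumed variant: occurrences of the term divided by document length
    return Counter(doc)[term] / len(doc)


def idf(term, docs):
    # Assumed variant: log(N / df); the term must occur in at least one document
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)


def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


# Hypothetical toy collection, one token list per document
docs = [
    "minhash minhash estimates similarity".split(),
    "cosine similarity uses vectors".split(),
    "jaccard similarity uses sets".split(),
    "minhash uses hash functions".split(),
]

# Effect 1: same word, more occurrences in the document -> higher tf-idf
print(tf_idf("minhash", docs[0], docs), tf_idf("minhash", docs[3], docs))

# Effect 2: same count in the document, rarer across the collection -> higher tf-idf
print(tf_idf("cosine", docs[1], docs), tf_idf("uses", docs[1], docs))
```

In each printed pair the first value should be the larger one.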
12
Q
A
13
Q
what is a safe value of n for n-grams in research?
A
n=9
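A rough sketch of extracting n-grams with n=9, assuming character-level n-grams (the card does not say whether characters or words are meant). The function name `char_ngrams` and the sample text are hypothetical.

```python
def char_ngrams(text: str, n: int = 9) -> set:
    """Return the set of all length-n character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


shingles = char_ngrams("near neighbor search", n=9)
print(sorted(shingles)[:3])  # a few of the 9-character n-grams
```

Two documents can then be compared through the overlap of their n-gram sets. A text shorter than n yields an empty set.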
14
Q
What is an application of near neighbor search?
A
15
Q
what is a minhash of a set
A
the minimum value a hash function takes over the elements of the set
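A minimal sketch of minhashing, assuming the hash-function formulation (the deck may intend the equivalent random-permutation formulation). The helper names, the linear hash family, and the use of `zlib.crc32` to turn tokens into integers are all illustrative choices, not the deck's method.

```python
import random
import zlib

PRIME = 2_147_483_647  # 2**31 - 1, used as the modulus of the assumed hash family


def make_hash(seed):
    """Hypothetical helper: build one hash h(x) = (a * crc32(x) + b) mod PRIME."""
    rng = random.Random(seed)
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)

    def h(token: str) -> int:
        return (a * zlib.crc32(token.encode("utf-8")) + b) % PRIME

    return h


def minhash(items, h):
    # The minhash of a set under h: the smallest hash value among its elements
    return min(h(x) for x in items)


def signature(items, hashes):
    # One minhash per hash function
    return [minhash(items, h) for h in hashes]


def estimate_jaccard(sig_p, sig_q):
    # Fraction of positions where the two signatures agree
    matches = sum(1 for x, y in zip(sig_p, sig_q) if x == y)
    return matches / len(sig_p)


hashes = [make_hash(seed) for seed in range(200)]
P = {"a", "b", "c", "d"}
Q = {"c", "d", "e"}

# An estimate of the true Jaccard similarity of P and Q (0.4), not an exact value
print(estimate_jaccard(signature(P, hashes), signature(Q, hashes)))
```

The fraction of hash functions on which two sets have the same minhash approximates their Jaccard similarity, which is why minhash signatures are useful for near neighbor search.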