Week 5 Flashcards

1
Q

Is the input to every reducer sorted?

A

Yes. Each reducer's input is sorted by key.

2
Q

What is the Jaccard coefficient measuring?

A
3
Q

What does a higher Jaccard similarity mean?

A

A higher fraction of the data is shared between P and Q.

4
Q
A
5
Q

What is a simple deficiency of Jaccard similarity?

A
6
Q

In cosine similarity, what are v and w?

A
7
Q

How do you measure the weight of a token?

A

Term frequency-inverse document frequency (tf-idf).

8
Q

Term frequency equation

A
9
Q
A

1

10
Q
A
11
Q

What 2 things can cause the tf-idf of a word to go up?

A

An increase in the number of occurrences of the word within a document.

The word being rarer across the collection of documents.
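
Both effects can be checked on a toy corpus. This is only a sketch: the deck does not show its exact formulas, so the definitions below (tf as the count divided by document length, idf as log(N / document frequency)) are assumptions, and the function names and documents are made up for illustration.

```python
import math

def tf(term, doc):
    # Assumed definition: occurrences of the term divided by document length.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Assumed definition: log(N / number of documents containing the term).
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# Hypothetical tokenized documents.
docs = [
    ["map", "reduce", "reduce", "sort"],
    ["map", "reduce", "shuffle", "key"],
    ["map", "join", "key", "value"],
]

# More occurrences within a document -> higher tf-idf.
print(tf_idf("reduce", docs[0], docs) > tf_idf("reduce", docs[1], docs))   # True

# Rarer across the collection -> higher tf-idf (same count in docs[1]).
print(tf_idf("shuffle", docs[1], docs) > tf_idf("reduce", docs[1], docs))  # True

# A word that appears in every document gets weight 0 under this idf.
print(tf_idf("map", docs[0], docs))  # 0.0
```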

12
Q
A
13
Q

What is a safe value of n for n-grams in research?

A

n=9

14
Q

What is an application of near neighbor search?

A
15
Q

What is a minhash of a set?

A
16
Q

Jaccard similarity of two sets, in plain English

A

the size of their intersection divided by the size of their union
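
As a quick illustration of that definition, here is a minimal Python sketch. The function name and the example sets are made up, and returning 0.0 for two empty sets is an assumed convention rather than something the deck states.

```python
def jaccard_similarity(p, q):
    """Size of the intersection divided by the size of the union."""
    p, q = set(p), set(q)
    union = p | q
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(p & q) / len(union)

p = {"a", "b", "c"}
q = {"b", "c", "d"}

# 2 shared elements ("b", "c") out of 4 distinct elements overall.
print(jaccard_similarity(p, q))  # 0.5
```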

17
Q

how is minhash related to jaccard similarity

A

The probability (over all permutations of the rows) that h(c1) = h(c2) equals jaccard_sim(c1, c2).
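
This relationship can be checked empirically with a small simulation. The sketch below assumes h(c) is the position of the first row, under a random permutation of the rows, that belongs to column c. The function names, row universe, trial count, and seed are all illustrative choices, not part of the deck.

```python
import random

def minhash(column, position):
    # h(c): smallest permuted position among the rows present in the column.
    return min(position[row] for row in column)

def estimate_jaccard(c1, c2, rows, trials=5000, seed=0):
    """Fraction of random row permutations for which h(c1) == h(c2)."""
    rng = random.Random(seed)
    rows = list(rows)
    matches = 0
    for _ in range(trials):
        order = rows[:]
        rng.shuffle(order)  # one random permutation of the rows
        position = {row: i for i, row in enumerate(order)}
        if minhash(c1, position) == minhash(c2, position):
            matches += 1
    return matches / trials

rows = range(10)
c1 = {0, 1, 2, 3}
c2 = {2, 3, 4, 5}

true_jaccard = len(c1 & c2) / len(c1 | c2)  # 2 / 6
estimate = estimate_jaccard(c1, c2, rows)
print(true_jaccard, estimate)  # the estimate should land close to 1/3
```

The estimate comes from sampling, so it only approximates the true value; more trials tighten it.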