w2 L2 information retrieval Flashcards

1
Q

what is term frequency and why is it important

A

the more often a key term appears in the doc, the more relevant the doc is likely to be

TF(term) = 1_term_in_doc * number_of_occurrences

1_term_in_doc is an indicator function: 1 if the term is in the doc, 0 if it is not
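
a minimal sketch of the TF formula above, assuming the doc is already split into a list of lowercase tokens (the name `term_frequency` is made up for this card):

```python
from collections import Counter

def term_frequency(term, doc_tokens):
    """Raw count of `term` in a tokenized doc.

    A Counter returns 0 for a missing key, so the indicator
    function (1 if present, 0 if absent) is handled implicitly.
    """
    return Counter(doc_tokens)[term]

doc = "the cat sat on the mat".split()
print(term_frequency("the", doc))  # 2
print(term_frequency("dog", doc))  # 0
```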

2
Q

if all of your documents contain relevant terms, how do you find the actual relevant documents

A

downweight terms that are too frequent in the collection and upweight the rarer terms

if every document contains a relevant word A, that word behaves like a stop word, so the rarer relevant words need to be prioritized

3
Q

what is inverse document frequency and why do we need it

A

if a word is super common we need to weigh it less, and vice versa, so we use the inverse of its document frequency

IDF of a term = N / (document frequency of term)

N = total number of documents in the collection

4
Q

how to calculate idf

A

IDF(term) = log(N/ (df(term)+1))

N is the total number of documents

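df(term) is the document frequency: the number of documents that contain the term. the + 1 is for smoothing, so a term that appears in no document does not cause a division by zero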
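
a rough sketch of the smoothed formula above, assuming each document is a list of tokens and using the natural log (the card does not say which log base is meant; the function names are made up):

```python
import math

def document_frequency(term, docs):
    """Number of documents that contain `term` at least once."""
    return sum(1 for doc in docs if term in doc)

def idf(term, docs):
    """Smoothed IDF from the card: log(N / (df(term) + 1))."""
    return math.log(len(docs) / (document_frequency(term, docs) + 1))

docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the bird sang".split(),
    "a cat purred".split(),
]
print(idf("the", docs))  # log(4 / 4) = 0.0, common term gets no weight
print(idf("dog", docs))  # log(4 / 2), about 0.693, rarer term gets more
```

note that with this smoothing a term found in every document gets a slightly negative value, log(N / (N + 1)).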

5
Q

how to calculate tf-idf

A

tf-idf = TF(term) * IDF(term)
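
a small sketch putting the two pieces together, assuming raw counts for TF, the smoothed natural-log IDF from the previous card, and tokenized docs (all names here are illustrative):

```python
import math

def tf(term, doc):
    """Raw count of `term` in a tokenized doc."""
    return doc.count(term)

def idf(term, docs):
    """Smoothed IDF: log(N / (df(term) + 1))."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / (df + 1))

def tf_idf(term, doc, docs):
    """tf-idf weight of `term` for one doc in the collection."""
    return tf(term, doc) * idf(term, docs)

docs = [
    "cat cat dog".split(),
    "dog bird".split(),
    "bird fish".split(),
    "fish fish fish".split(),
]
# "cat" is frequent in doc 0 and rare in the collection: 2 * log(4 / 2)
print(tf_idf("cat", docs[0], docs))  # about 1.386
# "dog" appears once and is in two docs: 1 * log(4 / 3)
print(tf_idf("dog", docs[0], docs))  # about 0.288
```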

6
Q

how do we measure the success of a retrieval algorithm

A

by whether it shows the most relevant results first

precision@k

7
Q

what is precision at k

A

you can rank the documents by similarity to your query and cut the list off at a certain point

let's call this point k

if you have access to the list of all documents relevant to the query, you can measure what fraction of the top-k documents returned by the algorithm are relevant

P@k = (number of relevant documents in the top k) / k
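
a minimal sketch of P@k, assuming the ranking is a list of document ids (best first) and the relevant documents are given as a set; the ids and the function name are made up:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranked[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k

ranked = ["d3", "d1", "d7", "d2", "d5"]   # algorithm output, best first
relevant = {"d1", "d2", "d9"}             # ground-truth relevant docs

print(precision_at_k(ranked, relevant, 3))  # 1 hit in top 3, about 0.333
print(precision_at_k(ranked, relevant, 5))  # 2 hits in top 5, 0.4
```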

8
Q

what is mean precision @k

A

you are not interested in the results of a single query but in all of them, so you need the average P@k

mean P@k = (sum of P@k over all queries) / (number of queries)
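
a short sketch of the average, assuming each query is given as a (ranked list, relevant set) pair; the data and names are illustrative only:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k

def mean_precision_at_k(runs, k):
    """Average P@k over a list of (ranked, relevant) pairs, one per query."""
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

runs = [
    (["a", "b", "c"], {"a", "c"}),  # P@2 = 1/2
    (["x", "y", "z"], {"z"}),       # P@2 = 0/2
]
print(mean_precision_at_k(runs, 2))  # (0.5 + 0.0) / 2 = 0.25
```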

9
Q

what is the mean reciprocal rank

A

measures how high, on average, the algorithm places the first relevant document that it returns

roughly: how often will you be happy with the first result

10
Q

what is the formula for mean reciprocal rank

A

RR = 1 / (rank of the first relevant document in the ranked list)

MRR = (sum of RR over all queries) / (number of queries)
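
a rough sketch of both formulas, assuming ranks start at 1 and that a query with no relevant document in its list contributes RR = 0 (that convention is an assumption, the card does not cover this case):

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, ranks starting at 1."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1 / rank
    return 0.0  # assumption: nothing relevant was returned

def mean_reciprocal_rank(runs):
    """Average RR over a list of (ranked, relevant) pairs, one per query."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)

runs = [
    (["a", "b", "c"], {"b"}),  # first relevant at rank 2, RR = 0.5
    (["x", "y", "z"], {"x"}),  # first relevant at rank 1, RR = 1.0
    (["p", "q", "r"], {"s"}),  # nothing relevant returned, RR = 0.0
]
print(mean_reciprocal_rank(runs))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```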
