w2 L2 information retrieval Flashcards

Question 1

Q

what is term frequency and why is it important

Answer

A

the more often a key term appears in a doc, the more relevant that doc is likely to be

TF(term) = 1_term_in_doc * number_of_occurrences

1_term_in_doc is an indicator function: 1 if the term appears in the doc, 0 otherwise
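
a minimal sketch of the raw-count TF above. it assumes lowercasing and whitespace tokenisation (neither is specified on the card), and `term_frequency` is a hypothetical helper name:

```python
from collections import Counter

def term_frequency(term, doc):
    """Raw term frequency: number of times `term` occurs in `doc`."""
    # assumption: tokens are lowercased, whitespace-separated words
    counts = Counter(doc.lower().split())
    # a Counter returns 0 for a missing term, which plays the role
    # of the indicator function being 0
    return counts[term.lower()]
```

for example, `term_frequency("cat", "the cat sat on the cat mat")` counts two occurrences, and a term that is absent from the doc gets 0.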

Question 2

Q

if all of your documents contain relevant terms, how do you find the documents that are actually relevant

Answer

A

down-weight terms that are too frequent across the collection and up-weight the rarer terms

if every document contains a relevant word A, that word behaves like a stop word and no longer separates documents, so the rarer relevant words need to be prioritized

Question 3

Q

what is inverse document frequency and why do we need it

Answer

A

if a word is super common we need to weigh it less, and vice versa, so we use the inverse of its document frequency

IDF of a term = N/(document frequency of term)

N = total number of documents in the collection

Question 4

Q

how to calculate idf

Answer

A

IDF(term) = log(N / (df(term) + 1))

N is the total number of documents

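df(term) is the document frequency: the number of documents that contain the term. the + 1 is for smoothing, so a term that appears in no document does not cause a division by zero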
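
a minimal sketch of the smoothed IDF formula on this card. assumptions: natural log (the card does not give a base), whitespace tokenisation, and df counted as the number of documents containing the term. `inverse_document_frequency` is a hypothetical name:

```python
import math

def inverse_document_frequency(term, docs):
    """Smoothed IDF: log(N / (df + 1)) over a list of document strings."""
    n = len(docs)  # N: total number of documents
    term = term.lower()
    # df: number of documents that contain the term at least once
    df = sum(1 for doc in docs if term in doc.lower().split())
    # the + 1 keeps the denominator non-zero for unseen terms
    return math.log(n / (df + 1))
```

note that with this particular smoothing a term found in every document gets a slightly negative value, since N / (N + 1) is below 1.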

Question 5

Q

how to calculate tf-idf

Answer

A

tf-idf = TF(term) * IDF(term)
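
a minimal sketch that puts the two factors together, assuming raw counts for TF, the smoothed log(N / (df + 1)) form for IDF, natural log, and whitespace tokenisation. `tf_idf` is a hypothetical helper, not a library function:

```python
import math
from collections import Counter

def tf_idf(term, doc, docs):
    """tf-idf of `term` in `doc`, with `docs` as the whole collection."""
    term = term.lower()
    # TF: raw count of the term in this one document
    tf = Counter(doc.lower().split())[term]
    # df: number of documents in the collection containing the term
    df = sum(1 for d in docs if term in d.lower().split())
    # IDF with + 1 smoothing in the denominator
    idf = math.log(len(docs) / (df + 1))
    return tf * idf
```

in a collection where "the" is in every document and "cat" is in only some, "cat" scores higher than "the" within the same document, which is the down-weighting effect the cards describe.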

Question 6

Q

how do we measure the success of a retrieval algorithm

Answer

A

by whether it shows the most relevant results first

precision@k

Question 7

Q

what is precision at k

Answer

A

you can rank the documents by similarity to your query and cut off this list at a certain point

let's call this point k

if you have access to the list of all documents relevant to the query, you can measure how many relevant documents are in the top-k documents returned by the algorithm

P@k = (number of relevant documents in the top k) / k
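
a minimal sketch of precision@k, assuming the usual definition (relevant documents in the top k, divided by k). the document ids and the name `precision_at_k` are made up for illustration:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    # cut the ranked list off at position k
    top_k = ranked_ids[:k]
    # count how many of those k documents are in the relevant set
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # assumption: k is a positive integer
    return hits / k

ranked = ["d3", "d1", "d7", "d2"]   # algorithm output, best first
relevant = {"d1", "d2"}             # ground-truth relevant documents
p_at_4 = precision_at_k(ranked, relevant, k=4)  # 2 hits out of 4 -> 0.5
```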

Question 8

Q

what is mean precision @k

Answer

A

you are not interested in the results of a single query but in all of them, so you need the average P@k

mean P@k = (sum of P@k over all queries) / number of queries
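
a minimal sketch of the average over queries. it assumes each query comes as a (ranked list, relevant set) pair and that P@k divides by k. `mean_precision_at_k` and the data layout are hypothetical:

```python
def mean_precision_at_k(runs, k):
    """Average P@k over queries.

    runs: list of (ranked_ids, relevant_ids) pairs, one pair per query.
    """
    scores = []
    for ranked_ids, relevant_ids in runs:
        # P@k for this query: relevant hits in the top k, divided by k
        hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
        scores.append(hits / k)
    # sum of P@k / number of queries
    return sum(scores) / len(scores)

runs = [
    (["a", "b", "c"], {"a", "c"}),  # query 1: P@2 = 1/2
    (["x", "y", "z"], {"q"}),       # query 2: P@2 = 0
]
mean_p_at_2 = mean_precision_at_k(runs, k=2)  # (0.5 + 0.0) / 2 -> 0.25
```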

Question 9

Q

what is the mean reciprocal rank

Answer

A

measures how high, on average, the algorithm places the first relevant document that it returns

how often will you be happy with the first result

Question 10

Q

what is the formula for mean reciprocal rank

Answer

A

RR = 1 / (rank of the first relevant document in the ranked list)

MRR = (sum of RR over all queries) / number of queries
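
a minimal sketch of both formulas. ranks start at 1, and a query with no relevant document in its list is scored 0, which is a common convention but an assumption here. the function names and data layout are hypothetical:

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document (ranks start at 1)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1 / rank
    # assumption: no relevant document returned counts as 0
    return 0.0

def mean_reciprocal_rank(runs):
    """Average RR over queries; runs is a list of (ranked_ids, relevant_ids)."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

for example, if the first relevant document is at rank 1 for one query and rank 2 for another, the RR values are 1 and 1/2, so MRR is 0.75.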


(10 cards)