IR – Introduction, Evaluation Flashcards
Lecture 3
What is Information Retrieval?
IR is the task of finding the n most relevant documents in a collection of documents for a given query.
What is Inverted Index?
It stores all statistics PER TERM:
- document frequency: how many documents contain the term
- term frequency per document: how often does the term appear in each document?
- document length
- average document length
What is a posting list?
It is the structure in which the inverted index is stored. For each term in the dictionary there is a posting list of docId:num pairs, where docId identifies a document and num is the number of times the term appears in that document.
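A minimal sketch of how such an index could be built, assuming documents are already tokenized into terms (the structure and names here are illustrative, not from the lecture):

```python
from collections import Counter

docs = {
    1: "the cat sat on the mat".split(),
    2: "the dog chased the cat".split(),
}

index = {}  # term -> posting list of (docId, term frequency) pairs
for doc_id, terms in docs.items():
    for term, tf in Counter(terms).items():
        index.setdefault(term, []).append((doc_id, tf))

print(index["the"])  # [(1, 2), (2, 2)]
print(index["cat"])  # [(1, 1), (2, 1)]
```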
Explain Bag-of-words
A document is represented only by the number of occurrences of each term; word order is ignored.
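A quick sketch of the idea using Python's Counter (the choice of Counter is an assumption of this note, not prescribed by the lecture):

```python
from collections import Counter

doc = "to be or not to be".split()
bow = Counter(doc)  # word order is gone, only counts remain
print(bow)          # Counter({'to': 2, 'be': 2, 'or': 1, 'not': 1})
```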
Explain the logic behind Term Frequency and formula.
TF(t,d) = how often term t appears in document d
The more often a query term appears in a document, the more relevant the document is likely to be. Experiments show that using the logarithm is more effective than using the raw count:
log(1 + TF(t,d))
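A tiny worked example of the log scaling; the lecture does not fix the base of the logarithm, so the natural log is assumed here:

```python
import math

for tf in (1, 10, 100):
    # the raw count grows 100x, the log-scaled weight grows much more slowly
    print(tf, round(math.log(1 + tf), 2))
# prints: 1 0.69, then 10 2.4, then 100 4.62
```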
Explain Inverse Document Frequency
The main idea is that rare words are more informative than common words. If a document contains a very rare query word, it is highly likely that the document is relevant. That means we need a way to tell whether a word is rare or not. We do that by counting in how many documents the word appears:
DF(t) = in how many documents term t appears
IDF(t) = log(D / DF(t))
D is the total number of documents. The logarithm is used because it works better in practice.
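A small worked example, assuming a hypothetical corpus of D = 1000 documents:

```python
import math

D = 1000                   # hypothetical corpus size
for df in (1000, 100, 1):  # very common term ... very rare term
    print(df, round(math.log(D / df), 2))
# prints: 1000 0.0, then 100 2.3, then 1 6.91
```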
What is TF-IDF?
Term frequency - inverse document frequency is a score that shows how relevant a document is to a given query:
TF-IDF(q, d) = sum over terms t in q of log(1 + TF(t,d)) * log(D / DF(t))
The score increases the more often a term appears in the document, and it increases if the term is rare (it rarely appears in the corpus of documents).
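A minimal sketch of this scoring formula; the tf and df lookup tables are hypothetical helpers, not structures defined in the lecture:

```python
import math

def tf_idf(query_terms, doc_id, tf, df, D):
    """tf: (term, docId) -> count, df: term -> document frequency, D: total docs."""
    score = 0.0
    for t in query_terms:
        if df.get(t, 0) == 0:
            continue  # term never appears in the corpus, contributes nothing
        score += math.log(1 + tf.get((t, doc_id), 0)) * math.log(D / df[t])
    return score
```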
What is BM25?
An improved version of TF-IDF.
It has two hyperparameters that have to be set by the developer:
k1 - controls term frequency scaling
b - controls document length normalization (b=0 -> no length normalization, b=1 -> relative frequency, fully scaled by document length)
TF-IDF keeps increasing as the term frequency grows, while BM25 saturates much sooner (after a word appears roughly 12-15 times, its contribution to the relevance score should no longer increase).
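A sketch of one common BM25 variant, reusing the hypothetical tf/df tables from the sketch above and the frequently used defaults k1 = 1.2 and b = 0.75 (note that several IDF formulations of BM25 exist):

```python
import math

def bm25(query_terms, doc_id, tf, df, doc_len, avg_len, D, k1=1.2, b=0.75):
    score = 0.0
    for t in query_terms:
        if df.get(t, 0) == 0:
            continue
        f = tf.get((t, doc_id), 0)
        idf = math.log(D / df[t])
        # length normalization: b=0 ignores document length, b=1 fully scales by it
        norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)
        # f / (f + norm) saturates, so repeated occurrences add less and less
        score += idf * f * (k1 + 1) / (f + norm)
    return score
```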
Precision vs Recall
Precision: How many of the selected items are relevant? TP / (TP + FP)
Recall: How many of the total relevant items are selected? TP / (TP + FN)
Recall can be made very high simply by retrieving more documents: if we retrieve all possible documents, we get a recall of 1, because all relevant documents will be retrieved.
Conversely, if we retrieve only 1-2 documents that we are 99.999% sure are relevant, precision will be high, but if there are 200 other relevant documents, recall will be low.
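A small sketch computing both metrics from sets of document ids (the numbers are made up for illustration):

```python
retrieved = {1, 2, 5, 7}         # document ids returned by the system
relevant = {2, 3, 5}             # document ids labeled as relevant

tp = len(retrieved & relevant)   # true positives: docs 2 and 5
precision = tp / len(retrieved)  # 2 / 4 = 0.5
recall = tp / len(relevant)      # 2 / 3 ≈ 0.67
```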
What is F1 score?
It is the harmonic mean of precision and recall:
F1 = 2 * (P * R) / (P + R)
It balances the trade-off between the two: a high F1 requires both precision and recall to be high.
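Continuing the made-up precision/recall numbers from the sketch above:

```python
p, r = 0.5, 2 / 3
f1 = 2 * (p * r) / (p + r)
print(round(f1, 2))  # 0.57 - the harmonic mean is pulled toward the lower value
```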
What is MRR?
Mean Reciprocal Rank is a binary evaluation metric that only looks at the rank of the first relevant document and ignores the rest of the retrieved documents.
(T F T F F) -> RR = 1
(F F T F F) -> RR = 1/3
(F T T F F) -> RR = 1/2
MRR = (1 + 1/3 + 1/2) / 3
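A sketch that reproduces this worked example (True = relevant at that rank):

```python
def reciprocal_rank(ranking):
    for rank, rel in enumerate(ranking, start=1):
        if rel:
            return 1 / rank  # only the first relevant document counts
    return 0.0

rankings = [
    [True, False, True, False, False],   # RR = 1
    [False, False, True, False, False],  # RR = 1/3
    [False, True, True, False, False],   # RR = 1/2
]
mrr = sum(reciprocal_rank(r) for r in rankings) / len(rankings)
print(round(mrr, 3))  # 0.611
```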
What is MAP?
Mean Average Precision is a binary evaluation metric that looks at all retrieved relevant documents. Its value is harder to interpret than precision or recall.
Assume there are a total of 2 relevant documents:
(T F T F F) -> AP = (1 + 2/3) / 2
(F F T F F) -> AP = (1/3 + 0) / 2
(F T T F F) -> AP = (1/2 + 2/3) / 2
MAP = the mean of the three AP values
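A sketch that reproduces this worked example, with 2 relevant documents in total:

```python
def average_precision(ranking, total_relevant):
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank   # precision at each relevant hit
    return precision_sum / total_relevant  # missed relevant docs contribute 0

rankings = [
    [True, False, True, False, False],   # AP = (1 + 2/3) / 2
    [False, False, True, False, False],  # AP = (1/3 + 0) / 2
    [False, True, True, False, False],   # AP = (1/2 + 2/3) / 2
]
map_score = sum(average_precision(r, 2) for r in rankings) / len(rankings)
```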
What is nDCG?
Problem with binary evaluation metrics: some documents might be more relevant than others. Labeling then has to be done on a graded scale, e.g. 0 = irrelevant up to 3 = perfectly relevant.
Assume two relevant documents exist, with relevance 3 and 1.
The ideal ranking retrieves the relevance-3 document first and the relevance-1 document second:
Ideal DCG = 3 / log2(1 + 1) + 1 / log2(2 + 1) = 3.63
(1 0 3 0 0) -> nDCG = (1/log2(1 + 1) + 3/log2(3 + 1)) / idealDCG = 0.69
(0 0 1 0 0) -> nDCG = (1/log2(3 + 1)) / idealDCG = 0.14
(0 3 1 0 0) -> nDCG = (3/log2(2 + 1) + 1/log2(3 + 1)) / idealDCG = 0.66
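A sketch that reproduces this worked example from the graded relevance values:

```python
import math

def dcg(gains):
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

ideal_dcg = dcg([3, 1])  # ≈ 3.63
for gains in ([1, 0, 3, 0, 0], [0, 0, 1, 0, 0], [0, 3, 1, 0, 0]):
    print(round(dcg(gains) / ideal_dcg, 2))
# prints 0.69, 0.14, 0.66
```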