IR – Introduction, Evaluation Flashcards

Lecture 3

1
Q

What is Information Retrieval?

A

IR is the task of finding the n most relevant documents in a collection of documents for a given query.

2
Q

What is Inverted Index?

A

It stores all statistics PER TERM:
- document frequency: how many documents contain the term
- term frequency per document: how often the term appears in each document
- document length
- average document length

3
Q

What is a posting list?

A

It is the structure in which the inverted index is stored. For each term in the dictionary there is a posting list of docId:num pairs, where docId is the id of a document and num is the number of times the term appears in that document.
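The two cards above can be sketched in a few lines of Python (the toy corpus and variable names are my own, not from the lecture):

```python
from collections import Counter

# Toy corpus; ids and texts are invented for illustration.
docs = {
    1: "the cat sat on the mat",
    2: "the dog sat",
    3: "cat and dog",
}

# Inverted index: for each term, a posting list of docId -> term frequency.
index = {}
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index.setdefault(term, {})[doc_id] = tf

# Per-document length and average length, stored alongside the index.
doc_len = {doc_id: len(text.split()) for doc_id, text in docs.items()}
avg_doc_len = sum(doc_len.values()) / len(doc_len)

print(index["cat"])       # posting list of "cat": {1: 1, 3: 1}
print(len(index["the"]))  # document frequency of "the": 2
```

The posting list for a term gives both its document frequency (length of the list) and its per-document term frequencies (the values).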

4
Q

Explain Bag-of-words

A

It counts the number of occurrences for each term per document
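A minimal illustration in Python (the example sentence is my own):

```python
from collections import Counter

# Bag-of-words: word order is thrown away, only per-term counts remain.
bow = Counter("to be or not to be".split())
print(bow["to"], bow["be"], bow["not"])  # 2 2 1
```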

5
Q

Explain the logic behind Term Frequency and formula.

A

TF(t,d) = how often term t appears in document d

The more often a query term appears in a document, the more relevant the document is. Experiments show that using the logarithm is more effective than the raw count:

log(1 + TF(t,d))

6
Q

Explain Inverse Document Frequency

A

The main idea is that rare words are more informative than common words. If a document contains a very rare query word, it is highly likely that the document is relevant. That means we need a way to determine whether a word is rare. We do that by counting in how many documents the word appears:

DF(t) = in how many documents term t appears

IDF(t) = log(D / DF(t))

D is the total number of documents. The logarithm is used because it shows better effectiveness in practice.
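A quick numeric sketch (corpus size and document frequencies below are invented):

```python
import math

D = 1000                                        # total number of documents (assumed)
df = {"the": 950, "retrieval": 40, "okapi": 3}  # invented document frequencies

# IDF(t) = log(D / DF(t)): the rarer the term, the higher its weight.
idf = {t: math.log(D / df_t) for t, df_t in df.items()}
print(idf["okapi"] > idf["retrieval"] > idf["the"])  # True
```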

7
Q

What is TF-IDF?

A

Term frequency - inverse document frequency is a metric that shows how relevant a document is to a query:

TF_IDF(q, d) = sum over terms t in q of log(1 + TF(t,d)) * log(D / DF(t))

The relevance increases when a word appears more often in the document, and it increases when the word is rare (appears in few documents of the corpus).
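The formula on this card, sketched as a scoring function (the function name and toy numbers are my own):

```python
import math

def tf_idf_score(query_terms, doc_tf, df, num_docs):
    """Sum over query terms t of log(1 + TF(t,d)) * log(D / DF(t))."""
    score = 0.0
    for t in query_terms:
        # Terms absent from the document or the corpus contribute nothing.
        if doc_tf.get(t, 0) > 0 and df.get(t, 0) > 0:
            score += math.log(1 + doc_tf[t]) * math.log(num_docs / df[t])
    return score

# Toy example: one query term appearing twice in the document,
# present in 5 of 100 documents -> log(3) * log(20).
s = tf_idf_score(["cat"], {"cat": 2}, {"cat": 5}, 100)
print(round(s, 3))
```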

8
Q

What is BM25?

A

An improved version of TF-IDF.

It has two hyperparameters which have to be set by developers:
k1 - controls term frequency scaling
b - controls document length normalization (b=0 -> no length normalization, b=1 -> relative frequency, fully scaled by document length)

TF-IDF keeps increasing with term frequency, while BM25 saturates much sooner (after a word appears roughly 12-15 times, its relevance contribution should no longer increase).
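A sketch of one common BM25 variant (the exact IDF smoothing differs between formulations; this is the Okapi-style one, not necessarily the lecture's):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs,
               k1=1.2, b=0.75):
    # k1 scales term frequency, b scales document length normalization.
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or df.get(t, 0) == 0:
            continue
        idf = math.log(1 + (num_docs - df[t] + 0.5) / (df[t] + 0.5))
        # The TF part is bounded above by (k1 + 1): this is the saturation.
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score

# Going from 1 to 100 occurrences barely moves the score.
s1 = bm25_score(["cat"], {"cat": 1}, 100, 100, {"cat": 5}, 1000)
s100 = bm25_score(["cat"], {"cat": 100}, 100, 100, {"cat": 5}, 1000)
print(s1 < s100)  # True, but s100 stays below the (k1 + 1) cap
```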

9
Q

Precision vs Recall

A

Precision: how many of the selected items are relevant?
TP / (TP + FP)

Recall: how many of the total relevant items are selected?
TP / (TP + FN)

Recall can be made very high by increasing the number of retrieved documents (if we retrieve all documents, we get a recall of 1 because every relevant document is retrieved).

Conversely, if we retrieve only 1-2 documents that we are 99.999% sure are relevant, the precision will be high, but if there are 200 other relevant documents, the recall will be low.
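The trade-off can be checked with a few lines (set-based sketch, document ids invented):

```python
def precision_recall(retrieved, relevant):
    # retrieved, relevant: sets of document ids
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {1, 2, 3}
print(precision_recall({1}, relevant))              # one confident pick: P high, R low
print(precision_recall({1, 2, 3, 4, 5}, relevant))  # retrieve everything: R = 1.0
```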

10
Q

What is F1 score?

A

It is a combination of recall and precision (their harmonic mean):

F1 = 2 * (P * R) / (P + R)

It is high only when both precision and recall are high, so it penalizes a large gap between them.
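As code (the guard against division by zero is my addition):

```python
def f1(p, r):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(f1(0.5, 0.5))  # 0.5
print(f1(1.0, 0.1))  # dragged down toward the weaker of the two
```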

11
Q

What is MRR?

A

Mean Reciprocal Rank is a binary evaluation metric that only looks at the first relevant document; it doesn't care about the rest of the retrieved documents.

(T F T F F) -> RR = 1
(F F T F F) -> RR = 1/3
(F T T F F) -> RR = 1/2

MRR = (1 + 1/3 + 1/2) / 3
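The three rankings on this card, reproduced in code (True = relevant):

```python
def reciprocal_rank(ranking):
    # ranking: relevance flags in retrieval order; only the first hit counts.
    for rank, rel in enumerate(ranking, start=1):
        if rel:
            return 1 / rank
    return 0.0

rankings = [
    [True, False, True, False, False],   # RR = 1
    [False, False, True, False, False],  # RR = 1/3
    [False, True, True, False, False],   # RR = 1/2
]
mrr = sum(reciprocal_rank(r) for r in rankings) / len(rankings)
print(mrr)  # (1 + 1/3 + 1/2) / 3
```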

12
Q

What is MAP?

A

Mean Average Precision is a binary evaluation metric that looks at all retrieved relevant documents. Its absolute values are difficult to interpret.

Assume there are a total of 2 relevant documents:
(T F T F F) -> AP = (1 + 2/3) / 2
(F F T F F) -> AP = (1/3 + 0) / 2
(F T T F F) -> AP = (1/2 + 2/3) / 2

MAP = the mean of the APs over all queries.
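The same examples in code; note that precision at each hit counts all relevant documents seen up to that rank:

```python
def average_precision(ranking, num_relevant):
    # ranking: relevance flags in retrieval order;
    # num_relevant: total relevant documents for the query.
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_relevant

ap1 = average_precision([True, False, True, False, False], 2)   # (1 + 2/3) / 2
ap2 = average_precision([False, False, True, False, False], 2)  # (1/3 + 0) / 2
ap3 = average_precision([False, True, True, False, False], 2)   # (1/2 + 2/3) / 2
map_score = (ap1 + ap2 + ap3) / 3
print(ap1, ap2, ap3)
```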

13
Q

What is nDCG?

A

Problem with binary evaluation metrics: some documents might be more relevant than others. Labeling then has to be done on a graded scale, e.g. 0 = irrelevant, 3 = perfectly relevant.

Assume two relevant documents exist, with relevance 3 and 1.
The ideal ranking retrieves the 3 first and then the 1:
ideal DCG = 3 / log2(1 + 1) + 1 / log2(2 + 1) = 3.63

(1 0 3 0 0) -> nDCG = (1 / log2(1 + 1) + 3 / log2(3 + 1)) / idealDCG ≈ 0.69

(0 0 1 0 0) -> nDCG = (1 / log2(3 + 1)) / idealDCG ≈ 0.14

(0 3 1 0 0) -> nDCG = (3 / log2(2 + 1) + 1 / log2(3 + 1)) / idealDCG ≈ 0.66
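The worked example, checked in code (using the DCG form rel / log2(rank + 1) from the card):

```python
import math

def dcg(rels):
    # rels: graded relevance values in retrieval order.
    return sum(r / math.log2(rank + 1) for rank, r in enumerate(rels, start=1))

ideal = dcg([3, 1])  # best possible ordering of the two relevant docs
print(round(ideal, 2))                         # 3.63
print(round(dcg([1, 0, 3, 0, 0]) / ideal, 2))  # 0.69
print(round(dcg([0, 0, 1, 0, 0]) / ideal, 2))  # 0.14
print(round(dcg([0, 3, 1, 0, 0]) / ideal, 2))  # 0.66
```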
