Evaluation of IR Systems Flashcards

1
Q

How can you tell qualitatively if users are happy with your system?

A
  1. Search returns relevant results
  2. Search results get clicked a lot
  3. Users buy something after using the search
  4. You get repeat visitors
2
Q

How is relevance assessed?

A

Relative to the user's information need, not the query that was provided

3
Q

What are some reasons we evaluate our systems?

A
  1. To assess the actual utility of the retrieval system for users
  2. To compare different systems and methods
4
Q

What should be measured in an information retrieval system?

A
  1. Effectiveness/accuracy: How relevant are the search results?
  2. Efficiency: How quickly can a user get results? How many resources are needed to answer a query?
  3. Usability: How useful is the system for real user tasks?
5
Q

What is precision and recall?

A

Measures for assessing IR performance in terms of retrieval accuracy.
Precision = TP / (TP + FP): the fraction of retrieved documents that are relevant
Recall = TP / (TP + FN): the fraction of relevant documents that are retrieved
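
A minimal sketch of these formulas in Python; the document IDs and set-based bookkeeping are illustrative assumptions rather than anything from the card.

```python
# Minimal sketch: precision and recall from sets of document IDs.
# The sets `retrieved` and `relevant` are assumed toy data.

def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    tp = len(retrieved & relevant)   # relevant documents that were retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but missed
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# 3 of the 4 retrieved documents are relevant; 5 relevant documents exist.
print(precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d3", "d5", "d6"}))
# -> (0.75, 0.6)
```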

6
Q

What is the precision/recall tradeoff?

A

High recall tends to be associated with low precision.
Increasing the number of documents retrieved can only keep recall the same or increase it, so retrieving every document in the collection gives 100% recall with very poor precision.
Conversely, it is easy to get high precision with low recall, e.g. by returning only a single clearly relevant document
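
A small illustration of the tradeoff, assuming a toy ranking of binary relevance labels (not from the card): recall can only stay flat or grow as more documents are retrieved, while precision tends to fall.

```python
# Assumed toy ranking of binary relevance labels (1 = relevant, 0 = not),
# with 4 relevant documents in the whole collection.
ranking = [1, 1, 0, 1, 0, 0, 0, 1]
total_relevant = 4

hits = 0
for k, rel in enumerate(ranking, start=1):
    hits += rel
    print(f"k={k}: precision={hits / k:.2f}, recall={hits / total_relevant:.2f}")
# Recall never decreases as k grows; precision drops once
# non-relevant documents start appearing in the results.
```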

7
Q

What is the F-measure and what is the equation for the F-1 score?

A

A single measure that trades off precision (P) and recall (R)
F1 = 2PR / (P + R)

The general form, where β controls the relative weight on recall (β = 1 gives F1):
Fβ = ((β² + 1)PR) / (β²P + R)
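
A minimal sketch of both formulas, under the standard reading that β > 1 weights recall more heavily, β < 1 weights precision more heavily, and β = 1 reduces to F1.

```python
# Minimal sketch of the F-measure; beta = 1 gives the F1 score.

def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return ((b2 + 1) * precision * recall) / (b2 * precision + recall)

print(f_measure(0.75, 0.6))          # F1 ≈ 0.667
print(f_measure(0.75, 0.6, beta=2))  # weights recall more heavily
```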

8
Q

Why are precision and recall metrics often meaningless and what can we do instead?

A

They are often meaningless on their own because they ignore the context in which the system is used, in particular the order in which documents are returned.
It is more informative to compare the rankings produced by each system.
Ranking-based evaluation considers both the relevance of the retrieved documents and the order in which they are retrieved

9
Q

What is average precision and how do we calculate it?

A

The standard measure for comparing two ranking methods on a single query.
Sum the precision values at each rank where a relevant document is retrieved, then divide by the total number of relevant documents for the query
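
A minimal sketch of this calculation, assuming a ranked list of binary relevance labels as input.

```python
# Minimal sketch of average precision for a single query.
# `ranking` is an assumed list of binary relevance labels in retrieval order.

def average_precision(ranking: list[int], total_relevant: int) -> float:
    hits, precision_sum = 0, 0.0
    for k, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k   # precision at each relevant hit
    return precision_sum / total_relevant if total_relevant else 0.0

# Relevant documents found at ranks 1, 3 and 5; 4 relevant documents exist.
print(average_precision([1, 0, 1, 0, 1], total_relevant=4))
# (1/1 + 2/3 + 3/5) / 4 ≈ 0.567
```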

10
Q

What is mean average precision (MAP) and how do we calculate it?

A

Mean of average precision over a set of queries
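
A minimal sketch, reusing the average-precision calculation from the previous card; the per-query rankings and relevant-document counts are assumed toy data.

```python
# Minimal sketch of MAP: the mean of per-query average precision values.

def average_precision(ranking: list[int], total_relevant: int) -> float:
    hits, precision_sum = 0, 0.0
    for k, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k
    return precision_sum / total_relevant if total_relevant else 0.0

def mean_average_precision(queries: list[tuple[list[int], int]]) -> float:
    return sum(average_precision(r, n) for r, n in queries) / len(queries)

# Two assumed queries: (relevance labels in ranked order, total relevant docs).
queries = [([1, 0, 1], 2), ([0, 1, 1, 1], 3)]
print(mean_average_precision(queries))
```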

11
Q

What is discounted cumulative gain?

A

A method for evaluating information retrieval when there are multiple levels of relevance. Gain measures how much relevant information a user gains by looking at each document

12
Q

What are the 2 assumptions behind discounted cumulative gain?

A
  1. Highly relevant documents are more useful than marginally relevant documents
  2. The lower the ranked position of a relevant document, the less useful it is for the user since it is less likely to be examined
13
Q

How do we calculate cumulative gain?

A

The sum of the relevance scores of the retrieved documents, where higher scores mean greater relevance
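
A one-line sketch, assuming graded relevance scores (e.g. a 0–3 scale) for the retrieved documents.

```python
# Assumed relevance grades of the retrieved documents, in ranked order.
relevances = [3, 2, 3, 0, 1, 2]
cg = sum(relevances)   # cumulative gain ignores rank entirely
print(cg)              # -> 11
```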

14
Q

How do we calculate discounted cumulative gain?

A

Discount each document's relevance score by a factor that depends on its rank; the typical discount is 1/log2(rank), with no discount applied to the first result.

DCG = r1 + r2/log2(2) + r3/log2(3) + … + rn/log2(n)
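
A minimal sketch of this formula, assuming graded relevance scores in ranked order and a base-2 logarithm for the discount.

```python
import math

# Minimal sketch of DCG with the 1/log2(rank) discount;
# the first document's score is not discounted.

def dcg(relevances: list[float]) -> float:
    return sum(
        rel if i == 1 else rel / math.log2(i)
        for i, rel in enumerate(relevances, start=1)
    )

# Assumed relevance grades in ranked order (0-3 scale).
print(dcg([3, 2, 3, 0, 1, 2]))  # 3 + 2/log2(2) + 3/log2(3) + ...
```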

15
Q

What is the ideal discounted cumulative gain?

A

The DCG of the best possible ranking of the documents, i.e. the documents ordered so that the highest relevance scores appear at the top of the list

16
Q

How do we calculate the normalized discounted cumulative gain?

A

NDCG = DCG / IdealDCG, computed at the same rank position
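
A minimal sketch combining the last few cards, assuming graded relevance scores for the retrieved documents; the ideal DCG is taken as the DCG of the same grades sorted in descending order.

```python
import math

# Minimal sketch of NDCG at a cutoff k: the ranking's DCG divided by
# the DCG of the ideal (descending-relevance) ordering of the same grades.

def dcg(relevances: list[float]) -> float:
    return sum(rel if i == 1 else rel / math.log2(i)
               for i, rel in enumerate(relevances, start=1))

def ndcg(relevances: list[float], k: int) -> float:
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal else 0.0

# Assumed relevance grades in ranked order (0-3 scale).
print(ndcg([3, 2, 3, 0, 1, 2], k=4))
```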

17
Q

Why do we use normalized discounted cumulative gain?

A
  1. To measure the total utility of the top k documents to a user
  2. To discount the utility of lowly ranked documents
  3. To ensure comparability across queries with different numbers of relevant documents
18
Q

What is the issue with human judgements of the relevance of information retrieved?

A

Human judgements are expensive, inconsistent between raters and over time, and are not always representative of real users of a system

19
Q

What is the process for pooling to avoid judging all documents in a collection?

A
  1. Choose a diverse set of ranking methods
  2. Have each return the top k documents
  3. Combine all top k to form a pool for human assessors to judge
  4. All other documents are usually assumed to be non-relevant
  5. This is fine for comparing the systems that contributed to the pool, but problematic for evaluating new systems
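
A minimal sketch of steps 1–3, assuming each system's output is a ranked list of document IDs; the pooled documents are what human assessors would judge.

```python
# Minimal sketch of pooling: combine the top-k results of several systems.
# The rankings are assumed toy data.

def build_pool(rankings: list[list[str]], k: int) -> set[str]:
    pool: set[str] = set()
    for ranking in rankings:
        pool.update(ranking[:k])   # top-k documents from each system
    return pool

system_a = ["d1", "d2", "d3", "d4"]
system_b = ["d3", "d5", "d1", "d6"]
print(build_pool([system_a, system_b], k=3))
# Documents outside the pool are assumed non-relevant.
```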