Evaluation of IR Systems Flashcards
How can you tell qualitatively if users are happy with your system?
- Search returns relevant results
- Search results get clicked a lot
- Users buy something after using the search
- You get repeat visitors
How is relevance assessed?
Relative to the user's information need, not the query provided
What are some reasons we evaluate our systems?
- To assess the actual utility of the retrieval system for users
- To compare different systems and methods
What should be measured in an information retrieval system?
- Effectiveness/accuracy: how relevant are the search results
- Efficiency: How quickly can a user get results? How many resources are needed to answer the query?
- Usability: How useful is the system for real user tasks?
What are precision and recall?
Measures for assessing IR performance by looking at accuracy
Precision = TP/ (TP + FP)
Recall = TP / (TP + FN)
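A minimal sketch of these two formulas in Python, treating the retrieved and relevant documents as sets of IDs; the document IDs used here are made-up examples.

```python
# Precision and recall over sets of document IDs.

def precision(retrieved: set, relevant: set) -> float:
    # TP = retrieved docs that are relevant; FP = retrieved docs that are not.
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved: set, relevant: set) -> float:
    # TP = retrieved docs that are relevant; FN = relevant docs not retrieved.
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5"}
print(precision(retrieved, relevant))  # 2/4 = 0.5
print(recall(retrieved, relevant))     # 2/3 ≈ 0.667
```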
What is the precision/recall tradeoff?
High recall tends to come at the cost of low precision.
Increasing the number of documents retrieved can only keep recall the same or raise it, so retrieving every document gives 100% recall but very poor precision.
Conversely, it is easy to get high precision with low recall by returning only a few documents that are very likely to be relevant.
What is the F-measure and what is the equation for the F-1 score?
Allows us to trade off precision and recall with a single measure
F1 = 2PR / (P + R)
General F-measure: F = ((β^2 + 1)PR) / (β^2 P + R), where β = 1 gives the F1 score
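A minimal sketch of the general F-measure; the precision, recall, and β values below are example numbers, not figures from the text.

```python
# F-measure: F = ((beta^2 + 1) * P * R) / (beta^2 * P + R); beta = 1 gives F1.

def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    if p == 0 and r == 0:
        return 0.0
    return ((beta**2 + 1) * p * r) / (beta**2 * p + r)

print(f_measure(0.5, 0.667))           # F1 ≈ 0.571
print(f_measure(0.5, 0.667, beta=2.0))  # beta > 1 weights recall more heavily
```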
Why are precision and recall metrics often meaningless and what can we do instead?
Meaningless because the metrics are computed over an unordered set of results and ignore the context of the system's use case.
Instead, it is more informative to compare how each system ranks the documents.
Ranking-based evaluation considers both the relevance of the retrieved documents and the order in which they are retrieved.
What is average precision and how do we calculate it?
It is the standard measure for comparing two ranking methods for a single query.
Sum the precision at each rank where a relevant document is retrieved, then divide by the total number of relevant documents for the query
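A minimal sketch of that calculation; the ranking and relevant set are hypothetical examples.

```python
# Average precision for one ranked result list and one query.

def average_precision(ranking: list, relevant: set) -> float:
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0

ranking = ["d3", "d7", "d1", "d9", "d5"]
relevant = {"d1", "d3", "d5"}
# Relevant docs appear at ranks 1, 3, 5 -> (1/1 + 2/3 + 3/5) / 3 ≈ 0.756
print(average_precision(ranking, relevant))
```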
What is mean average precision (MAP) and how do we calculate it?
Mean of average precision over a set of queries
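A minimal sketch of MAP, reusing the average_precision function from the previous card; the query IDs and result lists are invented examples.

```python
# MAP: average the per-query average precision over a set of queries.

def mean_average_precision(runs: dict) -> float:
    scores = [average_precision(ranking, relevant)
              for ranking, relevant in runs.values()]
    return sum(scores) / len(scores) if scores else 0.0

runs = {
    "q1": (["d3", "d7", "d1"], {"d1", "d3"}),   # AP ≈ 0.833
    "q2": (["d2", "d4", "d6"], {"d6"}),         # AP ≈ 0.333
}
print(mean_average_precision(runs))  # ≈ 0.583
```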
What is discounted cumulative gain?
A method for evaluating information retrieval when there are multiple levels of relevancy. Gain measures how much relevant information a user can gain by looking at each document
What are the 2 assumptions behind discounted cumulative gain?
- Highly relevant documents are more useful than marginally relevant documents
- The lower the ranked position of a relevant document, the less useful it is for the user since it is less likely to be examined
How do we calculate cumulative gain?
The sum of the relevance scores of the retrieved documents, where a higher score means the document is more relevant
How do we calculate discounted cumulative gain?
Discount each relevance score by a factor that depends on its rank.
The typical discount is 1/log2(rank), with no discount applied at rank 1.
DCG = r1 + r2/log2(2) + r3/log2(3) + … + rn/log2(n)
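A minimal sketch of that formula; the graded relevance scores below are an invented example list, given in ranked order.

```python
# DCG with the 1/log2(rank) discount; rank 1 is not discounted.
import math

def dcg(rels: list) -> float:
    return sum(r if rank == 1 else r / math.log2(rank)
               for rank, r in enumerate(rels, start=1))

rels = [3, 2, 3, 0, 1]
# DCG = 3 + 2/log2(2) + 3/log2(3) + 0/log2(4) + 1/log2(5) ≈ 7.32
print(dcg(rels))
```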
What is the ideal discounted cumulative gain?
The DCG of the best possible ranking of the documents, i.e. the documents ordered so that the highest relevance scores appear at the top of the list
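A minimal sketch of the ideal DCG, reusing the dcg function from the previous card: it simply re-scores the same relevance values sorted from highest to lowest.

```python
# Ideal DCG: DCG of the relevance scores in descending order.

def idcg(rels: list) -> float:
    return dcg(sorted(rels, reverse=True))

rels = [3, 2, 3, 0, 1]
# Ideal ordering is [3, 3, 2, 1, 0] -> DCG ≈ 7.76
print(idcg(rels))
```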