4. Information Retrieval & Relational Databases Flashcards
What is Information Retrieval?
Finding documents that are about a given topic
What is a corpus?
An organised repository, or collection, of data
What is relevance?
Whether a retrieved document is actually about the requested topic
What is precision?
The probability that a document is relevant given that it is retrieved
i.e. number of relevant documents retrieved / total number of documents retrieved
What is recall?
The probability that a document is retrieved given that it is relevant
i.e. number of relevant documents retrieved / total number of relevant documents
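Worked example: the two ratios above can be sketched in Python. The function name precision_recall and the document IDs are hypothetical, chosen only for illustration; returning 0.0 for an empty set is an assumption, not something the definitions specify.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the documents the system returned
    relevant:  IDs of the documents that are actually on topic
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents retrieved
    # Assumption: treat an empty denominator as a score of 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 relevant overall, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)  # precision = 2/4, recall = 2/3
```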
How would we compare IR algorithms? (2)
- Average precision, which attempts to combine the trade-off between precision and recall into a single value
- Effectiveness measure, which combines precision and recall into a single value
What is Average Precision? (2)
- Recognises that precision varies with recall, and expresses that variation as a curve of precision vs. recall
- Attempts to summarize the curve as a single value for comparison
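Worked example: the cards do not say how the curve is summarized, so the sketch below assumes one common definition: take the precision at each rank where a relevant document appears, then divide the sum by the total number of relevant documents. The function name average_precision and the document IDs are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision for one ranked result list.

    Assumed definition: sum of precision-at-rank over the ranks that
    hold a relevant document, divided by the number of relevant documents.
    """
    relevant = set(relevant)
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


# Relevant documents appear at ranks 1 and 3; d5 is never retrieved.
ap = average_precision(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(ap)  # (1/1 + 2/3) / 3 = 5/9
```

Relevant documents that are never retrieved contribute nothing to the sum but still count in the denominator, so missing them lowers the score.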
What is the Effectiveness Measure formula?
E = 1 - 1 / (a(1/P) + (1-a)(1/R))
a: alpha, a weight between 0 and 1; 0 = precision isn’t important, 1 = recall isn’t important
P: Precision
R: Recall
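Worked example: a minimal Python sketch of the effectiveness measure, reading the whole weighted sum as the denominator. The function name e_measure is hypothetical, and returning 1.0 (the worst score) when a weighted term is zero is an assumption made to avoid division by zero. Lower values are better: 0 means perfect precision and recall.

```python
def e_measure(precision, recall, alpha=0.5):
    """E = 1 - 1 / (alpha * (1/P) + (1 - alpha) * (1/R))."""
    denom = 0.0
    for weight, value in ((alpha, precision), (1 - alpha, recall)):
        if weight == 0:
            continue  # this term is ignored entirely
        if value == 0:
            return 1.0  # assumption: a zero score on a weighted term is worst case
        denom += weight / value
    return 1 - 1 / denom


print(e_measure(0.5, 0.5))             # equal weighting of P and R
print(e_measure(0.5, 0.8, alpha=0.0))  # precision ignored: E = 1 - R
print(e_measure(0.5, 0.8, alpha=1.0))  # recall ignored:    E = 1 - P
```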