02 Introduction to retrieval systems Flashcards
what is information retrieval
the science of search engines
- effectively get the right information to the user
- efficiently get information to the user
- relevance: whether the query and the document are about the same topic
general definition of IR
retrieval of relevant information from data sources which were not originally intended for access
documents vs database records
database records are stored as tuples; the question is how to match a query against textual records
retrieving:
- structured data (DB)
- free text (IR)
queries
- formally defined (DB)
- vague, imprecise (IR)
results
- exact, always correct (DB)
- sometimes relevant (IR)
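
The DB vs IR contrast above can be sketched in Python. This is only an illustration under stated assumptions: the records, the documents, and the helper names `db_lookup` and `ir_search` are made up for the example, and simple term overlap stands in for a real relevance model.

```python
# Hypothetical data: structured records (DB) vs free-text documents (IR).
records = [
    {"id": 1, "city": "glasgow", "type": "hotel"},
    {"id": 2, "city": "edinburgh", "type": "hostel"},
]
documents = {
    "d1": "cheap hotel rooms in glasgow city centre",
    "d2": "a guide to hostels in edinburgh",
    "d3": "glasgow weather this weekend",
}


def db_lookup(city, kind):
    """Formally defined query: a record either matches exactly or it does not."""
    return [r["id"] for r in records if r["city"] == city and r["type"] == kind]


def ir_search(query):
    """Vague query: rank documents by how many query terms they share."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


print(db_lookup("glasgow", "hotel"))
print(ir_search("accommodation in glasgow"))
```

The DB lookup returns only exact matches, while the IR search returns a ranked guess. In this toy data, `d2` is returned only because it shares the word "in" with the query, which shows why IR results are only sometimes relevant.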
imprecision in IR
most algorithms in computer science have ‘right’ answers
IR techniques are essentially heuristic, because we do not know the right answers
user classification
naive to expert
- professionals: direct term search, e.g. ‘accommodation in glasgow’
- general users: e.g. ‘how do i get to …’
- laymen: long texts, e.g. ‘doctor said i have… where do i get more information’
information pyramid
data
information = data in context
knowledge = basis for making decisions
wisdom
what makes a document relevant
- does it contain all the query terms
- does it contain them many times
- is it fresh
- is it authoritative (has many links)
- doesn't contain too many ads
- doesn't contain spam
- has been clicked the most
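
One way to picture how the signals above could be combined is a weighted score. The sketch below is an illustration only: the `Doc` fields, the weights, and the function `relevance_score` are hypothetical choices made for this example, not a formula used by any real search engine.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    age_days: int        # freshness signal
    inbound_links: int   # authority signal
    ad_count: int
    is_spam: bool
    clicks: int


def relevance_score(query: str, doc: Doc) -> float:
    """Combine the listed signals using made-up weights."""
    terms = set(query.lower().split())
    if doc.is_spam or not terms:
        return 0.0
    words = doc.text.lower().split()
    # Fraction of query terms present, and how often they occur in total.
    coverage = sum(1 for t in terms if t in words) / len(terms)
    frequency = sum(words.count(t) for t in terms)
    score = 3.0 * coverage + 0.5 * frequency
    score += 1.0 / (1 + doc.age_days)    # fresher documents score higher
    score += 0.01 * doc.inbound_links    # more links, more authority
    score += 0.001 * doc.clicks          # popular results get a small boost
    score -= 0.2 * doc.ad_count          # penalise ad-heavy pages
    return score


pages = {
    "guide": Doc("glasgow hotel guide with glasgow maps", 2, 40, 1, False, 300),
    "forecast": Doc("weather report for the weekend", 1, 10, 0, False, 50),
}
ranked = sorted(pages, key=lambda name: relevance_score("glasgow hotel", pages[name]), reverse=True)
print(ranked)
```

Because the weights are arbitrary, only the relative ordering is meaningful here; changing them changes which signal dominates.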
search engines
the web search process is not a single-search approach
- question-answer assessment
- user engagement