Foundations Flashcards
Example of a Transactional Query
“Best Budget Headphones”
Informational Query
Query where the user does not have a specific item in mind and is looking for reading material on a topic.
Example of Informational Query
“Books on space”
Transactional Queries
Queries where we know what we are looking for and want to find places where we can get it.
Question Answering Queries
Queries where we ask a question whose answer we do not know, and the answers are short, verifiable, and specific.
3 Main uses of Search Engines
Personal Search: Finding documents on your own computer.
Enterprise Search: Finding information on an organization's servers, inside its firewall (e.g., library catalog).
Vertical Search: Finding information across public websites about a specific topic (e.g., Yelp).
Documents
Storage and retrieval unit of a search system. The system will always return documents or links to them.
Example of Documents in a system
In Gmail, search returns links to emails, so emails are considered documents.
Query
Expression of the user's information need.
The input to a search engine.
Tokens
Token: Internal representation of a word in a document or query.
Tokenization: Process of converting words into tokens.
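Example of Tokenization (sketch)
A minimal sketch of one possible tokenizer, assuming that tokens are lowercase runs of letters and digits. The function name tokenize and the regular expression are illustrative choices, not a rule from these notes; real systems may also stem words, strip accents, or handle punctuation differently.

```python
import re

def tokenize(text):
    """Lowercase the text and return its runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Different spellings of the same query map to the same tokens.
print(tokenize("Best Budget Headphones!"))  # ['best', 'budget', 'headphones']
print(tokenize("best budget headphones"))   # ['best', 'budget', 'headphones']
```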
Terms
Vocabulary (V): Set of unique tokens across all documents.
Term: Each element in the vocabulary set.
Out-of-Vocabulary (OOV)
A token appearing in a query but not in any document. IR systems typically ignore OOV tokens.
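Example of Vocabulary and OOV (sketch)
A small sketch of how the vocabulary and OOV tokens relate, assuming whitespace tokenization with lowercasing. The two sample documents and the helper name split_query are hypothetical and only serve the illustration.

```python
docs = ["books on space", "best budget headphones"]

# Vocabulary V: the set of unique tokens across all documents.
vocabulary = set()
for doc in docs:
    vocabulary.update(doc.lower().split())

def split_query(query):
    """Separate query tokens into known terms and out-of-vocabulary tokens."""
    tokens = query.lower().split()
    known = [t for t in tokens if t in vocabulary]
    oov = [t for t in tokens if t not in vocabulary]
    return known, oov

known, oov = split_query("budget space telescopes")
print(known)  # ['budget', 'space']
print(oov)    # ['telescopes']
```

Here "telescopes" appears in no document, so a system that ignores OOV tokens would search with "budget" and "space" only.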
Tokenization Benefits
Helps find documents even when users type the same word in different ways (e.g., different capitalization).
Removes inconsistencies from documents.
Architecture of a Search Engine
Documents are added to the index.
Users send queries and receive results.
The engine matches tokens in the query against terms in the index to find relevant documents.
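Example of the Architecture (sketch)
The three steps above can be sketched as a toy engine. This assumes the index is an inverted index (a mapping from each term to the set of documents containing it) and that a document matches only if it contains every known query token; both are illustrative assumptions, and the class name TinySearchEngine is hypothetical.

```python
from collections import defaultdict

class TinySearchEngine:
    def __init__(self):
        self.index = defaultdict(set)  # term -> set of document ids
        self.docs = {}                 # document id -> original text

    def add_document(self, doc_id, text):
        """Store the document and record each of its tokens in the index."""
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.index[token].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every known query token."""
        tokens = query.lower().split()
        # Tokens that are not in the index (OOV) are ignored.
        postings = [self.index[t] for t in tokens if t in self.index]
        if not postings:
            return []
        return sorted(set.intersection(*postings))

engine = TinySearchEngine()
engine.add_document(1, "books on space")
engine.add_document(2, "best budget headphones")
engine.add_document(3, "budget books")

print(engine.search("budget books"))      # [3]
print(engine.search("space telescopes"))  # [1]
```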
Term-Document Incidence Matrix
Matrix whose entry for each (term, document) pair indicates the importance of the term for that document. Common weight choices: Boolean (term present or absent), counts (number of occurrences), tf-idf.
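Example of a Term-Document Matrix (sketch)
A minimal sketch of the three weight choices on a toy corpus. It assumes rows are terms and columns are documents, whitespace tokenization, and one common tf-idf variant (raw count times log(N / df)); other tf-idf formulas exist, and the sample documents are made up for illustration.

```python
import math
from collections import Counter

docs = ["space books", "budget headphones", "space space travel"]
doc_counts = [Counter(d.split()) for d in docs]
terms = sorted({t for c in doc_counts for t in c})  # the vocabulary, sorted
n_docs = len(docs)

# Counts: how many times each term occurs in each document.
counts = [[c[term] for c in doc_counts] for term in terms]

# Boolean: 1 if the term occurs in the document, else 0.
boolean = [[1 if n > 0 else 0 for n in row] for row in counts]

# Document frequency: number of documents containing each term.
df = [sum(row) for row in boolean]

# tf-idf (assumed variant): count * log(N / df).
tfidf = [[n * math.log(n_docs / d) for n in row] for row, d in zip(counts, df)]

row = terms.index("space")
print(counts[row])   # [1, 0, 2]
print(boolean[row])  # [1, 0, 1]
```

With this variant, "space" gets a higher weight in the third document than in the first because it occurs there twice, while a term that appeared in every document would get weight 0.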