Boolean Information Retrieval Flashcards
Boolean Query
Definition: Boolean queries involve terms in logical expressions to evaluate documents.
Info: Each term functions as a logical predicate, and answers are sets where documents satisfy or don’t satisfy the expression.
Parsing in Boolean Queries
Definition: Parsing determines the order of sub-expressions while preserving the query’s meaning.
Info: Strategies include parsing into an Abstract Syntax Tree (AST) or rewriting expressions.
Complexity of Answering Queries
Assumptions: Term-document matrix fits in memory, query expression has tokens and operators.
Info: Lookup cost is O(x log N), solving operators is O(yM), assuming a sorted term dictionary.
The Inverted Index
Definition: Utilizes the sparsity of the term-document matrix, holding only ones in linked lists.
Info: Involves tokenizing, generating postings, sorting, and creating the dictionary. Efficient for answering various queries.
Optimizing Conjunctive Queries
Ordering of Terms: Matters for speed; best ordering is fewer postings first.
Document Frequency (DF): Number of documents containing a term, computed during inverted index creation.
Skip Pointers: Reduce query time with spaced pointers for linked list traversal.
Trade-Offs: Balance between pointer count and jump size for effective optimization.
Phrase Queries
Definition: A special case of conjunctive Boolean queries with constraints on term positions.
Challenge: Neither term-document matrix nor inverted index tracks term positions, posing a challenge for phrase queries.