Boolean Information Retrieval Flashcards

1
Q

Boolean Query

A

Definition: Boolean queries involve terms in logical expressions to evaluate documents.
Info: Each term functions as a logical predicate, and answers are sets where documents satisfy or don’t satisfy the expression.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Parsing in Boolean Queries

A

Definition: Parsing determines the order of sub-expressions while preserving the query’s meaning.
Info: Strategies include parsing into an Abstract Syntax Tree (AST) or rewriting expressions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Complexity of Answering Queries

A

Assumptions: Term-document matrix fits in memory, query expression has tokens and operators.
Info: Lookup cost is O(x log N), solving operators is O(yM), assuming a sorted term dictionary.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

The Inverted Index

A

Definition: Utilizes the sparsity of the term-document matrix, holding only ones in linked lists.
Info: Involves tokenizing, generating postings, sorting, and creating the dictionary. Efficient for answering various queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Optimizing Conjunctive Queries

A

Ordering of Terms: Matters for speed; best ordering is fewer postings first.
Document Frequency (DF): Number of documents containing a term, computed during inverted index creation.
Skip Pointers: Reduce query time with spaced pointers for linked list traversal.
Trade-Offs: Balance between pointer count and jump size for effective optimization.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Phrase Queries

A

Definition: A special case of conjunctive Boolean queries with constraints on term positions.
Challenge: Neither term-document matrix nor inverted index tracks term positions, posing a challenge for phrase queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly