Week 4 Flashcards
1
Q
Boolean queries
A
- Process a query written as a Boolean expression
- Terms are joined with AND, OR, and NOT
- The simplest IR model
2
Q
Boolean Query optimization
A
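- For a conjunction of n terms, fetch the postings list of each term, then AND the lists together
- Process terms in order of increasing document frequency (shortest postings list first), e.g. (Calpurnia AND Brutus) AND Caesar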
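A minimal sketch of this ordering, assuming postings are sorted lists of docIDs held in a plain dict. The names `intersect` and `and_query` and the docIDs are illustrative, not taken from the course material.

```python
def intersect(p1, p2):
    """Merge two sorted postings lists, keeping docIDs present in both."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, terms):
    """AND together the postings of all terms, shortest list first."""
    # A term missing from the index is treated as an empty postings list.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        if not result:
            break  # nothing can match once the running result is empty
        result = intersect(result, p)
    return result


# Toy index: the docIDs are made up for illustration.
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132, 173],
    "calpurnia": [2, 31, 54, 101],
}
print(and_query(index, ["brutus", "caesar", "calpurnia"]))  # [2]
```

Starting with the shortest list keeps every intermediate result no larger than that list, so later merges have less to scan.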
3
Q
General Boolean query optimization
A
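- Applies to queries that AND together several OR groups
- Get the document frequency of every term
- Estimate the size of each OR by summing the document frequencies of its terms (an upper bound on the result size)
- Process the ORs in increasing order of estimated size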
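The size estimate can be sketched as follows, assuming the query is given as a list of OR groups and document frequencies sit in a dict. The function name `order_or_groups`, the terms, and the frequencies are all made up for illustration.

```python
def order_or_groups(doc_freq, groups):
    """Sort OR groups by the sum of their terms' document frequencies."""
    def estimate(group):
        # The sum is an upper bound on the size of the OR result.
        return sum(doc_freq.get(term, 0) for term in group)
    return sorted(groups, key=estimate)


# Hypothetical document frequencies.
doc_freq = {"madding": 10, "crowd": 200, "ignoble": 5,
            "strife": 40, "killed": 300, "slain": 25}
groups = [("madding", "crowd"), ("ignoble", "strife"), ("killed", "slain")]

# Estimates are 210, 45, and 325, so the middle group goes first.
print(order_or_groups(doc_freq, groups))
```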
4
Q
Biword Indices
A
- Index every consecutive pair of terms in the text as a phrase
- "Friends, Romans, Countrymen" yields the biwords "friends romans" and "romans countrymen"
- Each biword is a dictionary term
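A small sketch of building such an index, assuming a very simple tokenizer (lowercase, commas dropped, split on whitespace). The function name `build_biword_index` and the second document are illustrative.

```python
from collections import defaultdict


def build_biword_index(docs):
    """Map each consecutive pair of tokens to the set of docIDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Simplified tokenization: an assumption made for this sketch.
        tokens = text.lower().replace(",", " ").split()
        for first, second in zip(tokens, tokens[1:]):
            index[f"{first} {second}"].add(doc_id)
    return index


docs = {1: "Friends, Romans, Countrymen", 2: "Romans and countrymen"}
index = build_biword_index(docs)
print(sorted(index))
# ['and countrymen', 'friends romans', 'romans and', 'romans countrymen']
```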
5
Q
Longer phrase queries: what are the problems?
A
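- Break the phrase into biwords and AND them together
- Wilfrid Laurier University Waterloo -> "Wilfrid Laurier" AND "Laurier University" AND "University Waterloo"
Problems:
- False positives: a document can contain every biword without containing the full phrase
- Index blow-up due to a much larger dictionary
- Not the standard solution for TR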
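The false-positive problem can be illustrated with a short sketch. The helper names and the example document are hypothetical, and tokenization is a plain lowercase whitespace split.

```python
def biwords(text):
    """Return the set of consecutive token pairs in the text."""
    tokens = text.lower().split()
    return {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}


def matches_biword_query(doc, phrase):
    """True if the document contains every biword of the phrase."""
    return biwords(phrase) <= biwords(doc)


phrase = "wilfrid laurier university waterloo"
doc = "university waterloo is near wilfrid laurier university"

print(matches_biword_query(doc, phrase))  # True: every biword is present
print(phrase in doc)                      # False: the full phrase never occurs
```

The document passes the biword test even though the four words never appear together in order, which is why biword matches need a post-filtering step for longer phrases.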
6
Q
Positional indices
A
- For each term, the postings store the document frequency and, for every document, the positions at which the term occurs
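One possible in-memory layout, sketched under the assumption that positions are token offsets starting at 0 and that the document frequency is simply the number of documents in a term's entry. The name `build_positional_index` and the sample documents are illustrative.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} using whitespace tokenization."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index


docs = {1: "to be or not to be", 2: "to do is to be"}
index = build_positional_index(docs)

print(len(index["be"]))  # document frequency of "be": 2
print(index["to"])       # {1: [0, 4], 2: [0, 3]}
```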
7
Q
Processing a phrase query with a positional index
A
- Extract the inverted index entries for each distinct term: to, be, or, not
- Merge their document:position lists to enumerate all positions where “to be or not to be” occurs
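A rough sketch of the two steps, assuming the index maps each term to a dict of docID to position list. For brevity it checks positions with list membership rather than a linear merge of sorted position lists, so it shows the logic, not an efficient implementation. The name `phrase_query` and the toy index are illustrative.

```python
def phrase_query(index, phrase):
    """Return {doc_id: [start positions]} where the phrase occurs exactly."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return {}
    # Step 1: candidate documents must contain every term.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    # Step 2: the k-th term must appear k positions after the start.
    hits = {}
    for doc_id in sorted(candidates):
        starts = [
            p for p in index[terms[0]][doc_id]
            if all(p + k in index[t][doc_id] for k, t in enumerate(terms))
        ]
        if starts:
            hits[doc_id] = starts
    return hits


# Toy positional index for doc 1 "to be or not to be" and doc 2 "to do is to be".
index = {
    "to": {1: [0, 4], 2: [0, 3]},
    "be": {1: [1, 5], 2: [4]},
    "or": {1: [2]},
    "not": {1: [3]},
    "do": {2: [1]},
    "is": {2: [2]},
}
print(phrase_query(index, "to be or not to be"))  # {1: [0]}
print(phrase_query(index, "to be"))               # {1: [0, 4], 2: [3]}
```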
8
Q
Positional index size
A
- A positional index expands postings storage substantially
- It is nevertheless the standard for TR because of the usefulness of phrase and proximity queries
- Its size depends on the average document size
9
Q
What can a positional index be used for?
A
- Can be used explicitly or implicitly in a ranked retrieval system
- Needs an entry for every occurrence of a term, not just one per document
10
Q
Combination of biword and positional index
A
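- Index frequently queried phrases as biwords and answer the remaining phrase queries from the positional index
- Avoids repeated slow merges of long positional postings lists for common phrases, at the cost of some extra index space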