Week 4 Flashcards

1
Q

Boolean queries

A
  • Process a query written as a Boolean expression
  • Terms are joined with AND, OR, and NOT
  • The simplest IR model
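A minimal sketch of Boolean evaluation, assuming each term's postings are held as a Python set of doc IDs. The toy postings, the `all_docs` universe, and the helper names are hypothetical and only serve the illustration.

```python
# Hypothetical toy data: term -> set of doc IDs containing the term.
toy_postings = {
    "brutus": {1, 2, 4},
    "caesar": {1, 2, 3, 4},
    "calpurnia": {2},
}
all_docs = {1, 2, 3, 4, 5}  # assumed universe of doc IDs, needed for NOT


def op_and(a, b):
    """Docs that contain both terms."""
    return a & b


def op_or(a, b):
    """Docs that contain at least one of the terms."""
    return a | b


def op_not(a, universe):
    """Docs in the collection that do not contain the term."""
    return universe - a


# Brutus AND Caesar AND NOT Calpurnia
result = op_and(
    op_and(toy_postings["brutus"], toy_postings["caesar"]),
    op_not(toy_postings["calpurnia"], all_docs),
)
```

Sets keep the sketch short; a real index would normally store sorted postings lists and merge them.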
2
Q

Boolean Query optimization

A
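  • For a query that ANDs n terms, fetch the postings list of each term and intersect them
  • Process terms in order of increasing document frequency, starting with the rarest: (Calpurnia AND Brutus) AND Caesar
  • Starting with the smallest lists keeps the intermediate results small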
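The ordering idea can be sketched as follows, assuming postings are sorted lists of doc IDs and that the length of a list stands in for document frequency. The function names and the toy index are hypothetical.

```python
def intersect(p1, p2):
    """Merge two sorted postings lists, keeping doc IDs present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, terms):
    """AND all terms together, processing the shortest postings list first."""
    if not terms:
        return []
    ordered = sorted(terms, key=lambda t: len(index.get(t, [])))
    result = index.get(ordered[0], [])
    for term in ordered[1:]:
        if not result:  # nothing left to match, stop early
            break
        result = intersect(result, index.get(term, []))
    return result


# Hypothetical toy index: term -> sorted doc IDs.
toy_index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}
```

With this toy index, `and_query(toy_index, ["brutus", "caesar", "calpurnia"])` starts from the four-entry Calpurnia list, so every later intersection works on at most four candidates.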

3
Q

Boolean general optimization

A
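  • Applies to queries that are an AND of OR groups, e.g. (a OR b) AND (c OR d)
  • Get the document frequency of every term
  • Estimate the size of each OR group by summing the document frequencies of its terms (a conservative upper bound)
  • Process the OR groups in increasing order of estimated size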
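A small sketch of the size estimate, assuming document frequencies are available in a dict. The terms, the frequencies, and the function names are made up for illustration.

```python
def estimate_or_size(doc_freq, group):
    """Upper bound on the size of an OR group: sum of its terms' document frequencies."""
    return sum(doc_freq.get(term, 0) for term in group)


def order_or_groups(doc_freq, groups):
    """Return the OR groups sorted by increasing estimated size."""
    return sorted(groups, key=lambda g: estimate_or_size(doc_freq, g))


# Hypothetical document frequencies for the query
# (a OR b) AND (c OR d) AND (e OR f)
toy_doc_freq = {"a": 100, "b": 50, "c": 5, "d": 10, "e": 400, "f": 1}
groups = [("a", "b"), ("c", "d"), ("e", "f")]
plan = order_or_groups(toy_doc_freq, groups)
```

The sum overestimates whenever the terms of a group share documents, but it is cheap to compute and sufficient for choosing an order.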
4
Q

Biword Indices

A
  • Index every consecutive pair of terms in the text as a phrase
  • "Friends, Romans, Countrymen" yields the biwords "Friends Romans" and "Romans Countrymen"
  • Each biword becomes a dictionary term
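A minimal sketch of biword extraction and indexing. The tokenizer (lowercase, alphanumeric runs only) and the function names are assumptions for the example, not a prescribed scheme.

```python
import re
from collections import defaultdict


def biwords(text):
    """Return every consecutive pair of tokens as a single 'w1 w2' string."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_biword_index(docs):
    """Map each biword (a dictionary term) to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for bw in biwords(text):
            index[bw].add(doc_id)
    return index


# Hypothetical one-document collection.
toy_docs = {1: "Friends, Romans, Countrymen"}
toy_biword_index = build_biword_index(toy_docs)
```

A two-word phrase query then becomes a single dictionary lookup, e.g. `toy_biword_index["friends romans"]`.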
5
Q

Longer phrase queries with biwords: what are the problems?

A
  • Break the phrase into overlapping biwords
  • Wilfrid Laurier University Waterloo -> "Wilfrid Laurier" AND "Laurier University" AND "University Waterloo"
    Problems
  • False positives: a document can contain every biword without containing the full phrase
  • Index blow-up due to a much larger dictionary
  • Not the standard solution for TR
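The false-positive problem can be shown with a short sketch. The sample document is invented, and the tokenizer and function names are assumptions for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def biword_set(tokens):
    """All consecutive token pairs."""
    return set(zip(tokens, tokens[1:]))


def biword_match(query, doc):
    """True if every biword of the query occurs somewhere in the document."""
    return biword_set(tokenize(query)) <= biword_set(tokenize(doc))


def exact_phrase_match(query, doc):
    """True only if the query tokens occur consecutively in the document."""
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


query = "Wilfrid Laurier University Waterloo"
# Invented document: contains all three biwords, but never the full phrase.
doc = "University Waterloo hosted Wilfrid Laurier; the Laurier University talk followed."
```

Here `biword_match(query, doc)` is true while `exact_phrase_match(query, doc)` is false, so a biword index alone would return this document for the phrase query.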
6
Q

Positional indices

A
  • In the postings, store for each term its document frequency and, for each document, the positions at which the term occurs
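A minimal sketch of such a structure, assuming a nested dict of the form term -> {doc ID -> list of positions}. The tokenizer and function names are hypothetical.

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Build term -> {doc_id: [positions]} with 0-based token positions."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for pos, tok in enumerate(tokens):
            index[tok].setdefault(doc_id, []).append(pos)
    return index


def document_frequency(index, term):
    """Number of documents that contain the term."""
    return len(index.get(term, {}))


# Hypothetical two-document collection.
toy_docs = {1: "to be or not to be", 2: "be quick"}
toy_pos_index = build_positional_index(toy_docs)
```

In this layout the document frequency is simply the number of doc IDs stored under a term, so it does not need a separate field.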
7
Q

Processing a phrase query with a positional index

A
  • Extract the inverted index entries for each distinct term: to, be, or, not
  • Merge their document and position lists to enumerate all positions where the full phrase "to be or not to be" occurs
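The two steps can be sketched as follows, assuming an index of the form term -> {doc ID -> positions}. For brevity the sketch checks positions by set membership rather than a linear merge of the position lists; the toy index and the function name are hypothetical.

```python
def phrase_query(index, phrase_terms):
    """Return {doc_id: [start positions]} where the terms occur consecutively."""
    if not phrase_terms:
        return {}
    postings = [index.get(term, {}) for term in phrase_terms]
    # Step 1: keep only documents that contain every term of the phrase.
    common = set(postings[0])
    for p in postings[1:]:
        common &= set(p)
    hits = {}
    for doc in sorted(common):
        position_sets = [set(p[doc]) for p in postings]
        # Step 2: a start s matches if the k-th term appears at position s + k.
        starts = [
            s for s in sorted(position_sets[0])
            if all((s + k) in position_sets[k] for k in range(1, len(phrase_terms)))
        ]
        if starts:
            hits[doc] = starts
    return hits


# Hypothetical positional index for two documents:
# doc 1: "to be or not to be that is the question"
# doc 2: "not to be or to be"
toy_index = {
    "to": {1: [0, 4], 2: [1, 4]},
    "be": {1: [1, 5], 2: [2, 5]},
    "or": {1: [2], 2: [3]},
    "not": {1: [3], 2: [0]},
}
matches = phrase_query(toy_index, ["to", "be", "or", "not", "to", "be"])
```

Both toy documents contain all four distinct terms, but only doc 1 has them in phrase order, so the positional check is what removes doc 2.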
8
Q

Positional index size

A
  • A positional index expands postings storage substantially
  • It is nonetheless the standard for TR because of the value of phrase and proximity queries
  • Its size depends on the average document size
9
Q

What can a positional index be used for?

A
  • Phrase and proximity queries, used either explicitly by the user or implicitly inside a ranking retrieval system
  • Needs an entry for every occurrence of a term, not just one per document
10
Q

Combination of biword and positional index

A
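  • A compromise on index size: keep biword entries only for frequently queried phrases and use the positional index for everything else, avoiding the blow-up of a full biword index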