Boolean Retrieval Model Flashcards

1
Q

What is the boolean retrieval model?

A

A model of information retrieval that can answer any query written as a Boolean expression of terms; each document either matches the expression or does not

2
Q

What is a boolean query?

A

Queries using AND, OR, and NOT to join query terms
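
As a minimal sketch of how such queries can be evaluated, assume each term maps to the set of IDs of the documents that contain it. The toy documents and the names `build_index` and `postings` are illustrative assumptions, not part of the original cards.

```python
# Toy collection; the documents are illustrative assumptions.
docs = {
    1: "university of toronto campus",
    2: "toronto city guide",
    3: "university rankings guide",
}

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(docs)
all_ids = set(docs)

def postings(term):
    """Return the set of documents containing the term (empty if unseen)."""
    return index.get(term, set())

# AND is set intersection, OR is set union,
# NOT is the complement against the whole collection.
and_result = postings("university") & postings("toronto")
or_result = postings("university") | postings("toronto")
not_result = postings("guide") & (all_ids - postings("toronto"))
```

Each result is just a set of matching document IDs, with no ranking among them.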

3
Q

What is a limitation of the boolean retrieval model?

A

It only records whether or not a document matches a condition; it keeps no additional data such as term frequency or proximity.

4
Q

What is a phrase query?

A

A query for a string of multiple tokens that must appear together, in order.
Ex: “University of Toronto”

5
Q

Why can’t the classic boolean model perform phrase queries?

A

It doesn’t record information about proximity of terms to one another

6
Q

What is a biword index?

A

An index that treats every consecutive pair of terms in the text as a single dictionary term

7
Q

What are the benefits of biword indexes?

A
  1. Two-word phrase queries are processed immediately
  2. Longer phrases can still be searched, using the partial proximity info that biwords carry
8
Q

How can longer phrase queries be processed with a biword index?

A

Break the query into a list of overlapping biwords, run a separate query on each biword, and AND (intersect) the results
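
The decomposition can be sketched as follows, assuming whitespace tokenization and a biword index that maps each biword to a set of document IDs. The documents and the helper names (`biwords`, `build_biword_index`, `biword_phrase_query`) are hypothetical, chosen so that one document is a false positive.

```python
def biwords(text):
    """Return the overlapping consecutive word pairs of a text."""
    tokens = text.split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def build_biword_index(docs):
    """Map each biword to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for bw in biwords(text):
            index.setdefault(bw, set()).add(doc_id)
    return index

def biword_phrase_query(index, phrase):
    """Intersect the postings of every biword in the phrase."""
    result = None
    for bw in biwords(phrase):
        docs_with_bw = index.get(bw, set())
        result = docs_with_bw if result is None else result & docs_with_bw
    return result or set()

# Illustrative documents (assumptions): doc 2 contains every biword of the
# query, but never the full phrase as one contiguous string.
docs = {
    1: "stanford university palo alto",
    2: "university palo news from stanford university and palo alto",
}
biword_index = build_biword_index(docs)
hits = biword_phrase_query(biword_index, "stanford university palo alto")
```

Here `hits` contains both documents, even though only document 1 holds the phrase itself, because the intersection only confirms that each pair occurs somewhere in the document.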

9
Q

What are the downsides of a biword index?

A
  1. False positives, since we cannot verify that the whole phrase appears as one contiguous string
  2. Index takes more storage due to the bigger dictionary
10
Q

What is a positional index?

A

An index whose postings store, for each term, its document frequency and, for each document, the positions at which the term appears

11
Q

How do you process a phrase query using a positional index?

A
  1. Extract the index entries for each term in the phrase
  2. Merge their doc:position lists to find the documents in which the terms occur at consecutive positions
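
The two steps can be sketched as below, assuming whitespace tokenization and an index of the form term -> {doc_id: [positions]}. For brevity this sketch uses simple membership checks on the position lists rather than a linear-time merge; the documents and function names are illustrative assumptions.

```python
def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}, positions in increasing order."""
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index

def positional_phrase_query(index, phrase):
    """Return IDs of documents that contain the phrase contiguously."""
    terms = phrase.split()
    if not terms:
        return set()
    # Step 1: extract the index entry for each term.
    entries = [index.get(t, {}) for t in terms]
    # Only documents that contain every term can match.
    candidates = set(entries[0])
    for entry in entries[1:]:
        candidates &= set(entry)
    # Step 2: keep documents where term i occurs at position start + i.
    matches = set()
    for doc_id in candidates:
        for start in entries[0][doc_id]:
            if all(start + i in entries[i][doc_id] for i in range(1, len(terms))):
                matches.add(doc_id)
                break
    return matches

# Illustrative documents (assumptions): doc 2 has all four words,
# but not in one contiguous run.
docs = {
    1: "stanford university palo alto",
    2: "university palo news from stanford university and palo alto",
}
positional_index = build_positional_index(docs)
hits = positional_phrase_query(positional_index, "stanford university palo alto")
```

Because positions are checked, `hits` contains only document 1; document 2 is rejected even though it contains every word of the phrase.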
12
Q

Why can’t biword indexes be used for longer phrase queries?

A

Because of false positives: it only checks that each pair of words appears side by side somewhere in the document, not that the whole phrase appears as one contiguous sequence

13
Q

What is the drawback of a positional index and why is it used regardless?

A

It expands postings storage substantially, because we store every occurrence of each term. We use it anyway because it enables phrase and proximity queries.

14
Q

What are the rules of thumb about positional index size?

A
  1. A positional index is 2-4 times as large as a non-positional index
  2. A positional index is 35-50% of the volume of the original text
15
Q

How can we combine the biword and positional index?

A

Store the positions of each biword combination

16
Q

What is the size of a biword positional index?

A

About 26% bigger than a positional index alone