07 Term Scoring Flashcards

1
Q

Ranked retrieval

What are the problems with Boolean retrieval?

A
  • most users are not capable of writing Boolean queries
  • users don’t want to wade through huge numbers of results
  • Boolean queries often return either too few or too many results
2
Q

Ranked retrieval

What are the main ideas of scoring in ranked retrieval?

A
  • assign a score to each query-document pair
  • the score measures how well the document and the query match
3
Q

Ranked retrieval

What is the query-document match score that the Jaccard coefficient computes for:
Query: “ides of March”
Document: “Caesar died in March”

A

jaccard(q,d) = 1/6
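
The 1/6 follows from treating the query and the document as sets of terms: they share one term (March) out of six distinct terms overall. A minimal Python sketch of that computation, assuming simple whitespace tokenization and lowercasing (the function name and the empty-input convention are illustrative choices, not part of the card):

```python
def jaccard(query: str, document: str) -> float:
    """Jaccard coefficient of two term sets: |A intersect B| / |A union B|."""
    # Assumption: lowercase and split on whitespace; a real system
    # would use a proper tokenizer.
    a = set(query.lower().split())
    b = set(document.lower().split())
    union = a | b
    if not union:
        # Assumed convention for two empty inputs.
        return 0.0
    return len(a & b) / len(union)


score = jaccard("ides of March", "Caesar died in March")
print(score)  # 1 shared term out of 6 distinct terms, i.e. 1/6
```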

4
Q

Ranked retrieval

What’s wrong with the Jaccard coefficient?

A
  • no term weighting: it ignores how often a term occurs in the document
  • rare terms are more informative than frequent terms, but Jaccard treats all terms equally
5
Q

Term frequency

What is the tf weight (log-frequency weight) for each of the following term frequencies?
a) tf = 1, b) tf = 10, c) tf = 1000

tf(t,d) = number of times that term t occurs in document d

A

a) 1
b) 2
c) 4

We use the log frequency instead of the raw term frequency
because relevance does not increase proportionally with term frequency.

w(t,d) = 1 + log10(tf(t,d)) if tf(t,d) > 0, otherwise 0
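
The three answers can be checked with a short Python sketch of the log-frequency formula. The function name log_tf_weight is an illustrative choice, not taken from the card:

```python
import math


def log_tf_weight(tf: int) -> float:
    """Log-frequency weight: 1 + log10(tf) if tf > 0, otherwise 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


# The term frequencies from the card, plus tf = 0 for the "otherwise" branch.
tf_weights = {tf: round(log_tf_weight(tf), 6) for tf in (0, 1, 10, 1000)}
print(tf_weights)
```

A term that occurs 1000 times gets only four times the weight of a term that occurs once, which is the dampening effect the card describes.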

6
Q

Document frequency

Compute the idf weight for the following document frequencies:
a) df = 1, b) df = 100, c) df = 1000 (given N = 1,000,000)

df(t) = number of documents in the whole collection that term t occurs in

A

a) 6 (rarest term, so the most informative)
b) 4
c) 3

idf(t) = log10(N / df(t))
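
A small Python sketch of the idf formula, using the collection size N = 1,000,000 from the card. The function name and the error handling for df = 0 are assumptions added for illustration:

```python
import math


def idf_weight(df: int, n_docs: int) -> float:
    """Inverse document frequency: log10(N / df)."""
    if df <= 0:
        # Assumption: a term that occurs in no document has no defined idf.
        raise ValueError("df must be positive")
    return math.log10(n_docs / df)


N = 1_000_000
idf_weights = {df: round(idf_weight(df, N), 6) for df in (1, 100, 1000)}
print(idf_weights)
```

The rarer the term (the smaller df), the larger its idf weight.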

7
Q

How do we calculate the tf-idf weight?

A

Multiply the tf weight by the idf weight:

w(t,d) = (1 + log10(tf(t,d))) × log10(N / df(t))

tf-idf is the best-known weighting scheme in IR.
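
As a rough illustration, the two weights from the previous cards can be combined in a few lines of Python. The function name tf_idf_weight and the example numbers (a term occurring 10 times in a document and in 1000 of 1,000,000 documents) are assumptions chosen to reuse the values from the earlier cards:

```python
import math


def tf_idf_weight(tf: int, df: int, n_docs: int) -> float:
    """tf-idf weight: (1 + log10(tf)) * log10(N / df), or 0 if tf is 0."""
    if tf <= 0:
        return 0.0
    if df <= 0:
        # Assumption: a term seen in a document must have df >= 1.
        raise ValueError("df must be positive when tf > 0")
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


# tf weight 2 (tf = 10) times idf weight 3 (df = 1000, N = 1,000,000)
example_weight = tf_idf_weight(tf=10, df=1000, n_docs=1_000_000)
print(round(example_weight, 6))
```

The weight grows with the number of occurrences in the document and with the rarity of the term in the collection.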

8
Q

Vector space model

What are the key ideas of representing queries as vectors?

A
  • represent queries as vectors in the same space as documents
  • rank documents according to their proximity (= similarity) to the query

proximity = negative distance (the closer a document is to the query, the higher it ranks)

9
Q

Why is Euclidean distance not a good proximity measure for vectors that are not length-normalized?

A

It is large for vectors of different lengths, even when they point in the same direction.

A query is a very short vector, so we rank by the angle between the query and the document instead of the distance.
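
The effect can be seen in a small Python sketch. The two-dimensional vectors below are made-up toy weights, not data from the card: doc_same_topic has the same term distribution as the query but is ten times longer, while doc_other is short but points in a different direction. The cosine of the angle is used here as the angle-based measure.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors over two terms.
query = [1.0, 1.0]
doc_same_topic = [10.0, 10.0]  # same direction as the query, much longer
doc_other = [1.0, 0.0]         # short, but a different direction

# Euclidean distance ranks doc_other closer to the query ...
print(euclidean(query, doc_same_topic), euclidean(query, doc_other))
# ... while the angle-based measure prefers doc_same_topic.
print(cosine(query, doc_same_topic), cosine(query, doc_other))
```

Ranking by angle ignores vector length, so a long document with the same term distribution as the query is not penalized.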
