07 Term Scoring Flashcards

Question 1

Q

Ranked retrieval

What are the problems with Boolean retrieval?

Answer

A

most users are not capable of writing Boolean queries
users don’t want to wade through huge numbers of results
Boolean queries often return either too few or too many results

Question 2

Q

Ranked retrieval

What are the main ideas of scoring in ranked retrieval?

Answer

A

assign a score to each query-document pair
the score measures how well the document and the query match

Question 3

Q

Ranked retrieval

What is the query-document match score that the Jaccard
coefficient computes for:
Query: “ides of March”
Document: “Caesar died in March”

Answer

A

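jaccard(q,d) = |q ∩ d| / |q ∪ d| = 1/6
(only “March” is shared; the union has 6 distinct terms: ides, of, March, Caesar, died, in)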
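A minimal sketch to check the arithmetic, assuming a hypothetical `terms` tokenizer that lowercases and splits on whitespace (no stemming or stop-word removal):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; 0.0 for two empty sets (assumption)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def terms(text: str) -> set[str]:
    # Hypothetical tokenizer: lowercase, split on whitespace, keep distinct terms.
    return set(text.lower().split())


q = terms("ides of March")
d = terms("Caesar died in March")
print(jaccard(q, d))  # 1 shared term / 6 distinct terms
```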

Question 4

Q

Ranked retrieval

What’s wrong with Jaccard?

Answer

A

no weighting: it ignores term frequency (how often a term occurs in a document)
it ignores that rare terms are more informative than frequent terms
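A small illustration of both problems, using hypothetical example texts and a set-based Jaccard coefficient: matching a common word scores the same as matching a rare one, and repeating a term changes nothing.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Set-based Jaccard coefficient |A ∩ B| / |A ∪ B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


query = set("the ides".split())

# Matching the common word "the" scores the same as matching the rare word "ides".
common_match = jaccard(query, set("the march".split()))
rare_match = jaccard(query, set("ides march".split()))

# Repeating "ides" three times changes nothing: sets discard term frequency.
repeated = jaccard(query, set("ides ides ides march".split()))

print(common_match, rare_match, repeated)  # all three are 1/3
```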

Question 5

Q

Term frequency

What is the tf weight (log-frequency weight) for each of the following term frequencies:
a) tf = 1, b) tf = 10, c) tf = 1000

tf(t,d) = number of times that t occurs in d

Answer

A

a) 1
b) 2
c) 4

we need log frequency instead of raw term frequency
because relevance does not increase proportionally with term frequency

w(t,d) = 1 + log10(tf(t,d)) if tf(t,d) > 0, otherwise 0
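A minimal sketch of the log-frequency weight, reproducing the three answers above (`log_tf_weight` is an illustrative name):

```python
import math


def log_tf_weight(tf: int) -> float:
    """Log-frequency weight: 1 + log10(tf) if tf > 0, otherwise 0."""
    if tf > 0:
        return 1 + math.log10(tf)
    return 0.0


for tf in (0, 1, 10, 1000):
    print(tf, log_tf_weight(tf))
```

Going from tf = 10 to tf = 1000 (100 times more occurrences) only doubles the weight, from 2 to 4.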

Question 6

Q

Document frequency

Compute the idf weight for the following document frequencies:
a) df = 1, b) df = 100, c) df = 1000 (given N = 1,000,000)

df(t) = number of documents in the collection that contain t

Answer

A

a) 6 (rarest term, most informative)
b) 4
c) 3

idf(t) = log10(N / df(t))
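A minimal sketch of the idf formula with N = 1,000,000, assuming 0 < df ≤ N (`idf` is an illustrative name):

```python
import math


def idf(df: int, n_docs: int) -> float:
    """Inverse document frequency: log10(N / df). Assumes 0 < df <= N."""
    if not 0 < df <= n_docs:
        raise ValueError("df must satisfy 0 < df <= N")
    return math.log10(n_docs / df)


N = 1_000_000
for df in (1, 100, 1000):
    print(df, idf(df, N))
```

A term that occurs in every document gets idf = log10(1) = 0, so it contributes nothing to ranking.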

Question 7

Q

How do we calculate the tf-idf weight?

Answer

A

multiply the tf weight by the idf weight:
w(t,d) = (1 + log10(tf(t,d))) × log10(N / df(t))

tf-idf is the best-known weighting scheme in IR
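One common way to turn these weights into a ranking score is to sum the tf-idf weights of the query terms that occur in the document. The sketch below assumes that summation, a hypothetical three-document collection and whitespace tokenization; `tf_idf` and `score` are illustrative names.

```python
import math
from collections import Counter


def tf_idf(tf: int, df: int, n_docs: int) -> float:
    """tf-idf weight = (1 + log10 tf) * log10(N / df); 0 if the term is absent."""
    if tf == 0 or df == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


def score(query: list[str], doc: list[str], df: dict[str, int], n_docs: int) -> float:
    """Sum the tf-idf weights of the distinct query terms that occur in the document."""
    tf = Counter(doc)
    return sum(tf_idf(tf[t], df.get(t, 0), n_docs) for t in set(query))


# Hypothetical collection; df counts each term at most once per document.
docs = [
    "caesar died in march".split(),
    "the ides of march".split(),
    "brutus killed caesar caesar".split(),
]
N = len(docs)
df = Counter(t for d in docs for t in set(d))

query = "ides of march".split()
ranked = sorted(range(N), key=lambda i: score(query, docs[i], df, N), reverse=True)
print(ranked)  # [1, 0, 2]
```

Document 1 wins because it matches the rare terms “ides” and “of” (df = 1), while document 0 only matches “march” (df = 2).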

Question 8

Q

Vector space model

What are the key ideas of representing queries as vectors?

Answer

A

represent queries as vectors in the same space as documents
rank documents by their proximity (= similarity) to the query

proximity = negative distance
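A minimal sketch of ranking by proximity, assuming raw term counts as vector components (tf-idf weights would slot in the same way) and proximity taken as negative Euclidean distance; `to_vector` and `rank_by_proximity` are hypothetical helpers, and `math.dist` needs Python 3.8+.

```python
import math
from collections import Counter


def to_vector(tokens: list[str], vocab: list[str]) -> list[float]:
    """Map a token list to a vector of raw term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]


def rank_by_proximity(query: list[str], docs: list[list[str]]) -> list[int]:
    """Return document indices ordered from closest to farthest from the query."""
    # Queries and documents share one vector space built from all their terms.
    vocab = sorted(set(query).union(*docs))
    q = to_vector(query, vocab)
    proximity = [-math.dist(q, to_vector(d, vocab)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: proximity[i], reverse=True)


docs = [
    "jealous gossip".split(),
    "gossip".split(),
    "jealous gossip jealous gossip".split(),  # doc 0 repeated twice
]
print(rank_by_proximity("jealous gossip".split(), docs))  # [0, 1, 2]
```

Note that doc 2, which contains the same terms in the same proportions as doc 0, ranks last: Euclidean distance is sensitive to vector length.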

Question 9

Q

Why is Euclidean distance a bad measure of proximity between query and document vectors?

Answer

A

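it is large for vectors of different lengths, even when their term distributions are very similar

a query is a very short vector, so we rank documents by the angle between the query and document vectors (cosine similarity) instead of by distance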
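A minimal sketch contrasting the two measures, with hypothetical term-weight vectors: `d2` is `d` appended to itself, so it points in the same direction but is twice as long. `math.dist` (Python 3.8+) gives the Euclidean distance; `cosine` is a hand-written helper.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


q = [1.0, 1.0, 0.0]   # short query vector
d = [3.0, 1.0, 0.0]   # hypothetical document term weights
d2 = [6.0, 2.0, 0.0]  # d appended to itself: same direction, twice as long

print(math.dist(q, d), math.dist(q, d2))  # Euclidean distance grows: 2.0 vs about 5.10
print(cosine(q, d), cosine(q, d2))        # cosine is unchanged: both about 0.894
```

Because cosine divides by both vector lengths, it compares only direction (the relative term weights), which is exactly what makes a short query comparable to a long document.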
