Week 5 Flashcards

Question 1

Q

Basic Idea of Vector Space Model

Answer

A

Relevance is modeled as similarity

- the more similar a document is to the query, the higher its assumed relevance

Question 2

Q

Vector Space Model Framework

Answer

A

Represent each document and query as a term vector (one dimension per vocabulary term)
Query vector q holds the query term weights
Document vector d holds the document term weights

Question 3

Q

How is relevance measured in VSM

Answer

A

By the similarity between the query vector and the document vector

Question 4

Q

What is a bit vector representation

Answer

A

Each vector element is 1 if the word is present and 0 if the word is absent
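
A minimal sketch of the bit vector idea, assuming whitespace tokenization and lowercasing. The vocabulary, the sample text, and the helper name `bit_vector` are made up for illustration.

```python
def bit_vector(text, vocabulary):
    """Return a 0/1 list: 1 if the vocabulary word occurs in text, else 0."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


# Hypothetical vocabulary and document, chosen only for illustration.
vocabulary = ["news", "about", "presidential", "campaign", "food"]
doc_vector = bit_vector("News about the campaign", vocabulary)
print(doc_vector)
```

Words outside the vocabulary (here "the") are simply ignored.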

Question 5

Q

What is the dot product instantiation of similarity?

Answer

A

Common way to measure similarity
Computes the similarity between the document vector and the query vector
Yields a score used to rank documents

Question 6

Q

Dot Product example

Answer

A

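List all words in the vocabulary V
Create the query vector over V: 1 for each query word, 0 otherwise
Create a vector over V for each document: 1 if the word is present, 0 if absent
Compute the dot product of the query vector with each document vector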
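
The steps above can be sketched as follows. This is an illustration only: the query, the three documents, and the helper names `bit_vector` and `dot` are assumptions, and tokenization is a plain whitespace split.

```python
def bit_vector(text, vocabulary):
    """Return a 0/1 list: 1 if the vocabulary word occurs in text, else 0."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


# Hypothetical vocabulary, query and documents.
vocabulary = ["news", "about", "presidential", "campaign", "food"]
query = "news about presidential campaign"
documents = {
    "d1": "news about organic food",
    "d2": "news of presidential campaign",
    "d3": "food recipes",
}

q = bit_vector(query, vocabulary)
scores = {name: dot(q, bit_vector(text, vocabulary))
          for name, text in documents.items()}

# Higher score means more query words matched, so rank in descending order.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

With bit vectors the score is just the number of distinct query words found in the document.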

Question 7

Q

Problem of VSM

Answer

A

Should all terms have the same weight?
Should some terms be more important than others?
Should term frequency be considered?

Question 8

Q

Improved VSM instantiation

Answer

A

Make specific terms more important

- weight the vectors by term frequency (TF)

Question 9

Q

Improved VSM example

Answer

A

Rank with TF weighting

Build the term frequency vector of the query and of each document, then compute the dot product
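
A small sketch of TF-weighted ranking, assuming whitespace tokenization. The vocabulary, texts, and helper names (`tf_vector`, `dot`) are hypothetical.

```python
from collections import Counter


def tf_vector(text, vocabulary):
    """Return the count of each vocabulary word in text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


# Hypothetical vocabulary, query and documents.
vocabulary = ["news", "about", "presidential", "campaign"]
q = tf_vector("news about presidential campaign", vocabulary)
d_a = tf_vector("news of presidential campaign", vocabulary)
d_b = tf_vector("news of presidential campaign presidential candidate", vocabulary)

score_a = dot(q, d_a)
score_b = dot(q, d_b)
print(score_a, score_b)
```

Both documents match the same three query words, so bit vectors would tie them; counting occurrences lets the second document score higher because "presidential" appears twice.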

Question 10

Q

How to penalize popular terms in VSM

Answer

A

IDF weighting
IDF is smoothed with a log: IDF(w) = log((M + 1) / k)
M = number of docs in the collection
k = number of docs containing the word

Now compute the dot product with each document term weight multiplied by IDF, i.e. count * IDF instead of the 0/1 value or raw count
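
A minimal sketch of IDF weighting with IDF(w) = log((M + 1) / k), using the natural log. The collection, the helper names, and the choice to return 0 for a word that appears in no document (k = 0) are assumptions made for illustration.

```python
import math
from collections import Counter


def idf(term, documents):
    """IDF(w) = log((M + 1) / k); returns 0.0 when no document has the term."""
    M = len(documents)
    k = sum(1 for doc in documents if term in doc.lower().split())
    if k == 0:
        return 0.0  # assumption: an unseen term contributes nothing
    return math.log((M + 1) / k)


def tfidf_score(query, doc, documents):
    """Sum over query words of c(w,q) * c(w,d) * IDF(w)."""
    q_counts = Counter(query.lower().split())
    d_counts = Counter(doc.lower().split())
    return sum(q_counts[w] * d_counts[w] * idf(w, documents) for w in q_counts)


# Hypothetical collection and query.
documents = [
    "news about organic food",
    "news of presidential campaign",
    "news about sports",
]
query = "news about presidential campaign"

score_0 = tfidf_score(query, documents[0], documents)
score_1 = tfidf_score(query, documents[1], documents)
print(idf("news", documents), idf("presidential", documents))
print(score_0, score_1)
```

"news" occurs in every document, so its IDF is small, while "presidential" occurs in only one and gets a much larger weight.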

Question 11

Q

VSM with BM25

Answer

A

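Replace the raw count with the BM25 TF transformation, multiplied by IDF: ((k + 1) * c(w,d) / (c(w,d) + k)) * IDF(w)
Ranking function = sum over query words w of c(w,q) * the term above
Here k >= 0 is the BM25 parameter, not the document frequency used in IDF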
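
A sketch of the TF transformation on its own, to show how it dampens repeated occurrences. The default k = 1.2 is an assumed value for illustration, and the function name `bm25_tf` is hypothetical.

```python
def bm25_tf(count, k=1.2):
    """BM25 TF transformation: (k + 1) * c / (c + k), bounded above by k + 1."""
    return (k + 1) * count / (count + k)


# The transformed value grows with the count but levels off below k + 1.
for c in [0, 1, 2, 10, 100]:
    print(c, bm25_tf(c))
```

Each extra occurrence of a word adds less than the previous one, so a single heavily repeated term cannot dominate the score.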

Question 12

Q

What is the impact of document length on TR

Answer

A

Penalize long documents with a document length normalizer

Question 13

Q

Why is a document longer

Answer

A

The document uses more words (verbose) -> the extra length adds little meaning, needs more penalty

The document has more content -> the extra length is meaningful, needs less penalty

Question 14

Q

How to normalize/penalize doc length

Answer

A

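Use the pivoted length normalizer: Normalizer = 1 - b + b * (document length / average document length), with b in [0, 1]
Normalizer = 1 when the document length equals the average; longer documents get a value above 1 and shorter ones below 1 (when b > 0)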
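
A minimal sketch of the normalizer formula. The default b = 0.75 and the lengths used are assumed values for illustration, and the function name is hypothetical.

```python
def pivoted_normalizer(doc_length, avg_doc_length, b=0.75):
    """Normalizer = 1 - b + b * (doc_length / avg_doc_length), with 0 <= b <= 1."""
    return 1 - b + b * (doc_length / avg_doc_length)


# Hypothetical lengths, with an average document length of 100 words.
print(pivoted_normalizer(100, 100))       # average length
print(pivoted_normalizer(200, 100))       # longer than average
print(pivoted_normalizer(50, 100))        # shorter than average
print(pivoted_normalizer(200, 100, b=0))  # b = 0 turns normalization off
```

The average length acts as the pivot: documents above it are penalized, documents below it are rewarded, and b controls how strongly.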

(14 cards)