Week 5 Flashcards

1
Q

Basic Idea of Vector Space Model

A
  • Assumes relevance ≈ similarity

- the more similar a document is to the query, the higher its assumed relevance

2
Q

Vector Space Model Framework

A
  • Represent the document and the query as term vectors
    Query vector q holds a weight for each query term
    Document vector d holds a weight for each document term
3
Q

How is relevance measured in VSM

A

Based on the similarity of the two vectors (query vector and document vector)

4
Q

What is a bit vector representation

A

Each dimension is 1 if the word is present in the text and 0 if it is absent
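
A minimal sketch of a bit-vector representation (the vocabulary and text here are made-up examples, not from the course):

```python
# Toy bit-vector representation: dimension i is 1 if vocab[i]
# occurs in the text, 0 otherwise. Vocabulary is illustrative.
vocab = ["news", "about", "presidential", "campaign", "food"]

def bit_vector(text):
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

bit_vector("news about presidential campaign")  # -> [1, 1, 1, 1, 0]
```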

5
Q

What is the dot product instantiation of similarity?

A
  • Common way to instantiate the similarity function
  • Measures similarity between the document vector and the query vector
  • Yields a score used to rank documents
6
Q

Dot Product example

A

Get all words in the vocabulary V
Create the query vector: 1 for each word of V that appears in the query
Create each document vector: 1/0 for each word of V, per document
Compute the dot product of the query vector with each document vector
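
The steps above can be sketched as follows (toy vocabulary and documents, chosen for illustration):

```python
# Bit vectors for query and documents, ranked by dot product.
vocab = ["news", "about", "presidential", "campaign", "food"]

def bit_vector(text):
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

q = bit_vector("presidential campaign news")
docs = {
    "d1": "news about food",
    "d2": "news about presidential campaign",
}
scores = {name: dot(q, bit_vector(text)) for name, text in docs.items()}
# d2 shares three query words with q, d1 only one, so d2 ranks higher
```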

7
Q

Problem of VSM

A
  • should all terms have the same weight?
  • should some terms be more important than others?
  • should term frequency be considered?
8
Q

Improved VSM instantiation

A
  • make more specific terms more important

- add term frequency (TF) weighting to the vectors

9
Q

Improved VSM example

A

Rank with TF weighting

Build the query vector and each document vector from term frequencies (word counts), then compute the dot product
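
A sketch of the same dot-product ranking with count (TF) weights instead of bits, so repeated words contribute more (toy vocabulary, for illustration):

```python
from collections import Counter

vocab = ["news", "about", "presidential", "campaign"]

def tf_vector(text):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

q = tf_vector("presidential campaign")
d = tf_vector("news about presidential campaign presidential")
score = dot(q, d)  # "presidential" occurs twice in d: 1*2 + 1*1 = 3
```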

10
Q

How to penalize popular terms in VSM

A
IDF weighting
The function is smoothed with a log: IDF(w) = log((M + 1) / k)
M = total number of docs in the collection
k = number of docs containing word w
  • then compute the dot product using count × IDF(w) as each word's weight
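
The IDF formula above can be written directly; rarer words get larger weights, which is the point of the penalty:

```python
import math

# IDF(w) = log((M + 1) / k), where M = total docs, k = docs containing w
def idf(M, k):
    return math.log((M + 1) / k)

rare = idf(10, 1)     # word appears in 1 of 10 docs
common = idf(10, 10)  # word appears in all 10 docs
# rare > common, so matching a rare word contributes more to the score
```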
11
Q

VSM with BM25

A

Multiply each word's weight by the TF transformation (k + 1) · c(w,d) / (c(w,d) + k), then by IDF(w)
Ranking function: f(q,d) = Σ over matched words w of c(w,q) · [(k + 1) · c(w,d) / (c(w,d) + k)] · IDF(w)
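
A minimal sketch of this ranking function, without the length normalization that a later card adds; k = 1.2 is a common default chosen here for illustration, not given on the card:

```python
import math

def bm25_score(query_counts, doc_counts, M, df, k=1.2):
    # f(q,d) = sum over query words of c(w,q) * TF-transform * IDF(w)
    score = 0.0
    for w, cq in query_counts.items():
        c = doc_counts.get(w, 0)
        # Sublinear TF transformation, bounded above by k + 1
        tf = (k + 1) * c / (c + k)
        idf = math.log((M + 1) / df[w])
        score += cq * tf * idf
    return score
```

Because the TF transformation is bounded by k + 1, no single word can dominate the score no matter how often it repeats.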

12
Q

What is the impact of document length on TR

A
  • Penalize long documents with a document length normalizer
13
Q

Why might a document be long

A

document is verbose (more words, same content) -> length is meaningless, needs more penalty

document has more content -> length is meaningful, needs less penalty

14
Q

How to normalize/penalize doc length

A

Use a pivoted length normalizer: Normalizer = 1 - b + b · (document length / average document length)
It equals 1 when the document length is average; longer documents get a normalizer > 1 (more penalty), shorter ones < 1
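
The pivoted normalizer is a one-liner; b = 0.75 below is an illustrative choice (b ∈ [0, 1] controls how strong the penalty is, and is not fixed by the card):

```python
# Pivoted length normalizer: 1 - b + b * (|d| / avdl)
def pivoted_norm(doc_len, avg_len, b=0.75):
    return 1 - b + b * (doc_len / avg_len)

pivoted_norm(100, 100)  # average-length doc -> 1.0 (no penalty)
pivoted_norm(200, 100)  # longer than average -> > 1 (more penalty)
```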
