W7 Generative Language Models Flashcards

1
Q

what is a language model?

A

a simplified statistical model of text
- data driven as opposed to rule-based
- local context predicts the following words
- can be used to compute the probability of observing a sentence under a model of a language (fragment), rather than judging the syntactic well-formedness of that string
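
A minimal sketch of the idea, assuming a bigram model (each word is predicted from the one before it) with plain maximum-likelihood counts and no smoothing. The toy corpus and the function names `train_bigram` and `sentence_probability` are made up for illustration, and the probability is conditional on the first word of the sentence.

```python
from collections import Counter

def train_bigram(tokens):
    """Count single words and adjacent word pairs in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """Product of P(w_i | w_{i-1}) using raw relative frequencies."""
    prob = 1.0
    for prev, word in zip(sentence, sentence[1:]):
        if unigrams[prev] == 0:
            return 0.0  # the context word never occurred in the training text
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

# Hypothetical toy corpus
corpus = "the cat sat on the mat the cat ate".split()
unigrams, bigrams = train_bigram(corpus)

p_seen = sentence_probability("the cat sat".split(), unigrams, bigrams)
p_unseen = sentence_probability("the mat sat".split(), unigrams, bigrams)
print(p_seen, p_unseen)
```

The second sentence is grammatical but gets probability zero because the pair "mat sat" never occurs in the sample: the model scores likelihood under the data, not well-formedness.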

2
Q

language model application in IR

A

each document is represented by a language model
rank documents according to P(D|Q) = P(Q|D) P(D) / P(Q)
a simple model with memory = 0 (a unigram model: terms are chosen independently) works surprisingly well

3
Q

query-likelihood model

A

rank documents by the probability that the query was generated by the document's language model

RSV(Q,D) = \prod_{i=1}^{n} P(q_i|D)
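
A small sketch of this scoring rule, assuming unsmoothed unigram document models where P(q_i|D) is the relative frequency of the term in the document. The documents, the query, and the function name `query_likelihood` are hypothetical.

```python
from collections import Counter

def query_likelihood(query, doc):
    """RSV(Q, D): product over query terms of the term's relative frequency in D."""
    counts = Counter(doc)
    score = 1.0
    for term in query:
        score *= counts[term] / len(doc)
    return score

# Hypothetical toy documents and query
docs = {
    "d1": "language models rank documents with language models".split(),
    "d2": "smoothing helps language models".split(),
    "d3": "cats sit on mats".split(),
}
query = "language models".split()

scores = {name: query_likelihood(query, doc) for name, doc in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Without smoothing, any document that lacks a single query term (here `d3`) scores exactly zero.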

4
Q

why do we apply smoothing?

A

document texts are a sample from the language model => missing words should not have zero probability of occurring

smoothing: technique for estimating probabilities for missing words
- lower (or discount) the probability estimates for words that are seen in the document text
- assign that “left-over” probability to the estimates for the words that are not seen in the text

5
Q

what is the problem with discounting probability estimates?

A

all unseen terms are assigned an equal probability

new estimate for unseen terms: lambda * P(q_i|C)
this is the background probability: the probability for query word i in the collection language model

6
Q

JM smoothing

A

P(q_1 \dots q_n | D) = \prod_{j=1}^{n} \left[ (1 - \lambda) P(q_j|D) + \lambda P(q_j|C) \right]
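
A minimal sketch of Jelinek-Mercer (JM) smoothing, assuming each query term's probability is the mixture (1 - lambda) * P(q_j|D) + lambda * P(q_j|C) and that both models are relative frequencies. The toy texts, the function name `jm_score`, and the value lambda = 0.5 are illustrative assumptions.

```python
from collections import Counter

def jm_score(query, doc, collection, lam=0.5):
    """Product over query terms of the interpolated document/collection probability."""
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    score = 1.0
    for term in query:
        p_doc = doc_counts[term] / len(doc)                 # P(q_j | D)
        p_coll = coll_counts[term] / len(collection)        # P(q_j | C)
        score *= (1 - lam) * p_doc + lam * p_coll
    return score

# Hypothetical toy data: the collection is the document plus one other text
doc = "language models rank documents".split()
other = "smoothing avoids zero probabilities".split()
collection = doc + other

score = jm_score("language smoothing".split(), doc, collection, lam=0.5)
print(score)
```

The term "smoothing" does not occur in the document, yet the score stays above zero because the collection model contributes lambda * P(q_j|C).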

7
Q

Dirichlet smoothing

A

P(q_i|D) = \frac{freq(q_i, D) + \mu P(q_i|C)}{|D| + \mu}
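
A short sketch of the standard Dirichlet-smoothed estimate, (freq(q_i, D) + mu * P(q_i|C)) / (|D| + mu), where the collection model P(q_i|C) is a relative frequency. The toy texts, the function name `dirichlet_prob`, and mu = 4 are assumptions chosen to keep the arithmetic easy to follow; in practice mu is usually far larger.

```python
from collections import Counter

def dirichlet_prob(term, doc, collection, mu=2000.0):
    """Dirichlet-smoothed P(term | D) with the collection model as prior."""
    p_coll = Counter(collection)[term] / len(collection)    # P(term | C)
    return (Counter(doc)[term] + mu * p_coll) / (len(doc) + mu)

# Hypothetical toy data
doc = "language models rank documents".split()
other = "smoothing avoids zero probabilities".split()
collection = doc + other

p_seen = dirichlet_prob("language", doc, collection, mu=4.0)
p_unseen = dirichlet_prob("smoothing", doc, collection, mu=4.0)
total = sum(dirichlet_prob(t, doc, collection, mu=4.0) for t in set(collection))
print(p_seen, p_unseen, total)
```

Because mu is added to the document length, longer documents are smoothed less than shorter ones, and the estimates still sum to one over the vocabulary.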

8
Q

CLIR

A

Cross Language Information Retrieval: query and document are written in different languages => language models are instances of different feature spaces

solution:
1. translate documents or query
2. map language models

9
Q

relevance model

A

a language model representing information need
1. first pass ranking
2. estimate relevance model from query and top-ranked documents
3. (re)rank documents by similarity of document model to relevance model
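
The three steps can be sketched as follows. This is one possible instantiation, not the only one: it assumes JM-smoothed unigram document models, a relevance model built as the query-likelihood-weighted average of the top-k document models, and reranking by negative cross-entropy between relevance model and document model (which orders documents the same way as KL divergence). All names, the toy documents, and the parameter values are hypothetical, and every query term is assumed to occur somewhere in the collection.

```python
import math
from collections import Counter

def rerank_with_relevance_model(query, docs, k=2, lam=0.5):
    collection = [t for d in docs.values() for t in d]
    coll_counts, coll_len = Counter(collection), len(collection)
    vocab = set(collection)

    def p(term, name):
        """JM-smoothed P(term | document model)."""
        doc = docs[name]
        return (1 - lam) * doc.count(term) / len(doc) + lam * coll_counts[term] / coll_len

    # 1. First-pass ranking by query likelihood
    ql = {name: math.prod(p(q, name) for q in query) for name in docs}
    top = sorted(ql, key=ql.get, reverse=True)[:k]

    # 2. Relevance model: average of the top-k document models, weighted by query likelihood
    total = sum(ql[name] for name in top)
    rel = {t: sum(ql[name] / total * p(t, name) for name in top) for t in vocab}

    # 3. Rerank: higher (less negative) cross-entropy score means a closer document model
    def score(name):
        return sum(rel[t] * math.log(p(t, name)) for t in vocab)

    return sorted(docs, key=score, reverse=True), rel

# Hypothetical toy data
docs = {
    "d1": "language models rank documents".split(),
    "d2": "language models need smoothing".split(),
    "d3": "cats sit on mats".split(),
}
ranking, rel = rerank_with_relevance_model("language models".split(), docs, k=2)
print(ranking)
```

The relevance model gives weight to terms such as "smoothing" that are not in the query but occur in top-ranked documents, which is how it represents the wider information need.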
