Probabilistic Model Flashcards

1
Q

What is a sample space?

A

The set of all possible outcomes of a random experiment

2
Q

What is an event?

A

A subset of the sample space, i.e. a collection of outcomes

3
Q

When is an event said to occur?

A

If the outcome of the random experiment is a member of the event set

4
Q

How is relevance determined by a probabilistic retrieval model?

A

Using the probability that a user who likes d would enter query q
Relevance(q,d) = p(q|d)

5
Q

What is the assumption made with the probabilistic model?

A

A user is formulating their query based on an imaginary relevant document

6
Q

What is a statistical language model?

A

Represents a probability distribution over word sequences.
Ex: p(“Today is Wednesday”) = 0.001 but p(“Today Wednesday is”) = 0.000000001

7
Q

What is a language model?

A

A probabilistic model that estimates the likelihood of a sequence of words based on patterns observed in training data. Higher probabilities are given to more likely word sequences

8
Q

What is the unigram language model?

A

A language model that generates text one word at a time, with each word being chosen independently from a distribution of words

9
Q

How is the probability of a phrase computed using a unigram language model?

A

Multiply the probabilities of the individual words
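
A minimal sketch of that multiplication, assuming a hypothetical three-word distribution (the dictionary `unigram` and its probabilities are invented for illustration, not taken from any corpus):

```python
from math import prod

# Hypothetical unigram distribution; the probabilities are made up for illustration.
unigram = {"today": 0.1, "is": 0.2, "wednesday": 0.05}

def phrase_probability(phrase, model):
    """Multiply the unigram probabilities of the words in the phrase."""
    # A word missing from the model contributes probability 0.0.
    return prod(model.get(word, 0.0) for word in phrase.lower().split())

p1 = phrase_probability("Today is Wednesday", unigram)
p2 = phrase_probability("Today Wednesday is", unigram)
```

Because each word is drawn independently, a unigram model ignores word order: `p1` and `p2` come out the same (up to floating-point rounding), which is a known limitation of the model.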

10
Q

How is the probability of a word determined with a unigram language model?

A

Based on word frequencies within a corpus of text that is relevant to the topic in question

11
Q

What is the maximum likelihood estimator?

A

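A method for estimating the probabilities of words in a unigram LM from their relative frequency in a document
P(w|d) = c(w,d) / |d|, where c(w,d) is the number of times w occurs in d and |d| is the total number of words in d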
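
As a rough sketch of the estimator, assuming simple lowercase whitespace tokenization (the function name `mle_unigram` and the sample sentence are illustrative only):

```python
from collections import Counter

def mle_unigram(doc_text):
    """Estimate P(w|d) as the count of w in d divided by the length of d."""
    words = doc_text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}

model = mle_unigram("the cat sat on the mat")
# "the" occurs 2 times out of 6 words, so model["the"] is 2/6;
# every other word occurs once, so its estimate is 1/6.
```

Words that never occur in the document simply have no entry in `model`, which corresponds to an estimate of zero.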

12
Q

What is the issue with the maximum likelihood estimator?

A

It doesn’t account for unseen words: words which may be relevant but do not appear in the document are assigned a probability of 0

13
Q

What is topic modelling?

A

A technique used to identify the main themes or topics present in a collection of documents. It can be approached using language models: if a new document frequently uses words with high probabilities under a certain topic, it probably belongs to that topic
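
One way to picture the last point is the sketch below. It assumes each topic already has a unigram model; the `topics` dictionary, its probabilities, and the `UNSEEN` floor for unknown words are all hypothetical choices made for illustration. Log probabilities are summed rather than multiplying raw probabilities, which gives the same ordering while avoiding very small numbers.

```python
import math

# Hypothetical topic language models; probabilities are invented for illustration.
topics = {
    "sports": {"game": 0.05, "team": 0.04, "score": 0.03},
    "food": {"recipe": 0.05, "cook": 0.04, "taste": 0.03},
}

UNSEEN = 1e-6  # assumed floor for words a topic model has not seen

def log_likelihood(text, model):
    """Sum the log probabilities of the words in text under one topic model."""
    return sum(math.log(model.get(w, UNSEEN)) for w in text.lower().split())

def most_likely_topic(text):
    """Return the topic whose model gives the text the highest likelihood."""
    return max(topics, key=lambda t: log_likelihood(text, topics[t]))

best = most_likely_topic("the team won the game")
```

Here `best` is `"sports"`, since "team" and "game" have much higher probability under that topic than the floor value they receive under "food".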

14
Q

What is association analysis?

A

Determining which words are semantically related to others. It analyzes the probabilities and patterns of word occurrences.
Uses the probability of a searched word to find words that occur in similar patterns

15
Q

How can we use the maximum likelihood estimator on a multi-word query?

A

Get the maximum likelihood estimate for each word and multiply them together

16
Q

What is the issue with using maximum likelihood estimator on a multi-word query?

A

If any query word doesn’t appear in the document, its estimate is 0, so the entire likelihood goes to 0
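
The effect is easy to reproduce in a small sketch, assuming whitespace tokenization and an illustrative one-sentence document (`mle_query_probability` is a hypothetical helper name):

```python
from collections import Counter

def mle_query_probability(query, doc_text):
    """Multiply the maximum likelihood estimates of each query word."""
    words = doc_text.lower().split()
    counts = Counter(words)  # Counter returns 0 for words that never occur
    p = 1.0
    for w in query.lower().split():
        p *= counts[w] / len(words)
    return p

doc = "the cat sat on the mat"
seen = mle_query_probability("the cat", doc)    # both words occur in doc
unseen = mle_query_probability("the dog", doc)  # "dog" never occurs, so the product is 0.0
```

A single missing word wipes out the contribution of every other query word, no matter how well they match the document.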

17
Q

How can we solve the issue with the maximum likelihood estimator?

A

Instead compute the query likelihood: how likely we are to observe a specific query from a document model

18
Q

How do you compute query likelihood?

A

Multiply the probabilities of each query word under the document’s model, then rank documents from highest to lowest likelihood
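
A minimal ranking sketch under these assumptions: each document's model is estimated by plain relative frequency with no smoothing, tokenization is by whitespace, and the three-document collection `docs` is hypothetical.

```python
from collections import Counter

# Hypothetical mini-collection for illustration.
docs = {
    "d1": "today is wednesday and it is sunny",
    "d2": "the cat sat on the mat",
    "d3": "wednesday is the day the cat naps",
}

def query_likelihood(query, doc_text):
    """Product of the relative frequencies of each query word in the document."""
    words = doc_text.lower().split()
    counts = Counter(words)
    score = 1.0
    for w in query.lower().split():
        score *= counts[w] / len(words)
    return score

def rank(query):
    """Return (document, score) pairs sorted from highest to lowest likelihood."""
    scores = {name: query_likelihood(query, text) for name, text in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank("wednesday is")
```

For the query "wednesday is", `d1` ranks first (score 2/49), `d3` second (1/49), and `d2` last with a score of 0 because it contains neither word.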
