Probabilistic Model Flashcards

Question 1

Q

What is a sample space?

Answer

A

The set of all possible outcomes of a random experiment

Question 2

Q

What is an event?

Answer

A

A subset of the sample space, that is, a collection of outcomes

Question 3

Q

When is an event said to occur?

Answer

A

If the outcome of the random experiment is a member of the event set
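
A minimal sketch of the three definitions above, assuming a single roll of a six-sided die as the random experiment. The die, the event "the roll is even", and the helper name occurs are illustrative choices, not part of the original cards.

```python
# Sample space for one roll of a six-sided die (illustrative choice).
sample_space = {1, 2, 3, 4, 5, 6}

# An event is a subset of the sample space: here, "the roll is even".
even_roll = {2, 4, 6}
assert even_roll <= sample_space  # subset check


def occurs(event, outcome):
    """An event occurs when the observed outcome is a member of the event set."""
    return outcome in event


print(occurs(even_roll, 4))  # True
print(occurs(even_roll, 5))  # False
```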

Question 4

Q

How is relevance determined by a probabilistic retrieval model?

Answer

A

By the probability that a user who likes document d would enter query q:
Relevance(q,d) = p(q|d)

Question 5

Q

What is the assumption made with the probabilistic model?

Answer

A

A user is formulating their query based on an imaginary relevant document

Question 6

Q

What is a statistical language model?

Answer

A

Represents a probability distribution over word sequences.
Ex: p("Today is Wednesday") = 0.001 but p("Today Wednesday is") = 0.000000001

Question 7

Q

What is a language model?

Answer

A

A probabilistic model that estimates the likelihood of a sequence of words based on patterns observed in training data. Higher probabilities are given to more likely word sequences

Question 8

Q

What is the unigram language model?

Answer

A

A language model that generates text one word at a time, with each word being chosen independently from a distribution of words

Question 9

Q

How is the probability of a phrase computed using a unigram language model?

Answer

A

Multiply the probabilities of the individual words
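
As a rough illustration of that product, the sketch below multiplies per-word probabilities taken from a small table. The table word_probs and its numbers are made up for the example, and phrase_probability is a hypothetical helper name.

```python
import math

# Hypothetical unigram probabilities; the numbers are invented for illustration.
word_probs = {"today": 0.01, "is": 0.05, "wednesday": 0.002}


def phrase_probability(words, probs):
    """Multiply per-word probabilities; words missing from the table get 0."""
    return math.prod(probs.get(w, 0.0) for w in words)


p1 = phrase_probability(["today", "is", "wednesday"], word_probs)
p2 = phrase_probability(["today", "wednesday", "is"], word_probs)

# Words are drawn independently, so word order does not change the result.
print(math.isclose(p1, p2))  # True
```

Because each word is chosen independently, a unigram model gives both orderings the same probability.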

Question 10

Q

How is the probability of a word determined with a unigram language model?

Answer

A

Based on frequencies within a corpus of text that is relevant to the topic in question

Question 11

Q

What is the maximum likelihood estimator?

Answer

A

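A method for estimating the probabilities of words in a unigram LM from their relative frequencies in a document
P(w|d) = c(w,d) / |d|
where c(w,d) is the number of times word w occurs in document d and |d| is the total number of words in d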
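
A small sketch of this estimator, assuming the document is already split into whitespace-separated tokens. The function name mle_unigram and the toy document are illustrative only.

```python
from collections import Counter


def mle_unigram(tokens):
    """Return P(w|d) = c(w,d) / |d| for every word w seen in the document."""
    counts = Counter(tokens)  # c(w, d) for each distinct word
    total = len(tokens)       # |d|, the document length in tokens
    return {w: c / total for w, c in counts.items()}


doc = "the cat sat on the mat".split()
model = mle_unigram(doc)

print(model["the"])           # 2 occurrences out of 6 tokens
print(model.get("dog", 0.0))  # 0.0, because "dog" never appears in the document
```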

Question 12

Q

What is the issue with the maximum likelihood estimator?

Answer

A

It assigns zero probability to unseen words, that is, words that may be relevant but do not appear in the document

Question 13

Q

What is topic modelling?

Answer

A

A technique used to identify the main themes or topics present in a collection of documents. Can be solved using language models. If you have a new document that frequently uses words with high probabilities under a certain topic, it probably belongs to that topic

Question 14

Q

What is association analysis?

Answer

A

Determining which words are semantically related to others. It analyzes the probabilities and patterns of word occurrences.
Uses the probability of a searched word to find words that occur in similar contexts

Question 15

Q

How can we use the maximum likelihood estimator on a multi-word query?

Answer

A

Get the maximum likelihood estimate for each word and multiply them together

Question 16

Q

What is the issue with using maximum likelihood estimator on a multi-word query?

Answer

A

If any query word doesn’t appear in the document, the entire likelihood goes to 0
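
The effect can be seen in a short sketch, assuming maximum likelihood estimates computed from a single toy document. The function name query_likelihood_mle and the example text are hypothetical, and exact fractions are used only to keep the arithmetic easy to read.

```python
from collections import Counter
from fractions import Fraction


def query_likelihood_mle(query, doc_tokens):
    """Product of the MLE probabilities c(w,d) / |d| over the query words."""
    counts = Counter(doc_tokens)
    likelihood = Fraction(1)
    for w in query:
        # counts[w] is 0 for a word that never occurs in the document.
        likelihood *= Fraction(counts[w], len(doc_tokens))
    return likelihood


doc = "information retrieval ranks documents by relevance".split()

print(query_likelihood_mle(["information", "retrieval"], doc))  # 1/36
print(query_likelihood_mle(["information", "search"], doc))     # 0
```

A single unseen word ("search") is enough to zero out the whole product, even though the other query word matches.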

Question 17

Q

How can we solve the issue with the maximum likelihood estimator?

Answer

A

Instead, compute the query likelihood: how likely we are to observe a specific query from a document model

Question 18

Q

How do you compute query likelihood?

Answer

A

Multiply the probabilities of each query word under the document's model, then rank documents from highest to lowest likelihood
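
One possible sketch of this ranking, assuming each document model is a plain relative-frequency unigram model. The three toy documents, their ids, and the function names query_likelihood and rank are all made up for illustration.

```python
import math
from collections import Counter


def query_likelihood(query, doc_tokens):
    """Multiply the relative frequency of each query word in the document."""
    counts = Counter(doc_tokens)
    return math.prod(counts[w] / len(doc_tokens) for w in query)


def rank(query, docs):
    """Return document ids sorted from highest to lowest query likelihood."""
    scores = {doc_id: query_likelihood(query, toks) for doc_id, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy collection; the contents are invented for the example.
docs = {
    "d1": "probabilistic retrieval model".split(),
    "d2": "retrieval model retrieval evaluation".split(),
    "d3": "language model".split(),
}

ranking = rank(["retrieval", "model"], docs)
print(ranking)  # ['d2', 'd1', 'd3']
```

Here d2 scores 2/4 * 1/4, d1 scores 1/3 * 1/3, and d3 scores 0 because it never mentions "retrieval".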

Question 19

Q