MLE/MAP Flashcards
What’s the difference between pmf and pdf?
A pmf (probability mass function) applies to discrete random variables.
A pdf (probability density function) applies to continuous random variables.
What are two advantages of MAP?
- It is easy to incorporate our prior assumptions about the value of theta by adjusting the ratio of gamma1 to gamma0.
- It is easy to express our degree of certainty about our prior knowledge by adjusting the total volume of imaginary coin flips. For example, if we are highly certain of our prior belief that theta = 0.7, then we might use priors of gamma1 = 700 and gamma0 = 300 instead of gamma1 = 7 and gamma0 = 3 (see the sketch below).
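A minimal numeric sketch of this pseudo-count view. The function names and observed counts are illustrative, and it assumes the convention where the imaginary flips gamma1/gamma0 are simply added to the observed counts:

```python
# MAP vs MLE estimate of a Bernoulli parameter theta, treating the prior as
# gamma1 "imaginary" heads and gamma0 "imaginary" tails (pseudo-counts).
def mle_estimate(n_heads, n_tails):
    return n_heads / (n_heads + n_tails)

def map_estimate(n_heads, n_tails, gamma1, gamma0):
    # Imaginary flips are added directly to the observed counts.
    return (n_heads + gamma1) / (n_heads + n_tails + gamma1 + gamma0)

n_heads, n_tails = 3, 7                          # observed data (illustrative)
print(mle_estimate(n_heads, n_tails))            # 0.3 -- ignores the prior
print(map_estimate(n_heads, n_tails, 7, 3))      # 0.5 -- pulled a little toward 0.7
print(map_estimate(n_heads, n_tails, 700, 300))  # ~0.696 -- pulled strongly toward 0.7
```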
What’s one key idea about the behavior of MAP?
As the volume of actual observed data grows toward infinity, the influence of our imaginary data goes to zero
What’s the equation of MAP?
theta_MAP = argmax_theta P(theta|D) = argmax_theta P(D|theta) * P(theta)
What’s the equation of MLE?
theta_MLE = argmax_theta P(D|theta)
What’s a key difference between MAP and MLE?
- MAP assumes background knowledge is available, whereas MLE does not.
For MLE and MAP, what happens as the size of the dataset grows?
The MLE estimate and the MAP estimate converge toward each other and toward the true parameter value
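A small illustrative simulation of this convergence, assuming a Bernoulli model and the pseudo-count form of the MAP estimate (all numbers are made up for illustration):

```python
import random

# As the number of observed flips grows, the MLE and MAP estimates approach
# each other and the true parameter (0.7 here).
random.seed(0)
true_theta = 0.7
gamma1, gamma0 = 7, 3          # prior belief centered at 0.7

flips = []
for n in (10, 100, 10_000):
    flips += [1 if random.random() < true_theta else 0 for _ in range(n - len(flips))]
    n1 = sum(flips)
    mle = n1 / n
    map_ = (n1 + gamma1) / (n + gamma1 + gamma0)
    print(n, round(mle, 3), round(map_, 3))
```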
Between MLE and MAP, which should perform better when there’s little data available?
MAP because the influence of our assumptions from previous knowledge has a larger impact when the data set is small
What’s the definition of MLE?
Maximum likelihood estimation: choose the parameter value theta that maximizes the probability of the observed data, theta_MLE = argmax_theta P(D|theta).
Define prior
The prior of an uncertain quantity is the probability distribution that expresses one’s beliefs about that quantity before any evidence is taken into account.
What is iid?
independent, identically distributed
It’s a combination of two assumptions:
- the outcomes of different trials are independent
- identically distributed - the same distribution governs each trial
What is likelihood?
L(theta) = P(D|theta)
The probability of the data set given the probability parameters (theta)
What’s the equation for log likelihood?
l(theta) = ln (P(D|theta))
We can express P(D|theta) as a product of the probabilities of the individual observations because of the iid (independence) assumption
Why is there a log in the log likelihood?
The log is monotonically increasing, so maximizing the log likelihood gives the same argmax as maximizing the likelihood, but it turns the product over iid observations into a sum, which is easier to differentiate and more numerically stable
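A small sketch comparing the two for iid Bernoulli data, using the theta^x * (1 - theta)^(1 - x) rewrite described later in these cards (the data and the grid search are illustrative):

```python
import math

def likelihood(theta, flips):
    # Product of per-flip probabilities, P(x|theta) = theta^x * (1 - theta)^(1 - x)
    p = 1.0
    for x in flips:
        p *= theta ** x * (1 - theta) ** (1 - x)
    return p

def log_likelihood(theta, flips):
    # The log turns the product into a sum; the argmax over theta is unchanged.
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in flips)

flips = [1, 0, 1, 1, 0, 1, 1]
grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=lambda t: likelihood(t, flips)))      # same grid argmax ...
print(max(grid, key=lambda t: log_likelihood(t, flips)))  # ... as the log version
print(sum(flips) / len(flips))                            # closed-form MLE n1/n
```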
What is MLE maximizing? (simple expression)
P(D|theta)
What is MAP maximizing? (simple expression) (2)
P(theta|D) or P(D|theta)P(theta)
What is a Bernoulli random variable?
- We have exactly one trial with two possible outcomes
- we define “success” as a 1 and “failure” as a 0.
What is the conjugate prior for estimating the parameter theta of a Bernoulli distribution?
Beta distribution
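A quick illustration of what that conjugacy buys us, assuming SciPy is available and the standard Beta(a, b) parameterization (the prior and the counts are illustrative):

```python
from scipy.stats import beta

# Conjugacy in action: Beta prior + Bernoulli counts -> Beta posterior,
# with the observed counts simply added to the prior's parameters.
a, b = 7, 3                # prior Beta(a, b): roughly 7 imaginary heads, 3 tails
n1, n0 = 30, 70            # observed heads / tails (illustrative)
posterior = beta(a + n1, b + n0)
print(posterior.mean())    # posterior mean = (a + n1) / (a + b + n1 + n0)
# Under this parameterization the posterior mode (the MAP estimate) is
# (a + n1 - 1) / (a + b + n1 + n0 - 2); some course notes define the
# hyperparameters so that the -1s cancel.
```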
What do we know about the Dirichlet distribution?
The Dirichlet distribution is a generalization of the Beta distribution to variables with more than two outcomes; it plays the same conjugate-prior role for the categorical/multinomial distribution that the Beta plays for the Bernoulli.
What’s the Principle of MLE? (in words)
Choose the parameter value theta that makes the observed data as probable as possible.
How would you describe the inductive bias of MLE?
MLE tries to allocate as much probability mass as possible to the things we have observed, at the expense of the things we have not observed
What allows us to use the product of probabilities in the log likelihood equation?
the independence assumption
How do you calculate the closed-form MLE? (5 steps)
1. Write down the likelihood P(D|theta).
2. Take the log to get the log likelihood l(theta).
3. Differentiate l(theta) with respect to theta.
4. Set the derivative equal to zero.
5. Solve for theta.
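A sketch of this recipe carried out symbolically with SymPy for iid Bernoulli data; the counts below (and the exact way the steps are split) are illustrative:

```python
import sympy as sp

# Closed-form MLE for a Bernoulli parameter, derived symbolically.
theta = sp.symbols('theta', positive=True)
n1, n0 = 6, 4                                            # observed counts of 1s and 0s

log_lik = n1 * sp.log(theta) + n0 * sp.log(1 - theta)    # steps 1-2
stationary = sp.solve(sp.Eq(sp.diff(log_lik, theta), 0), theta)  # steps 3-5
print(stationary)   # [3/5], i.e. theta_MLE = n1 / (n1 + n0)
```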
What’s log(A*B*C)?
log(A) + log(B)+log(C)
What’s a trick for rewriting probability distributions of binary variables? When is it useful?
Write the Bernoulli pmf as P(x|theta) = theta^x * (1 - theta)^(1 - x), so one expression covers both x = 1 and x = 0. Useful when calculating the log likelihood, among other uses.
What’s one difference between MLE and MAP?
for MAP we’re reasoning about p(theta|D), whereas with MLE we’re reasoning about p(D|theta)
In MAP, what form will the prior take? Why?
- A pdf, because the parameters theta are usually continuous (e.g. theta in [0, 1] for a Bernoulli)
What is a posterior?
- aka posterior probability
- It’s the updated probability distribution over a quantity after you see new evidence
- Formulated by combining your prior probability (i.e. the prior) with the likelihood of the new evidence, via Bayes’ rule
What is the equation for MAP with each part labeled?
theta_MAP = argmax_theta P(theta|D) = argmax_theta P(D|theta) * P(theta) / P(D), where P(theta|D) is the posterior, P(D|theta) is the likelihood, P(theta) is the prior, and P(D) is the evidence (which can be dropped; see the next card).
Why can we drop P(D) from the MAP equation?
because P(D) does not depend on theta, so it won’t affect the argmax of the expression
What’s important about beta distributions?
It’s the conjugate prior for the Bernoulli likelihood model: a Beta prior combined with Bernoulli data yields a Beta posterior
What is one alternative to the probabilistic (MLE/MAP) view of ML?
Viewing ML as function approximation is an alternative view
What does Naive Bayes give us?
A closed-form solution for MLE and MAP.
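A rough sketch of what those closed-form (counting) estimates look like for a tiny binary Naive Bayes problem; the data and variable names are invented for illustration:

```python
from collections import Counter, defaultdict

# Closed-form (counting) MLE for Naive Bayes parameters on toy binary data.
X = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]   # two binary features per example
y = [1, 1, 0, 0, 1]

label_counts = Counter(y)
p_y = {c: n / len(y) for c, n in label_counts.items()}      # MLE of P(Y = c)

p_x1_given_y = defaultdict(dict)                             # MLE of P(X_i = 1 | Y = c)
for c in label_counts:
    rows = [x for x, yc in zip(X, y) if yc == c]
    for i in range(len(X[0])):
        p_x1_given_y[c][i] = sum(r[i] for r in rows) / len(rows)
# For the MAP / "smoothed" version, add pseudo-counts to numerator and denominator.

print(p_y)
print(dict(p_x1_given_y))
```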
What’s the problem with the Naive Bayes assumption?
The features might not actually be independent
Define conditional independence
X and Y are conditionally independent given Z iff p(x,y|z)=p(x|z)*p(y|z)
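A small numeric check of that definition. The joint distribution below is built, by construction, from factors p(z), p(x|z), p(y|z) with invented numbers, so the identity should hold exactly:

```python
from itertools import product

# Build a joint p(x, y, z) from p(z), p(x|z), p(y|z), then verify
# p(x, y | z) = p(x|z) * p(y|z) for every x, y, z.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in product([0, 1], repeat=3)}

for z in (0, 1):
    pz = sum(v for (x, y, zz), v in joint.items() if zz == z)
    for x, y in product([0, 1], repeat=2):
        lhs = joint[(x, y, z)] / pz                       # p(x, y | z)
        rhs = p_x_given_z[z][x] * p_y_given_z[z][y]       # p(x|z) * p(y|z)
        assert abs(lhs - rhs) < 1e-12
print("conditional independence holds for this table")
```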
If we need to think of variations of the MLE/MAP equation, what should you use?
The conditional probability equation

What are the most basic equations for:
- MLE
- MAP
- MLE: theta_MLE = argmax_theta P(D|theta)
- MAP: theta_MAP = argmax_theta P(theta|D) = argmax_theta P(D|theta) * P(theta)
What’s a shortcoming of MLE? (slide 27 of lecture 17)
With little data it can badly overfit: e.g. if three coin flips all come up heads, the MLE is theta = 1, assigning zero probability to ever seeing tails.
What’s the recipe for closed form MAP estimation?
Same as the MLE recipe, but maximize P(D|theta) * P(theta) instead of P(D|theta): write it down, take the log, differentiate with respect to theta, set the derivative to zero, and solve for theta.
What’s the expression for MLE? (2)
theta_MLE = argmax_theta P(D|theta) = argmax_theta log P(D|theta)
What’s the expression for MAP? (4)
theta_MAP = argmax_theta P(theta|D) = argmax_theta P(D|theta) * P(theta) / P(D) = argmax_theta P(D|theta) * P(theta) = argmax_theta [log P(D|theta) + log P(theta)]
What is the expression for the posterior?
p(θ|D)
What is the expression for the prior?
p(θ)
Compare MAP and MLE
- MAP has more bias
- MLE has more variance
- When should you use MAP, and when should you use MLE?
- Use MAP when the prior is good enough (i.e. you have reason to believe the case you’re working with is similar to something you’ve seen previously)
- Otherwise use MLE (see the simulation sketch below)
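An illustrative simulation of the bias/variance point above, assuming a Bernoulli model, the pseudo-count MAP estimate, and a deliberately miscentered prior (all numbers are made up):

```python
import random

# Repeated small samples from a Bernoulli(0.7): the MAP estimate (with a prior
# centered at 0.5) is biased toward the prior but varies less across datasets
# than the MLE.
random.seed(1)
true_theta, n, trials = 0.7, 10, 2000
gamma1, gamma0 = 5, 5

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

mles, maps = [], []
for _ in range(trials):
    n1 = sum(random.random() < true_theta for _ in range(n))
    mles.append(n1 / n)
    maps.append((n1 + gamma1) / (n + gamma1 + gamma0))

print(f"MLE bias {mean(mles) - true_theta:+.3f}  variance {variance(mles):.4f}")
print(f"MAP bias {mean(maps) - true_theta:+.3f}  variance {variance(maps):.4f}")
```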
What do you use to rewrite the conditional probability in MAP using Bayes’ rule?
You only need the conditional probability equation, P(A|B) = P(A, B) / P(B)
What does a prior relate to?
An uncertain quantity/distribution
What is the equation for bayes theorem, with each term labeled?
- P(theta|D) = P(D|theta) * P(theta) / P(D)
- Posterior = Likelihood * Prior / Evidence
In words, what is the difference between MLE and MAP?
MLE gives you the value of θ which maximises the likelihood P(D|θ), while MAP gives you the value which maximises the posterior probability P(θ|D).