MLE/MAP Flashcards

1
Q

What’s the difference between pmf and pdf?

A

pmf (probability mass function) applies to discrete random variables.

pdf (probability density function) applies to continuous random variables.

2
Q

What are two advantages of MAP?

A
  • It is easy to incorporate our prior assumptions about the value of θ by adjusting the ratio of gamma1 to gamma0.
  • It is easy to express our degree of certainty about our prior knowledge by adjusting the total volume of imaginary coin flips. For example, if we are highly certain of our prior belief that θ = 0.7, then we might use priors of gamma1 = 700 and gamma0 = 300 instead of gamma1 = 7 and gamma0 = 3.
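A minimal sketch of how these imaginary-flip priors behave (my own illustration, not from the card), assuming the common pseudo-count convention where gamma1 and gamma0 are added directly to the observed counts (i.e. a Beta(gamma1 + 1, gamma0 + 1) prior):

```python
def map_estimate(heads, tails, gamma1, gamma0):
    # MAP for a Bernoulli likelihood with imaginary-flip prior counts:
    # the pseudo-counts are simply added to the real counts.
    return (heads + gamma1) / (heads + tails + gamma1 + gamma0)

# Same prior belief (theta = 0.7) at two certainty levels, against data
# that says 0.3 (3 heads in 10 flips):
print(map_estimate(3, 7, 7, 3))      # 0.5    -- weak prior, data pulls hard
print(map_estimate(3, 7, 700, 300))  # ~0.696 -- strong prior dominates
```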
3
Q

What’s one key idea about the behavior of MAP?

A

As the volume of actual observed data grows toward infinity, the influence of our imaginary data goes to zero

4
Q

What’s the equation of MAP?

A

θ_MAP = argmax_θ P(θ|D) = argmax_θ P(D|θ) P(θ)
5
Q

What’s the equation of MLE?

A

θ_MLE = argmax_θ P(D|θ)
6
Q

What’s a key difference between MAP and MLE?

A
  • MAP assumes background knowledge is available, whereas MLE does not.
7
Q

For MLE and MAP, what happens as the size of the dataset grows?

A

The MLE and MAP estimates converge toward each other and toward the true parameter value.
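A quick simulation sketch of this convergence (my own example, reusing the imaginary-flip MAP estimate from card 2 with a deliberately wrong prior):

```python
import random

random.seed(0)
TRUE_THETA = 0.3
GAMMA1, GAMMA0 = 70, 30  # strong (and wrong) prior belief that theta = 0.7

flips = [1 if random.random() < TRUE_THETA else 0 for _ in range(100_000)]

for n in (10, 100, 1_000, 100_000):
    heads = sum(flips[:n])
    mle = heads / n
    map_est = (heads + GAMMA1) / (n + GAMMA1 + GAMMA0)
    print(f"n={n:>6}  MLE={mle:.3f}  MAP={map_est:.3f}")
# Both estimates approach the true theta = 0.3 as n grows.
```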

8
Q

Between MLE and MAP, which should perform better when there’s little data available?

A

MAP, because the influence of our prior assumptions has a larger impact when the data set is small.

9
Q

What’s the definition of MLE?

A

Maximum Likelihood Estimation: choose the parameter value θ that makes the observed data most probable, i.e. θ_MLE = argmax_θ P(D|θ)
10
Q

Define prior

A

The prior of an uncertain quantity is the probability distribution that expresses one’s beliefs about that quantity before any evidence is taken into account.

11
Q

What is iid?

A

independent, identically distributed

It’s a combination of two assumptions:

  • independent - the outcomes of different trials are independent
  • identically distributed - the same distribution governs each trial
12
Q

What is likelihood?

A

L(θ) = P(D|θ)

The probability of the observed data D given the parameters θ, viewed as a function of θ

13
Q

What’s the equation for log likelihood?

A

l(θ) = ln P(D|θ)

We can express P(D|θ) as a product of the individual examples’ probabilities because of the iid assumption, and the log then turns that product into a sum
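A minimal sketch in code (my own notation, assuming iid Bernoulli data):

```python
import math

def log_likelihood(data, theta):
    # l(theta) = sum_i ln p(x_i | theta); the sum is valid because the
    # examples are assumed iid.
    return sum(math.log(theta if x == 1 else 1 - theta) for x in data)

data = [1, 0, 1, 1, 0]
print(log_likelihood(data, 0.6))  # equals ln(0.6^3 * 0.4^2)
```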

14
Q

Why is there a log in the log likelihood?

A

The log is monotonically increasing, so it doesn’t change the argmax, and it turns products into sums, which makes the likelihood mathematically easier to maximize

15
Q

What is MLE maximizing? (simple expression)

A

P(D|theta)

16
Q

What is MAP maximizing? (simple expression) (2)

A

P(theta|D) or P(D|theta)P(theta)

17
Q

What is a Bernoulli random variable?

A
  • We have exactly one trial
  • we define “success” as a 1 and “failure” as a 0.
18
Q

What is the conjugate prior for estimating
the parameter theta of a Bernoulli distribution?

A

Beta distribution
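A short sketch of what conjugacy buys you (assuming the standard Beta(a, b) parameterization; not from the card): the posterior stays in the Beta family, so updating is just adding counts.

```python
def beta_posterior(a, b, heads, tails):
    # Beta(a, b) prior + Bernoulli data -> Beta(a + heads, b + tails).
    return a + heads, b + tails

a, b = 2.0, 2.0                # prior
a, b = beta_posterior(a, b, heads=8, tails=2)
print(a, b)                    # Beta(10.0, 4.0)
print((a - 1) / (a + b - 2))   # posterior mode (MAP) = 0.75
```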

19
Q

What do we know about the Dirichlet distribution?

A

The Dirichlet distribution is the generalization of the Beta distribution to more than two outcomes; it plays the same conjugate-prior role for categorical/multinomial models that the Beta plays for the Bernoulli.

20
Q

What’s the Principle of MLE? (in words)

A

Choose the parameter value θ under which the observed data is most probable.
21
Q

How would you describe the inductive bias of MLE?

A

MLE tries to allocate as much probability mass as possible to the things we have observed, at the expense of the things we have not observed

22
Q

What allows us to use the product of probabilities in the log likelihood equation?

A

the independence (iid) assumption

23
Q

How do you calculate the closed-form MLE? (5 steps)

A

  1. Write the likelihood P(D|θ) as a product over the iid training examples.
  2. Take the log to turn the product into a sum.
  3. Differentiate the log likelihood with respect to θ.
  4. Set the derivative equal to zero.
  5. Solve for θ.
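A worked sketch of the recipe for Bernoulli data (my own example, not from the card): steps 3–5 give θ_MLE = n1/n, which a numerical check confirms.

```python
import math

data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 7 heads, 3 tails

# Step 5's closed-form answer for a Bernoulli model: theta_hat = n1 / n.
closed_form = sum(data) / len(data)

# Numerical check: the log likelihood peaks at the same point.
def ll(theta):
    return sum(math.log(theta if x else 1 - theta) for x in data)

grid = [i / 1000 for i in range(1, 1000)]
print(closed_form, max(grid, key=ll))  # 0.7 and 0.7
```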
24
Q

What’s log(A*B*C)?

A

log(A) + log(B) + log(C)

25
Q

What’s a trick for rewriting probability distributions of binary variables? When is it useful?

A

Write p(x|θ) = θ^x (1−θ)^(1−x) for x ∈ {0, 1}; the exponent picks out the correct factor. Useful when calculating the log likelihood, among other uses
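Written out (standard Bernoulli derivation, not from the card), the trick turns the log likelihood into a sum with a clean closed form:

```latex
\ell(\theta) = \sum_{i=1}^{n}\bigl[x_i \ln\theta + (1 - x_i)\ln(1-\theta)\bigr]
             = n_1 \ln\theta + n_0 \ln(1-\theta)
\quad\Rightarrow\quad
\frac{d\ell}{d\theta} = \frac{n_1}{\theta} - \frac{n_0}{1-\theta} = 0
\quad\Rightarrow\quad
\hat{\theta}_{\mathrm{MLE}} = \frac{n_1}{n_1 + n_0}
```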

26
Q

What’s one difference between MLE and MAP?

A

for MAP we’re reasoning about p(θ|D), whereas with MLE we’re reasoning about p(D|θ)

27
Q

In MAP, what form will the prior take? Why?

A
  • A pdf, because the parameters θ are typically continuous (e.g., a coin’s bias can take any value in [0, 1])
28
Q

What is a posterior?

A
  • aka posterior probability
  • It’s the updated probability you assign to a quantity after new evidence is taken into account
  • Formed by combining your prior probability (i.e. the prior) with the likelihood of the new evidence
29
Q

What is the equation for MAP with each part labeled?

A

θ_MAP = argmax_θ P(θ|D) = argmax_θ [ P(D|θ) · P(θ) / P(D) ], where P(θ|D) is the posterior, P(D|θ) is the likelihood, P(θ) is the prior, and P(D) is the evidence.
30
Q

Why can we drop P(D) from the MAP equation?

A

because P(D) does not depend on θ, so it won’t affect the argmax of the expression

31
Q

What’s important about beta distributions?

A

It’s the conjugate prior for the Bernoulli likelihood model: with a Beta prior, the posterior is also a Beta distribution

32
Q

What is one alternative view of what ML is doing?

A

Viewing ML as function approximation is an alternative view

33
Q

What does Naive Bayes give us?

A

Closed-form solutions for the MLE and MAP parameter estimates (both reduce to counting)
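A sketch of what those closed forms look like for binary features (my own illustration with made-up data; `pseudo` is a hypothetical smoothing parameter): both estimates reduce to counting.

```python
# Made-up data: X holds binary feature vectors, y holds binary labels.
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 1]]
y = [1, 1, 0, 0, 1]

def nb_estimates(X, y, pseudo=0.0):
    # pseudo = 0 gives the MLE; pseudo > 0 gives a smoothed MAP-style estimate.
    n = len(y)
    prior = {c: (y.count(c) + pseudo) / (n + 2 * pseudo) for c in (0, 1)}
    cond = {}  # cond[(j, c)] = P(x_j = 1 | y = c)
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        for j in range(len(X[0])):
            cond[(j, c)] = (sum(x[j] for x in rows) + pseudo) / (len(rows) + 2 * pseudo)
    return prior, cond

print(nb_estimates(X, y))              # raw frequencies (MLE)
print(nb_estimates(X, y, pseudo=1.0))  # Laplace-smoothed (a simple MAP)
```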

34
Q

What’s the problem with the Naive Bayes assumption?

A

The features might not actually be conditionally independent given the class

35
Q

Define conditional independence

A

X and Y are conditionally independent given Z iff p(x,y|z) = p(x|z) · p(y|z) for all values of x, y, and z
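A tiny numeric sketch (made-up joint distribution, not from the card) that checks the definition directly:

```python
from itertools import product

p_z = {0: 0.4, 1: 0.6}
p_x1_given_z = {0: 0.2, 1: 0.9}  # P(X = 1 | Z = z)
p_y1_given_z = {0: 0.5, 1: 0.1}  # P(Y = 1 | Z = z)

def joint(x, y, z):
    # Built so that X and Y are conditionally independent given Z.
    px = p_x1_given_z[z] if x else 1 - p_x1_given_z[z]
    py = p_y1_given_z[z] if y else 1 - p_y1_given_z[z]
    return p_z[z] * px * py

# Verify p(x, y | z) == p(x | z) * p(y | z) for every assignment.
for x, y, z in product((0, 1), repeat=3):
    px = p_x1_given_z[z] if x else 1 - p_x1_given_z[z]
    py = p_y1_given_z[z] if y else 1 - p_y1_given_z[z]
    assert abs(joint(x, y, z) / p_z[z] - px * py) < 1e-12
print("X and Y are conditionally independent given Z")
```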

36
Q

If we need to think of variations of the MLE/MAP equation, what should you use?

A

The definition of conditional probability: p(A|B) = p(A,B) / p(B)

37
Q

What are the most basic equations for:

  1. MLE
  2. MAP
A

  1. MLE: θ_MLE = argmax_θ P(D|θ)
  2. MAP: θ_MAP = argmax_θ P(θ|D)
38
Q

What’s a shortcoming of MLE? (slide 27 of lecture 17)

A

With little data, MLE can overfit: it puts all probability mass on the outcomes that happened to be observed (e.g., after seeing 3 heads in 3 flips it estimates θ = 1), since there is no prior to temper the estimate.
39
Q

What’s the recipe for closed form MAP estimation?

A

Same as the closed-form MLE recipe, but maximize the log posterior instead: write P(D|θ)P(θ), take the log to get ln P(D|θ) + ln P(θ), differentiate with respect to θ, set the derivative to zero, and solve for θ.
40
Q

What’s the expression for MLE? (2)

A

  1. θ_MLE = argmax_θ P(D|θ)
  2. equivalently, θ_MLE = argmax_θ ln P(D|θ)
41
Q

What’s the expression for MAP? (4)

A

  1. θ_MAP = argmax_θ P(θ|D)
  2. = argmax_θ P(D|θ) P(θ) / P(D) (Bayes’ rule)
  3. = argmax_θ P(D|θ) P(θ) (drop P(D); it doesn’t depend on θ)
  4. = argmax_θ [ ln P(D|θ) + ln P(θ) ]
44
Q

What is the expression for the posterior?

A

p(θ|D)

45
Q

What is the expression for the prior?

A

p(θ)

46
Q

Compare MAP and MLE

A
  • MAP has more bias
  • MLE has more variance
47
Q
  • When should you use MAP?
  • When should you use MLE?
A
  • Use MAP when the prior is good enough (i.e. you have reason to believe the case you’re working with is similar to something you’ve seen previously)
  • Otherwise use MLE
48
Q

What do you use to rewrite a conditional probability of MAP using Bayes’ rule?

A

You only need the definition of conditional probability, p(A|B) = p(A,B) / p(B)

49
Q

What does a prior relate to?

A

An uncertain quantity/distribution

50
Q

What is the equation for Bayes’ theorem, with each term labeled?

A
  • Posterior = Likelihood * Prior / Evidence
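In symbols (standard statement, matching the labels above):

```latex
\underbrace{P(\theta \mid D)}_{\text{posterior}}
  = \frac{\overbrace{P(D \mid \theta)}^{\text{likelihood}}
          \;\overbrace{P(\theta)}^{\text{prior}}}
         {\underbrace{P(D)}_{\text{evidence}}}
```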
51
Q

In words, what is the difference between MLE and MAP?

A

MLE gives you the value of θ that maximises the likelihood P(D|θ), while MAP gives you the value that maximises the posterior probability P(θ|D).