10. Bayesian Statistics Flashcards

1
Q

Bayes’ Theorem

Elementary Version

A

P(A|B) = P(A∩B)/P(B)

= P(B|A)P(A) / [P(B|A)P(A)+P(B|A^c)P(A^c)]
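A quick numeric sketch in Python, with hypothetical numbers (prevalence P(A)=0.01, P(B|A)=0.95, false-positive rate P(B|A^c)=0.10):

    # hypothetical numbers for illustration
    p_A = 0.01                # P(A)
    p_B_given_A = 0.95        # P(B|A)
    p_B_given_Ac = 0.10       # P(B|A^c)

    # denominator: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
    p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

    # Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
    p_A_given_B = p_B_given_A * p_A / p_B
    print(round(p_A_given_B, 3))    # 0.088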

2
Q

Bayes’ Theorem

Events as Discrete Random Variables

A

P(X=x|Y=y) = P(Y=y|X=x)P(X=x) / [Σ_t P(Y=y|X=t)P(X=t)]

3
Q

Bayes’ Theorem

Events as Continuous Random Variables

A

f(x|y) = f(y|x)f(x) / [∫f(y|t)f(t)dt]

4
Q

Frequentist vs. Bayesian Approach

A
  • to the frequentist, probability is a long-run relative frequency
  • to the Bayesian, probability is a degree of subjective belief
5
Q

Statistical Inference

A
  • adopt a probability model for the data X, whose distribution depends on a parameter θ
  • use observed value X=x to make decisions about θ
  • translate the decision into a statement about the process that generated the data
6
Q

Parameter Definition

Frequentist vs. Bayesian

A
  • a frequentist defines a parameter as an unknown constant
  • a Bayesian defines a parameter as a random variable

7
Q

Model

Frequentist vs. Bayesian

A
  • frequentist: f(x)
  • Bayesian: f(x|θ) OR p(x|θ)

8
Q

Bayesian Models

A
  • choose a prior distribution to describe the uncertainty in the parameter: π(θ)
  • observe data
  • use Bayes’ Theorem to obtain a posterior distribution, π(θ|x)
  • this posterior distribution could be used as a prior for the next experiment
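
A minimal sketch of this workflow in Python, assuming a Beta(2, 2) prior on θ and Bernoulli data (the conjugate pair covered in a later card); the data values are illustrative:

    from scipy import stats

    a0, b0 = 2, 2                               # prior: θ ~ Beta(2, 2)
    data = [1, 0, 1, 1, 0, 1, 1, 1]             # observed Bernoulli trials (hypothetical)

    s, n = sum(data), len(data)
    a1, b1 = a0 + s, b0 + n - s                 # conjugate update
    posterior = stats.beta(a1, b1)              # π(θ|x) = Beta(8, 4)

    print(posterior.mean())                     # posterior mean of θ ≈ 0.667
    # this posterior could serve as the prior for the next experiment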
9
Q

Influence of the Prior

A
  • the most frequent objection to Bayesian statistics is the subjectivity of the choice of prior
  • however, when the data are informative and the model is good, the influence of the prior tends to 0 as the sample size increases
10
Q

How do you determine the posterior distribution?

A

π(θ|x) = f(x|θ)π(θ) / [∫f(x|t)π(t)dt]

∝ f(x|θ)π(θ)
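
A sketch of the same computation done numerically on a grid, assuming (purely for illustration) a N(θ, 1) model for a single observation x and a N(0, 2²) prior: f(x|θ)π(θ) is evaluated pointwise and then normalised by the integral in the denominator.

    import numpy as np
    from scipy import stats

    x = 1.3                                      # a single observation (hypothetical)
    theta = np.linspace(-10, 10, 2001)           # grid over the parameter
    dt = theta[1] - theta[0]

    prior = stats.norm(0, 2).pdf(theta)          # π(θ)
    likelihood = stats.norm(theta, 1).pdf(x)     # f(x|θ), viewed as a function of θ
    unnormalised = likelihood * prior            # ∝ π(θ|x)

    posterior = unnormalised / (unnormalised.sum() * dt)   # divide by ∫ f(x|t)π(t) dt
    print((theta * posterior * dt).sum())        # posterior mean, ≈ 1.04 here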

11
Q

What can you do with a posterior distribution?

A
  • give a point estimate of θ
  • test hypotheses

12
Q

Decision Theory

A

-any time you make a decision, you can lose something
-risk = expected loss
-the goal is to make decisions that minimise risk
d = d(x) ∈ D
-where d(x) is a decision based on the data and D is the decision space

13
Q

Decision Space

A
  • the set of all possible decisions that might be made based on the data
  • for estimation, D = parameter space
  • for hypothesis testing, D consists of two points (e.g. reject / do not reject)
14
Q

Loss Function

A

L = L(d(x), θ) ≥ 0

-when X and θ are random, L is a real-valued random variable

15
Q

Expected Loss

A

E(L) = E(E(L|X))

= ∫ [∫ L(d(x), θ) dπ(θ|x)] dP(x)

-where the inner integral is with respect to the posterior π(θ|x) and the outer with respect to the marginal distribution of X

16
Q

Bayes’ Decision

A
  • any decision d(x) that minimises the posterior expected loss for all x
  • meaning that it also minimises the overall expected loss (i.e. risk)
  • this is the theoretical basis for using the posterior distribution
17
Q

Prior Distribution

Beta Distribution

A

π(θ) = Γ(α+β)/[Γ(α)Γ(β)] * θ^(α-1) * (1-θ)^(β-1)

-for 0 < θ < 1, α > 0, β > 0
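
A small check of the density formula in Python, comparing it with scipy.stats.beta.pdf (the values of α, β and θ are arbitrary):

    from math import gamma
    from scipy import stats

    a, b, theta = 3.0, 5.0, 0.4                  # arbitrary α, β and a point in (0, 1)

    pdf = gamma(a + b) / (gamma(a) * gamma(b)) * theta**(a - 1) * (1 - theta)**(b - 1)
    print(pdf, stats.beta(a, b).pdf(theta))      # the two values agree (≈ 2.177)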

18
Q

Properties of the Beta Distribution

A
-defined on [0,1]
-E(θ) = α/(α+β)
-Var(θ) = αβ / [(α+β)²(α+β+1)]
-for α=β=1, the distribution is uniform
-can assume a variety of shapes depending on α and β
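
A sketch verifying the mean and variance formulas against scipy, with α = 2 and β = 6 chosen arbitrarily:

    from scipy import stats

    a, b = 2.0, 6.0                              # arbitrary α, β
    mean = a / (a + b)                           # E(θ) = α/(α+β)
    var = a * b / ((a + b)**2 * (a + b + 1))     # Var(θ) = αβ/[(α+β)²(α+β+1)]

    print(mean, var)                             # 0.25, 0.02083...
    print(stats.beta(a, b).mean(), stats.beta(a, b).var())   # same values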
19
Q

Conjugate Priors

A
  • for some model and prior combinations, the prior and posterior distributions will be from the same family
  • e.g. the beta distribution is a conjugate prior for a Bernoulli model
  • conjugate priors are very convenient and exist for many models
20
Q

Loss Function

Squared Error Loss

A

-many different functions can be used for the loss function, as long as they satisfy the property that a more wrong decision incurs a greater loss
-e.g. the squared error:
L(d,θ) = k (d-θ)²
-we can drop the proportionality constant k

21
Q

Minimise Expected Loss

A

-let μ = E(θ|X=x)
-then:
E(L(d,θ) | X=x)
= (d-μ)² + Var(θ|X=x)
-this is minimised when d=μ = E(θ|X=x)
-i.e. Bayes’ estimate under squared error loss is the posterior mean
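
A numerical sketch of this result, using a hypothetical Beta(8, 4) posterior: the posterior expected loss E[(d-θ)² | X=x] is estimated by Monte Carlo over a grid of decisions d, and the minimiser lands at the posterior mean.

    import numpy as np
    from scipy import stats

    posterior = stats.beta(8, 4)                         # hypothetical posterior π(θ|x)
    theta = posterior.rvs(size=100_000, random_state=0)  # draws from π(θ|x)

    d_grid = np.linspace(0, 1, 1001)                     # candidate decisions d
    risk = [np.mean((d - theta)**2) for d in d_grid]     # E[(d-θ)² | X=x] for each d

    print(d_grid[np.argmin(risk)])                       # ≈ 0.667 (up to grid/MC error)
    print(posterior.mean())                              # 8/12 ≈ 0.667, the posterior mean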