10. Bayesian Statistics Flashcards
Bayes’ Theorem
Elementary Version
P(A|B) = P(A∩B)/P(B)
= P(B|A)P(A) / [P(B|A)P(A)+P(B|A^c)P(A^c)]
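A quick numeric sketch of the elementary form, using made-up disease-testing numbers (prevalence, sensitivity, false-positive rate are all hypothetical):

```python
p_a = 0.01        # P(A): prevalence of the condition (assumed)
p_b_a = 0.95      # P(B|A): test sensitivity (assumed)
p_b_ac = 0.05     # P(B|A^c): false-positive rate (assumed)

p_b = p_b_a * p_a + p_b_ac * (1 - p_a)   # denominator: total probability P(B)
p_a_b = p_b_a * p_a / p_b                # P(A|B)
print(round(p_a_b, 3))                   # → 0.161
```

Even with a fairly accurate test, the low prior P(A) keeps the posterior modest.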
Bayes’ Theorem
Events as Discrete Random Variables
P(X=x|Y=y) = P(Y=y|X=x)P(X=x) / [ΣP(Y=y|X=t)P(X=t)]
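A minimal sketch of the discrete form: X ranges over three hypothetical coin biases (values chosen for illustration), Y is a single flip, and we compute the posterior over X after observing a head:

```python
priors = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}       # P(X = x), assumed uniform

def lik(y, x):                                      # P(Y = y | X = x)
    return x if y == 1 else 1 - x

y = 1                                               # observed: one head
num = {x: lik(y, x) * p for x, p in priors.items()}
z = sum(num.values())                               # Σ_t P(Y = y | X = t) P(X = t)
post = {x: v / z for x, v in num.items()}           # P(X = x | Y = y)
```

The head shifts mass toward the head-heavy coin: post[0.8] ≈ 0.533.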
Bayes’ Theorem
Events as Continuous Random Variables
f(x|y) = f(y|x)f(x) / [∫f(y|t)f(t)dt]
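For one continuous case the integral can be done in closed form. A conjugate normal-normal sketch with illustrative numbers: x | θ ~ N(θ, σ²) with σ² known, prior θ ~ N(μ0, τ²), and the posterior is again normal:

```python
mu0, tau2 = 0.0, 4.0     # prior mean and variance (assumed)
sigma2 = 1.0             # known observation variance (assumed)
x = 2.5                  # observed data (assumed)

prec = 1 / sigma2 + 1 / tau2                     # posterior precision
post_var = 1 / prec                              # here 0.8
post_mean = post_var * (x / sigma2 + mu0 / tau2) # precision-weighted average, here 2.0
```

The posterior mean sits between the prior mean and the observation, weighted by their precisions.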
Frequentist vs. Bayesian Approach
- to the frequentist, probability is long-run relative frequency
- to the Bayesian, probability is a degree of subjective belief
Statistical Inference
- adopt a probability model for data X, distribution of X depends on parameter θ
- use observed value X=x to make decisions about θ
- translate the decision into a statement about the process that generated the data
Parameter Definition
Frequentist vs. Bayesian
- a frequentist defines a parameter as an unknown constant
- a Bayesian defines a parameter as a random variable
Model
Frequentist vs. Bayesian
- frequentist: f(x)
- Bayesian: f(x|θ) OR p(x|θ)
Bayesian Models
- choose a prior distribution to describe the uncertainty in the parameter: π(θ)
- observe data
- use Bayes’ Theorem to obtain a posterior distribution, π(θ|x)
- this posterior distribution could be used as a prior for the next experiment
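These steps can be sketched with a conjugate Beta-Binomial model (all numbers illustrative): a Beta(a, b) prior plus k successes in n trials yields a Beta(a + k, b + n − k) posterior, which then serves as the prior for the next experiment:

```python
a, b = 1, 1                  # uniform prior π(θ) = Beta(1, 1)
k1, n1 = 7, 10               # first experiment: 7 successes in 10 trials (assumed)
a, b = a + k1, b + n1 - k1   # posterior after experiment 1: Beta(8, 4)
k2, n2 = 3, 5                # second experiment (assumed); the posterior above is now the prior
a, b = a + k2, b + n2 - k2   # posterior after experiment 2: Beta(11, 6)
```

Note the combined posterior Beta(11, 6) is the same as pooling all the data at once.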
Influence of the Prior
- the most frequent objection to Bayesian statistics is the subjectivity of the choice of prior
- however, when the model fits the data well, the influence of the choice of prior should tend to 0 as the sample size increases
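A small sketch of this "prior washout": posterior means under two quite different Beta priors (hyperparameters made up for illustration) converge as the sample size n grows:

```python
def post_mean(a, b, k, n):
    return (a + k) / (a + b + n)        # mean of the Beta(a + k, b + n - k) posterior

diffs = []
for n in (10, 100, 10_000):
    k = round(0.3 * n)                  # data with a 30% success rate (assumed)
    diffs.append(abs(post_mean(1, 1, k, n) - post_mean(20, 5, k, n)))
print(diffs)                            # gap between the two posteriors shrinks toward 0
```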
How do you determine the posterior distribution?
π(θ|x) = f(x|θ)π(θ) / [∫f(x|t)π(t)dt]
∝ f(x|θ)π(θ)
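When the integral in the denominator is intractable, one option is a grid approximation: evaluate f(x|θ)π(θ) on a grid of θ values and normalise. A sketch for a hypothetical Bernoulli example (7 heads in 10 flips, flat prior), where the exact posterior is Beta(8, 4) with mean 2/3:

```python
N = 1001
grid = [i / (N - 1) for i in range(N)]             # θ values on [0, 1]
k, n = 7, 10                                       # assumed data
unnorm = [t**k * (1 - t)**(n - k) for t in grid]   # f(x|θ) π(θ), with π flat
z = sum(unnorm)                                    # stands in for ∫ f(x|t)π(t) dt
post = [u / z for u in unnorm]                     # π(θ|x) on the grid
mean = sum(t * p for t, p in zip(grid, post))      # ≈ 2/3
```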
What can you do with a posterior distribution?
- give a point estimate of θ
- test hypotheses
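Both uses can be sketched by Monte Carlo once the posterior is known. Assuming (for illustration) a Beta(8, 4) posterior, draws from it give a point estimate and a posterior probability for a hypothesis such as θ > 0.5:

```python
import random

random.seed(0)
draws = [random.betavariate(8, 4) for _ in range(100_000)]
point = sum(draws) / len(draws)                    # point estimate: posterior mean ≈ 8/12
p_h1 = sum(d > 0.5 for d in draws) / len(draws)    # posterior P(θ > 0.5), a Bayesian test
```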
Decision Theory
- any time you make a decision you can lose something
- risk = expected loss
- the goal is to make decisions that minimise risk
d = d(x) ∈ D
- where d(x) is a decision based on the data and D is the decision space
Decision Space
- the set of all possible decisions that might be made based on the data
- for estimation, D = parameter space
- for hypothesis testing, D = two points
Loss Function
L = L(d(x), θ) ≥ 0
- when X and θ are random, L is a real-valued random variable
Expected Loss
E(L) = E(E(L|X))
= ∫ [∫ L(d(x), θ) dπ(θ|x)] dP(x)
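A Monte Carlo sketch of this double expectation for squared-error loss on hypothetical Bernoulli data: θ ~ Uniform(0, 1) prior, X | θ ~ Binomial(n, θ), comparing the MLE x/n with the posterior mean (x + 1)/(n + 2) as decision rules d(x):

```python
import random

random.seed(1)
n, reps = 10, 50_000
risk_mle = risk_bayes = 0.0
for _ in range(reps):
    theta = random.random()                             # θ ~ π = Uniform(0, 1)
    x = sum(random.random() < theta for _ in range(n))  # X | θ ~ Binomial(n, θ)
    risk_mle += (x / n - theta) ** 2                    # loss under d(x) = MLE
    risk_bayes += ((x + 1) / (n + 2) - theta) ** 2      # loss under d(x) = posterior mean
risk_mle /= reps
risk_bayes /= reps          # the posterior-mean rule attains lower risk under this prior
```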