Chapter 9: Conjugate Priors Flashcards
Suppose that we run a restaurant and want to build a model for the number of people who have food allergies in a particular sitting, X, in order to inform the buying of ingredients. If we assume that the allergy status of one person is independent of everyone else's, what is a reasonable model choice?
Binomial sampling model X ∼ B(n,θ ), where n is the number of people in a sitting and θ is the probability that a randomly chosen individual has an allergy.
We can write the binomial likelihood as:
Pr(X = k | θ, n) = (n choose k) θ^k (1 − θ)^(n − k)
∝ θ^k ( 1 − θ )^(n − k)
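As a minimal sketch, this likelihood can be evaluated numerically with scipy; the sitting size, observed count, and candidate θ below are purely illustrative assumptions:
```python
from scipy.stats import binom

n, k = 50, 3       # hypothetical sitting: 50 diners, 3 with allergies
theta = 0.05       # a candidate allergy probability

# Binomial likelihood Pr(X = k | theta, n)
print(binom.pmf(k, n, theta))   # ~0.22
```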
What would therefore be a good prior for θ and why?
Since the beta distribution is defined over the [0, 1] interval and can represent a wide range of prior beliefs, a beta prior is a natural choice for θ.
How is the beta distribution's probability density function (pdf) written?
p(θ | α, β) = θ^(α−1) (1 − θ)^(β−1) / B(α, β)
∝ θ^(α−1) (1 − θ)^(β−1)
where B(α, β) is the beta function, which does not depend on θ.
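A short sketch checking that the kernel θ^(α−1) (1 − θ)^(β−1), once divided by B(α, β), matches scipy's built-in beta pdf (the hyperparameters here are illustrative assumptions):
```python
from scipy.stats import beta as beta_dist
from scipy.special import beta as beta_fn

a, b = 2.0, 8.0    # illustrative prior hyperparameters
theta = 0.1

# Unnormalised kernel theta^(a-1) * (1-theta)^(b-1)
kernel = theta**(a - 1) * (1 - theta)**(b - 1)

print(kernel / beta_fn(a, b))        # ~3.44
print(beta_dist.pdf(theta, a, b))    # same value, via scipy's pdf
```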
What stands out about the sampling distribution, θ^k (1 − θ)^(n − k), and the prior distribution, θ^(α−1) (1 − θ)^(β−1), and how is this useful?
We notice that the expression for the sampling distribution and the expression for the prior both contain a term of the form θ^a (1 − θ)^b. When we use Bayes' rule to calculate the posterior, we are required (in the numerator) to multiply together the likelihood and the prior, resulting in:
p(θ | data) ∝ p(data | θ) × p(θ)
∝ θ^k (1 − θ)^(n − k) × θ^(α−1) (1 − θ)^(β−1)
= θ^(k+α−1) (1 − θ)^(n−k+β−1)
= θ^(α′−1) (1 − θ)^(β′−1)
What do α′ and β′ equal in this case (the beta posterior)?
α ′ = α + k and β ′ = β + n − k.
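A minimal sketch of this update rule, reusing the illustrative numbers from above:
```python
from scipy.stats import beta as beta_dist

a, b = 2.0, 8.0    # illustrative Beta prior hyperparameters
n, k = 50, 3       # hypothetical data: 3 allergic diners out of 50

# Conjugate update: alpha' = alpha + k, beta' = beta + n - k
a_post, b_post = a + k, b + n - k    # Beta(5, 55)

posterior = beta_dist(a_post, b_post)
print(posterior.mean())   # a'/(a' + b') ~0.083
```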
If we actually do the denominator calculation in Bayes' rule, what do we find it equals in this case?
B(α′,β′), meaning that the posterior PDF is a beta distribution, although with different parameters to the prior.
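We can check this numerically: integrating the unnormalised posterior kernel over [0, 1] should recover B(α′, β′). A sketch using the illustrative posterior parameters from above:
```python
from scipy.integrate import quad
from scipy.special import beta as beta_fn

a_post, b_post = 5.0, 55.0   # illustrative posterior hyperparameters

# Numerator kernel of Bayes' rule, integrated over [0, 1]
integral, _ = quad(lambda t: t**(a_post - 1) * (1 - t)**(b_post - 1), 0, 1)

print(integral)                  # ~B(5, 55)
print(beta_fn(a_post, b_post))   # the two agree
```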
Here, what do we say the beta prior is relative to the binomial likelihood and why?
Here we say that the beta prior is conjugate to the binomial likelihood since the posterior is also a beta distribution.
The following represents the Bayesian inference process:
prior -{likelihood}→ posterior
How is this described in the former example?
beta -{binomial}→ beta′
Why is beta′ used instead of just beta?
Because it is still a beta distribution, but with parameters updated from those of the initial distribution.
Generalise this rule regarding the relationship between prior and posterior distributions.
Conjugate priors are always defined relative to a particular likelihood; conjugacy means that both the prior and the posterior come from the same family of distributions. Diagrammatically, we have that for a specified likelihood, L, and a prior distribution from a particular family, f ∈ F:
f -{L}→ f ′
If we feed a gamma prior into our Bayesian updating rule, we should get a gamma posterior out. But what would the likelihood be?
Poisson likelihood
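A minimal sketch of the gamma-Poisson update, using the standard conjugate result α′ = α + Σxᵢ and β′ = β + n (this rule is not derived above; all numbers are illustrative):
```python
from scipy.stats import gamma

a, b = 3.0, 1.0          # illustrative Gamma(shape, rate) prior
counts = [2, 4, 3, 5]    # hypothetical Poisson counts

# Standard gamma-Poisson conjugate update: a' = a + sum(x), b' = b + n
a_post = a + sum(counts)       # 17
b_post = b + len(counts)       # 5

# scipy parameterises the gamma by shape and scale = 1/rate
posterior = gamma(a_post, scale=1.0 / b_post)
print(posterior.mean())        # a'/b' = 3.4
```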
If we feed a normal prior in, we should get a normal posterior out. But what would the likelihood be?
A normal likelihood; specifically, a normal prior on the mean is conjugate to a normal likelihood whose variance is known.
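A minimal sketch of the normal-normal update for a known sampling variance, using the standard precision-weighting result (not derived above; all numbers are illustrative):
```python
import numpy as np

mu0, tau0 = 0.0, 2.0     # illustrative prior mean and standard deviation
sigma = 1.0              # known sampling standard deviation (an assumption)
data = np.array([0.8, 1.2, 0.5, 1.1])
n = len(data)

# Standard normal-normal conjugate update: precisions (1/variance) add
prec_post = 1 / tau0**2 + n / sigma**2
mu_post = (mu0 / tau0**2 + data.sum() / sigma**2) / prec_post
sd_post = np.sqrt(1 / prec_post)

print(mu_post, sd_post)  # posterior mean shrinks the data mean toward mu0
```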
Are there disadvantages to using conjugate priors?
While the use of conjugate priors makes Bayesian statistics easy, it can limit us. These limits are quickly reached when we need greater modelling flexibility, which is especially the case when using hierarchical models, e.g. if we notice that our data are better suited to a Student's t distribution than a normal distribution.