Chapter 6: The devil is in the denominator Flashcards
What function does the denominator in Bayes’ rule carry out?
The denominator of Bayes’ rule, p(data), is a number that ensures that the posterior distribution is a valid probability distribution by normalising the numerator term.
What is an alternative interpretation of the denominator?
There is, however, another interpretation of the denominator. Before we get the data, it is a probability distribution that represents our beliefs over all possible data samples.
How do we obtain the denominator?
We marginalise out all parameter dependence in the numerator.
How simple is this task of calculating the denominator?
The seeming simplicity of the previous statement belies the fact that, for most circumstances, this calculation is complicated and practically intractable.
Why is the numerator of Bayes’ rule not a valid probability density?
The numerator satisfies the first condition of a valid probability density: its values are non-negative. However, it fails the second condition: its sum or integral (depending on whether the parameters are discrete or continuous) across all parameter values does not typically equal 1.
Why does the denominator not contain θ?
This is because p(data) is a marginal probability density, obtained by summing or integrating out all dependence on θ. This parameter independence of the denominator ensures that the influence of θ on the shape of the posterior distribution is solely due to the numerator.
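A minimal sketch of the two points above: the numerator is non-negative but does not sum to 1, and dividing by a denominator that is free of θ rescales it without changing its shape. The three parameter labels, the prior, and the likelihood values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical discrete example: three candidate values of theta.
prior = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
# Hypothetical likelihood Pr(data | theta) for one fixed, observed data set.
likelihood = {"a": Fraction(1, 10), "b": Fraction(2, 5), "c": Fraction(4, 5)}

# Numerator of Bayes' rule for each theta: Pr(data | theta) * Pr(theta).
numerator = {t: likelihood[t] * prior[t] for t in prior}

# The values are non-negative, but they do not sum to 1.
numerator_total = sum(numerator.values())

# The denominator is a single number with no theta in it, so dividing by it
# rescales every entry equally and leaves the ratios between them unchanged.
posterior = {t: numerator[t] / numerator_total for t in numerator}

print(numerator_total)          # 7/20
print(sum(posterior.values()))  # 1
```

Because every entry is divided by the same number, the ratio between any two posterior values equals the ratio between the corresponding numerator values.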
There are two ways in which we will use Bayes’ rule, which use slightly different (although conceptually identical) versions of the denominator. What are these two versions for discrete parameters?
Pr(data) = Σ_{all θ} Pr(data, θ)
Pr(data) = Σ_{all θ} Pr(data | θ) × Pr(θ).
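The two discrete versions give the same number, which a small check can make concrete. The sketch below starts from a hypothetical joint table Pr(data, θ) (the outcome labels "d0" and "d1" and all four probabilities are invented for illustration), derives the prior and likelihood from it, and computes the denominator both ways.

```python
from fractions import Fraction

# Hypothetical joint distribution Pr(data, theta); the entries sum to 1.
joint = {
    ("d0", 0): Fraction(1, 8), ("d1", 0): Fraction(3, 8),
    ("d0", 1): Fraction(1, 4), ("d1", 1): Fraction(1, 4),
}
thetas = sorted({theta for _, theta in joint})
outcomes = sorted({d for d, _ in joint})

# Prior Pr(theta): marginalise the joint over the data.
prior = {t: sum(joint[(d, t)] for d in outcomes) for t in thetas}
# Likelihood Pr(data | theta) = Pr(data, theta) / Pr(theta).
likelihood = {(d, t): joint[(d, t)] / prior[t] for d in outcomes for t in thetas}

observed = "d1"
# Version 1: sum the joint probability over all theta.
version_joint = sum(joint[(observed, t)] for t in thetas)
# Version 2: sum likelihood times prior over all theta.
version_product = sum(likelihood[(observed, t)] * prior[t] for t in thetas)

print(version_joint, version_product)  # 5/8 5/8
```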
There are two ways in which we will use Bayes’ rule, which use slightly different (although conceptually identical) versions of the denominator. What are these two versions for continuous parameters?
For continuous parameters we use the continuous analogue of the sum, an integral, to calculate a denominator of the form:
p(data) = ∫_{all θ} p(data, θ) dθ
p(data) = ∫_{all θ} p(data | θ) × p(θ) dθ.
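As a rough illustration of the continuous case, the integral can be approximated numerically when θ is one-dimensional. The sketch below assumes a binomial likelihood (3 successes in 10 trials) and a uniform prior on [0, 1]; neither the model nor the numbers come from the flashcards, and the midpoint rule is just one simple choice of integration scheme.

```python
from math import comb

def likelihood(theta, successes=3, trials=10):
    """Assumed binomial likelihood p(data | theta)."""
    return comb(trials, successes) * theta**successes * (1 - theta)**(trials - successes)

def prior(theta):
    """Assumed uniform prior density on [0, 1]."""
    return 1.0

def marginal_density(n_intervals=10_000):
    """Midpoint-rule approximation of the integral of likelihood * prior over [0, 1]."""
    width = 1.0 / n_intervals
    total = 0.0
    for i in range(n_intervals):
        theta = (i + 0.5) * width  # midpoint of the i-th interval
        total += likelihood(theta) * prior(theta) * width
    return total

denominator = marginal_density()
print(denominator)  # close to the exact value 1/11 for this model
```

For this particular model the integral has a closed form, 1/(trials + 1), which makes it a convenient check. With many parameters a grid like this becomes impractical, which is the difficulty the earlier card describes.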
Imagine that we are a medical practitioner and want to calculate the probability that a patient has a particular disease. We use θ to represent the two possible outcomes:
θ = {0, disease negative; 1, disease positive}
Taking account of the patient’s medical history, we specify a prior probability of 1/4 that they have the disease. We subsequently obtain data from a diagnostic test and use this to re-evaluate the probability that the patient is disease-positive. To do this we choose a probability model (likelihood) of the form:
Pr(test positive | θ) = {1/10, θ = 0; 4/5, θ = 1}
What do we implicitly assume about the probability of a negative test result in this model?
We implicitly assume that the probability of a negative test result equals 1 minus the corresponding positive test probability: Pr(test negative | θ) = 1 − Pr(test positive | θ).
Pr(test positive | θ) = {1/10, θ = 0; 4/5, θ = 1}
What do we assume by setting Pr(test positive | θ = 0) > 0?
Since Pr(test positive | θ = 0) > 0, we are assuming that false positives do occur: a disease-negative patient tests positive with probability 1/10.
Suppose that the individual test result is positive for the disease. Use the following expression to calculate the denominator of Bayes’ rule in this case:
Pr(data) = Σ_{all θ} Pr(data, θ)
Pr(data) = Σ_{all θ} Pr(data | θ) × Pr(θ).
Pr(test positive) = Σ_{θ=0}^{1} Pr(test positive | θ) × Pr(θ)
= Pr(test positive | θ = 0) × Pr(θ = 0) + Pr(test positive | θ = 1) × Pr(θ = 1)
= 1/10 × 3/4 + 4/5 × 1/4 = 11/40
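The arithmetic above can be checked with exact fractions. This is a small sketch using only the numbers given in the cards; the variable names are illustrative, and θ = 1 is taken to mean disease positive, as the prior of 1/4 and the calculation imply.

```python
from fractions import Fraction

# Prior: theta = 1 (disease positive) has probability 1/4, so theta = 0 has 3/4.
prior = {0: Fraction(3, 4), 1: Fraction(1, 4)}
# Likelihood of a positive test for each value of theta.
pr_positive_given_theta = {0: Fraction(1, 10), 1: Fraction(4, 5)}

# Denominator: sum over theta of Pr(test positive | theta) * Pr(theta).
pr_test_positive = sum(pr_positive_given_theta[t] * prior[t] for t in prior)

print(pr_test_positive)  # 11/40
```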
Is this denominator a valid probability density? What does this mean we can or cannot do?
The denominator is a valid probability density, meaning that we can calculate the counterfactual Pr(test negative) = 1 − Pr(test positive) = 29/40.
Why should we be careful in interpreting this counter-factual?
We need to be careful in interpreting this last result, since a negative test did not actually occur; Pr(test negative) is our model-implied probability, held before we carry out the test and obtain the result, that the individual will test negative.
What do we then do to obtain the posterior probability that the individual has the disease, given that they test positive?
Use Bayes’ rule:
Pr(θ = 1 | test positive) = Pr(test positive | θ = 1) × Pr(θ = 1) / Pr(test positive)
= (4/5 × 1/4) / (11/40) = 8/11
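The same update can be sketched as a small function that normalises the numerator over all values of θ. The function name `posterior` and the dictionary layout are illustrative choices, not part of the source; the numbers are those from the cards, with θ = 1 meaning disease positive.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' rule for a discrete parameter: normalise likelihood * prior."""
    numerator = {t: likelihood[t] * prior[t] for t in prior}
    denominator = sum(numerator.values())  # Pr(data), free of theta
    return {t: numerator[t] / denominator for t in numerator}

prior = {0: Fraction(3, 4), 1: Fraction(1, 4)}
pr_positive_given_theta = {0: Fraction(1, 10), 1: Fraction(4, 5)}

post = posterior(prior, pr_positive_given_theta)
print(post[1])  # 8/11
print(post[0])  # 3/11
```

The two posterior probabilities sum to 1, which is exactly what dividing by the denominator guarantees.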
How can the denominator be viewed as a probability distribution?
An alternative view of the denominator is as a probability distribution for the data before we observe it – in other words, the probability distribution for a future data sample given our choice of model.
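One way to make this view concrete is to simulate future data from the model: draw θ from the prior, then draw a test result from the likelihood given that θ. The sketch below does this for the disease example; the function name, the number of draws, and the seed are arbitrary choices for illustration.

```python
import random

def simulate_prior_predictive(n_draws=100_000, seed=0):
    """Estimate Pr(test positive) by simulating from the prior and the likelihood."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(n_draws):
        # Draw theta from the prior: disease positive (1) with probability 1/4.
        theta = 1 if rng.random() < 0.25 else 0
        # Draw the test result from the likelihood for that theta.
        p_positive = 0.8 if theta == 1 else 0.1
        if rng.random() < p_positive:
            positives += 1
    return positives / n_draws

estimate = simulate_prior_predictive()
print(estimate)  # close to 11/40 = 0.275, up to Monte Carlo error
```

The simulated frequency of positive tests approximates the denominator computed analytically, since both describe the distribution of data the model expects before any test is carried out.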