Chapter 7: The posterior Flashcards
Often we describe a distribution by its summary characteristics; for example, we often want to know the mean value of a parameter. What, essentially, is this?
It is essentially a weighted mean of the parameter values, where the weights are provided by the posterior probability density function.
If we have the mathematical formula for a continuous distribution, how do we calculate this?
We calculate this by the following integral:
E[θ | data] = ∫₀¹ θ × p(θ | data) dθ
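A minimal sketch (not from the book) of this calculation by numerical integration. It assumes a hypothetical Beta(3, 9) posterior, which is what a uniform Beta(1, 1) prior gives after observing 2 successes in 10 binomial trials:

```python
import numpy as np
from scipy import stats

# Grid over the parameter's support [0, 1].
theta = np.linspace(0, 1, 10_001)
dtheta = theta[1] - theta[0]
pdf = stats.beta(3, 9).pdf(theta)          # p(θ | data): the assumed posterior

# E[θ | data] = ∫ θ p(θ | data) dθ, approximated as a Riemann sum.
posterior_mean = np.sum(theta * pdf) * dtheta
print(posterior_mean)                      # ≈ 0.25, the closed form 3 / (3 + 9)
```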
A point estimate is dangerous to use without what?
A point estimate is dangerous to use without some measure of our confidence in the value.
Describe how to calculate one useful measure of uncertainty and name an alternative approach.
One useful measure of uncertainty is a parameter’s variance:
var(θ | data) = ∫₀¹ (θ − E[θ | data])² × p(θ | data) dθ
An alternative way to summarise uncertainty is by specifying an interval rather than a point estimate.
It is usually easier to interpret a measure of uncertainty if it is expressed in the same units as the mean. How is this achieved?
By taking the square root of the variance, which gives the posterior standard deviation.
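A hedged sketch (not from the book) of both uncertainty summaries, the standard deviation and an interval, again assuming the hypothetical Beta(3, 9) posterior:

```python
import numpy as np
from scipy import stats

posterior = stats.beta(3, 9)               # assumed posterior, as above
theta = np.linspace(0, 1, 10_001)
dtheta = theta[1] - theta[0]
pdf = posterior.pdf(theta)

mean = np.sum(theta * pdf) * dtheta
# var(θ | data) = ∫ (θ − E[θ | data])² p(θ | data) dθ
variance = np.sum((theta - mean) ** 2 * pdf) * dtheta
std_dev = np.sqrt(variance)                # same units as the mean

# The interval alternative: a central 95% credible interval from quantiles.
lower, upper = posterior.ppf([0.025, 0.975])
print(mean, variance, std_dev, (lower, upper))
```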
Bayesian inference satisfies a property known as data order invariance. What does this mean?
If we have two data sets, we can analyse the first to obtain a posterior, which then becomes the prior for analysing the second. Data order invariance means that the order in which the two data sets are analysed does not matter: both orderings produce the same final posterior.
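A minimal sketch (not from the book) of data order invariance, using the conjugate beta-binomial update; the uniform prior and the two data sets are hypothetical:

```python
def update(a, b, k, n):
    """Beta(a, b) prior + k successes in n trials -> Beta posterior parameters."""
    return a + k, b + n - k

prior = (1, 1)                 # uniform Beta(1, 1) prior (an assumption)
data_A = (2, 10)               # hypothetical: 2 positives out of 10
data_B = (7, 20)               # hypothetical: 7 positives out of 20

# Analyse A first, then B ...
post_AB = update(*update(*prior, *data_A), *data_B)
# ... or B first, then A: the final posterior is identical.
post_BA = update(*update(*prior, *data_B), *data_A)
assert post_AB == post_BA == (10, 22)
```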
How do we investigate how changes in the prior distribution p(θ) affect the posterior?
Suppose that we find that two individuals in our sample of 10 people are disease-positive. We can use Bayes’ rule to write down an expression for the posterior diseased proportion (using a binomial model):
p(θ | X = 2, N = 10) = p(X = 2 | θ, N = 10) × p(θ) / p(X = 2 | N = 10)
                     ∝ p(X = 2 | θ, N = 10) × p(θ)
that is, posterior ∝ likelihood × prior.
This tells us that the posterior is a sort of weighted geometric average of the likelihood and prior. This means that the posterior peak will be situated somewhere between the peaks of the likelihood and prior, so any changes to the prior will be mirrored by changes in the posterior. (See Figure 7.3 in the book.)
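A minimal sketch (not from the book) of this prior sensitivity, using the conjugate result that a Beta(a, b) prior with X = 2 and N = 10 yields a Beta(a + 2, b + 8) posterior; the three priors are hypothetical:

```python
# Hypothetical binomial example: X = 2 disease-positive out of N = 10,
# analysed under three different beta priors.
k, n = 2, 10
for a, b in [(1, 1), (2, 8), (8, 2)]:
    alpha, beta = a + k, b + (n - k)         # conjugate Beta posterior parameters
    mode = (alpha - 1) / (alpha + beta - 2)  # posterior peak
    print(f"Beta({a},{b}) prior -> Beta({alpha},{beta}) posterior, peak at {mode:.3f}")
# The maximum-likelihood value is X/N = 0.2; each posterior peak sits between
# the prior's peak and 0.2 (the flat Beta(1,1) prior leaves it at 0.2 exactly),
# illustrating the weighted-average behaviour described above.
```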
How do we investigate how changes in the likelihood affect the posterior?
Using the same expression. As we increase the number of disease-positive individuals from X = 0 (left column) to X = 5 (middle column) to X = 10 (right column), the likelihood shifts to the right and, correspondingly, the posterior peak shifts to give more weight to higher disease prevalences.
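A hedged sketch (not from the book) of the same effect in code, holding a hypothetical Beta(2, 8) prior fixed and varying X:

```python
# Fixed hypothetical Beta(2, 8) prior; vary the number of positives X.
n = 10
a, b = 2, 8
for k in (0, 5, 10):
    alpha, beta = a + k, b + (n - k)         # conjugate Beta posterior
    mode = (alpha - 1) / (alpha + beta - 2)  # posterior peak
    print(f"X = {k:2d} -> posterior peak at {mode:.3f}")
# The posterior peak moves right as X increases, tracking the likelihood.
```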
How does sample size affect the shape of the curve?
As the sample size increases, the likelihood function becomes narrower and much smaller in value, since the probability of generating a larger data set with any particular characteristics diminishes. Maintaining the proportion of disease-positive individuals in our sample at 20%, we can demonstrate how the posterior changes as we increase the sample size. (See Figure 7.5 in the book.)
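A minimal sketch (not from the book) of this narrowing, assuming a uniform Beta(1, 1) prior and hypothetical samples that are 20% disease-positive:

```python
from scipy import stats

# Uniform Beta(1, 1) prior; 20% of each hypothetical sample is positive.
for n in (10, 50, 250):
    k = n // 5
    post = stats.beta(1 + k, 1 + n - k)    # conjugate Beta posterior
    print(f"N = {n:3d}: posterior mean {post.mean():.3f}, sd {post.std():.3f}")
# The posterior standard deviation shrinks as N grows, so the
# distribution concentrates ever more tightly around 0.2.
```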
Why does the sample size have such an effect on the shape of the curve?
Since the posterior is related to the product of the likelihood and prior, it is sensitive to small values of either part. This means that as the sample size increases, and the likelihood function becomes smaller and narrower, the position of the posterior shifts towards the location of the likelihood peak.
While we can estimate the full posterior distribution for a parameter, we are often required to present point estimates. Why is this and what point does the book make regarding this topic?
This is sometimes to facilitate direct comparison with Frequentist approaches, but more often it is to allow policy makers to make decisions. The book argues that, even if we are asked to provide a single estimated value, it is crucial that we also provide a corresponding measure of uncertainty.
What are the three predominant point estimates in Bayesian statistics?
There are three predominant point estimators in Bayesian statistics:
• the posterior mean
• the posterior median
• the maximum a posteriori (MAP) estimator
As described earlier, the posterior mean is just the expected value of the posterior distribution. For a univariate continuous example, this is calculated by an integral:
E[θ | data] = ∫ θ × p(θ | data) dθ
How is this calculated for discrete cases?
For the discrete case, we replace the above integral with a sum over the possible parameter values:
E[θ | data] = Σᵢ θᵢ × p(θᵢ | data)
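A minimal sketch (not from the book) of the discrete case; the support and probabilities are hypothetical:

```python
# Hypothetical discrete posterior: parameter values and their probabilities.
theta = [0.0, 0.1, 0.2, 0.3, 0.4]          # assumed discrete support
post = [0.05, 0.30, 0.40, 0.20, 0.05]      # p(θ | data), sums to 1

# The posterior mean is the probability-weighted sum.
post_mean = sum(t * p for t, p in zip(theta, post))
print(post_mean)                           # 0.19
```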
What is the posterior median?
The posterior median is the point of a posterior distribution where 50% of probability mass lies on either side of it.
What is the MAP estimator?
The MAP estimator is simply the parameter value that corresponds to the highest point of the posterior; consequently, it is also referred to as the posterior mode.
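A minimal sketch (not from the book) computing all three point estimates for the assumed Beta(3, 9) posterior example:

```python
from scipy import optimize, stats

posterior = stats.beta(3, 9)               # assumed running example

post_mean = posterior.mean()               # E[θ | data] = 3/12 = 0.25
post_median = posterior.median()           # 50% of mass on either side

# MAP: the posterior mode, found here by numerically maximising the pdf.
# (For Beta(3, 9) the closed form is (3 - 1)/(3 + 9 - 2) = 0.2.)
result = optimize.minimize_scalar(
    lambda t: -posterior.pdf(t), bounds=(0, 1), method="bounded"
)
post_map = result.x

print(post_mean, post_median, post_map)    # mode < median < mean here
```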
While each of these three estimators can be optimal in different circumstances, the authors believe that there is a clear hierarchy among them.
What estimator is at the top of the hierarchy? What is the reason for this?
At the top of the hierarchy is the posterior mean. This is the book's favourite for two reasons: first, it typically yields sensible estimates that are representative of the central position of the posterior distribution; second, and more mathematically, this estimator makes sense from a measure-theoretic perspective, since it accounts for the measure (the book notes that this point need not be fully understood).