Chapter 1 Flashcards
Combining a random sample of size n from a normal N(mu, 1/tau) distribution (with known precision tau) with a normal N(b, 1/d) prior results in what distribution? Outline any parameters. (2)
mu|x ~ N(B, 1/D),
where B = (db + n tau xbar)/(d + n tau)
and D = d + n tau.
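A minimal Python sketch of this update (the sample, prior values b and d, and precision tau below are made up for illustration):

```python
import numpy as np

def normal_posterior(x, b, d, tau):
    """Posterior mu | x ~ N(B, 1/D) for a N(b, 1/d) prior and a
    random sample x from N(mu, 1/tau) with known precision tau."""
    n = len(x)
    D = d + n * tau                            # posterior precision
    B = (d * b + n * tau * np.mean(x)) / D     # posterior mean
    return B, D

# illustrative data and prior
x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
B, D = normal_posterior(x, b=0.0, d=1.0, tau=2.0)
print(f"mu | x ~ N({B:.3f}, 1/{D:.1f})")
```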
General case of Poisson data Xi|theta ~ Po(theta) and a Gamma Ga(g, h) prior gives what posterior?
θ|x ~ Ga(G, H), where G = g + n xbar and H = h + n.
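The corresponding Poisson-Gamma update, again as a small illustrative sketch (the counts and prior parameters are made up):

```python
import numpy as np

def gamma_posterior(x, g, h):
    """Posterior theta | x ~ Ga(G, H) for a Ga(g, h) prior and a
    Poisson random sample x."""
    G = g + np.sum(x)      # G = g + n * xbar
    H = h + len(x)         # H = h + n
    return G, H

x = np.array([3, 5, 2, 4, 6])            # illustrative Poisson counts
G, H = gamma_posterior(x, g=2.0, h=1.0)
print(f"theta | x ~ Ga({G:.1f}, {H:.1f})")
```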
Issues with substantial prior knowledge.(2)
We have substantial prior information for θ when the prior distribution dominates the posterior distribution, that is π(θ|x) ∼π(θ).
1. The intractability of the mathematics in deriving the posterior distribution (though with modern computing facilities this is less of a problem),
2. The practical formulation of the prior distribution: coherently specifying prior beliefs in the form of a probability distribution is far from straightforward.
Limited prior knowledge approach. (1)
Using conjugate pairs:
• Poisson random sample, Gamma prior distribution → Gamma posterior distribution
• Normal random sample (known variance), Normal prior distribution → Normal posterior distribution
Define conjugacy.(1)
Suppose that data x are to be observed with distribution f(x|θ). A family F of prior distributions for θ is said to be conjugate to f(x|θ) if for every prior distribution π(θ) ∈ F, the posterior distribution π(θ|x) is also in F.
Notice that the conjugate family depends crucially on the model chosen for the data x.
For example, the only family conjugate to the model “random sample from a Poisson distribution” is the Gamma family.
Vague prior.(1)
We represent vague prior knowledge by using a prior distribution which is conjugate to the model for x and which is as diffuse as possible, that is, has as large a variance as possible.
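As a rough illustration with the normal-normal pair (same made-up data as above), shrinking the prior precision d towards zero makes the posterior depend essentially on the data alone:

```python
import numpy as np

x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])   # illustrative N(mu, 1/tau) sample
tau, b = 2.0, 0.0
n, xbar = len(x), np.mean(x)

# as the prior becomes more diffuse (d -> 0), the posterior mean B tends
# to xbar and the posterior precision D tends to n * tau
for d in [10.0, 1.0, 0.1, 0.001]:
    D = d + n * tau
    B = (d * b + n * tau * xbar) / D
    print(f"d = {d:6.3f}:  B = {B:.3f},  D = {D:.3f}")
```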
Asymptotic posterior.(2)
Here θ̂ is the likelihood mode (the value of θ maximising f(x|θ)) and J(θ) is the observed information, J(θ) = −∂²/∂θ² log f(x|θ).
This means that, with increasing amounts of data, the posterior distribution looks more and more like a normal distribution. The result also gives us a useful approximation to the posterior distribution for θ when n is large:
θ|x ~ N{θ̂, J(θ̂)⁻¹} approximately.
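A sketch of this approximation for a Poisson random sample, where the MLE is xbar and the observed information works out as n xbar/theta^2 (the counts are illustrative):

```python
import numpy as np

x = np.array([3, 5, 2, 4, 6])        # illustrative Poisson counts
n, xbar = len(x), np.mean(x)

theta_hat = xbar                     # MLE for a Poisson mean
# J(theta) = -d^2/dtheta^2 log f(x|theta) = n * xbar / theta^2,
# so J(theta_hat) = n / xbar
J_hat = n / xbar

print(f"theta | x ~ N({theta_hat:.2f}, {1.0 / J_hat:.3f})  approximately")
```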
What are Bayesian confidence intervals sometimes called?(1)
Bayesian confidence intervals are sometimes called credible regions or plausible regions.
Clearly these intervals are not unique, since there will be many intervals with the correct probability coverage for a given posterior distribution.
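One common construction is the equal-tailed interval read off from posterior quantiles; a sketch for a Gamma posterior (the parameters here are illustrative, matching the Poisson-Gamma example above):

```python
from scipy.stats import gamma

G, H = 22.0, 6.0      # illustrative posterior Ga(G, H); H is a rate
alpha = 0.05

# equal-tailed 100(1 - alpha)% Bayesian confidence (credible) interval:
# cut alpha/2 posterior probability from each tail
lower = gamma.ppf(alpha / 2, a=G, scale=1.0 / H)
upper = gamma.ppf(1 - alpha / 2, a=G, scale=1.0 / H)
print(f"95% credible interval for theta: ({lower:.2f}, {upper:.2f})")
```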
Predictive distribution.(1)
Implicit in the Bayesian framework is the concept of the predictive distribution. This distribution describes how likely different outcomes of a future experiment are. The predictive probability (density) function is calculated as
f(y|x) = ∫_Θ f(y|θ) π(θ|x) dθ.
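A sketch evaluating this integral numerically for the Poisson-Gamma pair, for which the predictive distribution is known to be negative binomial (the posterior parameters are illustrative):

```python
import numpy as np
from scipy import integrate
from scipy.stats import poisson, gamma, nbinom

G, H = 22.0, 6.0      # illustrative posterior Ga(G, H) for theta

def predictive(y):
    """f(y|x) = integral over theta of f(y|theta) * pi(theta|x) dtheta."""
    integrand = lambda t: poisson.pmf(y, t) * gamma.pdf(t, a=G, scale=1.0 / H)
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

# compare with the closed form y | x ~ NegBin(G, H / (H + 1))
for y in range(5):
    print(y, round(predictive(y), 6), round(nbinom.pmf(y, G, H / (H + 1)), 6))
```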
Candidate's formula.(2)
When the past data x and future data y are independent (given θ) and we use a conjugate prior distribution, we can calculate the predictive distribution through Candidate's formula:
f(y|x) = f(y|θ) π(θ|x) / π(θ|x, y),
which gives the same value whatever θ is used, since the left-hand side does not depend on θ.
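A numerical check of the formula for the Poisson-Gamma pair (illustrative parameters); by conjugacy π(θ|x, y) is Ga(G + y, H + 1), and the ratio comes out the same for any θ we plug in:

```python
from scipy.stats import poisson, gamma

G, H = 22.0, 6.0      # illustrative posterior Ga(G, H) given the past data x
y = 3                 # a possible future Poisson observation

def candidates_formula(theta):
    """f(y|x) = f(y|theta) * pi(theta|x) / pi(theta|x, y)."""
    return (poisson.pmf(y, theta) * gamma.pdf(theta, a=G, scale=1.0 / H)
            / gamma.pdf(theta, a=G + y, scale=1.0 / (H + 1)))

print(candidates_formula(2.0), candidates_formula(5.0))   # equal values
```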
Mixture density function definition.(1)
A mixture of the distributions πi(θ) with weights pi (i = 1,2,…,m) has probability
(density) function:
π(θ) = ∑_{i=1}^{m} pi πi(θ), where the weights satisfy pi ≥ 0 and ∑_{i=1}^{m} pi = 1.
Mean and variance for mixture distributions.(3)
The component means and variances are
Ei(θ) = ∫_Θ θ πi(θ) dθ and Vari(θ) = ∫_Θ {θ − Ei(θ)}^2 πi(θ) dθ.
The mean of the mixture is E(θ) = ∑_{i=1}^{m} pi Ei(θ).
For the variance of the mixture, first
E(θ^2) = ∑_{i=1}^{m} pi {Vari(θ) + Ei(θ)^2},
and hence Var(θ) = E(θ^2) − E(θ)^2.
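A sketch computing these moments for a two-component Gamma mixture (the weights and components are made up; for a Ga(g, h) component the mean is g/h and the variance g/h^2):

```python
import numpy as np

p = np.array([0.3, 0.7])      # mixture weights p_i
g = np.array([2.0, 10.0])     # Ga(g_i, h_i) components
h = np.array([1.0, 2.0])

comp_mean = g / h             # E_i(theta)
comp_var = g / h**2           # Var_i(theta)

mean = np.sum(p * comp_mean)                          # E(theta)
e_theta_sq = np.sum(p * (comp_var + comp_mean**2))    # E(theta^2)
var = e_theta_sq - mean**2                            # Var(theta)

print(f"E(theta) = {mean:.3f},  Var(theta) = {var:.3f}")
```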
Combining a mixture prior with data x gives what posterior? How does this follow from the posterior for a general prior π(θ)?
For a general prior, π(θ|x) = π(θ) f(x|θ) / f(x).
Substituting the mixture prior gives
π(θ|x) = ∑_{i=1}^{m} pi πi(θ) f(x|θ) / f(x)
= ∑_{i=1}^{m} {pi fi(x)/f(x)} πi(θ|x),
where fi(x) = ∫_Θ f(x|θ) πi(θ) dθ is the marginal likelihood of the data under component i and f(x) = ∑_{i=1}^{m} pi fi(x). The updated weights are
p*i = pi fi(x)/f(x).
Hence, combining data x with a mixture prior distribution (pi, πi(θ)) produces a posterior mixture distribution (p*i, πi(θ|x)). The effect of introducing the data is to "update" the mixture weights (pi → p*i) and the component distributions (πi(θ) → πi(θ|x)).
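A sketch of this weight update for Poisson data with a two-component Gamma mixture prior (an assumed illustrative setting; each fi(x) is computed up to the prod 1/x_j! factor, which is common to the components and cancels when the weights are normalised):

```python
import numpy as np
from scipy.special import gammaln

x = np.array([3, 5, 2, 4, 6])      # illustrative Poisson counts
n, sx = len(x), np.sum(x)

p = np.array([0.3, 0.7])           # prior mixture weights p_i
g = np.array([2.0, 10.0])          # Ga(g_i, h_i) prior components
h = np.array([1.0, 2.0])

# log f_i(x) for a Ga(g_i, h_i) component and Poisson data, up to the
# common prod(1/x_j!) constant
log_fi = (g * np.log(h) - gammaln(g)
          + gammaln(g + sx) - (g + sx) * np.log(h + n))

w = p * np.exp(log_fi - log_fi.max())     # proportional to p_i * f_i(x)
p_star = w / w.sum()                      # updated weights p*_i

G, H = g + sx, h + n                      # updated components Ga(G_i, H_i)
print("p* =", np.round(p_star, 3), " components:", list(zip(G, H)))
```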
Bayes theorem.(1)
P(A|B) = P(B|A) P(A) / P(B).
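A tiny numeric check with made-up probabilities (purely illustrative):

```python
P_A = 0.01           # P(A), assumed
P_B_given_A = 0.95   # P(B|A), assumed
P_B = 0.06           # P(B), assumed

P_A_given_B = P_B_given_A * P_A / P_B    # Bayes theorem
print(f"P(A|B) = {P_A_given_B:.4f}")
```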