Objective Bayes Flashcards

1
Q

Difference between subjective and objective Bayesian inference

A

“For 200 years, however, two impediments stood between Bayesian theory’s philosophical attraction and its practical application.

1. In the absence of relevant past experience, the choice of a prior distribution introduces an unwanted subjective element into scientific inference.
2. Bayes’ rule (3.5) looks simple enough, but carrying out the numerical calculation of a posterior distribution often involves intricate higher-dimensional integrals.

Subjective Bayesianism is particularly appropriate for individual decision making, say for the business executive trying to choose the best investment in the face of uncertain information; its justification is less obvious in scientific settings.

Objective Bayes inference addresses the first impediment by fashioning objective, or “uninformative,” prior distributions that are in some sense unbiased in their effects upon the data analysis.”

2
Q

Objective prior distribution

A

“A flat, or uniform, distribution over the space of possible parameter values seems like the obvious choice for an uninformative prior distribution.
With a flat prior the posterior is proportional to the likelihood function, which brings us close to Fisherian inference; but Fisher was adamant in his insistence that likelihood was not probability.
Fisher’s withering criticism of flat-prior Bayes inference focused on its lack of transformation invariance: a prior that is flat in μ is not flat in a transformed parameter θ = t(μ).

Jeffreys’ prior has transformation invariance.”
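As a worked equation (a minimal sketch; the binomial illustration is my own addition, following the standard definition rather than a formula quoted on this card):

```latex
% Jeffreys' prior: proportional to the square root of the Fisher information
g^{\mathrm{Jeff}}(\mu) \;\propto\; \mathcal{I}(\mu)^{1/2},
\qquad
\mathcal{I}(\mu) \;=\; E_{\mu}\!\left[\Big(\tfrac{\partial}{\partial \mu}\log f_{\mu}(x)\Big)^{2}\right].

% Binomial illustration: for x ~ Bin(n, p), I(p) = n / (p(1-p)), so
g^{\mathrm{Jeff}}(p) \;\propto\; p^{-1/2}(1-p)^{-1/2},
\quad \text{i.e. a Beta}(\tfrac12, \tfrac12)\text{ prior.}

% Invariance: under a monotone reparameterization theta = t(mu),
g^{\mathrm{Jeff}}(\theta) \;=\; g^{\mathrm{Jeff}}(\mu)\,\Big|\tfrac{d\mu}{d\theta}\Big|,
% so applying Jeffreys' rule on the theta scale agrees with transforming the mu-scale prior.
```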

3
Q

Coverage Matching Priors

A

“This is unavoidable; it is mathematically impossible for any single prior to be uninformative for every choice of θ = t(μ).

Coverage-matching priors are a class of Bayesian priors designed so that posterior credible intervals have good frequentist coverage properties: the credible intervals derived from the posterior distribution closely match frequentist confidence intervals in terms of their coverage probability.”
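A minimal simulation sketch (my own illustration, not from the card), checking how closely the 95% Jeffreys-prior credible interval for a binomial proportion attains its nominal frequentist coverage:

```python
import numpy as np
from scipy.stats import beta

# Frequentist coverage of the 95% credible interval from the Jeffreys prior Beta(1/2, 1/2).
rng = np.random.default_rng(0)
n, p_true, n_rep = 30, 0.3, 20_000

x = rng.binomial(n, p_true, size=n_rep)        # simulated success counts
lo = beta.ppf(0.025, x + 0.5, n - x + 0.5)     # posterior is Beta(x + 1/2, n - x + 1/2)
hi = beta.ppf(0.975, x + 0.5, n - x + 0.5)
coverage = np.mean((lo <= p_true) & (p_true <= hi))
print(f"empirical coverage of the nominal 95% interval: {coverage:.3f}")
```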

4
Q

Credible Intervals and Frequentist Coverage

A

“A Bayesian credible interval represents a range within which the true parameter value lies with a specified posterior probability (e.g., 95%).
Frequentist confidence intervals represent a range such that, over repeated sampling, the true parameter value falls within the interval a specified proportion of the time (e.g., 95%).”
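A small worked case (my own illustration) in which the two notions coincide numerically: a normal mean with known variance under a flat prior.

```latex
% x_1, ..., x_n ~ N(mu, sigma^2) with sigma known and a flat prior on mu gives
\mu \mid x \;\sim\; N\!\big(\bar{x},\, \sigma^{2}/n\big),
% so the 95% credible interval
\bar{x} \;\pm\; 1.96\,\sigma/\sqrt{n}
% equals the standard 95% confidence interval, even though the two statements attach
% probability to different things: the parameter given the data, versus the interval
% under repeated sampling.
```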

5
Q

Conjugate Prior Distributions

A

“A prior distribution is called conjugate to a likelihood function if the resulting posterior distribution is in the same family as the prior. It simplifies Bayesian analysis by ensuring analytical tractability.

Key Highlights:

Conjugacy ensures that prior and posterior belong to the same distribution family.
Leads to closed-form posterior expressions, avoiding the need for complex numerical methods.
Commonly used in Bayesian statistics for computational convenience.

The size of n0, the number of hypothetical prior observations, determines how informative or uninformative the prior g_{n0, x0}(·) is. Recent objective Bayes literature has favored choosing n0 small, n0 = 1 being popular. The hope here is to employ a proper prior (one that has a finite integral), while still not injecting much unwarranted information into the analysis.”
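A minimal conjugate-updating sketch (my own illustration using the standard beta-binomial pair; reading a0 + b0 as n0, the hypothetical prior observations, is an assumption in the spirit of the card, not a quoted formula):

```python
from scipy.stats import beta

# Beta prior is conjugate to the binomial likelihood:
# prior Beta(a0, b0) + data (x successes in n trials) -> posterior Beta(a0 + x, b0 + n - x).
# Here a0 + b0 plays the role of n0, the number of hypothetical prior observations;
# a0 = b0 = 1/2 (the Jeffreys choice) corresponds to n0 = 1, a proper but weak prior.
a0, b0 = 0.5, 0.5
x, n = 7, 20

posterior = beta(a0 + x, b0 + n - x)
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```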

6
Q

Critique of Objective Bayes Inference

A

“From the subjectivist point of view, objective Bayes is only partially Bayesian: it employs Bayes’ theorem but without doing the hard work of determining a convincing prior distribution. This introduces frequentist elements into its practice, clearly so in the case of Jeffreys’ prior.”

7
Q

Model selection and Bayesian Information Criterion

A

“In the problem’s simplest form, the statistician observes data x and wishes to choose between a smaller model M0 and a larger model M1.

Example
The small model M0 is the standard normal with μ = 0.
The larger model M1 is the general two-sided alternative, μ ≠ 0.

Bayesian model selection aims for more than a simple accept/reject decision: an evaluation of the posterior probabilities of M0 and M1 given x.”
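In symbols (standard notation; the marginal densities f_i and prior model probabilities π_i are assumptions following the book’s general style, not quantities quoted on this card):

```latex
% With prior model probabilities pi_0 = P(M_0), pi_1 = P(M_1) and marginal densities
% f_i(x) = \int f_{\mu}(x)\, g_i(\mu)\, d\mu under the prior g_i for M_i's parameters,
P(M_i \mid x) \;=\; \frac{\pi_i\, f_i(x)}{\pi_0 f_0(x) + \pi_1 f_1(x)},
\qquad i = 0, 1.
```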

8
Q

The Bayes factor

A

“We arrive at the Bayes factor: the posterior odds ratio is the prior odds ratio times the Bayes factor.

Prior specifications (13.31)–(13.32) are usually unavailable in practical settings.

Jeffreys suggested a scale of evidence for interpreting Bayes factors, reproduced in Table 13.3; B(x) = 10, for instance, constitutes positive but not strong evidence in favor of the bigger model.

Jeffreys’ scale is a Bayesian version of Fisher’s interpretive scale for the outcome of a hypothesis test, with coverage value (one minus the significance level) 0.95 famously constituting “significant” evidence against the null hypothesis.”
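The defining relation, with f_i(x) denoting the marginal density of x under model M_i as in the previous card:

```latex
% Bayes factor: ratio of marginal likelihoods of the two models
B(x) \;=\; \frac{f_1(x)}{f_0(x)},
\qquad
\underbrace{\frac{P(M_1 \mid x)}{P(M_0 \mid x)}}_{\text{posterior odds}}
\;=\;
\underbrace{\frac{\pi_1}{\pi_0}}_{\text{prior odds}} \;\times\; B(x).
```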

9
Q

Bayesian information criterion

A

“How can B(x) be computed in practice without requiring informative choices of the priors? A popular objective Bayes answer is provided by the Bayesian information criterion (BIC). If M0 is true, B_BIC(x) will tend to be less than one, favoring M0 ever more strongly as n increases.

The difference between BIC and frequentist hypothesis testing grows more drastic for large n. At n = 15, Fisher’s and Jeffreys’ scales give roughly similar assessments of the evidence against M0 (though Jeffreys’ nomenclature is more conservative). At the other end of the table, at n = 10,000, the inferences are contradictory: z = 3.29, with p-value 0.001 and coverage level 0.999, is overwhelming evidence for M1 on Fisher’s scale, but barely worthwhile for Jeffreys.

The “untethered” criticism made against objective Bayes methods in general is particularly applicable to BIC. The concept of “sample size” is not well defined, as the prostate study example shows. Sample-size coherency (13.49), the rationale for BIC’s strong bias toward smaller models, is less convincing in the absence of priors based on genuine experience (especially if there is no prospect of the sample size changing).”
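A minimal numerical sketch (my own illustration, using the generic BIC approximation to the Bayes factor rather than the book’s exact expressions) for the normal-mean test M0: μ = 0 versus M1: μ free, known variance, at the card’s z = 3.29:

```python
import numpy as np
from scipy.stats import norm

def bic_bayes_factor(z, n, d_extra=1):
    """BIC approximation to the Bayes factor B(x) for M1 over M0.
    For the normal-mean test the maximized log-likelihood gain of M1 is z**2 / 2,
    and BIC penalizes each extra parameter by (1/2) * log(n)."""
    return np.exp(z**2 / 2 - 0.5 * d_extra * np.log(n))

z = 3.29
p_value = 2 * norm.sf(z)                     # Fisher's scale: p of about 0.001
for n in (15, 10_000):
    print(f"n={n:>6}  two-sided p={p_value:.4f}  B_BIC(x)={bic_bayes_factor(z, n):7.2f}")
# Small n: B_BIC is large (strong evidence for M1); at n = 10,000 it drops to about 2,
# barely worthwhile on Jeffreys' scale even though the p-value is unchanged.
```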

10
Q

Gibbs Sampling and MCMC

A

“The increase in Bayesian applications, and the change in emphasis from subjective to objective, had more to do with computation than philosophy.

We would in this way be employing the same general tactic as the bootstrap, applied now
for Bayesian rather than frequentist purposes—toward the same goal as the
bootstrap, of freeing practical applications from the constraints of mathematical tractability.

Gibbs sampling draws from a joint distribution by sampling each variable (or block of variables) in turn from its conditional distribution given the current values of all the others.

Advantages

Simplifies high-dimensional sampling by breaking it into conditionals.
Effective for correlated variables, leveraging dependencies.

Challenges

Can mix slowly if variables are highly dependent.
Requires knowledge of all conditional distributions.”
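A minimal Gibbs-sampling sketch (my own illustration): drawing from a bivariate normal with correlation rho by alternating between its two full conditionals.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5_000, seed=0):
    """Gibbs sampler for (x, y) ~ bivariate normal, mean 0, unit variances, correlation rho.
    The full conditionals are x | y ~ N(rho * y, 1 - rho**2) and symmetrically for y | x."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # sample x from p(x | y)
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # sample y from p(y | x)
        draws[t] = (x, y)
    return draws

samples = gibbs_bivariate_normal(rho=0.8)
print("sample correlation:", np.corrcoef(samples[1000:].T)[0, 1])  # discard burn-in
```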

11
Q

Metropolis–Hastings

A

“Metropolis–Hastings is another MCMC algorithm for generating samples from complex probability distributions. It generalizes the Metropolis algorithm by allowing asymmetric proposal distributions.

Metropolis–Hastings: Pros and Cons

Advantages

Effective for sampling from high-dimensional and complex distributions.
Flexible in the choice of proposal distribution q(x∗|xt).

Challenges

Requires tuning of the proposal distribution for efficient sampling.
Slow convergence if the target distribution and proposal distribution are poorly matched.”
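A minimal Metropolis–Hastings sketch (my own illustration): sampling a Gamma(3, 2) target on x > 0 with an asymmetric log-normal proposal, so the full Hastings correction q(xt|x∗)/q(x∗|xt) matters.

```python
import numpy as np
from scipy.stats import lognorm

def log_target(x):
    # Unnormalized log density of Gamma(shape=3, rate=2): proportional to x**2 * exp(-2x)
    return 2 * np.log(x) - 2 * x

def metropolis_hastings(n_iter=20_000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = 1.0
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # Asymmetric proposal: x* ~ LogNormal(log x, step), always positive
        x_star = rng.lognormal(mean=np.log(x), sigma=step)
        # Hastings acceptance ratio: target ratio times proposal-density correction
        log_q_forward = lognorm.logpdf(x_star, s=step, scale=x)
        log_q_backward = lognorm.logpdf(x, s=step, scale=x_star)
        log_alpha = log_target(x_star) - log_target(x) + log_q_backward - log_q_forward
        if np.log(rng.uniform()) < log_alpha:   # accept with probability min(1, alpha)
            x = x_star
        draws[t] = x
    return draws

samples = metropolis_hastings()
print("estimated mean (true Gamma(3, 2) mean is 1.5):", samples[2000:].mean())
```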
