Bayesian inference Flashcards
Bayes: Sample space and parameter space
“The fundamental unit of statistical inference, both for frequentists and
for Bayesians, is a family of probability densities.
Bayesian inference requires one crucial assumption in addition to the
probability family F: knowledge of a prior density g(μ).
g(μ) represents prior information concerning the parameter μ, available to the statistician before the observation of x.
Exactly what constitutes “prior knowledge” is a crucial question we will consider in ongoing discussions of Bayes’ theorem.”
Bayes rule (theorem)
In Bayes’ formula (3.5), x is fixed at its observed value while μ varies over Ω, just the opposite of frequentist calculations.
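For reference, the formula referenced here is Bayes’ rule in density form: posterior = prior × likelihood, renormalized over the parameter space.

$$
g(\mu \mid x) \;=\; \frac{g(\mu)\, f_\mu(x)}{f(x)},
\qquad
f(x) \;=\; \int_{\Omega} g(\mu)\, f_\mu(x)\, d\mu
$$

Here g(μ) is the prior, f_μ(x) the likelihood, and f(x) the marginal density of the data; the frequentist holds μ fixed and lets x vary, while the posterior does the reverse.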
Bayes example: physicist’s twins
““What is the probability my twins will be Identical, rather than Fraternal?”
The doctor answered that one-third of twin births were Identicals, and two-thirds Fraternals.
In this situation, the unknown parameter (or “state of nature”) is either
Identical or Fraternal, with prior probability 1/3 or 2/3;
X, the possible sonogram results for twin births, is either Same Sex or Different Sexes, and x = Same Sex was observed.
A crucial fact is that identical twins are always same-sex, while fraternal twins are same-sex with probability 0.5, so Same Sex on the sonogram is twice as likely if the twins are Identical.”
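A quick check of the arithmetic with Bayes’ rule (a minimal sketch; the variable names are mine, not the book’s):

```python
# Posterior probability the twins are Identical, given the sonogram shows Same Sex.
prior = {"Identical": 1/3, "Fraternal": 2/3}              # doctor's prior from birth records
likelihood_same_sex = {"Identical": 1.0, "Fraternal": 0.5}

unnormalized = {k: prior[k] * likelihood_same_sex[k] for k in prior}
marginal = sum(unnormalized.values())                      # f(x), the denominator in Bayes' rule
posterior = {k: v / marginal for k, v in unnormalized.items()}

print(posterior)  # {'Identical': 0.5, 'Fraternal': 0.5}
```

The Same Sex observation doubles the relative weight on Identical, moving prior odds of 1:2 to posterior odds of 1:1.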
Frequentist and Bayesian interpretations of probability
The Bayesian method
Missing priors
“The prior for the twins problem was based on a large amount of relevant previous
experience. Such experience is most often unavailable. Modern Bayesian
practice uses various strategies to construct an appropriate “prior” g(μ) in the absence of prior experience, leaving many statisticians unconvinced by the resulting Bayesian inferences. Our second example illustrates the difficulty.”
Flat prior
“In this case, as in the majority of scientific situations, we don’t have a
trove of relevant past experience ready to provide a prior g(μ). One expedient, going back to Laplace, is the “principle of insufficient reason,” that is, we take μ to be uniformly distributed over Ω:
Pasted image 20250106174443.png
Figure 3.2 shows the resulting posterior density (3.5), which is just the likelihood f_μ(x) (the same function a frequentist would maximize to get the MLE) plotted as a function of μ and scaled to have integral 1.”
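A minimal sketch of the point (toy numbers of my own, not the book’s gfr example): with a flat prior, the posterior is exactly the likelihood rescaled to integrate to 1.

```python
import numpy as np

# Toy model: x ~ Normal(mu, 1), with a single made-up observation x = 1.5.
x_obs = 1.5
mu_grid = np.linspace(-4.0, 6.0, 1001)                 # discretized parameter space Omega
likelihood = np.exp(-0.5 * (x_obs - mu_grid) ** 2)     # f_mu(x), up to a constant

flat_prior = np.ones_like(mu_grid)                     # Laplace's "principle of insufficient reason"
unnorm_post = flat_prior * likelihood
posterior = unnorm_post / unnorm_post.sum()            # normalize on the evenly spaced grid

normalized_likelihood = likelihood / likelihood.sum()
print(np.allclose(posterior, normalized_likelihood))   # True: posterior equals the scaled likelihood
```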
Jeffreys’ prior
“It depends, interestingly enough, on the frequentist notion of Fisher information. Suggests somewhat bigger …
Properties:
- Reparameterization invariance: the prior remains consistent under smooth transformations of the parameter. Often used for scale-invariant problems or when no strong prior knowledge is available.
Assumptions:
- The model is regular enough for the Fisher information to exist.
- The prior should not dominate the posterior once data are observed.
Limitations:
- Not always proper (may not integrate to 1 over the parameter space).
- Can lead to unintuitive results in high-dimensional or complex models.
- Sensitive to model irregularities.
Example: for the mean θ of a normal distribution with known variance, Jeffreys’ prior is π(θ) ∝ 1, reflecting no prior knowledge about the mean.”
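For reference, the standard definition (a general fact, not quoted from the book): Jeffreys’ prior is proportional to the square root of the Fisher information,

$$
\pi_{\text{Jeff}}(\theta) \;\propto\; \sqrt{\mathcal{I}(\theta)},
\qquad
\mathcal{I}(\theta) \;=\; \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log f_\theta(x)\right)^{2}\right]
$$

For a normal mean with known variance the Fisher information is constant, giving the flat π(θ) ∝ 1 above; for a binomial success probability it gives π(θ) ∝ θ^(−1/2)(1−θ)^(−1/2), i.e. a Beta(1/2, 1/2) prior.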
Frequentist vs. Bayesian Inference
“Bayesian inference, surprisingly, is immune to selection bias (this isn’t obvious, but it follows from the fact that any data-based selection process does not affect the likelihood function in Bayes’ rule).
The operative point here is that there is a price to
be paid for the desirable properties of Bayesian inference. Attention shifts
from choosing a good frequentist procedure to choosing an appropriate
prior distribution. This can be a formidable task in high-dimensional problems, the very kinds featured in computer-age inference.
The Bayesian approach is especially appealing in dynamic contexts, where data arrive sequentially and updating one’s beliefs is a natural practice. Bayes’ rule was used to devastating effect before the 2012 US presidential election.
In the absence of genuine prior information, a whiff of subjectivity hangs
over Bayesian results, even those based on uninformative priors. Classical
frequentism claimed for itself the high ground of scientific objectivity.
The parameters of interest will typically be high-dimensional in the chapters that follow, sometimes very
high-dimensional, straining to the breaking point both the frequentist and
the Bayesian paradigms. Computer-age statistical inference at its most
successful combines elements of the two philosophies, as for instance in
the empirical Bayes methods of Chapter 6, and the lasso in Chapter 16.”
Empirical Bayes
“The terminology empirical Bayes is apt here: Bayesian formula (6.5) for a
single subject is estimated empirically (i.e., frequentistically) from a collection of similar cases. The crucial point, and the surprise, is that large
data sets of parallel situations carry within them their own Bayesian information. Large parallel data sets are a hallmark of twenty-first-century
scientific investigation, promoting the popularity of empirical Bayes methods.”
Robbins’ formula
“Robbins’ formula is a key concept in empirical Bayes (EB) methodology. It provides a way to estimate the posterior mean of a parameter in a Bayesian framework without specifying a prior distribution explicitly. Instead, the prior is estimated empirically from the data, which is especially useful in situations with repeated observations or multiple similar units.
A formula used in the empirical Bayes framework to estimate the posterior mean of a parameter without fully specifying a prior distribution. It directly estimates the expected value of the parameter given observed data.”
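For the Poisson case treated in Chapter 6, Robbins’ formula reads

$$
E[\theta \mid x] \;\approx\; (x+1)\,\frac{\hat f(x+1)}{\hat f(x)}
$$

where $\hat f(x)$ is the observed proportion of units with count $x$. A minimal sketch with made-up counts (not the book’s table):

```python
# Robbins' formula for Poisson counts.
# Hypothetical data: counts[x] = number of units observed exactly x times.
counts = {0: 7000, 1: 1200, 2: 250, 3: 40, 4: 10}
total = sum(counts.values())
f_hat = {x: n / total for x, n in counts.items()}        # empirical frequency of each count

def robbins_posterior_mean(x: int) -> float:
    """Estimate E[theta | x] ~= (x + 1) * f_hat(x + 1) / f_hat(x)."""
    return (x + 1) * f_hat.get(x + 1, 0.0) / f_hat[x]

print(robbins_posterior_mean(0))   # expected future rate for a unit with zero observed events
```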
The missing species problem
“if he trapped for one additional year, how many new species would he expect to capture?
The question relates to the absent entry in Table 6.2, x = 0, the species that
haven’t been seen yet.”
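Fisher’s answer, as a sketch in hedged notation (assuming e_x denotes the number of species seen exactly x times in the original trapping period, and t the length of the new period relative to the original):

$$
\widehat{E}(t) \;=\; e_1 t - e_2 t^2 + e_3 t^3 - \cdots
$$

so for one additional year of equal length (t = 1) the estimate is the alternating sum e_1 − e_2 + e_3 − ⋯ over the observed counts.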