Topic 1: Bayesian inference Flashcards

1
Q

Describe Bayes rule within statistical inference

A

We can define Bayes' rule as follows, with x fixed:

g(µ|x) = f_µ(x) g(µ) / f(x)

Here, g(µ|x) denotes the posterior density

g(µ) the prior density

f(x) = ∫ f_µ(x) g(µ) dµ, the marginal density of x

f_µ(x) the probability density function of x given µ

Since x is fixed, this can be rewritten as

g(µ|x) = c_x L_x(µ) g(µ)

L_x(µ) is the likelihood function: f_µ(x) with x fixed and µ varying becomes the likelihood function.

c_x is a constant that ensures the posterior integrates to 1; we can ignore constants that depend only on x (not µ).

The point of this: in Bayes' rule we fix the data and let µ vary, the opposite of frequentist calculations.
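As a numeric sketch of this rule (the observation, prior, and grid below are made-up assumptions, not from the card), the posterior can be computed on a grid: multiply prior by likelihood, then normalise with the constant c_x:

```python
import numpy as np

# Grid approximation of Bayes' rule: posterior ∝ prior × likelihood.
# Hypothetical setup: one observation x = 1.5, a N(0, 1) prior on mu,
# and a N(mu, 1) sampling model.

x = 1.5
mu = np.linspace(-5, 5, 1001)             # grid over the parameter
dmu = mu[1] - mu[0]
prior = np.exp(-0.5 * mu**2)              # g(mu), unnormalised N(0, 1)
likelihood = np.exp(-0.5 * (x - mu)**2)   # L_x(mu) = f_mu(x), x fixed
unnorm = prior * likelihood

# c_x normalises the posterior so it integrates to 1 over the grid
posterior = unnorm / (unnorm.sum() * dmu)

# Conjugate theory gives posterior N(x/2, 1/2); the grid agrees:
post_mean = float((mu * posterior).sum() * dmu)
print(round(post_mean, 2))  # 0.75
```

Note that the prior and likelihood are left unnormalised: any factor depending only on x cancels in the normalisation step, exactly as the c_x remark above says.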

2
Q

Describe Frequentist and Bayesian interpretation of probability

A

Frequentist:
- Often used in real-world scenarios where prior information is lacking
- Describes the data, not the parameter (µ is fixed, x varies)
- Assigns probability to data, not to parameters

Bayesian:
- Often the most intuitive interpretation
- Describes the parameter, not the data (µ varies, x is fixed)
- Assigns probability to parameters, not to data

3
Q

Describe the use of prior probability densities to characterise prior information

A

We can use prior probability densities, such as g(µ), to characterise what we know about µ before seeing the data; combined with the likelihood, this yields the posterior g(µ|x). Prior information can take many forms, but when none is available, one common approach is to use uninformative priors.

4
Q

Describe uninformative priors

A

If we have no prior information but still want to apply Bayes’ rule, we use uninformative priors.

Uninformative here means that the prior introduces as little bias as possible into the final results, so that the data has the strongest influence on the conclusions.

5
Q

Describe the different methods to make uninformative priors

A
  • Laplace’s prior has issues: it doesn’t handle parameter transformations well.
  • Jeffreys’ prior fixes this: by using Fisher information, Jeffreys’ prior adjusts for transformations and remains valid across parameterisations.

Laplace’s principle of insufficient reason: treat all parameter values as equally likely, i.e. use a uniform prior distribution. It can’t always be used: if θ has a uniform prior, the transformed parameter γ = e^θ will NOT be uniform, so “uniform” depends on the chosen parameterisation.

Jeffreys’ prior: relies on Fisher information I(µ) (a somewhat frequentist ingredient):

g^Jeff(µ) ∝ sqrt(I(µ))

Because Fisher information transforms consistently under reparameterisation (it is approximately the inverse variance of the MLE), Jeffreys’ prior transforms correctly, making it superior to Laplace’s.

Flat prior: a flat prior assigns equal probability density across the parameter space.
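Laplace’s transformation problem can be checked numerically. This is an illustrative sketch (the range [0, 1] and sample size are arbitrary choices): draw θ from a uniform prior, transform with γ = e^θ, and observe that γ is no longer uniform:

```python
import numpy as np

# Laplace's problem, numerically: a uniform prior on theta is NOT uniform
# after the transformation gamma = exp(theta).  (Range and sample size
# are arbitrary choices for illustration.)

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 1.0, 100_000)   # flat prior on theta over [0, 1]
gamma = np.exp(theta)                    # transformed parameter, in [1, e]

# If gamma were uniform on [1, e], each half of its range would carry
# about 50% of the mass; instead the lower half is clearly heavier.
mid = (1.0 + np.e) / 2.0
frac_lower = float(np.mean(gamma < mid))
print(round(frac_lower, 2))
```

The lower half of [1, e] holds roughly 62% of the mass, so “uninformative about θ” is not the same as “uninformative about γ”.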

6
Q

Describe the Frequentist and the Bayesian ways of inference

A

Key Difference in Practice:

Frequentist: “Based on this data, if we repeated this experiment many times, we’d expect…”

Bayesian: “Given our prior knowledge and this data, here’s what we believe about…”

Frequentist Approach:

  1. Method Choice:
    Must choose specific statistical methods for each problem
    Example: t-test for comparing means, chi-square for categorical data
  2. Balance Between Distributions:
    Methods need to work across different probability distributions
    Like having a Swiss Army knife that should work okay in many situations
  3. “Hoping to do well”:
    Methods designed to work reasonably well regardless of true parameter θ
    Like building a car that performs okay in all weather conditions
  4. Multiple Estimators:
    Different statistical tools for different questions
    Example: Using p-values for hypothesis testing but confidence intervals for estimation
  5. Scientific Objectivity:
    Claims to be more objective because it doesn’t use prior beliefs
    Relies purely on data and predefined methods

Bayesian Approach:

  1. Prior Required:
    Must specify your beliefs before seeing data
    Example: Based on previous studies, you might believe a drug’s effect is normally distributed around 30%
  2. Complete Solution:
    The posterior distribution tells you everything about your parameter
    Like having a full map rather than just directions to one place
  3. “All in” with Prior:
    Your results depend on your prior beliefs
    If your prior is wrong, results might be misleading
  4. Unified Framework:
    One approach (posterior distribution) answers all questions
    Can get point estimates, intervals, and probabilities all from the same posterior
  5. Subjectivity Acknowledged:
    Openly incorporates prior beliefs
    Results depend on quality of prior information
    More subjective when good prior information isn’t available
7
Q

Describe empirical Bayes

A

Empirical Bayes (EB) is a hybrid approach where:

You do not assume a fully specified prior.
Instead, you estimate the prior distribution using the data.

This allows you to create a data-informed prior rather than using uninformative (flat) priors or subjective prior knowledge.

When to use:
When you observe many similar, related cases (e.g. many individual policyholders) and believe they share a common structure, so the ensemble of cases can stand in for the unknown prior.

A way to do this, for Poisson counts, is with Robbins’ formula:

E[θ | x] = (x + 1) f(x + 1) / f(x)

where f(x) is the marginal frequency of the count x, estimated directly from the observed data.

Old:

Insurance company: the insurance company wants to know how many claims each policyholder will make in the next year. We assume the priors are unknown; rewriting the posterior expectation leads to Robbins’ formula.

Bayesian estimation can be approximated empirically using observed data when dealing with similar observations.
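Robbins’ formula can be sketched with simulated claim counts. This is a hedged illustration: the gamma mixing distribution and the sizes below are made-up assumptions, not from the card.

```python
import numpy as np

# Robbins' empirical-Bayes formula for Poisson counts:
#   E[theta | x] ≈ (x + 1) * f(x + 1) / f(x)
# with f the observed frequency of each count.  Insurance-style simulation;
# the gamma(2, 0.5) mixing distribution and sizes are made up.

rng = np.random.default_rng(1)
rates = rng.gamma(shape=2.0, scale=0.5, size=50_000)  # each holder's rate
claims = rng.poisson(rates)                           # observed counts

counts = np.bincount(claims)
f = counts / counts.sum()          # empirical marginal frequencies f(x)

def robbins(x):
    """EB estimate of a policyholder's rate given x claims this year."""
    return (x + 1) * f[x + 1] / f[x]

# Under this gamma prior the true posterior means are (2 + x) / 3,
# and Robbins' formula recovers them without ever seeing the prior.
print(round(float(robbins(0)), 2), round(float(robbins(1)), 2))
```

The key point: `f` is estimated purely from the data, so the prior never has to be specified.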

8
Q

Describe the difference between subjective and objective Bayesian inference

A

Objective Bayesian:
- Introduces frequentist methods, along with Jeffreys’ priors (because Jeffreys’ priors use Fisher information to construct the prior, which is a frequentist approach)
- Popular due to its simplicity
- Uses a formula to find the prior, regardless of your own belief

Subjective:
- Uses your own experience and beliefs
- From the subjective perspective, the objective way is only partially Bayesian; subjectivists say: objective Bayes employs Bayes’ theorem, but without doing the hard work of determining a convincing prior distribution
- Suited to individual purposes, such as personal decision making

Old:

Frequentists criticize objective Bayes analysis for weak ties to traditional accuracy standards and reliance on large-sample assumptions. Its effectiveness hinges on the choice of prior, raising doubts about ignoring factors like stopping rules or selective inference without real-world grounding.

Still, objective Bayes methods remain popular for their simplicity in tackling complex problems. However, better theoretical justification and comparisons with other methods are needed to ensure reliability.

Subjective Bayesian:

Subjective Bayesianism is particularly appropriate for individual decision making, say for the business executive trying to choose the best investment in the face of uncertain information.

From the subjectivist point of view, objective Bayes is only partially Bayesian:

  • It employs the Bayes’ theorem, but without doing the hard work of determining a convincing prior distribution
  • It introduces frequentist methods, along with Jeffreys’ priors (because Jeffreys’ priors use Fisher information to construct the prior, which is a frequentist approach)
9
Q

Describe what conjugate priors are

A

It’s a prior distribution that, when combined with the likelihood function L_x(µ), gives a posterior distribution in the same family as the prior.

Its main advantage is computational efficiency: updating the posterior reduces to updating the prior’s parameters.
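A minimal sketch of such an update, using the standard Beta-Binomial conjugate pair (the prior and data values below are made up):

```python
# Conjugate update for the Beta-Binomial pair: a Beta(a, b) prior on a
# success probability, combined with a binomial likelihood (k successes
# in n trials), yields a Beta(a + k, b + n - k) posterior.  No integrals,
# only a parameter update.  The numbers below are made up.

def beta_binomial_update(a, b, k, n):
    """Return the Beta posterior parameters after observing k of n."""
    return a + k, b + (n - k)

a_post, b_post = beta_binomial_update(a=2, b=2, k=7, n=10)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 2))  # 9 5 0.64
```

Because the posterior is again a Beta, the next batch of data can be absorbed by calling the same update with the new parameters.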

10
Q

Describe Bayes factor and its relation to BIC

A

If we have two models with different means, how do we know which model is best?

The Bayes factor B(x) is the ratio of the marginal densities of x under the two models, B(x) = f_1(x) / f_0(x). It tells us how much the data favours M₁ over M₀.

BIC provides a computationally simple way to approximate the Bayes factor B(x).

11
Q

Describe BIC for model selection

A

We have two models, described as hypotheses M_0: µ = 0 and M_1: µ ≠ 0. BIC is a criterion for model selection, and we want a smaller BIC, because it indicates a better trade-off between fit and error.
- Compare models by computing their BIC values
- Select the model with the lowest BIC: it balances model fit and complexity

What it does:
It penalizes the complexity of the model, where complexity refers to the number of parameters in the model.

BIC provides a computationally simple way to approximate Bayes factors.

Old:

We have two models, M_0 and M_1.
M_0 is the null hypothesis and M_1 is the general two-sided alternative.

Frequentist: choose between M_0 and M_1 by running a hypothesis test of H_0: µ = 0, whose guarantees refer to many repetitions of the experiment.

Bayesian: We want an evaluation of the posterior probabilities of M_0 and M_1 given x. We require prior probabilities for the two models and use Bayes’ theorem in its odds-ratio form, to easily compare the two models:

P(M_1 | x) / P(M_0 | x) = [P(M_1) / P(M_0)] × B(x)

This will then result in the Bayes factor, the ratio of marginal densities:

B(x) = f_1(x) / f_0(x)

This leads to the statement that the posterior odds ratio is the prior odds ratio times the Bayes factor.

This works when we have the priors. When we don’t have informative choices of priors, we use BIC (the Bayesian Information Criterion) instead, which uses only the MLE, the degrees of freedom (number of parameters), and the sample size.
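A sketch of BIC-based selection between M_0: µ = 0 and M_1: µ free, for normal data with known variance. The BIC form used, −2 log L(µ̂) + k log n, follows the MLE/parameters/sample-size description above; the simulated data are made up.

```python
import numpy as np

# BIC comparison of M0: mu = 0 versus M1: mu free, for normal data with
# known sd 1.  BIC = -2 * log L(mu_hat) + k * log(n), k = number of free
# parameters; the smaller BIC wins.  Data are simulated with true mu = 1.

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=1.0, size=100)
n = len(x)

def norm_loglik(data, mu, sigma=1.0):
    m = len(data)
    return (-0.5 * m * np.log(2 * np.pi * sigma**2)
            - np.sum((data - mu) ** 2) / (2 * sigma**2))

bic0 = -2 * norm_loglik(x, mu=0.0) + 0 * np.log(n)       # no free params
bic1 = -2 * norm_loglik(x, mu=x.mean()) + 1 * np.log(n)  # one free param

print(bic1 < bic0)  # True: the data clearly favour M1
```

M_1 pays a log(n) penalty for its one extra parameter, but the improved fit outweighs it here; with data truly centred at 0, the penalty would tip the choice to M_0.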

12
Q

Why are Gibbs sampling and Metropolis-Hastings sampling useful?

A

Bayesian computations can become very expensive because of the marginal density f(x) in the denominator of Bayes’ rule, which requires integrating over all parameter values.

  • Instead of computing the exact posterior (which requires difficult integrals)
  • You get a collection of samples that represent the posterior distribution

Two such methods are Gibbs sampling and Metropolis-Hastings sampling.

13
Q

Describe Gibbs sampling

A

Gibbs sampling updates one variable at a time while keeping the others fixed (this creates a Markov chain).

Advantages of Gibbs sampling:
- Simplifies high-dimensional sampling by breaking it into conditionals
- Effective in terms of accuracy for correlated variables, leveraging dependencies

Challenges of Gibbs sampling:
- Can be slow if variables are highly dependent
- Requires knowledge of all conditional distributions

How it works:
Iteratively sample each parameter from its conditional distribution while holding all other parameters fixed.
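A minimal Gibbs-sampler sketch for a bivariate normal target with correlation ρ, a textbook example chosen because both conditional distributions are known normals (the value ρ = 0.8 and chain length are arbitrary):

```python
import numpy as np

# Gibbs sampler for a bivariate normal with correlation rho, where both
# conditionals are known normals:
#   x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2)

rng = np.random.default_rng(3)
rho = 0.8
sd = np.sqrt(1 - rho**2)

x, y = 0.0, 0.0
samples = []
for _ in range(20_000):
    x = rng.normal(rho * y, sd)   # update x, holding y fixed
    y = rng.normal(rho * x, sd)   # update y, holding the new x fixed
    samples.append((x, y))

xs, ys = np.array(samples[1000:]).T          # discard burn-in
corr = float(np.corrcoef(xs, ys)[0, 1])
print(round(corr, 1))                        # recovers rho
```

Note the trade-off mentioned above: the larger ρ is, the more dependent the variables, and the slower the chain explores the target.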

14
Q

Describe Metropolis-Hastings sampling

A

If the conditional distributions are not easy to sample from, Metropolis-Hastings is the one to use.

Advantages:
- Effective for sampling from high-dimensional and complex distributions
- Flexible in the choice of proposal distribution

Challenges:
- Requires tuning of the proposal distribution for efficient sampling
- Slow convergence if the target distribution and proposal distribution are poorly matched

How it works:
The simple process is: propose a new point, then accept or reject it; this lets Metropolis-Hastings explore the entire distribution. The accept/reject decision is based on how likely the new point is compared to the current one.
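A minimal random-walk Metropolis-Hastings sketch, targeting an unnormalised standard normal so the result is checkable (the proposal scale and chain length are arbitrary choices):

```python
import numpy as np

# Random-walk Metropolis-Hastings targeting an *unnormalised* standard
# normal (the normalising constant is never needed): propose a move, then
# accept with probability min(1, target(new) / target(current)).

rng = np.random.default_rng(4)

def target(mu):
    return np.exp(-0.5 * mu**2)      # unnormalised N(0, 1) density

mu = 0.0
samples = []
for _ in range(50_000):
    proposal = mu + rng.normal(0.0, 1.0)   # symmetric random-walk proposal
    if rng.uniform() < min(1.0, target(proposal) / target(mu)):
        mu = proposal                      # accept: move to the new point
    samples.append(mu)                     # on rejection, mu stays put

draws = np.array(samples[5000:])           # discard burn-in
mu_mean, mu_std = float(draws.mean()), float(draws.std())
print(round(mu_mean, 1), round(mu_std, 1))
```

Only the ratio of target densities appears in the acceptance step, which is why the expensive marginal density f(x) in Bayes’ rule never needs to be computed.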

15
Q

Describe Markov Chain Monte Carlo (MCMC)

A

A family of algorithms designed to sample from probability distributions efficiently.

Bayesian inference updates prior beliefs to calculate the posterior distribution, but direct computation is often infeasible. MCMC methods solve this by generating samples proportional to the posterior, approximating it without needing the exact formula. They use Markov chains, where each sample depends only on the previous one, enabling efficient exploration of complex distributions.

Markov Chain:
A sequence of random variables where the next state depends only on the current state (the Markov property).
Example: Transitioning from one point to another in the sample space.

Monte Carlo:
Refers to the use of random sampling to approximate numerical results, such as integrals or expectations.
