Bayesian Inference & Decision Theory Flashcards
What is decision theory?
The practice of making decisions under uncertainty
What are the different perspectives on distribution parameters between Frequentist Statistics and Bayesian Statistics?
In Frequentist Statistics, the parameter theta is treated as a fixed but unknown constant, estimated with a point estimate. In Bayesian Statistics, theta is treated as a random quantity with its own probability distribution (e.g. a posterior distribution that is updated as data arrive)
What is the equation of P(A|B)?
[P(A and B)]/P(B)
or
[P(B|A).P(A)]/[P(B|A).P(A) + P(B|not A).P(not A)] (for 2 possible events)
*the denominator is the same as P(B)
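A minimal sketch in Python of the two-event form above; the event names and probabilities are hypothetical:

```python
# Bayes' rule: P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)P(not A)]
p_A = 0.01             # prior probability of A (hypothetical prevalence)
p_B_given_A = 0.95     # P(B|A), e.g. a test's sensitivity
p_B_given_notA = 0.05  # P(B|not A), e.g. its false-positive rate

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)  # the denominator, P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # ~0.161
```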
What is the equation for P(A and B)?
P(A|B).P(B)
What is the equation for the expected value of a pmf?
E[X] = summation over all outcomes x of x.p(x)
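A quick numeric check, using a made-up pmf:

```python
# E[X] = sum over all outcomes of x * p(x); toy pmf assumed for illustration
outcomes = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]
expected_value = sum(x * p for x, p in zip(outcomes, probs))
print(expected_value)  # 2.0
```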
What is the objective of Bayesian Inference?
To logically and coherently update state probabilities as new evidence becomes available
Is it always logical to maximise profit or minimise loss?
No. We have to consider the scale of the downside; risk tolerance (or risk aversion) can lead a decision-maker to depart from pure expected-value maximisation
What is EVPI?
The expected value of perfect information: the difference between the best expected outcome under perfect information and the best expected outcome using only the prior information
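A small sketch of the EVPI calculation, using a hypothetical two-action, two-state payoff table:

```python
import numpy as np

# Hypothetical payoff table: rows = actions, columns = states of nature
payoffs = np.array([[100, -20],   # action 0
                    [ 30,  30]])  # action 1
prior = np.array([0.4, 0.6])      # prior probabilities of the two states

# Best expected payoff using only the prior: commit to one action up front
best_with_prior = (payoffs @ prior).max()

# Under perfect information we pick the best action for each state,
# then average those best payoffs over the prior state probabilities
best_with_perfect_info = prior @ payoffs.max(axis=0)

evpi = best_with_perfect_info - best_with_prior   # 58 - 30 = 28 here
```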
What is predictive probability?
The overall (marginal) probability of observing some value x in the data, averaged over all possible states/parameter values
What is a posterior probability?
A probability representing how likely it is that theta equals some value after factoring in additional information (i.e. observed data)
What is the Excel function for weighted summation?
= SUMPRODUCT(number cells, weight cells)
What is the equation for m(x(k))? (i.e. the predictive probability of x(k))
m(x(k)) = summation through states theta of [P(x(k) | theta).pi(theta)]
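A sketch with an assumed two-state discrete prior and binomial observations:

```python
from scipy.stats import binom

# m(x_k) = sum over states theta of P(x_k | theta) * pi(theta)
thetas = [0.2, 0.5]   # possible parameter values (assumed)
prior = [0.7, 0.3]    # pi(theta) for each state
n, x = 10, 4          # observe x successes in n trials

m_x = sum(binom.pmf(x, n, t) * p for t, p in zip(thetas, prior))
```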
What does the Latin term “a posteriori” mean?
Dependent on empirical evidence or experience
What does the Latin term “a priori” mean?
Independent of experience
What is EVSI?
The expected value of sample information: the difference between the best expected outcome using the posterior probabilities (i.e. after sampling) and the best expected outcome using only the prior probabilities.
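A sketch of the EVSI calculation, reusing the hypothetical payoff table from the EVPI example and assuming a binomial sample of size n:

```python
import numpy as np
from scipy.stats import binom

payoffs = np.array([[100, -20],
                    [ 30,  30]])
thetas = np.array([0.2, 0.5])   # success probability under each state
prior = np.array([0.4, 0.6])
n = 5                           # assumed sample size

best_with_prior = (payoffs @ prior).max()

# Average the best posterior expected payoff over every possible outcome x
expected_with_sample = 0.0
for x in range(n + 1):
    lik = binom.pmf(x, n, thetas)      # P(x | theta) for each state
    m_x = lik @ prior                  # predictive probability of x
    posterior = lik * prior / m_x      # Bayes' rule
    expected_with_sample += m_x * (payoffs @ posterior).max()

evsi = expected_with_sample - best_with_prior
```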
What is the difference between subjective Bayesian Inference and more traditional objective Bayesian Inference?
More objective Bayesian Inference uses objective statistical models to determine the distributions of the observations conditional on the underlying states (i.e. P(X|theta)); subjective Bayesian Inference allows these conditional distributions, as well as the prior, to be assessed judgmentally
What are the conditions for a sample from binomial sampling to follow a binomial distribution?
- The population must be so large that the sample does not disturb the population proportions
or - We sample with replacement
What is binomial sampling?
When a number of observations are sampled and grouped into one of two levels (traditionally success or failure)
What is a subjective probability?
A probability indicating the current assessment of how likely it is that the true value of theta is some value. Normally shown as pi(theta=n)
What is the Excel function for calculating binomial probabilities?
=BINOMDIST (called =BINOM.DIST in newer versions of Excel)
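For reference, a possible Python equivalent of the same calculation using scipy:

```python
from scipy.stats import binom

# Like BINOMDIST(5, 100, 0.05, FALSE): P(X = 5)
print(binom.pmf(5, 100, 0.05))
# Like BINOMDIST(5, 100, 0.05, TRUE): P(X <= 5)
print(binom.cdf(5, 100, 0.05))
```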
What would happen to our posterior distribution for theta if we observe an outcome of x=5 in our binomial sample of 100?
The posterior distribution of theta would be centered approximately on theta=0.05 (i.e. the proportion of successes in the sample [5/100])
What do we do in our setting up of the prior distribution when we want to consider values over some continuous interval as opposed to some set of discrete values?
We assign a probability density function to theta
How do we set up our prior distribution of theta if we want to consider all continuous values between some interval with equal likelihood a priori?
We use a uniform distribution for theta (i.e. pi(theta) = 1/(b-a) for a < theta < b)
What is the posterior distribution of theta when using a uniform prior and binomial sampling?
pi(theta|x) = k.(theta^x).[(1-theta)^(n-x)]
- this is actually a beta distribution, where k = gamma(n+2)/[gamma(x+1).gamma(n-x+1)]
- k is the normalization constant that comes about from cancelling terms between the joint and predictive densities; it scales the result so the distribution integrates to 1
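With a uniform prior this posterior is Beta(x+1, n-x+1), which scipy can evaluate directly; the numbers below reuse the x=5, n=100 example from the earlier card:

```python
from scipy.stats import beta

x, n = 5, 100
posterior = beta(x + 1, n - x + 1)   # Beta(x+1, n-x+1) posterior
print(posterior.mean())              # (x+1)/(n+2) ~ 0.059, close to 5/100
```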
Assume we use a beta distribution to model the prior distribution for theta. What is the posterior distribution of theta?
prior: pi(theta) = k.[(theta^(a-1)).(1-theta)^(b-1)]
posterior: pi(theta|x) = k'.[(theta^(a+x-1)).(1-theta)^(b+n-x-1)]
- i.e. another beta distribution, now with parameters a+x and b+n-x (the constants k and k' differ)
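A one-line sketch of that conjugate update; the parameter values are illustrative:

```python
# Beta(a, b) prior + x successes in n binomial trials
# -> Beta(a + x, b + n - x) posterior
def beta_binomial_update(a, b, x, n):
    return a + x, b + n - x

a_post, b_post = beta_binomial_update(a=2, b=2, x=5, n=100)  # -> (7, 97)
```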
What is the expected value of a theta which follows a beta distribution?
E[theta] = a/(a+b)
What is the expected value of 1-theta from a beta distribution?
E[1-theta] = b/(a+b)
What is the variance of a theta which follows a beta distribution?
var[theta] = ab/[(a+b)²(a+b+1)]
How do we choose the variance of a prior distribution for theta based on our prior expectations?
We divide the width of the range we believe theta lies in by the approximate number of standard deviations that range spans, giving a value for one standard deviation. We square that value, set it equal to var(theta), and solve for the parameters
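A sketch of this elicitation for a beta prior, assuming the stated range spans roughly 4 standard deviations (2 each side of the mean); all numbers are illustrative:

```python
mean = 0.2           # prior expectation E[theta] = a/(a+b)
lo, hi = 0.1, 0.3    # range we believe theta lies in (assumed)
sd = (hi - lo) / 4   # one standard deviation
var = sd ** 2

# From E[theta] = a/(a+b) and var[theta] = ab/[(a+b)^2 (a+b+1)],
# a + b = E[theta](1 - E[theta])/var - 1, then split by the mean
s = mean * (1 - mean) / var - 1
a, b = mean * s, (1 - mean) * s    # -> a = 12.6, b = 50.4 here
```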
Is the uniform distribution a special case of the beta distribution? If so, what are its parameters?
Yes. The uniform distribution is a beta distribution with alpha=beta=1.
What can we say about the centrality of the posterior distribution if we use a relatively uninformative prior (like a uniform distribution)?
The posterior distribution will be more-or-less centered around the sample estimate (e.g. the sample proportion)
What are “iid” observations?
Observations that are independent and identically distributed
What is a likelihood function?
A likelihood function is essentially the reverse of a probability density function. Instead of taking a parameter and giving the density of x given that parameter (i.e. f(x|theta)), the likelihood function takes the data (x) and returns the likelihood that theta equals some value given the data.
L(theta|x1,x2,…,xn) = product over i = 1 to n of f(xi|theta) (for iid observations)
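A sketch of evaluating such a likelihood; here f is taken to be a normal density with known standard deviation purely for illustration:

```python
import numpy as np
from scipy.stats import norm

data = np.array([4.8, 5.1, 5.4, 4.9])   # assumed iid observations

def likelihood(theta, data, sd=1.0):
    # L(theta | x1..xn) = product of f(xi | theta) over the sample
    return np.prod(norm.pdf(data, loc=theta, scale=sd))

print(likelihood(5.0, data))
```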
Why is the requirement of a prior distribution seen as both a strength and a weakness of the Bayesian approach to inference?
- We sometimes do have real and meaningful prior information which we can then exploit (especially if the results of the analysis are to be used as a basis for decision making)
- Some argue that prior information introduces an element of subjectivity which may conflict with a desire for objectivity in the data analysis.
What is the posterior probability density function for theta, written using a likelihood function?
pi(theta|x1,x2,…,xn) = [L(theta|x1,x2,…,xn).pi(theta)]/[m(x1,x2,…,xn)]
- where m is the predictive probability density function given by the integral of the numerator d(theta) over -inf to inf.
- The role of the denominator is basically to scale/standardize the posterior distribution so that it integrates to 1.
- Because of this, we often write the posterior simply as k.L(theta|x1,x2,…,xn).pi(theta) and say the posterior distribution of theta is directly proportional to L(theta|x1,x2,…,xn).pi(theta)
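A sketch of finding the normalization constant numerically, using the binomial likelihood with a uniform prior from the earlier cards:

```python
from scipy.integrate import quad

x, n = 5, 100
# Unnormalized posterior: L(theta|x) * pi(theta), with pi(theta) = 1 on (0, 1)
unnormalized = lambda t: t**x * (1 - t)**(n - x)

m_x, _ = quad(unnormalized, 0, 1)              # the predictive density m(x)
posterior = lambda t: unnormalized(t) / m_x    # now integrates to 1
```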
What is an advantage of using a normalization constant?
Any factors in L(theta|x1,x2…xn) or pi(theta) which do not depend on theta can be absorbed into the normalization constant and effectively ignored.
What are the 3 main areas of statistical inference that we might use the posterior distribution of theta for?
- Hypothesis Testing
- Interval Estimation
- Point Estimation
What is a type-I error?
The rejection of H0 when it is in fact true
What is a type-II error?
The acceptance of H0 when it is in fact false
Given a null hypothesis of theta being some value (theta0) with probability pi0, what are the odds of theta0 being the correct value?
It is the ratio of the posterior probabilities that theta = theta0 and theta = theta1:
= [L(theta0|data).pi0]/[L(theta1|data).(1-pi0)]
This represents the product of the a priori odds and a likelihood ratio, also known as a Bayes Factor.
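A sketch of the posterior odds computation for a binomial likelihood; theta0, theta1, pi0 and the data are all assumed for illustration:

```python
from scipy.stats import binom

theta0, theta1, pi0 = 0.05, 0.10, 0.5   # hypothesized values and prior prob
x, n = 5, 100                           # observed data

bayes_factor = binom.pmf(x, n, theta0) / binom.pmf(x, n, theta1)
posterior_odds = (pi0 / (1 - pi0)) * bayes_factor  # prior odds * Bayes factor
```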
Assume that the cost of a type-I error is CI and that the cost of a type-II error is CII, what is the decision which minimizes expected cost?
Reject H0 if CI.pi(theta0|data) <= CII.pi(theta1|data)
How can we approximate values for L and U such that the probability of theta belonging to the interval [L,U] is equal to 100(1-alpha)%? A.k.a Interval Estimation or calculating a credibility interval
We set the integral of the posterior distribution for theta equal to 1-alpha with the bounds L and U then numerically approximate.
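One common numerical choice is the equal-tailed interval, which puts alpha/2 of the posterior mass in each tail; for the Beta(x+1, n-x+1) posterior above this can be inverted directly:

```python
from scipy.stats import beta

x, n, alpha = 5, 100, 0.05
L = beta.ppf(alpha / 2, x + 1, n - x + 1)       # 2.5% of mass below L
U = beta.ppf(1 - alpha / 2, x + 1, n - x + 1)   # 2.5% of mass above U
# P(L <= theta <= U | data) = 0.95
```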
What is the Bayesian estimate?
The expected value of the posterior distribution. Under certain circumstances, other measures such as the mode or the median will be used as the estimator
What is the likelihood function of lambda for a sample from the Poisson distribution?
L(lambda | data) is directly proportional to [lambda^(sum of xi, i = 1 to n)].[e^(-n.lambda)]
- We know for sure that lambda is strictly positive so the prior distribution should be restricted to positive values only
Assume the prior distribution of lambda is a gamma distribution with parameters alpha and phi. What is the resulting posterior distribution?
Gamma distribution with parameters (alpha+sum of xis) and (phi + n)
What is the posterior expectation of lambda given a gamma posterior distribution?
E[lambda|data] = a/b = (alpha + sum of xi)/(phi + n)
*If you expand this, it equates to a weighted average of the prior estimate (alpha/phi) and the sample mean
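A sketch of the gamma-Poisson update and the weighted-average identity; the prior parameters and counts are illustrative:

```python
import numpy as np

alpha, phi = 4.0, 2.0             # assumed Gamma(alpha, phi) prior for lambda
data = np.array([3, 1, 4, 2, 2])  # assumed Poisson counts

# Conjugate update: Gamma(alpha + sum(x), phi + n) posterior
a_post, b_post = alpha + data.sum(), phi + len(data)
posterior_mean = a_post / b_post                 # 16/7 here

# The same value as a weighted average of prior mean and sample mean
n = len(data)
weighted = (phi / (phi + n)) * (alpha / phi) + (n / (phi + n)) * data.mean()
```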