Max likelihood & Bayes Flashcards
How does a nugget change our covariance matrix
Goes from sigma^2 Sigma to sigma^2 Sigma + sigma_n^2 I, where sigma_n^2 is the nugget variance
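As a quick numerical illustration (the function name is hypothetical, not from the course):

```python
import numpy as np

def nugget_covariance(Sigma, sigma2, sigma2_n):
    """Scale the correlation matrix Sigma by sigma^2 and add the
    nugget variance sigma_n^2 along the diagonal."""
    return sigma2 * Sigma + sigma2_n * np.eye(Sigma.shape[0])
```

A side effect of the nugget is that it keeps the covariance matrix well conditioned when it has to be inverted.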
Why can we not just multiply the pdfs of the data points to get the likelihood
Our data are not i.i.d., so we have to consider the joint distribution instead
See the equation sheet for the likelihood of the MVN
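For reference, a sketch of that MVN likelihood, assuming observations x with mean vector mu and covariance sigma^2 Sigma (matching the nugget card above; check it against the actual equation sheet):

```latex
L(\mu, \sigma^2) = (2\pi)^{-n/2}\, \lvert \sigma^2 \Sigma \rvert^{-1/2}
\exp\!\left( -\tfrac{1}{2} (x - \mu)^\top (\sigma^2 \Sigma)^{-1} (x - \mu) \right)
```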
What are the differences between classical and Bayesian statistics
In classical/frequentist statistics:
▶ Probability of an event is the number of times the event happens divided by the number of trials (relative frequency)
▶ Parameters are fixed constants we try to estimate
In Bayesian statistics:
▶ Probability is a measure of our degree of belief
▶ Parameters have probability distributions which we update when we collect data
State Bayes' theorem
p(theta|x) = p(x|theta)p(theta)/p(x)
Why can we get rid of the denominator in Bayes' theorem in favor of proportionality
p(x) does not depend on theta, so it is just a normalising constant; because our densities have to integrate to 1, it can always be recovered at the end
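In symbols (the proportional form stated in words on the next card):

```latex
p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)
```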
Say Bayes' theorem in words
Posterior is proportional to the prior times the likelihood
Loss function
The posterior is a distribution; to get a point estimate we specify a loss function and minimise the expected loss (see the sketch after the list below)
Types of loss functions
- squared loss = mean of the posterior;
- absolute loss = median;
- (0, 1) loss = mode (also known as the maximum a posteriori (MAP) estimate)
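A minimal sketch of these three point estimates computed from posterior samples; the stand-in Gamma posterior and all names are illustrative, not from the course:

```python
import numpy as np
from scipy import stats

# Assume `draws` holds samples from the posterior p(theta | x),
# e.g. produced by an MCMC sampler (see the last card).
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.5, size=10_000)  # stand-in posterior

post_mean = np.mean(draws)      # minimises squared loss
post_median = np.median(draws)  # minimises absolute loss
# Mode / MAP: approximate via a kernel density estimate over the draws
kde = stats.gaussian_kde(draws)
grid = np.linspace(draws.min(), draws.max(), 1000)
post_mode = grid[np.argmax(kde(grid))]  # approximately minimises (0, 1) loss
```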
Subjective Bayes
Because we specify the prior ourselves, Bayesian inference is subjective
Eliciting beliefs
Turning prior opinions into usable distributions
Objective Bayes
If we don’t like subjectivity (or are feeling lazy) we use objective priors
Conjugate prior
A prior chosen so that the posterior has the same distributional form as the prior
Conjugate prior for Normal
Also Normal for the mean
Inverse Gamma for the variance
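As a worked instance of conjugacy, assume x_1, ..., x_n ~ N(mu, sigma^2) with sigma^2 known and prior mu ~ N(mu_0, tau_0^2); the posterior for the mean is again Normal (this is the standard textbook update, not taken from the course equation sheet):

```latex
\mu \mid x \sim N(\mu_n, \tau_n^2), \qquad
\tau_n^2 = \left( \frac{1}{\tau_0^2} + \frac{n}{\sigma^2} \right)^{-1}, \qquad
\mu_n = \tau_n^2 \left( \frac{\mu_0}{\tau_0^2} + \frac{n \bar{x}}{\sigma^2} \right)
```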
Non-informative Prior
One option is the non-informative prior
This is either flat on (−∞, ∞) or proportional to 1/x on (0, ∞)
This prior does not alter the shape of the likelihood, so the posterior is proportional to the likelihood
The prior can be improper (it does not itself integrate to 1)
Non-informative priors + (0, 1) loss = MLE
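The reasoning, assuming a flat prior p(theta) ∝ 1:

```latex
p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta) \propto p(x \mid \theta)
\quad\Rightarrow\quad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta \, p(\theta \mid x)
= \arg\max_\theta \, p(x \mid \theta) = \hat{\theta}_{\mathrm{MLE}}
```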
How can we use informative priors that aren’t conjugate
MCMC (Markov chain Monte Carlo)
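A minimal random-walk Metropolis sketch of the idea, assuming user-supplied log-prior and log-likelihood functions; every name here is illustrative, not from the course:

```python
import numpy as np

def metropolis(log_prior, log_likelihood, theta0, n_steps=5000, step=0.5, seed=0):
    """Random-walk Metropolis: draws from p(theta | x) ∝ p(x | theta) p(theta)."""
    rng = np.random.default_rng(seed)
    theta = theta0
    log_post = log_prior(theta) + log_likelihood(theta)
    draws = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal()
        log_post_prop = log_prior(proposal) + log_likelihood(proposal)
        # Accept with probability min(1, posterior ratio); working on the
        # log scale avoids underflow in the product of many small densities.
        if np.log(rng.uniform()) < log_post_prop - log_post:
            theta, log_post = proposal, log_post_prop
        draws[i] = theta
    return draws

# Hypothetical usage: Normal likelihood with a non-conjugate Laplace prior on the mean
x = np.array([1.2, 0.7, 2.1, 1.5])
log_prior = lambda m: -abs(m)                    # Laplace(0, 1) prior, up to a constant
log_lik = lambda m: -0.5 * np.sum((x - m) ** 2)  # N(m, 1) likelihood, up to a constant
samples = metropolis(log_prior, log_lik, theta0=0.0)
```

Note that only the unnormalised posterior is needed, which is exactly why the denominator p(x) could be dropped earlier.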