2. Parametric models, Exponential families Flashcards
What is a univariate parametric family?
A parametric family of probability densities or probability mass functions is univariate if and only if the sample space X is a subset of the real line R.
Name some familiar univariate families
Normal, Poisson, Binomial, Gamma, Beta (Poisson and Binomial are discrete; Normal, Gamma and Beta are continuous).
All of these are exponential families (the Binomial for a fixed number of trials).
What must a probability function satisfy?
It is non-negative and the probabilities sum to 1 (for a density: it integrates to 1).
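What is a Gaussian n-variate random variable?
Start with the definition of the multivariate Gaussian: a Gaussian n-variate random variable X with mean x_0 and covariance matrix Gamma is a random variable with the probability density pi(x) = ((2 pi)^n det(Gamma))^(-1/2) exp(-(1/2) (x - x_0)^T Gamma^(-1) (x - x_0)).
Slide 10: X ~ N(x_0, Gamma)
We assume Gamma is symmetric positive definite. This is a restriction, but it guarantees that Gamma is invertible and that the density is well defined.
The determinant of Gamma enters the normalising constant: it sets the volume scale of the distribution.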
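A minimal sketch of how the standard multivariate Gaussian density, pi(x) = ((2 pi)^n det(Gamma))^(-1/2) exp(-(1/2) (x - x_0)^T Gamma^(-1) (x - x_0)), could be evaluated with NumPy. The function name gaussian_density and the example numbers are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def gaussian_density(x, x0, gamma):
    """Density of N(x0, gamma) at x; gamma must be symmetric positive definite."""
    x = np.asarray(x, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    n = x0.size
    diff = x - x0
    # Solve gamma * z = diff instead of forming the inverse explicitly.
    quad = diff @ np.linalg.solve(gamma, diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(gamma))
    return np.exp(-0.5 * quad) / norm

# Hypothetical example: a 2-variate Gaussian with correlated components.
x0 = np.array([1.0, -1.0])
gamma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
peak = gaussian_density(x0, x0, gamma)          # the density is largest at the mean
away = gaussian_density([3.0, 0.0], x0, gamma)  # smaller away from the mean
```

At the mean the exponent vanishes, so peak is just the normalising constant, which is where the determinant of Gamma shows up.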
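What are the Schur complements?
We split the variable into two blocks and partition the symmetric positive definite matrix Gamma accordingly into the blocks Gamma_11, Gamma_12, Gamma_21 and Gamma_22. The Schur complement of Gamma_22 is Gamma_11 - Gamma_12 Gamma_22^(-1) Gamma_21, and the Schur complement of Gamma_11 is Gamma_22 - Gamma_21 Gamma_11^(-1) Gamma_12 (slide 11).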
Explain Lemma 3.4 slide 12
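We can prove that the determinant of Gamma factorises: it equals the determinant of one diagonal block times the determinant of that block's Schur complement. Because Gamma is positive definite, neither factor is zero, so the Schur complements are invertible. The proof is block Gaussian elimination, which also gives the inverse of Gamma in block form.
If (X, Y) is jointly Gaussian, the diagonal blocks of the joint covariance matrix are the covariances of X and of Y, and the off-diagonal blocks are their cross-covariance. The conditional distribution of X given Y = y is again Gaussian: its covariance is the Schur complement Gamma_11 - Gamma_12 Gamma_22^(-1) Gamma_21, and its mean is shifted to x_0 + Gamma_12 Gamma_22^(-1) (y - y_0) (slide 13).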
Theorem 3
The marginal distribution is also Gaussian: X has mean x_0 and covariance Gamma_11.
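A small numerical sketch of the Schur complement facts, assuming a made-up 3 x 3 covariance matrix with X as the first two components and Y as the last one. All numbers and variable names are hypothetical. It computes the conditional covariance and mean of X given Y = y, then checks the determinant factorisation and the block form of the inverse.

```python
import numpy as np

# Hypothetical joint covariance of (X, Y) with X in R^2 and Y in R^1.
gamma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])
k = 2  # dimension of the X block
g11, g12 = gamma[:k, :k], gamma[:k, k:]
g21, g22 = gamma[k:, :k], gamma[k:, k:]

# Schur complement of the Y block: covariance of X given Y = y.
cond_cov = g11 - g12 @ np.linalg.solve(g22, g21)

# Conditional mean for assumed means x0, y0 and an observed y.
x0 = np.array([0.0, 1.0])
y0 = np.array([2.0])
y = np.array([3.0])
cond_mean = x0 + g12 @ np.linalg.solve(g22, y - y0)

# Two consequences of block Gaussian elimination:
# the determinant factorises, and the X block of the inverse
# is the inverse of the Schur complement.
det_ok = np.isclose(np.linalg.det(gamma),
                    np.linalg.det(g22) * np.linalg.det(cond_cov))
inv_ok = np.allclose(np.linalg.inv(gamma)[:k, :k], np.linalg.inv(cond_cov))
```

Note that the conditional covariance does not depend on the observed value y; only the conditional mean does.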
What are the desired properties of an estimator?
- Easy to compute
- Unbiased: E[T] = delta
- Efficient: var(T) is as low as possible
- Invariant to transformations of the parameter delta: if the distribution is parametrised in terms of g(delta), then g(T) is an estimator for g(delta)
What properties does the MLE have?
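- As the sample size increases, the estimate gets closer to the true value.
- It is consistent: it converges to the true value, i.e. it becomes more and more probable that the estimate is close to the true value.
- It is not unbiased in general: in finite samples the MLE can be biased (for example, the MLE of a normal variance divides by n), but under the usual regularity conditions the bias vanishes as the sample size grows.
- It is asymptotically normal.
- The inverse of the Fisher information is the covariance matrix of that limiting normal distribution.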
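A simulation sketch of these properties, assuming an Exponential(rate) model chosen purely for illustration (it is not from the slides). For this model the MLE of the rate is 1 / (sample mean) and the Fisher information of one observation is 1 / rate^2, so the variance of the MLE should be close to rate^2 / n.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 2.0          # assumed "true" parameter of an Exponential(rate) model
n, reps = 500, 4000      # sample size and number of repeated experiments

# The MLE of the rate is 1 / (sample mean); compute it for every experiment.
samples = rng.exponential(scale=1.0 / true_rate, size=(reps, n))
mle = 1.0 / samples.mean(axis=1)

# Fisher information of one observation is 1 / rate^2, so the
# asymptotic variance of the MLE is rate^2 / n.
asymptotic_var = true_rate ** 2 / n
empirical_var = mle.var()
empirical_bias = mle.mean() - true_rate
```

For this particular model the finite-sample bias is known exactly, rate / (n - 1), which is small but not zero. That is an example of an MLE that is biased yet consistent.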
How do we know if an estimator is biased?
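Compare the expected value of the estimator with the true parameter. The mean of a sample of heights is an unbiased estimator of the population mean. The sample variance that divides by n is biased, which is why we add a correction and divide by n - 1 instead. So an estimator is sometimes unbiased, but not always.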
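The bias of the variance estimator can be seen in a short simulation. This is a sketch with made-up numbers (normally distributed heights with mean 170 and variance 4, samples of size 5). It averages each estimator over many repeated samples to approximate its expected value.

```python
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0
n, reps = 5, 200_000
data = rng.normal(loc=170.0, scale=np.sqrt(true_var), size=(reps, n))

mean_of_means = data.mean(axis=1).mean()     # approximates E[sample mean]
biased = data.var(axis=1, ddof=0).mean()     # divides by n
corrected = data.var(axis=1, ddof=1).mean()  # divides by n - 1
```

The uncorrected estimator averages to about (n - 1) / n times the true variance, here 3.2 instead of 4, while the corrected one averages to about 4.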
What is convergence in probability?
Take the difference between the true parameter value and the MLE. For any fixed threshold epsilon > 0, the probability that this deviation is larger than epsilon goes to zero as the sample size grows.
This is a desirable property of the MLE: it is exactly what consistency means.
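A simulation sketch of convergence in probability, assuming a standard normal model where the MLE of the mean is the sample mean. The threshold epsilon = 0.2 and the sample sizes are arbitrary illustrative choices. For each sample size it estimates the probability that the MLE deviates from the true mean by more than epsilon.

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean, epsilon, reps = 0.0, 0.2, 2000

exceed_prob = {}
for n in (10, 100, 1000):
    samples = rng.normal(loc=true_mean, scale=1.0, size=(reps, n))
    mle = samples.mean(axis=1)           # MLE of the mean of a normal model
    # Fraction of experiments whose deviation exceeds the threshold.
    exceed_prob[n] = np.mean(np.abs(mle - true_mean) > epsilon)
```

The estimated probabilities shrink towards zero as n grows, which is the behaviour the definition describes.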
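What is the Fisher information?
A powerful tool for analysing the MLE. We take the gradient of the log-likelihood with respect to the parameter; this gradient is called the score function.
The expected value of the score is zero because the density integrates to 1 for every parameter value, so (under regularity conditions) the derivative of that integral with respect to the parameter is zero.
Since the mean of the score is zero, we want to know how large its spread is: the Fisher information is the covariance matrix of the score.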
What is a score function?
The gradient of the log-likelihood with respect to the parameter is called the score function.
If the likelihood is twice differentiable, what happens to the fisher information?
The Fisher information equals the expected value of the Hessian of the negative log-likelihood. For a vector parameter it is a matrix.
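A worked sketch for a single Bernoulli(p) observation, chosen here only because its expectations can be computed exactly by summing over the two outcomes (the model and the value p = 0.3 are assumptions for illustration). It checks that the score has mean zero and that its variance equals the expected negative second derivative of the log-likelihood.

```python
def score(x, p):
    """Derivative of the Bernoulli log-likelihood log f(x; p) with respect to p."""
    return x / p - (1 - x) / (1 - p)

def neg_second_derivative(x, p):
    """Negative second derivative of the Bernoulli log-likelihood in p."""
    return x / p**2 + (1 - x) / (1 - p) ** 2

p = 0.3  # assumed parameter value
pmf = {0: 1 - p, 1: p}

# Exact expectations: sum over both outcomes weighted by their probability.
expected_score = sum(pmf[x] * score(x, p) for x in pmf)
score_variance = sum(pmf[x] * score(x, p) ** 2 for x in pmf)  # mean is zero
expected_hessian = sum(pmf[x] * neg_second_derivative(x, p) for x in pmf)
```

Both routes give the Fisher information 1 / (p (1 - p)) for this model.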
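What is the multinomial distribution?
It is a discrete probability distribution over the counts obtained when n independent trials each fall into one of k categories. It generalises the binomial distribution to more than two outcomes.
There is a close relation between the multinomial distribution and the Poisson distribution.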
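One standard form of this relation, which may or may not be the one the slides have in mind: independent Poisson counts, conditioned on their total, follow a multinomial distribution with probabilities proportional to the Poisson rates. The sketch below checks this for one made-up set of rates and counts. The helper names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def multinomial_pmf(counts, probs):
    """P(counts) for a multinomial with sum(counts) trials and cell probabilities probs."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    return coef * math.prod(p**c for p, c in zip(probs, counts))

rates = [1.5, 0.5, 3.0]      # hypothetical Poisson rates
counts = [2, 1, 4]           # hypothetical observed counts
total_rate = sum(rates)
n = sum(counts)

# Joint probability of the independent Poisson counts, divided by the
# probability of the total (a sum of Poissons is Poisson with the summed rate).
joint = math.prod(poisson_pmf(c, lam) for c, lam in zip(counts, rates))
conditional = joint / poisson_pmf(n, total_rate)
multinomial = multinomial_pmf(counts, [lam / total_rate for lam in rates])
```

The two numbers conditional and multinomial agree up to floating-point rounding.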
Exponential families
Exponential families of distributions are extremely useful for statistical analysis. Give examples of why.
- They are the only families with sufficient statistics that can summarise arbitrary amounts of independent identically distributed data using a fixed number of values (for example, the mean can summarise the data).
- They have conjugate priors, an important property in Bayesian statistics.
- The posterior predictive distribution of a random variable with a conjugate prior can always be written in closed form.
- In the mean-field approximation in variational Bayes (Bayesian networks), the best approximating posterior distribution of an exponential-family node with a conjugate prior is in the same family as the node.
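The sufficient-statistics point can be sketched for the normal distribution: however many observations there are, the log-likelihood depends on the data only through three numbers, n, the sum of x and the sum of x^2. The data values and function names below are made up for illustration.

```python
import math

def normal_loglik_full(data, mu, sigma2):
    """Log-likelihood of N(mu, sigma2) using every data point."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

def normal_loglik_from_stats(n, sum_x, sum_x2, mu, sigma2):
    """Same log-likelihood from the fixed-size summary (n, sum x, sum x^2)."""
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - (sum_x2 - 2 * mu * sum_x + n * mu**2) / (2 * sigma2))

data = [1.2, 0.7, 2.9, 1.8, 2.1]          # hypothetical observations
n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)

full = normal_loglik_full(data, mu=1.5, sigma2=0.8)
summary = normal_loglik_from_stats(n, sum_x, sum_x2, mu=1.5, sigma2=0.8)
```

Once the three summaries are stored, the raw data can be discarded without losing any information about mu and sigma2.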
Where would we find the Poisson distribution?
Customers entering a store, or emails you receive in an hour: it counts the number of events in a fixed amount of time.
- That is why it is discrete: a count can only be a whole number.
What is a binomial distribution?
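The number of successes in a fixed number of independent trials, for example the number of heads when you toss a coin several times.
It deals with the success rate: every trial succeeds with the same probability.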
What is a gamma distribution?
A continuous distribution for positive data with a long right tail. We use it to model survival data, for example.
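What is a beta distribution?
A continuous distribution whose sample space is the interval from 0 to 1, so it models a probability or a percentage. It is not a discrete or continuous version of the binomial; rather, it is the conjugate prior for the success probability of the binomial.
Instead of modelling the number of successes of the coin, we model the success probability itself, i.e. how the coin behaves.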
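A sketch of how the beta and binomial distributions fit together under Bayes' rule, assuming a Beta(2, 2) prior on the coin's success probability and 7 heads in 10 tosses (all numbers hypothetical). The conjugate update gives a Beta(a + k, b + n - k) posterior; the code checks that likelihood times prior is proportional to that posterior.

```python
import math

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

a, b = 2.0, 2.0       # hypothetical prior belief about the coin
n, k = 10, 7          # hypothetical data: 7 heads in 10 tosses
post_a, post_b = a + k, b + (n - k)

# Bayes' rule says the posterior is proportional to likelihood * prior,
# so the ratio below should not depend on p.
ratios = [binomial_pmf(k, n, p) * beta_pdf(p, a, b) / beta_pdf(p, post_a, post_b)
          for p in (0.2, 0.5, 0.8)]
posterior_mean = post_a / (post_a + post_b)
```

The three ratios are equal up to rounding, confirming that the posterior stays in the beta family.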
What is the difference between variance and covariance?
Variance: how spread out the values of one random variable are around its mean.
Covariance: how two random variables vary together, i.e. what one tells us about the other.
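A small worked sketch with made-up height and weight data, showing that the variance is just the covariance of a variable with itself. The helper names and numbers are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance with the n - 1 correction."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

heights = [160.0, 170.0, 180.0, 190.0]   # hypothetical data (cm)
weights = [55.0, 68.0, 77.0, 92.0]       # hypothetical data (kg)

var_height = covariance(heights, heights)  # variance is covariance with itself
cov_hw = covariance(heights, weights)      # positive: taller goes with heavier
```

A positive covariance means the two variables tend to be above or below their means together; a covariance near zero means one says little (linearly) about the other.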