4. Generalised linear models, Maximum likelihood Flashcards
Show the probability density, likelihood function, log-likelihood, maximum likelihood estimate µ̂ of µ, and MLE θ̂ of θ = T(µ):
And explain how they are related / work together.
The maximum likelihood estimate µ̂ is the parameter vector that maximises the likelihood L_x(µ) = f_µ(x) (equivalently, the log-likelihood) for the observed data x; the MLE of θ = T(µ) is then the plug-in estimate θ̂ = T(µ̂).
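One conventional way to write these objects down and how they chain together (x is the observed data, µ the unknown parameter vector, as in the card above):

```latex
\begin{aligned}
\text{probability density:}\quad & f_\mu(x) && \text{function of } x \text{ for a fixed parameter } \mu\\
\text{likelihood:}\quad & L_x(\mu) = f_\mu(x) && \text{same formula, read as a function of } \mu \text{ for the fixed observed } x\\
\text{log-likelihood:}\quad & \ell_x(\mu) = \log L_x(\mu) && \\
\text{MLE of } \mu\text{:}\quad & \hat\mu = \arg\max_\mu \ell_x(\mu) && \text{(equivalently } \arg\max_\mu L_x(\mu)\text{)}\\
\text{MLE of } \theta = T(\mu)\text{:}\quad & \hat\theta = T(\hat\mu) && \text{(plug-in estimate)}
\end{aligned}
```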
What do we use a likelihood function for?
We are looking for a way to estimate the parameters of the probability density function.
The likelihood function is the same formula as the probability density, but with the roles reversed: we don't know the parameters, we know the data, so we read it as a function of the parameters with the data held fixed.
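A minimal numerical illustration of that switch of viewpoint, using a binomial coin-flip model (the model and the numbers are purely illustrative):

```python
import numpy as np
from scipy.stats import binom

n = 10   # number of coin flips
k = 7    # observed number of heads

# Density view: fix the parameter p and look at the probability of each possible outcome x.
p_fixed = 0.5
density_over_x = binom.pmf(np.arange(n + 1), n, p_fixed)
# Over all possible data x the density sums to 1; the likelihood over p does not have to.
print(density_over_x.sum())

# Likelihood view: fix the observed data k and evaluate the same formula over candidate p.
p_grid = np.linspace(0.01, 0.99, 99)
likelihood_over_p = binom.pmf(k, n, p_grid)

# The likelihood is maximised near p = k/n = 0.7, the MLE.
print(p_grid[np.argmax(likelihood_over_p)])
```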
Why do we have a log likelihood?
It is much easier to work with logs: the likelihood of an i.i.d. sample is a product of densities, and the log turns that product into a sum, which is much nicer to differentiate (and more stable numerically).
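A small sketch of that point for a normal model with known variance (the data and model choice are illustrative): the log-likelihood is a plain sum over observations, and its maximiser matches the closed-form MLE, the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)   # illustrative sample

def log_likelihood(mu, x, sigma=1.0):
    # The product of normal densities becomes a sum of log-densities.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

mu_grid = np.linspace(0.0, 4.0, 401)
ll = [log_likelihood(m, x) for m in mu_grid]

# For this model the MLE of mu is the sample mean; the grid maximiser agrees.
print(mu_grid[np.argmax(ll)], x.mean())
```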
What are the advantages of MLE?
- Automatic estimate without further statistical assumptions.
- Excellent frequentist properties: nearly unbiased in large samples.
- Bayesian justification: the likelihood is the data's entire contribution to Bayes' rule,
so with a flat prior the posterior is proportional to the likelihood and the MLE is the posterior mode (shown below).
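To show the Bayesian justification concretely (a standard argument; g denotes the prior density):

```latex
g(\mu \mid x) \;=\; \frac{g(\mu)\,L_x(\mu)}{\int g(\mu')\,L_x(\mu')\,d\mu'}
\;\propto\; g(\mu)\,L_x(\mu),
\qquad
\text{flat prior } g(\mu) \equiv c
\;\Rightarrow\;
g(\mu \mid x) \propto L_x(\mu)
\;\Rightarrow\;
\arg\max_\mu g(\mu \mid x) \;=\; \arg\max_\mu L_x(\mu) \;=\; \hat\mu .
```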
How can we measure how good the MLE is?
We can use the Fisher information. It tells us how much information the data carry about the parameter, and hence how precise (low-variance) we can expect the MLE to be.
What are the downsides of MLE?
MLE estimates can be extremely off if estimated on little data (see the small simulation sketch after this list).
▶ The MLE of the Bernoulli parameter of a coin flip based on a sample of one
will always be exactly 0 or 1.
▶ With large numbers of parameters, θ̂ = T(µ̂) may be off
even if each component of µ̂ is well estimated.
▶ Next week: James-Stein estimator
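A small simulation sketch of the first point (the true parameter and sample sizes are arbitrary): with a single flip the Bernoulli MLE can only be 0 or 1, no matter what the true p is.

```python
import numpy as np

rng = np.random.default_rng(1)
true_p = 0.6

# The Bernoulli MLE is the sample mean; with one flip it is always exactly 0 or 1.
single_flip_mles = [rng.binomial(1, true_p, size=1).mean() for _ in range(10)]
print(single_flip_mles)   # only 0.0s and 1.0s, never anything near 0.6

# With more data the MLE concentrates around the true parameter.
print(rng.binomial(1, true_p, size=1000).mean())
```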
What is the score function?
It is the derivative of the log-likelihood function with respect to the parameters. By the chain rule, this equals the derivative of the (non-log) likelihood divided by the likelihood itself.
The score function indicates how much the log-likelihood changes if you vary the
parameter estimate by an infinitesimal amount, given data.
To find the MLE we set the score function to 0 and solve.
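A short worked example of that (Bernoulli model with n i.i.d. flips and k observed heads; a standard computation):

```latex
\ell_x(p) = k\log p + (n-k)\log(1-p),
\qquad
\dot\ell_x(p) = \frac{k}{p} - \frac{n-k}{1-p},
\qquad
\dot\ell_x(\hat p) = 0 \;\Rightarrow\; \hat p = \frac{k}{n}.
```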
What does the Fisher information tell us?
It is the expectation of the squared derivative of the log-likelihood (the squared score). If the Fisher information is high, the log-likelihood is sharply peaked around the MLE, so the estimate is pinned down tightly by the data and has small variance; repeating the experiment with similar data would give similar estimates. If the Fisher information is low, the log-likelihood is nearly flat and many parameter values fit the data about equally well.
What can we use the Fisher information for?
We can use it to gain some insight into the accuracy of the MLE:
in a large sample the MLE is approximately normally distributed with variance 1/(Fisher information).
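A quick simulation sketch of that large-sample claim for the Bernoulli model, where the Fisher information of a single observation is 1/(p(1-p)), so n flips give an approximate MLE variance of p(1-p)/n (all the numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
true_p, n, reps = 0.3, 200, 5000

# The MLE of p in each replicated experiment is the proportion of heads.
mles = rng.binomial(n, true_p, size=reps) / n

fisher_info_per_obs = 1.0 / (true_p * (1.0 - true_p))
predicted_var = 1.0 / (n * fisher_info_per_obs)   # = p(1-p)/n

print(mles.var(), predicted_var)   # the two variances should be close
```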
What is the conditionality principle?
Inference should be based on the experiment that was actually performed, conditioning on ancillary statistics, rather than averaging over experiments that could have been performed but were not.
What is an ancillary statistic?
A statistic that contains “no direct information by itself”, but describes the experiment
that was performed.
▶ Sample size
▶ Marginals of a contingency table
What is the observed Fisher information?
Fisher preferred the observed Fisher information to the (expected) Fisher information, because the latter involves an expectation over hypothetical data; he would rather compute the expression on the data actually observed. It gives a better, more case-specific idea of the accuracy of the estimate (this is still somewhat debated).
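In symbols (one-parameter case, under the usual regularity conditions), the contrast is:

```latex
\mathcal{I}(\mu) \;=\; E_\mu\!\big[\dot\ell_x(\mu)^2\big] \;=\; -\,E_\mu\!\big[\ddot\ell_x(\mu)\big]
\quad\text{(expected Fisher information)},
\qquad
I(\hat\mu) \;=\; -\,\ddot\ell_x(\hat\mu)
\quad\text{(observed Fisher information, computed on the observed data } x\text{)}.
```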
What is randomisation?
In an experiment (trial) comparing two treatments A and B,
participants should be randomly assigned to either treatment A or treatment B.
▶ Participants may have any number of confounding traits favouring a positive or
negative outcome, regardless of the treatment. We can’t control for all of them.
▶ By assigning participants randomly, the effects of the confounders should even out.
▶ This enables us to conclude that any observed effect is in fact
due to the variables we’re testing.
▶ “Forced frequentism, with the statistician imposing
his or her preferred probability mechanism upon the data.”
The problem is that you need large and expensive studies. Even so, this is the gold standard in medicine.
What is permutation?
▶ Much of Fisher’s methodology depends on normal sampling assumptions.
▶ Permutation testing is a non-parametric alternative.
▶ To test significance in a two-sample comparison (see the sketch after this list):
▶ Pool all items in the two samples.
▶ Randomly partition them into two parts and compute test statistic
(e.g., difference of means).
▶ Construct empirical distribution of test statistic.
▶ Very similar to the bootstrap.
▶ Application: Testing performance of NLP systems by BLEU score.
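A minimal sketch of the procedure just described, using the difference of means as the test statistic (the sample data and the number of permutations are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two illustrative samples (e.g. per-document scores from two NLP systems).
a = rng.normal(0.30, 0.05, size=40)
b = rng.normal(0.27, 0.05, size=40)

observed = a.mean() - b.mean()

# Pool all items, then repeatedly repartition them at random
# and recompute the test statistic to build its empirical null distribution.
pooled = np.concatenate([a, b])
n_a, n_perm = len(a), 10_000
perm_stats = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(pooled)
    perm_stats[i] = perm[:n_a].mean() - perm[n_a:].mean()

# Two-sided p-value: how often a random repartition is at least as extreme.
p_value = np.mean(np.abs(perm_stats) >= abs(observed))
print(observed, p_value)
```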