4. Generalised linear models, Maximum likelihood Flashcards

1
Q

Show the probability density, likelihood function, log-likelihood, maximum likelihood estimate µ̂ of µ, and MLE θ̂ of θ = T(µ),
and explain how they are related / work together.

A

Probability density: f_µ(x), the density of the data x under the parameter (vector) µ.
Likelihood function: L(µ) = f_µ(x), the same expression read as a function of µ with the observed data x held fixed.
Log-likelihood: l(µ) = log L(µ).
Maximum likelihood estimate: µ̂ is the parameter vector that maximises the likelihood (equivalently, the log-likelihood) given the data.
MLE of θ = T(µ): θ̂ = T(µ̂), i.e. we plug the MLE µ̂ into the transformation T (see the sketch below).

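A minimal sketch of how these pieces fit together (a hypothetical normal-sample example, not from the slides; the data, the known variance of 1, and the transformation T(µ) = e^µ are assumptions for illustration):

```python
# Density, likelihood, log-likelihood, and plug-in MLE for a normal sample
# with known variance 1 (hypothetical data, assumed for illustration).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

x = np.array([1.2, 0.4, 2.1, 1.7, 0.9])  # observed data

def log_likelihood(mu):
    # Same expression as the density f_mu(x), read as a function of mu
    return norm.logpdf(x, loc=mu, scale=1.0).sum()

# mu_hat maximises the log-likelihood; for this model the maximiser
# has a closed form: the sample mean.
res = minimize_scalar(lambda mu: -log_likelihood(mu))
mu_hat = res.x
assert abs(mu_hat - x.mean()) < 1e-6

# MLE of theta = T(mu): plug mu_hat into T (here T(mu) = exp(mu), assumed)
theta_hat = np.exp(mu_hat)
print(mu_hat, theta_hat)
```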
2
Q

What do we use a likelihood function for?

A

We are looking for a way to estimate the parameters of the probability density function.
The likelihood function is the same expression as the probability density, but read the other way around: the data are known and the parameters are not.

3
Q

Why do we use the log-likelihood?

A

It is much easier to work with logs, because we avoid working with products, which are awkward to differentiate: the log turns products into sums.

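As a one-line illustration (a standard identity for i.i.d. data, not from the slides):

```latex
\[
l(\mu) = \log \prod_{i=1}^{n} f_\mu(x_i) = \sum_{i=1}^{n} \log f_\mu(x_i),
\qquad
l'(\mu) = \sum_{i=1}^{n} \frac{\partial}{\partial \mu} \log f_\mu(x_i).
\]
```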
4
Q

What are the advantages of MLE?

A
  • Automatic estimate without further statistical assumptions.
  • Excellent frequentist properties: nearly unbiased in large samples.
  • Bayesian justification: with a flat (uninformative) prior, the posterior is proportional to the likelihood, so the MLE coincides with the posterior mode (MAP estimate).
5
Q

How can we measure how good the MLE is?

A

We can use the Fisher information: it tells us how sensitive the log-likelihood is to the parameter, and hence how accurate (low-variance) the MLE is.

6
Q

What are the downsides of MLE?

A

MLE estimates can be extremely off if estimated on little data.
▶ The MLE of the Bernoulli parameter of a coin flip based on a sample of one will always be exactly 0 or 1.
▶ With large numbers of parameters, θ̂ = T(µ̂) may be off even if each component of µ̂ is well estimated.
▶ Next week: James–Stein estimator

7
Q

What is the score function?

A

It is the derivative of the log-likelihood function with respect to the parameters. By the chain rule, this equals the derivative of the (non-log) likelihood function divided by the likelihood function itself.

The score function indicates how much the log-likelihood changes if you vary the parameter estimate by an infinitesimal amount, given the data.

To find the MLE, we set the score function to 0 and solve (see the sketch below).

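A minimal sketch of this procedure (a hypothetical Poisson sample; the data and the choice of sympy are assumptions for illustration):

```python
# Score function for n i.i.d. Poisson observations, and the MLE obtained
# by setting the score to 0 (hypothetical data, assumed for illustration).
import sympy as sp

mu = sp.symbols("mu", positive=True)
x = [3, 1, 4, 2]  # hypothetical count data
n = len(x)

# Poisson log-likelihood: sum_i (x_i*log(mu) - mu - log(x_i!))
loglik = sum(xi * sp.log(mu) - mu - sp.log(sp.factorial(xi)) for xi in x)

score = sp.diff(loglik, mu)          # derivative of the log-likelihood
mle = sp.solve(sp.Eq(score, 0), mu)  # setting the score to 0 gives the MLE
print(score)  # sum(x)/mu - n
print(mle)    # [5/2], the sample mean
```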
8
Q

What does the Fisher information tell us?

A

It is the expectation of the squared derivative of the log-likelihood (the squared score). If the Fisher information is high, the log-likelihood is very sensitive to the parameter around the MLE, and therefore the MLE has a small variance: if we repeated the experiment on similar data, we would expect a similar estimate. If the log-likelihood barely changes as the parameter varies (low Fisher information), the data carry little information about the parameter.

9
Q

What can we use the Fisher information for?

A

We can use it to gain insight into the accuracy of the MLE: in large samples, the inverse of the Fisher information approximates the variance of the MLE, which also yields approximate confidence intervals.

10
Q

What is the distribution of the MLE in large samples?

A

In a large sample, the MLE is approximately normally distributed with variance 1/(Fisher information).

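A minimal simulation sketch of this claim (a Bernoulli example; p, n, and the seed are arbitrary assumptions). For n Bernoulli(p) observations the Fisher information is n/(p(1−p)), so the variance of the MLE should be close to p(1−p)/n:

```python
# Check that the variance of the Bernoulli MLE is close to
# 1 / Fisher information = p(1-p)/n in large samples.
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 1000, 5000

samples = rng.binomial(1, p, size=(reps, n))
p_hat = samples.mean(axis=1)  # Bernoulli MLE = sample proportion

print(p_hat.var())       # empirical variance of the MLE across repetitions
print(p * (1 - p) / n)   # 1 / Fisher information = 0.00021
```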
11
Q

What is the conditionality principle?

A

Inference should be based on the experiment that was actually performed: we only care about experiments we can actually perform, not hypothetical ones that might have been performed.

12
Q

What is an ancillary statistic?

A

A statistic that contains “no direct information by itself”, but describes the experiment that was performed.
▶ Sample size
▶ Marginals of a contingency table

13
Q

What is the observed Fisher information?

A

Fisher preferred the observed Fisher information over the (expected) Fisher information: the Fisher information involves an expectation over hypothetical data, whereas he would rather evaluate the same expression on the actual observed data. It gives a better and more specific idea of the accuracy of the estimate (though this is somewhat debated).

14
Q

What is randomisation?

A

In an experiment (trial) comparing two treatments A and B, participants should be randomly assigned to either treatment A or treatment B.
▶ Participants may have any number of confounding traits favouring a positive or negative outcome, regardless of the treatment. We can’t control for all of them.
▶ By assigning participants randomly, the effects of the confounders should even out.
▶ This enables us to conclude that any observed effect is in fact due to the variables we’re testing.
▶ “Forced frequentism, with the statistician imposing his or her preferred probability mechanism upon the data.”

The downside: you need large and expensive studies. Randomised trials are the gold standard in medicine.

15
Q

What is permutation?

A

▶ Much of Fisher’s methodology depends on normal sampling assumptions.
▶ Permutation testing is a non-parametric alternative.
▶ To test significance in a two-sample comparison (see the sketch below):
▶ Pool all items in the two samples.
▶ Randomly partition them into two parts and compute the test statistic (e.g., difference of means).
▶ Construct the empirical distribution of the test statistic.
▶ Very similar to the bootstrap.
▶ Application: testing performance of NLP systems by BLEU score.

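A minimal sketch of the two-sample permutation test described above (the samples are hypothetical; difference of means is used as the test statistic):

```python
# Two-sample permutation test on the difference of means
# (hypothetical data, assumed for illustration).
import numpy as np

rng = np.random.default_rng(1)
a = np.array([5.1, 4.8, 6.0, 5.5, 5.9])  # sample A
b = np.array([4.2, 4.9, 4.4, 5.0, 4.6])  # sample B

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

stats = []
for _ in range(10_000):
    perm = rng.permutation(pooled)  # pool, then randomly re-partition
    stats.append(perm[:len(a)].mean() - perm[len(a):].mean())

# Two-sided p-value: the fraction of random partitions that look
# at least as extreme as the observed difference.
p_value = np.mean(np.abs(stats) >= abs(observed))
print(p_value)
```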
16
Q

Explain what is what in this equation about exponential families (look at slide 8):

A

f_λ(x) = e^{λy − γ(λ)} g_{λ₀}(x)

Here λ is the natural parameter; y = t(x) is the sufficient statistic; γ(λ) is the normalising (cumulant) function that makes the density integrate to 1; and g_{λ₀}(x) is the carrier density, i.e. the density at a fixed reference parameter λ₀.

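As a worked example (a standard fact, written in the slide’s notation): the Poisson distribution with mean µ is a one-parameter exponential family with natural parameter λ = log µ:

```latex
\[
f_\mu(x) = e^{-\mu}\,\frac{\mu^{x}}{x!}
         = e^{x \log \mu \;-\; \mu}\cdot\frac{1}{x!}
         = e^{\lambda y - \gamma(\lambda)}\, g(x),
\qquad
\lambda = \log \mu,\quad y = x,\quad \gamma(\lambda) = e^{\lambda},\quad g(x) = \frac{1}{x!}.
\]
```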
17
Q

What does ordinary linear regression assume about the distribution of the data?

A

Data are assumed normally distributed (normal residuals).
18
Q

What are some cases where data are not normally distributed?

A

▶ Counts (non-negative integers)
▶ Proportions (non-negative reals between 0 and 1)
▶ Waiting times (non-negative reals)
▶ Ratios (often heavy-tailed distributions)

19
Q

Generalised linear regression

A

Generalised linear models are a principled way to apply regression to quantities that are not normally distributed.

20
Q

What are the defining assumptions of a generalised linear model?

A

▶ Data are from a one-parameter exponential family.
▶ The natural parameter is modelled as an affine function of the features.
▶ The means are connected to the natural parameter by a link function that depends on the assumed distribution (see the sketch below).

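A minimal sketch of these three ingredients in code (a hypothetical Poisson regression with statsmodels; the simulated data and coefficients are assumptions for illustration):

```python
# Poisson GLM: natural parameter affine in the features, log link
# connecting the mean to the natural parameter.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
mu = np.exp(0.5 + X @ np.array([0.8, -0.3]))  # true means via the log link
y = rng.poisson(mu)

model = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson())
result = model.fit()
print(result.params)  # should be close to (0.5, 0.8, -0.3)
```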
21
Q

What is a sufficient statistic?

A

A statistic y = t(x) that captures all the information the data x carry about the parameter: the likelihood depends on the data only through t(x). In a one-parameter exponential family, y is the sufficient statistic that appears in the exponent.
22
Q

What is deviance?

A

The deviance measures the discrepancy between a fitted model and the saturated model: twice the difference between their maximised log-likelihoods. It plays the role that the residual sum of squares plays in ordinary linear regression.