3. Non-Linear Models Flashcards

1
Q

Linear exponential table

A

Formula sheet

2
Q
Variance functions for:
Normal
Bernoulli
Poisson
Gamma
Inverse Gaussian
A

Formula
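
The card defers to a formula sheet that is not reproduced here. As a hedged reminder, the standard textbook variance functions for these five distributions, written so that Var(Y) = \phi V(\mu) with dispersion \phi (this notation is an assumption, not taken from the card), are:

```latex
\begin{array}{ll}
\text{Normal}           & V(\mu) = 1 \\
\text{Bernoulli}        & V(\mu) = \mu(1-\mu) \\
\text{Poisson}          & V(\mu) = \mu \\
\text{Gamma}            & V(\mu) = \mu^{2} \\
\text{Inverse Gaussian} & V(\mu) = \mu^{3}
\end{array}
```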

3
Q

Most appropriate link for Y ~ Bernoulli?

A

Logit, probit, or complementary log-log

4
Q

How are the coefficients estimated in a GLM?

A

By maximum likelihood (MLEs)
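
To make the idea concrete, here is a minimal sketch of maximum likelihood fitting by Newton-Raphson for one specific case: a Poisson response with a log link, an intercept, and a single predictor. The function name and the data are hypothetical, and real software uses more robust iteration and stopping rules.

```python
import math


def fit_poisson_log_link(x, y, iterations=25):
    """Newton-Raphson for ln(mu_i) = b0 + b1 * x_i with a Poisson response."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum of x_i * (y_i - mu_i), with x_i = (1, x_i)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix: sum of mu_i * x_i x_i'
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: b <- b + I^{-1} u, using the explicit 2x2 inverse
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1


# Hypothetical data, for illustration only
x_demo = [0, 1, 2, 3, 4]
y_demo = [1, 2, 4, 7, 12]
b0_hat, b1_hat = fit_poisson_log_link(x_demo, y_demo)
```

At the returned estimates the score vector is numerically zero, which is what "solving the score equations" means in practice.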

5
Q

What does a maximized log likelihood tell us?

A

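L0 < Lb < Lsat, where L0 is the maximized log-likelihood of the null (intercept-only) model, Lb that of our fitted model, and Lsat that of the saturated model.

It tells us how close our model is to the saturated model, that is, how well the model fits our data.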

6
Q

What does deviance measure in GLM? What is the formula? A good fit means a _____ deviance.

A

Deviance measures the distance between the saturated model and our model. It is useful for comparing nested models.

Small deviance = good fit
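
The card asks for the formula but the answer omits it. A hedged reconstruction in standard notation, where l_{sat} and l(b) denote the maximized log-likelihoods of the saturated and fitted models (symbols assumed here, not taken from the card):

```latex
D = 2\left[\, l_{sat} - l(b) \,\right]
```

Because l(b) can never exceed l_{sat}, D is non-negative, and D = 0 only when the fitted model reproduces the saturated fit.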

7
Q

Deviance is similar to _____ in MLR.

A

SSE

8
Q

Is a square root link appropriate for Poisson count regression?

A

No. The square root of the mean cannot be negative, but the linear predictor can be, so the two ranges do not match.

9
Q

If the Poisson model is adequate, is the deviance a realization from the chi-square distribution?

A

Yes (approximately), with df = pf - pr

10
Q

Formula for max-scaled R^2 and pseudo R^2.

A

Formula
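
The formula itself is not on the card. One common textbook definition is sketched below as an assumption about what the formula sheet holds. Here l_0, l(b), and l_{sat} are the maximized log-likelihoods of the null, fitted, and saturated models, and n is the sample size.

```latex
R^{2} = 1 - \left( \frac{\exp(l_0 / n)}{\exp(l(b) / n)} \right)^{2},
\qquad
R^{2}_{\max} = 1 - \left( \exp(l_0 / n) \right)^{2},
\qquad
R^{2}_{ms} = \frac{R^{2}}{R^{2}_{\max}}

\text{pseudo-}R^{2} = \frac{l(b) - l_0}{l_{sat} - l_0}
```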

11
Q

Formulas for AIC and BIC.

A

Formulas
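
As a hedged reminder of the usual definitions, AIC = -2l + 2k and BIC = -2l + k ln(n), where l is the maximized log-likelihood, k the number of estimated parameters, and n the sample size. Some texts write k as p + 1 to count the intercept separately. A minimal sketch with hypothetical function names:

```python
import math


def aic(log_likelihood, num_params):
    """Akaike information criterion: -2l + 2k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * num_params


def bic(log_likelihood, num_params, n_obs):
    """Bayesian information criterion: -2l + k * ln(n) (smaller is better)."""
    return -2.0 * log_likelihood + num_params * math.log(n_obs)
```

BIC penalizes each extra parameter by ln(n) rather than 2, so it favors smaller models than AIC once n exceeds about 7.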

12
Q

Formula for Pearson residual, Pearson chi square statistic.

What value does the Pearson chi-square statistic have when the model fits well?

A

Formula; when the model fits well, the Pearson chi-square statistic is approximately n - p - 1.
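
The formulas are not given on the card. In standard notation (assumed here), with \hat{\mu}_i the fitted mean and \widehat{\mathrm{Var}}(y_i) the variance implied by the fitted model:

```latex
r_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\widehat{\mathrm{Var}}(y_i)}},
\qquad
X^{2} = \sum_{i=1}^{n} r_i^{2}
```

For a Poisson model \widehat{\mathrm{Var}}(y_i) = \hat{\mu}_i, and for a Bernoulli model it is \hat{\pi}_i (1 - \hat{\pi}_i).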

13
Q

Formula for deviance residuals and Anscombe residuals.

A

Formulas

14
Q

Does a large Pearson chi square statistic mean a good fit?

A

No. It is the sum of squared Pearson residuals, so a large value indicates a poor fit.

15
Q

What does overdispersion mean? How can we fix this issue? What is a disadvantage of fixing it?

A

When the observed variability > the model's variability.

We use the formula with delta estimated as …

Disadvantage: this changes the distribution of Y.

The fix is useful for distributions whose variance is tied to the mean, like the Poisson.

16
Q

What is a likelihood ratio test used for? What is the test statistic, from what distribution, with what df?

A

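Comparing nested models; it plays the role of the partial F test in MLR.
Test stat = difference in deviance between the reduced model (pr parameters) and the full model (pf parameters), from a chi-square distribution
Df = pf - pr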
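
A minimal sketch of the test, assuming the statistic is written as 2(l_full - l_reduced), which equals the drop in deviance when both models share the same saturated model. The function name and the log-likelihood values are hypothetical. The critical value 3.841 is the 95th percentile of a chi-square distribution with 1 degree of freedom.

```python
def likelihood_ratio_test(loglik_reduced, loglik_full, p_reduced, p_full,
                          critical_value):
    """Return (statistic, df, reject) for a nested-model likelihood ratio test."""
    statistic = 2.0 * (loglik_full - loglik_reduced)  # drop in deviance
    df = p_full - p_reduced                           # number of extra parameters
    reject = statistic > critical_value               # reject the reduced model?
    return statistic, df, reject


# Hypothetical maximized log-likelihoods: reduced has 2 parameters, full has 3
stat, df, reject = likelihood_ratio_test(-120.0, -116.5, 2, 3,
                                         critical_value=3.841)
```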

17
Q

What is a goodness of fit test used for? What is the test stat, what distribution and what df?

A

Measures how well the assumed distribution of Y matches the observed data.

Test stat = sum of (O-E)^2/E over the intervals, chi square

Df = w-g-1
w = number of intervals that the range of Y is split into (the sum runs over these)
g = number of free parameters
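
The statistic and its degrees of freedom can be sketched as follows. The function name and the counts are hypothetical, and the expected counts are assumed to come from the fitted distribution.

```python
def pearson_goodness_of_fit(observed, expected, num_free_params):
    """Return (statistic, df) for the Pearson chi-square goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (O - E)^2 / E across the w intervals
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = w - g - 1
    df = len(observed) - num_free_params - 1
    return statistic, df


# Hypothetical counts over w = 4 intervals, with g = 1 estimated parameter
observed_counts = [30, 40, 20, 10]
expected_counts = [25.0, 45.0, 20.0, 10.0]
gof_stat, gof_df = pearson_goodness_of_fit(observed_counts, expected_counts,
                                           num_free_params=1)
```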
18
Q

What is the Tweedie distribution? Fill in the table

A

Formula
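
The table is not reproduced on the card. As a hedged summary of the standard result, the Tweedie family has a power variance function, and familiar distributions appear at particular powers p (notation assumed here):

```latex
\mathrm{Var}(Y) = \phi\, \mu^{p}

\begin{array}{ll}
p = 0     & \text{Normal} \\
p = 1     & \text{Poisson} \\
1 < p < 2 & \text{Compound Poisson-gamma} \\
p = 2     & \text{Gamma} \\
p = 3     & \text{Inverse Gaussian}
\end{array}
```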

19
Q

When Y is a binary response variable, what distribution and link function is commonly used?

A

Bernoulli for Y

Link: logit, probit, or complementary log-log

20
Q
When Y ~ Bernoulli, what are the formulas for the following:
Log likelihood 
Score equations 
Deviance statistic
Pearson residual 
Pearson chi square
A

Formula
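
The formulas are not shown on the card. A minimal sketch of the standard forms for ungrouped binary data follows, given fitted probabilities pi. The function names are hypothetical, and the deviance uses the fact that the saturated log-likelihood is 0 for ungrouped 0/1 responses. The score equations depend on the link, so they are left out here.

```python
import math


def bernoulli_loglik(y, pi):
    """Log-likelihood: sum of y*ln(pi) + (1 - y)*ln(1 - pi)."""
    return sum(yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
               for yi, p in zip(y, pi))


def bernoulli_deviance(y, pi):
    """Deviance: 2*(l_sat - l) with l_sat = 0 for ungrouped binary data."""
    return -2.0 * bernoulli_loglik(y, pi)


def bernoulli_pearson_residuals(y, pi):
    """Pearson residuals: (y - pi) / sqrt(pi * (1 - pi))."""
    return [(yi - p) / math.sqrt(p * (1.0 - p)) for yi, p in zip(y, pi)]


def pearson_chi_square(residuals):
    """Pearson chi-square statistic: sum of squared Pearson residuals."""
    return sum(r * r for r in residuals)
```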

21
Q

In logistic regression, the score equations reduce to:

A

Formula
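
The formula is not shown on the card. In standard notation (assumed here), with the logit link the score equations reduce to:

```latex
\sum_{i=1}^{n} \mathbf{x}_i \left( y_i - \pi_i \right) = \mathbf{0},
\qquad
\pi_i = \frac{\exp(\mathbf{x}_i' \mathbf{b})}{1 + \exp(\mathbf{x}_i' \mathbf{b})}
```

The weights that appear under other links cancel because the logit is the canonical link for the Bernoulli distribution.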

22
Q

What are 3 drawbacks to linear models?

A
  1. Mismatched fitted values
    The range of the fitted values does not match the range of the response
  2. Heteroscedasticity
  3. Invalid residual analysis
    Residuals are treated as continuous, which is violated for a Bernoulli response because it is discrete
23
Q

When Y is nominal, what distribution and link are most appropriate?

A

Y ~ categorical(w), with the generalized logit model; the formula is ____.

24
Q

When Y is ordinal, we use the _____ model.

A

Proportional odds cumulative model

25
Q
When Y~Poisson we use the log link. How does this simplify our formulas for the following:
Log likelihood 
Score 
Info matrix
Deviance statistic
Pearson residual 
Pearson chi square
A

Formulas
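
The formulas are not shown on the card. With the log link, mu_i = exp(x_i'b), so the score equations become the sum of x_i (y_i - mu_i) = 0 and the information matrix becomes the sum of mu_i x_i x_i'. The remaining quantities can be sketched as functions of the fitted means. The function names are hypothetical.

```python
import math


def poisson_loglik(y, mu):
    """Log-likelihood: sum of y*ln(mu) - mu - ln(y!)."""
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1)
               for yi, m in zip(y, mu))


def poisson_deviance(y, mu):
    """Deviance: 2 * sum of [y*ln(y/mu) - (y - mu)], with 0*ln(0) taken as 0."""
    total = 0.0
    for yi, m in zip(y, mu):
        log_term = yi * math.log(yi / m) if yi > 0 else 0.0
        total += log_term - (yi - m)
    return 2.0 * total


def poisson_pearson_chi_square(y, mu):
    """Pearson chi-square: sum of (y - mu)^2 / mu."""
    return sum((yi - m) ** 2 / m for yi, m in zip(y, mu))
```

When the model includes an intercept, the fitted means sum to the observed total, so the (y - mu) terms in the deviance cancel.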

26
Q

What link function does a Poisson count model with exposures have?

A

Formula
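
The formula is not shown on the card. The usual setup, assumed here, keeps the log link and adds ln(exposure) as an offset: ln(mu_i) = ln(E_i) + x_i'b, so that mu_i = E_i exp(x_i'b). A minimal sketch with a hypothetical function name:

```python
import math


def poisson_mean_with_exposure(exposure, x, beta):
    """Mean under a log link with offset: mu = exposure * exp(x'beta)."""
    # ln(exposure) enters the linear predictor with its coefficient fixed at 1
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return exposure * math.exp(eta)
```

Doubling the exposure doubles the expected count while leaving the rate exp(x'beta) unchanged.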

27
Q

Table for alternative count methods

A

Formula sheet

28
Q

Can we use a Tweedie distribution on a count response variable?

A

No. The Tweedie domain is part discrete (a point mass at 0) and part continuous, so it does not suit a count response.

29
Q
Which of the following are true, and why?
A. A negative binomial model is a special case of a heterogeneity model.
B. A latent class model is a special case of a zero-inflated model.
A

A is true. It’s possible to set up a Poisson-gamma mixture in a heterogeneity model, resulting in a negative binomial model.

B is false. A zero-inflated model is a special case of a latent class model.

30
Q

If we compare Poisson regression with Poisson regression with exposures, which of the following statements will be true?
A. The coefficient estimates should not be the same for the two models.
B. With all else equal, a unit change in predictor xj changes the estimated means of both models equally if the corresponding coefficient estimate is the same.
C. One model is better at handling overdispersion.

A

A is true. They will have different score equations, and therefore different beta estimates.

B is true. “All else equal” holds the exposure fixed, so the offset cancels out and a unit change in xj changes the estimated mean by the same factor in both models.

C is false. They are both Poisson models and would have the same issue with overdispersion.

31
Q

Can we perform a likelihood ratio test on models that are not nested?

A

No

32
Q

In GLM, why is it not appropriate to use an F test without other information on the distribution of Y?

A

Because it’s not known that the response is being modelled as normal.

33
Q

Unless specified otherwise, what does the likelihood ratio chi squared test test? What df?

A

Null: all non-intercept coefficients are zero
Alternative: at least one non-intercept coefficient is non-zero.

Df = (# of parameters under the alternative) - (# of parameters under the null)

34
Q

What is the compound poisson-gamma distribution?

A

Tweedie

35
Q

Which of the following models can be used to accommodate underdispersion?
Negative binomial
Zero-inflated
Hurdle

A

Only hurdle model

36
Q

If I want to model auto claims with Poisson(lambda), what model can I use if I want to model lambda as a continuous variable?
A. Zero inflated
B. Hurdle model
C. Heterogeneity model

A

Only the heterogeneity model uses a continuous mixture. The other two are discrete mixtures.

37
Q

When we want the variance to be greater than the mean of the response, what models could we use? (We are using Poisson.)
Negative binomial
Hurdle
Zero inflated

A

All 3.

Negative binomial: the Poisson is a limiting case of the negative binomial, whose variance is greater than its mean, so this is suitable.

Zero inflated: this model requires selecting a discrete distribution starting with 0, which is true of the Poisson. If the Poisson is selected, then the variance will exceed the mean.

Hurdle: this model requires selecting a discrete distribution valid on integers starting with 1 (the zero-truncated Poisson is a possibility). If the zero-truncated Poisson is selected, then the variance can be greater than the mean.

38
Q

Which of the following are true about GLMs?
A. Inverse of the link function is the mean function.
B. The choice of distribution plays an important role in making inferences.
C. For distributions in the linear exponential family, the canonical link is the inverse of the mean function.

A

A. True.

B. False
It is the choice of the mean function and the variance function that drives the inference making.

C. True.

39
Q

Can we say that Tweedie is a type of heterogeneity model?

A

No. Heterogeneity models are continuous mixtures; Tweedie is a compound distribution.

40
Q
If we wish to model the aggregate auto claims for a specific block of business, what would be the most appropriate?
Poisson with exposures
Negative binomial
Zero inflated model
MLR
GLM with Tweedie
A

Claim amounts are non-negative and modeled as a continuous variable.
This rules out the negative binomial, Poisson, and zero-inflated models because they all have count response variables.

MLR assumes a normally distributed response that can take on any value. This eliminates MLR because claim amounts cannot be negative.

Tweedie is appropriate because it combines a point mass at y = 0 with a continuous distribution over the rest of the domain.

41
Q

A canonical link sets theta equal to what?

A

x'β, the linear predictor