3. Non-Linear Models Flashcards by Marija Spehar

Linear exponential table

Formula sheet

Variance functions for:
Normal
Bernoulli
Poisson
Gamma
Inverse Gaussian

Formula
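
For reference, the standard variance functions V(mu) for these distributions can be written as a small lookup. This is a sketch, not the card's original formula image; the name `VARIANCE_FUNCTIONS` is made up for illustration.

```python
# Variance functions V(mu) for common GLM response distributions.
# Var(Y) = phi * V(mu), where phi is the dispersion parameter.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "bernoulli": lambda mu: mu * (1.0 - mu),
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
}

for name, v in VARIANCE_FUNCTIONS.items():
    print(f"{name:>16}: V(0.5) = {v(0.5)}")
```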

Most appropriate Link for Y~Bernoulli?

Logit, probit, complementary log-log
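
A minimal sketch of the three links, each mapping a probability in (0, 1) onto the whole real line. The function names are illustrative only and use just the Python standard library.

```python
import math
from statistics import NormalDist

def logit(p):
    """Log-odds: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Inverse of the standard normal CDF."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log: ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 4), round(probit(p), 4), round(cloglog(p), 4))
```

Logit and probit are symmetric around p = 0.5; the complementary log-log link is not.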

How are the coefficients estimated in a GLM?

MLEs

What does a maximized log likelihood tell us?

L0 < Lb < Lsat (null model, fitted model, saturated model)

It tells us how close our model is to the saturated model, i.e. how well the model fits our data.

What does deviance measure in GLM? What is the formula? A good fit means a _____ deviance.

Deviance measures the distance between the saturated model and our model: D = 2(Lsat - Lb). It is useful for testing nested models.

Small deviance = good fit
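
As an illustration, the deviance can be computed as twice the gap between the saturated and fitted log-likelihoods. The sketch below assumes a Poisson response and uses made-up counts and fitted means; the function names are hypothetical.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood; assumes mu_i > 0 unless y_i is also 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if mi == 0:
            continue  # only reached in the saturated model when yi == 0
        total += yi * math.log(mi) - mi - math.lgamma(yi + 1)
    return total

def deviance(y, mu):
    """D = 2 * (l_sat - l_b); the saturated model sets mu_i = y_i."""
    return 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu))

y = [0, 2, 1, 4, 3]                   # made-up counts
mu_good = [0.5, 1.8, 1.2, 3.9, 2.6]   # made-up fitted means close to y
mu_null = [2.0] * 5                   # intercept-only fit: the mean of y
print(deviance(y, mu_good), deviance(y, mu_null))
```

The fitted means that track the data more closely give the smaller deviance.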

Deviance is similar to _____ in MLR.

SSE

Is a square root link appropriate for Poisson count regression?

No. The linear predictor can be negative, but the square root of the mean cannot.

If the Poisson model is adequate, is the deviance a realization from the chi-square distribution?

Yes, approximately, with df = pf - pr

Formula for max-scaled R^2 and pseudo R^2.

Formula

Formulas for AIC and BIC.

Formulas
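
A short sketch of the usual definitions, AIC = -2l + 2k and BIC = -2l + k ln(n), where l is the maximized log-likelihood, k the number of estimated parameters, and n the sample size. Texts differ on exactly which parameters count toward k, so treat `num_params` as an assumption to check against the formula sheet.

```python
import math

def aic(loglik, num_params):
    """Akaike information criterion: -2*l + 2*k."""
    return -2.0 * loglik + 2.0 * num_params

def bic(loglik, num_params, n):
    """Bayesian information criterion: -2*l + k*ln(n)."""
    return -2.0 * loglik + num_params * math.log(n)

# Hypothetical fit: log-likelihood of -100 with 3 parameters on 100 observations.
print(aic(-100.0, 3), bic(-100.0, 3, 100))
```

Smaller values are preferred for both; BIC penalizes extra parameters more heavily than AIC once ln(n) > 2.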

Formula for Pearson residual, Pearson chi square statistic.

The Pearson chi square statistic has what value when the model fits well?

Formula; when the model fits well, the Pearson chi square statistic is approximately n-p-1.
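
A minimal sketch of both quantities, assuming the usual definitions: the Pearson residual is (y - mu) / sqrt(V(mu)) and the Pearson chi-square statistic is the sum of squared Pearson residuals. The data and function names are made up for illustration.

```python
import math

def pearson_residuals(y, mu, variance_fn):
    """r_i = (y_i - mu_i) / sqrt(V(mu_i))."""
    return [(yi - mi) / math.sqrt(variance_fn(mi)) for yi, mi in zip(y, mu)]

def pearson_chi_square(y, mu, variance_fn):
    """Sum of squared Pearson residuals."""
    return sum(r * r for r in pearson_residuals(y, mu, variance_fn))

y = [0, 2, 1, 4, 3]                 # made-up counts
mu = [0.5, 1.8, 1.2, 3.9, 2.6]      # made-up fitted means
print(pearson_chi_square(y, mu, lambda m: m))  # Poisson: V(mu) = mu
```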

Formula for deviance residuals and Anscombe residuals.

Formulas

Does a large Pearson chi square statistic mean a good fit?

No. It is the sum of squared Pearson residuals, so a large value means a poor fit.

What does overdispersion mean? How can we fix this issue? What is a disadvantage of fixing it?

When the observed variability is greater than the variability the model implies.

We use the formula with delta estimated as …

This changes the distribution of Y.

This is useful for distributions whose variance is tied to the mean, like the Poisson.

What is a likelihood ratio test used for? What is the test statistic, from what distribution, with what df?

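Comparing nested models (the GLM analogue of the partial F test)
Test stat = difference in deviance between the reduced model (pr parameters) and the full model (pf parameters), from chi square
Df = pf - pr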
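
The test can be sketched as follows, assuming the statistic 2(l_full - l_reduced) and, to stay within the standard library, a p-value helper that only covers the one-degree-of-freedom case. The log-likelihood values are hypothetical.

```python
import math

def lrt_statistic(loglik_full, loglik_reduced):
    """LRT = 2 * (l_full - l_reduced), the drop in deviance from reduced to full."""
    return 2.0 * (loglik_full - loglik_reduced)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical maximized log-likelihoods; the full model has one extra coefficient,
# so df = pf - pr = 1.
stat = lrt_statistic(loglik_full=-120.3, loglik_reduced=-123.1)
print(stat, chi2_sf_df1(stat))
```

A small p-value is evidence against the reduced model.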

What is a goodness of fit test used for? What is the test stat, what distribution and what df?

Measures how well Y’s fitted distribution matches the observed data.

Test stat = sum of (O-E)^2/E, chi square

Df = w-g-1
w = number of intervals that y is split into (the sum runs across these)
g = number of free parameters
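
A small sketch of the statistic and its degrees of freedom, with observed and expected counts invented for illustration and a hypothetical function name.

```python
def chi_square_gof(observed, expected, num_free_params):
    """Return (sum of (O - E)^2 / E, df) with df = w - g - 1."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - num_free_params - 1
    return stat, df

observed = [18, 30, 32, 20]            # made-up counts in w = 4 intervals
expected = [20.0, 30.0, 30.0, 20.0]    # made-up expected counts under the model
print(chi_square_gof(observed, expected, num_free_params=1))
```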

What is the Tweedie distribution? Fill in the table

Formula

When Y is a binary response variable, what distribution and link function is commonly used?

Bernoulli for Y

Link: logit, probit, complementary log-log

When Y~Bernoulli, what are the formulas for the following:
Log likelihood
Score equations
Deviance statistic
Pearson residual
Pearson chi square

Formula

In logistic regression, the score equations reduce to:

Formula
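
With the logit link, the score equations take the standard form sum of x_i (y_i - pi_i) = 0. The sketch below evaluates that score vector on made-up data; for an intercept-only model it vanishes when the fitted probability equals the sample mean of y. The function name is illustrative.

```python
import math

def logistic_score(X, y, beta):
    """Score vector sum_i x_i * (y_i - pi_i) for logistic regression."""
    score = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        pi = 1.0 / (1.0 + math.exp(-eta))
        for j, xij in enumerate(xi):
            score[j] += xij * (yi - pi)
    return score

# Intercept-only model: the MLE satisfies pi = mean(y), i.e. beta0 = logit(mean(y)).
y = [1, 0, 0, 1, 1]                 # made-up binary responses, mean 0.6
X = [[1.0] for _ in y]
beta0 = math.log(0.6 / 0.4)
print(logistic_score(X, y, [beta0]))  # numerically close to zero
```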

What are 3 drawbacks to linear models?

Mismatched fitted values: the range of the fitted values does not match the range of Y
Heteroscedasticity
Invalid residual analysis: residuals are regarded as continuous, which is violated for Bernoulli because it is discrete

When Y is nominal, what distribution and link are most appropriate?

Y~ categorical (w) with the generalized logit model, formula is ____.

When Y is ordinal, we use the _____ model.

Proportional odds cumulative model

When Y~Poisson we use the log link. How does this simplify our formulas for the following:
Log likelihood
Score
Information matrix
Deviance statistic
Pearson residual
Pearson chi square

Formulas
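
As a hedged reconstruction of the standard results (not the card's original formula image), writing $\mu_i = \exp(x_i^\top \beta)$ for the mean under the log link and $\hat\mu_i$ for the fitted mean:

```latex
\begin{align*}
\ell(\beta) &= \sum_{i=1}^{n} \left[ y_i\, x_i^\top \beta - \exp(x_i^\top \beta) - \ln(y_i!) \right] \\
\frac{\partial \ell}{\partial \beta} &= \sum_{i=1}^{n} x_i \,(y_i - \mu_i) = 0 \\
I(\beta) &= \sum_{i=1}^{n} \mu_i \, x_i x_i^\top \\
D &= 2 \sum_{i=1}^{n} \left[ y_i \ln\!\left(\frac{y_i}{\hat\mu_i}\right) - (y_i - \hat\mu_i) \right] \\
r_i &= \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}, \qquad
\chi^2_{\text{Pearson}} = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i}
\end{align*}
```

When the model includes an intercept, the score equation for the intercept gives $\sum_i (y_i - \hat\mu_i) = 0$, so the deviance reduces to $2 \sum_i y_i \ln(y_i / \hat\mu_i)$, with terms where $y_i = 0$ taken as zero.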

Poisson count with exposures has link function

Formula
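
In the usual formulation, the exposure E enters as an offset: ln(mu) = ln(E) + x'beta, so mu = E * exp(x'beta). A minimal sketch with hypothetical coefficients and a made-up function name:

```python
import math

def poisson_mean_with_exposure(exposure, x, beta):
    """mu = E * exp(x'beta), i.e. ln(mu) = ln(E) + x'beta with offset ln(E)."""
    eta = sum(b * xj for b, xj in zip(beta, x))
    return exposure * math.exp(eta)

beta = [-2.0, 0.3]   # hypothetical coefficients
x = [1.0, 1.0]       # intercept plus one predictor
print(poisson_mean_with_exposure(1.0, x, beta))
print(poisson_mean_with_exposure(2.5, x, beta))  # the mean scales with exposure
```

The offset has a fixed coefficient of 1, so it is not estimated along with beta.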

Table for alternative count methods

Formula sheet

Can we use a Tweedie distribution on a count response variable?

No, the domain is part discrete and part continuous.

Which of the following are true, and why?
A. A negbinom model is a special case of a heterogeneity model.
B. A latent class model is a special case of a zero-inflated model.

A is true. It’s possible to set up a Poisson-gamma mixture in a heterogeneity model, resulting in a negative binomial model.
B is false. A zero-inflated model is a special case of a latent class model.

If we compare Poisson regression with Poisson regression with exposures, which of the following statements will be true?
A. The coefficient estimates should not be the same for the two models.
B. With all else equal, a unit change in predictor xj changes the estimated means of both models equally if the corresponding coefficient estimate is the same.
C. One model is better at handling overdispersion.

A is true. They will have different score equations, and therefore different beta estimates.
B is true. “All else equal” means the exposure is held fixed, so its offset does not affect the change in the mean.
C is false. They are both Poisson models and would have the same issue with overdispersion.

Can we perform a likelihood ratio test on models that are not nested?

In GLM, why is it not appropriate to use an F test without other information on the distribution of Y?

Because it’s not known that the response is being modelled as normal.

Unless specified otherwise, what does the likelihood ratio chi-squared test test? What df?

Null: all non-intercept coefficients are zero.
Alternative: at least one non-intercept coefficient is non-zero.
Df = (# of parameters under the alternative) - (# of parameters under the null)

What is the compound Poisson-gamma distribution?

Tweedie
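
To make the compound structure concrete, the sketch below simulates a Poisson number of claims with gamma-distributed severities and sums them. The result has a point mass at zero and is continuous elsewhere. All parameter values and function names are made up for illustration.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def compound_poisson_gamma(rng, lam, shape, scale):
    """Sum of N gamma severities with N ~ Poisson(lam); equals 0 when N = 0."""
    n = poisson_draw(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

rng = random.Random(0)
draws = [compound_poisson_gamma(rng, lam=0.7, shape=2.0, scale=500.0)
         for _ in range(10_000)]
share_zero = sum(d == 0 for d in draws) / len(draws)
print(share_zero, math.exp(-0.7))  # empirical vs. theoretical P(Y = 0)
```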

Which of the following models can be used to accommodate underdispersion?
Negative binomial
Zero-inflated
Hurdle

Only the hurdle model

If I want to model auto claims with Poisson(lambda), what model can I use if I want to model lambda as a continuous variable?
A. Zero-inflated
B. Hurdle model
C. Heterogeneity model

Only the heterogeneity model uses a continuous mixture. The other two are discrete.

When we want the variance to be greater than the mean of the response, what models could we use? (We are using Poisson)
Negative binomial
Hurdle
Zero inflated

All 3.
Negative binomial: the Poisson is a limiting case of the negative binomial, whose variance is greater than its mean, so this is suitable.
Zero inflated: this model requires selecting a discrete distribution starting at 0, which is true of the Poisson. If the Poisson is selected, the variance will exceed the mean.
Hurdle: this model requires selecting a discrete distribution valid on integers starting at 1 (the zero-truncated Poisson is a possibility). If the zero-truncated Poisson is selected, the variance can be greater than the mean.

Which of the following are true about GLMs?
A. Inverse of the link function is the mean function.
B. The choice of distribution plays an important role in making inferences.
C. For distributions in the linear exponential family, the canonical link is the inverse of the mean function.

A. True.
B. False. It is the choice of the mean function and the variance function that drives the inference making.
C. True.

Can we say that Tweedie is a type of heterogeneity model?

No, heterogeneity models are continuous mixtures. Tweedie is a compound distribution.

If we wish to model the aggregate auto claims for a specific block of business, what would be the most appropriate?
Poisson with exposures
Negative binomial
Zero inflated model
MLR
GLM with Tweedie

Claim amounts are non-negative and modeled as a continuous variable. This rules out the negative binomial, Poisson, and zero-inflated models because they all have count response variables. MLR assumes a normally distributed response that can take on any value, so it is ruled out because claim amounts cannot be negative. Tweedie is appropriate because it has a point mass at y=0 while being continuous over the rest of the domain.

A canonical link sets theta equal to what?

(41 cards)