W5: GLM 2 Flashcards

Question 1

Q

What is the y variable for a Poisson distribution?

Answer

A

Discrete, whole-number counts (0, 1, 2, ...)
No negative values

Question 2

Q

What type of distribution should be used for this RQ:
“Examining risk factors for the number of accidents someone gets into over a 12 month period”

Question 3

Q

What type of distribution should be used for this RQ:
“Evaluating whether an intervention reduced the number of times someone
missed their medication in the last month”

Question 4

Q

What type of distribution should be used for this RQ:
“Testing whether the total number of health care appointments over six months can be lowered by treating mental health”

Question 5

Q

How many parameters does the Poisson distribution have and what are they called?

Answer

A

1 parameter: Lambda
Both the mean AND variance
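
A quick simulation can illustrate that lambda acts as both the mean and the variance. This is only a sketch in base R; lambda = 3 and the sample size are arbitrary values chosen for illustration.

```r
# Simulate Poisson counts; lambda = 3 is an arbitrary illustrative value
set.seed(1)
lambda <- 3
y <- rpois(100000, lambda = lambda)

# Both the sample mean and the sample variance should be close to lambda
sample_mean <- mean(y)
sample_var  <- var(y)
print(c(mean = sample_mean, variance = sample_var))
```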

Question 6

Q

What does the Poisson distribution look like as lambda gets higher (e.g. when lambda = 10)?

Answer

A

More like a normal distribution
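
One way to see this, sketched in base R, is to compute the skewness of the Poisson distribution for a small and a larger lambda. A normal distribution has skewness 0, so skewness closer to 0 means a more symmetric, normal-looking shape. The helper name pois_skew and the cutoff k_max are hypothetical choices for this example.

```r
# Skewness of a Poisson distribution, computed from its probability mass function.
# pois_skew and k_max are illustrative names, not part of any package.
pois_skew <- function(lambda, k_max = 200) {
  k <- 0:k_max
  p <- dpois(k, lambda)
  # third central moment divided by variance^(3/2); the variance equals lambda
  sum((k - lambda)^3 * p) / lambda^1.5
}

skew_1  <- pois_skew(1)    # strongly right-skewed
skew_10 <- pois_skew(10)   # much closer to symmetric
print(c(lambda_1 = skew_1, lambda_10 = skew_10))
```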

Question 7

Q

Which linear regression assumption is violated by the outcomes used in both Poisson and logistic regression?

Answer

A

Normality assumption

Question 8

Q

Why can’t we use linear regression for count outcomes, and why do we use Poisson regression instead?

Answer

A

A straight line is a poor fit for outcomes that can never be negative (it can predict counts below 0)

Question 9

Q

What is the link function for Poisson distribution and what does it do?

Answer

A

Natural log: ln(lambda)
Transforms lambda (which never goes below 0) onto the unbounded linear predictor (eta) scale
Unbounds lambda at the lower end of the y axis (left side of the graph)
As lambda approaches 0, ln(lambda) approaches negative infinity

Question 10

Q

After link transformation, what does the data fall between for Poisson and logistic distribution?

Answer

A

Negative infinity to positive infinity
i.e. a continuous, unbounded outcome that a linear model can be applied to

Question 11

Q

What is the variance if lambda is 0?

Question 12

Q

What does the inverse link function do for Poisson and logistic distribution?

Answer

A

Poisson: y axis (left side of graph) falls back to the original count scale (between 0 and positive infinity)
Logistic: y axis falls back to probability scale (between 0 and 1)
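
The two inverse links can be sketched in base R as exp() for the log link and plogis() for the logit link. The eta values below are arbitrary numbers on the link scale, chosen only for illustration.

```r
# Arbitrary values on the link (linear predictor) scale
eta <- c(-10, -1, 0, 1, 10)

# Inverse log link: back to the count scale, always above 0
count_scale <- exp(eta)

# Inverse logit link: back to the probability scale, always between 0 and 1
prob_scale <- plogis(eta)

print(data.frame(eta, count_scale, prob_scale))
```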

Question 13

Q

What are the 3 assumptions of Poisson and logistic regression?

Answer

A

Errors must be independent
Assumes linear relationship on the link (natural log / logit) scale
Requires a large sample size (there are no dfs, so the tests rely on the parameter estimates being approximately normally distributed)

Question 14

Q

What argument must be added to glm() and testDistribution() for Poisson and logistic regression?

Answer

A

glm(y ~ x, data = d, family = poisson())
or family = binomial()
testDistribution(d$awards, distr = "poisson")
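
A minimal sketch of both fits on simulated data, using only base R. The data frame d, the variable names, and the true coefficients are made up for illustration; testDistribution() is left out because it comes from an add-on package.

```r
# Simulated data; all names and coefficient values are hypothetical
set.seed(42)
n <- 200
d <- data.frame(x = rnorm(n))
d$count  <- rpois(n, lambda = exp(0.5 + 0.3 * d$x))              # count outcome
d$binary <- rbinom(n, size = 1, prob = plogis(-0.2 + 0.8 * d$x)) # 0/1 outcome

# The family argument selects the distribution and its default link
m_pois <- glm(count ~ x, data = d, family = poisson())
m_log  <- glm(binary ~ x, data = d, family = binomial())

print(family(m_pois)$link)
print(family(m_log)$link)
```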

Question 15

Q

How do you interpret the estimate for the predictor using Poisson regression?
glm ( num_awards ~ math)
Each 1 unit higher math score is associated with x high…

Answer

A

log awards

Question 16

Q

Instead of interpreting Poisson regressions on log scale, what should we use instead and how do we get that?

Answer

A

Incidence rate ratios (IRRs), obtained by exponentiating the regression coefficients.

Question 17

Q

What do IRRs indicate (Poisson)?

Answer

A

How many times higher (or lower) the expected count of y is for a 1 unit increase in x
i.e. the ratio by which the expected count changes

Question 18

Q

If IRR = 4 and base rate = 2, what is the expected outcome after a 1 unit increase in the predictor?

Answer

A

4 * 2 = 8 (the expected count is 4 times the base rate)
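
The same arithmetic can be checked on the log scale, as a rough sketch in base R. Here the base rate is taken to be the expected count when the predictor is 0, which is an assumption made for this example.

```r
# Assumption: base rate = expected count when the predictor equals 0
base_rate <- 2
irr <- 4

# Coefficients on the link (log) scale
b0 <- log(base_rate)
b1 <- log(irr)

# Expected counts at x = 0, 1, 2: each extra unit multiplies the count by the IRR
x <- 0:2
expected <- exp(b0 + b1 * x)
print(expected)
```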

Question 19

Q

What does it mean if IRR or OR = 1?
What would the coeff value be on link (log / log odds) scale?

Answer

A

IRR = 1: no change in the expected count (base rate x 1 = base rate)
OR = 1: no change in the odds of the outcome (base odds x 1 = base odds)
Coeff of 1 on IRR or OR scale = coeff of 0 on link scale, because exp(0) = 1

Question 20

Q

What 3 things should you not exponentiate?

Answer

A

p-values, z values, standard errors

Question 21

Q

What are 2 things you can exponentiate for Poisson regression?

Answer

A

regression coefficients : exp(coef)
confidence intervals : exp(confint)
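
A short sketch of both steps on a fitted model, using simulated data. The model name m, the variable names, and the true coefficients are hypothetical. confint() on a glm gives profile-likelihood intervals (on older R versions this relies on the MASS package being installed).

```r
# Simulated count data; names and coefficient values are hypothetical
set.seed(7)
d <- data.frame(x = rnorm(300))
d$y <- rpois(300, lambda = exp(0.2 + 0.4 * d$x))

m <- glm(y ~ x, data = d, family = poisson())

# Exponentiate the coefficients and their confidence limits to get IRRs
irr    <- exp(coef(m))
irr_ci <- exp(suppressMessages(confint(m)))

print(cbind(IRR = irr, irr_ci))
```

Note that only the estimates and their interval limits are exponentiated here, not the standard errors, z values, or p-values.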

Question 22

Q

What argument do you have to add to visreg when you want to plot Poisson or binary logistic regression on the original scale?

Answer

A

scale = "response"

Question 23

Q

What is the y outcome for binary logistic regression?

Question 24

Q

What type of regression do you use for this RQ:
What predicts whether someone will have major depression or not?

Answer

A

Binary logistic

Question 25

Q

What type of regression do you use for this RQ:
Does one treatment have a higher probability of patients remitting from major depression than another treatment?

Answer

A

Binary logistic

Question 26

Q

What type of regression do you use for this RQ:
What is the probability that a patient will be readmitted to the hospital within 30 days of discharge?

Answer

A

Binary logistic

Question 27

Q

What type of regression do you use for this RQ:
What predicts whether an individual will live or die before age 60?

Answer

A

Binary logistic

Question 28

Q

What type of regression do you use for this RQ:
If a bank gives a loan to someone, what is their probability of not being able to pay it back?

Answer

A

Binary logistic

Question 29

Q

What type of regression do you use for this RQ:
Do older adults have a higher probability of using CAM than younger adults?

Answer

A

Binary logistic

Question 30

Q

What is the link function for logistic regression and what does it do?

Answer

A

Logit function
Transforms mu (which never goes below 0 or above 1) onto the unbounded linear predictor (eta) scale
Unbounds both ends of the scale (left and right side of graph)

Question 31

Q

What distribution does the outcome in logistic regression follow and what are its parameters?

Answer

A

Bernoulli distribution
1 parameter (average probability that the event will occur, i.e. p or mu)

Question 32

Q

How many parameters do both Poisson and Bernoulli distributions have?

Question 33

Q

Separation is a problem that can arise in logistic regression. What does it mean?

Answer

A

When a predictor perfectly predicts (separates) the outcome
E.g. D appears in 0% of people without PTSD
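
A tiny made-up data set shows what separation looks like in practice. This is a sketch in base R with invented values of x and y; the exact warning text and coefficient size may vary across R versions.

```r
# Invented data: every x of 3 or less has y = 0, every x of 4 or more has y = 1
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)

# Collect the warnings glm() raises instead of letting them print
warns <- character(0)
m <- withCallingHandlers(
  glm(y ~ x, family = binomial()),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warns)     # glm() typically warns about the fit here
print(coef(m))   # the slope is implausibly large
```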

Question 34

Q

Under what 2 situations would the issue of separation most often occur?

Answer

A

When the outcome is rare
When there is a small sample size

Question 35

Q

How do you resolve the issue of separation for logistic regression?

Answer

A

Remove predictors / collapse groups

Question 36

Q

R stores variables with few levels (e.g. 0 or 1) as continuous.
What does the argument strict = FALSE in the egltable() function do?

Answer

A

Tells egltable() that variables with few levels should be treated as categorical

Question 37

Q

What does a significant chi-square test from egltable() output indicate for x and y?

Answer

A

x and y are not independent

Question 38

Q

How do you interpret the estimate for the predictor using logistic regression?
glm ( stress_high ~ SE)
Each 1 unit higher SE is associated with x high…

Answer

A

log odds of being in high stress

Question 39

Q

Instead of interpreting logistic regressions on logit scale, what should we use instead and how do we get that?

Answer

A

Odds ratios (ORs), obtained by exponentiating the regression coefficients

Question 40

Q

What do ORs indicate (logistic)?

Answer

A

How many times higher (or lower) the odds of the outcome occurring are for a 1 unit increase in the predictor

Question 41

Q

If OR = 2, base odds = 0.9, 1 unit higher = ?

Answer

A

0.9 * 2 = 1.8, the odds of the outcome after a 1 unit increase (2 times the base odds)
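
The calculation can be extended to show what these odds mean as probabilities. This is a small base R sketch; odds_to_prob is a hypothetical helper written for this example.

```r
base_odds  <- 0.9
odds_ratio <- 2

# Odds after a 1 unit increase in the predictor
new_odds <- base_odds * odds_ratio

# Hypothetical helper: convert odds to a probability, p = odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)

base_prob <- odds_to_prob(base_odds)
new_prob  <- odds_to_prob(new_odds)
print(c(new_odds = new_odds, base_prob = base_prob, new_prob = new_prob))
```

Doubling the odds does not double the probability, which is one reason the probability scale is often easier to interpret.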

Question 42

Q

Instead of using ORs, what should we convert the log odds scale to?

Answer

A

Probability scale

Question 43

Q

How do you determine 0 or 1 on a visreg graph whose y-axis (predicted probabilities) ranges from 0 to 1?

Answer

A

Above 0.5 = 1 / yes
Below 0.5 = 0 / no

Question 44

Q

What function do you use to convert ORs to probabilities? Output as a table of probabilities.

Answer

A

predict(mlog, type = "response")
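
A sketch of producing such a table with simulated data. The variable names stress_high and SE follow the cards, but the data and the true coefficients are invented, and newdat is a hypothetical name for the grid of predictor values.

```r
# Simulated data; the coefficients 3 and -1.5 are invented for illustration
set.seed(11)
d <- data.frame(SE = runif(250, min = 1, max = 4))
d$stress_high <- rbinom(250, size = 1, prob = plogis(3 - 1.5 * d$SE))

mlog <- glm(stress_high ~ SE, data = d, family = binomial())

# Predicted probabilities at chosen SE scores
newdat <- data.frame(SE = 1:4)
newdat$prob <- unname(predict(mlog, newdata = newdat, type = "response"))
print(newdat)
```

Calling predict() without newdata returns one predicted probability per row of the original data instead.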

Question 45

Q

Different values of predictors have different probabilities. E.g. an SE score of 1 has a much higher probability of being in the high stress group than an SE score of 4. What is this effect called?

Answer

A

Marginal effect
Instantaneous effect of change at 1 particular point on the x scale
AKA the slope of the tangent line (the derivative) at that point

Question 46

Q

What is the average marginal effect (AME) of probabilities?

Answer

A

What the average change in probability would be in outcome for 1 unit change in predictor

Question 47

Q

How do you calculate AME?

Answer

A

Add a small constant (h) to the predictor, take the difference between the new and original predicted probabilities, divide it by h, then calculate the mean across observations
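
The calculation can be sketched in base R as follows. The data are simulated, the coefficients are invented, and h = 0.001 is an arbitrary small constant; this is one way to approximate the AME, not necessarily how any particular package does it.

```r
# Simulated data; names follow the cards, coefficient values are invented
set.seed(11)
d <- data.frame(SE = runif(250, min = 1, max = 4))
d$stress_high <- rbinom(250, size = 1, prob = plogis(3 - 1.5 * d$SE))
mlog <- glm(stress_high ~ SE, data = d, family = binomial())

# Add a small constant h to the predictor for every observation
h <- 0.001
d_plus <- d
d_plus$SE <- d_plus$SE + h

# Predicted probabilities before and after the change
p_orig <- predict(mlog, newdata = d, type = "response")
p_new  <- predict(mlog, newdata = d_plus, type = "response")

# Average of the divided differences = average marginal effect
ame <- mean((p_new - p_orig) / h)
print(ame)
```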

Question 48

Q

Do all predictors influence the outcome for multiple regression?

Question 49

Q

What is the logit link function?
And what do they unbound?

Answer

A

g(mu) = ln(mu / (1 - mu))
* ln part unbounds the left side (odds never go below 0), extending it to neg infinity
* mu / (1 - mu) part unbounds the right side (mu never goes above 1), extending it to pos infinity
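
The link can be written out directly, as a sketch in base R. The function name logit is defined here for illustration; base R's qlogis() computes the same thing.

```r
# Logit link written out by hand; qlogis() is the built-in equivalent
logit <- function(mu) log(mu / (1 - mu))

# Probabilities near 0 map to large negative values, near 1 to large positive values
mu  <- c(0.001, 0.25, 0.5, 0.75, 0.999)
eta <- logit(mu)
print(data.frame(mu, eta))

# At the boundaries the link is unbounded
lower <- logit(0)   # -Inf
upper <- logit(1)   # Inf
```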

Question 50

Q

ORs higher than 1 = pos / neg relationship between your variables?

Answer

A

Positive relationship

Question 51

Q

ORs lower than 1 = pos / neg relationship between your variables?

Answer

A

Negative relationship

Question 52

Q

What are odds?

Answer

A

Probability of something happening / probability of something not happening.
E.g. rolling a 6 on a 6-sided die: (1/6) / (5/6) = 1/5 = 0.2

Question 53

Q

When do you calculate AME?

Answer

A

For continuous predictors in binary logistic regression when you see an instantaneous change in outcome at a specific x value
E.g. when the probability of being in high stress suddenly goes down between SE scores of 3 and 4

Question 54

Q

What are deviance residuals?

Answer

A

Individual contribution of each observation to the overall model deviance.

Question 55

Q

A negative deviance residual represents what?

Answer

A

For that observation, the observed outcome is lower than the model-predicted outcome

Question 56

Q

A positive deviance residual represents what?

Answer

A

For that observation, the observed outcome is higher than the model-predicted outcome
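
Deviance residuals can be pulled from a fitted model with residuals(). The sketch below uses simulated count data with invented names and coefficients, and checks two properties: the squared residuals add up to the model deviance, and each residual's sign matches observed minus predicted.

```r
# Simulated count data; names and coefficient values are hypothetical
set.seed(3)
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, lambda = exp(0.3 + 0.5 * d$x))
m <- glm(y ~ x, data = d, family = poisson())

# One deviance residual per observation
dev_res <- residuals(m, type = "deviance")

# Squared deviance residuals sum to the overall model deviance
print(c(sum_sq = sum(dev_res^2), deviance = deviance(m)))

# The sign follows observed minus predicted
raw_diff <- d$y - fitted(m)
print(all(sign(dev_res) == sign(raw_diff)))
```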