W5: GLM 2 Flashcards
What is the y variable for a Poisson distribution?
Discrete, whole, non-negative numbers (counts: 0, 1, 2, ...)
No negative numbers and no decimals
What type of distribution should be used for this RQ:
“Examining risk factors for the number of accidents someone gets into over a 12 month period”
Poisson
What type of distribution should be used for this RQ:
“Evaluating whether an intervention reduced the number of times someone missed their medication in the last month”
Poisson
What type of distribution should be used for this RQ:
“Testing whether the total number of health care appointments over six months can be lowered by treating mental health”
Poisson
How many parameters does the Poisson distribution have and what are they called?
1 parameter: Lambda
Both the mean AND variance
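A quick base R check of the single-parameter property, using dpois(). The value lambda = 3 and the cutoff of 100 are arbitrary choices for illustration, not values from the course material.

```r
# Assumption: lambda = 3 is an arbitrary illustrative value.
lambda <- 3
k <- 0:100                 # possible counts; probability beyond 100 is negligible here
p <- dpois(k, lambda)      # Poisson probabilities P(Y = k)

pois_mean <- sum(k * p)                    # expected value of Y
pois_var  <- sum((k - pois_mean)^2 * p)    # variance of Y

# Both values should be (approximately) equal to lambda
print(c(mean = pois_mean, variance = pois_var))
```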
What does the Poisson distribution look like when lambda gets higher (e.g when lambda = 10)?
More like a normal distribution
What linear regression assumption is violated by both count (Poisson) and binary (logistic) outcomes?
Normality assumption
Why can’t we use linear regression for count outcomes, and have to use Poisson regression instead?
A straight line is a bad fit for outcomes that cannot be negative (it can predict impossible negative counts)
What is the link function for Poisson regression and what does it do?
Natural log: eta = ln(lambda)
Ensures lambda = exp(eta) never goes below 0
Unbounds lambda on the left side (lower end of the y axis) of the graph
As lambda approaches 0, ln(lambda) approaches negative infinity
After the link transformation, what range does the outcome fall in for Poisson and logistic regression?
Negative infinity to positive infinity
i.e. a continuous, unbounded outcome that a linear model can be applied to
What is the variance if lambda is 0?
0
What does the inverse link function do for Poisson and logistic regression?
Poisson: y axis (left side of graph) goes back to the original count scale (between 0 and positive infinity)
Logistic: y axis goes back to the probability scale (between 0 and 1)
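A minimal base R sketch of the two inverse links: exp() undoes the log link and plogis() undoes the logit link. The eta values are arbitrary numbers on the link scale, chosen only for illustration.

```r
# Arbitrary values on the link (linear predictor) scale
eta <- c(-5, -1, 0, 1, 5)

lambda <- exp(eta)      # inverse of the log link: always above 0
prob   <- plogis(eta)   # inverse of the logit link: always between 0 and 1

print(round(lambda, 3))
print(round(prob, 3))

# Applying the link functions again recovers eta
print(all.equal(log(lambda), eta))
print(all.equal(qlogis(prob), eta))
```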
What are the 3 assumptions of Poisson and logistic regression?
- Errors must be independent
- Assumes linear relationship on the link (natural log / logit) scale
- Requires a large sample size (tests have no dfs, so a large sample is needed for the parameter estimates to be approximately normally distributed)
What argument must be added to glm() and testDistribution() for Poisson and logistic regression?
glm(y ~ x, data = d, family = poisson())
or family = binomial()
testDistribution(d$awards, distr = "poisson")
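A runnable sketch of the family argument on simulated data. The data frame, variable names, and true coefficients are hypothetical and exist only to make the example self-contained. testDistribution() is left out because it needs an add-on package.

```r
set.seed(1)  # simulated data, for illustration only
d <- data.frame(x = rnorm(200))
d$count  <- rpois(200, lambda = exp(0.5 + 0.3 * d$x))          # count outcome
d$binary <- rbinom(200, size = 1, prob = plogis(-0.5 + d$x))   # 0/1 outcome

# family = poisson() uses the log link by default
m_pois <- glm(count ~ x, data = d, family = poisson())

# family = binomial() uses the logit link by default
m_log <- glm(binary ~ x, data = d, family = binomial())

# Coefficients are reported on the link scale (log / log odds)
print(summary(m_pois)$coefficients)
print(summary(m_log)$coefficients)
```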
How do you interpret the estimate for the predictor using Poisson regression?
glm(num_awards ~ math, family = poisson())
Each 1 unit higher math score is associated with x high…
log awards
Instead of interpreting Poisson regressions on log scale, what should we use instead and how do we get that?
Incidence rate ratios (IRRs), obtained by exponentiating the regression coefficients.
What do IRRs indicate (Poisson)?
How many times higher the expected count of y is for a 1 unit increase in x
i.e. the ratio of the expected count after the increase to the expected count before it
If IRR = 4 and base rate = 2, what is the expected count after a 1 unit increase in the predictor?
2 * 4 = 8 (4 times the base rate)
What does it mean if IRR or OR = 1?
What would the coeff value be on the link (log / log odds) scale?
There is no change in the expected count of the outcome (base rate x 1 = base rate)
or no change in the odds of the outcome (base odds x 1 = base odds)
Coeff of 1 on IRR or OR scale = coeff of 0 on link scale, because exp(0) = 1
What 3 things should you not exponentiate?
p-values, z values, standard errors
What are 2 things you can exponentiate for Poisson regression?
regression coefficients : exp(coef)
confidence intervals : exp(confint)
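A sketch of exponentiating coefficients and confidence intervals to get IRRs. The variable names num_awards and math follow the flashcards, but the data are simulated and the true coefficients are assumptions made only for this example.

```r
set.seed(2)  # simulated data, for illustration only
d <- data.frame(math = rnorm(300, mean = 50, sd = 10))
d$num_awards <- rpois(300, lambda = exp(-3 + 0.06 * d$math))

m <- glm(num_awards ~ math, data = d, family = poisson())

irr    <- exp(coef(m))                        # IRRs
irr_ci <- exp(suppressMessages(confint(m)))   # confidence intervals on the IRR scale

print(cbind(IRR = irr, irr_ci))

# p-values, z values and standard errors stay as reported; do not exponentiate them
print(summary(m)$coefficients)
```

An IRR above 1 with a confidence interval that excludes 1 corresponds to a positive coefficient on the log scale whose interval excludes 0.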
What argument do you have to add to visreg when you want to plot Poisson or binary logistic regression on the original scale?
scale = "response"
What is the y outcome for binary logistic regression?
0 or 1
What type of regression do you use for this RQ:
What predicts whether someone will have major depression or not?
Binary logistic
What type of regression do you use for this RQ:
Does one treatment have a higher probability of patients remitting from major depression than another treatment?
Binary logistic
What type of regression do you use for this RQ:
What is the probability that a patient will be readmitted to the hospital within 30 days of discharge?
Binary logistic
What type of regression do you use for this RQ:
What predicts whether an individual will live or die before age 60?
Binary logistic
What type of regression do you use for this RQ:
If a bank gives a loan to someone, what is their probability of not being able to pay it back?
Binary logistic
What type of regression do you use for this RQ:
Do older adults have a higher probability of using CAM than younger adults?
Binary logistic
What is the link function for logistic regression and what does it do?
Logit function
Ensures the predicted probability never goes below 0 or above 1
Unbounds the outcome on both the left (below 0) and right (above 1) side of the graph
What distribution does the outcome in logistic regression follow and what are its parameters?
Bernoulli distribution
1 parameter (average probability that the event will occur, i.e. p or mu)
How many parameters do both Poisson and Bernoulli distributions have?
1
Separation is a problem that can arise from logistic regression. What does it mean?
When a predictor perfectly predicts (separates) the outcome
E.g. D appears in 0% of people without PTSD
Under what 2 situations would the issue of separation most often occur?
- When the outcome is rare
- When there is a small sample size
How do you resolve the issue of separation for logistic regression?
Remove predictors / collapse groups
R stores variables with few levels (e.g. 0 or 1) as continuous.
What does the argument strict = FALSE in the egltable() function do?
Tells egltable() to treat such variables as categorical
What does a significant chi-square test from egltable() output indicate for x and y?
x and y are not independent
How do you interpret the estimate for the predictor using logistic regression?
glm(stress_high ~ SE, family = binomial())
Each 1 unit higher SE is associated with x high…
log odds of being in high stress
Instead of interpreting logistic regressions on the logit scale, what should we use instead and how do we get that?
Odds ratios (ORs), obtained by exponentiating the regression coefficients
What do ORs indicate (logistic)?
How many times higher the odds of the outcome occurring are for a 1 unit increase in the predictor
If OR = 2 and base odds = 0.9, what are the odds when the predictor is 1 unit higher?
0.9 * 2 = 1.8 (the odds of the outcome are 2 times the base odds)
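The same arithmetic in base R, with an extra step that converts the odds to probabilities using odds / (1 + odds). The conversion is added here as an illustration of why ORs are hard to read directly.

```r
base_odds <- 0.9
or <- 2

new_odds <- base_odds * or                  # odds after a 1 unit increase: 1.8

# Convert odds to probabilities: p = odds / (1 + odds)
base_prob <- base_odds / (1 + base_odds)
new_prob  <- new_odds / (1 + new_odds)

print(c(base_odds = base_odds, new_odds = new_odds))
print(round(c(base_prob = base_prob, new_prob = new_prob), 3))
```

Doubling the odds does not double the probability, which is one reason to convert to the probability scale.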
Instead of using ORs, what should we convert the log odds scale to?
Probability scale
How do you determine 0 or 1 on visreg graph with y-axis (predicted probabilities) ranging from 0 to 1?
Above 0.5 = 1 / yes
Below 0.5 = 0 / no
What function do you use to convert from the log odds scale to probabilities? Output as a table of probabilities.
predict(mlog, type = "response")
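A sketch of predict() with type = "response" on simulated data. The names mlog, stress_high and SE follow the flashcards; the data and true coefficients are hypothetical, and the newdata argument is used here to get one probability per chosen SE score.

```r
set.seed(3)  # simulated data, for illustration only
d <- data.frame(SE = runif(300, min = 1, max = 4))
d$stress_high <- rbinom(300, size = 1, prob = plogis(3 - 1.5 * d$SE))

mlog <- glm(stress_high ~ SE, data = d, family = binomial())

# Predicted probabilities of high stress at SE scores 1 to 4
newd <- data.frame(SE = 1:4)
newd$prob <- as.numeric(predict(mlog, newdata = newd, type = "response"))
print(newd)

# Same result as applying the inverse logit to the link-scale predictions
link_pred <- as.numeric(predict(mlog, newdata = newd, type = "link"))
print(all.equal(newd$prob, plogis(link_pred)))
```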
Different values of predictors have different probabilities. E.g. an SE score of 1 has a much higher probability of being in the high stress group than an SE score of 4. What is this effect called?
Marginal effect
Instantaneous effect of change at 1 particular point on the x scale
AKA the slope of the tangent line / the derivative at that point
What is the average marginal effect (AME) of probabilities?
The average change in the probability of the outcome for a 1 unit change in the predictor
How do you calculate AME?
Add a small constant (h) to the predictor, take the difference between the new and original predicted probabilities, divide it by h, then take the mean across observations
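The AME calculation can be sketched in base R as a finite difference. The data are simulated, and h = 0.001 is an assumed small constant rather than a value given in the flashcards.

```r
set.seed(4)  # simulated data, for illustration only
d <- data.frame(SE = runif(500, min = 1, max = 4))
d$stress_high <- rbinom(500, size = 1, prob = plogis(3 - 1.5 * d$SE))

mlog <- glm(stress_high ~ SE, data = d, family = binomial())

h <- 0.001                                     # small constant added to the predictor

p_orig <- predict(mlog, newdata = d, type = "response")

d_new <- d
d_new$SE <- d_new$SE + h                       # shift every SE score by h
p_new <- predict(mlog, newdata = d_new, type = "response")

# Marginal effect per observation, then the average across observations
ame <- mean((p_new - p_orig) / h)
print(ame)
```

For a logit model the result should be close to the mean of b * p * (1 - p), where b is the SE coefficient and p the predicted probability.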
Do all predictors influence the outcome for multiple regression?
Yes
What is the logit link function?
And what does each part unbound?
g(mu) = ln(mu / (1 - mu))
* ln part unbounds the left side (mu never goes below 0): ln(odds) goes to neg infinity as mu approaches 0
* mu / (1 - mu) unbounds the right side (mu never goes above 1): the odds go to pos infinity as mu approaches 1
ORs higher than 1 = pos / neg relationship between your variables?
Positive relationship
ORs lower than 1 = pos / neg relationship between your variables?
Negative relationship
What are odds?
Probability of something happening / probability of something not happening.
E.g. rolling a 6 on a 6 sided die = 1/6 divided by 5/6 = 1/5 = 0.2
When do you calculate AME?
For continuous predictors in binary logistic regression, when the change in the probability of the outcome differs depending on the specific x value
E.g. when the probability of being in high stress suddenly goes down between SE scores of 3 and 4
What are deviance residuals?
Individual contribution of each observation to the overall model deviance.
A negative deviance residual represents what?
For that observation, the observed outcome is lower than the model predicted outcome
A positive deviance residual represents what?
For that observation, the observed outcome is higher than the model predicted outcome
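A sketch of deviance residuals on a simulated Poisson model, checking two properties: the squared residuals add up to the model deviance, and the sign of each residual matches observed minus predicted. The data and coefficients are hypothetical.

```r
set.seed(5)  # simulated data, for illustration only
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, lambda = exp(0.2 + 0.4 * d$x))

m <- glm(y ~ x, data = d, family = poisson())

dev_res <- residuals(m, type = "deviance")   # one deviance residual per observation
raw_res <- d$y - fitted(m)                   # observed minus predicted count

# Squared deviance residuals sum to the overall model deviance
print(all.equal(sum(dev_res^2), deviance(m)))

# Negative residual: observed below predicted; positive: observed above predicted
print(all(sign(dev_res) == sign(raw_res)))
```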