04 Logistic Regression Flashcards
What are generalized linear models made up of?
- A distribution for the outcome variable Y: Poisson, Binomial, Normal
- A linear predictor η = β0 + … etc.
- A link function g connecting the mean μ = E(Y|X) to the linear predictor:
g(μ) = η
Why do we need a link function?
Handles Non-Linearity: Many response variables do not have a linear relationship with predictors. The link function allows GLMs to capture these relationships.
Maps Outcomes to an Appropriate Range:
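For a binary outcome (e.g., logistic regression), the probability μ must be between 0 and 1. The logit link ensures this, because its inverse maps any value of the linear predictor into (0, 1).
For count data (e.g., Poisson regression), the expected count μ must be positive. The log link ensures this, because exponentiating the linear predictor always gives a positive value.
For normal linear regression models, the identity link is used: μ = Xβ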
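A minimal sketch of the range-mapping idea, using only the standard library. The function names are illustrative, not taken from any particular package; the point is that the inverse link turns any real-valued linear predictor into a mean in the allowed range.

```python
import math

def logit(mu):
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse logit: maps any real eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_log(eta):
    """Inverse of the log link: maps any real eta to a positive mean."""
    return math.exp(eta)

def inv_identity(eta):
    """Inverse of the identity link: the mean is eta itself (any real value)."""
    return eta

# The same linear predictor values give means in three different ranges.
for eta in (-3.0, 0.0, 3.0):
    print(eta, inv_logit(eta), inv_log(eta), inv_identity(eta))
```

Note that only the identity link can return a negative mean, which is why it is a poor fit for probabilities and counts.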
Explain issues arising from using count variables for normal linear regressions.
It may predict negative values, which are impossible for counts. Count data are also often highly skewed, which violates the normality assumption of OLS.
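A small illustration of the negative-prediction problem. The data are a hypothetical toy example (not from any real study): a skewed count outcome fitted with the closed-form OLS formulas for a single predictor.

```python
# Hypothetical toy data: a skewed count outcome y observed at x = 0..9.
x = list(range(10))
y = [0, 0, 0, 0, 1, 1, 2, 4, 8, 15]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for simple linear regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values from the straight line; some fall below zero.
fitted = [intercept + slope * xi for xi in x]
negative = [round(f, 2) for f in fitted if f < 0]
print("negative fitted counts:", negative)
```

With these numbers the fitted line dips below zero at the smallest values of x, even though a count can never be negative.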
Interpreting Coefficients
In Poisson regression, μ is typically conceptualized as a rate…
Like logit, Poisson models are non-linear
- coefficients don’t have a simple linear interpretation
Like logit, the model has a log form, so exponentiation aids interpretation:
- exponentiated coefficients are multiplicative: exp(β) is the factor by which the expected rate is multiplied for a one-unit increase in the predictor
- analogous to odds ratios, but called “incidence rate ratios”
- positive coefficients indicate a higher rate; negative = lower rate
ln μ(x_i) = x_i′β
ln μ(x_i + 1) = (x_i + 1)′β
β = ln( μ(x_i + 1) / μ(x_i) )
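The ratio in the last equation can be checked numerically. This is a sketch with made-up coefficients (b0 and b1 are assumptions chosen for illustration) for a one-predictor model ln(μ) = b0 + b1·x.

```python
import math

# Hypothetical coefficients for a one-predictor Poisson model:
# ln(mu) = b0 + b1 * x
b0, b1 = 0.5, 0.3

def expected_rate(x):
    """Expected count mu(x) implied by the log link."""
    return math.exp(b0 + b1 * x)

# Incidence rate ratio: the exponentiated coefficient.
irr = math.exp(b1)

# The ratio mu(x + 1) / mu(x) is the same at every starting value of x.
ratios = [expected_rate(x + 1) / expected_rate(x) for x in (0, 1, 5)]
print(irr, ratios)
```

The ratio does not depend on x, which is what makes the exponentiated coefficient a single multiplicative effect rather than an additive one.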
Assumption of Poisson regression:
The mean and variance are the same (equidispersion). This is often not met in real data, where variance > mean ==> overdispersion.
Consequences: underestimated SEs, overconfidence in results, rejecting H0 when you should not.
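A rough first screen for overdispersion is to compare the sample variance of the outcome with its mean. The sketch below uses a hypothetical count sample; note that it ignores covariates, so it is only an informal check and not a formal test of the model's conditional variance.

```python
from statistics import mean, variance

# Hypothetical count sample with a long right tail.
counts = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 7, 15]

m = mean(counts)
v = variance(counts)  # sample variance (n - 1 denominator)

# Under the Poisson assumption this ratio should be close to 1.
dispersion_ratio = v / m
print("mean:", m, "variance:", v, "ratio:", dispersion_ratio)
```

A ratio well above 1, as here, suggests the variance exceeds the mean.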
Zero-Inflation
If outcome variable has many zero values, it tends to be highly skewed.
But sometimes you have LOTS of zeros, and even negative binomial regression isn’t sufficient.
The model under-predicts zeros and doesn’t fit well.
Examples:
* Number of violent crimes committed by a person in a year
* Number of wars a country fights per year
* Number of foreign subsidiaries of firms
How to address this issue?
Zero-Inflation
Logic of zero-inflated models: assume two types of groups in your sample
* Type A: Always zero; no probability of a non-zero value
* Type ~A: Non-zero chance of a positive count value
  - probability is variable, but not zero
1. Use logit to model group membership (A or ~A)
2. Use Poisson or NegativeBinomial regression to model counts for those in group ~A
3. Compute probabilities based on those results
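Step 3 can be sketched for the zero-inflated Poisson case. The two inputs below are hypothetical stand-ins for the outputs of steps 1 and 2 (the probability of being in the always-zero group and the Poisson mean for the other group); in practice both would come from fitted models.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for an ordinary Poisson with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def zip_pmf(k, p_always_zero, mu):
    """P(Y = k) under a zero-inflated Poisson mixture.

    Zeros come from two sources: group A (always zero) and
    group ~A members who happen to draw a zero count.
    """
    if k == 0:
        return p_always_zero + (1 - p_always_zero) * poisson_pmf(0, mu)
    return (1 - p_always_zero) * poisson_pmf(k, mu)

# Hypothetical values standing in for the results of steps 1 and 2.
p_always_zero = 0.4  # from the logit model (membership in group A)
mu = 2.0             # from the Poisson model (mean count in group ~A)

probs = [zip_pmf(k, p_always_zero, mu) for k in range(50)]
print("P(Y = 0), zero-inflated:", probs[0])
print("P(Y = 0), plain Poisson:", poisson_pmf(0, mu))
```

The zero-inflated probability of a zero is larger than the plain Poisson one, which is how the model corrects the under-prediction of zeros.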
You should be careful about:
- n > 500 = fine; n < 100 can be worrisome
− r….
- model specification / omitted variable bias
- multicollinearity
- outliers
Results aren’t necessarily wrong if n < 100, but they are less reliable
* plus ~10 cases per independent variable