04 Logistic Regression Flashcards

2
Q

What are generalized linear models made up of?

A
  1. A distribution for the outcome variable Y (e.g., Poisson, Binomial, Normal)
  2. A linear predictor η = β₀ + β₁x₁ + … + βₖxₖ
  3. A link function g connecting the conditional mean E(Y|X) = μ to the linear predictor:
    g(μ) = η
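These three components can be sketched for logistic regression; the coefficient values below are illustrative assumptions, not from the cards:

```python
import numpy as np

rng = np.random.default_rng(0)

# The three GLM components, sketched for logistic regression:
# 1. distribution of Y: Binomial (Bernoulli)
# 2. linear predictor: eta = b0 + b1 * x
# 3. link: g(mu) = logit(mu) = eta, so mu = 1 / (1 + exp(-eta))
b0, b1 = -1.0, 2.0            # illustrative coefficients (assumption)
x = rng.normal(size=5)

eta = b0 + b1 * x             # linear predictor (any real number)
mu = 1 / (1 + np.exp(-eta))   # inverse link: E(Y | X) = mu, in (0, 1)
y = rng.binomial(1, mu)       # outcome drawn from the Bernoulli distribution

print(mu)
print(y)
```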
3
Q

Why do we need a link function?

A

Handles Non-Linearity: Many response variables do not have a linear relationship with predictors. The link function allows GLMs to capture these relationships.
Maps Outcomes to an Appropriate Range:

For a binary outcome (e.g., logistic regression), the probability μ must be between 0 and 1. The logit function ensures this constraint.

For count data (e.g., Poisson regression), the expected count μ must be positive. The log function ensures this.

For normal linear regression, the identity link suffices: μ = Xβ
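A quick sketch of the range-mapping point above (function names are my own): the inverse links take any real-valued linear predictor into the appropriate range.

```python
import numpy as np

# Linear predictors (eta) can be any real number
eta = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

# Logit link (logistic regression): inverse link maps eta into (0, 1)
def inv_logit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

# Log link (Poisson regression): inverse link maps eta into (0, inf)
def inv_log(eta):
    return np.exp(eta)

mu_binary = inv_logit(eta)  # probabilities, all strictly between 0 and 1
mu_count = inv_log(eta)     # expected counts, all strictly positive

print(mu_binary)
print(mu_count)
```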

4
Q

Explain issues arising from using count variables for normal linear regressions.

A

It may predict negative values, which are impossible for counts, and count datasets are often highly skewed, violating the normality assumption of OLS.
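A minimal simulation of the negative-prediction problem (the data-generating process here is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed data-generating process: skewed Poisson counts that
# shrink toward zero as x grows
x = rng.uniform(0, 5, size=500)
y = rng.poisson(np.exp(1.5 - 0.8 * x))

# Fit OLS by least squares: y ≈ b0 + b1 * x
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

# The unconstrained line dips below zero at large x, an
# impossible prediction for a count outcome
print(y_hat.min())
```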

5
Q

Interpreting Coefficients
In Poisson regression, μ is typically conceptualized as a rate.
Like logit, Poisson models are non-linear:

  • coefficients don’t have a simple linear interpretation
  • like logit, the model has a log form; exponentiation aids interpretation: exponentiated coefficients are multiplicative → explain the mathematical meaning
  • analogous to odds ratios, but called “incidence rate ratios”
A
  • positive coefficients indicate a higher rate; negative, a lower rate

ln μ(xᵢ) = xᵢ′β
ln μ(xᵢ + 1) = (xᵢ + 1)′β
β = ln( μ(xᵢ + 1) / μ(xᵢ) ), so exp(β) = μ(xᵢ + 1) / μ(xᵢ), the incidence rate ratio
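A quick numeric check of the derivation above (the coefficient values are illustrative assumptions):

```python
import numpy as np

# Illustrative Poisson coefficients: ln(mu) = b0 + b * x
b0, b = 1.0, 0.3

def mu(x):
    # Poisson mean under the log link
    return np.exp(b0 + b * x)

# Incidence rate ratio for a one-unit increase in x
irr = mu(3) / mu(2)

# The ratio equals exp(beta), regardless of the starting value of x
print(irr, np.exp(b))
```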

6
Q

Assumption of Poisson regression:
Consequences:

A

The Poisson model assumes the conditional mean equals the conditional variance (equidispersion). This is often not met in real data, where variance > mean ==> overdispersion.
Consequences: underestimated SEs, overconfidence in results, rejecting H₀ when you should not.
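A simulation sketch of overdispersion (distribution parameters are assumptions): negative-binomial counts with the same mean as a Poisson, but a much larger variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Poisson data: variance ≈ mean (equidispersion)
poisson_y = rng.poisson(3.0, size=100_000)

# Negative-binomial data with the same mean but extra dispersion:
# variance = mu + mu**2 / r for shape parameter r
r, mu_ = 2.0, 3.0
nb_y = rng.negative_binomial(r, r / (r + mu_), size=100_000)

print(poisson_y.mean(), poisson_y.var())  # both near 3
print(nb_y.mean(), nb_y.var())            # mean near 3, variance near 7.5
```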

7
Q

Zero-Inflation
If outcome variable has many zero values, it tends to be highly skewed.
But, sometimes you have LOTS of zeros. Even Negative binomial regression isn’t sufficient.
Model under-predicts zeros, doesn’t fit well.

Examples:
* Number of violent crimes committed by a person in a year
* Number of wars a country fights per year
* Number of foreign subsidiaries of firms

How to address this issue?

A

Zero-Inflation
Logic of zero-inflated models: assume two types of groups in your sample
* Type A: always zero – no probability of a non-zero value
* Type ~A: non-zero chance of a positive count value – the probability is variable, but not zero
1. Use logit to model group membership (A or ~A)
2. Use Poisson or negative binomial regression to model counts for those in group ~A
3. Compute probabilities based on those results
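The zero mixing behind those steps can be sketched for a zero-inflated Poisson; the values of π (probability of the always-zero group) and μ (Poisson mean) below are illustrative assumptions:

```python
from math import exp, factorial

# Zero-inflated Poisson sketch: pi comes from the logit part
# (membership in the always-zero group A), mu from the count part.
pi, mu = 0.4, 2.5  # illustrative values (assumption)

def zip_pmf(k, pi, mu):
    # Poisson probability for group ~A
    poisson = exp(-mu) * mu**k / factorial(k)
    if k == 0:
        # zeros mix structural zeros (group A) with Poisson zeros
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

# More zeros than a plain Poisson with the same mu would predict
print(zip_pmf(0, pi, mu), exp(-mu))
```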

8
Q

What should you be careful about when using these models?

  • sample size: n > 500 is fine; n < 100 can be worrisome
A
  • model specification / omitted variable bias
  • multicollinearity
  • outliers

  • results aren’t necessarily wrong if n < 100, but they are less reliable
  • plus ~10 cases per independent variable
