7A Flashcards
What are the main topics of this lecture?
- Linear probability models
- Logistic regression models
- Effect size in logistic regression
- Non-binary categorical dependent variables
What are the two types of categorical variables?
- Binary categorical variables – Two categories (e.g., Voted = 1, Did not vote = 0).
- Non-binary categorical variables – More than two categories (e.g., religion: Catholic, Protestant, Muslim).
Why should the dependent variable (Y) in linear regression be continuous?
Linear regression assumes Y follows a continuous distribution, making coefficient interpretation meaningful.
What is a linear probability model (LPM)?
A linear regression model where the dependent variable is binary (coded as 0 or 1).
What are the problems with linear probability models?
- The model assumes linearity, which may not hold for binary outcomes.
- It can predict probabilities outside the range of 0 to 1.
When are linear probability models less problematic?
- When the mean of Y is close to 0.5.
- When independent variables are binary or have a limited range.
How is the regression coefficient interpreted in a linear probability model?
As the predicted change in probability that Y = 1 for a one-unit increase in X.
Why is R² difficult to interpret in a linear probability model?
The binary nature of Y means points do not lie close to a regression line, making the R² less meaningful.
What should you do when using a linear probability model?
Compare results with logistic regression to check for consistency.
What is logistic regression?
A statistical model that estimates the probability of a binary outcome (Y = 1 or 0) using a non-linear function.
Why use logistic regression instead of a linear probability model?
- Ensures predictions stay between 0 and 1.
- Handles non-linearity in probability changes.
- Provides odds ratios for easier interpretation.
How does logistic regression estimate coefficients?
It uses maximum likelihood estimation (MLE) instead of OLS regression.
What is maximum likelihood estimation (MLE)?
A process where the model searches for parameters that maximize the likelihood of observing the given data.
What is the Wald test in logistic regression?
A statistical test used to determine if a regression coefficient is significantly different from zero (similar to the t-test in linear regression).
What does the R² value mean in logistic regression?
There are two types of R² values in logistic regression that indicate how much variation in Y is explained by the model.
What is the Chi² test in logistic regression?
A test that evaluates whether all independent variables combined explain any variation in Y.
How do effect sizes differ between linear and logistic regression?
In linear regression, effect size is measured by standardized coefficients. In logistic regression, standardized coefficients do not exist, so we use odds ratios.
What are odds ratios in logistic regression?
They measure how much the odds of Y = 1 change with a one-unit increase in X.
How are odds ratios calculated?
The odds ratio for an independent variable is Exp(B) in SPSS output.
What does an odds ratio of 2 mean?
The odds of Y = 1 double for a one-unit increase in X.
What does an odds ratio of 0.5 mean?
The odds of Y = 1 are halved for a one-unit increase in X.
What does an odds ratio of 1 mean?
There is no effect of X on Y.
How do odds ratios behave for negative effects?
A strong negative effect produces an odds ratio less than 1, which is the inverse of a corresponding positive effect.
How should odds ratios NOT be interpreted?
Odds ratios should not be confused with probability ratios.
What is an example of an odds ratio interpretation?
If the odds ratio for distrust in politicians is 0.564, it means that for a one-unit increase in distrust, the number of Yes-voters for every No-voter is almost cut in half.
How can logistic regression effect sizes be visualized?
- Using odds ratios.
- Making a probability graph to show how Y changes with X.
How should non-binary categorical dependent variables be handled?
They can be analyzed using multinomial logistic regression, but a simpler approach is to create binary dummy variables for each category.
How does binary recoding of non-binary dependent variables work?
Instead of using vote choice as the dependent variable, create separate binary variables for each party: Voted for this party = 1, all others = 0.