Lecture 8-10 Flashcards
Dummy variable:
A binary variable coded as 1 or 0, indicating whether or not an observation belongs to a category.
Dummy variable trap:
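A situation where the dummy variables are perfectly collinear. If a dummy is included for every category together with an intercept, the dummies always sum to 1 (the same as the intercept), because there is no base group. To fix this, leave out one category as the base group.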
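A minimal sketch of the trap, assuming a made-up sample of shirt colors (the data and variable names are illustrative, not from the lecture). With one dummy per category, every row of dummies adds up to 1, which duplicates the intercept column. Dropping one category removes the problem.

```python
# Hypothetical sample: the color of each shirt (illustrative data only).
colors = ["green", "red", "blue", "red", "green"]
categories = sorted(set(colors))  # ['blue', 'green', 'red']

# One dummy per category, with no base group left out.
full_dummies = [[1 if c == cat else 0 for cat in categories] for c in colors]

# Every row sums to 1, exactly matching an intercept column of ones.
row_sums = [sum(row) for row in full_dummies]

# Fix: drop one category (here green, an arbitrary choice) as the base group.
kept = [cat for cat in categories if cat != "green"]
safe_dummies = [[1 if c == cat else 0 for cat in kept] for c in colors]

# Rows no longer sum to a constant, so the dummies no longer copy the intercept.
safe_row_sums = [sum(row) for row in safe_dummies]
```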
Base group:
- The reference category against which other groups are compared in regression.
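- Example: when analyzing the effect of shirt color on the price of a shirt, green can be the base group. The coefficients on the other color dummies then measure the price difference relative to green shirts.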
Interaction term:
The product of two variables (for example, two dummies), included in the regression to study how one variable’s effect depends on the other.
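A short sketch of how the interaction changes an effect, assuming hypothetical coefficients and two generic dummies d1 and d2 (none of these numbers come from the lecture). The effect of d1 is b1 when d2 = 0 and b1 + b3 when d2 = 1.

```python
# Hypothetical coefficients for y = b0 + b1*d1 + b2*d2 + b3*(d1*d2).
b0, b1, b2, b3 = 10.0, 2.0, 3.0, -1.5

def predict(d1, d2):
    """Predicted y for dummy values d1 and d2, including their interaction."""
    return b0 + b1 * d1 + b2 * d2 + b3 * (d1 * d2)

# Effect of switching d1 from 0 to 1, holding d2 fixed.
effect_d1_when_d2_is_0 = predict(1, 0) - predict(0, 0)  # equals b1
effect_d1_when_d2_is_1 = predict(1, 1) - predict(0, 1)  # equals b1 + b3
```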
LPM (linear probability model) limitations:
o Can produce predicted probabilities outside the range of 0 to 1.
- (Such values have no practical or logical meaning, as probabilities can only take values between 0 (impossible) and 1 (certain).)
o Assumes a straight-line relationship between the explanatory variables and the probability.
- In reality, probabilities often follow an S-shaped curve, which a straight line fails to capture.
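A rough illustration of both limitations, assuming hypothetical coefficients and an arbitrary S-shaped curve (a logistic function centered at x = 3). Neither is estimated from data. The straight line leaves the 0 to 1 range at extreme x values, while the S-shaped curve never does.

```python
import math

# Hypothetical LPM coefficients for P(y = 1) = b0 + b1*x (not real estimates).
b0, b1 = 0.2, 0.1

def lpm_probability(x):
    """Linear probability model: the fitted value is a straight line in x."""
    return b0 + b1 * x

def s_curve_probability(x):
    """An assumed S-shaped alternative: logistic function centered at x = 3."""
    return 1.0 / (1.0 + math.exp(-(x - 3.0)))

xs = [-5.0, 0.0, 3.0, 10.0]
lpm_values = [lpm_probability(x) for x in xs]    # first is below 0, last is above 1
s_values = [s_curve_probability(x) for x in xs]  # all strictly between 0 and 1
```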
Binary response variable:
- A Binary Response Variable is a dependent variable in a regression or statistical model that can take on only two possible outcomes, typically coded as 1 and 0.
Logit model
Used to predict probabilities for a binary response variable (the dependent variable). It passes the linear combination of the explanatory variables through the logistic function, which keeps the predicted probabilities within the valid range of 0 to 1.
Probit model
The probit model is a regression model used to predict probabilities for a binary response variable. It models the probability with the standard normal cumulative distribution function, which is equivalent to assuming that the error term in the underlying latent variable model follows a standard normal distribution (bell-shaped curve). It is suitable when normality is a reasonable assumption for that error term.
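A small sketch of how a probit probability is computed, assuming hypothetical coefficients b0 and b1 and a single regressor x. The standard normal CDF is written with math.erf from the Python standard library.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(x, b0, b1):
    """P(y = 1 | x) in a probit model with index b0 + b1*x."""
    return normal_cdf(b0 + b1 * x)

# Hypothetical coefficients (not estimates from any data set).
b0, b1 = -1.0, 0.5
probs = [probit_probability(x, b0, b1) for x in range(-10, 10)]
p_at_zero_index = probit_probability(2.0, b0, b1)  # index is 0, so probability is 0.5
```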
Latent variable approach
- A latent variable is a hidden or unobservable variable that influences observed data. It is not directly measurable but is inferred through mathematical models.
- The latent variable approach assumes that an unobserved, continuous variable determines the observed binary outcome: the outcome is 1 when the latent variable exceeds a threshold (usually 0) and 0 otherwise.
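A simulation sketch of the idea, assuming a hypothetical latent equation y_star = b0 + b1*x + e with standard normal errors (the case that leads to the probit model). Only the 0/1 outcome would appear in real data; y_star is kept here just to show the link.

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

# Hypothetical latent model: y_star = b0 + b1*x + e, with e ~ standard normal.
b0, b1 = -0.5, 1.0

def simulate_outcome(x):
    """Draw the latent y_star and return (y_star, y), where y = 1 if y_star > 0."""
    y_star = b0 + b1 * x + random.gauss(0.0, 1.0)
    return y_star, int(y_star > 0)

draws = [simulate_outcome(x) for x in [0.0, 0.5, 1.0, 1.5, 2.0] * 200]
observed = [y for _, y in draws]  # only these 0/1 values would be seen in data
```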
Maximum likelihood estimation (MLE):
Chooses the parameter values that maximize the likelihood of the observed data. It’s like finding the most probable explanation for the data.
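A toy sketch of MLE, assuming a made-up sample of ten binary outcomes and a single unknown success probability p. A crude grid search stands in for the numerical optimizers that statistical software uses. In this simple case the maximizer coincides with the sample mean.

```python
import math

# Hypothetical binary outcomes (illustrative data only): 7 ones out of 10.
y = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(p):
    """Bernoulli log-likelihood of the sample y at success probability p."""
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi in y)

# Crude grid search over candidate values of p strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)  # the candidate with the highest log-likelihood
```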
Goodness of fit measures:
- Goodness-of-fit measures evaluate how well a model fits the data. For binary response models, common metrics are the percentage of correct predictions and the pseudo R-squared.
Average partial effects (APE):
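The average change in the predicted probability of an event when an independent variable increases by one unit. The partial effect is computed for each observation and then averaged across all observations. It simplifies the interpretation of nonlinear models like logit and probit, where the effect of a variable differs from one observation to the next.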
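A sketch of the calculation for a logit model, assuming hypothetical coefficients and regressor values. The partial effect used here is the derivative of the probability with respect to x, which approximates the effect of a one-unit change. For a logit it equals b1 * p * (1 - p), so it can never exceed b1 / 4.

```python
import math

# Hypothetical logit coefficients and regressor values (illustrative only).
b0, b1 = -1.0, 0.8
x = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]

def logistic(z):
    """Logistic function, mapping any index to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def partial_effect(xi):
    """Derivative of the logit probability with respect to x, evaluated at xi."""
    p = logistic(b0 + b1 * xi)
    return b1 * p * (1.0 - p)

# Average the observation-level partial effects to get the APE.
ape = sum(partial_effect(xi) for xi in x) / len(x)
```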
Model comparisons: Logit vs probit:
- Logit and probit models work similarly, but logit is easier to interpret in terms of odds, while probit assumes normally distributed errors and typically yields smaller coefficients (logit coefficients are roughly 1.6 times the probit ones). Both produce similar predicted probabilities.
Simple Example:
If you predict whether someone buys a product:
- In a logit model, an income coefficient of 0.8 means income has a positive effect on the probability of buying.
- In a probit model, the same effect might be shown as 0.5, but it represents the same relationship, just on a different scale.
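A quick numerical check of the different-scale point, taking the two income coefficients from the example as 0.8 (logit) and 0.5 (probit). The zero intercept and the grid of income values (treated as deviations from the mean) are assumptions made for illustration.

```python
import math

def logistic(z):
    """Logistic CDF, used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical income values from -5 to 5 in steps of 0.1.
incomes = [i / 10 for i in range(-50, 51)]

logit_probs = [logistic(0.8 * inc) for inc in incomes]
probit_probs = [normal_cdf(0.5 * inc) for inc in incomes]

# Largest disagreement between the two sets of predicted probabilities.
max_gap = max(abs(a - b) for a, b in zip(logit_probs, probit_probs))
```

Despite the different coefficients, the two models give nearly the same probability at every income value on this grid.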
Covariance and correlation:
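Covariance: indicates the direction of the linear relationship between two variables. Its size depends on the units of measurement, so it does not show strength on its own.
Correlation: measures both the strength and direction of the linear relationship between two variables, on a unit-free scale from -1 to 1.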
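A small sketch of the difference, assuming made-up paired data. Rescaling x (for example, changing its units) multiplies the covariance by the same factor but leaves the correlation unchanged.

```python
import math

# Hypothetical paired observations (illustrative data only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def sample_covariance(a, b):
    """Sample covariance with the usual n - 1 denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def correlation(a, b):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(a, b) / math.sqrt(
        sample_covariance(a, a) * sample_covariance(b, b))

cov_xy = sample_covariance(x, y)
corr_xy = correlation(x, y)

# Change the units of x: covariance scales by 100, correlation does not move.
x_scaled = [100.0 * xi for xi in x]
cov_scaled = sample_covariance(x_scaled, y)
corr_scaled = correlation(x_scaled, y)
```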
Log-likelihood function:
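The logarithm of the likelihood: it measures how likely the observed data are under a specific statistical model, as a function of the model’s parameters. MLE picks the parameter values that maximize it.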
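A sketch of a log-likelihood function for a logit model, assuming made-up data and two hypothetical parameter choices. The parameter values that make the observed outcomes more likely give a higher (less negative) log-likelihood.

```python
import math

# Hypothetical data: regressor x and binary outcome y (illustrative only).
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 0, 1, 1]

def logit_log_likelihood(b0, b1):
    """Sum over observations of y*log(p) + (1 - y)*log(1 - p) for a logit model."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

ll_flat = logit_log_likelihood(0.0, 0.0)   # ignores x: every probability is 0.5
ll_slope = logit_log_likelihood(0.0, 1.0)  # lets the probability rise with x
```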
Homoscedasticity
- Homoscedasticity refers to a situation in regression analysis where the variance of the residuals (errors) is constant across all levels of the independent variable(s). It’s an important assumption in linear regression.