4A Flashcards
Why are assumptions necessary in regression analysis?
Assumptions ensure that OLS produces unbiased coefficient estimates and valid standard errors, so that p-values and confidence intervals can be trusted.
What happens when an assumption is violated?
Regression coefficients and/or standard errors may become biased, leading to incorrect conclusions, such as false significance.
What are the four key assumptions in OLS regression?
- Homoscedasticity (constant variance of residuals).
- Independent observations (no clustering in data).
- No large outliers (extreme values can distort results).
- Normally distributed residuals (only relevant for small samples).
What is homoscedasticity in OLS regression?
The variance of the residuals (errors) should be constant across all levels of X.
What is heteroscedasticity, and why is it a problem?
Heteroscedasticity occurs when residual variance depends on X. This leads to biased standard errors, making p-values unreliable.
How do you check for heteroscedasticity?
Inspect scatterplots of residuals vs. predicted values or use White’s test in SPSS.
How can heteroscedasticity be corrected?
Use heteroscedasticity-robust standard errors, which correct standard errors without changing coefficients.
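As an illustration of this card (not from the course materials), here is a minimal numpy sketch on simulated heteroscedastic data. It computes White's (HC0) robust standard errors with the sandwich formula and shows that the coefficients are unchanged while the standard errors differ; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
# Heteroscedastic errors: the error variance grows with x
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 + 0.3 * x, n)

X = np.column_stack([np.ones(n), x])        # design matrix with intercept
beta = np.linalg.solve(X.T @ X, X.T @ y)    # OLS coefficients
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)
# Classical standard errors (assume constant residual variance)
sigma2 = resid @ resid / (n - X.shape[1])
se_classic = np.sqrt(np.diag(sigma2 * XtX_inv))
# White's robust (HC0) "sandwich": (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("coefficients:", beta)                # identical under both methods
print("classical SE:", se_classic)
print("robust SE:   ", se_robust)
```

Note that only the standard errors change; the fitted line itself is the same, which is exactly what the card states.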
What does the assumption of independent observations require?
Each observation must provide new information, meaning data points should not be correlated or clustered.
What happens if observations are not independent?
Standard errors become too small, making p-values unreliable and increasing the risk of Type I errors (false positives).
How do you check if observations are independent?
Ensure there is one observation per unit, such as one person or one country, and check whether the data were collected in clusters, for example respondents drawn from several different countries.
What are the two solutions for non-independent observations?
- Use cluster-robust standard errors, for example, by country.
- Add control variables for the clustering factor, such as country fixed effects.
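The first solution above can be sketched in numpy (simulated data; the "country" grouping and all names are illustrative, not from the course). Cluster-robust standard errors sum the residuals within each cluster before forming the sandwich, so correlated observations within a country are not treated as independent information.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters, per = 20, 25
cluster = np.repeat(np.arange(n_clusters), per)   # hypothetical country id
u = rng.normal(0, 1, n_clusters)                  # shared country-level shock
x = rng.normal(0, 1, n_clusters * per)
y = 1.0 + 0.5 * x + u[cluster] + rng.normal(0, 1, n_clusters * per)

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)
# Cluster-robust sandwich: aggregate the score X_g'e_g within each cluster
meat = np.zeros((2, 2))
for g in range(n_clusters):
    m = cluster == g
    s = X[m].T @ resid[m]
    meat += np.outer(s, s)
se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("coefficients:", beta)
print("cluster-robust SE:", se_cluster)
```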
What is an outlier in regression?
An extreme observation that deviates significantly from other data points, such as a z-score above +3 or below -3.
Why can outliers be problematic?
Outliers pull the fitted line toward themselves, distorting coefficients and standard errors and making results unreliable; a single extreme observation can dominate the rest of the data.
How can you check for outliers?
Compute z-scores for all variables and look for values greater than +3 or less than -3.
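The z-score check on this card can be written in a few lines of numpy. This sketch uses simulated data with one injected extreme value (the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(50, 5, 200)   # 200 ordinary observations
data[10] = 120.0                # inject one extreme observation

# Standardize: z = (value - mean) / standard deviation
z = (data - data.mean()) / data.std()
flagged = np.where(np.abs(z) > 3)[0]
print("indices with |z| > 3:", flagged)
```

Note that the outlier itself inflates the mean and standard deviation, so in very small samples an extreme value can mask itself and stay below the |z| > 3 threshold.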
What are the two approaches for dealing with outliers?
- Run regression with and without outliers to see if results change.
- If results differ significantly, justify whether to include or exclude them.
What does the assumption of normal residuals mean?
Residuals (errors) should follow a normal distribution, but only in small samples (n < 30).
Why is normality of residuals important only in small samples?
The Central Limit Theorem ensures that with large samples, standard errors are reliable even if residuals are not normal.
How can you test if residuals are normally distributed?
Use a histogram to check whether the residuals follow a bell curve, or run a Shapiro-Wilk or Kolmogorov-Smirnov test. If p < 0.05, the hypothesis that the residuals are normally distributed is rejected.
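As a sketch of the Shapiro-Wilk check (using scipy rather than SPSS; the simulated residuals are illustrative), compare normally distributed residuals with clearly skewed ones:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
resid_normal = rng.normal(0, 1, 100)      # residuals from a well-behaved model
resid_skewed = rng.exponential(1, 100)    # clearly non-normal residuals

# Shapiro-Wilk: small p-value = reject normality
stat_n, p_normal = stats.shapiro(resid_normal)
stat_s, p_skewed = stats.shapiro(resid_skewed)
print(f"normal residuals: p = {p_normal:.4f}")
print(f"skewed residuals: p = {p_skewed:.4g}")
```

For the skewed residuals the test returns p < 0.05, so normality is rejected, matching the decision rule on this card.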
What are the two ways to fix non-normal residuals?
- Increase sample size to more than 30.
- Use alternative estimation methods, which are not covered in this course.