Week 6 Assumptions Part III and Multicollinearity Flashcards
Homoskedasticity
An assumption of equal or similar variances in different groups being compared
This is an important assumption of parametric statistical tests because they are sensitive to unequal variances: uneven variances across samples produce biased and skewed test results.
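A minimal sketch of checking equal variances across groups with Levene's test in scipy; the two groups and their parameters are simulated and purely illustrative:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(0)
group_a = rng.normal(loc=5, scale=1, size=50)
group_b = rng.normal(loc=5, scale=3, size=50)  # larger spread: violates homoskedasticity

# Levene's test: H0 is equal variances across groups;
# a small p-value suggests the homoskedasticity assumption fails
stat, p = levene(group_a, group_b)
print(f"Levene statistic = {stat:.2f}, p-value = {p:.4f}")
```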
Heteroskedasticity
definition and consequences
Occurs when the standard deviation of a predicted variable, observed over different values of an independent variable or over successive time periods, is non-constant.
Consequences:
- Inefficiency: OLS is no longer the minimum-variance estimator
- Standard errors are biased, resulting in unreliable confidence intervals and hypothesis tests
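One common diagnostic is the Breusch-Pagan test. A minimal sketch with statsmodels on simulated data whose error spread grows with the regressor (all variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, x)  # error sd proportional to x: heteroskedastic

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Breusch-Pagan regresses the squared OLS residuals on X;
# a small p-value rejects the null of homoskedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")
```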
Causes of heteroskedasticity
(these are necessary but not sufficient conditions)
- Measurement accuracy improves (or error variance grows) with the values of the independent variables
- Presence of outliers (an outlier is an observation that lies an abnormal distance from other values in a random sample from a population)
- Specification error (especially omission of a relevant variable and nonadditivity)
Omission
Samples with invalid data are discarded from further analysis
Omitted-variable bias (OVB)
Occurs when a statistical model leaves out one or more relevant variables, biasing the estimated coefficients of the included variables
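A small simulation can make the bias concrete. In the sketch below (simulated data, illustrative coefficients), omitting x2 shifts the coefficient on x1 by beta2 times the slope from regressing x2 on x1:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # x2 is correlated with x1
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
short = sm.OLS(y, sm.add_constant(x1)).fit()  # x2 omitted

# OVB formula: bias on x1 = beta2 * delta, where delta is the slope
# from regressing x2 on x1 (here 3 * 0.5 = 1.5)
print("x1 coefficient, full model: ", round(full.params[1], 2))   # ~2.0
print("x1 coefficient, x2 omitted:", round(short.params[1], 2))   # ~3.5
```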
Weighted Least Squares (WLS)
Weighted least squares is a version of least squares where you give importance (or weight) to some data points over others. In WLS, points with higher weights are considered more reliable, so they have more influence on finding the best-fitting line or model.
In the presence of heteroskedasticity, WLS with weights inversely proportional to the error variances is BLUE
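A minimal WLS sketch with statsmodels, assuming the error variance is known to be proportional to x squared (a stylized assumption; in practice the variance structure must be modeled or estimated):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, x)  # error sd proportional to x

X = sm.add_constant(x)
# Weights should be inversely proportional to the error variance:
# here Var(e_i) is proportional to x_i**2, so weight_i = 1 / x_i**2
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
ols = sm.OLS(y, X).fit()

# Under heteroskedasticity, WLS is more efficient than OLS
print("OLS standard errors:", ols.bse)
print("WLS standard errors:", wls.bse)
```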
Autocorrelation
Measures the relationship between a variable's current value and its past values.
An autocorrelation of +1 represents a perfect positive correlation, while an autocorrelation of -1 represents a perfect negative correlation.
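A minimal sketch computing a lag-1 autocorrelation with pandas, on a simulated AR(1) series (the 0.8 coefficient is illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
y = np.zeros(n)
for t in range(1, n):
    # AR(1) process: each value depends on the previous one plus noise
    y[t] = 0.8 * y[t - 1] + rng.normal()

s = pd.Series(y)
# Lag-1 autocorrelation: correlation of the series with itself shifted one period
print(f"Lag-1 autocorrelation: {s.autocorr(lag=1):.3f}")  # close to 0.8
```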
Cross-sectional
Observations are recorded for different individuals/objects at one time
Time series
Observations are recorded for the same individual/object across time
Time series, cross sectional (or repeated cross-sections)
Observations are recorded across time but for different individuals each time
Panel
Observations are recorded for the same individuals across time
Serial autocorrelation
The correlation between neighbouring observations ordered in time
Spatial autocorrelation
The correlation between neighbouring observations ordered in space
Example: crime rates in neighbouring city wards
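A minimal sketch of Moran's I, a standard measure of spatial autocorrelation, for the crime-rate example; the ward layout, weight matrix, and numbers are all illustrative:

```python
import numpy as np

# Crime rates for six wards lying along a line; high-crime wards
# cluster at one end, low-crime wards at the other (illustrative numbers)
x = np.array([10.0, 12.0, 11.0, 3.0, 4.0, 2.0])

# Binary contiguity weights: ward i neighbours wards i-1 and i+1
n = len(x)
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

z = x - x.mean()
# Moran's I is positive when neighbouring wards have similar values
morans_i = (n / W.sum()) * (z @ W @ z) / (z @ z)
print(f"Moran's I: {morans_i:.3f}")  # ~0.55 here: positive spatial autocorrelation
```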
Multicollinearity
Occurs when two or more predictor variables in a regression model are highly correlated, making it hard to determine their individual effects on the outcome. This can lead to unstable coefficient estimates and reduces the reliability of the model
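A common diagnostic is the variance inflation factor (VIF). A minimal sketch with statsmodels on simulated predictors, where x2 is built to be nearly collinear with x1 (all names illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
# VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing predictor j
# on the other predictors; values above ~10 are a common warning sign
for i, name in enumerate(X.columns):
    if name != "const":
        print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.1f}")
```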
Perfect multicollinearity
When one independent variable is an exact linear combination of one or more other independent variables. This makes it impossible for the model to estimate unique coefficients for those variables, as they provide the same information
- The auxiliary regression of the collinear variable on the other predictors has an R² of 1, and the model is indeterminate (it cannot be estimated!)
- Imperfect (high but not exact) multicollinearity does not violate Gauss-Markov; estimates are still BLUE, but coefficient variances are inflated
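A minimal sketch showing why estimation fails: when one column of the design matrix is an exact linear combination of others, the matrix is rank-deficient, so X'X is singular and the OLS normal equations have no unique solution (simulated data, illustrative coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 + 3 * x2  # exact linear combination: perfect multicollinearity

X = np.column_stack([np.ones(n), x1, x2, x3])
# Fewer independent columns than total columns: X'X cannot be inverted
print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))  # rank 3 < 4
```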