Violations of CLRM Flashcards
What is Autocorrelation?
Autocorrelation, also known as serial correlation, occurs when the residuals (error terms) in a regression model are correlated across different time periods. This means that the error term of one observation is influenced by the error term of a previous observation. Autocorrelation is primarily an issue in time-series data, where observations are recorded sequentially over time.
Mathematically, autocorrelation is present if:
Cov(u_t, u_{t-1}) ≠ 0
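A minimal Python sketch of this condition on simulated AR(1) errors (the persistence ρ = 0.7 and the sample size are arbitrary choices for illustration, not from the cards):

```python
# Simulate AR(1) errors u_t = rho * u_{t-1} + e_t and check Cov(u_t, u_{t-1}) != 0.
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.7, 500                     # assumed persistence and sample size
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + e[t]

# Sample covariance and correlation between u_t and u_{t-1}
cov_lag1 = np.cov(u[1:], u[:-1])[0, 1]
corr_lag1 = np.corrcoef(u[1:], u[:-1])[0, 1]
print(f"Cov ≈ {cov_lag1:.3f}, Corr ≈ {corr_lag1:.3f}")   # clearly non-zero
```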
Forms of Autocorrelation
Positive Autocorrelation (ρ>0): Errors from one period tend to be followed by errors in the same direction. This is common in economic time series (e.g., GDP, inflation).
Negative Autocorrelation (ρ<0): Errors from one period tend to be followed by errors in the opposite direction, producing an alternating pattern (for example, from overcorrection or over-differenced data).
Causes of Autocorrelation
OMIID
- Omitted Variables: If important explanatory variables are missing, their effect may spill over into the error term, creating correlation.
- Misspecification of the Model: Using an incorrect functional form or excluding lagged variables can introduce autocorrelation.
- Inertia or Persistence in Data: Economic and financial data often exhibit trends or cycles, leading to serial correlation in errors.
- Incorrect Measurement of Variables: Errors in data collection can introduce patterns in residuals.
- Data Manipulation Issues: When data is interpolated or smoothed, it can introduce artificial correlation.
Consequences of Autocorrelation
I^3O/U
1.Inefficient OLS Estimates: OLS estimators remain unbiased but are no longer the Best Linear Unbiased Estimators (BLUE) because they no longer have minimum variance; with positive autocorrelation, the usual variance formulas also tend to understate the true variance.
2.Inconsistent Standard Errors: This leads to misleading hypothesis tests and confidence intervals, making the t-tests and F-tests unreliable.
3.Inflated R-Squared: The model may appear to fit the data well, even when it does not.
4.Over- or Underestimation of Coefficients: Serial correlation in residuals can distort coefficient estimates, affecting policy implications.
What are the types of autocorrelation?
- Positive Autocorrelation – Errors follow the same sign, creating patterns.
- Negative Autocorrelation – Errors alternate in sign, causing frequent fluctuations.
How is autocorrelation mathematically expressed?
Corr(u_t, u_{t-k}) ≠ 0 for k ≠ 0, where u_t is the error term at time t.
What are the tests for autocorrelation?
- Durbin-Watson Test – Detects first-order autocorrelation; d ≈ 2 means no autocorrelation.
- Breusch-Godfrey Test – General test for higher-order autocorrelation.
- Graphical Methods – Residual plots and the Autocorrelation Function (ACF).
- Runs Test – Non-parametric test checking for randomness in residuals.
What does the Durbin-Watson test measure?
First-order autocorrelation in regression residuals.
How is the Durbin-Watson statistic interpreted?
d ≈ 2 → No autocorrelation. d well below 2 (towards 0) → Positive autocorrelation. d well above 2 (towards 4) → Negative autocorrelation.
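A hedged sketch of computing d with statsmodels' durbin_watson on OLS residuals; the series is simulated so that positive autocorrelation is present:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = rng.normal(size=200)
u = np.zeros(200)
for t in range(1, 200):                    # AR(1) errors -> positive autocorrelation
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
d = durbin_watson(res.resid)
print(f"Durbin-Watson d = {d:.2f}")        # d well below 2 suggests positive autocorrelation
```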
What does the Breusch-Godfrey test detect?
Higher-order autocorrelation using an auxiliary regression approach.
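A short sketch, assuming statsmodels, of running the Breusch-Godfrey test up to four lags (the lag order and the simulated data are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
x = rng.normal(size=200)
u = np.zeros(200)
for t in range(1, 200):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 0.5 + 1.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(f"LM p-value = {lm_pval:.4f}")       # small p-value -> reject "no autocorrelation"
```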
What graphical methods detect autocorrelation?
Residual plots and the Autocorrelation Function (ACF).
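An illustrative sketch (matplotlib and statsmodels assumed) of both checks, a residual-over-time plot and the ACF of the residuals:

```python
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(3)
x = rng.normal(size=200)
u = np.zeros(200)
for t in range(1, 200):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + x + u

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
axes[0].plot(resid)                        # long runs above/below zero suggest serial correlation
axes[0].set_title("Residuals over time")
plot_acf(resid, lags=20, ax=axes[1])       # spikes outside the bands at low lags
plt.show()
```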
What does the Runs Test check?
It tests residual randomness to detect autocorrelation.
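A tentative sketch using the runs test that ships in statsmodels' sandbox module; availability and exact output may vary by version, so treat this as illustrative only:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.sandbox.stats.runs import runstest_1samp   # sandbox module; API may change

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = 1.0 + x + rng.normal(size=200)        # i.i.d. errors: residuals should look random

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
z_stat, p_value = runstest_1samp(resid)    # counts runs of residuals above/below the mean
print(f"Runs test p-value = {p_value:.4f}")  # large p -> no evidence against randomness
```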
What are the remedial measures for autocorrelation?
GLADLA
1.Generalized Least Squares (GLS): Transforms the data to correct for autocorrelation by estimating the serial correlation and removing it before applying OLS.
2.Logarithmic transformation: This can help if the relationship between variables is non-linear.
3.Adding missing variables: Including relevant variables not initially considered can capture hidden dependencies and reduce autocorrelation.
4.Differencing: Taking first differences (subtracting each observation's previous value from its current value) removes constant trends and can alleviate autocorrelation.
5.Including lagged variables: Adding past values of the dependent variable as predictors can account for temporal trends and autocorrelation.
6.Using ARIMA models: These models specifically address autoregressive and integrated moving average processes, making them suitable for time series data with autocorrelation.
How do GLS and Cochrane-Orcutt correct autocorrelation?
They estimate and remove serial correlation before applying OLS.
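A sketch of this idea using statsmodels' GLSAR, which performs an iterative, Cochrane-Orcutt-style feasible GLS fit for AR(1) errors (the data are simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=300)
u = np.zeros(300)
for t in range(1, 300):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 2.0 + 1.5 * x + u

X = sm.add_constant(x)
model = sm.GLSAR(y, X, rho=1)              # assume an AR(1) error structure
res = model.iterative_fit(maxiter=10)      # alternate: estimate rho, re-fit transformed model
print("Estimated rho:", model.rho)
print(res.params)                          # coefficients after the AR(1) transformation
```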
How do lagged variables or differencing help with autocorrelation?
They remove persistence in data trends.
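A small pandas/statsmodels sketch of both fixes, first-differencing the series and adding a lagged dependent variable (the column names and data-generating process are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 1.0 + 0.8 * df["x"] + np.cumsum(rng.normal(size=200))   # persistent (trending) errors

df["dy"] = df["y"].diff()          # first difference removes a constant trend
df["dx"] = df["x"].diff()
df["y_lag1"] = df["y"].shift(1)    # lagged dependent variable as an extra regressor

diff_model = sm.OLS(df["dy"].dropna(), sm.add_constant(df["dx"].dropna())).fit()
lag_model = sm.OLS(df["y"].iloc[1:],
                   sm.add_constant(df[["x", "y_lag1"]].iloc[1:])).fit()
print(diff_model.params, lag_model.params)
```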
How does an autoregressive (AR) model address autocorrelation?
It explicitly models serial correlation in errors.
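One way to do this, sketched with statsmodels' ARIMA class, is a regression with AR(1) errors, order=(1, 0, 0); the data are simulated:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
x = rng.normal(size=300)
u = np.zeros(300)
for t in range(1, 300):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

model = ARIMA(y, exog=x, order=(1, 0, 0))  # AR(1) errors, no differencing or MA part
res = model.fit()
print(res.params)                          # AR(1) coefficient alongside the regression slope
```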
What is Heteroscedasticity in Econometrics?
Heteroscedasticity occurs when the variance of the error term in a regression model is not constant across observations.
What is the Nature of Heteroscedasticity?
In a homoscedastic model, error variance remains constant. In a heteroscedastic model, error variance changes with an independent variable or over time.
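A tiny numpy sketch contrasting the two cases, with the heteroscedastic errors' spread growing in an explanatory variable x (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
x = np.linspace(1, 10, 200)
u_homo = rng.normal(scale=1.0, size=200)         # homoscedastic: Var(u) constant
u_hetero = rng.normal(scale=0.5 * x, size=200)   # heteroscedastic: Var(u) increases with x

# Variance in the low-x half vs. the high-x half differs sharply for the second series
print(np.var(u_hetero[x < 5]), np.var(u_hetero[x >= 5]))
```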
What are the Causes of Heteroscedasticity?
Pizza Often Makes Vacation Taste Cooler
1.Presence of Outliers: Extreme values in the dataset can distort the residual variance.
2.Omitted Variable Bias: When an important explanatory variable is left out, its effect may manifest in the residuals, causing non-constant variance.
3.Misspecification of the Regression Model: If the true relationship between the dependent and independent variables is non-linear but a linear model is used, the variance of the residuals may change with the level of the explanatory variable.
4.Varied Scale of Measurement: When data involve variables measured in different units (e.g., incomes of both large corporations and small businesses), the variance of errors may differ significantly.
5.Time-Series Effects: In economic data, variance may change over time due to economic cycles, inflation, or structural breaks.
6.Cross-Sectional Heterogeneity: In datasets containing individuals, firms, or countries with significantly different characteristics, heteroscedasticity can emerge due to underlying differences in behavior.
What are the Consequences of Heteroscedasticity?
IBUI
1.Inefficiency of OLS Estimators:
The OLS estimators remain unbiased but are no longer the Best Linear Unbiased Estimators (BLUE) because they do not have minimum variance (the Gauss-Markov conditions no longer hold).
2.Biased standard errors:
Heteroscedasticity biases the usual estimates of the variances of the regression coefficients, which in turn distorts the standard errors, test statistics, and confidence intervals.
3.Unreliable hypothesis tests:
Because of biased standard errors, hypothesis tests may be unreliable. For example, t-statistics may appear to be more significant than they actually are.
4.Incorrect inferences:
Incorrect inferences from data can lead to flawed business or research decisions.
What are the Tests for Heteroscedasticity?
- Graphical Methods (Informal Tests)
i)Residual Plot: A scatterplot of the residuals (û) against the fitted values (Ŷ). A systematic pattern (e.g., a cone shape) suggests heteroscedasticity.
ii)Residuals vs. Independent Variables: If residual variance changes as an explanatory variable increases, heteroscedasticity may be present.
- Formal Statistical Tests (see the Python sketch after this list)
i)Breusch-Pagan Test
ii)White Test
iii)Goldfeld-Quandt Test
iv) Park test
v) Glejser test
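A hedged sketch of the Breusch-Pagan and White tests using statsmodels' diagnostic functions, on simulated data whose error variance rises with x:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(9)
x = rng.uniform(1, 10, size=300)
y = 2.0 + 1.0 * x + rng.normal(scale=0.5 * x)    # error spread grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

bp_lm, bp_pval, _, _ = het_breuschpagan(res.resid, X)
w_lm, w_pval, _, _ = het_white(res.resid, X)
print(f"Breusch-Pagan p = {bp_pval:.4f}, White p = {w_pval:.4f}")  # small p -> heteroscedasticity
```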
What are the Remedial Measures for Heteroscedasticity?
Rob Goes Without Tea More
1.Robust Standard Errors
Use heteroscedasticity-robust standard errors (e.g., White’s standard errors) to obtain correct statistical inference without modifying the model (illustrated, together with WLS, in the sketch after this list).
2.Generalized Least Squares (GLS) and Feasible GLS (FGLS)
GLS transforms the model to stabilize variance by weighting observations appropriately.
FGLS is used when the precise form of heteroscedasticity is unknown but can be estimated.
3.Weighted Least Squares (WLS)
Assigns weights to observations inversely proportional to their estimated variance, giving more weight to observations with smaller variance.
4.Transforming Variables
Log Transformation: Applying a log transformation to the dependent variable (Y) often reduces heteroscedasticity, especially in models involving income or expenditure.
Square Root Transformation: Helps stabilize variance in count data models.
5.Model Specification Corrections
Adding omitted variables that might be causing non-constant variance.
Using interaction terms if the variance pattern suggests a non-linear relationship.
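A combined sketch of remedies 1 and 3, robust (White-type) standard errors via statsmodels' cov_type option and weighted least squares with weights proportional to the inverse error variance; the variance function (error spread proportional to x) is assumed known here purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(1, 10, size=300)
sigma = 0.5 * x                                   # error standard deviation grows with x
y = 1.0 + 2.0 * x + rng.normal(scale=sigma)
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                          # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC3")         # White-type robust standard errors
wls = sm.WLS(y, X, weights=1.0 / sigma**2).fit()  # weights inversely proportional to Var(u_i)

print("OLS se:   ", ols.bse)
print("Robust se:", robust.bse)                   # same coefficients, corrected inference
print("WLS est.: ", wls.params)                   # efficient when the weights are right
```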
Nature of Multicollinearity
Multicollinearity refers to a situation in which two or more independent variables in a regression model are highly correlated, meaning they provide redundant information. This violates the assumption of no perfect multicollinearity in the classical linear regression model (CLRM), leading to unreliable estimates of the regression coefficients.
Mathematically, multicollinearity occurs when:
X_j = α_1 X_1 + α_2 X_2 + … + α_k X_k + u (near or imperfect multicollinearity)
X_j = α_1 X_1 + α_2 X_2 + … + α_k X_k (perfect multicollinearity)
where one independent variable (X_j) can be expressed as a linear combination of the other independent variables (the sum runs over the regressors other than X_j).
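A small numpy sketch of perfect multicollinearity: one regressor is built as an exact linear combination of the others, so the design matrix loses rank and X'X becomes (near-)singular (the coefficients 2 and 0.5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(13)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 2.0 * x1 + 0.5 * x2                  # perfect multicollinearity: x3 = 2*x1 + 0.5*x2

X = np.column_stack([np.ones(100), x1, x2, x3])
print(np.linalg.matrix_rank(X))           # 3, not 4: one column is redundant
print(np.linalg.cond(X.T @ X))            # enormous condition number: X'X is not invertible in practice
```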
Causes of Multicollinearity
HIPDD
1.Inclusion of Highly Correlated Variables: When two or more independent variables measure similar phenomena (e.g., GDP and income).
2.Insufficient Data Variation: When sample data does not vary enough (e.g., due to a short time period).
3.Overuse of Polynomial or Interaction Terms: Including squared or interaction terms in the model can introduce artificial correlation.
4.Data Collection Errors: Inaccurate or missing data can inflate correlations between variables.
5.Dummy Variable Trap: Using all categories of a categorical variable instead of omitting one as a reference category (see the sketch after this list).
6.Aggregation of Data: Combining similar groups in a way that increases correlation among predictors.
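A brief pandas sketch of the dummy variable trap mentioned above: keeping dummies for every category alongside an intercept makes the columns linearly dependent, while dropping one category restores full rank (the "region" variable is invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "east", "west"] * 25})

all_dummies = pd.get_dummies(df["region"], dtype=float)                     # 4 dummy columns
safe_dummies = pd.get_dummies(df["region"], drop_first=True, dtype=float)   # 3 dummy columns

X_trap = np.column_stack([np.ones(len(df)), all_dummies.to_numpy()])
X_ok = np.column_stack([np.ones(len(df)), safe_dummies.to_numpy()])
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])   # rank 4 < 5 columns: perfect collinearity
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])       # full rank: 4 of 4
```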