8 Flashcards
What are the issues with distributed lag models?
- The various lagged values of 𝑋 are likely to be severely multicollinear.
- There is no guarantee that the estimated 𝛽s will follow the smoothly declining pattern that economic theory would suggest.
- Each added lag uses up degrees of freedom, which leads to more variability and less precision in the estimates.
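The first and third issues can be illustrated with a minimal sketch that uses only the Python standard library. The series is hypothetical (an AR(1) process with coefficient 0.9, chosen purely for illustration), and `pearson` is a helper written for this example, not a library function.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Hypothetical persistent series: X_t = 0.9 * X_(t-1) + noise.
rng = random.Random(0)
x = [0.0]
for _ in range(499):
    x.append(0.9 * x[-1] + rng.gauss(0, 1))

# Correlation between X_t and each of its first three lags.
# A high value means the lagged regressors carry nearly the same information.
lag_corr = {k: pearson(x[k:], x[:-k]) for k in (1, 2, 3)}
for k, r in lag_corr.items():
    # Each extra lag also drops one usable observation.
    print(f"lag {k}: correlation = {r:.2f}, usable observations = {len(x) - k}")
```

When the printed correlations are high, the lags are close to collinear, and every lag added to the model costs both an observation and a degree of freedom.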
How can we solve the problems with distributed lag models?
Instead of including lags of the independent variables, include a lag of the dependent variable. This can create issues in a model where you are trying to estimate how the independent variables affect the dependent one. Hence, when that is your goal, it is better to simply limit the number of lags you use, based on theoretical or other considerations.
What is LOESS?
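Locally estimated scatterplot smoothing regression (LOESS) – an analysis method that creates a smooth line through time plots and scatter plots. Instead of fitting one global line, it fits many small local regressions, one around each neighbourhood of the data, and joins their fitted values.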
When and how to use LOESS
You use LOESS to:
- See if there is an interesting relationship in a noisy data set.
- Evaluate whether there is a non-linear relationship (e.g. a non-monotone trend).
- Explore the data (i.e., get a better feel for the data without any particular aim).
You do NOT use LOESS to calculate how one variable affects another.
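To make the "many local regressions" idea concrete, here is a simplified sketch of LOESS using only the standard library. It is an illustration under stated assumptions, not the exact algorithm of any statistical package: each point gets a tricube-weighted straight-line fit over its nearest neighbours, with no robustness iterations. The function name `loess`, the parameter `frac`, and the noisy sine data are all hypothetical.

```python
import math
import random


def loess(xs, ys, frac=0.3):
    """Simplified LOESS: a tricube-weighted straight-line fit around each point."""
    n = len(xs)
    k = max(2, int(frac * n))  # number of neighbours used in each local fit
    fitted = []
    for x0 in xs:
        # Bandwidth = distance from x0 to its k-th nearest point.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # Tricube weights: nearby points count most, points beyond h count zero.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical noisy data with a non-monotone (sine-shaped) trend.
rng = random.Random(2)
xs = [2 * math.pi * i / 199 for i in range(200)]
truth = [math.sin(x) for x in xs]
ys = [t + rng.gauss(0, 0.3) for t in truth]

smooth = loess(xs, ys, frac=0.3)

# Average distance from the true curve, before and after smoothing.
err_noisy = sum(abs(y - t) for y, t in zip(ys, truth)) / len(xs)
err_smooth = sum(abs(s - t) for s, t in zip(smooth, truth)) / len(xs)
print(f"mean abs error: raw = {err_noisy:.3f}, smoothed = {err_smooth:.3f}")
```

The smoothed line is for looking at the shape of the relationship. It produces no coefficient, which is why LOESS is not used to calculate how one variable affects another.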
Problems with time series models
Many variables drift over time for reasons that have nothing to do with each other, so we need to be mindful of this and try to remove such “trends” as much as possible to truly evaluate how variables affect one another.
What is a spurious relationship?
Spurious correlation – a strong relationship between variables that is not caused by a real underlying causal relationship. The relationship is “accidental”.
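A common way to see this is to correlate two series that are independent by construction. The sketch below uses two hypothetical random walks; `pearson`, `random_walk` and `difference` are helpers written for this example. The correlation in levels can be large in magnitude purely by chance, while the correlation of the first differences should stay close to zero.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def random_walk(rng, n):
    """Cumulative sum of independent standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def difference(series):
    """First difference: Y_t - Y_(t-1)."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


rng = random.Random(1)
a = random_walk(rng, 500)  # two walks with no causal link between them
b = random_walk(rng, 500)

corr_levels = pearson(a, b)
corr_diffs = pearson(difference(a), difference(b))
print(f"levels: {corr_levels:.2f}, first differences: {corr_diffs:.2f}")
```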
What is a correlogram?
Correlogram – a data visualisation technique that shows how a variable correlates with its lags.
How are correlograms drawn?
The Autocorrelation Function (ACF) indicates how a variable correlates with different lags of itself. The Partial Autocorrelation Function (PACF) shows the same thing but controls for the lags in between: the partial autocorrelation at lag 𝑘+1 is the correlation between the current value and the value 𝑘+1 periods earlier, after removing the influence of the 𝑘 intermediate lags.
How can we tell stationarity from non-stationarity in the ACF?
If the autocorrelation decreases slowly and stays high over many lags - non-stationarity
If the autocorrelation drops quickly towards zero after the first few lags - stationarity
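The ACF values behind a correlogram can be computed by hand. This is a minimal sketch with a hypothetical helper `acf`, applied to two simulated series: white noise (stationary) and a random walk (non-stationary).

```python
import random


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


rng = random.Random(5)
noise = [rng.gauss(0, 1) for _ in range(500)]  # stationary by construction

walk, level = [], 0.0                          # non-stationary by construction
for _ in range(500):
    level += rng.gauss(0, 1)
    walk.append(level)

acf_noise = acf(noise, 10)
acf_walk = acf(walk, 10)
for k in (1, 5, 10):
    print(f"lag {k}: noise = {acf_noise[k]:+.2f}, walk = {acf_walk[k]:+.2f}")
```

Plotting these values against the lag number gives the correlogram. The random walk's values stay high across lags, while the white noise values hover around zero from lag 1 onwards.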
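What is a unit root?
A series has a unit root when the coefficient on its own first lag equals one, so that only the difference 𝑌_𝑡−𝑌_(𝑡−1) is stable while the level keeps drifting. This implies there is a trend, which also implies we have a non-stationary variable. Testing for a unit root is a more formal check than reading a correlogram.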
How to check for a unit root?
Make sure that your numeric variables do not have a unit root.
To do that, use two tests, or more if needed:
- Augmented Dickey-Fuller (ADF) test – tests whether a variable is stationary using a combination of different lags. It predominantly focuses on stationarity in means. Its null hypothesis is that the variable has a unit root, so with it you want to reject the null hypothesis.
- Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test – similar in purpose to ADF, but with the hypotheses reversed: the null hypothesis is that the variable is stationary (it tests whether the variance of a random-walk component is zero). With this test, you do not want to reject the null hypothesis.
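The logic of the ADF test can be sketched with its simplest form, the plain Dickey-Fuller regression of the change in 𝑌 on a constant and the lagged level. This is a simplified illustration, not the full test: it adds no extra lags and computes no p-value. The helper name `dickey_fuller_t` is hypothetical, and −2.86 is the approximate 5% critical value for the version with a constant. In practice, a library would normally be used, for example `adfuller` and `kpss` from `statsmodels.tsa.stattools`.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic on gamma in: (Y_t - Y_(t-1)) = alpha + gamma * Y_(t-1) + error."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lag = y[:-1]
    n = len(dy)
    m_lag, m_dy = sum(lag) / n, sum(dy) / n
    sxx = sum((x - m_lag) ** 2 for x in lag)
    sxy = sum((x - m_lag) * (d - m_dy) for x, d in zip(lag, dy))
    gamma = sxy / sxx
    alpha = m_dy - gamma * m_lag
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lag, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


rng = random.Random(6)
noise = [rng.gauss(0, 1) for _ in range(500)]  # stationary by construction

walk, level = [], 0.0                          # has a unit root by construction
for _ in range(500):
    level += rng.gauss(0, 1)
    walk.append(level)

CRITICAL_5PCT = -2.86  # approximate value; an assumption of this sketch

t_noise = dickey_fuller_t(noise)
t_walk = dickey_fuller_t(walk)
for name, t in (("noise", t_noise), ("walk", t_walk)):
    verdict = "reject unit root" if t < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

A strongly negative statistic rejects the null of a unit root, which is the outcome you want from ADF. The KPSS test would be read the other way round.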
How can we fix non-stationarity?
First, you can try to build a regression model and see if it is cointegrated, that is, whether the variables in the model move towards some sort of equilibrium, in which case the non-stationarity of the individual variables is not a problem.
Secondly, transform your non-stationary data by taking the first difference of a variable. If you still have a non-stationarity issue, you can take the first difference (in raw terms, percentages, logs, etc.) of the difference, and so on, until the variables become stationary.
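The different kinds of difference are simple to compute. A minimal sketch, with hypothetical values chosen so the arithmetic is easy to check by hand:

```python
import math

y = [100, 104, 110, 118, 128]  # hypothetical series


def difference(series):
    """Raw (unit) first difference: Y_t - Y_(t-1)."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


first = difference(y)        # [4, 6, 8, 10]
second = difference(first)   # [2, 2, 2], the difference of the difference

# Percentage difference: change relative to the previous value.
pct = [(y[t] - y[t - 1]) / y[t - 1] for t in range(1, len(y))]

# Log difference: difference of the logged series, close to pct for small changes.
log_diff = difference([math.log(v) for v in y])

print(first, second)
print([round(p, 4) for p in pct])
print([round(d, 4) for d in log_diff])
```

Each round of differencing shortens the series by one observation, so the second difference here has three values from five original points.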
How can we check for cointegration?
- Build a regression equation.
- Extract the residuals from it.
- Import the residuals into the statistical software of your choice.
- Check the residuals for a unit root. If they have a unit root (i.e. they are non-stationary), you do not have cointegration. If they do not have a unit root, you have cointegration.
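The steps above can be sketched end to end on simulated data. Everything here is an assumption made for illustration: the data are hypothetical and cointegrated by construction, the residual check is a plain Dickey-Fuller regression without extra lags, and −3.34 is the approximate 5% critical value for a residual-based test with two variables (it is stricter than the usual unit-root value because the residuals come from an estimated regression).

```python
import math
import random

rng = random.Random(3)

# Hypothetical data: x is a random walk, y follows x plus stationary noise.
x, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0, 1)
    x.append(level)
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

# Step 1: build the regression y = intercept + slope * x by least squares.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx

# Step 2: extract the residuals.
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Steps 3-4: check the residuals for a unit root with the regression
# (e_t - e_(t-1)) = gamma * e_(t-1) + error. No constant is included
# because least-squares residuals already average zero.
lagged = resid[:-1]
d_resid = [resid[t] - resid[t - 1] for t in range(1, n)]
s_ll = sum(v * v for v in lagged)
gamma = sum(l * d for l, d in zip(lagged, d_resid)) / s_ll
rss = sum((d - gamma * l) ** 2 for l, d in zip(lagged, d_resid))
t_stat = gamma / math.sqrt(rss / (len(d_resid) - 1) / s_ll)

CRITICAL_5PCT = -3.34  # approximate value; an assumption of this sketch
cointegrated = t_stat < CRITICAL_5PCT
print(f"slope = {slope:.3f}, t = {t_stat:.2f}, cointegrated = {cointegrated}")
```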
How would the professor solve non-stationarity?
- Check if your variables are stationary.
- If some of them are not, transform them using the first difference (or transform all numeric variables for simplicity’s sake).
- If the first difference fixes the issue, just build a regression equation and interpret it as you would any regression.
- If the first difference does not fix the issue, try different types of differences (e.g. unit difference, percentage difference, log difference). If this does not help, build your regression with the “best” differences and check for cointegration. If you have cointegration, just continue; only if you do not should you try the first difference of the first difference.
What are time-series assumptions?
- The regression model is linear, is correctly specified, and has an additive error term.
- The error term has a zero population mean.
- All explanatory variables are uncorrelated with the error term.
- The error term has a constant variance (homoscedasticity).
- No perfect multicollinearity.
- Observations of the error term are uncorrelated with each other (no serial correlation).
- Residuals are normally distributed.
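The serial-correlation assumption is the one most often violated in time-series work. One standard way to check it is the Durbin-Watson statistic, sketched here on two hypothetical residual series. The helper name `durbin_watson` is written for this example. Values near 2 suggest no first-order serial correlation, and values well below 2 suggest positive serial correlation.

```python
import random


def durbin_watson(resid):
    """Sum of squared changes in the residuals divided by their sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


rng = random.Random(4)

# Hypothetical residuals with no serial correlation.
independent = [rng.gauss(0, 1) for _ in range(500)]

# Hypothetical residuals where each value carries over 80% of the previous one.
correlated = [0.0]
for _ in range(499):
    correlated.append(0.8 * correlated[-1] + rng.gauss(0, 1))

dw_independent = durbin_watson(independent)
dw_correlated = durbin_watson(correlated)
print(f"independent: {dw_independent:.2f}, serially correlated: {dw_correlated:.2f}")
```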