Time-Series Data Concepts Flashcards
Characteristics of Time Series Data
Time series data is characterized by its temporal ordering. The outcome of the variables is still random. The observations are usually highly correlated with each other over time. By definition, time series are not obtained through a random sampling procedure.
Static Model -> Model that describes a contemporaneous relationship between y and x. A static model is used when a change in x at time t is believed to have an immediate effect on y.
Dynamic Model -> Model that describes a lagged relationship between y and x. A dynamic model is used when a change in x at time t is not believed to have an immediate effect on y.
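A minimal sketch of the two specifications, using statsmodels OLS on simulated data (all variable names and coefficients are illustrative):

```python
# Hedged sketch: a static model regresses y_t on x_t only; a dynamic
# (distributed lag) model adds lagged values of x.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
# simulated y with both an immediate and a one-period-delayed effect of x
df["y"] = 1.0 + 0.5 * df["x"] + 0.3 * df["x"].shift(1) + rng.normal(size=200)
df["x_lag1"] = df["x"].shift(1)
df = df.dropna()

static = smf.ols("y ~ x", data=df).fit()            # contemporaneous effect only
dynamic = smf.ols("y ~ x + x_lag1", data=df).fit()  # effect spread over time
print(static.params, dynamic.params, sep="\n")
```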
Key assumptions for OLS when using time series data (unlikely to be valid for many models):
1. Linearity in parameters
2. No perfect collinearity
3. Zero conditional mean: the error must be uncorrelated with the explanatory variables in every time period (strict exogeneity)
4. Homoskedasticity of errors
5. No serial correlation of errors
6. Errors follow a normal distribution
Assumptions 1-3 are enough for OLS to be unbiased; adding 4-5 makes OLS BLUE (Gauss-Markov), and 6 permits exact inference.
Key assumptions for asymptotic OLS
OLS is consistent if:
- the data are stationary
- the data are weakly dependent
- the x-variables are contemporaneously uncorrelated with the errors
Time Trend
A time trend is a tendency of a series to grow or decline over time. When two or more sequences trend in the same or opposite directions, this can lead to the false conclusion that changes in one variable are caused by changes in the other variable (a spurious relationship).
Stationarity
- A process is stationary if the moments of the series (mean, variance, etc.) are independent of time.
- If the moments change over time, the process is non-stationary. Regressions between non-stationary series are called "spurious" and may have no meaning.
- Shocks to a stationary time series are temporary.
Covariance Stationary
- A series is covariance stationary if its mean and variance are constant and the covariance between x_t and x_(t+h) depends only on the distance h, not on t.
- A covariance stationary series is weakly dependent if, as h -> infinity, the covariance goes towards zero and the variables become asymptotically uncorrelated.
- Weak dependence replaces the random sampling assumption of cross-sectional analysis.
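In symbols, a standard formulation of these definitions (x_t denotes the series, h the distance between observations):

```latex
% Covariance stationarity: the first two moments do not depend on t
\mathbb{E}[x_t] = \mu, \qquad
\operatorname{Var}(x_t) = \sigma^2, \qquad
\operatorname{Cov}(x_t, x_{t+h}) = \gamma(h) \quad \text{for all } t.
% Weak dependence: the autocovariance dies out at long horizons
\gamma(h) \to 0 \quad \text{as } h \to \infty.
```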
Serial Correlation / Autocorrelation
If the errors are correlated over time, they are said to suffer from serial correlation or autocorrelation. This is only a problem in time series data: with cross-sectional data, random sampling ensures uncorrelated errors.
Test for serial correlation:
1. Breusch-Godfrey
- tests for serial correlation and remains valid in the presence of lagged dependent (endogenous) variables
- can also be used to test higher-order autoregressive processes
- H0: no serial correlation
- HA: serial correlation -> re-estimate the model with an additional lag
2. Durbin-Watson
- provides similar results to the Breusch-Godfrey test, but only detects first-order serial correlation and cannot be used when multiple lags are included in the model.
For these reasons, Breusch-Godfrey is generally preferred over Durbin-Watson.
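A sketch of both tests in statsmodels, run on the residuals of an OLS fit (data and names are illustrative):

```python
# Hedged sketch: Breusch-Godfrey and Durbin-Watson tests on OLS residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)   # placeholder data

res = sm.OLS(y, sm.add_constant(x)).fit()

# Breusch-Godfrey: H0 = no serial correlation up to the chosen lag order
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(f"BG LM p-value: {lm_pval:.3f}")

# Durbin-Watson: values near 2 indicate no first-order serial correlation
print(f"DW statistic: {durbin_watson(res.resid):.3f}")
```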
Heteroskedasticity
Tests for and interpretation of heteroskedasticity are the same as for cross-sectional data.
Autoregressive Process
A first-order autoregressive AR(1) process is one where the current value of y depends on its value in the previous period, y_(t-1).
- If p < 1 there is weak dependence, as the covariance will tend to zero; the series is stationary.
- If p = 1 the process has a unit root and y is said to follow a random walk; the data is non-stationary.
An AR(p) process can be stationary if the stability condition holds. Stability condition: the sum of the p autoregressive coefficients is less than 1.
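The difference between weak dependence and a unit root can be illustrated by simulation (a sketch; coefficients chosen for illustration):

```python
# Hedged sketch: a stationary AR(1) with p = 0.5 versus a random walk
# (p = 1). Shocks die out in the first case and are permanent in the second.
import numpy as np

rng = np.random.default_rng(2)
T = 500
e = rng.normal(size=T)

y_stat = np.zeros(T)   # AR(1), p = 0.5: weakly dependent, stationary
y_rw = np.zeros(T)     # p = 1: unit root, non-stationary
for t in range(1, T):
    y_stat[t] = 0.5 * y_stat[t - 1] + e[t]
    y_rw[t] = y_rw[t - 1] + e[t]

print(f"variance, stationary AR(1): {y_stat.var():.2f}")
print(f"variance, random walk:      {y_rw.var():.2f}")
```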
Partial Autocorrelation
Partial autocorrelation functions (PACF) are a way of diagnosing which type of AR(p) process one is looking at.
In an AR(p) process, the first p lags of the partial autocorrelation function will be significantly different from zero, while higher lags will be close to zero.
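A sketch of this diagnostic with statsmodels' pacf function on a simulated AR(2) (order and coefficients are illustrative):

```python
# Hedged sketch: for an AR(2), roughly the first two partial
# autocorrelations stand out; higher lags are close to zero.
import numpy as np
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(3)
T = 1000
y = np.zeros(T)
e = rng.normal(size=T)
for t in range(2, T):                # simulate an AR(2)
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]

vals = pacf(y, nlags=5)              # lag 0 plus the first 5 partial autocorrelations
print(np.round(vals, 2))             # lags 1 and 2 stand out, the rest ~ 0
```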
Non-stationarity
A non-stationary process has a time trend or some other form of trend.
In case of a deterministic time trend -> include a time trend term in the regression to control for the trend (see the sketch below)
Non-stationarity is not necessarily due to time itself but to the way the variables change over time (e.g., a stochastic trend).
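The sketch below controls for a deterministic trend by adding a trend term t to the regressors (simulated data; names are illustrative):

```python
# Hedged sketch: two independently trending series look related until the
# time trend is included as a control.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 200
t = np.arange(T)
x = 0.05 * t + rng.normal(size=T)     # trending regressor
y = 0.03 * t + rng.normal(size=T)     # trending but unrelated outcome

X = sm.add_constant(np.column_stack([x, t]))   # include the trend term
res = sm.OLS(y, X).fit()
print(res.params)   # coefficient on x is ~0 once the trend is controlled for
```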
Unit Root
A unit root is a stochastic trend in a time series, sometimes called a "random walk with drift". If a time series has a unit root, it shows a systematic pattern that is unpredictable.
Test for Unit roots:
1. Dickey-Fuller Test (DF)
- applied to residuals
- works only for AR(1) processes
H0: series has a unit root
HA: series is stationary
2. Augmented Dickey-Fuller test (ADF)
- identical to DF but also works for higher-order autoregressive processes
- when the null hypothesis cannot be rejected, the series is differenced and re-tested until the null can be rejected, i.e. until the (differenced) data is stationary
- necessary to specify the number of lags to include
3. Phillips-Perron test (PP)
H0: p=1 -> if we fail to reject, there is a unit root
HA: p<1 -> series is stationary
Advantage compared to ADF:
- not necessary to specify the number of lags
Disadvantage compared to ADF:
- ADF performs better in finite samples
4. KPSS test
the hypotheses are reversed relative to the other tests
H0: series is stationary
HA: there is a unit root
- if there still is serial correlation in the error term, most likely more lags need to be added.
- non-stationary variables with the same order of integration cannot simply be regressed against one another. It is necessary to check for cointegration.
- The power of unit root tests depends much more on the span of the data, ceteris paribus, than on the number of observations. But longer spans bear greater risk of structural breaks.
- A series can be trending but not highly persistent, and vice versa. It is highly persistent when p=1.
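A sketch of the ADF, KPSS, and PP tests on a simulated random walk; adfuller and kpss are statsmodels functions, while PhillipsPerron is assumed to come from the separate arch package:

```python
# Hedged sketch: unit root tests applied to a series that has a unit root.
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss
from arch.unitroot import PhillipsPerron   # assumes `arch` is installed

rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(size=500))        # random walk: has a unit root

adf_stat, adf_p, *_ = adfuller(y, autolag="AIC")
print(f"ADF p-value:  {adf_p:.3f}   (H0: unit root; expect not to reject)")

kpss_stat, kpss_p, *_ = kpss(y, nlags="auto")
print(f"KPSS p-value: {kpss_p:.3f}   (H0: stationary; expect to reject)")

pp = PhillipsPerron(y)                     # H0: unit root, like the ADF
print(f"PP p-value:   {pp.pvalue:.3f}")
```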
Random Walk
A random walk is the simplest example of a process that is integrated of order 1, i.e. I(1).
Order of integration
The order of integration d refers to the number of unit roots, i.e. the number of times a series must be differenced so that it has a stationary representation.
Two series are cointegrated if a linear combination of the two I(1) series is itself I(0).
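The order of integration can be checked by differencing and re-running the ADF test, as in this sketch:

```python
# Hedged sketch: a random walk is I(1) - non-stationary in levels,
# stationary after taking the first difference.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(size=500))            # I(1) series

p_level = adfuller(y)[1]                       # test in levels
p_diff = adfuller(np.diff(y))[1]               # test the first difference
print(f"levels p={p_level:.3f}, first difference p={p_diff:.3f}")
# Expected: fail to reject in levels, reject after differencing -> d = 1
```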
Cointegration
- Two non-stationary series can only be regressed against one another if they are cointegrated.
- If two variables are cointegrated, they move together, which implies that over the long-run they will not move too far apart.
- Cointegration allows capturing the equilibrium relationship between non-stationary series within a stationary model.
Test for cointegration:
1. Engle-Granger test (single equation test) - essentially the same as DF or ADF
H0: no cointegration
HA: there is cointegration
2. Johansen Procedure (multivariate approach) - tests for the number of cointegrating relationships
- less restrictive than Engle Granger Method
H0: no cointegration
HA: there is cointegrating relationship(s)
With n variables, there are at most n-1 possible cointegrating relationships. If there were n cointegrating relationships, the data would be stationary.
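Both tests are available in statsmodels; a sketch on simulated cointegrated series (the shared trend and coefficients are illustrative):

```python
# Hedged sketch: Engle-Granger via coint() and the Johansen procedure via
# coint_johansen(). y and x share a common stochastic trend.
import numpy as np
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(7)
trend = np.cumsum(rng.normal(size=500))     # shared I(1) trend
x = trend + rng.normal(size=500)
y = 2.0 * trend + rng.normal(size=500)

# Engle-Granger: H0 = no cointegration
t_stat, p_value, _ = coint(y, x)
print(f"Engle-Granger p-value: {p_value:.3f}")

# Johansen: trace statistics compared against critical values for each rank
jres = coint_johansen(np.column_stack([y, x]), det_order=0, k_ar_diff=1)
print("trace stats:        ", np.round(jres.lr1, 2))
print("95% critical values:", jres.cvt[:, 1])
```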
Autoregressive distributed lag model (ARDL)
- A model where lagged values of the dependent variable appear as explanatory variables (the “autoregressive” part) and the other explanatory variables all have several lags (the “distributed” lag part).
- The model is specified by ARDL(A,B) where A specifies the number of lags for the dependent variable and B specifies the number of lags for the independent variable. If there is more than one independent variable, it becomes ARDL(A,B,…).
- how many lags need to be included has to be specified (in practice often guided by information criteria). The number of lags does not have to be equal for all variables.
- The problem with ARDL is that the presence of lagged dependent variables invalidates many tests, due to the autocorrelation.
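A sketch of an ARDL(2, 1) using statsmodels' ARDL class (available from statsmodels 0.13; data and coefficients are illustrative):

```python
# Hedged sketch: ARDL(2, 1) - two lags of the dependent variable, the
# current value plus one lag of the regressor x.
import numpy as np
import pandas as pd
from statsmodels.tsa.ardl import ARDL

rng = np.random.default_rng(8)
T = 300
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = (0.4 * y[t - 1] + 0.2 * y[t - 2]
            + 0.5 * x[t] + 0.3 * x[t - 1] + rng.normal())

data = pd.DataFrame({"y": y, "x": x})
res = ARDL(data["y"], lags=2, exog=data[["x"]], order=1).fit()
print(res.params)
```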
Error Correction Model (ECM)
- A model that incorporates a mechanism which restores a variable to its long-term relationship from a disequilibrium position. The variables included in an ECM must be cointegrated. The model allows for short-run and long-run dynamics.
- The error correction coefficient (λ) is the speed of adjustment. It provides an indication of how fast the system returns to the long-run equilibrium following a shock or perturbation. When λ is relatively large (close to 1), the adjustment mechanism works relatively quickly; when it is relatively small (close to 0), the adjustment works slowly.
- Using the Engle-Granger two-step method, one can estimate the long-run parameter β if it is unknown.
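A sketch of the two-step method on simulated cointegrated data (names and the data-generating process are illustrative):

```python
# Hedged sketch: Engle-Granger two-step ECM. Step 1 estimates the long-run
# relation y = a + b*x by OLS; step 2 regresses the change in y on the
# change in x and the lagged residual (the error correction term).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 500
trend = np.cumsum(rng.normal(size=T))
x = trend + rng.normal(size=T)
y = 1.0 + 2.0 * trend + rng.normal(size=T)   # cointegrated with x

# Step 1: long-run (levels) regression; residuals measure disequilibrium
step1 = sm.OLS(y, sm.add_constant(x)).fit()
u = step1.resid

# Step 2: short-run dynamics with the error correction term u_{t-1};
# its coefficient is negative, and its magnitude is the adjustment speed
dy, dx = np.diff(y), np.diff(x)
X = sm.add_constant(np.column_stack([dx, u[:-1]]))
step2 = sm.OLS(dy, X).fit()
print(step2.params)
```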
Vector Autoregressive Model (VAR)
- Basic multivariate time series approach
- In VARs we model several series in terms of their past. This differs from AR models, as in an AR model only a single series is modelled in terms of its own past values.
- A VAR is an n-equation, n-variable linear model in which each variable is in turn explained by its own lagged values, plus current and past values of the remaining n-1 variables.
- VARs are powerful in data description and forecasting, but do not solve the problem of correlation vs. causation.
Reduced form VAR
-> each variable is expressed as a linear function of its own past values, the past values of all other variables being considered and a serially uncorrelated error term
-> Each equation is then estimated using OLS
-> if the different variables included in the VAR are correlated with each other, then the error terms of the different equations will also be correlated
-> includes the minimum number of parameters possible
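A sketch of a two-variable reduced-form VAR in statsmodels, where each equation is estimated by OLS (simulated data):

```python
# Hedged sketch: reduced-form VAR on a simulated VAR(1) system.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(10)
T = 300
y = np.zeros((T, 2))
e = rng.normal(size=(T, 2))
for t in range(1, T):                       # simple VAR(1) dynamics
    y[t, 0] = 0.5 * y[t - 1, 0] + 0.2 * y[t - 1, 1] + e[t, 0]
    y[t, 1] = 0.3 * y[t - 1, 1] + e[t, 1]

data = pd.DataFrame(y, columns=["y1", "y2"])
res = VAR(data).fit(maxlags=4, ic="aic")    # lag order chosen by AIC
print(res.params)                           # one coefficient column per equation
print(res.forecast(data.values[-res.k_ar:], steps=5))   # 5-step-ahead forecast
```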
Recursive VAR
-> the error terms in each regression are uncorrelated with the errors in the preceding equations
-> includes some contemporaneous values as a regressor
-> the results depend on the order of the variables: changing the order changes the VAR equations, coefficients and residuals
Structural VAR
-> requires "identifying assumptions" that allow correlations to be interpreted causally
-> produces instrumental variables that permit the contemporaneous links to be estimated
-> it is possible to convert the structural VAR to a reduced-form VAR by premultiplying it by the inverse of the matrix of contemporaneous coefficients