Time Series Flashcards
Define time-series data.
Time-series data consists of observations of the same variable at different points in time.
Name three types of time-series variables and give examples of each.
Types of time-series variables:
- Stocks and flows, e.g. GDP, investment, exports, wages
- Prices and rates, e.g. the oil price, interest rates, exchange rates
- Indices, e.g. price indices such as the Consumer Price Index (CPI)
Define what indices are and explain how they are useful for economists.
An index measures the average change in something relative to a base period.
Indices can transform nominal values into real values, which is useful because economic behaviour is usually influenced by real, not nominal, variables.
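For example (with illustrative numbers): if a nominal wage is 20 and the CPI is 125 relative to a base of 100, the real wage is 100 × 20 / 125 = 16 in base-period prices.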
Define what a lag is and explain why lags are useful for economists.
A (time) lag refers to the value of a variable in previous time periods.
First lag of Yt is Yt-1
Second lag of Yt is Yt-2
Lags are useful because there is often a delay between an economic action and a consequence.
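In Stata, lags are created with the L. operator once a time variable has been declared. A minimal sketch (the variable names year and y are illustrative):
tsset year
gen y_lag1 = L.y // first lag, Yt-1
gen y_lag2 = L2.y // second lag, Yt-2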
Define differencing.
Define the first difference of a variable in mathematical notation.
Define the first difference of a variable which has been transformed into logarithms.
Differencing calculates the change in a variable between two successive time periods, in other words the period-to-period change.
First difference refers to the change in value of Yt between (t-1) and t.
∆Yt = Yt - Yt-1
First difference of a variable which has been transformed into logarithms:
∆lnYt = lnYt - lnYt-1 = ln(Yt/Yt-1)
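In Stata, differences can be generated with the D. operator. A minimal sketch (assumes the data have been tsset; variable names are illustrative):
tsset year
gen dy = D.y // ∆Yt = Yt - Yt-1
gen lny = ln(y)
gen dlny = D.lny // ∆lnYt = ln(Yt/Yt-1)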
How can we find an approximation of the growth rate of a variable?
The growth rate of a variable is roughly equal to the change in the logarithm of that variable.
% change in Yt ≈ 100 × ∆lnYt
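A quick check with illustrative numbers: if Yt-1 = 100 and Yt = 103, the exact growth rate is 3%, while 100 × ∆lnYt = 100 × ln(103/100) ≈ 2.96%. The approximation is accurate for small changes.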
Name three potential problems with time-series data
Three potential problems with time-series data:
- Autocorrelation/Serial correlation
- Volatility clustering
- Non-stationarity (trends/breaks)
Define volatility clustering and what type of data is often affected by this.
Volatility clustering is where a series is characterised by periods of high volatility followed by periods of low volatility.
This is often relevant to financial data.
Define breaks in time-series data.
What may cause breaks to occur?
Why is it important to take breaks into consideration when building forecasting models?
Breaks in time-series data occur when a pattern in the data ceases to hold, either abruptly or gradually.
This may be due to structural or policy changes in the economy.
Breaks are important in forecasting models since data before the break will not be useful in forecasting future values of the variable after the break.
Define autocorrelation/serial correlation.
What type of autocorrelation is most likely to characterise time-series data?
Define this type of autocorrelation.
Autocorrelation/serial correlation refers to when a series is correlated with its own lags; in a regression context, this means the error terms are correlated over time.
Alternatively, autocorrelation is where a variable is correlated with itself over various time intervals.
Positive autocorrelation is likely to characterise time-series data.
Positive autocorrelation implies that if a variable is high in one period, it is likely to be high in the next period. Equally, if the variable is low in one period, it is likely to be low in the next period.
Write out formally no autocorrelation.
No autocorrelation: corr(ut, us) = 0 for all t ≠ s
What are the consequences of autocorrelation?
If a time-series is autocorrelated then:
- OLS estimates are no longer efficient (they no longer have minimum variance), which means they are no longer BLUE.
- OLS standard errors are UNDER-estimated, which means that t-values are OVER-estimated and confidence intervals are too narrow. This means we are more likely to make TYPE 1 ERRORS, where we incorrectly reject a true null hypothesis.
What are two ways that we can try to detect autocorrelation graphically?
How can we do these in Stata?
Two ways to detect autocorrelation graphically:
1. Plot a chronological line graph of the regression residuals, et. If there are long sequences of negative residuals following each other, or positive residuals following each other, they are likely to be autocorrelated.
In Stata:
regress y x
predict res, residuals
line res year
2. Plot the residuals against their lagged values (i.e. residual value on the y-axis, lagged residual value on the x-axis) and include a line of best fit.
If the line of best fit is upward sloping, this implies positive autocorrelation.
In Stata (the lag operator L. requires the data to be tsset first):
tsset year
regress y x
predict res, residuals
twoway (scatter res L.res) (lfit res L.res)
How can we formally test for autocorrelation?
How can we allow for violation of the strict exogeneity assumption?
How can we perform this test in Stata?
The simplest way to test for autocorrelation is to regress the residuals on their lagged values (et on et-1).
The null hypothesis is no autocorrelation (coefficient on the lag equals zero).
We then look at the p-value/t-statistic for the coefficient on the lagged residual to determine whether or not to reject the null (e.g. if p < 0.01 then we reject the null at all conventional significance levels, since 1%, 5% and 10% are all greater than the p-value).
H0: lag coefficients are equal to zero (no auto-correlation)
H1: lag coefficients are non-zero (autocorrelation)
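In Stata, this manual test might look like the following sketch (variable names are illustrative):
tsset year
regress y x
predict res, residuals
regress res L.res // reject H0 if the coefficient on L.res is significant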
To formally test for autocorrelation we use Durbin's alternative test for autocorrelation. This regresses the regression residuals on their lags; it can include additional lags, and it allows for violation of strict exogeneity by also including the original regressors in the auxiliary regression.
In Stata (the data must be tsset first):
tsset year
regress y x
estat durbinalt
Why can we not use the error terms, ut, to test for autocorrelation? What can we use instead?
We cannot use the error terms, ut, to test for autocorrelation because we never observe these.
Instead, we can use their proxies, the residuals et, after estimating the regression model.
The regression residuals are consistent estimates of the error terms - as the sample size increases, the values of the residuals converge to the true values of the error terms.
What does it mean if an estimate is ‘consistent’?
A consistent estimate is one which converges to the true value of the parameter as the sample size increases.
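Formally (standard notation, not specific to these cards): an estimator θ̂ of a parameter θ is consistent if plim(θ̂) = θ as the sample size n → ∞.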
Define strict exogeneity.
Is this likely to be an issue in cross-sectional data?
Is this likely to be an issue in time-series data?
What is the consequence of violation of the strict exogeneity assumption?
Strict exogeneity is where the error terms associated with the outcome variable, ut, are uncorrelated with the explanatory variables, Xt, in all periods (past, present and future).
With time-series data, this implies that the explanatory variable, Xt, does not react to past values of Y.
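Formally (in the same notation as the other cards): E(ut | X1, X2, …, XT) = 0 for all t.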
This is unlikely to be violated in cross-sectional data because observations in a random cross-sectional sample are drawn independently, so the error term of one observation cannot feed back into the regressors of another.
This is likely to be violated in time-series data because we are often concerned with policy variables which are impacted by what has happened in the past.
For example, welfare expenditure, highway speed limits, labour input.
If strict exogeneity is violated then estimates will be biased.
What does it mean if an estimate is efficient?
An efficient estimator is one which has the minimum variance among comparable estimators (e.g. among linear unbiased estimators).
What is a solution for autocorrelation?
How can this be implemented in Stata?
A solution for autocorrelation is to create Heteroscedasticity and Autocorrelation Consistent (HAC or Newey-West) standard errors.
These take into account both heteroscedasticity (unequal variance) and autocorrelation (correlation between a variable and itself over time).
HAC standard errors can be computed in Stata by substituting regress with newey, which additionally requires a maximum lag length via its lag() option.
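A minimal sketch (the lag length of 2 is illustrative):
tsset year
newey y x, lag(2)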
What is the difference between exogeneity and autocorrelation?
Exogeneity refers to the relationship between the Xts and the Yts - in other words between the explanatory and outcome variables.
Exogeneity implies that the error terms associated with Yt are not correlated with the explanatory variables.
Autocorrelation refers to the relationship between Yt and its own past values over time.
Autocorrelation implies that there is a correlation between the error terms over time.
Define heteroscedasticity.
Is this likely to be a problem in time-series data?
Heteroscedasticity is where the variance of the error term associated with the outcome variable is not constant over time.
Heteroscedasticity is less likely to be an issue than autocorrelation in time-series data.
What are the consequences of heteroscedasticity?
Heteroscedasticity leads to invalid OLS standard errors which invalidates hypothesis testing and t-statistics.
What are two potential solutions to heteroscedasticity?
Two potential solutions to heteroscedasticity:
- Use HAC standard errors
- Build a model of the error terms
What is conditional heteroscedasticity?
In what circumstances will conditional heteroscedasticity present in time-series data?
Conditional heteroscedasticity is where the VARIANCE of the error terms is autocorrelated - when variance/volatility is high in one period, it tends to be high in the next.
Conditional heteroscedasticity arises when data is characterised by volatility clustering.
What are AR(p) and ADL(p,q) models?
AR(p) is an autoregressive model which regresses a variable on its own lags; this captures the persistence of a variable after an initial shock has occurred.
‘p’ indicates the number of lags of the variable itself included in the model.
ADL(p, q) is an auto-regressive distributed lag model. This regresses a dependent variable on its own lags, plus the lags of an additional regressor.
‘p’ indicates the number of lags of the outcome variable, Y
‘q’ indicates the number of lags of the additional regressor, X.
ADL models can be extended to include multiple regressors.
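A minimal Stata sketch of both models (variable names and lag lengths are illustrative):
tsset year
regress y L.y L2.y // AR(2)
regress y L.y L2.y L.x L2.x // ADL(2,2)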
Why are a new set of assumptions needed for the Least Squares method for ADL models?
A new set of Least Squares assumptions are needed for ADL models because strict exogeneity is very unlikely to hold with time-series data.
What are the four key Least Squares assumptions for ADL models?
Four key Least Squares assumptions for ADL models:
1. Conditional mean assumption - the conditional mean of the error terms is zero given all the lagged values of the regressors included in the model (Ys and Xs), but importantly NOT given their PRESENT values.
E(ut | Yt-1, Yt-2, …, Xt-1, Xt-2, …) = 0
2. Stationarity of all random variables - the variables in the series are not a function of time, and statistical properties such as the mean, variance and autocorrelation are constant over time. Alternatively, stationarity means that the probability distribution of the variable, Y, is constant over time.
3. No large outliers
4. No (perfect) multicollinearity
What does the conditional mean assumption of the Least Squares method for ADL models ensure?
The conditional mean assumption ensures that the error terms are not autocorrelated.
In addition, the conditional mean assumption ensures that the model is well-specified and its forecasting power cannot be improved by including more lags of any of the regressors.