Time Series Flashcards
Define time-series data.
Time-series data consists of observations on the same variable at different points in time.
Name three types of time-series variables and give examples of each.
Types of time-series variables:
- Stocks and flows, e.g. GDP, investment, exports, wages
- Prices and rates, e.g. oil price, interest rates, exchange rates
- Indices, e.g. price index, Consumer Price Index (CPI)
Define what indices are and explain how they are useful for economists.
An index measures the average change in something relative to a base period.
Indices can be used to transform nominal values into real values, which is useful since economic behaviour is usually influenced by real, not nominal, variables.
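As a rough sketch (assuming hypothetical variables wage and cpi, where the CPI equals 100 in the base period), deflating a nominal series might look like:
In Stata:
* convert a nominal wage into a real wage using the CPI (base period = 100)
generate real_wage = 100 * wage / cpi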
Define what a lag is and why these are useful for economists.
A (time) lag refers to the value of a variable in previous time periods.
First lag of Yt is Yt-1
Second lag of Yt is Yt-2
Lags are useful because there is often a delay between an economic action and a consequence.
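As a minimal sketch (assuming a dataset with a time variable year and a variable y), lags can be created in Stata with the lag operator L.:
In Stata:
tsset year
* first and second lags of y
generate y_lag1 = L.y
generate y_lag2 = L2.y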
Define differencing.
Define the first difference of a variable in mathematical notation.
Define the first difference of a variable which has been transformed into logarithms.
Differencing calculates the change in a variable between two successive time periods, in other words the period-to-period change.
First difference refers to the change in the value of Yt between period t-1 and period t:
∆Yt = Yt - Yt-1
First difference of a variable which has been transformed into logarithms:
∆lnYt = lnYt - lnYt-1 = ln(Yt/Yt-1)
How can we find an approximation of the growth rate of a variable?
The growth rate of a variable is approximately equal to the change in the logarithm of that variable:
% change in Yt ≈ 100 × ∆lnYt
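A minimal sketch of this approximation in Stata (assuming a tsset dataset with a hypothetical variable gdp):
In Stata:
generate lgdp = ln(gdp)
* approximate percentage growth rate via the first difference of the log
generate growth = 100 * D.lgdp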
Name three potential problems with time-series data.
Three potential problems with time-series data:
- Autocorrelation/Serial correlation
- Volatility clustering
- Non-stationarity (trends/breaks)
Define volatility clustering and what type of data is often affected by this.
Volatility clustering is where a series is characterised by periods of high volatility followed by periods of low volatility.
This is often relevant to financial data.
Define breaks in time-series data.
What may cause breaks to occur?
Why are breaks important to take into consideration when building forecasting models?
Breaks in time-series data refer to when a pattern in data ceases to occur either abruptly or slowly.
This may be due to structural or policy changes in the economy.
Breaks are important in forecasting models since data before the break will not be useful in forecasting future values of the variable after the break.
Define autocorrelation/serial correlation.
What type of autocorrelation is most likely to characterise time-series data?
Define this type of autocorrelation.
Autocorrelation/serial correlation refers to when a series is correlated with its own lagged values; in a regression context, this means that the error terms are correlated over time.
Alternatively, autocorrelation is where a variable is correlated with itself over various time intervals.
Positive autocorrelation is likely to characterise time-series data.
Positive autocorrelation implies that if a variable is high in one period, it is likely to be high in the next period. Equally, if the variable is low in one period, it is likely to be low in the next period.
Write out formally no autocorrelation.
No autocorrelation: corr(ut, us) = 0 for all t ≠ s
What are the consequences of autocorrelation?
If a time-series is autocorrelated then:
- OLS estimates are no longer efficient (they no longer have minimum variance), which means OLS is no longer BLUE.
- OLS standard errors are underestimated, which means that t-statistics are overestimated and confidence intervals are too narrow. As a result, we are more likely to make Type I errors, where we incorrectly reject a true null hypothesis.
What are two ways that we can try to detect autocorrelation graphically?
How can we do these in Stata?
Two ways to detect autocorrelation graphically:
1. Plot a chronological line graph of the regression residuals, et. If there are long sequences of negative residuals following each other, or positive residuals following each other, they are likely to be autocorrelated.
In Stata:
regress y x
predict res, residuals
* plot the residuals against the time variable (here assumed to be year)
line res year
2. Plot the residuals against their lagged values (i.e. residual value on the y-axis, lagged residual value on the x-axis) and include a line of best fit.
If the line of best fit is upward sloping, this implies positive autocorrelation.
In Stata (the lag operator L. requires the data to be tsset first):
tsset year
regress y x
predict res, residuals
twoway (scatter res L.res) (lfit res L.res)
How can we formally test for autocorrelation?
How can we allow for violation of the strict exogeneity assumption?
How can we perform this test in Stata?
The simplest way to test for autocorrelation is to regress the residuals on their lagged values (et on et-1).
The null hypothesis is no autocorrelation (coefficient on the lag equals zero).
We then look at the p-value/t-statistic for the coefficient on the lagged residual to determine whether to reject the null or not (e.g. if p < 0.01 then we reject the null at all conventional significance levels, since 1%, 5% and 10% are all greater than the p-value).
H0: lag coefficients are equal to zero (no auto-correlation)
H1: lag coefficients are non-zero (autocorrelation)
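A minimal sketch of this manual check (assuming a tsset dataset with outcome y and regressor x):
In Stata:
regress y x
predict res, residuals
* regress residuals on their first lag; test H0: coefficient on L.res is zero
regress res L.res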
To formally test for autocorrelation we use the Durbin alternative test for autocorrelation. This regresses the regression residuals on their lags; it can include additional lags, and it allows for violation of strict exogeneity by also including the regressors (the X variables) in the auxiliary regression.
In Stata:
regress y x
* Durbin's alternative test; H0: no serial correlation
estat durbinalt
Why can we not use the error terms, ut, to test for autocorrelation? What can we use instead?
We cannot use the error terms, ut, to test for autocorrelation because we never observe these.
Instead, we can use their proxies, the residuals et, after estimating the regression model.
The regression residuals are consistent estimates of the error terms - as the sample size increases, the values of the residuals converge to the true values of the error terms.
What does it mean if an estimate is ‘consistent’?
A consistent estimate is one which converges to the true value of the parameter as the sample size increases.
Define strict exogeneity.
Is this likely to be an issue in cross-sectional data?
Is this likely to be an issue in time-series data?
What is the consequence of violation of the strict exogeneity assumption?
Strict exogeneity is where the error terms associated with the outcome variable, ut, are uncorrelated with the explanatory variables, Xs, in all time periods (past, present and future): E(ut | X1, ..., XT) = 0.
With time-series data, this implies that the explanatory variable, Xt, does not react to past values of Y.
This is unlikely to be violated in cross-sectional data because, in a random sample, the error term for one observation is unlikely to be related to the explanatory variables of other observations.
This is likely to be violated in time-series data because we are often concerned with policy variables which are impacted by what has happened in the past.
For example, welfare expenditure, highway speed limits, labour input.
If strict exogeneity is violated then estimates will be biased.
What does it mean if an estimate is efficient?
An efficient estimate implies that it has minimum variance.
What is a solution for autocorrelation?
How can this be implemented in Stata?
A solution for autocorrelation is to use heteroscedasticity- and autocorrelation-consistent (HAC, or Newey-West) standard errors.
These take into account both heteroscedasticity (unequal variance) and autocorrelation (correlation between a variable and itself over time).
HAC standard errors can be computed in Stata by substituting regress with newey, which requires the data to be tsset and a maximum lag to be specified.
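For example (assuming a tsset dataset; the maximum lag of 4 here is purely illustrative):
In Stata:
* OLS point estimates with Newey-West (HAC) standard errors
newey y x, lag(4)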
What is the difference between exogeneity and autocorrelation?
Exogeneity refers to the relationship between the Xts and the Yts - in other words between the explanatory and outcome variables.
Exogeneity implies that the error terms associated with Yt are not correlated with the explanatory variables.
Autocorrelation refers to the relationship between Yt and its own past values over time.
Autocorrelation implies that there is a correlation between the error terms over time.
Define heteroscedasticity.
Is this likely to be a problem in time-series data?
Heteroscedasticity is where the variance of the error term associated with the outcome variable is not constant over time.
Heteroscedasticity is less likely to be an issue than autocorrelation in time-series data.
What are the consequences of heteroscedasticity?
Heteroscedasticity leads to invalid OLS standard errors which invalidates hypothesis testing and t-statistics.
What are two potential solutions to heteroscedasticity?
Two potential solutions to heteroscedasticity:
- Use HAC standard errors
- Build a model of the error terms
What is conditional heteroscedasticity?
In what circumstances will conditional heteroscedasticity present in time-series data?
Conditional heteroscedasticity is where the variance of the error terms is autocorrelated: when variance/volatility is high in one period, it tends to be high in the next.
Conditional heteroscedasticity arises when data is characterised by volatility clustering.
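One way to check for this, sketched here under the assumption of an outcome y and regressor x, is Engle's LM test for ARCH effects after an OLS regression:
In Stata:
regress y x
* H0: no ARCH effects (no conditional heteroscedasticity) in the residuals
estat archlm, lags(1)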