Chapter 6: Statistical Models for Time Series Flashcards

1
Q

How do linear models for time series differ from models applied to cross-sectional data?

A

They account for the correlations that arise between data points in the same time series, in contrast to the standard methods applied to cross-sectional data, in which each point is assumed to be independent of the others.

2
Q

What assumptions would allow ordinary least squares to be applied to time series data?

A

With respect to time series behaviour:
- The time series has a linear response to its predictors.
- No input variable is constant over time or perfectly correlated with another input variable.

With respect to error:
- For each point in time, the expected value of the error, given all explanatory variables for all time periods, is 0.
- The error at any given time period is uncorrelated with the inputs at any time period in the past or future.
- The variance of the error is independent of time.

3
Q

What is an autoregressive model (AR)?

A

AR models rely on the intuition that the past predicts the future, and so posit a time series process in which the value at a point in time t is a function of the series's values at earlier points in time.

4
Q

What is the equation for the simplest AR model, AR(1)?

A

y(t) = b0 + b1 * y(t-1) + e(t)
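As a minimal sketch (not part of the original card), this is one way to fit an AR(1) model in Python, assuming the statsmodels package is available; the simulated series and coefficients are illustrative only:

```python
# A minimal sketch of fitting an AR(1) model, assuming statsmodels is installed.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate an AR(1) process: y(t) = 0.5 + 0.7 * y(t-1) + e(t)
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 + 0.7 * y[t - 1] + rng.normal()

# AutoReg with lags=1 fits y(t) = b0 + b1 * y(t-1) + e(t) directly.
res = AutoReg(y, lags=1).fit()
print(res.params)  # estimates of b0 and b1
```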

5
Q

Does a time series need to be stationary to be modelled using AR?

A

Yes, the time series does need to be stationary.

6
Q

What is the difference between strong and weak stationarity?

A

Weak stationarity only requires the mean and the variance of a process to be invariant over time.

Strong stationarity requires the distribution of the random variables output by a process to remain the same over time. It demands that the joint distribution of y1, y2, y3, y4 be the same as that of y101, y102, y103, y104.

7
Q

What is the definition of a distribution?

A

A distribution is a statistical function describing, for every possible value, the probability that that value will be generated by a process.

8
Q

What is Akaike Information Criterion (AIC) for a model?

A

AIC = 2k - 2ln(L), where k is the number of parameters and L is the maximum value of the likelihood function for the model. In general we want to reduce the complexity of the model (smaller k) but increase the goodness of fit (larger L), so we favour models with lower AICs over those with higher AICs.
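As a hedged illustration (statsmodels is an assumption here, as is the toy series), fitted statsmodels results expose the AIC directly, so candidate orders can be compared without computing 2k - 2ln(L) by hand:

```python
# Comparing candidate AR orders by AIC; lower AIC is preferred.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

for p in (1, 2, 3):
    res = ARIMA(y, order=(p, 0, 0)).fit()  # ARIMA(p, 0, 0) is an AR(p) model
    print(p, round(res.aic, 2))
```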

9
Q

What is a likelihood function?

A

A likelihood function is a measure of how likely a particular set of parameter values for a function is, relative to other parameter values for that function, given the data.

For example, when fitting a linear model to y = [1,2,3], x = [1,2,3] using y = b * x, your likelihood function would tell you that an estimate of b = 1 is far more likely than an estimate of b = 0.
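A tiny worked version of that example, under the illustrative assumption of Gaussian errors with unit variance (the error model is not stated on the card):

```python
# Log-likelihood of b = 1 vs b = 0 for y = b * x, assuming unit-variance
# Gaussian errors (an illustrative assumption, not from the card).
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def log_likelihood(b):
    resid = y - b * x
    # Sum of log N(resid; 0, 1) densities.
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * resid ** 2)

print(log_likelihood(1.0))  # about -2.76 (residuals are all zero)
print(log_likelihood(0.0))  # about -9.76, i.e. far less likely
```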

10
Q

What check should we do after fitting an AR(p) model to assess goodness of fit with respect to the model's errors?

A

Plot the ACF of the residuals (errors) at each lag and check whether any cross the significance threshold.
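A minimal sketch of this check, assuming statsmodels and matplotlib (the AR(1) series below is simulated purely for illustration):

```python
# Residual-ACF check after fitting an AR model.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + rng.normal()

res = AutoReg(y, lags=1).fit()
# Residual autocorrelations should all stay inside the significance band.
plot_acf(res.resid, lags=20)
plt.show()
```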

11
Q

If we see significant autocorrelation between the errors in a model, what should we do?

A

Return to the model and consider adding terms (more complexity) to account for the significant autocorrelation of the residuals.

12
Q

What is the Ljung-Box test?

A

The Ljung-Box test is an overall test of the randomness of a time series. It poses the following:
- H0: The data do not exhibit serial correlation.
- H1: The data do exhibit serial correlation.

13
Q

When is the Ljung-Box test applied?

A

The test is commonly applied to AR and ARIMA models, more specifically to their errors (residuals) rather than to the models themselves.
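One possible sketch of this, assuming a reasonably recent statsmodels (where acorr_ljungbox returns a table of statistics and p-values); the fitted series is simulated for illustration:

```python
# Ljung-Box test on AR model residuals.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + rng.normal()

res = AutoReg(y, lags=1).fit()
# Small p-values would reject H0 (no serial correlation) for the residuals.
print(acorr_ljungbox(res.resid, lags=[10]))
```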

14
Q

When using an AR model to predict, how should we assess the performance of the predictions?

A

We should look at the correlation between the raw time series and the predictions, but, more importantly, at the correlation between the differenced time series and the differenced predictions.
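A small numeric sketch of the two checks (the arrays below are made-up values, purely for illustration):

```python
# Correlation of raw vs. differenced predictions.
import numpy as np

actual = np.array([10.0, 11.0, 12.5, 12.0, 13.5, 14.0])
predicted = np.array([10.2, 10.9, 12.1, 12.4, 13.2, 14.3])

# Correlation of the raw series can look flattering when both trend upwards.
print(np.corrcoef(actual, predicted)[0, 1])

# Correlation of the differenced series is the harder, more informative test.
print(np.corrcoef(np.diff(actual), np.diff(predicted))[0, 1])
```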

15
Q

What are Moving Average (MA) models?

A

A moving average model is similar to an autoregressive model except that the terms included in the linear equation refer to present and past error terms rather than present and past values of the process itself.

16
Q

How do you determine the parameters to use for a MA model?

A

Use an ACF to determine the order of the MA process.

17
Q

How would you use the output of an ACF in a MA model?

A

Where the ACF shows significant values at particular lags, fit an MA model that includes those lags.
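A minimal sketch of reading the MA order off the ACF and fitting the corresponding model, assuming statsmodels and matplotlib; the MA(2) series is simulated so the expected cutoff is known:

```python
# Read the MA order from the ACF, then fit an MA(q) model with that order.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
e = rng.normal(size=500)
# Simulate an MA(2) process: y(t) = e(t) + 0.6 * e(t-1) + 0.3 * e(t-2)
y = e[2:] + 0.6 * e[1:-1] + 0.3 * e[:-2]

plot_acf(y, lags=20)  # expect significant spikes only at lags 1 and 2
plt.show()

res = ARIMA(y, order=(0, 0, 2)).fit()  # MA(2), matching the ACF cutoff
print(res.params)
```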

18
Q

What does ARIMA stand for and what does it model?

A

AutoRegressive Integrated Moving Average. This combines AR and MA models and accounts for differencing.

19
Q

How do you determine which of the AR, MA or ARMA models best describes the time series?

A

ACF behaviour:
AR(p) - Falls off slowly.
MA(q) - Sharp drop after lag = q.
ARMA - No sharp cutoff.

PACF behaviour:
AR(p) - Sharp drop after lag = p.
MA(q) - Falls off slowly.
ARMA - No sharp cutoff.
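A sketch of applying that table by plotting the ACF and PACF side by side, assuming statsmodels and matplotlib; the AR(2) series is simulated so the expected shapes are known:

```python
# Side-by-side ACF and PACF plots for order identification.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
plot_acf(y, lags=20, ax=axes[0])    # AR(2): tails off slowly
plot_pacf(y, lags=20, ax=axes[1])   # AR(2): sharp drop after lag 2
plt.show()
```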

20
Q

What is Wold's theorem?

A

Wold's theorem tells us that every covariance-stationary time series can be written as the sum of two time series, one deterministic and one stochastic.

21
Q

What are reasonable values of p, d, and q in the context of ARIMA(p, d, q)?

A

A practitioner notes that one should be skeptical of:
d > 2
p > 5
q > 5
p >> q or p << q

22
Q

What is ARIMA(0, 0, 0)?

A

ARIMA(0, 0, 0) is a white noise model.

23
Q

What is ARIMA(0, 1, 0)?

A

ARIMA(0, 1, 0) is a random walk.
Adding a non-zero constant to this is called a random walk with drift.

24
Q

What is ARIMA(0, 1, 1)?

A

This is an exponential smoothing model. ARIMA(0, 2, 2) is the same as Holt's linear method, which is exponential smoothing for data with an underlying trend.

25
Q

What is the Box-Jenkins heuristic for choosing parameters for an ARIMA model?

A
  1. Use your data, visualizations, and underlying knowledge to select a class of model appropriate to your data.
  2. Estimate the parameters given your training data.
  3. Evaluate the performance of your model based on your training data and tweak the parameters of the model to address the weaknesses you see in the performance diagnostics.
26
Q

When inspecting the PACF of residuals for an ARIMA model, what do large values suggest?

A

Large PACF values for residuals suggest that our model does not fully describe the auto-regressive behaviour of the time series. If this is the case, we add a higher order AR component and re-run the model.

27
Q

What is a quick way to check how well forecasts do?

A

Compute the correlation between the forecasted values and the observed values.

28
Q

What is an alternative to manually fitting models to determine their parameters?

A

Using automated model selection based on information criteria such as AIC.
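A naive sketch of such a selection loop, assuming statsmodels; dedicated auto-ARIMA tools do this more carefully, and the toy series here is only for illustration:

```python
# Naive AIC grid search over small ARIMA orders.
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300))  # toy non-stationary series

best = None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        res = ARIMA(y, order=(p, d, q)).fit()
    except Exception:
        continue  # skip orders that fail to fit
    if best is None or res.aic < best[1]:
        best = ((p, d, q), res.aic)

print("Chosen order:", best[0], "AIC:", round(best[1], 2))
```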

29
Q

What is Vector Autoregression (VAR)?

A

Vector Autoregression generalises an AR(p) model to multiple time series, to capture how they relate to each other, i.e. whether one series can predict another.
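A minimal sketch of fitting a VAR on two related series, assuming statsmodels and pandas; the two series are simulated so that each depends on the other's past:

```python
# Fit a VAR on two mutually dependent series and forecast a few steps ahead.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
n = 300
a = np.zeros(n)
b = np.zeros(n)
for t in range(1, n):
    a[t] = 0.6 * a[t - 1] + 0.2 * b[t - 1] + rng.normal()
    b[t] = 0.3 * a[t - 1] + 0.5 * b[t - 1] + rng.normal()

df = pd.DataFrame({"a": a, "b": b})
res = VAR(df).fit(maxlags=5, ic="aic")  # lag order chosen by AIC
print(res.summary())
print(res.forecast(df.values[-res.k_ar:], steps=5))
```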

30
Q

What is seasonal ARIMA (SARIMA)?

A

A SARIMA model assumes multiplicative seasonality and postulates that the seasonal behaviour itself can be thought of as an ARIMA process.
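One possible sketch of fitting a SARIMA model with a period-12 (monthly) seasonal component, assuming statsmodels; the series and orders are illustrative:

```python
# Fit a SARIMA model via SARIMAX: (p, d, q) non-seasonal, (P, D, Q, s) seasonal.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
months = np.arange(240)
# Toy monthly series: trend + seasonality + noise.
y = 0.05 * months + 2.0 * np.sin(2 * np.pi * months / 12) + rng.normal(size=240)

res = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(res.summary())
```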

31
Q

What are the ARCH and GARCH families of models?

A

Autoregressive Conditional Heteroskedasticity (ARCH) models are used for time series that don't have constant variance (e.g. stock prices), where the variance appears autoregressive, conditional on earlier variances (for example, high-volatility days on the stock exchange occur in clusters).

In these models, the variance of a process is modelled rather than the process itself.
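A minimal sketch of fitting a GARCH(1, 1) model, assuming the third-party arch package is installed; the return series here is a placeholder, not real data:

```python
# Fit a GARCH(1, 1) model; the conditional variance is what gets modelled.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.normal(scale=1.0, size=1000)  # placeholder return series

res = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")
print(res.summary())
print(res.conditional_volatility[:5])  # fitted volatility for the first days
```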

32
Q

What are the advantages of statistical models of time series?

A
  • These models are simple and transparent, so they can be understood clearly in terms of their parameters.
  • Because of the simple mathematical expressions that define these models, it is possible to derive their properties of interest in a rigorous statistical way.
  • You can apply these models to fairly small data sets and still get good results.
  • These simple models and related modifications perform extremely well, even in comparison to very complicated machine learning models. So you get good performance without the danger of overfitting.
  • Well-developed automated methodologies for choosing orders of your models and estimating their parameters make it simple to generate these forecasts.
33
Q

What are the disadvantages of statistical models of time series?

A
  • Because these models are quite simple, they don’t always improve performance when given large data sets. If you are working with extremely large data sets, you may do better with the complex models of machine learning and neural network methodologies.
  • These statistical models put the focus on point estimates of the mean value of a distribution rather than on the distribution. True, you can derive sample variances and the like as some proxy for uncertainty in your forecasts, but your fundamental model offers only limited ways to express uncertainty relative to all the choices you make in selecting a model.
  • By definition, these models are not built to handle nonlinear dynamics and will do a poor job describing data where nonlinear relationships are dominant.