Topics 24-30 Flashcards
Mean Squared Error and Model Selection
Mean squared error (MSE) is a statistical measure computed as the sum of squared residuals divided by the total number of observations in the sample.
The MSE is based on in-sample data. The regression model with the smallest MSE is also the model with the smallest sum of squared residuals.
MSE is closely related to the coefficient of determination (R2). Notice in the R2 equation that the numerator is simply the sum of squared residuals (SSR), which is identical to the MSE numerator.
Model selection is one of the most important criteria in forecasting data. Unfortunately, selecting the best model based on the highest R2 or smallest MSE is not effective in producing good out-of-sample forecasting models. A better methodology to select the best forecasting model is to find the model with the smallest out-of-sample, one-step-ahead MSE.
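To make the distinction concrete, here is a minimal Python sketch (the data, the polynomial-trend candidate models, and the expanding-window scheme are illustrative assumptions, not from the source). It contrasts in-sample MSE, which can only fall as the model grows, with the out-of-sample, one-step-ahead MSE used for model selection:

```python
import numpy as np

# Hypothetical data: a linear trend plus noise (illustration only).
rng = np.random.default_rng(42)
T = 120
t = np.arange(T)
y = 5.0 + 0.3 * t + rng.normal(scale=2.0, size=T)

def one_step_ahead_mse(y, degree, start=60):
    """Expanding-window, out-of-sample one-step-ahead MSE for a
    polynomial trend model of the given degree."""
    errors = []
    for end in range(start, len(y)):
        coefs = np.polyfit(np.arange(end), y[:end], deg=degree)  # fit on data through end-1
        forecast = np.polyval(coefs, end)                        # one-step-ahead forecast for period end
        errors.append(y[end] - forecast)
    return np.mean(np.square(errors))

for degree in (1, 2, 5):
    in_sample_resid = y - np.polyval(np.polyfit(t, y, deg=degree), t)
    print(degree,
          "in-sample MSE:", round(np.mean(in_sample_resid ** 2), 3),
          "out-of-sample 1-step MSE:", round(one_step_ahead_mse(y, degree), 3))
```

With data like this, the larger model typically wins on in-sample MSE but does no better, or worse, on the out-of-sample measure.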
The s2 Measure and Adjusted R2
Akaike information criterion (AIC) and the Schwarz information criterion (SIC)
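In one common textbook presentation (Diebold's forms, with T observations, k estimated parameters, and regression residuals e_t; other texts state the AIC and SIC in log-likelihood form), these criteria are:

```latex
s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T - k}, \qquad
\bar{R}^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2 / (T - k)}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T - 1)}

\mathrm{AIC} = e^{2k/T}\,\frac{\sum_{t=1}^{T} e_t^2}{T}, \qquad
\mathrm{SIC} = T^{k/T}\,\frac{\sum_{t=1}^{T} e_t^2}{T}
```

All three penalize degrees of freedom more heavily than R2 does, and the SIC's penalty factor T^(k/T) grows the fastest.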
Explain the necessary conditions for a model selection criterion to demonstrate consistency
Consistency is a key property that is used to compare different selection criteria.
Two conditions are required for a model selection criterion to be considered consistent, depending on whether the true model is included among the regression models being considered.
- When the true model or data generating process (DGP) is one of the defined regression models, then the probability of selecting the true model approaches one as the sample size increases.
- When the true model is not one of the defined regression models being considered, then the probability of selecting the best approximation model approaches one as the sample size increases.
In reality, the second condition of consistency is the more relevant one: all of our models are most likely false, so we are seeking the best approximation.
The most consistent selection criterion, with the greatest penalty factor for degrees of freedom, is the SIC.
While the SIC is considered the most consistent criterion, the AIC is still a useful measure. If the true model may be much more complicated than any of the models under consideration, then the AIC should be examined. Asymptotic efficiency is the property of choosing the regression model whose one-step-ahead forecast error variance is closest to the variance of the true model. Interestingly, the AIC is asymptotically efficient while the SIC is not.
Note that the SIC is consistent only if the true model or its best approximation is in the set of models being evaluated. This is rarely the case, since the true DGP and its approximations are much more complicated than any of the models that we can fit (and handle). We therefore introduce another desirable property: asymptotic efficiency.
An asymptotically efficient model selection criterion chooses a sequence of models (as the sample size grows) whose one-step-ahead forecast error variance approaches that of the true model (assuming it is known) at a rate at least as fast as that of any other selection criterion. The AIC, although inconsistent, is asymptotically efficient, while the SIC is not.
Two approaches for modeling and forecasting a time series impacted by seasonality
There are two approaches for modeling and forecasting a time series impacted by seasonality:
- using a seasonally adjusted time series and
- regression analysis with seasonal dummy variables.
A seasonally adjusted time series is created by removing the seasonal variation from the data. This type of adjustment is commonly made in macroeconomic forecasting where the goal is to only measure the nonseasonal fluctuations of a variable. However, the use of seasonal adjustments in business forecasting is usually inappropriate because seasonality often accounts for large variations in a time series. Financial forecasters should be interested in capturing all variation in a time series, not just the nonseasonal portions.
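A minimal sketch of the dummy-variable approach in Python (statsmodels), using made-up quarterly data; the variable names and the seasonal pattern are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical quarterly sales with trend and a seasonal pattern (illustration only).
rng = np.random.default_rng(0)
n = 80
df = pd.DataFrame({
    "t": np.arange(n),
    "quarter": np.tile([1, 2, 3, 4], n // 4),
})
df["sales"] = (100 + 0.5 * df["t"]
               + np.tile([10, -5, 3, -8], n // 4)       # true seasonal effects
               + rng.normal(scale=2.0, size=n))

# Regression with a trend and a set of seasonal dummies (C(quarter)).
model = smf.ols("sales ~ t + C(quarter)", data=df).fit()
print(model.params)   # intercept, seasonal effects relative to Q1, and trend slope
```

The fitted seasonal coefficients recover the pattern directly, so the forecaster keeps the seasonal variation rather than removing it.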
Explain how to construct an h-step-ahead point forecast
Autoregression
Autoregression refers to the process of regressing a variable on lagged or past values of itself. As you will see in the next topic, when the dependent variable for a time series is regressed against one or more lagged values of itself, the resultant model is called an autoregressive (AR) model. For example, the sales for a firm could be regressed against the sales for the firm in the previous month. Thus, in an autoregressive time series, past values of a variable are used to predict the current (and hence future) value of the variable.
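For example, a minimal AR(1) sketch in Python (simulated data; the coefficient 0.7 is an illustrative assumption):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Hypothetical AR(1) data: each value depends on its own previous value plus noise.
rng = np.random.default_rng(1)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Regress the series on one lag of itself: y_t = c + phi * y_{t-1} + e_t.
res = AutoReg(y, lags=1).fit()
print(res.params)                                  # intercept and phi (roughly 0.7)
print(res.predict(start=len(y), end=len(y)))       # one-step-ahead forecast
```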
Covariance stationary time series
A time series is covariance stationary if its mean, variance, and covariances with lagged and leading values do not change over time. Covariance stationarity is a requirement for using AR models.
Autocovariance function
Autocovariance function refers to the tool used to quantify the stability of the covariance structure: for each displacement (lag) τ, it gives the covariance between the series and its own value τ periods earlier. Its importance lies in its ability to summarize the cyclical dynamics of a covariance stationary series.
Autocorrelation function
Autocorrelation function refers to the degree of correlation and interdependence between observations in a time series at each displacement. Autocorrelations are used because correlations lend themselves to clearer interpretation than covariances.
Recall that correlation is measured on a scale from -1 to +1, whereas covariances are unbounded, which makes them unwieldy for judging the strength of association.
The Durbin-Watson statistic - what it is and how to use
The Durbin-Watson statistic falls between zero and four.
- Two indicates no autocorrelation; aka, no serial correlation.
- Zero is perfect positive autocorrelation, and four is perfect negative autocorrelation.
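A small sketch, assuming statsmodels and simulated regression errors that follow an AR(1) process, shows the statistic flagging positive autocorrelation:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical regression whose errors follow an AR(1) process (illustration only).
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()           # positively autocorrelated errors
y = 1.0 + 2.0 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), roughly 2 * (1 - rho_hat):
# near 2 means no serial correlation; well below 2 signals positive autocorrelation.
print(durbin_watson(res.resid))
```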
Partial autocorrelation function
Partial autocorrelation function refers to the partial correlation and interdependence between observations in a time series; it measures the association between observations at a given displacement after controlling for the effects of the intervening lagged observations.
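A quick sketch of both functions in Python (the simulated AR(1) series is an illustrative assumption): for an AR(1) process the ACF decays gradually while the PACF cuts off after the first lag.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

# Hypothetical AR(1) series: the ACF decays gradually with displacement,
# while the PACF cuts off after lag 1.
rng = np.random.default_rng(3)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.6 * y[t - 1] + rng.normal()

print("ACF :", np.round(acf(y, nlags=5, fft=True), 2))
print("PACF:", np.round(pacf(y, nlags=5), 2))
```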
Requirements for a series to be covariance stationary
A time series is covariance stationary if it satisfies the following three conditions:
- Constant and finite expected value. The expected value of the time series is constant over time.
- Constant and finite variance. The time series volatility around its mean (i.e., the distribution of the individual observations around the mean) does not change over time.
- Constant and finite covariance between values at any given lag. The covariance of the time series with leading or lagged values of itself is constant.
Explain the implications of working with models that are not covariance stationary
The requirements for covariance stationarity, though strict in appearance, still accommodate many series that are not covariance stationary in their raw form. This is achieved by working with models that give special treatment to the nonstationary trend and seasonality components, so that the remaining, or residual, cyclical component is covariance stationary.
A nonstationary series can be transformed to appear covariance stationary by using transformed data, such as growth rates.
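A one-line sketch of the growth-rate transformation in Python (the simulated price-level series is an illustrative assumption):

```python
import numpy as np

# Hypothetical price level that trends upward, so the raw series is not
# covariance stationary (its mean changes over time).
rng = np.random.default_rng(4)
level = 100 * np.exp(np.cumsum(0.01 + 0.02 * rng.normal(size=300)))

# Log differences approximate period-over-period growth rates; the transformed
# series is typically much closer to covariance stationary than the levels.
growth = np.diff(np.log(level))
print(growth[:5])
```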
White noise process
A time series process with a zero mean, constant variance, and no serial correlation is referred to as a white noise process (or zero-mean white noise). This is the simplest type of time series process and it is used as a fundamental building block for more complex time series processes. Even though a white noise process is serially uncorrelated, it may not be serially independent or normally distributed.
Variants of a white noise process include independent white noise and normal white noise. A time series process that exhibits both serial independence and a lack of serial correlation is referred to as independent white noise (or strong white noise). A time series process that exhibits serial independence, is serially uncorrelated, and is normally distributed is referred to as normal white noise (or Gaussian white noise).
The dynamic structure of a white noise process includes the following characteristics:
- The unconditional mean and variance must be constant for any covariance stationary process.
- The lack of any correlation in white noise means that all autocovariances and autocorrelations are zero beyond displacement zero (displacement here means the lag, or time distance, between observations). This same result holds for the partial autocorrelation function of white noise.
- Both conditional and unconditional means and variances are the same for an independent white noise process (i.e., they lack any forecastable dynamics).
- Events in a white noise process exhibit no correlation between the past and present.
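These properties are easy to see by simulation; a minimal sketch in Python (Gaussian white noise is the illustrative choice):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

# Gaussian white noise: zero mean, constant variance, no serial correlation.
rng = np.random.default_rng(5)
eps = rng.normal(loc=0.0, scale=1.0, size=1000)

# All sample autocorrelations beyond displacement zero should be near zero.
print(np.round(acf(eps, nlags=5, fft=True), 2))
```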
Why understanding white noise is tremendously important (two reasons)?
Understanding white noise is tremendously important for at least two reasons.
- First, processes with much richer dynamics are built up by taking simple transformations of white noise.
- Second, 1-step-ahead forecast errors from good models should be white noise. After all, if such forecast errors aren’t white noise, then they’re serially correlated, which means that they’re forecastable, and if forecast errors are forecastable then the forecast can’t be very good.
Lag operator
The lag operator L shifts a series back one period: Ly_t = y_(t-1), and more generally L^k y_t = y_(t-k). Polynomials in L provide a compact way to write distributed-lag models such as Wold’s representation below.
Wold’s representation theorem
Wold’s representation theorem applies to the covariance stationary residual (i.e., to what remains after provisions have been made for trend and seasonal components). The theorem states that any such covariance stationary series can be written as an infinite distributed lag (moving average) of white noise terms, so it identifies the appropriate general model for the covariance stationary part of a series. The one-step-ahead forecast errors in this representation are known as innovations.
The general linear process is the building block for forecasting models of a covariance stationary time series. It uses Wold’s representation to express the series in terms of innovations that capture an evolving information set. This evolving information set moves the conditional mean over time (recall that stationarity only requires a constant unconditional mean). Thus, the process can display rich, time-varying dynamics while remaining covariance stationary.
As mentioned, applying Wold’s representation requires an infinite number of distributed lags, and it is not practical to model an infinite number of parameters. Therefore, we restate the lag model using infinite polynomials in the lag operator, because infinite polynomials do not necessarily contain an infinite number of parameters. Infinite polynomials that are a ratio of finite-order polynomials are known as rational polynomials, and the distributed lags constructed from them are known as rational distributed lags. With these lags we can approximate Wold’s representation; the autoregressive moving average (ARMA) process is the practical approximation to Wold’s representation.
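A minimal sketch of this approximation in Python (the ARMA(1,1) coefficients and sample size are illustrative assumptions): a handful of AR and MA parameters stand in for the infinite moving-average representation.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical ARMA(1,1): two parameters stand in for the infinite
# moving-average (Wold) representation of a covariance stationary series.
ar = [1, -0.5]    # AR lag polynomial 1 - 0.5L (ArmaProcess sign convention)
ma = [1, 0.4]     # MA lag polynomial 1 + 0.4L
y = ArmaProcess(ar, ma).generate_sample(nsample=500)

res = ARIMA(y, order=(1, 0, 1)).fit()
print(res.params)   # estimated AR and MA coefficients (near 0.5 and 0.4)
```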
Calculate the sample mean and sample autocorrelation, and describe the Box-Pierce Q-statistic and the Ljung-Box Q-statistic
Testing for white noise formulae
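In standard notation (following Diebold, with T observations and the Q-statistics computed over the first m sample autocorrelations):

```latex
\bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t, \qquad
\hat{\rho}(\tau) = \frac{\sum_{t=\tau+1}^{T} (y_t - \bar{y})(y_{t-\tau} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

Q_{BP} = T \sum_{\tau=1}^{m} \hat{\rho}^2(\tau), \qquad
Q_{LB} = T \sum_{\tau=1}^{m} \left( \frac{T+2}{T-\tau} \right) \hat{\rho}^2(\tau)
```

Under the null hypothesis that the series is white noise, both Q-statistics are approximately distributed chi-squared with m degrees of freedom.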
What test can be used to check the hypothesis of no seasonality if the regression disturbances are white noise?
If the regression disturbances are white noise, the standard F-test can be used to test the hypothesis of no seasonality.
The hypothesis of no seasonality, in which case you could drop the seasonal dummies, corresponds to equal seasonal coefficients across seasons, which is a set of (s-1) linear restrictions. This is a standard F-test, but keep in mind that the test’s legitimacy requires that the regression disturbances be white noise, which may well not hold in a regression on only trend and seasonals. Otherwise, the F-statistic will not, in general, have the F distribution.
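A minimal sketch of the test in Python (statsmodels), with made-up quarterly data; the restricted model drops the seasonal dummies, which imposes the (s-1) restrictions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical quarterly data with trend and seasonality (illustration only).
rng = np.random.default_rng(7)
n = 80
df = pd.DataFrame({"t": np.arange(n), "quarter": np.tile([1, 2, 3, 4], n // 4)})
df["y"] = (50 + 0.4 * df["t"]
           + np.tile([6, -2, 1, -5], n // 4)
           + rng.normal(scale=2.0, size=n))

unrestricted = smf.ols("y ~ t + C(quarter)", data=df).fit()   # trend + seasonal dummies
restricted = smf.ols("y ~ t", data=df).fit()                  # no seasonality imposed

# F-test of the (s-1) = 3 restrictions; legitimate only if the unrestricted
# model's disturbances are white noise.
f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f_stat, p_value, df_diff)
```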
The Ljung-Box Q-statistic vs. the Box-Pierce Q-statistic
The Ljung-Box Q-statistic is effectively similar to the Box-Pierce Q-statistic, except it is meant for small samples.
A slight modification of the Box-Pierce Q-statistic, designed to follow the chi-squared distribution more closely in small samples, is the Ljung-Box Q-statistic. Under the null hypothesis that y is white noise, the Ljung-Box Q-statistic is approximately distributed as a chi-squared random variable. Note that the Ljung-Box Q-statistic is the same as the Box-Pierce Q-statistic, except that the sum of squared autocorrelations is replaced by a weighted sum of squared autocorrelations, where the weights are (T+2)/(T-τ). For moderate and large T, the weights are approximately 1, so the Ljung-Box statistic differs little from the Box-Pierce statistic.
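Both statistics are available from a single statsmodels call; a small sketch (the simulated series and the lag choice are illustrative assumptions):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# A small white-noise-like sample, where the Ljung-Box weighting matters most.
rng = np.random.default_rng(8)
eps = rng.normal(size=100)

# boxpierce=True reports the Box-Pierce statistic alongside the Ljung-Box one;
# under the white noise null both should be small, with large p-values.
print(acorr_ljungbox(eps, lags=[10], boxpierce=True))
```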
Describe the properties of the first-order moving average (MA(1)) process, and distinguish between autoregressive representation and moving average representation.
Moving Average: 1st Order (MA(1)) - basic properties
Describe the properties of a general finite-order process of order q (MA(q)) process.
Moving Average: q Order (MA(q)) - basic properties
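A sketch of the defining property these cards point to (the MA(1) coefficient 0.6 and the sample size are illustrative assumptions): an MA(1), y_t = e_t + θ·e_(t-1), has autocorrelation θ/(1+θ²) at displacement 1 and zero beyond it, and an MA(q) process cuts off after displacement q.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

# MA(1): y_t = e_t + 0.6 * e_{t-1}.  Autocorrelation at displacement 1 is
# theta / (1 + theta^2) = 0.6 / 1.36, roughly 0.44; beyond displacement 1 it is zero.
ma1 = ArmaProcess(ar=[1], ma=[1, 0.6]).generate_sample(nsample=5000)
print(np.round(acf(ma1, nlags=4, fft=True), 2))   # approximately [1, 0.44, 0, 0, 0]
```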