Lecture 18 Flashcards
(22 cards)
Forecast is not a prediction
Forecasting is out of sample - you’re using data up to time T to predict future values like Y_{T+1}, etc.
- Ŷ_{T+h|T} is the forecast of Y at time T + h, using data up to time T
- forecast error = Y_{T+h} - Ŷ_{T+h|T} (actual value minus forecast)
- forecasts can be one-step ahead or multi-step ahead
Forecasting is about using past data to predict the future
MSFE - Mean Squared Forecast Error
Measures the average squared difference between the actual future value and your forecast
- MSFE = E[(Y_{T+1} - Ŷ_{T+1|T})²]
- squaring errors penalises big mistakes more than small ones
Decomposition shows two sources of forecast error:
1. Oracle error - due to the unpredictable future shock u_{T+1}
2. Estimation error - the estimated coefficients deviate from the true ones
RMSFE - Root MSFE
- just the square root of the MSFE - interpreted like a typical forecast error, but hard to get directly as we don’t know future values like Y_{T+1}
- if we assume stationarity and no estimation error, we can approximate it with:
RMSFE_SER = sqrt(SSR / (T - n - 1)) - if the data are stationary, forecast errors have mean zero and the RMSFE can be estimated from the regression’s residual variance; this ignores estimation error, but if the sample size is large relative to the number of predictors, that’s often okay
FPE - Final Prediction Error
Adjusts the RMSFE to include estimation error; applies only if the data are stationary and homoskedastic
- RMSFE_FPE = sqrt((SSR / T) × ((T + n + 1) / (T - n - 1)))
- T is the number of observations, n is the number of predictors
- the previous SER version understates forecast error by ignoring estimation uncertainty; this tries to fix that, but still relies on strong assumptions, like errors being homoskedastic and the data being stationary
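A minimal sketch of these two formula-based RMSFE estimates, assuming you already have the SSR from a fitted forecasting regression (the function names and the numbers plugged in are purely illustrative):

```python
import numpy as np

def rmsfe_ser(ssr: float, T: int, n: int) -> float:
    """SER-based estimate: sqrt(SSR / (T - n - 1)); ignores estimation error."""
    return np.sqrt(ssr / (T - n - 1))

def rmsfe_fpe(ssr: float, T: int, n: int) -> float:
    """FPE-based estimate: sqrt((SSR / T) * (T + n + 1) / (T - n - 1)),
    which inflates the SER version to account for estimation error."""
    return np.sqrt((ssr / T) * (T + n + 1) / (T - n - 1))

# Illustrative numbers: T = 200 observations, n = 4 predictors, SSR = 150
print(rmsfe_ser(150.0, T=200, n=4))   # ~0.88
print(rmsfe_fpe(150.0, T=200, n=4))   # slightly larger, ~0.89
```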
POOS - Pseudo Out-of-Sample explanation
- doesn’t require strong assumptions and captures both estimation and forecast error
- most honest/ realistic forecast evaluation
Avoids the unrealistic assumptions of SER and FPE and mimics real forecasting conditions: at each date you only use information that would have been available then, and by re-estimating the model each time it naturally incorporates estimation error
How POOS works
- Split the sample - use the first 90% of data for model estimation, final 10% for forecasting
- Re-estimate your model each time - for each date s, fit the model using data up to s
- Forecast one step ahead - predict Y_{s+1} using the model fitted through date s, giving Ŷ_{s+1|s}
- Compute the forecast error - Y_{s+1} - Ŷ_{s+1|s}
- Compute the POOS RMSFE: sqrt((1/P) × SUM(forecast error squared)), where P is the number of pseudo out-of-sample forecasts (see the sketch below)
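A minimal sketch of the POOS loop for a one-step-ahead AR(1) forecast, assuming y is a 1-D numpy array holding a stationary series (the AR(1) model and the 90/10 split are illustrative choices, not the only option):

```python
import numpy as np

def poos_rmsfe_ar1(y: np.ndarray, estimation_frac: float = 0.9) -> float:
    """Pseudo out-of-sample RMSFE for a one-step-ahead AR(1),
    re-estimated using only the data available at each forecast date."""
    T = len(y)
    start = int(estimation_frac * T)          # forecast the final ~10% of dates
    errors = []
    for s in range(start, T):
        past = y[:s]                                      # data available when forecasting y[s]
        X = np.column_stack([np.ones(s - 1), past[:-1]])  # intercept + first lag
        b = np.linalg.lstsq(X, past[1:], rcond=None)[0]   # re-estimate the AR(1) by OLS
        y_hat = b[0] + b[1] * past[-1]                    # one-step-ahead forecast of y[s]
        errors.append(y[s] - y_hat)                       # pseudo out-of-sample forecast error
    errors = np.asarray(errors)
    return np.sqrt(np.mean(errors ** 2))      # sqrt((1/P) * sum of squared forecast errors)

# Illustrative use on a simulated stationary AR(1) series
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.5 * y[t - 1] + rng.normal()
print(poos_rmsfe_ar1(y))                      # close to the shock standard deviation of 1
```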
Forecast intervals - key points
If the forecast error is normally distributed, we can build a 95% forecast interval around our prediction:
- Ŷ_{T+1|T} ± 1.96 × (estimated RMSFE)
- This is NOT a confidence interval: Y_{T+1} is a future random variable, so we’re capturing outcome uncertainty, not parameter uncertainty
- Strictly valid only if u_{T+1} is normal, but in practice the approximation often works reasonably well too
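A tiny sketch of the interval itself; y_hat and rmsfe_hat stand in for whatever point forecast and RMSFE estimate (SER, FPE or POOS based) you have computed:

```python
# 95% forecast interval around a point forecast, assuming roughly normal forecast errors
y_hat = 2.4          # illustrative point forecast of Y_{T+1}
rmsfe_hat = 0.9      # illustrative estimated RMSFE

lower, upper = y_hat - 1.96 * rmsfe_hat, y_hat + 1.96 * rmsfe_hat
print((lower, upper))   # (0.636, 4.164) - an interval for the future outcome, not for a parameter
```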
Forecast intervals for transformations
- e.g. Δln(IP_t) as the dependent variable
Often in time series we model transformations rather than raw levels, but what if we want the forecast in levels rather than in growth rates?
- Forecast the change in logs, to get percentage change
- Convert this forecast back to levels
- Use the RMSFE to build a forecast interval for the transformed regression, then convert the bounds back into levels using step 2
To convert:
ÎP_{T+1} = IP_T × (1 + ΔÎP_{T+1}), i.e. scale the current level by one plus the forecast growth rate; if the % changes are not normal, then correct for the variance (see the sketch below)
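A minimal sketch of the conversion, assuming the regression forecasts the one-period change in ln(IP) (the numbers are illustrative, and exp() is used instead of the 1 + x approximation, which matters little for small changes):

```python
import numpy as np

ip_T = 120.0        # illustrative current level of the series
dlog_hat = 0.010    # forecast change in ln(IP), roughly a 1% growth rate
rmsfe_hat = 0.020   # estimated RMSFE of the log-change forecast

# Build the interval in the transformed (log-change) units first ...
lo_dlog, hi_dlog = dlog_hat - 1.96 * rmsfe_hat, dlog_hat + 1.96 * rmsfe_hat

# ... then convert the point forecast and the bounds back to levels
ip_hat = ip_T * np.exp(dlog_hat)                           # ~ ip_T * (1 + dlog_hat)
lo_level, hi_level = ip_T * np.exp(lo_dlog), ip_T * np.exp(hi_dlog)
print(ip_hat, (lo_level, hi_level))
```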
Forecasting oil prices using time series methods
- Model selection - AR, ADL, etc.; use tools like BIC or AIC to pick the lag length and variables
- Checking for breaks - Chow test if you know roughly where the break is, QLR test to detect an unknown break date
- Point forecast - forecast the log change, convert to a % change, then back to the actual level
- Forecast interval in levels - use the RMSFE, assuming small changes and normality
- Choosing the right RMSFE - SER, FPE or POOS
Prediction in a big data or high-dimensional setting
- in traditional regressions we usually have fewer predictors than observations (k < N)
- in big data we may have many predictors, sometimes even more predictors than data points, which can make OLS unreliable due to overfitting
What’s an estimation sample and what’s a holdout sample?
- formalised predictive regression setup - MLR
Estimation Sample: data used to estimate/fit your model
Holdout Sample: used for out of sample evaluation, crucial for forecasting, as in-sample fit doesn’t tell us about predictive power
assume the holdout sample comes from the same distribution as the estimation sample, otherwise out-of-sample performance isn’t meaningful
- from the holdout sample we can compute the MSPE
MSPE - Mean Squared Prediction Error
MSPE_OLS ≈ (1 + k/N) × σ² under homoskedasticity
- the more predictors you use (k), the worse your out-of-sample prediction can be, unless you have lots of data (N)
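A small simulation sketch of that formula, assuming homoskedastic errors with σ² = 1; the design (irrelevant standard-normal predictors, N = 500, k = 25) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, sigma2, reps, holdout = 500, 25, 1.0, 300, 4000

mspe = 0.0
for _ in range(reps):
    X = rng.normal(size=(N, k))
    y = rng.normal(scale=np.sqrt(sigma2), size=N)        # true coefficients set to zero for simplicity
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS on the estimation sample
    X_new = rng.normal(size=(holdout, k))                # fresh draws play the role of a holdout sample
    y_new = rng.normal(scale=np.sqrt(sigma2), size=holdout)
    mspe += np.mean((y_new - X_new @ beta_hat) ** 2) / reps

print(mspe)                    # empirical out-of-sample MSPE, roughly 1.05
print((1 + k / N) * sigma2)    # theoretical (1 + k/N) * sigma^2 = 1.05
```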
How to estimate the MSPE using cross validation
- m-fold cross validation
Simulation of an out-of-sample testing environment
1. Split data into m chunks
2. For each chunk, estimate the model on (1-1/m).N observations, then predict the remaining N/m observations
3. Rotate through all m folds so each observation is used once for testing
4. Compute prediction errors for all test predictions, average them to get your MSPE
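A minimal sketch of m-fold cross-validation for a linear regression’s MSPE, using scikit-learn’s KFold and LinearRegression on simulated data (m = 5 and the data-generating process are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
N, k = 200, 10
X = rng.normal(size=(N, k))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=N)   # only the first two predictors matter

m = 5
sq_errors = []
for train_idx, test_idx in KFold(n_splits=m, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])   # fit on (1 - 1/m)*N observations
    pred = model.predict(X[test_idx])                            # predict the held-out N/m observations
    sq_errors.extend((y[test_idx] - pred) ** 2)                  # squared prediction errors

print(np.mean(sq_errors))   # cross-validated MSPE estimate (close to the error variance of 1 here)
```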
Why not use SER for model fit?
- SER is in sample, only tells you how well the model fits the data it was trained on
- MSPE is out of sample, tells you how well the model predicts new data
How is m fold cross validation like POOS?
- you pretend to be in real time, re-estimating and predicting forward
- cross validation does the same, but instead of a time sequence, splits data randomly or sequentially depending on context.
Ridge regression
A regularisation technique used when you have many predictors, maybe more than observations, but at least enough that OLS starts to break down
- Ridge objective: SSR + λ·SUM(b_j²) (using λ for the penalty parameter, to avoid confusion with k, the number of predictors)
- second term is the penalty, proportional to the sum of squared coefficients
- if λ = 0 we’re back to plain OLS (just minimising the SSR); if λ is large, coefficients shrink towards 0; λ is chosen as the value which minimises the cross-validated MSPE
- discourages large coefficients
- low λ behaves like OLS; high λ means strong shrinkage, with coefficients pulled towards 0
How to choose λ for ridge
- Pick a range of values of λ to try
- Use K-fold cross-validation: split into K folds, use K-1 folds to estimate the model, use the held-out fold to predict and compute the MSPE for each candidate λ
- Average the MSPEs across the K folds
- Choose the λ with the lowest average MSPE across folds - the optimal shrinkage level (see the sketch below)
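A minimal sketch of that procedure with scikit-learn, where the penalty parameter λ is called alpha; the grid of candidate values and the simulated data are illustrative:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
N, k = 100, 60
common = rng.normal(size=(N, 1))                        # shared factor makes predictors correlated
X = common + rng.normal(size=(N, k))
y = X @ rng.normal(scale=0.2, size=k) + rng.normal(size=N)

X_std = StandardScaler().fit_transform(X)               # standardise so one lambda suits all coefficients
alphas = np.logspace(-3, 3, 25)                         # grid of candidate shrinkage levels (lambda)
ridge = RidgeCV(alphas=alphas, scoring="neg_mean_squared_error", cv=5).fit(X_std, y)

print(ridge.alpha_)                 # lambda with the lowest cross-validated MSPE
print(np.abs(ridge.coef_).min())    # coefficients are shrunk but none is exactly zero
```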
Lasso regression
The first term is the same, but the second term is λ·SUM(|b_j|)
- forces small coefficients to 0, creating sparsity
- use when you have lots of predictors and you think many may be irrelevant
- with ridge, all coefficients are shrunk smoothly towards 0, but never exact
- with lasso, the geometry of the absolute-value penalty means some coefficients are forced to be exactly 0
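A minimal sketch of the contrast with ridge, using scikit-learn’s LassoCV on simulated data where most predictors are irrelevant (the sparsity pattern is illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
N, k = 150, 40
X = rng.normal(size=(N, k))
beta = np.zeros(k)
beta[:3] = [2.0, -1.5, 1.0]            # only 3 of the 40 predictors actually matter
y = X @ beta + rng.normal(size=N)

lasso = LassoCV(cv=5).fit(X, y)        # penalty (lambda, called alpha here) chosen by 5-fold CV
print(lasso.alpha_)                    # chosen penalty
print(np.sum(lasso.coef_ != 0))        # number of non-zero coefficients - most are forced to exactly 0
```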
Principal components regression
Instead of selecting variables, PCR transforms them:
- takes linear combinations of the original variables - called PCs
- combinations chosen to maximise variance, capture as much info from X as possible
- first PC captures the largest share of variance in X, etc
- only keeps the top p components, which explain most of the variance
- run OLS regression of y on these p PCs instead
- reduces dimensionality, avoids multicollinearity and overfitting, while keeping most of the predictive power
SO: max Var(SUM_i(a_i·X_i)) over the weights a, subject to each component being uncorrelated with the previous ones and the weights being normalised (see the sketch below)
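A minimal sketch of PCR with scikit-learn, chaining standardisation, PCA and OLS in a pipeline (keeping p = 5 components is an illustrative choice; p could also be picked by cross-validation):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
N, k = 120, 30
factor = rng.normal(size=(N, 1))
X = factor + 0.3 * rng.normal(size=(N, k))   # highly correlated predictors driven by one common factor
y = factor[:, 0] + 0.5 * rng.normal(size=N)

p = 5
pcr = make_pipeline(StandardScaler(), PCA(n_components=p), LinearRegression())
pcr.fit(X, y)                                # OLS of y on the first p principal components of X

print(pcr.named_steps["pca"].explained_variance_ratio_)   # share of Var(X) captured by each kept PC
print(pcr.score(X, y))                                    # in-sample R^2 of the PCR fit
```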
Difference between MSFE and MSPE
MSFE arises in time series forecasting - it measures how well a time series model predicts the future
MSPE is the more general predictive-modelling version - it measures how well a regression model predicts NEW observations
When to use ridge vs lasso
Ridge is best when predictors are many and highly correlated, but we think most are useful in some way
Lasso is best when we think many predictors are irrelevant and the true model is sparse
Why PCR?
Firstly, OLS breaks down when you have lots of predictors, so all of these strategies are ways to tame too many predictors so that forecasts are stable and accurate
- the problem is often not just that there are too many predictors, but that these predictors are highly correlated - something ridge and lasso don’t tackle directly
- PCR solves this by replacing the predictors with a smaller set of linear combinations of them