Lecture 18 Flashcards

(22 cards)

1
Q

Forecast is not a prediction

A

Forecasting is out of sample - you’re using data up to time T to predict future values like Y(T+1), etc.
- Y^(T+h|T) is the forecast of Y at time T + h, made using data up to time T
- forecast error = Y(T+h) - Y^(T+h|T), i.e. the actual outcome minus the forecast
- forecasts can be one-step ahead or multi-step ahead

Forecasting is about using past data to predict the future

2
Q

MSFE - Mean Squared Forecast Error

A

Measures the average squared difference between the actual future value and your forecast
- MSFE = E[(Y(T+1) - Y^(T+1|T))^2]
- squaring errors penalises big mistakes more than small ones

Decomposition shows two sources of forecast error:
1. Oracle error - due to the unpredictable future shock u(T+1)
2. Estimation error - because the coefficients are estimated and deviate from the true ones
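In practice the MSFE is a population expectation; a minimal sketch of its sample analogue (numpy assumed, all numbers hypothetical), given some realised outcomes and the forecasts that were made for them:

```python
import numpy as np

# hypothetical realised future values and the one-step-ahead forecasts made for them
actual = np.array([2.1, 1.8, 2.5, 2.0])     # Y(T+1), Y(T+2), ...
forecast = np.array([2.0, 2.0, 2.2, 1.9])   # Y^(T+1|T), Y^(T+2|T+1), ...

errors = actual - forecast                   # forecast errors
msfe = np.mean(errors ** 2)                  # sample analogue of E[(Y - Y^)^2]
rmsfe = np.sqrt(msfe)                        # back in the units of Y
print(msfe, rmsfe)
```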

3
Q

RMSFE - Root MSFE

A
  • just the square root of the MSFE - interpreted like a typical forecast error, but hard to obtain directly as we don’t know future values such as Y(T+1)
  • if we assume stationarity and no estimation error, we can approximate it with the SER:
    RMSFE_SER = sqrt(SSR / (T - n - 1))
  • if the data are stationary, forecast errors have mean zero, so the RMSFE can be estimated from the regression’s residual variance; this ignores estimation error, but if the sample size is large relative to the number of predictors that’s often okay
4
Q

FPE - Final Prediction Error

A

Adjusts the RMSFE to include estimation error; applies only if the data are stationary and the errors homoskedastic
- RMSFE_FPE = sqrt((SSR / T) × ((T + n + 1) / (T - n - 1)))
- T is the number of observations, n is the number of predictors
- the SER version understates the forecast error by ignoring estimation uncertainty; the FPE version tries to fix that but still relies on strong assumptions, like homoskedastic errors and a stationary model (see the sketch below)
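A tiny numerical sketch (numpy assumed; T, n and SSR are hypothetical placeholders for a fitted regression's sample size, number of predictors and sum of squared residuals) comparing the SER-based estimate from the previous card with the FPE-based estimate above:

```python
import numpy as np

T, n = 200, 3        # hypothetical sample size and number of predictors
SSR = 150.0          # hypothetical sum of squared residuals from the fitted model

# SER-based approximation: ignores estimation error
rmsfe_ser = np.sqrt(SSR / (T - n - 1))

# FPE-based approximation: inflates the estimate to account for estimation error
rmsfe_fpe = np.sqrt((SSR / T) * (T + n + 1) / (T - n - 1))

print(rmsfe_ser, rmsfe_fpe)   # the FPE version is slightly larger
```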

5
Q

POOS - Pseudo Out-of-Sample explanation

A
  • doesn’t require strong assumptions and captures both sources of forecast error (oracle and estimation error)
  • the most honest/realistic forecast evaluation

Avoids the unrealistic assumptions of the SER and FPE approaches and mimics real forecasting conditions: at each date you only use information that would have been available then, and by re-estimating the model each time it naturally incorporates estimation error

6
Q

How POOS works

A
  1. Split the sample - use the first 90% of the data for model estimation, the final 10% for forecasting
  2. Re-estimate your model each time - for each date s, fit the model using data only up to s
  3. Forecast one step ahead - predict Y(s+1) using the model fit through s, giving Y^(s+1|s)
  4. Compute the forecast error - Y(s+1) - Y^(s+1|s)
  5. Compute the POOS RMSFE: sqrt((1/P) × SUM of squared forecast errors), where P is the number of pseudo out-of-sample forecasts (see the sketch after this list)
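A minimal sketch of these steps for a one-step-ahead AR(1) forecast (numpy assumed; the simulated series and the 90/10 split are illustrative, not from the lecture):

```python
import numpy as np

# hypothetical AR(1) data: y_t = 0.5 * y_{t-1} + u_t
rng = np.random.default_rng(0)
T = 200
y = np.zeros(T)
u = rng.normal(size=T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + u[t]

split = int(0.9 * T)          # first 90% reserved purely for estimation
errors = []

for s in range(split, T - 1):
    # re-estimate the AR(1) by OLS using only data up to date s
    X = np.column_stack([np.ones(s), y[:s]])   # regressors: constant and y_{t-1}
    Y = y[1:s + 1]                             # dependent variable: y_t
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]

    # one-step-ahead forecast of y[s + 1] using information through date s
    y_hat = beta[0] + beta[1] * y[s]
    errors.append(y[s + 1] - y_hat)            # pseudo out-of-sample forecast error

poos_rmsfe = np.sqrt(np.mean(np.square(errors)))
print(poos_rmsfe)
```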
7
Q

Forecast intervals
- key points

A

If the forecast error is normally distributed, we can build a 95% forecast interval around our prediction:
- Y^(T+1|T) ± 1.96 × RMSFE^

  1. This is NOT a confidence interval: here Y(T+1) is a future random variable, so we’re capturing outcome uncertainty, not parameter uncertainty
  2. Strictly valid only if u(T+1) is normal, but in practice the approximation often works reasonably well
8
Q

Forecast intervals for transformations
- e.g. Δln(IPt) as the dependent variable

Often in time series we model transformations rather than raw levels, but what if we want the forecast in levels rather than in growth rates?

A
  1. Forecast the change in logs, which is approximately the percentage change
  2. Convert this forecast back to levels
  3. Use the RMSFE to build a forecast interval on the transformed (log-change) scale, then convert the bounds back into levels using the same conversion as in step 2 (see the sketch below)

To convert:
IP^(T+1) = IP(T) × (1 + ΔIP^(T+1)), where ΔIP^(T+1) is the forecast percentage change; if the % changes are not normal, correct for the variance
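A short sketch of the conversion (numpy assumed; the index level, the log-change forecast and the RMSFE are hypothetical numbers):

```python
import numpy as np

ip_T = 102.5        # hypothetical last observed level of the index
dlog_hat = 0.004    # forecast of the change in ln(IP), roughly a 0.4% growth rate
rmsfe = 0.010       # estimated RMSFE of the log-change forecast

# point forecast in levels: exact exp() conversion (≈ ip_T * (1 + dlog_hat) for small changes)
ip_hat = ip_T * np.exp(dlog_hat)

# 95% forecast interval built on the log-change scale, then converted back to levels
lower = ip_T * np.exp(dlog_hat - 1.96 * rmsfe)
upper = ip_T * np.exp(dlog_hat + 1.96 * rmsfe)
print(ip_hat, (lower, upper))
```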

9
Q

Forecasting oil prices using time series methods

A
  1. Model selection - AR, ADL, etc.; use tools like BIC or AIC to pick the lag length and variables
  2. Check for breaks - a Chow test if you know roughly where the break is, the QLR test to detect an unknown break date
  3. Point forecast - forecast the log change, convert to a % change, then back to the actual level
  4. Forecast interval in levels - use the RMSFE, assuming small changes and normality
  5. Choose the right RMSFE estimate - SER, FPE or POOS
10
Q

Prediction in a big data or high-dimensional setting

A
  • in traditional regressions we usually have far fewer predictors than observations, k < N
  • in big-data settings we may have many predictors, sometimes even more predictors than data points (k > N), which can make OLS unreliable due to overfitting
11
Q

What’s an estimation sample and what’s a holdout sample?
- formalised predictive regression setup - MLR

A

Estimation Sample: the data used to estimate/fit your model
Holdout Sample: the data used for out-of-sample evaluation; crucial for forecasting, as in-sample fit doesn’t tell us about predictive power

We assume the holdout sample comes from the same distribution as the estimation sample, otherwise out-of-sample performance isn’t meaningful
- the holdout sample is what lets us compute the MSPE

12
Q

MSPE - Mean Squared Prediction Error

A

MSPE_OLS = (1 + k/N) × σu^2 under homoskedasticity
- the more predictors you use (k), the worse your out-of-sample prediction gets, unless you have lots of data (N); e.g. with k = 100 predictors and N = 500 observations the MSPE is inflated by a factor of 1.2 relative to the oracle error variance σu^2

13
Q

How to estimate the MSPE using cross validation
- m-fold cross validation

A

A simulation of an out-of-sample testing environment:
1. Split the data into m chunks (folds)
2. For each chunk, estimate the model on the other (1 - 1/m) × N observations, then predict the remaining N/m observations
3. Rotate through all m folds so each observation is used exactly once for testing
4. Compute the prediction errors for all test predictions and average their squares to get your estimate of the MSPE (see the sketch after this list)
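A compact sketch of the procedure (numpy assumed; the data-generating process and m = 5 are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, m = 500, 10, 5
X = rng.normal(size=(N, k))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=N)    # hypothetical data-generating process

indices = rng.permutation(N)           # shuffle, then split into m roughly equal folds
folds = np.array_split(indices, m)
sq_errors = []

for test_idx in folds:
    train_idx = np.setdiff1d(indices, test_idx)      # the other (1 - 1/m) * N observations

    Xtr = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
    Xte = np.column_stack([np.ones(len(test_idx)), X[test_idx]])

    beta = np.linalg.lstsq(Xtr, y[train_idx], rcond=None)[0]   # fit on the training folds
    sq_errors.extend((y[test_idx] - Xte @ beta) ** 2)          # predict the held-out N/m obs

mspe_cv = np.mean(sq_errors)           # average of all out-of-fold squared errors
print(mspe_cv)
```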

14
Q

Why not use SER for model fit?

A
  • the SER is in-sample: it only tells you how well the model fits the data it was trained on
  • the MSPE is out-of-sample: it tells you how well the model predicts new data
15
Q

How is m-fold cross-validation like POOS?

A
  • in POOS you pretend to be in real time, re-estimating and predicting forward
  • cross-validation does the same kind of held-out evaluation, but instead of following a time sequence it splits the data randomly or sequentially, depending on the context
16
Q

Ridge regression

A

A regularisation technique used when you have many predictors - maybe more than observations, or at least enough that OLS starts to break down
- Ridge objective: SSR + λ × SUM(bj^2)
- the second term is the penalty, proportional to the sum of squared coefficients
- if λ = 0 the objective is just the SSR (OLS); if λ is large, coefficients shrink towards 0; λ is chosen as the value that minimises the MSPE via cross-validation
- discourages large coefficients
- low λ behaves like OLS; high λ gives strong shrinkage, with coefficients pulled towards 0 (see the sketch below)
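A minimal sketch of ridge in closed form (numpy assumed; the data are hypothetical and the intercept/standardisation are ignored for brevity), showing how a larger λ (written `lam`) shrinks the coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 20
X = rng.normal(size=(N, k))
y = X @ rng.normal(scale=0.5, size=k) + rng.normal(size=N)   # hypothetical data

def ridge(X, y, lam):
    # closed-form ridge solution: (X'X + lam * I)^(-1) X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in [0.0, 1.0, 100.0]:
    b = ridge(X, y, lam)
    # the average coefficient magnitude shrinks as the penalty grows
    print(lam, round(float(np.abs(b).mean()), 3))
```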

17
Q

How to choose λ for ridge

A
  1. Pick a range of values of λ to try
  2. Use K-fold cross-validation: split the data into K folds, estimate the model on K-1 folds, use the held-out fold to predict and compute the MSPE for each candidate λ
  3. Average the MSPEs across the K folds
  4. Choose the λ with the lowest average MSPE across folds - the optimal shrinkage level (see the sketch after this list)
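One possible implementation sketch using scikit-learn's RidgeCV (assumed available); its `alphas` argument plays the role of the candidate λ values and `cv=5` gives 5-fold cross-validation:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = X[:, :5].sum(axis=1) + rng.normal(size=200)   # hypothetical data

# candidate shrinkage values; 5-fold CV picks the one with the lowest average MSPE
model = RidgeCV(alphas=np.logspace(-3, 3, 25), cv=5,
                scoring="neg_mean_squared_error")
model.fit(X, y)
print(model.alpha_)    # the selected shrinkage parameter
```
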
18
Q

Lasso regression

A

The first term (the SSR) is the same as in ridge, but the second term is λ × SUM(|bj|)
- forces small coefficients to 0, creating sparsity
- use it when you have lots of predictors and you think many may be irrelevant
- with ridge, all coefficients are shrunk smoothly towards 0 but never become exactly 0
- with lasso, the absolute-value penalty and the optimisation geometry mean some coefficients are forced to be exactly 0 (see the sketch below)
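A short sketch (scikit-learn assumed; the sparse data-generating process is hypothetical) showing lasso setting coefficients exactly to zero while ridge only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
beta_true = np.zeros(50)
beta_true[:5] = 1.0                        # only 5 of 50 predictors truly matter (sparse truth)
y = X @ beta_true + rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print(int((lasso.coef_ == 0).sum()))       # lasso: many coefficients exactly zero
print(int((ridge.coef_ == 0).sum()))       # ridge: typically none exactly zero
```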

19
Q

Principal components regression

A

Instead of selecting variables, PCR transforms them:

  • takes linear combinations of the original variables - called PCs
  • combinations chosen to maximise variance, capture as much info from X as possible
  • first PC captures the largest share of variance in X, etc
  • only keeps the top p components, which explain most of the variance
  • run OLS regression of y on these p PCs instead
  • reduces dimensionality, avoids multicollinearity and overfitting, while keeping most of the predictive power

So, for each component j: maximise Var(SUM_i(aji × Xi)) over the weights aj, subject to the weights being normalised and the component being uncorrelated with the previous components (see the sketch below)
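A compact PCR sketch (scikit-learn assumed; the data and the choice p = 3 are hypothetical): compute the principal components of X, keep the top p, then regress y on them by OLS:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
N = 300
factor = rng.normal(size=(N, 3))              # 3 hidden driving factors
X = factor @ rng.normal(size=(3, 40)) + 0.1 * rng.normal(size=(N, 40))  # 40 correlated predictors
y = factor[:, 0] + rng.normal(size=N)

p = 3
pcs = PCA(n_components=p).fit_transform(X)    # keep the top p principal components of X
pcr = LinearRegression().fit(pcs, y)          # OLS of y on the components
print(pcr.score(pcs, y))                      # in-sample R^2 of the PCR fit
```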

20
Q

Difference between MSFE and MSPE

A

MSFE is used in the context of time-series forecasting: it measures how well a time-series model predicts the future

MSPE belongs to more general predictive modelling: it measures how well a regression model predicts NEW observations

21
Q

When to use ridge vs lasso

A

Ridge is best when predictors are many and highly correlated, but we think most are useful in some way

Lasso is best when we think many predictors are irrelevant and the true model is sparse

22
Q

Why PCR?

A

Firstly, OLS breaks down when you have lots of predictors, so all of these strategies are ways of taming too many predictors so that forecasts are stable and accurate

  • with ridge and lasso, the remaining problem is not just that there are too many predictors, but that these predictors are highly correlated
  • PCR solves this by replacing the predictors with a smaller set of (uncorrelated) linear combinations of them