Lecture 18 Flashcards

1
Q

Forecast is not a prediction

A

Forecasting is out of sample - you’re using data up to time T to predict future values like YT+1, YT+2, etc.
- Y^(T+h|T) is the forecast of Y at time T + h, made using data up to time T
- forecast error = YT+h - Y^(T+h|T), the actual future value minus the forecast

2
Q

MSFE - Mean Squared Forecast Error

A

Measures the expected squared difference between the actual future value and your forecast
- MSFE = E[(YT+1 - Y^(T+1|T))^2]
- can be broken into two terms: the variance of the error term (the “oracle” MSFE - the error you’d make even knowing the true coefficients) and the estimation error (see the decomposition below)
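For concreteness, in an AR(1) model YT+1 = β0 + β1·YT + uT+1 (the AR(1) setting is my choice of example, not the only case), the decomposition reads:

```latex
\mathrm{MSFE}
  = \underbrace{\sigma_u^2}_{\text{oracle MSFE}}
  + \underbrace{E\!\left[\left((\hat\beta_0 - \beta_0)
        + (\hat\beta_1 - \beta_1)\,Y_T\right)^{2}\right]}_{\text{estimation error}}
```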

3
Q

RMSFE - Root MSFE

A
  • just the square root of MSFE - interpreted like a typical forecast error, but hard to compute directly as we don’t know future values
  • if we assume stationarity and no estimation error, we can approximate it with the standard error of the regression:
    RMSFE_SER = ROOT(SSR / (T - n - 1))
  • both give a numerical handle on forecast accuracy and help compare models (see the sketch below)
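A minimal sketch of the SER-based approximation in Python (the function name and interface are mine, not from the lecture):

```python
import numpy as np

def rmsfe_ser(residuals, n):
    """SER-based RMSFE: ROOT(SSR / (T - n - 1)).

    residuals: in-sample OLS residuals; n: number of predictors
    (excluding the intercept). Ignores estimation error.
    """
    T = len(residuals)
    ssr = np.sum(np.asarray(residuals) ** 2)  # sum of squared residuals
    return np.sqrt(ssr / (T - n - 1))
```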
4
Q

FPE - Final Prediction Error

A

Adjusts the RMSFE to include estimation error; applies only if the data are stationary and homoskedastic
- RMSFE_FPE = ROOT((SSR/T) . ((T + n + 1)/(T - n - 1)))
- T is the number of observations, n is the number of predictors
- the previous SER version understates forecast error by ignoring estimation uncertainty; this tries to fix that, but still relies on strong assumptions
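A companion sketch to the SER version above (again, the function name is mine):

```python
import numpy as np

def rmsfe_fpe(residuals, n):
    """FPE-adjusted RMSFE: ROOT((SSR/T) . ((T + n + 1)/(T - n - 1))).

    Inflates the SER-based estimate to account for estimation error;
    assumes stationarity and homoskedasticity.
    """
    T = len(residuals)
    ssr = np.sum(np.asarray(residuals) ** 2)
    return np.sqrt((ssr / T) * (T + n + 1) / (T - n - 1))
```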

5
Q

POOS - Pseudo Out-of-Sample explanation

A

Simulates real-time forecasting by pretending you’re standing at a past date and forecasting “the future” using only the data available up to that date
- doesn’t require strong assumptions and captures both estimation error and forecast error
- the most honest/realistic way to evaluate a forecast

6
Q

How POOS works

A
  1. Split the sample - use the first 90% of the data for model estimation, the final 10% for forecasting
  2. Re-estimate your model each time - for each date s in the forecast period, fit the model using data only up to s
  3. Forecast one step ahead - predict Ys+1 using the model fit through s, giving Y^(s+1|s)
  4. Compute the forecast error - Ys+1 - Y^(s+1|s)
  5. Compute the POOS RMSFE over the P forecasts: ROOT((1/P) . SUM(forecast error squared)) - see the sketch below
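A minimal sketch of this loop in Python, assuming an AR(1) forecasting model and the 90/10 split above (the model choice and function name are mine; any model can be slotted in):

```python
import numpy as np

def poos_rmsfe(y, split=0.9):
    """Pseudo out-of-sample RMSFE for a one-step-ahead AR(1) forecast."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    start = int(split * T)                        # first forecast origin s
    errors = []
    for s in range(start, T - 1):
        # Fit Y_t = b0 + b1 * Y_{t-1} using data only up to date s.
        X = np.column_stack([np.ones(s), y[:s]])  # intercept + lagged Y
        b0, b1 = np.linalg.lstsq(X, y[1:s + 1], rcond=None)[0]
        y_hat = b0 + b1 * y[s]                    # forecast Y^(s+1|s)
        errors.append(y[s + 1] - y_hat)           # one-step forecast error
    # POOS RMSFE = ROOT((1/P) . SUM(error^2)) over the P forecasts.
    return np.sqrt(np.mean(np.square(errors)))
```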
7
Q

Forecast intervals

A

If the forecast error is normally distributed, we can build a 95% forecast interval around our prediction:
- Y^(T+1|T) +- 1.96 x RMSFE^
- reads as: “here’s the rough range I expect the outcome to fall in”

8
Q

Forecast intervals for transformations
- e.g. Δln(IPt) as the dependent variable

A
  1. Forecast the change in logs - for small changes, Δln(IPt) approximates the percentage change
  2. Convert to a level forecast by applying the forecasted % change to the current level
  3. Construct forecast intervals - build the interval using the RMSFE of the log change, then apply the step-2 conversion to the lower and upper bounds to get an interval for IPt+1
    - what happens if the change isn’t small? The approximation Δln ≈ % change breaks down, so you can still forecast in logs, but be careful when interpreting in levels - exponentiating the bounds, as in the sketch below, is exact
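A numeric sketch of the conversion (all numbers here are made up for illustration):

```python
import numpy as np

# Illustrative inputs: current level, the forecasted change in logs,
# and the RMSFE of that log-change forecast.
p_now     = 80.0          # current level of the series, e.g. a price index
dlog_hat  = 0.02          # forecasted change in ln(P): roughly +2%
rmsfe_log = 0.05          # estimated RMSFE of the log-change forecast

# Steps 1-2: level forecast by exponentiating the forecasted log change.
p_hat = p_now * np.exp(dlog_hat)

# Step 3: 95% interval in logs, then map each bound back to levels.
# Exponentiating is exact even when the change isn't small.
p_lo = p_now * np.exp(dlog_hat - 1.96 * rmsfe_log)
p_hi = p_now * np.exp(dlog_hat + 1.96 * rmsfe_log)
print(f"level forecast {p_hat:.2f}, 95% interval [{p_lo:.2f}, {p_hi:.2f}]")
```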
9
Q

Forecasting oil prices using time series methods

A
  1. Model selection - choose among AR, ADL, etc., using tools like BIC or AIC to pick the lag length and variables
  2. Check for breaks - Chow test if you roughly know the break date, QLR test to detect unknown breaks
  3. Point forecast - forecast the log change, convert to a % change, then back to the actual level
  4. Forecast intervals for the level - use the RMSFE, assuming small changes and normality
  5. Choose the right RMSFE - SER, FPE, or POOS
10
Q

Prediction in a big data or high-dimensional setting

A
  • in traditional regressions, we usually have fewer predictors than observations (k < N)
  • in big data settings we may have many predictors, sometimes even more predictors than data points (k > N), which makes OLS unreliable due to overfitting
11
Q

What’s an estimation sample and what’s a holdout sample?

A
  • Estimation sample: the data used to estimate your model
  • Holdout sample: data not used in estimation - used to test the model’s predictions
12
Q

MSPE - Mean Squared Prediction Error

A

MSPE_OLS ≈ (1 + k/N) . σ²u
- the more predictors you use (k), the worse your out-of-sample prediction gets, unless you have lots of data (N) - e.g. with k = 100 and N = 200, the MSPE is about 1.5 σ²u, 50% above the oracle level

13
Q

How to estimate the MSPE using cross validation
- m-fold cross validation

A

Simulates an out-of-sample testing environment
1. Split the data into m chunks - each chunk serves as the test set once, and the remaining m-1 chunks are the training data for that round
2. Loop through all m rounds
- train the model on the m-1 parts, test on the remaining one, repeat
3. Compute the MSPE
- after all m rounds you have one prediction (and hence one prediction error) for each point; square the errors and average them for the estimated MSPE (see the sketch below)
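A minimal sketch in Python, assuming an OLS model with an intercept and random fold assignment (the function name and defaults are mine):

```python
import numpy as np

def mfold_mspe(X, y, m=10, seed=0):
    """Estimate the MSPE of an OLS regression by m-fold cross validation."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % m              # fold label per point
    sq_err = np.empty(len(y))
    for j in range(m):
        test, train = folds == j, folds != j
        Xtr = np.column_stack([np.ones(train.sum()), X[train]])
        Xte = np.column_stack([np.ones(test.sum()), X[test]])
        beta = np.linalg.lstsq(Xtr, y[train], rcond=None)[0]
        sq_err[test] = (y[test] - Xte @ beta) ** 2   # out-of-fold errors
    return sq_err.mean()                             # estimated MSPE
```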

14
Q

Why not use SER for model fit?

A
  • SER is in-sample - it only tells you how well the model fits the data it was trained on
  • MSPE is out-of-sample - it tells you how well the model predicts new data
15
Q

How is m fold cross validation like POOS?

A
  • in POOS you pretend to be in real time, re-estimating and predicting forward
  • cross validation does the same kind of pretend-prediction, but instead of moving through a time sequence it splits the data into folds, randomly or sequentially depending on context
16
Q

Ridge regression

A

A regularisation technique used when you have many predictors - maybe more than observations, but at least enough that OLS starts to break down
- Ridge minimises: SSR + k . SUM(bj²), i.e. the OLS objective plus a penalty
- the penalty discourages large coefficients
- with low k it behaves like OLS; with high k there is strong shrinkage and the coefficients are pulled towards 0 (see the sketch below)
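A closed-form sketch in Python. It assumes the columns of X have been standardised and y demeaned, so no (unpenalised) intercept is needed - a common convention, and my assumption here:

```python
import numpy as np

def ridge_fit(X, y, k):
    """Ridge estimator: minimises SSR + k * SUM(bj^2)."""
    p = X.shape[1]
    # (X'X + k*I) b = X'y has an explicit solution for any k > 0.
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```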

17
Q

How to choose k for ridge

A
  1. Pick a range of values of k to try
  2. Use K-fold cross validation - split the data into K folds, use K-1 folds to estimate the model, then use the held-out fold to predict and compute the MSPE, once per fold
  3. Average the MSPEs across the K folds
  4. Choose the k with the lowest average MSPE across folds - the optimal shrinkage level (see the sketch below)
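A sketch of the grid search, reusing ridge_fit from the previous card (the grid, fold count, and function name are illustrative choices of mine):

```python
import numpy as np

def choose_ridge_k(X, y, k_grid, K=5, seed=0):
    """Pick the ridge penalty k by K-fold cross validation."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % K
    avg_mspe = []
    for k in k_grid:                                  # step 1: each candidate k
        errs = []
        for j in range(K):                            # step 2: K-fold CV
            test, train = folds == j, folds != j
            beta = ridge_fit(X[train], y[train], k)
            errs.append(np.mean((y[test] - X[test] @ beta) ** 2))
        avg_mspe.append(np.mean(errs))                # step 3: average MSPE
    return k_grid[int(np.argmin(avg_mspe))]           # step 4: lowest wins
```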
18
Q

Lasso regression

A

The first term is the same (SSR), but the penalty is k . SUM(|bj|)
- the absolute-value penalty forces small coefficients exactly to 0, creating sparsity
- use when you have lots of predictors and you think many may be irrelevant
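A quick illustration of the sparsity using scikit-learn’s Lasso (the data are simulated and all numbers are made up; alpha is sklearn’s name for the penalty weight k):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulated data: 50 observations, 100 candidate predictors, of which
# only the first 3 actually matter (all values made up for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.standard_normal(50)

fit = Lasso(alpha=0.1).fit(X, y)                 # alpha = penalty weight k
print("nonzero coefficients:", int(np.sum(fit.coef_ != 0)))  # sparsity
```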

19
Q

Principal components regression

A

Instead of selecting variables, PCR transforms them:
- takes linear combinations of the original variables - called principal components (PCs)
- the combinations are chosen to maximise variance, capturing as much information from X as possible
- only the top p components are kept
- this reduces dimensionality, cutting multicollinearity and overfitting
- formally, the j-th component solves: MAX var(SUM(aji . Xi)), subject to the weights being normalised and the component being uncorrelated with the earlier components
- then do OLS using these principal components as predictors (see the sketch below)
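A minimal sketch via the SVD (the interface is my assumption; choosing p, e.g. by cross validation, is left out):

```python
import numpy as np

def pcr_fit(X, y, p):
    """Principal components regression: OLS of y on the top-p PCs of X."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # standardise predictors
    # Rows of Vt are the weight vectors a_j: normalised, mutually
    # orthogonal, and ordered by the variance they capture.
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    pcs = Xs @ Vt[:p].T                           # scores on top-p components
    Z = np.column_stack([np.ones(len(y)), pcs])   # intercept + components
    gamma = np.linalg.lstsq(Z, y, rcond=None)[0]  # OLS on the components
    return gamma, Vt[:p]
```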