Lecture 18 Flashcards
A forecast is not an in-sample prediction
Forecasting is out of sample - you use data up to time T to predict future values such as Y_(T+1), Y_(T+2), ...
- Ŷ_(T+h|T) = the forecast of Y at time T + h, made using data up to time T
- forecast error = Y_(T+h) - Ŷ_(T+h|T)
MSFE - Mean Squared Forecast Error
Measures the average squared difference between the actual future value and your forecast
- MSFE = E[(Y_(T+1) - Ŷ_(T+1|T))²]
- can be broken into two terms: the variance of the future error term (the oracle MSFE) plus estimation error from having to estimate the coefficients
RMSFE - Root MSFE
- just the square root of the MSFE - interpreted as the typical size of a forecast error, in the same units as Y, but hard to compute directly since we don't know future values
- if we assume stationarity and no estimation error, we can approximate it with:
RMSFE_SER = sqrt(SSR / (T - n - 1)) - both MSFE and RMSFE give a numerical handle on forecast accuracy and help compare models
FPE - Final Prediction Error
Adjusts the RMSFE estimate to include estimation error; applies only if the data are stationary and the errors homoskedastic
- RMSFE_FPE = sqrt((SSR / T) × ((T + n + 1) / (T - n - 1)))
- T is the number of observations, n is number of predictors
- the SER version understates the forecast error by ignoring estimation uncertainty; the FPE tries to fix that, but still relies on strong assumptions
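A minimal Python sketch of both estimates (the SSR, T and n values below are made up for illustration):

```python
import numpy as np

def rmsfe_ser(ssr, T, n):
    """SER-based RMSFE: ignores estimation error (stationarity assumed)."""
    return np.sqrt(ssr / (T - n - 1))

def rmsfe_fpe(ssr, T, n):
    """FPE-based RMSFE: inflates the SER version to account for estimation error."""
    return np.sqrt((ssr / T) * (T + n + 1) / (T - n - 1))

# Illustrative numbers: SSR from an AR(2) fit (n = 2 lags) on T = 200 observations
print(rmsfe_ser(ssr=450.0, T=200, n=2))   # SER version
print(rmsfe_fpe(ssr=450.0, T=200, n=2))   # slightly larger, reflecting estimation error
```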
POOS - Pseudo Out-of-Sample explanation
Simulates real-time forecasting by pretending you’re forecasting the future using only the past data
- doesn’t require strong assumptions and captures both estimation and forecast error
- the most honest/realistic forecast evaluation
How POOS works
- Split the sample - use the first 90% of data for model estimation, final 10% for forecasting
- Re-estimate your model each time - for each date s, fit the model using data up to s
- Forecast one step ahead - predict Y_(s+1) using the model fit through s, giving Ŷ_(s+1|s)
- Compute the forecast error - Y_(s+1) - Ŷ_(s+1|s)
- Compute the POOS RMSFE: sqrt((1/P) × Σ (Y_(s+1) - Ŷ_(s+1|s))²), where P is the number of POOS forecast dates
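A minimal numpy sketch of the POOS loop, assuming an AR(1) forecasting model on simulated data (the function name, 90/10 split and AR(1) choice are illustrative):

```python
import numpy as np

def poos_rmsfe(y, p_frac=0.1):
    """POOS RMSFE for one-step-ahead AR(1) forecasts; the last p_frac of the sample is forecast."""
    T = len(y)
    P = int(np.floor(p_frac * T))            # number of POOS forecasts
    errors = []
    for s in range(T - P - 1, T - 1):        # forecast dates s, predicting s+1
        y_train = y[: s + 1]                 # only data up to s is used
        X = np.column_stack([np.ones(s), y_train[:-1]])          # regress Y_t on (1, Y_{t-1})
        beta, *_ = np.linalg.lstsq(X, y_train[1:], rcond=None)   # re-estimate each round
        y_hat = beta[0] + beta[1] * y_train[-1]                  # one-step-ahead forecast
        errors.append(y[s + 1] - y_hat)                          # forecast error
    return np.sqrt(np.mean(np.square(errors)))

# Example on a simulated AR(1) series
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()
print(poos_rmsfe(y))
```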
Forecast intervals
If the forecast error is normally distributed, we can build a 95% forecast interval around our prediction:
- Ŷ_(T+1|T) ± 1.96 × RMSFE (using one of the RMSFE estimates above)
- interpretation: the rough range the future value is expected to fall in
Forecast intervals for transformations
- e.g. Δln(IP_t) as the dependent variable (the log change in the series IP)
- Forecast the change in logs - for small changes this is approximately the percentage change
- Convert to a level forecast by applying the forecasted % change to the current level
- Construct forecast intervals - build the interval using the RMSFE of the log change, then apply step 2 to the lower and upper bounds to get an interval for IP_(t+1) (see the sketch after this card)
- What happens if the change isn't small?
You can still forecast in logs, but be careful interpreting the result in levels - the "log change ≈ % change" approximation breaks down, so convert with the exponential rather than treating the log change as a percentage
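A minimal sketch of the log-to-level conversion with made-up numbers (the IP level, forecasted log change and RMSFE are all illustrative):

```python
import numpy as np

dlog_hat = 0.004      # forecasted change in ln(IP), i.e. roughly +0.4%
rmsfe_hat = 0.010     # estimated RMSFE of the log-change forecast
ip_t = 102.5          # current level of IP

# Point forecast in levels: IP_{t+1} = IP_t * exp(forecasted log change)
ip_forecast = ip_t * np.exp(dlog_hat)

# 95% interval: build it in logs, then convert each endpoint back to a level
lo = ip_t * np.exp(dlog_hat - 1.96 * rmsfe_hat)
hi = ip_t * np.exp(dlog_hat + 1.96 * rmsfe_hat)
print(ip_forecast, (lo, hi))
```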
Forecasting oil prices using time series methods
- Model selection - AR, ADL, etc; use tools like BIC or AIC to pick lag length and variables (see the BIC sketch after this list)
- Checking for breaks - Chow test if you roughly know where the break is, QLR test to detect an unknown break date
- Point forecast - forecast the log change, convert to a % change, then back to the actual level
- Forecast intervals - use the RMSFE, assuming small changes and normally distributed forecast errors
- Choosing the right RMSFE - SER, FPE, POOS
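A minimal sketch of BIC-based lag selection for an AR(p) on a placeholder series (the simulated data and max lag of 8 are illustrative):

```python
import numpy as np

def ar_bic(y, p, max_p):
    """BIC for an AR(p), estimated on a common sample so different p are comparable."""
    T = len(y) - max_p                       # same effective sample for every p
    Y = y[max_p:]
    X = np.column_stack([np.ones(T)] + [y[max_p - j: len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    ssr = np.sum((Y - X @ beta) ** 2)
    return np.log(ssr / T) + (p + 1) * np.log(T) / T   # BIC = ln(SSR/T) + K*ln(T)/T

# Placeholder series standing in for, e.g., the log change in the oil price
rng = np.random.default_rng(1)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

bics = {p: ar_bic(y, p, max_p=8) for p in range(1, 9)}
print(min(bics, key=bics.get))               # lag length with the lowest BIC
```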
Prediction in a big data or high-dimensional setting
- in traditional regressions, we usually have fewer predictors than observations k < N
- in big data settings we may have many predictors, sometimes even more predictors than data points (k > N), which can make OLS unreliable due to overfitting
What's an estimation sample and what's a holdout sample?
- Estimation sample - the data used to estimate your model
- Holdout sample - data not used in estimation, kept aside to test the model's predictions
MSPE - Mean Squared Prediction Error
MSPE_OLS ≈ (1 + k/N) × σ²_u, where σ²_u is the error variance (the oracle MSPE)
- the more predictors you use (k), the worse your out-of-sample prediction gets, unless you have lots of observations (N)
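- worked example (made-up numbers): with k = 100 predictors and N = 1,000 observations, MSPE_OLS ≈ (1 + 100/1000) × σ²_u = 1.1 σ²_u, so estimation error alone inflates the oracle MSPE by about 10%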
How to estimate the MSPE using cross validation
- m-fold cross validation
Simulation of an out-of-sample testing environment
1. Split data into m chunks, each chunk is a test set once and remaining m-1 chunks are training data for that round
2. Loop through all m combinations
- train model on m-1 parts, test on remaining 1 and repeat
3. Compute MSPE
- after all m rounds you have one out-of-sample prediction for each data point; square the prediction errors and average them - that's the estimated MSPE
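A minimal sketch using scikit-learn (an assumed choice - any OLS routine would do) that gives one out-of-fold prediction per observation and averages the squared errors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                    # placeholder predictors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)  # placeholder outcome

m = 10
kf = KFold(n_splits=m, shuffle=True, random_state=0)

# Each observation is predicted exactly once, by a model that never saw it in training
y_hat = cross_val_predict(LinearRegression(), X, y, cv=kf)
mspe_hat = np.mean((y - y_hat) ** 2)              # estimated MSPE
print(mspe_hat)
```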
Why not use SER for model fit?
- SER is in-sample - it only tells you how well the model fits the data it was trained on
- MSPE is out-of-sample - it tells you how well the model predicts new data
How is m fold cross validation like POOS?
- in POOS you pretend to be forecasting in real time, re-estimating the model and predicting forward
- cross validation does the same thing, but instead of moving through a time sequence it splits the data into folds (randomly or sequentially, depending on context)
Ridge regression
A regularisation technique used when you have many predictors - maybe more than observations, or at least enough that OLS starts to break down
- Ridge objective: minimise SSR(b) + λ × Σ_j b_j² (the OLS criterion plus a penalty; λ is the shrinkage parameter)
- the penalty discourages large coefficients
- small λ behaves like OLS; large λ means strong shrinkage, with coefficients pulled towards 0
How to choose λ for ridge
- Pick a range of values of λ to try
- Use K-fold cross validation - split the data into K folds; for each λ, use K-1 folds to estimate the model, then predict the held-out fold and compute its MSPE
- Average the MSPEs across the K folds
- Choose the λ with the lowest average MSPE across folds - the optimal shrinkage level
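A minimal sketch of the λ-selection loop, assuming scikit-learn's Ridge (which calls the penalty alpha); the simulated data and λ grid are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                      # many predictors, modest sample
y = X[:, :5].sum(axis=1) + rng.normal(size=100)

lambdas = np.logspace(-3, 3, 13)                    # step 1: grid of shrinkage values
kf = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mspe = {}
for lam in lambdas:                                 # steps 2-3: K-fold CV, average the MSPEs
    scores = cross_val_score(Ridge(alpha=lam), X, y, cv=kf,
                             scoring="neg_mean_squared_error")
    cv_mspe[lam] = -scores.mean()

best_lam = min(cv_mspe, key=cv_mspe.get)            # step 4: lowest average MSPE
print(best_lam)
```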
Lasso regression
The first term is the same (SSR), but the penalty is λ × Σ_j |b_j|
- forces small coefficients to exactly 0, creating sparsity
- use when you have lots of predictors and you think many may be irrelevant
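A minimal sketch of the sparsity lasso produces, again assuming scikit-learn (the penalty level is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)   # only 2 of 50 predictors matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.sum(lasso.coef_ != 0))   # most coefficients are set exactly to zero
```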
Principal components regression
Instead of selecting variables, PCR transforms them:
- takes linear combinations of the original variables - called PCs
- combinations chosen to maximise variance, capture as much info from X as possible
- only keeps the top p components
- reduces dimensionality, reducing multicollinearity and overfitting
SO: each component maximises Var(Σ_i a_i X_i), subject to the weights being normalised (Σ_i a_i² = 1) and the component being uncorrelated with the earlier ones
- then do OLS using these principal components as predictors
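A minimal PCR sketch assuming scikit-learn: standardise X, keep the top p principal components, then run OLS on them (p and the simulated data are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 40))                    # many predictors
y = X[:, :3].sum(axis=1) + rng.normal(size=150)

p = 5                                             # number of components to keep
pcr = make_pipeline(StandardScaler(), PCA(n_components=p), LinearRegression())
pcr.fit(X, y)
print(pcr.predict(X[:3]))                         # predictions from the component regression
```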