Lecture 4 - Time Series Forecasting Flashcards
Define time series forecasting
- technique that tries to predict how a sequence of historical data will continue in the future, by analysing the data and identifying patterns/trends in the series
What ways can you identify characteristics in time series data
- visualisation
- identify patterns, possible explanations for variation in the data
- statistical analysis
- for time series –> time plot
- for seasonal time series –> seasonal plot
–> both can reveal trends, or seasonal behaviour
Define a time plot
*how can this help in selecting an appropriate time series forecasting approach?
depicts the overall trend of the data points across the entire time period.
- identify trends
Define a seasonal plot
*how can this help in selecting an appropriate time series forecasting approach?
reveals recurring patterns within the data that occur over specific time intervals (seasons)
*recognise seasonality patterns
Steps of time series forecasting
- Determine time horizon
- Gather and analyse data
*Select and validate forecasting model to use
- Make forecast
- Monitor and control forecast
What is a stationary series
- a data sequence which has no strong trend or seasonal component
- we assume it is essentially constant over the long term with short term fluctuations
What forecasting approaches are used for stationary series
- Simple Moving Average
- Exponential Smoothing
Key points about simple moving average
- Only user input choice is the length of the moving average
–> short average responds faster to changing demand levels, but also reacts more to random fluctuations, which may not be desirable
–> long average provides more smoothing, but may miss trends and turning points
For a 3 week simple moving average, where does the first moving average point go
In the ‘3 week moving average’ column, in line with the 4th week’s data (the average of weeks 1 to 3 is the forecast for week 4)
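A minimal sketch of the 3 week simple moving average, assuming the average of the previous three weeks is used as the forecast for the following week. The function name and the demand figures are made up for illustration.

```python
def simple_moving_average_forecasts(demand, window=3):
    """Return a list aligned with `demand`; entry t is the forecast for period t.

    The first `window` entries are None because there is not enough
    history yet to compute an average.
    """
    forecasts = [None] * len(demand)
    for t in range(window, len(demand)):
        # Average of the `window` periods immediately before period t
        forecasts[t] = sum(demand[t - window:t]) / window
    return forecasts


weekly_demand = [10, 12, 14, 16, 18]  # hypothetical weekly data
forecasts = simple_moving_average_forecasts(weekly_demand, window=3)
print(forecasts)  # [None, None, None, 12.0, 14.0]
```

The first non-empty value sits in line with week 4 (index 3), which matches the answer above.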
Define bias in time series forecasting
a forecast is biased if it consistently overestimates or underestimates the actual values
Why is average error not a good measurement of forecast accuracy (error and bias)
Because negative and positive errors will cancel out
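A small illustration of the cancelling effect, using made-up numbers. Here error is taken as Forecast - Actual, the same sign convention as the formulas in these cards.

```python
actual = [100, 100, 100, 100]    # hypothetical actual demand
forecast = [110, 90, 120, 80]    # hypothetical forecasts

# Signed errors: two positive, two negative
errors = [f - a for f, a in zip(forecast, actual)]

# The average error is zero even though every forecast missed
mean_error = sum(errors) / len(errors)

# Taking absolute values first stops the errors cancelling out
mean_abs_error = sum(abs(e) for e in errors) / len(errors)

print(errors)          # [10, -10, 20, -20]
print(mean_error)      # 0.0
print(mean_abs_error)  # 15.0
```

An average error near zero therefore only suggests there is no consistent bias; it says nothing about how large the individual errors are.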
Errors made consistently in one direction imply ….. what?
bias
Three common measures of TIME SERIES forecast accuracy
*Mean Absolute Deviation (MAD)
*Mean Squared Deviation/Error (MSD / MSE)
*Mean Absolute Percentage Error (MAPE)
What is MAD
*Mean Absolute Deviation
*measures the average of the absolute differences between forecasted values and actual values
How is ‘Mean Absolute Deviation’ calculated
MAD = Σ | Forecast - Actual | / n
with Σ (sigma) representing the sum over all n data points, and |…| representing absolute value
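A minimal sketch of the MAD calculation in Python. The function name and the sample figures are illustrative assumptions.

```python
def mean_absolute_deviation(forecast, actual):
    """MAD = sum of |Forecast - Actual| divided by n."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and the same length")
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


# Hypothetical data: absolute errors are 2, 1 and 4, so MAD = 7 / 3
mad = mean_absolute_deviation([12, 14, 16], [10, 15, 20])
print(round(mad, 2))  # 2.33
```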
Strengths and weaknesses of MAD as a metric to measure error and bias
Strength
* easy to interpret as it is on the same scale as the data
* less sensitive to outliers than MSE
Weakness
*doesn’t give extra weight to large errors (each unit of error counts equally in the mean)
What is MSD/MSE
*Mean Squared Deviation/Error
*measures the average of the squared differences between the forecasted values and the actual values
- squaring the differences gives more weight to larger errors
How is ‘Mean Squared Deviation/Error’ calculated
MSE = Σ (Forecast - Actual)² / n
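A minimal sketch of the MSE calculation in Python. The function name and the sample figures are illustrative assumptions.

```python
def mean_squared_error(forecast, actual):
    """MSE = sum of (Forecast - Actual) squared, divided by n."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and the same length")
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)


# Hypothetical data: squared errors are 4, 1 and 16, so MSE = 21 / 3
mse = mean_squared_error([12, 14, 16], [10, 15, 20])
print(mse)  # 7.0
```

The single error of 4 contributes 16 of the 21 squared units, which shows how squaring gives more weight to larger errors.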
Strengths and weaknesses of MSD/MSE
Strengths
*more sensitive to larger errors (due to squaring)
Weaknesses
* results can be skewed by outliers
*can be difficult to interpret error without converting back to original scale of data
What is MAPE
- Mean Absolute Percentage Error
- measures the average of the absolute percentage errors
How is ‘Mean Absolute Percentage Error’ calculated
MAPE = Σ | (Forecast - Actual) / Actual | × 100% / n
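A minimal sketch of the MAPE calculation in Python. The function name and the sample figures are illustrative assumptions, and raising an error when an actual value is 0 is one possible design choice rather than a standard rule.

```python
def mean_absolute_percentage_error(forecast, actual):
    """MAPE = sum of |(Forecast - Actual) / Actual| x 100% / n."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and the same length")
    if any(a == 0 for a in actual):
        # Percentage error is undefined when the actual value is 0
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((f - a) / a) for f, a in zip(forecast, actual))
    return total / len(actual) * 100


# Hypothetical data: percentage errors are 10%, 10% and 5%
mape = mean_absolute_percentage_error([110, 45, 210], [100, 50, 200])
print(round(mape, 2))  # 8.33
```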
Strengths and Weaknesses of MAPE
Strengths
*easily interpreted as a %
–> therefore useful for comparison across data sets with different scales.
Weaknesses
* can be misleading when demand levels are very low (actual values close to 0 inflate the percentage error, and an actual value of 0 means dividing by 0)
* sensitive to outliers
What is exponential smoothing
forecasting technique that assigns exponentially decreasing weights to past observations when generating forecasts
- assigns higher weights to more recent data points and lower weights to older ones, therefore giving them less influence on the forecast.
A higher α (closer to 1) …..?
*less weight on past observations
*more responsive curve to changes in data
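A minimal sketch of the effect of α, assuming the standard simple exponential smoothing update (new forecast = α × latest actual + (1 - α) × previous forecast), which these cards do not state explicitly. Starting the forecast at the first actual value is also an assumption, and the demand figures are made up.

```python
def exponential_smoothing(actual, alpha, initial_forecast=None):
    """Return forecasts where entry t is the forecast for period t.

    The list has one more entry than `actual`; the last entry is the
    forecast for the period after the data ends.
    """
    if not actual:
        raise ValueError("actual must not be empty")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    # Assumption: seed the first forecast with the first actual value
    forecast = actual[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]
    for a in actual:
        # Blend the latest actual with the previous forecast
        forecast = alpha * a + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


demand = [10, 10, 20, 20, 20]  # hypothetical step change in week 3
high_alpha = exponential_smoothing(demand, alpha=0.8)
low_alpha = exponential_smoothing(demand, alpha=0.2)

# After the step, the high-alpha forecast is close to 20 (about 19.92),
# while the low-alpha forecast is still catching up (about 14.88)
print(round(high_alpha[-1], 2), round(low_alpha[-1], 2))
```

With α = 0.8 the forecast has almost reached the new demand level three periods after the jump, while α = 0.2 gives a much smoother but slower response.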