Chapter 2: Finding and Wrangling Time Series Data Flashcards
What is a ‘lookahead’ and how can it occur in time series data?
A lookahead is any way that information about what will happen in the future might propagate back in time in your modeling and affect how your model behaves earlier in time. For example, when smoothing a noisy time series, a future value could be incorporated into the smoothed value at an earlier time step.
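The smoothing example can be made concrete with a minimal sketch. The function names, the toy series, and the window sizes below are illustrative assumptions, not taken from the chapter. It compares a trailing average, which uses only current and past points, with a centered average, which also uses the next point.

```python
def trailing_mean(values, window):
    """Smooth using only the current and earlier points (no lookahead)."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(values, half_width):
    """Smooth using points on both sides, so each output sees future values."""
    out = []
    for i in range(len(values)):
        start = max(0, i - half_width)
        end = min(len(values), i + half_width + 1)
        chunk = values[start:end]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical toy series: flat at zero with a single spike at index 4.
series = [0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0]

trailing = trailing_mean(series, window=3)
centered = centered_mean(series, half_width=1)

# At index 3 the spike has not happened yet. The trailing average is still
# zero there, while the centered average has already moved.
print(trailing[3], centered[3])
```

In the centered version the smoothed value at index 3 is nonzero even though the spike occurs at index 4, which is exactly the kind of backward propagation of future information described above.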
What is a general checklist when approaching time series data?
- Lookahead check: If you are smoothing data or imputing missing data, think carefully about whether it might impact your results by introducing a lookahead. And don’t just think about it: experiment as we did earlier and see how the imputations and smoothing work. Do they seem to be forward-looking? If so, can you justify using them? (Probably not.)
- Experiment with smaller data: Build your entire process with a very small data set (only a few rows in a data.table or a few time steps in whatever data format). Then, do random spot checks at each step in the process and see whether you accidentally shift any information temporally to an inappropriate place.
- Timestamp sanity check: For each kind of data, find out what the lag is for it relative to its own timestamp. For example, if the timestamp is when the data “happened” but not when it was uploaded to your servers, you need to know that. Different columns of a data frame may have different lags. To address this, you can either customize your lag per data frame or (better and more realistic) pick the biggest lag and apply that to everything. While you won’t want to unduly pessimize your model, it’s a good starting point after which you can relax these overly constrained rules one at a time, carefully!
- Time-aware error and cross-validation: Use time-aware error (rolling) testing or cross-validation. This will be discussed in Chapter 11, but remember that randomizing your training versus testing data sets does not work with time series data. You do not want information from the future to leak into models for the past.
- Test model performance with intentional lookahead: Intentionally introduce a lookahead and see how your model behaves. Try various degrees of lookahead, so you have an idea how it shifts accuracy. If you have some idea of the accuracy with lookahead, you have an idea of the ceiling on what a real model without unfair knowledge of the future can achieve. Remember that many time series problems are extremely difficult, so a model with a lookahead may seem great until you realize you are dealing with a high-noise/low-signal data set.
- Experiment with features iteratively: Add features slowly, particularly features you might be processing, so that you can look for jumps. One sign of a lookahead is when a particular feature is unexpectedly good, and there isn’t a very good explanation. At the top of your explanation list should always be “lookahead.”
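The time-aware cross-validation item in the checklist can be sketched as a rolling-origin split, in which the training window always ends before the test window begins. This is a minimal illustration under assumed names and parameters (`rolling_origin_splits`, `initial_train`, `horizon`); it is not the book's implementation, and libraries offer more complete versions.

```python
def rolling_origin_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) pairs for time-ordered data.

    Every training index precedes every test index, so no information
    from the test period can leak into the model being fitted.
    """
    train_end = initial_train
    while train_end + horizon <= n_samples:
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + horizon))
        yield train, test
        # Expand the training window to include the block just tested.
        train_end += horizon


# Hypothetical example: 10 time steps, start with 4 for training,
# evaluate on 2 steps at a time.
splits = list(rolling_origin_splits(n_samples=10, initial_train=4, horizon=2))

for train, test in splits:
    print("train:", train, "test:", test)
```

Contrast this with a random shuffle of the rows, where a test point at time step 3 could be predicted by a model trained on time steps 4 through 9.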
What is meant by stationarity of a time series?
A stationary time series is one that has fairly stable statistical properties over time, particularly with respect to the mean and variance.
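One informal way to see what "stable mean and variance" means is to compare summary statistics across different stretches of a series. The sketch below is an assumption-laden illustration (the helper `half_summaries` and both toy series are made up for this card) and is not a formal stationarity test.

```python
from statistics import mean, pvariance


def half_summaries(values):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


# Hypothetical toy series: one oscillates around a fixed level,
# the other drifts steadily upward.
stable = [1.0, -1.0] * 10
trending = [float(t) for t in range(20)]

stable_first, stable_second = half_summaries(stable)
trend_first, trend_second = half_summaries(trending)

print("stable:  ", stable_first, stable_second)
print("trending:", trend_first, trend_second)
```

The oscillating series has the same mean and variance in both halves, while the trending series has a clearly different mean in each half, so it would not be considered stationary in the sense above.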
What is meant by seasonality?
Seasonality in data is any kind of recurring behaviour in which the frequency of the behaviour is stable. For example, human behaviour tends to have a daily seasonality (lunchtime between 12:00 and 13:00 every day).
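A stable recurring frequency shows up as high autocorrelation at a lag equal to the period. The sketch below assumes a made-up hourly series with one "lunch" event per day at 12:00 over a week; the function name and the data are illustrative only.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    num = sum((values[t] - m) * (values[t + lag] - m) for t in range(n - lag))
    return num / denom


# Hypothetical hourly data for 7 days: 1.0 during the 12:00 hour, else 0.0.
hourly = [1.0 if hour % 24 == 12 else 0.0 for hour in range(24 * 7)]

# Search lags of up to two days for the strongest repetition.
best_lag = max(range(1, 49), key=lambda lag: autocorrelation(hourly, lag))

print("lag with highest autocorrelation:", best_lag)
print("autocorrelation at 24 hours:", round(autocorrelation(hourly, 24), 3))
print("autocorrelation at 5 hours:", round(autocorrelation(hourly, 5), 3))
```

For this toy series the strongest lag is 24 hours, matching the daily seasonality in the lunchtime example.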