Chapter 3: Exploratory Data Analysis for Time Series Flashcards
What are the first steps for exploring a new data set?
- Columns that are available.
- Value ranges and units of variables.
- Correlations between columns.
- Overall mean and variance of interesting columns.
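As a rough illustration, the checks above can be sketched with the Python standard library alone. The column names and values below are hypothetical toy data, not taken from any real data set, and the `pearson` and `summarise` helpers are illustrative names.

```python
import statistics

# Hypothetical toy data set: column name -> list of observations.
data = {
    "temperature_c": [12.0, 14.5, 15.0, 17.5, 21.0, 20.0],
    "ice_cream_sales": [110.0, 135.0, 150.0, 170.0, 215.0, 200.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def summarise(columns):
    """Return the value range, mean and sample variance of each column."""
    return {
        name: {
            "min": min(values),
            "max": max(values),
            "mean": statistics.fmean(values),
            "variance": statistics.variance(values),
        }
        for name, values in columns.items()
    }


summary = summarise(data)
corr = pearson(data["temperature_c"], data["ice_cream_sales"])

print("columns:", list(data))
for name, stats in summary.items():
    print(name, stats)
print("correlation:", round(corr, 3))
```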
What are useful visualisations of time series data?
- ECDFs of raw and differenced time series, in total and in groups.
- Scatter plots of raw time series between groups, and scatter plots of differenced data.
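A minimal sketch of how the points of an ECDF could be computed for a raw and a differenced series, assuming a small made-up series. The `ecdf` and `difference` helpers are illustrative names; the resulting pairs would then be handed to whatever plotting tool you use.

```python
def ecdf(values):
    """Pair each sorted observation with its cumulative proportion (i + 1) / n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical toy series.
series = [3.0, 5.0, 4.0, 8.0, 7.0]

raw_x, raw_p = ecdf(series)
diff_x, diff_p = ecdf(difference(series))

print(list(zip(raw_x, raw_p)))
print(list(zip(diff_x, diff_p)))
```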
How do we test for stationarity?
The most common test for stationarity is the Augmented Dickey-Fuller (ADF) test. It tests the null hypothesis that a unit root is present in the time series. If the characteristic equation of a series has a unit root, the series is not stationary.
What are the limitations of hypothesis tests for stationarity?
- These tests have low power in distinguishing near unit roots from unit roots.
- With small sample sizes, false positives for unit roots are fairly common.
- Most tests do not test for or against all kinds of problems that can lead to a non-stationary time series. For example, some tests look specifically at whether the mean or the variance (but not both) is stationary. Other tests look more generally at the overall distribution. It is important to understand the limits of the test you apply and to ensure that those limits are consistent with your beliefs about your data.
What is the importance of stationarity in practice?
- A large number of models assume stationarity.
- Models of non-stationary time series will vary in accuracy as the metrics of the time series vary. For example, if your model is used to estimate the mean and variance of the time series with non-stationary mean and variance, the bias and error of the model will vary over time, at which point the utility of the model becomes questionable.
How can a non-stationary dataset be made stationary?
By applying transformations such as:
- Log transformation.
- Square root transformation.
- Differencing to remove trend (if the series is not stationary after second order differencing, it is unlikely further differencing will fix it).
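The effect of differencing and of a log transformation can be sketched on synthetic series, using only the standard library. The three series below are made up for illustration: a linear trend, a quadratic trend and exponential growth. The `difference` helper is an illustrative name.

```python
import math


def difference(series, order=1):
    """Apply first differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# Synthetic trending series (assumptions, not real data).
linear = [2.0 * t + 1.0 for t in range(6)]
quadratic = [float(t * t) for t in range(6)]
exponential = [100.0 * 1.1 ** t for t in range(6)]

# First differencing removes a linear trend.
linear_diff = difference(linear)

# Second-order differencing removes a quadratic trend.
quadratic_diff = difference(quadratic, order=2)

# A log transformation turns exponential growth into a linear trend,
# which a single difference then removes.
log_diff = difference([math.log(v) for v in exponential])

print(linear_diff)
print(quadratic_diff)
print([round(d, 4) for d in log_diff])
```

In each case the transformed series is constant, so its mean no longer depends on time.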
What other assumptions do time series models make about the data?
A common assumption is that the input and predicted variables are normally distributed.
What are the assumptions of the output that are being made with transformations like log or sqrt?
- The data will always be positive.
- If you shift the data before transforming it, you either introduce a bias or assume that the shift doesn’t matter.
- These transformations make larger values less different from each other, effectively compressing the space between the larger values but not the smaller values, de-emphasising the difference between outliers. This may or may not be appropriate.
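A small numeric sketch of the compression effect and of the positivity requirement, using made-up values and Python's `math` module.

```python
import math

# The same absolute gap of 10, once between small values and once
# between large values (illustrative numbers only).
small_gap = math.log(20) - math.log(10)
large_gap = math.log(1010) - math.log(1000)

sqrt_small_gap = math.sqrt(20) - math.sqrt(10)
sqrt_large_gap = math.sqrt(1010) - math.sqrt(1000)

print(f"log:  small gap {small_gap:.4f}, large gap {large_gap:.4f}")
print(f"sqrt: small gap {sqrt_small_gap:.4f}, large gap {sqrt_large_gap:.4f}")

# The log transformation is undefined for zero and negative values.
try:
    math.log(0)
    log_of_zero_failed = False
except ValueError:
    log_of_zero_failed = True
print("log(0) raised ValueError:", log_of_zero_failed)
```

After either transformation, the gap between the two large values is much smaller than the gap between the two small values, even though both gaps were 10 in the original units.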
What is a window function?
A window function is any function that aggregates data, either to compress it (downsampling) or to smooth it. Window functions make for very informative exploratory visualisations.
What is an expanding window?
An expanding window starts at a given minimum size and grows as you progress through the time series, so that it includes every point up to the current time rather than a fixed number of points. Expanding windows are only useful for generating summary statistics if the time series is assumed to be stationary.
What are examples of expanding window functions?
- cumsum
- cummin
- cummax
- cummean
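One way to sketch these four functions in Python is with `itertools.accumulate` from the standard library. The variable names mirror the function names above, and the input series is made up.

```python
from itertools import accumulate

series = [3, 1, 4, 1, 5]  # illustrative values

# Each result has one entry per time step, computed over all points so far.
cumsum = list(accumulate(series))
cummin = list(accumulate(series, min))
cummax = list(accumulate(series, max))
cummean = [total / (i + 1) for i, total in enumerate(cumsum)]

print(cumsum)   # [3, 4, 8, 9, 14]
print(cummin)   # [3, 1, 1, 1, 1]
print(cummax)   # [3, 3, 4, 4, 5]
print(cummean)
```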
What is the difference between a rolling and expanding window?
An expanding window function gives the global value of the function over all data up to the point at which it is calculated. For example, an expanding window mean is the mean of the whole time series up to the current timestamp, whereas a rolling mean is a local mean over a fixed number of the most recent points.
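The contrast can be sketched on a made-up series with a level shift. The `rolling_mean` and `expanding_mean` helpers and the `min_periods` parameter are illustrative names.

```python
def rolling_mean(series, window):
    """Mean of the most recent `window` points at each time step."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def expanding_mean(series, min_periods=1):
    """Mean of all points seen so far, once `min_periods` points exist."""
    out = []
    total = 0.0
    for count, value in enumerate(series, start=1):
        total += value
        if count >= min_periods:
            out.append(total / count)
    return out


# Illustrative series with a jump in level half way through.
series = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

rolling = rolling_mean(series, window=3)
expanding = expanding_mean(series, min_periods=3)

print(rolling)
print(expanding)
```

The rolling mean ends at 11.0 and so tracks the new level, while the expanding mean ends at 6.5 because it still averages over the early, lower values.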
What is self correlation?
The general idea of self correlation is that the value of a time series at one point in time is correlated with its value at another point in time.
What is an example of self correlation?
As an example of self correlation, if you take a yearly time series of daily temperature data, you may find that comparing May 15th of every year to August 15th of every year will give you some correlation, such that hotter May 15ths tend to correlate with hotter August 15ths (or tend to correlate with cooler August 15ths). You may feel you have learned a potentially interesting fact about the temperature system, indicating that there is a certain amount of long-term predictability. On the other hand, you may find the correlation closer to zero, in which case you will also have found something interesting, namely that knowing the temperature on May 15th does not alone give you any information about the likely range of temperatures on August 15th. That is self-correlation in an anecdotal nutshell.
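One common way to put a number on this idea is the sample autocorrelation at a fixed lag. The sketch below is a minimal version of that statistic, applied to a synthetic seasonal signal with a period of 12 steps (an assumption made purely for illustration, not temperature data). The `autocorrelation` helper is an illustrative name.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Synthetic seasonal signal that repeats every 12 steps.
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(120)]

same_phase = autocorrelation(seasonal, lag=12)     # one full period apart
opposite_phase = autocorrelation(seasonal, lag=6)  # half a period apart

print(round(same_phase, 3))
print(round(opposite_phase, 3))
```

Points one full period apart are strongly positively correlated, and points half a period apart are strongly negatively correlated. A value near zero would instead indicate that one point tells you little about the other.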