Chapter 3: Exploratory Data Analysis for Time Series Flashcards

1
Q

What are the first steps for exploring a new data set?

A
  • Columns that are available.
  • Value ranges and units of variables.
  • Correlations between columns.
  • Overall mean and variance of interesting columns.
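
A minimal sketch of these first checks, using only the Python standard library. The column names and values are made up for illustration, and `pearson` and `summarise` are hypothetical helpers; with a real data set a dataframe library would normally do this work.

```python
from statistics import mean, variance

# Hypothetical toy data set: column name -> list of values (illustrative only).
data = {
    "temperature_c": [12.0, 14.0, 16.0, 18.0, 20.0],
    "ice_cream_sales": [20.0, 24.0, 28.0, 32.0, 36.0],
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def summarise(columns):
    """Return the range, mean and sample variance of every column."""
    return {
        name: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "variance": variance(values),
        }
        for name, values in columns.items()
    }

summary = summarise(data)
corr = pearson(data["temperature_c"], data["ice_cream_sales"])
print(list(data))  # columns that are available
print(summary)     # value ranges, mean and variance per column
print(corr)        # correlation between the two columns
```

Units are not something code can discover; they usually have to come from the data documentation.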
2
Q

What are useful visualisations of time series data?

A
  • ECDFs of the raw and differenced time series, both overall and by group.
  • Scatter plots of the raw time series between groups, and scatter plots of the differenced data.
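
As a rough sketch of the numbers behind an ECDF plot, the snippet below computes the empirical CDF of a series and of its first difference. The series is made up, and `ecdf` and `difference` are hypothetical helper names; the resulting points could be handed to any plotting tool as a step plot.

```python
def ecdf(values):
    """Return (sorted values, cumulative fractions) of the empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def difference(series):
    """First difference: the change from each point to the next."""
    return [b - a for a, b in zip(series, series[1:])]

# Illustrative series with an upward trend (values are made up).
series = [1.0, 2.0, 4.0, 7.0, 11.0]

raw_x, raw_f = ecdf(series)
diff_x, diff_f = ecdf(difference(series))
print(raw_x, raw_f)
print(diff_x, diff_f)
```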
3
Q

How do we test for stationarity?

A

The most common test for stationarity is the Augmented Dickey-Fuller (ADF) test. It tests the null hypothesis that a unit root is present in a time series. If the characteristic equation of a series has a unit root, the series is not stationary.
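
To make the idea concrete, here is a deliberately simplified sketch: a plain Dickey-Fuller regression with a constant and no lagged-difference terms, so it is not the full augmented test. The helper name `dickey_fuller_t`, the simulated data, and the use of the asymptotic 5% critical value of about -2.86 are assumptions made for illustration; in practice a library implementation such as `adfuller` from `statsmodels.tsa.stattools` would be the usual choice.

```python
import random

def dickey_fuller_t(series):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_{t-1} + error."""
    x = series[:-1]                                   # y_{t-1}
    dy = [b - a for a, b in zip(series, series[1:])]  # y_t - y_{t-1}
    n = len(x)
    mx, mdy = sum(x) / n, sum(dy) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (d - mdy) for a, d in zip(x, dy))
    gamma = sxy / sxx
    alpha = mdy - gamma * mx
    rss = sum((d - alpha - gamma * a) ** 2 for a, d in zip(x, dy))
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error of gamma
    return gamma / se

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]

white_noise = noise  # stationary by construction
random_walk = []     # cumulative sum of the noise: has a unit root
level = 0.0
for e in noise:
    level += e
    random_walk.append(level)

CRITICAL_5PCT = -2.86  # asymptotic 5% value, constant and no trend (assumed)

t_noise = dickey_fuller_t(white_noise)
t_walk = dickey_fuller_t(random_walk)
# A statistic below the critical value rejects the unit-root null hypothesis.
print(t_noise, t_noise < CRITICAL_5PCT)
print(t_walk, t_walk < CRITICAL_5PCT)
```

A strongly negative statistic is evidence against a unit root; a statistic near zero means the null hypothesis of a unit root cannot be rejected.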

4
Q

What are the limitations of hypothesis tests for stationarity?

A
  • These tests have low power in distinguishing near unit roots from unit roots.
  • With small sample sizes, false positives for unit roots are fairly common.
  • Most tests do not test for or against all kinds of problems that can lead to a non-stationary time series. For example, some tests look specifically at whether the mean or the variance (but not both) is stationary, while others look more generally at the overall distribution. It is important to understand the limits of the test you apply and to ensure that those limits are consistent with your beliefs about your data.
5
Q

What is the importance of stationarity in practice?

A
  • A large number of models assume stationarity.
  • Models of non-stationary time series will vary in accuracy as the metrics of the time series vary. For example, if your model is used to estimate the mean and variance of the time series with non-stationary mean and variance, the bias and error of the model will vary over time, at which point the utility of the model becomes questionable.
6
Q

How can a non-stationary dataset be made stationary?

A

The following transformations can help:
- Log transformation.
- Square root transformation.
- Differencing to remove a trend (if the series is not stationary after second-order differencing, further differencing is unlikely to fix it).
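
The three transformations can be sketched on toy data as follows. All series are made up for illustration, and `difference` is a hypothetical helper, not a library function.

```python
import math

def difference(series, order=1):
    """Apply first differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# Illustrative series growing by 10% per step: log, then difference once.
growth = [100 * 1.1 ** t for t in range(6)]
log_diff = difference([math.log(v) for v in growth])

# Illustrative quadratic trend: needs second-order differencing.
quadratic = [t ** 2 for t in range(6)]
second_diff = difference(quadratic, order=2)

# Square root applied to count-like data whose spread grows with its level.
counts = [1, 4, 9, 16, 25]
roots = [math.sqrt(c) for c in counts]

print(log_diff)     # every entry is close to log(1.1)
print(second_diff)  # constant after two rounds of differencing
print(roots)
```

After the log and a single difference, the exponential growth becomes a constant step; the quadratic trend only becomes constant after the second difference.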

7
Q

What other assumptions do time series models make about the data?

A

A common assumption is that the input variables or the predicted variable are normally distributed.

8
Q

What assumptions are you making about the data when you apply transformations like log or sqrt?

A
  • The data will always be positive.
  • If you shift the data before transforming, you either introduce a bias or assume that the shift doesn’t matter.
  • These transformations make larger values less different from each other, effectively compressing the space between the larger values but not the smaller values, de-emphasising the difference between outliers. This may or may not be appropriate.
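
A small arithmetic sketch of the last two points, with numbers chosen purely for illustration. The shift constant of 1 is an assumption (a common way to make zeros loggable), not something prescribed above.

```python
import math

# The same absolute gap of 100, at a small level and at a large level.
small_gap = math.log(110) - math.log(10)     # 10 -> 110
large_gap = math.log(1100) - math.log(1000)  # 1000 -> 1100

# Shifting by a hypothetical constant of 1 before taking logs changes
# relative differences, most strongly for small values.
unshifted_ratio = math.log(4) / math.log(2)
shifted_ratio = math.log(4 + 1) / math.log(2 + 1)

print(small_gap, large_gap)
print(unshifted_ratio, shifted_ratio)
```

The gap between the large values shrinks far more than the gap between the small ones, and the shift alters the ratio between transformed values, which is the bias the second point refers to.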
9
Q

What is a window function?

A

A window function is any function that aggregates data over a window, either to compress it (downsampling) or to smooth it. Window functions make for very informative exploratory visualisations.
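
Both uses can be sketched in a few lines of plain Python. The helper names `rolling_mean` and `downsample_mean` and the sample values are illustrative assumptions; dataframe libraries provide equivalent built-in operations.

```python
def rolling_mean(series, window):
    """Mean of each run of `window` consecutive points (smoothing)."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def downsample_mean(series, size):
    """Mean of each non-overlapping block of `size` points (downsampling)."""
    return [
        sum(series[i : i + size]) / size
        for i in range(0, len(series) - size + 1, size)
    ]

series = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
smoothed = rolling_mean(series, window=3)
coarse = downsample_mean(series, size=2)
print(smoothed)  # overlapping windows: almost as many points as the input
print(coarse)    # non-overlapping windows: one point per block
```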

10
Q

What is an expanding window?

A

An expanding window starts with a given minimum size, but as you progress through the time series it grows to include every point up to the current time, rather than covering only a fixed, finite number of points. Expanding windows are only useful for generating summary statistics if the time series is assumed to be stationary.

11
Q

What are examples of expanding window functions?

A
  • cumsum
  • cummin
  • cummax
  • cummean
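
The four functions above can be reproduced in Python with `itertools.accumulate`; this is a sketch on made-up values, with the variable names chosen to mirror the list.

```python
from itertools import accumulate

series = [3, 1, 4, 1, 5]

cumsum = list(accumulate(series))       # running total
cummin = list(accumulate(series, min))  # smallest value seen so far
cummax = list(accumulate(series, max))  # largest value seen so far
cummean = [total / (i + 1) for i, total in enumerate(cumsum)]  # running mean

print(cumsum)
print(cummin)
print(cummax)
print(cummean)
```

Each output value depends on every point from the start of the series up to that position, which is what makes these expanding rather than rolling windows.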
12
Q

What is the difference between a rolling and expanding window?

A

An expanding window function returns the value of the function computed over all data up to the point at which it is calculated. For example, an expanding window mean is the mean of the whole time series up to the current timestamp, whereas a rolling mean is a local mean over only the most recent points.
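
The contrast shows up clearly on a series with a level shift. The sketch below uses made-up data and hypothetical helper names.

```python
def expanding_mean(series):
    """Mean of every point from the start up to each position."""
    out, total = [], 0.0
    for i, value in enumerate(series, start=1):
        total += value
        out.append(total / i)
    return out

def rolling_mean(series, window):
    """Mean of the last `window` points at each position."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Illustrative series that jumps from 0 to 10 halfway through.
series = [0.0] * 4 + [10.0] * 4

last_expanding = expanding_mean(series)[-1]
last_rolling = rolling_mean(series, window=2)[-1]
print(last_expanding)  # 5.0: average over the whole history
print(last_rolling)    # 10.0: average over the two most recent points
```

The expanding mean still carries the old level, which is why it is only a sensible summary when the series is stationary.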

13
Q

What is self correlation?

A

Self correlation in a time series is the idea that the value at one point in time is correlated with the value at another point in time.

14
Q

What is an example of self correlation?

A

As an example of self correlation, if you take a yearly time series of daily temperature data, you may find that comparing May 15th of every year to August 15th of every year will give you some correlation, such that hotter May 15ths tend to correlate with hotter August 15ths (or tend to correlate with cooler August 15ths). You may feel you have learned a potentially interesting fact about the temperature system, indicating that there is a certain amount of long-term predictability. On the other hand, you may find the correlation closer to zero, in which case you will also have found something interesting, namely that knowing the temperature on May 15th does not alone give you any information about the likely range of temperatures on August 15th. That is self-correlation in an anecdotal nutshell.

15
Q

What is autocorrelation?

A

Autocorrelation gives you an idea of how data points at different points in time are linearly related to one another as a function of their time difference.
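
A minimal sketch of the sample autocorrelation function, assuming the common estimator that divides the lagged sum of mean-centred products by the lag-0 sum. The helper name `acf` and the sine-wave test series are illustrative choices.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mu = sum(series) / n
    dev = [v - mu for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]

# Illustrative noiseless seasonal series with period 12.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
rho = acf(series, max_lag=12)

print(rho[0])   # lag 0 is always 1
print(rho[6])   # half a period apart: strongly negative
print(rho[12])  # a full period apart: strongly positive
```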

16
Q

What is the Partial Autocorrelation Function?

A

The partial autocorrelation function of a time series for a given lag is the partial correlation of the time series with itself at that lag given all the information between two points in time.

17
Q

Why is the PACF useful?

A

For a seasonal and noiseless process with period T, nonzero ACF values will be seen at lags T, 2T, 3T and so on to infinity. The PACF weeds out these redundant correlations and reveals which correlations are “true” informative correlations for specific lags rather than redundancies.
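
One standard way to obtain the PACF from the ACF is the Durbin-Levinson recursion, sketched below. The input is the theoretical ACF of a hypothetical seasonal autoregressive process, y_t = 0.8 * y_{t-4} + noise, chosen only for illustration: its ACF is 0.8^k at lag 4k and zero elsewhere. The helper name `pacf_from_acf` is likewise an assumption.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations for lags 1..len(rho)-1 via Durbin-Levinson.

    `rho` holds autocorrelations by lag, with rho[0] == 1.
    """
    pacf = []
    prev = []  # prev[j] is the lag-(j + 1) coefficient of the order k-1 fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of the assumed process for lags 0..12 (period T = 4).
rho = [0.8 ** (lag // 4) if lag % 4 == 0 else 0.0 for lag in range(13)]
pacf = pacf_from_acf(rho)  # pacf[i] is the partial autocorrelation at lag i + 1

print(rho[4], rho[8], rho[12])     # ACF is nonzero at T, 2T and 3T
print(pacf[3], pacf[7], pacf[11])  # PACF is nonzero only at lag T
```

The ACF repeats the same dependence at every multiple of the period, while the PACF keeps only the lag-4 value and is approximately zero at lags 8 and 12.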