Time series Flashcards

1
Q

define a time series {r_t}

A

A sequence of random variables over time

2
Q

what do the AR, MA, and ARMA models essentially “do”?

A

They make use of the information available prior to some point in time, and the goal is to produce a prediction by exploiting the linear relationship between that prior information and the value being predicted.

3
Q

what is the ultimate focus of linear time series analysis?

A

the aspect of correlation. In time series, it is referred to as autocorrelation.

4
Q

What is the foundation of time series analysis?

A

A concept called stationarity.

5
Q

Define stationarity

A

The book distinguishes between strict and weak stationarity.

Strict stationarity: A time series {r_t} is said to be strictly stationary if the joint distribution of (r_{t1}, r_{t2}, …, r_{tk}) and the joint distribution of the same variables shifted by an arbitrary amount t are identical, for every choice of k and of time points. Same distribution, same mean, same variance, all identical.

Weak stationarity: A time series {r_t} is said to be weakly stationary if both the mean of r_t and the covariance of r_t and r_{t-l} are time invariant. This means that E[r_t] = mu, and cov(r_t, r_{t-l}) = gamma_l, which depends only on l.
Make no mistake: this notation entails that each r_t in the time series {r_t} has the same mean. This makes sure that the trend is flat around this mean value.

Also take time to consider the covariance requirement. It says that the covariance between r_t, which is just a random variable in the time series, and its lag-l counterpart, is the same regardless of which r_t we pick. All that matters is the lag l. Thus, the covariance between each pair of consecutive random variables in the time series must be the same, and the same goes for each pair two steps apart, and so on. This is basically saying that the same relationship among the variables must hold through the entire time series. Note that it is still very plausible that the covariance is close to 0 in many cases, which is totally fine. This is ultimately part of what we want to figure out with time series analysis.

From the requirement that cov() is constant, we also get that the variance of each random variable in the time series must be the same.

Note as well what the covariance requirement does to seasonality. It enforces that there is no seasonality: for instance, the lag-1 covariance being constant for all consecutive pairs of random variables means that we can't get those seasonal swings.
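
In symbols, weak stationarity requires

E[r_t] = \mu \quad \text{for all } t, \qquad \mathrm{Cov}(r_t,\, r_{t-l}) = \gamma_l \quad \text{for all } t \text{ and each lag } l,

and the constant-variance point above is just the special case l = 0: Var(r_t) = Cov(r_t, r_t) = \gamma_0 for every t.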

6
Q

what is lag-l autocovariance

A

Fancy word for the covariance between two random variables separated by a lag of l in a time series.

7
Q

what are we actually considering with {r_t}?

A

I believe a “true” time series with infinitely many random variables. I believe this is the case because of how the book describes samples of a time series as {r_t}^T_{t=0}.

8
Q

how do we denote the lag-l autocorrelation of r_t?

A

rho_l (ρ_l)

9
Q

What are the portmanteau and Ljung-Box tests used for?

A

Testing whether multiple different lag-l autocorrelations are jointly 0 or not.
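
For reference, the Ljung-Box version of the portmanteau statistic for the first m lags is

Q(m) = T(T+2) \sum_{l=1}^{m} \frac{\hat{\rho}_l^2}{T-l},

which is compared against a chi-squared distribution with m degrees of freedom under the null hypothesis that all m autocorrelations are zero.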

10
Q

define white noise

A

White noise is a time series consisting of iid random variables with finite mean and variance.

It is typical to use white noise with mean zero and variance sigma^2. If the variables are in addition normally distributed, it is called Gaussian white noise.

11
Q

what can we say about the ACFs of a white noise series?

A

All of them (at non-zero lags) are zero, because the variables are iid.

12
Q

what is actually being defined here?

A

Here we define what it would entail for the UNDERLYING PROCESS that governs the time series data generation to be linear. It is not related to a linear model.

It simply says that IF the time series is inherently linear, its data generation process will follow that formula.

This means that if we assume that some time series is linear, we basically assume that its data generation process can be described by that formula.

On the specifics, the assumption is that a linear time series follows a weighted sum of white noise terms. This indicates that the next data point is a function of all the previous shocks, each multiplied by some weight.
Also, do not neglect the mu. This is the mean. The white noise series is more about representing a relationship of the form “what happens next if the shock was this”. We know that the mean is mu in aggregate, but our goal is to relate the shocks to future movement in order to make our predictions. We're essentially looking to analyze how the time series reacts to certain shocks.

The more interesting part is how past values are not included. We're only using the mean and how the time series reacts to shocks.
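
For reference, the formula in question (it is restated on a later card) is the linear time series representation

r_t = \mu + \sum_{i=0}^{\infty} w_i a_{t-i},

where {a_t} is a white noise series and the w_i are the weights (with w_0 conventionally equal to 1).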

13
Q

AR models depend on r_{t-i} and not just the white noise. Can they then be defined as linear?

A

Yes, because we can rewrite the model (using the expected value for mu etc.) to get:

r_t - mu = ø_1 (r_{t-1} - mu) + a_t

and if we repeat the substitutions, we get that the result is linear, where the weights are increasing powers of ø_1. Since the ø's must be less than 1 in absolute value, this sum will diminish.
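
Writing the repeated substitution out for the AR(1) case makes the linear form explicit (assuming |ø_1| < 1, so the remainder term dies out):

r_t - \mu = \phi_1 (r_{t-1} - \mu) + a_t = \phi_1^2 (r_{t-2} - \mu) + \phi_1 a_{t-1} + a_t = \dots = \sum_{i=0}^{\infty} \phi_1^i a_{t-i},

i.e. the weights are w_i = ø_1^i, which decay geometrically.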

14
Q

in the book, they do this for all models. They say that “for a weakly stationary AR(1) model….”. what exactly are they referring to? What is weakly stationary?

A

The model in its abstract form represents a data generating process. The idea is that if the underlying process we wish to describe with a model is stationary, we need our model to exhibit the same stationarity features. Because of this, it is important to us to know that our model does this.

Stationarity is a baseline for time series because without stationarity in the data process that generates the time series in the first place, there is nothing to predict. There will be no way to use a time series model to forecast. Because of this, we place great emphasis on the fact that our time series are at least weakly stationary.
And due to the fact that our time series are weakly stationary, we need a model that captures this. And the model should capture it with the same values as well, but this enters into the world of estimation.

15
Q

what is meant by “autocovariance”

A

Autocovariance describes how a random variable in the series relates to its own previous values. It is the covariance of the series with a lagged version of itself.

16
Q

what is the interpretation of autocovariance/covariance?

A

Nothing much, really; its magnitude depends on the scale/units of the variables. This is why we use correlation.

17
Q

express correlation using only gamma

A

Correlation is defined as the covariance divided by the product of the standard deviations. Since the standard deviation is the same regardless of which random variable we look at in the time series, that product is just the variance. Thus we get “covariance/variance”, and for some specific lag we simply take the (auto)covariance and divide by the variance: gamma_l / gamma_0.
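
In symbols:

\rho_l = \frac{\gamma_l}{\sqrt{\gamma_0}\,\sqrt{\gamma_0}} = \frac{\gamma_l}{\gamma_0}, \qquad \rho_0 = 1.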

18
Q

what is the autocorrelation function?

A

The graph we get from plotting the autocorrelation value for each lag.

19
Q

what is the key regarding white noise series

A

the covariance (autocovariance) between different variables in the white noise series is always 0 unless we take the covariance with itself, which is the variance, which is sigma^2.

20
Q

what does the autocorrelation function of white noise look like?

A

A single peak of 1 at the lag 0, all 0 otherwise

21
Q

what is the distribution of a white noise autocorrelation function?

A

Note that this applies only to the sample autocorrelation function of a Gaussian white noise series.

Normal with mean 0 and variance 1/T.

The variance approaches zero as the time series gets longer.

This means that while we should expect a value of 0 for the autocorrelation, it will swing/fluctuate with a variance of 1/T.

We use this in hypothesis testing to see if something is basically white noise or not. If we use a t-ratio test, we're checking the hypothesis that the mean is indeed 0. If it turns out that the observed values are CRITICALLY EXTREME for the t-ratio statistic, we know that the observed autocorrelation is likely not the result of a white noise series, because it indicates that there is significant autocorrelation in our sample for the specific lag we're checking (autocorrelation is specified on a per-lag basis).

22
Q

how can we easily test if something is white noise? Say we have an autocorrelation function, and we want to establish bands indicating significance.

A

Easiest is probably to set up a confidence interval like this:

+- 1.96 x 1/sqrt(T)

this gives us the limits for 95% confidence.
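
A minimal sketch of this check in Python, assuming plain NumPy (the series y, the seed, and the lag count are placeholders of my own):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_1 .. rho_hat_max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(y ** 2)
    return np.array([np.sum(y[l:] * y[:-l]) / denom for l in range(1, max_lag + 1)])

T = 500
rng = np.random.default_rng(0)
y = rng.normal(size=T)             # simulated Gaussian white noise
band = 1.96 / np.sqrt(T)           # 95% band under the white-noise null
rho = sample_acf(y, max_lag=20)
print(np.sum(np.abs(rho) > band))  # for true white noise, expect roughly 1 of the 20 lags outside
```

If a real series has many lags outside the band, we would reject the white-noise hypothesis for those lags.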

23
Q

for MA models, we typically neglect the constant mu term. Why?

A

It considerably eases computations, and at no loss of generality. This is because we can obtain a zero-mean series by simply subtracting the mean, shifting all values/observations.

24
Q

what does a q'th order MA model look like?
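
No answer is written out on this card. The standard MA(q) form, consistent with the variance formula two cards below (the sign convention on the thetas varies between textbooks), is

r_t = \mu + a_t + \theta_1 a_{t-1} + \theta_2 a_{t-2} + \dots + \theta_q a_{t-q},

where {a_t} is a white noise series.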

25
Q

what is the mean of an MA(q) series?

26
Q

what is the variance of an MA(q) series?

A

The variance of mu is zero, so that term drops out.
So we have the variance of a sum of white noise variables, each multiplied by its own theta value.
Each theta becomes a squared contribution, and the variance of the white noise random variables is sigma^2. So we get:

var(MA(q)) = (1 + theta_1^2 + theta_2^2 + … + theta_q^2) sigma^2
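
Written out, using the fact that the white-noise terms are independent (so all cross-covariances vanish):

\mathrm{Var}(r_t) = \mathrm{Var}\Big(a_t + \sum_{i=1}^{q} \theta_i a_{t-i}\Big) = \sigma^2 + \sum_{i=1}^{q} \theta_i^2 \sigma^2 = \big(1 + \theta_1^2 + \dots + \theta_q^2\big)\sigma^2.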

27
Q

what can we say about the covariance of MA(q)?

A

For lags s > q, the autocovariance is 0.

For lags up to q, the autocovariance may or may not be zero.

28
Q

An MA process produces its value based on shocks only. But the shocks are random and considered white noise, so how can we predict anything based on this?

A

First of all, the model doesn't rely only on the white noise. We also have the mean component. As a result, a simple MA model is basically trying to see if there is a systematic way the time series moves around its mean based on the specific shocks.

For a true MA process, the time series will fluctuate around a mean, and these fluctuations would be completely determined by the shocks. The shocks are considered random, but the reaction to them is not.

The key is that the assumption when using MA models is that the variable we're interested in will behave in a certain way as a reaction to the shocks. By using MA we're trying to learn that reaction.

Also, it helps to consider the MA as a process: a process where a sequence of white noise shocks together influence the reaction of the time series variable of interest. If a time series is a true MA process, we can capture it fully with an MA model.

29
Q

why is the ACF 0 for lags s > q when the MA is MA(q)?

A

Here we are considering a TRUE MA process. A true MA process uses exactly the q latest white noise variables to produce the output.

The reason why the autocorrelation function of MA(q) models cuts off at lags s > q is that an MA(q) process uses, by definition, exactly the q latest white noise shocks to produce the reaction/output variable value. If the underlying process only uses the latest q white noise shocks to determine the next step in our time series, then there is obviously no correlation with earlier lags. If there happened to be correlation with earlier lags, then it would simply not be a true MA(q) process, and we'd have to increase q to compensate for this.

30
Q

define an AR(p) process

A

An autoregressive process where the time series variable is driven by the p latest variable values, together with a new shock.

31
Q

define an AR(p) model

A

A model that builds on the assumption that the past p variable values determine the next variable value, along with a single white noise term.

32
Q

Does an AR(1) model depend on more shocks than a_t?

A

If we consider an AR(1) process: it can be written as r_t = ø_0 + ø_1 r_{t-1} + a_t.
Then we can apply this recursively and get: r_t = ø_0 + ø_1(ø_0 + ø_1 r_{t-2} + a_{t-1}) + a_t, and so on.
This creates a scenario where modeling r_t as a process that only uses 1 lag entails that we can consider it as a function of the shocks and the coefficients ø_0 and ø_1, which creates an infinite sum that ultimately converges, given that certain requirements on ø_0 and ø_1 are met.

So yes, all AR models depend on all previous shocks.

The key is really “convergence” of the sums that emerge.

33
Q

elaborate on the back operator

A

L, the lag operator. L represents a single lag.

L^x represents x lags. L is used in conjunction with r_t or y_t.
For instance: Ly_t is actually just y_{t-1}, one lag.

L^10 y_t is simply y_{t-10}.

34
Q

elaborate on this formula

A

We have the constant/mean term, which is also typically written as ø_0.

Then we have a sum of p terms, one per order of the AR(p) model we are considering.

Each term in the sum consists of the corresponding parameter ø_i and the lagged variable. The lagged variable is simply represented as L^k y_t, which is y_{t-k}.

The benefit of using the lag operator is that we can separate y_t out of the equation. We are then left with the L^k terms, which form a specific function (a polynomial in L).
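
The formula is not shown on this card, but based on the description it is presumably the AR(p) model written with the lag operator:

y_t = \mu + \sum_{i=1}^{p} \phi_i L^i y_t + u_t, \qquad L^i y_t = y_{t-i}.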

35
Q

elaborate on this formula

A

This formula is the simple AR(p) model, but in heavily compacted form.

We have defined a function ø(L), which is a function of the ø_i params and the back/lag operators.

ø(L) = (1 - ø_1 L - ø_2 L^2 - … - ø_p L^p)

The formula has simply moved the original RHS sum over to the left-hand side and extracted the common y_t using the lag operators.

Of course, separating out y_t allows us to isolate ø(L); setting this polynomial to zero gives what we refer to as the characteristic equation.

36
Q

elaborate on finding the characteristic equation

A

We start from the compact form of the AR(p) model:

ø(L) y_t = mu + u_t

Setting mu = 0 by performing a shift in all the variables we observe (so that we don't lose generality), we get:

ø(L) y_t = u_t

this means we also have:

y_t = (ø(L))^(-1) u_t

y_t is equal to the inverse of the function ø(L) applied to u_t. For this to describe a stationary process, the inverse of ø(L) must exist as a convergent expansion, i.e. its coefficients must die out towards zero.
This will happen if the impact that earlier shocks have on the current variable diminishes as we go further and further back.
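
For the AR(1) case this is easy to see explicitly. Assuming |ø| < 1,

(1 - \phi L)^{-1} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \dots,

so y_t = u_t + ø u_{t-1} + ø^2 u_{t-2} + …, a convergent MA(∞) representation. If |ø| ≥ 1 the coefficients never die out and the expansion does not converge.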

37
Q

important card.

elaborate on values for phi in AR models

A

Again, it is important to understand the process. If some process is AR(1) and phi_1 is larger than 1 in absolute value, then it will simply explode: the process will generate numbers that get bigger and bigger and basically just grow towards infinity.

The idea is that if we let the time series continue, what can we say about where it is heading? For stationarity, we want it to remain flat around some mean.

38
Q

what is the mean of an AR model?

A

Not simply the first phi value (the intercept), at least; we need to consider the convergence.

The key is to understand that the mean is constant for a stationary process, which means that we can use the property that the mean of y_t is the same as the mean of Ly_t, which is the same as the mean of L^2 y_t, and so on. This allows us to separate/isolate an expression for the expectation. For an AR(1) model this gives E[y_t] = ø_0 / (1 - ø_1).

39
Q

why can't we just check the individual phi values of the AR(p) model to see whether it will converge or not?

A

Because for orders higher than 1, it is a recurrence relation that involves more than one phi, and then the relationship is not that straightforward. The recurrence can converge without every phi being less than 1 in absolute value. Stationarity depends on the combination.

40
Q

how do we check whether an AR model is stationary or not?

A

Solve the characteristic polynomial. The roots must lie outside the unit circle.

Behind this is math surrounding the concepts of:
- Recurrence relations
- Eigenvalues
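
A minimal sketch of the root check in Python, assuming a hypothetical AR(2) model with ø_1 = 0.5 and ø_2 = 0.3:

```python
import numpy as np

# Characteristic polynomial of the AR(2): 1 - 0.5 L - 0.3 L^2 = 0.
# np.roots expects coefficients ordered from the highest power of L downwards.
phi1, phi2 = 0.5, 0.3
coeffs = [-phi2, -phi1, 1.0]        # -0.3 L^2 - 0.5 L + 1
roots = np.roots(coeffs)

print(roots)                        # two roots, roughly -2.84 and 1.17
print(np.all(np.abs(roots) > 1.0))  # True -> all roots outside the unit circle -> stationary
```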

41
Q

is the process y_t = y_{t-1} + u_t stationary?

A

No, it is actually known as a random walk.

formally, we can check it:

(1 - øL) y_t = u_t, with ø = 1

1/(1 - øL) is the inverse
1 - øL = 0 is the characteristic eq.

1 = øL
L = 1/ø
L = 1/1 = 1

The root is 1. 1 is not outside the unit circle, which means that the process is non-stationary.

42
Q

what is yule walker?

A

Yule-Walker is a set of equations that can be solved as a system of equations. It links the AR parameters to the autocorrelations, so solving it basically gives all the autocorrelations.
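
For reference, the Yule-Walker equations for an AR(p) process are

\rho_l = \phi_1 \rho_{l-1} + \phi_2 \rho_{l-2} + \dots + \phi_p \rho_{l-p}, \qquad l = 1, 2, \dots

with ρ_0 = 1. Stacking the first p of these gives a linear system connecting the ø's and the ρ's, which can be solved in either direction.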

43
Q

define ARMA process

A

An ARMA process is a time series where the next value is determined by a combination of some past values and a linear combination of past shocks. It combines AR with MA.

We basically just add them together.
Since we add them together, ARMA will have a geometrically declining ACF. Therefore, the first step of identifying ARMA is probably to notice the declining ACF.

The same happens for the MA part and the PACF.

ARMA has a geometrically declining ACF and PACF; the effects are additive.

44
Q

elaborate on intuition behind ACF of MA process

A

We're asking “is there a correlation between the current y_t and y_{t-k}?”. For a true MA process of order q with k ≤ q, we know that y_{t-k} was determined based on whatever the noise at that time was. And since we also include this noise in the determination of the current point, there is a correlation between them.

At the same time, this highlights why the ACF for moving averages suddenly drops. The true MA process simply doesn't rely on any common information once we go further back than the order.

45
Q

Define PACF

A

The PACF at lag k is the correlation between observations k lags apart after accounting for the effects of the intermediate lags.

46
Q

what can cause alternating ACF/PACF

A

negative values in the MA

47
Q

what do we mean by information criteria?

A

A two-term construct. It will have one term related to the RSS, and one term that penalises having more parameters.

When we add a parameter to a model, the RSS will likely drop (it can never increase), but the penalty term will increase. These are competing effects that determine whether we consider the extra parameter a good choice or not.

When we use information criteria as the basis for selection, we want to minimize them.

48
Q

name the most popular information criteria, and give their forms

A

AIC, SBIC, HQIC.

They are based on the same ingredients:
sigma^2 is the residual variance (the RSS divided by T), k is the number of parameters, and T is the sample size.

AIC = ln(sigma^2) + 2k/T

SBIC = ln(sigma^2) + (k/T) lnT

HQIC = ln(sigma^2) + (2k/T)ln(ln(T))
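
A minimal sketch implementing these three formulas directly (the function name and example numbers are my own, not from the source; resid_var is the residual variance, k the parameter count, T the sample size):

```python
import math

def info_criteria(resid_var: float, k: int, T: int) -> dict:
    """AIC, SBIC and HQIC as given above (resid_var = RSS / T)."""
    log_var = math.log(resid_var)
    return {
        "AIC": log_var + 2 * k / T,
        "SBIC": log_var + (k / T) * math.log(T),
        "HQIC": log_var + (2 * k / T) * math.log(math.log(T)),
    }

# Hypothetical comparison: ARMA(1,1) fit (k = 3) vs ARMA(2,2) fit (k = 5) on T = 250 points
print(info_criteria(resid_var=1.40, k=3, T=250))
print(info_criteria(resid_var=1.37, k=5, T=250))  # pick the model with the smaller criterion values
```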

49
Q

Recall the expression that any true linear time series process must follow.

Why does it not include any past values of the random variable we’re looking at?

A

r_t = \mu + \sum_{i=0}^{\infty} w_i a_{t-i}

It doesn't include any past variables r_{t-l} because these can be recursively expanded. The recursion ultimately bottoms out (in the infinite limit) and we are left with only shocks and weights.

50
Q

why do we need the mean to be constant for stationarity?

A

If it is not constant, the values we use in the autocorrelation function will be different depending on where we are along the time series, which makes our predictions bad.

51
Q

we know that white noise autocorrelation is typically assumed to be normally distributed. The sample distribution is normal with mean 0 and variance 1/T. why do we care?

A

We can use this to test whether a time series is white noise (just random noise) or not. If we have a large time series sample, the variance is extremely low. This means that if we use the t-ratio test to check whether the correlation for a certain lag is zero or not, the test is quite sharp: even modest sample autocorrelations become distinguishable from zero.

52
Q

why does the MA model definition not multiply a parameter theta with the a_t noise term?

A

Because of how we defined linear time series: that term would be index 0 in the sum, which corresponds to a weight of 1 (a power of 0 gives 1). It helps to consider the collapsed AR version, which creates increasing powers of the weights.

53
Q

for MA models, what is key in computing ACF etc?

A

We need to remember that the cross products are 0 in expectation, because the white noise series is iid.

This is in the context of taking the expected value of a cross product. Since independent variables satisfy E[XY] = E[X]E[Y], we can make use of the fact that the expected value of each random variable in the white noise series is zero. This is why we can neglect the cross products.