Lecture 17 Flashcards

1
Q

What is an autoregressive model

A

An AR model uses past values of a variable to predict its current value
- e.g. AR(1), a 1st-order autoregressive model:
y_t = β0 + β1·y_{t-1} + u_t
- interpretation: today's value depends on yesterday's value plus some randomness
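A minimal sketch (assuming Python with numpy and statsmodels; parameter values are illustrative) that simulates an AR(1) and recovers the coefficients by OLS:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T, beta0, beta1 = 500, 0.5, 0.8          # illustrative values

# simulate y_t = beta0 + beta1 * y_{t-1} + u_t
y = np.zeros(T)
for t in range(1, T):
    y[t] = beta0 + beta1 * y[t - 1] + rng.normal()

# regress y_t on a constant and its first lag
X = sm.add_constant(y[:-1])
res = sm.OLS(y[1:], X).fit()
print(res.params)                        # estimates of (beta0, beta1)
```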

2
Q

How to estimate AR models

A
  • estimated using OLS, but OLS may be biased, especially in small samples, due to serial correlation and endogeneity between y_{t-1} and u_t
  • it can be shown that AR models are biased: OLS underestimates the true autoregressive coefficient, which is known as Nickell bias (see the simulation sketch below)
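A small Monte Carlo sketch (pure numpy; illustrative values) of this downward bias: with a short sample, the average OLS estimate of the AR coefficient falls below the truth:

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta1, reps = 30, 0.9, 5000           # short sample makes the bias visible

estimates = []
for _ in range(reps):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = beta1 * y[t - 1] + rng.normal()
    x, z = y[:-1], y[1:]
    xd = x - x.mean()                    # demeaning plays the role of the intercept
    estimates.append(xd @ (z - z.mean()) / (xd @ xd))

print(np.mean(estimates))                # clearly below the true 0.9
```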
3
Q

Moving Average Models

A

MA(1): y_t = θ0 + u_t + θ1·u_{t-1}, where u_t is i.i.d.
- current y_t depends on current and past error terms
- MA models involve shocks
- key point: shocks have effects that propagate for a few periods, but then die out

MA(q): y_t = θ0 + u_t + θ1·u_{t-1} + … + θq·u_{t-q}
- not easy to estimate with OLS as the error terms are unobservable, so MA models are instead fit by maximum likelihood (see the sketch below)
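A sketch using statsmodels' ARIMA class, where an MA(1) is ARIMA of order (0, 0, 1); the coefficient values are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
theta1, T = 0.6, 500                     # illustrative MA(1) coefficient
u = rng.normal(size=T + 1)
y = 1.0 + u[1:] + theta1 * u[:-1]        # y_t = theta0 + u_t + theta1 * u_{t-1}

res = ARIMA(y, order=(0, 0, 1)).fit()    # (p, d, q) = (0, 0, 1): pure MA(1), fit by MLE
print(res.params)                        # constant, MA coefficient, error variance
```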

4
Q

ARMA models, autoregressive moving average

A

ARMA(p, q): y_t = μ + Σ_{l=1..p} β_l·y_{t-l} + u_t + Σ_{k=1..q} θ_k·u_{t-k}
- captures persistence from the past (AR) and shock-driven dynamics (MA)
- estimated using MLE or specialised time series methods
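A sketch fitting an ARMA(1,1) with the same statsmodels ARIMA class (ARMA(p, q) is ARIMA(p, 0, q)); it assumes statsmodels' ARMA simulator is available, and the lag-polynomial values are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
# lag polynomials: AR part (1 - 0.7L), MA part (1 + 0.4L)
y = arma_generate_sample(ar=[1, -0.7], ma=[1, 0.4], nsample=500,
                         distrvs=rng.standard_normal)

res = ARIMA(y, order=(1, 0, 1)).fit()    # ARMA(1,1), estimated by MLE
print(res.params)
```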

5
Q

Distributed Lag models
- DL

A

y_t = α0 + α1·x_{t-1} + α2·x_{t-2} + … + αk·x_{t-k} + u_t
- the outcome y_t depends on lags of another variable x_t
- often used for policy analysis
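A sketch estimating a DL model by OLS: build the lags of x with pandas and regress (hypothetical data, k = 2 lags):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = pd.Series(rng.normal(size=n))
# hypothetical DGP: y responds to the first two lags of x
y = 1.0 + 0.5 * x.shift(1) + 0.25 * x.shift(2) + rng.normal(size=n)

# build the lagged regressors and drop rows lost to lagging
X = sm.add_constant(pd.DataFrame({"x_lag1": x.shift(1), "x_lag2": x.shift(2)}))
data = pd.concat([y.rename("y"), X], axis=1).dropna()

res = sm.OLS(data["y"], data.drop(columns="y")).fit()
print(res.params)                        # estimates of alpha0, alpha1, alpha2
```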

6
Q

Autoregressive Distributed Lag models
- ADL

A

Autoregressive Distributed Lag:
- y_t = μ + Σ_{l=1..p} β_l·y_{t-l} + Σ_{j=0..q} α_j·x_{t-j} + u_t
- combines both lags of y and lags of x
- very flexible for modelling feedback and dynamic effects
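statsmodels ships an ARDL class that fits this directly (a sketch assuming statsmodels >= 0.13; the lag orders and DGP are illustrative):

```python
import numpy as np
from statsmodels.tsa.ardl import ARDL

rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):                    # hypothetical ADL(1, 1) DGP
    y[t] = 0.5 * y[t - 1] + 0.3 * x[t] + 0.2 * x[t - 1] + rng.normal()

# 1 lag of y; lags 0..1 of x
res = ARDL(y, lags=1, exog=x.reshape(-1, 1), order=1).fit()
print(res.params)
```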

7
Q

How many lags to include in time series models?
- too few lags and you miss dynamics, too many and you overfit
3 main approaches:

A
  1. Rules of thumb - based on data frequency: monthly data? Try 6 or 12 lags (half or a full year)
  2. Cross-validation - more empirical: hold out parts of the data and test predictive accuracy. Still not very common in time series due to serial dependence - often done with rolling windows
  3. Information criteria - model selection tools that balance fit and parsimony (see the BIC and AIC cards below)
8
Q

Bayesian information criterion (BIC):

A

BIC(n) = ln(SSR(n)/T) + (n/T)·ln(T)
- SSR(n): sum of squared residuals with n parameters
- first term: fit (lower is better)
- second term: penalty for complexity
Minimise BIC to find the optimal number of lags; BIC tends to choose fewer lags than AIC

9
Q

AIC - Akaike Information Criterion

A

AIC(n) = ln(SSR(n)/T) + 2n/T
- similar logic to BIC
- uses a smaller penalty -> tends to select more lags
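A sketch implementing both formulas by hand for an AR(n) lag search (simulated data; all candidate models are fit on the same sample so T is comparable, and the penalty counts n lag coefficients plus the intercept):

```python
import numpy as np

rng = np.random.default_rng(6)
y = np.zeros(500)
for t in range(2, 500):                  # true model is AR(2)
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

max_lag = 8
for n in range(1, max_lag + 1):
    # common estimation sample: drop the first max_lag observations
    Y = y[max_lag:]
    X = np.column_stack([np.ones_like(Y)] +
                        [y[max_lag - j:-j] for j in range(1, n + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    ssr = np.sum((Y - X @ beta) ** 2)
    T, k = len(Y), n + 1
    bic = np.log(ssr / T) + k * np.log(T) / T
    aic = np.log(ssr / T) + 2 * k / T
    print(n, round(bic, 3), round(aic, 3))   # pick n minimising each criterion
```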

10
Q

Violation of stationarity common in time series data - seasonality
- what’s the issue?
- why is this a problem?

A
  • stationarity means the statistical properties of a series are constant over time; seasonality violates this because certain patterns repeat at regular intervals
  • if you ignore it, you might estimate spurious relationships - detecting false signals or inferring incorrect causality
11
Q

How to fix violation of stationarity?
- Fixed date events
- seasonal patterns - trigonometric controls

A
  1. Fixed date events: use monthly dummies or specific event dummies, e.g. Christmas, bank holidays
    - these soak up predictable jumps due to calendar effects
  2. Seasonal patterns: use sin and cos terms to model smooth cyclical patterns; works well when seasonality is regular and continuous (see the sketch below)
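A sketch building both kinds of controls for a hypothetical monthly series (column names are illustrative):

```python
import numpy as np
import pandas as pd

# hypothetical 8 years of monthly observations
t = np.arange(96)
month = (t % 12) + 1

# 1. fixed-date events: month dummies (drop one category to avoid the dummy trap)
dummies = pd.get_dummies(month, prefix="m", drop_first=True).astype(float)

# 2. smooth seasonality: one annual sin/cos pair
trig = pd.DataFrame({"sin12": np.sin(2 * np.pi * t / 12),
                     "cos12": np.cos(2 * np.pi * t / 12)})

X = pd.concat([dummies, trig], axis=1)   # add these columns to your regression
print(X.head())
```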
12
Q

What is a deterministic trend

A

A predictable, long-term movement in a time series that doesn’t arise from randomness
- deterministic trends break stationarity, as the mean of the series changes over time, violating the assumption that the data’s distribution stays constant
- can model the trend as having a systematic time component, e.g. y_t = κ1·t + u_t
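A short sketch: regress on a time index t and keep the residuals to remove a deterministic trend (illustrative slope):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
T = 200
t = np.arange(T)
y = 0.05 * t + rng.normal(size=T)        # y_t = kappa1 * t + u_t

res = sm.OLS(y, sm.add_constant(t)).fit()
detrended = res.resid                    # stationary once the trend is removed
print(res.params)                        # intercept and trend slope
```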

13
Q

Stochastic trends
- e.g. random walk

A

The trend changes randomly and unpredictably over time
- random walk: y_t = y_{t-1} + u_t; the conditional mean is y_{t-1}, but the variance grows with time: Var(y_t) = t·σ²
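A sketch verifying Var(y_t) ≈ t·σ² by simulating many random-walk paths:

```python
import numpy as np

rng = np.random.default_rng(8)
# 10,000 random walks of length 200 with sigma = 1
paths = rng.normal(size=(10000, 200)).cumsum(axis=1)

for t in (10, 50, 200):
    print(t, paths[:, t - 1].var())      # empirical variance ~ t * sigma^2
```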

14
Q

Consequences of stochastic trends

A
  • using y_{t-1} as a regressor introduces endogeneity, as the regressor and error term are correlated, so OLS underestimates true persistence and the coefficient on y_{t-1} is biased downward
  • t-statistics are no longer valid: if the data has unit roots or is non-stationary, the usual large-sample properties break down
  • spurious regression: if y and x both follow stochastic trends, regressing one on the other can give a high R² and large t-statistics even if they are unrelated
15
Q

Does heteroskedasticity matter?

A

Unconditional heteroskedasticity, i.e. the variance of u_t changes over time regardless of x_t, can violate stationarity
- conditional heteroskedasticity, where the variance of u_t depends on x_t, does not affect OLS bias/consistency, but it does affect inference

16
Q

Testing for stochastic trends: Dickey-Fuller test

A

The key issue is whether the process has a unit root, i.e. is non-stationary due to a stochastic trend
- test: H0: β1 = 1 vs H1: β1 < 1, in y_t = β0 + β1·y_{t-1} + u_t
- can't just use a normal t-test: under the null, when β1 = 1, the series is non-stationary and the usual test statistics don't follow standard distributions

Dickey-Fuller regression: Δy_t = β0 + δ·y_{t-1} + u_t, test H0: δ = 0 vs H1: δ < 0 (see the sketch below)
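statsmodels implements this as adfuller, which returns the test statistic together with the special Dickey-Fuller critical values:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(9)
y = rng.normal(size=300).cumsum()        # a random walk, so H0 should not be rejected

stat, pvalue, usedlag, nobs, crit, _ = adfuller(y)
print(stat, pvalue, crit)                # compare stat to the DF critical values
```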

17
Q

Important notes for the Dickey-Fuller Test

A
  1. One-sided test
  2. No need for robust SEs, as the special distributions already account for the non-standard inference
  3. Generalises to AR(p) models (the augmented Dickey-Fuller test).
18
Q

Structural breaks

A

The regression relationship changes at some known point in time, τ
- estimate a large ADL model with a potential break, i.e. add interaction terms with a dummy that equals 1 if t exceeds τ, each with its own coefficient
- conduct an F-test of whether those break coefficients are all 0; if at least one is non-zero, the regression changes at τ (see the sketch below)
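A sketch of the known-break-date version (the Chow test) for an AR(1): interact the regressors with a post-break dummy and F-test the interactions; the break date and DGP are illustrative, and x2/x3 are statsmodels' default names for the dummy and interaction columns:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
T, tau = 200, 100                        # hypothetical break at t = 100
y = np.zeros(T)
for t in range(1, T):
    beta1 = 0.3 if t < tau else 0.8      # persistence shifts at the break
    y[t] = beta1 * y[t - 1] + rng.normal()

D = (np.arange(1, T) >= tau).astype(float)       # post-break dummy
ylag = y[:-1]
X = sm.add_constant(np.column_stack([ylag, D, D * ylag]))
res = sm.OLS(y[1:], X).fit()

# F-test: are the break dummy and its interaction jointly zero?
print(res.f_test("x2 = 0, x3 = 0"))
```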

19
Q

QLR test

A

In practice you won't know when the break happens
- so run a Chow test at every possible break date τ in a central window
- take the largest F-statistic from those: the QLR statistic (see the sketch below)
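A sketch of the QLR procedure: compute the Chow F-statistic at every candidate break date in the central 70% of the sample and take the maximum. Note the max-F statistic has non-standard critical values, so don't compare it to an ordinary F table; the DGP here is illustrative:

```python
import numpy as np
import statsmodels.api as sm

def chow_f(y, tau):
    """Chow F-stat for a break at date tau in an AR(1) (intercept and slope shift)."""
    ylag, Y = y[:-1], y[1:]
    D = (np.arange(1, len(y)) >= tau).astype(float)
    X = sm.add_constant(np.column_stack([ylag, D, D * ylag]))
    res = sm.OLS(Y, X).fit()
    return float(np.squeeze(res.f_test("x2 = 0, x3 = 0").fvalue))

rng = np.random.default_rng(11)
T = 200
y = np.zeros(T)
for t in range(1, T):
    beta1 = 0.2 if t < 120 else 0.7      # hypothetical break at t = 120
    y[t] = beta1 * y[t - 1] + rng.normal()

window = range(int(0.15 * T), int(0.85 * T))     # 15% trimming
fstats = {tau: chow_f(y, tau) for tau in window}
print(max(fstats.values()), max(fstats, key=fstats.get))  # QLR stat, estimated break
```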

20
Q

What does 15% trimming mean

A

Only testing breakpoints within the central 70% of the data, i.e. dropping 15% of observations at each end

21
Q

Why would we need HAC standard errors in time series regressions

A
  • in time series, the error term u_t might be heteroskedastic and/or autocorrelated
  • so regular SEs aren't valid; we need HAC SEs to correct for both
  • it can be shown that OLS SEs are wrong in time series unless we account for these problems
22
Q

Newey West SEs

A

The Newey-West estimator adjusts SEs to account for autocorrelation and heteroskedasticity, so you don't underestimate your SEs
- estimates the long-run, HAC variance
- bias-variance tradeoff: a bigger m captures more autocorrelation, but adds more noise (higher variance)
- a smaller m gives a cleaner estimate, but might miss serial dependence
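In statsmodels, Newey-West SEs are requested via cov_type="HAC" when fitting; the truncation choice m = 0.75·T^(1/3) is one common rule of thumb, used here as an assumption:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
T = 300
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):                    # autocorrelated errors
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

m = int(0.75 * T ** (1 / 3))             # truncation parameter (rule of thumb)
X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                               # conventional SEs
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": m})
print(ols.bse, hac.bse)                  # HAC SEs are typically larger here
```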

23
Q

Newey west component breakdown

A

The value is the long-run (HAC) variance estimator of the OLS estimator; one common form uses linearly declining Bartlett weights:
σ̂²_LR = γ̂0 + 2·Σ_{j=1..m} (1 − j/(m+1))·γ̂j
- T is the number of time periods (used in computing each γ̂j)
- γ̂j is the sample autocovariance of the regression residuals at lag j; γ̂0 is the variance, i.e. the autocovariance at lag 0
- m is the truncation parameter: how many lags back we are looking
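A sketch computing this long-run variance by hand from residuals, using the Bartlett weights above (one common choice of weighting; m is illustrative):

```python
import numpy as np

def long_run_variance(resid, m):
    """Newey-West long-run variance with Bartlett weights w_j = 1 - j/(m+1)."""
    T = len(resid)
    e = resid - resid.mean()
    # sample autocovariances gamma_hat_j for j = 0..m
    gamma = [e[j:] @ e[:T - j] / T for j in range(m + 1)]
    return gamma[0] + 2 * sum((1 - j / (m + 1)) * gamma[j]
                              for j in range(1, m + 1))

rng = np.random.default_rng(13)
u = np.zeros(400)
for t in range(1, 400):
    u[t] = 0.5 * u[t - 1] + rng.normal() # autocorrelated series

print(long_run_variance(u, m=6))         # exceeds the plain variance gamma_hat_0
print(u.var())
```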