Chapter 9 - Volatility and correlation Flashcards

1
Q

name 3 features of financial data that linear models are unable to capture

A

leptokurtosis

volatility clustering

leverage effect

2
Q

what is leverage effect?

A

The tendency of volatility to rise more following a price fall of a given magnitude than following a price rise of the same magnitude.

3
Q

how does Campbell define a non-linear data generation process?

A

the current value of a time series is related non-linearly to its current and previous error values:

y_t = f(u_t, u_{t-1}, u_{t-2}…)

where f is a non-linear function

4
Q

Give Campbell's other definition of a non-linear data generation process

A

y_t = g(u_{t-1}, u_{t-2}, …) + u_t sigma^2(u_{t-1}, u_{t-2}, …)

So we have two functions, g and sigma^2, both depending only on past errors. However, the sigma^2 function is multiplied by the current error.

We can talk about models being non-linear in mean (if g is non-linear) or non-linear in variance (if sigma^2 is non-linear). Remember: g and sigma^2 are functions here.

5
Q

name the most popular non-linear models used in finance. why these?

A

ARCH and GARCH models.

Others have simply not been found to be useful.

6
Q

generally speaking, when should we consider a non-linear model?

A

If financial theory indicates that there is a non-linear relationship between the variables, then using a non-linear model is the natural thing to do.

7
Q

what broad categorization of tests can be made regarding checking for non linearity?

A

general tests

specific tests

8
Q

elaborate on general tests

A

Also called portmanteau tests.

These are designed to detect many kinds of departures from randomness in data. They will likely detect a variety of non-linear relationships, but they do not provide information about which ones are present.
Ramsey's RESET test is an example.

9
Q

what is chaos theory

A

the theory that beneath the apparent randomness of complex systems lies a set of equations and laws that governs behaviour. In other words, behaviour is deterministic given a certain set of information.

Econometricians have searched for this kind of structure in financial data.

10
Q

what is SDIC

A

Sensitive dependence on initial conditions

11
Q

elaborate on sensitive dependence on initial conditions

A

The general idea is that a small change in the initial conditions has a significant impact on the system's later behaviour, and the effect grows exponentially through time.

12
Q

in what context is SDIC used?

A

To define a chaotic system.

A system is said to be chaotic if it exhibits sensitive dependence on initial conditions.

13
Q

What is a “true test for chaos”?

A

“The” true test for chaos is the largest Lyapunov exponent.

14
Q

elaborate on the largest Lyapunov exponent

A

It is a test for chaos that measures the rate at which information about the initial conditions is lost from a system.

A positive largest Lyapunov exponent indicates sensitive dependence on initial conditions, and therefore that chaos is present.
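
A toy illustration (my own sketch, not from the chapter): the logistic map x_{n+1} = r·x_n·(1 − x_n) is a standard chaotic system, and its largest Lyapunov exponent can be estimated by averaging ln|f′(x_n)| = ln|r(1 − 2x_n)| along a trajectory. A positive estimate flags chaos.

```python
import numpy as np

def lyapunov_logistic(r, x0=0.4, n=100_000, burn=1_000):
    """Estimate the largest Lyapunov exponent of the logistic map
    x_{n+1} = r * x * (1 - x) by averaging ln|f'(x)| along a trajectory."""
    x = x0
    total = 0.0
    for i in range(n + burn):
        if i >= burn:  # discard the initial transient
            total += np.log(max(abs(r * (1 - 2 * x)), 1e-300))
        x = r * x * (1 - x)
    return total / n

print(lyapunov_logistic(4.0))  # ~0.693 = ln 2 > 0 -> chaotic
print(lyapunov_logistic(3.2))  # negative -> stable cycle, not chaotic
```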

15
Q

why do we bother with the largest Lyapunov exponent?

A

It tests for the presence of chaos. This is useful because if a system is chaotic, long-term forecasting will be very difficult: the information we condition on (the initial conditions) is essentially lost from the system within a couple of time steps.

16
Q

when are neural networks likely to work “best” in finance?

A

When financial theory has very little to say about the nature of the relationship between the variables we are looking at.

17
Q

neural nets have faded lately in finance. Why?

A

1) It is next to impossible to interpret the fitted model. We cannot use its nodes and weights to identify a particular relationship.

2) It is difficult to test statistically whether the model is adequate.

3) There is a mismatch between in-sample and out-of-sample performance: neural nets are better suited to interpolation than to extrapolation.

18
Q

volatility is often considered the single most important quantity in finance. why?

A

It goes hand in hand with risk.

19
Q

elaborate on historical volatility

A

It is computed as the square root of the variance of historical returns, i.e., the standard deviation of returns.

It is not the best method, but it is a common benchmark.
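
A minimal sketch in Python (the toy price series and the 252-day annualization factor are my own assumptions):

```python
import numpy as np

def historical_volatility(prices, trading_days=252):
    """Annualized historical volatility: the square root of the
    variance of daily log returns, scaled from daily to annual."""
    returns = np.diff(np.log(prices))  # daily log returns
    return np.sqrt(returns.var(ddof=1) * trading_days)

prices = np.array([100.0, 101.5, 99.8, 102.2, 103.0, 101.1])
print(historical_volatility(prices))
```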

20
Q

how do we find IV (implied volatility)?

A

We infer it by numerically inverting an option pricing model such as Black-Scholes: given the observed market price of an option, we solve for the volatility that makes the model price match it.
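
A minimal sketch of the inversion, assuming a European call under Black-Scholes and a bracketing root-finder (the example numbers are hypothetical):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, T, r):
    """Solve for the sigma that makes the model price match the market price."""
    return brentq(lambda sig: bs_call(S, K, T, r, sig) - price, 1e-6, 5.0)

# a call trading at 10.45 with S=100, K=100, T=1y, r=5% implies sigma ~ 0.20
print(implied_vol(10.45, S=100, K=100, T=1.0, r=0.05))
```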

21
Q

how can we represent EWMA?

A

sigma_t^2 = (1 - lambda) * sum_{k=0}^{infinity} lambda^k * (r_{t-k} - rbar)^2

A lambda smaller than 1 creates the diminishing effect further back in time.

The variance at time t is a weighted sum over the entire history of returns: each squared deviation from the mean return contributes, with smaller and smaller weight the further back it lies.
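
A small sketch of the (necessarily truncated) sum; lambda = 0.94 is just a common choice (e.g. in RiskMetrics), and the simulated returns are placeholders:

```python
import numpy as np

def ewma_variance(returns, lam=0.94):
    """EWMA variance: (1 - lam) * sum_k lam^k * (r_{t-k} - rbar)^2.
    The infinite sum is truncated at the available sample, so the
    weights sum to slightly less than 1 (see the next card)."""
    r = np.asarray(returns)[::-1]  # r[0] = most recent return
    rbar = r.mean()
    weights = (1 - lam) * lam ** np.arange(len(r))
    return np.sum(weights * (r - rbar) ** 2)

rets = np.random.default_rng(0).normal(0, 0.01, size=500)
print(ewma_variance(rets), np.sqrt(ewma_variance(rets)))  # variance, volatility
```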

22
Q

what is important to remember when using EWMA models?

A

In practice, we cannot use the infinite series, so we must decide on a point at which to truncate it. This means that the weights in the expression above will sum to less than 1.

23
Q

name some proxies for a daily volatility estimate

A

1) The squared daily return.

2) A range estimator, typically involving the log of the ratio of the daily high to the daily low.

KEY: Remember that we are not looking for ways to define volatility here. What we do is find variables that behave similarly to volatility, so that they become good predictors of actual volatility. This allows us to use them in an autoregressive model.
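
A sketch of both proxies; for the range estimator I use Parkinson's (1980) scaling, which is one common choice but an assumption beyond the card:

```python
import numpy as np

def squared_return_proxy(close):
    """Proxy 1: squared daily log returns."""
    return np.diff(np.log(close)) ** 2

def range_proxy(high, low):
    """Proxy 2: a range estimator. This sketch uses Parkinson's (1980)
    scaling of the squared log high-low range."""
    return np.log(high / low) ** 2 / (4 * np.log(2))

close = np.array([100.0, 101.3, 100.9, 102.4])
high = np.array([101.2, 102.5, 101.9, 103.0])
low = np.array([99.1, 100.8, 100.2, 101.5])
print(squared_return_proxy(close))
print(range_proxy(high, low))
```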

24
Q

why do we use proxies for volatility?

A

True volatility is unobservable: we can never measure it exactly, so we use proxies.

The important thing, then, is that the proxies capture the violence of the market's movements.

25
Q

what is ARCH

A

AutoRegressive Conditionally Heteroscedastic models

26
Q

motivation for ARCH?

A

Volatility clusters rather than being constant. As a result, it makes sense to use a model that can describe this heteroscedastic relationship.

27
Q

give ARCH(1) and elaborate on the reasoning behind it

A

sigma_t^2 = a + b u_{t-1}^2

So we say that the volatility, or variance, at time t depends on the previous squared residual.

We are trying to model the variance at time t. This is important.

And we are modeling the variance at time t conditional upon the earlier residuals. So we have:

sigma_t^2 = Var(u_t | u_{t-1}, …) = E[u_t^2 | u_{t-1}, …]

Why the expected value of u_t squared? Because it is typically ASSUMED that the mean E[u_t] = 0, so the variance formula simply becomes the expected value of the squared residual.

And since u_t^2 is what we are modeling as the dependent variable, it does not make sense to include it as an independent variable. This is why ARCH(1) begins at u_{t-1} rather than t.
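
A small simulation sketch showing the ARCH(1) recursion in action (the parameter values are arbitrary; a and b follow the card's notation):

```python
import numpy as np

def simulate_arch1(a, b, n=1000, seed=0):
    """Simulate u_t with ARCH(1) conditional variance:
    sigma_t^2 = a + b * u_{t-1}^2,  u_t = sigma_t * z_t,  z_t ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    u = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = a / (1 - b)  # start at the unconditional variance
    u[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = a + b * u[t - 1] ** 2  # conditional on the last residual
        u[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return u, sigma2

u, s2 = simulate_arch1(a=0.1, b=0.5)
print(u.var(), 0.1 / (1 - 0.5))  # sample vs unconditional variance
```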

28
Q

in this chapter, when we use sigma_t^2, what are we referencing?

A

The conditional variance at time t (conditional on the earlier residuals).

29
Q

ARCH models are typically specified as more than one equation. elaborate

A

The ARCH equation describes the variance of residuals, so it must be accompanied by a mean equation (e.g., a regression or ARMA model) that produces those residuals.

30
Q

what is the goal of using an ARCH model? What are we trying to get out of it?

A

We are trying to predict the “conditional variance”.

The conditional variance is the variance at time t of the residual u_t given the earlier u's. The reason we work with the squared residual is that its expectation is simply 0, so the variance reduces to the expected squared residual. We have a series of residuals, and we try to determine whether their variance can be predicted by autoregressive terms.

Why are we interested in the variance of u_t?

The core observation is that stock prices move a lot during some periods and very little during others, and this alternates. In other words, the variance, or volatility, tends to appear in clusters. If we can capture this pattern, we have a better idea of the riskiness of the market in the coming period.

As a result, we can view ARCH models as “exploiting” exactly the heteroscedasticity that limits us when building ordinary time-series forecasting models.

Remember that the variance of the residuals captures the violence of the returns, and is therefore not related to the mean of the returns.

31
Q

elaborate on testing for ARCH effects

A

Run a regression (the mean equation) to obtain residuals.

Square the residuals, and regress the squared residuals on q of their own lags (the auxiliary ARCH regression).

We then test whether the lag coefficients are significantly different from 0. If they are all zero apart from the constant, we conclude that there is no ARCH-type heteroscedasticity. If some of them are statistically significant, there is heteroscedasticity, and we reject the null hypothesis of no ARCH effects.

The test statistic is T·R^2, where R^2 comes from the auxiliary regression and T is the sample size. Under the null it follows a chi-squared distribution with q degrees of freedom, where q is the number of lags of the squared residuals used. Then we make a simple chi-squared test.

It is important to remember that this is an all-or-nothing test that tests all the parameters jointly.
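
A sketch of the test, assuming residuals from some mean equation are already in hand (here just iid noise, so the test should not reject):

```python
import numpy as np
from scipy.stats import chi2

def arch_lm_test(resid, q=5):
    """Engle's test: regress u_t^2 on a constant and q of its own lags;
    T * R^2 ~ chi2(q) under the null of no ARCH effects."""
    u2 = np.asarray(resid) ** 2
    y = u2[q:]
    X = np.column_stack([np.ones(len(y))] +
                        [u2[q - k:-k] for k in range(1, q + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    stat = len(y) * r2
    return stat, chi2.sf(stat, df=q)  # statistic, p-value

rng = np.random.default_rng(1)
stat, p = arch_lm_test(rng.standard_normal(500), q=5)
print(stat, p)  # high p-value: no ARCH effects in iid noise
```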

32
Q

ARCH is a nice framework, but it is rarely used. why?

A

1) It is difficult to determine the appropriate number of lags.
2) The non-negativity constraints on the parameters are easily violated.

33
Q

elaborate on ARCH and non-negativity

A

If a fitted ARCH model has a negative coefficient, the variance estimate itself can turn negative whenever the squared residual attached to that coefficient happens to be very large.

Because of this, we must require all coefficients of an ARCH model to be non-negative.

34
Q

ARCH sucks, what do we do?

A

Use GARCH instead. GARCH is actually used widely in practice.

35
Q

what does GARCH stand for

A

Generalized Auto Regressive Conditional Heteroscedasticity model

36
Q

elaborate on GARCH, and relate it to ARCH

A

It extends ARCH so that the conditional variance also depends on its own lag:

sigma_t^2 = a + b u_{t-1}^2 + c sigma_{t-1}^2 (GARCH(1,1))

GARCH has a number of advantages over ARCH.
First, through recursion, a single lagged conditional variance has the same effect as an ARCH model of infinite order, because each lagged variance can itself be expanded in terms of earlier shocks (expanded in the sketch below).
This means GARCH has a long-memory effect. We could achieve long memory with ARCH as well, but we would need a ridiculous number of variables and parameters to estimate, which is not feasible in practice. It is much better to keep it simple; since the simple formulation adds essentially no complexity, it is the obvious choice.

In terms of the ACF, ARCH models have a hard cutoff at the model order. For instance, ARCH(1) only has correlation dependence at the first lag, and nothing at the second. This is typically not how the actual data generation process behaves, which would make it a bad fit.
GARCH, by contrast, has an exponentially decaying ACF, which represents its long memory: it allows us to account for shocks that happened a long time ago.

To sum up, a GARCH model is not that different from an ARCH(infinity) model, but it is vastly less complex and much more robust. More variables and more parameters are typically not what we want.
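
To make the ARCH(infinity) claim concrete, substitute the GARCH(1,1) equation into itself repeatedly (same a, b, c notation as above):

```latex
\begin{aligned}
\sigma_t^2 &= a + b\,u_{t-1}^2 + c\,\sigma_{t-1}^2 \\
           &= a + b\,u_{t-1}^2 + c\left(a + b\,u_{t-2}^2 + c\,\sigma_{t-2}^2\right) \\
           &= \frac{a}{1-c} + b \sum_{k=1}^{\infty} c^{\,k-1}\,u_{t-k}^2
           \qquad (\text{for } |c| < 1)
\end{aligned}
```

i.e., an ARCH model of infinite order with geometrically declining weights on past squared shocks.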

37
Q

in practice, what GARCH model is typically used?

A

GARCH(1,1), with (2,2) generally being the upper bound used in practice. The reason is that (1,1) already carries the long-memory relationship that is typically needed. Empirically, (1,1) is the way to go in most cases.

38
Q

it is often said that GARCH(1,1) has all the accuracy we need, and because of the recursive term we don't need any higher orders. elaborate on this

A

The effect of older lags is applied recursively through the same parameters. The implicit assumption of a GARCH(1,1) model is therefore that the effect of lag 2 is similar to that of lag 1, only diminished by one more pass through the recursion. So if some (weird) variance generation process makes, say, lag 4 especially important in determining the current level, the model will not capture this well. Empirically, though, this is a stretch, because volatility tends to cluster and mean-revert over cycles of arbitrary length; but still, GARCH(1,1) cannot capture such specifics.

So GARCH is typically not used with orders beyond (1,1) because of how volatility behaves in practice.

39
Q

what can we say about conditional variance and unconditional variance of the residuals?

A

The conditional variance is obviously changing; this is the entire premise of heteroscedastic volatility models like ARCH and GARCH.

The unconditional variance, however, is constant, provided the stationarity condition holds (for GARCH(1,1): b + c < 1).

40
Q

elaborate on unconditional variance in regards to GARCH

A

Unconditional variance is the overall, long-run variance. We find it by taking the expected value of the conditional variance.

This means that we simply run the entire GARCH equation through the expectation operator and solve.
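
Concretely, for GARCH(1,1) in the a, b, c notation used here, take expectations and impose stationarity (E[u_t^2] = E[sigma_t^2] = sigma-bar^2):

```latex
\bar{\sigma}^2 = a + b\,\bar{\sigma}^2 + c\,\bar{\sigma}^2
\quad\Longrightarrow\quad
\bar{\sigma}^2 = \frac{a}{1 - b - c}, \qquad b + c < 1 .
```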

41
Q

elaborate on the error terms/residuals in GARCH

A

The main point is that GARCH assumes a deterministic relationship between past variances, past shocks and the current variance. Whether we are able to capture this relationship depends on our ability to find a suitable specification.

There is no error term in the GARCH variance equation because it assumes a deterministic relationship.

If we include a disturbance term in the variance equation, we move into the world of stochastic volatility models.

42
Q

how do we typically estimate GARCH models

A

We cannot use OLS; this is important. OLS minimizes the residual sum of squares, which depends only on the parameters of the conditional mean, not the conditional variance.

We typically use maximum likelihood.

43
Q

first step in MLE

A

forming the likelihood function

44
Q

elaborate on forming the likelihood function

A

It is the joint probability distribution of the observations, viewed as a function of the parameters.

It is multiplicative, which is very difficult to differentiate. Therefore we take its log to make it additive instead.

We must choose a distribution for the random variable (typically the normal, for the residuals), and use it to form the joint distribution function.
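
For a GARCH model with normal errors and zero mean, the log-likelihood takes the standard form (the GARCH parameters enter through sigma_t^2):

```latex
\ell(\theta) = -\frac{T}{2}\ln(2\pi)
             - \frac{1}{2}\sum_{t=1}^{T}\ln\sigma_t^2
             - \frac{1}{2}\sum_{t=1}^{T}\frac{u_t^2}{\sigma_t^2}
```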

45
Q

why are we allowed to take the log of the likelihood function and still get the correct result after differentiating?

A

Because the log is a monotonic transformation: the parameter values that maximize the log-likelihood also maximize the likelihood itself, so differentiating and solving for the maximum gives the same answer either way.

46
Q

what are some serious drawbacks to MLE for GARCH?

A

Local maxima are very common, so the numerical optimization may converge to the wrong solution.

47
Q

recall the problems with GARCH

A

1) Possibility of negative coefficients. We have to impose constraints to force them to be non-negative.

2) GARCH does not account for any leverage effect.

3) It does not allow any direct feedback between the conditional variance and the conditional mean.

48
Q

elaborate on the leverage effect issue regarding GARCH

A

The issue is that GARCH does not distinguish between negative and positive shocks. It effectively operates on “variance is variance”, because it only uses the squared residuals.

As a result, the GARCH model assigns the same conditional variance to a given positive shock and to its equivalent negative shock. This might appear “good”, but in reality negative shocks are typically observed to have a greater influence on volatility than positive shocks.

Why does this happen?
In the case of equities, if the stock price falls, the value of equity drops while the debt stays the same, so leverage rises and the cash flows to equity become riskier. As a result, negative shocks have a greater impact on volatility.

49
Q

Name some reasons for leverage effect

A

First is the increased debt-to-equity ratio already discussed.

Another reason is the “volatility feedback” hypothesis: if expected returns increase when stock price volatility increases, then stock prices should fall when volatility rises, in order to deliver the higher expected return.

50
Q

What can we do to account for asymmetries in GARCH?

A

We have 2 models:
1) GJR (Glosten, Jarrow and Runkle)
2) EGARCH

51
Q

elaborate on GJR model

A

We extend the model so that it includes an additional term when the residual is negative. This is done like this:

sigma_t^2 = a + b u_{t-1}^2 + c sigma_{t-1}^2 + d u_{t-1}^2 I_{t-1}

The variable I_{t-1} is 1 if the residual u_{t-1} is negative, and 0 otherwise.

If there is a leverage effect present, the coefficient d will be greater than 0.

Recall that since the conditional variance must be non-negative, we must also have b + d ≥ 0.

52
Q

elaborate on EGARCH

A

EGARCH models the conditional variance through its log, ln(sigma_t^2). This guarantees the variance is positive even if the parameters are negative, and the specification includes asymmetric terms so it can capture the leverage effect.

53
Q

elaborate on the motivation for testing for asymmetry in volatility models, and introduce the topic

A

The idea is that, given a data set, we want to see whether we need one of the models that handle asymmetry.

There are 2 main categories of asymmetry tests for volatility:
1) sign bias tests
2) size bias tests

The aim of this testing is to figure out whether we need GJR or EGARCH, which handle volatility asymmetries, or whether the regular GARCH model is adequate.

54
Q

what do we call the asymmetry tests (volatility)?

A

Engle-Ng tests

55
Q

elaborate on the Engle-Ng sign bias test

A

Regress the squared standardized residuals on a constant and a dummy S⁻_{t-1} that equals 1 when u_{t-1} < 0 and 0 otherwise. If the coefficient on the dummy is significant, positive and negative shocks of the same size affect volatility differently (sign bias), and a symmetric GARCH model is inadequate.

56
Q

elaborate in detail on estimating the parameters of GARCH models

A

The problem is that GARCH is not like ordinary regression, which can be solved analytically with a neat expression.

We use MLE to find the parameters.

We have a series of observed values: the residual series obtained from the mean equation (the regular regression model).

MLE is based on finding the parameters that make the observed values most likely. We consider each residual and use the assumption that it is normally distributed with mean 0 and variance sigma_t^2, which is time dependent. We plug this into the density of a normally distributed random variable and form the joint distribution of all the points in the residual time series.

We now have a joint distribution giving the probability of observing various values when the variables all have mean 0 but different variances. These variances are of course given by our GARCH equation, so instead of entering a number for sigma_t^2, we enter the model expression. This lets us find the parameter values that maximize the probability of observing the residuals we actually observed. Recall that the normal density involves “value less mean, divided by sigma”; here this is simply “value divided by the expression for sigma”, since the mean is zero.

Normally, we would solve for a constant sigma^2 directly. However, that operates under the assumption of constant variance, which is exactly what we are dropping here. Therefore we use the model expression for sigma instead.

PRACTICAL: If everything is perfect and simple, we solve analytically. This essentially never happens, so we solve numerically using gradient-based methods or some other optimization technique. This requires an initial guess for the parameter values, which we then improve iteratively.
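
Pulling this together, a compact sketch for GARCH(1,1) under the normality assumption (the data, starting values and choice of optimizer are all placeholders):

```python
import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(params, u):
    """Negative Gaussian log-likelihood of GARCH(1,1):
    sigma_t^2 = a + b * u_{t-1}^2 + c * sigma_{t-1}^2."""
    a, b, c = params
    if a <= 0 or b < 0 or c < 0 or b + c >= 1:
        return np.inf  # penalize infeasible parameters
    T = len(u)
    sigma2 = np.empty(T)
    sigma2[0] = u.var()  # initialize at the sample variance
    for t in range(1, T):
        sigma2[t] = a + b * u[t - 1] ** 2 + c * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + u ** 2 / sigma2)

# residuals from some mean equation; here just simulated placeholder returns
u = np.random.default_rng(2).standard_normal(1000) * 0.01
res = minimize(garch11_neg_loglik, x0=[1e-5, 0.05, 0.90], args=(u,),
               method="Nelder-Mead")
print(res.x)  # estimated (a, b, c)
```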

57
Q

elaborate on the news-impact curve

A

A curve with the value of the conditional variance along the y-axis and the value of the lagged shock along the x-axis: sigma_t^2 on the y-axis, u_{t-1} on the x-axis.

This obviously requires a fitted model; the parameters determine what the news-impact curve looks like.

The lagged conditional variance is held fixed at the unconditional variance.

So we essentially have a fixed model, and we plot the values we get by sliding the u_{t-1} variable.
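
A sketch of how the curve is computed. The optional d term is the GJR asymmetry dummy from card 51, and the unconditional-variance formula assumes symmetric errors (E[I] = 1/2):

```python
import numpy as np

def news_impact_curve(shocks, a, b, c, d=0.0, sigma2_bar=None):
    """sigma_t^2 as a function of the lagged shock u_{t-1}, holding the
    lagged conditional variance fixed at the unconditional variance.
    d > 0 adds the GJR asymmetry term for negative shocks."""
    if sigma2_bar is None:
        sigma2_bar = a / (1 - b - c - d / 2)  # GJR unconditional variance
    neg = (shocks < 0).astype(float)
    return a + b * shocks**2 + c * sigma2_bar + d * shocks**2 * neg

u = np.linspace(-3, 3, 7)
print(news_impact_curve(u, a=0.1, b=0.1, c=0.8))           # symmetric GARCH
print(news_impact_curve(u, a=0.1, b=0.05, c=0.8, d=0.1))   # steeper for u < 0
```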

58
Q

elaborate on GARCH-in-mean

A

Based on the idea that more risk should be predictive of more return, we can create the GARCH-M model, in which the conditional standard deviation (or variance) enters the mean equation, e.g. y_t = mu + delta sigma_{t-1} + u_t. A positive, significant delta supports a risk-return trade-off.

It is a nightmare to solve, and requires techniques like MLE.

59
Q

elaborate on GARCH in relation to volatility forecasting

A

It is possible to show that:

var(y_t | y_{t-1}, y_{t-2}, …) = var(u_t | u_{t-1}, u_{t-2}, …)

The significance of this is that by fitting the GARCH model, we also obtain the volatility estimates for the returns themselves. Because of this, the primary use case of GARCH models is forecasting the variance of stock return series.
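
A sketch of multi-step forecasting with a fitted GARCH(1,1): the one-step forecast is exact, and beyond that E[u^2] is replaced by sigma^2, so each further step applies the factor (b + c) and decays toward the unconditional variance a / (1 - b - c). The parameter values here are placeholders:

```python
def garch11_forecast(a, b, c, sigma2_t, u_t, horizon=10):
    """Multi-step variance forecasts from a fitted GARCH(1,1)."""
    forecasts = [a + b * u_t**2 + c * sigma2_t]  # h = 1: exact
    for _ in range(horizon - 1):
        forecasts.append(a + (b + c) * forecasts[-1])  # h >= 2
    return forecasts

print(garch11_forecast(a=0.1, b=0.1, c=0.8, sigma2_t=1.2, u_t=-1.5, horizon=5))
```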

60
Q

what is the sort of “basis” for hypothesis testing under ML?

A

We use the log-likelihood function (LLF) and see whether its maximized value drops significantly when we impose the restrictions of the null hypothesis. If it drops a lot, the restrictions are likely invalid, and the hypothesis should be rejected.
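
For example, the likelihood ratio version compares the maximized restricted and unrestricted log-likelihoods:

```latex
LR = -2\,(\ell_r - \ell_u) \;\sim\; \chi^2(m)
```

where m is the number of restrictions imposed.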

61
Q

there are 3 principles used for hypothesis testing based on maximum likelihood principles:

A

Wald, likelihood ratio, and Lagrange multiplier.