Quant Flashcards

Question 1

Q

What is linear regression?

Answer

A

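Modelling the relationship between 2 variables (a dependent variable Y and an independent variable X) with a straight line of best fit, so that Y can be predicted from X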

Question 2

Q

What are SSE, SSR and SST?

Answer

A

When fitting a line of best fit, we need to measure how far the data points are from the line and from the mean. These 3 quantities do that.

SSR (regression sum of squares) is the pRedicted deviation: the difference between the line of best fit (the predicted value) and the mean of the data set.

SSE (sum of squared errors) is the ERROR deviation: the difference between each data point and the line of best fit.

SST (total sum of squares) is the sum of SSR and SSE: the total deviation of each data point from the mean.

Remember these are all SQUARED deviations, summed over every data point.
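A minimal Python sketch of the three sums of squares, assuming an ordinary least squares line and a small made-up data set. It checks that SST = SSR + SSE; the function names are illustrative.

```python
def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def sums_of_squares(x, y):
    b0, b1 = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]                      # points on the line of best fit
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # line - mean, squared
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # data point - line, squared
    sst = sum((yi - y_bar) ** 2 for yi in y)                # data point - mean, squared
    return ssr, sse, sst

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ssr, sse, sst = sums_of_squares(x, y)
print(ssr, sse, sst)   # SST equals SSR + SSE, up to floating-point rounding
```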

Question 3

Q

What is the formula for R squared?

Answer

A

R squared = SSR / SST (equivalently, 1 - SSE / SST)

It shows the proportion of the variation in Y that the model explains (how well explained/predictive the model is).

Question 4

Q

Does a high or low R squared mean the relationship is stronger?

Answer

A

A high R squared means a STRONG relationship (more of the variation in Y is explained by X).

Question 5

Q

R squared: what are the highest and lowest values it can take?

Answer

A

It is between 0 and 1 (0 = the model explains none of the variation in Y, 1 = it explains all of it).

Question 6

Q

What are the degrees of freedom?

Answer

A

Degrees of freedom are the number of observations (n) minus the number of independent variables (k) minus 1.

You want the degrees of freedom high (plenty of observations relative to variables) to have a reliable model.

DF = n - k - 1

Question 7

Q

As degrees of freedom increase, R squared ________? Why?

Answer

A

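As degrees of freedom increase, R squared tends to decrease.

Think: if you only had 2 data points, the line would pass through both exactly, so R^2 would be 1. Adding more data points (raising the degrees of freedom) would typically DECREASE R^2. Adding more independent variables does the opposite: it lowers the degrees of freedom and never decreases R^2.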

Question 8

Q

Formula for Y (the relationship between X and Y)

Answer

A

Y = β0 + β1 x + error
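A sketch of how β0 and β1 can be estimated by ordinary least squares, using β1 = Cov(X, Y) / Var(X) and β0 = mean of Y - β1 × mean of X. The data and the helper names `sample_cov` and `fit_line` are illustrative assumptions, not from the cards.

```python
from statistics import mean, variance

def sample_cov(x, y):
    """Sample covariance with an n - 1 divisor (matches statistics.variance)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

def fit_line(x, y):
    """Estimate b0 (intercept) and b1 (slope) in Y = b0 + b1 * X + error."""
    b1 = sample_cov(x, y) / variance(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
predicted = b0 + b1 * 6   # forecast of Y when X = 6 (the error term's expected value is 0)
```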

Question 9

Q

In Y = β0 + β1 x + error, which is the independent variable and which is the dependent variable?

Answer

A

Y is the dependent variable; x is the independent variable.

Question 10

Q

If the confidence level rises (from 90% to 99%), does the probability of rejecting the null hypothesis go up or down? Why?

Answer

A

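It goes down. A higher confidence level means a bigger critical value, so the confidence interval gets wider (so we are more confident it contains the true value). The hypothesized value is then more likely to fall inside the interval, so the null is rejected less often.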

Question 11

Q

What is a t statistic? What is the formula?

Answer

A

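A t statistic tests whether an estimated value (e.g. a regression coefficient or a sample mean) is significantly different from a hypothesized value.

t = (estimated value - hypothesized value) / standard error. If |t| is greater than the critical t value, reject the null. Equivalently, build the interval estimated value +/- critical t * standard error and reject the null if the hypothesized value falls outside it.

The critical t value comes from the t table using the degrees of freedom, n - k - 1 (n - 2 in simple linear regression).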
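A minimal sketch of the two-tailed t test above. The estimate, standard error and critical value are hypothetical inputs (2.101 is the two-tailed 5% critical t for 18 degrees of freedom); the function names are illustrative.

```python
def t_statistic(estimate, hypothesized, std_error):
    """t = (estimated value - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

def reject_null(t_stat, t_critical):
    """Two-tailed test: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical

# Hypothetical slope estimate of 0.8 tested against H0: beta1 = 0.
t = t_statistic(estimate=0.8, hypothesized=0.0, std_error=0.25)
decision = reject_null(t, t_critical=2.101)   # 5% two-tailed, 18 df

# Interval version of the same test: is 0 outside estimate +/- t_critical * SE?
lower, upper = 0.8 - 2.101 * 0.25, 0.8 + 2.101 * 0.25
outside = not (lower <= 0.0 <= upper)
```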

Question 12

Q

What is the standard error?

Answer

A

Standard error of the mean = SD / square root of n

OR

In a regression: first find each residual, epsilon = Y - (β0 + β1 x), which is the formula for Y in reverse.

Standard error of estimate = (sum of epsilon squared / (n - 2)) ^ 0.5
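A sketch of both standard errors on this card, using a small made-up data set. The coefficients b0 = 2.2 and b1 = 0.6 are the least-squares fit for that data; the function names are illustrative.

```python
from math import sqrt
from statistics import stdev

def se_of_mean(sample):
    """Standard error of the sample mean: SD / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

def standard_error_of_estimate(x, y, b0, b1):
    """sqrt(sum of squared residuals / (n - 2)) for a one-variable regression."""
    # Residual (epsilon) = Y - (b0 + b1 * X): the regression equation rearranged.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return sqrt(sse / (len(x) - 2))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
se_mean = se_of_mean(y)
see = standard_error_of_estimate(x, y, b0=2.2, b1=0.6)   # b0, b1 from an OLS fit of this data
```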

Question 13

Q

How do you find SSR?

Answer

A

Square each (line of best fit value - mean) and add them up over all data points

Question 14

Q

How do you find SSE?

Answer

A

Square each (actual value - line of best fit value) and add them up over all data points

Question 15

Q

What is an F test?

Answer

A

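It compares two variances (e.g. of 2 data sets) to check whether they are statistically consistent. In regression, it tests whether the independent variables, taken together, explain Y.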

Question 16

Q

Confidence interval formula

Answer

A

= mean +/- (t or z score * standard error)
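A sketch of the interval formula using z critical values from Python's `statistics.NormalDist`; the mean and standard error are made-up inputs, and for small samples a t critical value would replace z. It also shows the interval widening as the confidence level rises.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical z value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(mean, std_error, confidence):
    """Return (lower, upper) = mean -/+ z * standard error."""
    z = z_critical(confidence)
    return mean - z * std_error, mean + z * std_error

for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(mean=10.0, std_error=2.0, confidence=level)
    print(f"{level:.0%}: z = {z_critical(level):.3f}, interval = ({low:.2f}, {high:.2f})")
```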

Question 17

Q

What are the z scores for 90%, 95% and 99% confidence?

Answer

A

90% = 1.645, 95% = 1.96, 99% = 2.58 (two-tailed)

Question 18

Q

Coefficient of determination is

Answer

A

R squared (SSR / SST)

Question 19

Q

What is correlation squared?

Answer

A

R squared (for a simple linear regression with one independent variable)

Question 20

Q

In the formula Y = β0 + β1 x + error, what is β0?

Answer

A

β0 is the Y intercept: the predicted value of Y when x = 0.

Question 21

Q

Confidence interval explanation and formula

Answer

A

Mean +/- (t or z stat * standard error).

Then check whether the OTHER value (the hypothesized mean, e.g. the population mean under the null) is within those boundaries. If it falls outside, reject the null.

Question 22

Q

What is the p-value?

Answer

A

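The smallest significance level at which the null hypothesis can be rejected (the probability of a result at least this extreme if the null is true). Mnemonic: the "pathetic" value. We want it low: reject the null when the p-value is below the significance level.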
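A sketch of a two-tailed p-value for a z statistic, using `statistics.NormalDist`; a t test would use the t distribution instead. The test statistic and the 5% significance level are assumed inputs.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Probability of a z statistic at least this far from 0 (either side) if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05                   # significance level (assumed)
p = two_tailed_p_value(2.5)    # hypothetical test statistic
reject = p < alpha             # low ("pathetic") p-value -> reject the null
```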

Question 23

Q

What are some key assumptions of simple linear regression?

Answer

A

The relationship between x and y is linear
x is uncorrelated with the error terms
The expected value of the error term is 0
The variance of the error term is constant

Question 24

Q

Formula for standard deviation with Standard error

Answer

A

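From standard error = SD / square root of n, SD = standard error * square root of n. The sample SD itself is the square root of (sum of squared deviations from the mean / (n - 1)).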

Question 25

Q

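Is variance the same as SST?

Answer

A

No. SST is the sum of squared deviations of Y from its mean; the sample variance of Y is SST / (n - 1).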

Question 26

Q

Formula for DOF

Answer

A

Total DOF = n - 1 = k + (n - k - 1), i.e. regression DOF (k) + error DOF (n - k - 1)

Question 27

Q

MSR (mean squared regression) and MSE (mean squared Error) formulas

Answer

A

MSR = SSR / k
MSE = SSE / (n - k - 1)

Question 28

Q

What is MSR / MSE?

Answer

A

The F statistic

Question 29

Q

Formula for standard error in regression

Answer

A

Standard error of estimate = square root of [SSE / (n - k - 1)], i.e. the square root of MSE

Question 30

Q

Correlation formula, then R squared formula

Answer

A

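Cor = Cov(X, Y) / (σX * σY), i.e. covariance divided by the product of the two standard deviations

R^2 = cor^2 (for a simple linear regression with one independent variable)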
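A sketch checking, on made-up data, that the squared correlation equals R^2 = SSR / SST for a one-variable regression. σX and σY are sample standard deviations here; the n - 1 divisors cancel in the ratio.

```python
from math import sqrt

def correlation(x, y):
    """r = Cov(X, Y) / (sd_X * sd_Y), all with n - 1 divisors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    sd_x = sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    sd_y = sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

def r_squared(x, y):
    """R^2 = SSR / SST from a least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * a - y_bar) ** 2 for a in x)
    sst = sum((b - y_bar) ** 2 for b in y)
    return ssr / sst

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
```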

Question 31

Q

F stat: formula, what it means, and how to interpret it

Answer

A

The F stat tests whether there is any relationship at all between Y and the x variables (H0: all slope coefficients = 0)

It is MSR/MSE

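If F is greater than the critical F value (with k and n - k - 1 degrees of freedom), reject the null: there is a relationship. An F close to 1 suggests no relationship.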

Question 32

Q

Calculate MSR and MSE

Answer

A

MSR = SSR / k
MSE = SSE / (n - k - 1)

MSR / MSE = F
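A sketch of the ANOVA arithmetic. The SSR, SSE, n and k values are illustrative inputs; in practice F would be compared with a critical F value with (k, n - k - 1) degrees of freedom from an F table.

```python
def anova_f(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k independent variables and n observations."""
    msr = ssr / k               # mean square regression
    mse = sse / (n - k - 1)     # mean squared error
    return msr, mse, msr / mse  # F = MSR / MSE

msr, mse, f_stat = anova_f(ssr=3.6, sse=2.4, n=5, k=1)
```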

Question 33

Q

What does adjusted R^2 do?

Answer

A

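It adjusts R^2 for the number of independent variables, so adding a variable (which uses up a degree of freedom) does NOT automatically increase it. Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)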
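A sketch of the adjusted R^2 formula with illustrative numbers: the same R^2 with more independent variables gives a lower adjusted R^2, and a weak model can even produce a negative value.

```python
def adjusted_r_squared(r2, n, k):
    """1 - [(n - 1) / (n - k - 1)] * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

few_vars = adjusted_r_squared(0.60, n=30, k=2)
many_vars = adjusted_r_squared(0.60, n=30, k=6)   # same R^2, more variables
weak_model = adjusted_r_squared(0.02, n=10, k=5)  # can fall below 0
```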

Question 34

Q

Downfall of R^2?

Answer

A

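R^2 never decreases when you add an independent variable, even a useless one. Adjusted R^2 corrects for this, but its own downfall is that it is not bound by 0 and 1 (it can be negative).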

Question 35

Q

What is a dummy variable, and how do you incorporate it into the formula?

Answer

A

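A dummy variable brings a QUALITATIVE variable into the model. It takes the value 1 when the condition holds and 0 otherwise. With n categories, use n - 1 dummies; the category left out is the base case. For months of the year, if you want to capture results collected in Jan, the Jan dummy is 1 for January observations and 0 for all others, and you use 11 month dummies (one month is left out).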
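A sketch of building month dummies with one month left out as the base case. Choosing December as the base and the `D_` naming are arbitrary, hypothetical choices.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
BASE = "Dec"   # omitted month (base case), chosen arbitrarily

def month_dummies(month):
    """Return {dummy_name: 0 or 1} for one observation: 12 months -> 11 dummies."""
    return {f"D_{m}": int(month == m) for m in MONTHS if m != BASE}

jan_row = month_dummies("Jan")   # D_Jan = 1, every other dummy = 0
dec_row = month_dummies("Dec")   # base case: all 11 dummies = 0
```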

Question 36

Q

What is heteroskedasticity?

Answer

A

It is unequal variances: the variance of the error terms is not constant across observations. The problem case is when the error variance is related to the values of the independent variables. You don't want that.

Question 37

Q

What are the assumptions of multiple regression

Answer

A

There is a linear relationship
The independent variables are NOT random
The expected value of the error term is 0
The variance of the error term is constant
The error terms are not correlated with each other
The error term is normally distributed

Question 38

Q

How does heteroskedasticity affect the standard error, and what does this mean?

Answer

A

It typically makes the standard errors too LOW, which inflates the t stats and makes it EASIER to reject a null (too many Type I errors).

Question 39

Q

How do you test for heteroskedasticity?

Answer

A

Breusch-Pagan test
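A sketch of the Breusch-Pagan idea for one independent variable, assuming the n * R^2 form: fit Y on X, regress the squared residuals on X, and compare n * R^2 of that second regression with a chi-square critical value with k degrees of freedom (3.841 for k = 1 at 5%). The data are constructed so the scatter grows with X; the helper names are illustrative.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    return y_bar - b1 * x_bar, b1

def r_squared(x, y, b0, b1):
    y_bar = sum(y) / len(y)
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - y_bar) ** 2 for b in y)
    return 1 - sse / sst

def breusch_pagan(x, y):
    b0, b1 = simple_ols(x, y)
    sq_resid = [(b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)]   # squared residuals
    g0, g1 = simple_ols(x, sq_resid)                               # regress them on X
    return len(x) * r_squared(x, sq_resid, g0, g1)                 # BP = n * R^2

# Illustrative data: the error alternates in sign and grows with X.
x = list(range(1, 11))
y = [2 + 3 * a + (a if a % 2 else -a) for a in x]
bp = breusch_pagan(x, y)
heteroskedastic = bp > 3.841   # chi-square critical value, 1 df, 5%
```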

Question 40

Q

What is serial correlation?

Answer

A

The regression error terms are correlated with each other over time (one period's error is related to the next period's). So if a stock's return is above the model's prediction one day, it is more likely to be above it the next day. This breaks the assumption that the errors are uncorrelated.

Question 41

Q

What will serial correlation do to the t stat?

Answer

A

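Positive serial correlation typically makes the standard errors too low, which INCREASES the t stat, so you are more likely to reject the null when you shouldn't (Type I errors).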

Question 42

Q

What is multicollinearity?

Answer

A

Multicollinearity means that two or more independent variables are highly correlated with each other

Question 43

Q

What will multicollinearity do to the t stat and standard error?

Answer

A

It increases the standard errors and reduces the t stats, making it harder to reject the null

Question 44

Q

How do you resolve multicollinearity?

Answer

A

Remove one (or more) of the correlated independent variables

Question 45

Q

The null hypothesis is the….

Answer

A

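The hypothesis you expect (or hope) is NOT true and want to reject, usually "no effect", e.g. H0: β1 = 0. It is assumed true unless the test rejects it.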

Question 46

Q

What is autoregression?

Answer

A

A time series model where the variable's own past (lagged) values explain its current value: the variable yesterday explains the variable today

Question 47

Q

Formula for an autoregression equation

Answer

A

x_t = b0 + b1 * x_(t-1) + ε_t

Question 48

Q

How do you detect if error terms are correlated?

Answer

A

Durbin-Watson test. If the error terms are correlated, the standard errors (and so the t tests) are unreliable, so you can't use the results as they are.

Question 49

Q

I have x1; how do I get x2 using autoregression?

Answer

A

x2 = b0 + b1 * x1

x1 is the lagged value (x_(t-1)) relative to x2
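A sketch of AR(1) forecasting with hypothetical coefficients b0 = 1.0 and b1 = 0.5. Multi-step forecasts chain the one-step formula: each forecast becomes the lagged value for the next step.

```python
def ar1_forecasts(b0, b1, x_last, steps):
    """Chain x_t = b0 + b1 * x_(t-1) forward from the last observed value."""
    forecasts = []
    x = x_last
    for _ in range(steps):
        x = b0 + b1 * x          # the previous value plays the role of x_(t-1)
        forecasts.append(x)
    return forecasts

path = ar1_forecasts(b0=1.0, b1=0.5, x_last=4.0, steps=3)   # [x2, x3, x4] given x1 = 4
```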

Question 50

Q

Autoregressive correlation. How do you test for this, and what does the test mean?

Answer

A

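Normal t test for this one. Take each residual autocorrelation and divide it by its standard error, 1 / square root of T (T = number of observations). Compare against the critical t value.

If the null (no autocorrelation) is NOT REJECTED, the model is correctly specified and the data is okay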
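A sketch of the residual autocorrelation t test, assuming the standard error 1 / sqrt(T). The residuals are made up, and the critical t would come from a t table; the helper names are illustrative.

```python
from math import sqrt

def residual_autocorrelation(resid, lag):
    """Correlation between residuals and the same residuals `lag` periods earlier."""
    mean = sum(resid) / len(resid)
    dev = [e - mean for e in resid]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, len(dev)))
    den = sum(d ** 2 for d in dev)
    return num / den

def autocorrelation_t_stat(resid, lag):
    """t = autocorrelation / (1 / sqrt(T))."""
    return residual_autocorrelation(resid, lag) / (1 / sqrt(len(resid)))

resid = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1, 0.4, -0.5]
t1 = autocorrelation_t_stat(resid, lag=1)
# Compare |t1| with the critical t; not rejecting H0 means no evidence of autocorrelation.
```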

Question 51

Q

You do a t test on the serial correlation of some time series data and find that the null is rejected, meaning the t stat is outside the critical t value. What does this mean?

Answer

A

The residuals are autocorrelated, so the model is misspecified and its results are not good to rely on

Question 52

Q

Mean-reverting level: what is the formula for this?

Answer

A

b0 / (1 - b1)

This is the level the data points tend to revert to
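A sketch of the mean-reverting level b0 / (1 - b1) for hypothetical AR(1) coefficients, plus a check that repeated forecasts drift towards it. The guard only catches b1 = 1, where the formula divides by zero; a finite level that forecasts converge to also needs |b1| < 1.

```python
def mean_reverting_level(b0, b1):
    if b1 == 1:
        raise ValueError("b1 = 1 (unit root): no finite mean-reverting level")
    return b0 / (1 - b1)

level = mean_reverting_level(b0=1.0, b1=0.5)   # 1.0 / 0.5 = 2.0

x = 10.0
for _ in range(50):
    x = 1.0 + 0.5 * x          # one AR(1) forecast step
# after many steps, x is very close to the mean-reverting level
```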

Question 53

Q

How do you work out which autoregression line you should use? e.g. data from 2 years ago or 3 years ago.

Answer

A

You use Root Mean Squared Error (RMSE): the square root of the average squared forecast error, calculated out of sample for each model. The model with the smallest RMSE is the one you use.
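A sketch of comparing two models by out-of-sample RMSE. The actual values and both sets of forecasts are made up for illustration.

```python
from math import sqrt

def rmse(actual, forecast):
    """Square root of the mean squared forecast error."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sqrt(sum(e ** 2 for e in errors) / len(errors))

actual = [2.0, 2.5, 3.0, 2.8]
model_a = [2.1, 2.4, 2.7, 3.0]   # e.g. forecasts from one AR specification
model_b = [2.5, 2.0, 3.5, 2.0]   # e.g. forecasts from another

best = "A" if rmse(actual, model_a) < rmse(actual, model_b) else "B"
```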

Question 54

Q

What is the mean-reverting level of a random walk, and why?

Answer

A

There is none! The formula is b0 / (1 - b1)

For a random walk, b1 is always 1 (and b0 = 0), so the formula gives 0/0, which is undefined

Question 55

Q

Formula for a random walk and what it means

Answer

A

x_t = x_(t-1) + ε_t (random error term)

The best forecast of the next value is simply the current value, x_(t-1), because the random error term has an expected value of 0.

Question 56

Q

Multicollinearity, heteroskedasticity and serial correlation: how is the standard error affected by each?

Answer

A

Mnemonic: MULTIcollinearity MULTIplies the standard error, so multicollinearity gives a HIGHER standard error. The other two don't have "multi", meaning they (typically) give a LOWER standard error.

Question 57

Q

How can a model be misspecified?

Answer

A

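Types:

Time-series: using a lagged dependent variable as an independent variable when the errors are serially correlated, or forecasting the past (using a function of the dependent variable as an independent variable)
Functional form: omitting a variable, or pooling data improperly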

Question 58

Q

You use the Durbin Watson test to test for what?

Answer

A

Autocorrelation (serial correlation) of the error terms

Question 59

Q

When testing for Autocorrelation in Linear and Log Linear models, what do you use? And do you use something different for AR models?

Answer

A

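Yes. Durbin-Watson for linear and log-linear models.

A t test on the residual autocorrelations for AR models (the Durbin-Watson test is not valid when a lagged dependent variable is an independent variable)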

Question 60

Q

Important: what does covariance stationary mean? What are the assumptions?

Answer

A

Constant and finite expected value
Constant and finite variance
Constant and finite covariance with leading and lagged values
Has a finite mean-reverting level
No unit root problem

Question 61

Q

Important: how do you make data covariance stationary?

Answer

A

By first differencing the data (e.g. a random walk). You take the difference between a period and the period before it, x_t - x_(t-1); that difference is the new data point.
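A sketch of first differencing. It simulates a random walk x_t = x_(t-1) + e_t with a seeded generator (illustration only) and shows that the differences recover the random errors, which have a constant mean and variance.

```python
import random

def first_difference(series):
    """Return [x_t - x_(t-1)] for t = 1, 2, ..."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

rng = random.Random(42)
errors = [rng.gauss(0, 1) for _ in range(100)]
walk = [0.0]
for e in errors:
    walk.append(walk[-1] + e)       # random walk: today = yesterday + random error

diffs = first_difference(walk)      # recovers the errors (up to rounding)
```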

Question 62

Q

What is first differencing data

Answer

A

Taking the difference between each period and the one before it (x_t - x_(t-1)) to make data covariance stationary

Question 63

Q

What does the Durbin-Watson test test for? And what is the magic number?

Answer

A

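Autocorrelation (serial correlation) of the error terms. The magic number is 2: DW is roughly 2(1 - r), where r is the correlation between consecutive residuals. DW close to 2 = NO serial correlation; well below 2 = positive serial correlation; well above 2 = negative serial correlation.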
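A sketch of the Durbin-Watson statistic, DW = sum of (e_t - e_(t-1))^2 / sum of e_t^2, on two made-up residual series: one drifting smoothly (positively correlated) and one flipping sign every period (negatively correlated).

```python
def durbin_watson(resid):
    """Sum of squared changes in the residuals divided by the sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

trending = [1.0, 0.9, 0.8, 0.6, 0.4, 0.1, -0.2, -0.5, -0.7, -0.9]   # positively correlated
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]                     # negatively correlated
dw_trending = durbin_watson(trending)        # well below 2
dw_alternating = durbin_watson(alternating)  # well above 2
```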

Question 64

Q

What is the difference between an AR(1) model and an AR(2) model?

Answer

A

AR(1) has only 1 lagged value of the variable as an independent variable (x_(t-1)); AR(2) has 2 (x_(t-1) and x_(t-2)).

Answer 60

A

The data is NOT covariance stationary because there is NO mean reverting level. You can not use the data.

Answer 61

A

Short term (yes, short term). Why? Long-term data may contain data points from before structural changes in the underlying economy or data environment. That is not good to model off.

Answer 62

A

First difference the data: x_t - x_(t-1) (each period minus the period before it).

Answer 63

A

Making data covariance stationary by taking the difference between consecutive data points

Answer 64

A

T test. If the autocorrelation t stat is BELOW (within) the critical t, autocorrelation is NOT present, so the model is good. If the residuals are correlated, move to the next AR model (AR(2), AR(3), etc.) until the serial correlation goes away.

Answer 65

A

The unit root test (if a unit root is present, the series is NOT covariance stationary). Basically ensuring that b1 is not 1 (b1 = 1 means no mean reversion).

Answer 66

A

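Subtract x_(t-1) from both sides of the AR(1) equation: x_t - x_(t-1) = b0 + (b1 - 1) * x_(t-1) + ε_t, then test whether (b1 - 1) = 0. This tests whether the series has a unit root; an AR model must NOT have a unit root to be covariance stationary.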

Answer 67

A

Seasonality

Answer 68

A

Variance can be predicted

Answer 69

A

Finding patterns then applying those patterns.

Answer 70

A

The target is the Y variable (the dependent variable), while a feature is an X variable (an independent variable)

Answer 71

A

Training samples help an algorithm learn a pattern or relationship
Validation samples TUNE the model
Test samples test the model on out-of-sample data

Answer 72

A

Unsupervised learning is when a machine learning algorithm learns the relationships between variables when the data are not labelled. It finds the patterns and relationships itself.

Answer 73

A

Supervised learning is when the dataset is labelled: the analyst supplies the labels (target values) the algorithm learns to predict

Answer 74

A

It is something the analyst enters (sets) that constrains the learning process of the model

Answer 75

A

Having too many features to describe a target, so the model fits the training data too closely and can NOT explain out-of-sample data.
Supervised learning only

Answer 76

A

Bias error means the inputs do not adequately explain the changes in Y. This means the model is underfitted.

Variance error is when the model is overfitted. The model is great at explaining in-sample data, but bad at out-of-sample data.

Answer 77

A

Holdout samples and K-fold cross-validation
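A sketch of splitting observation indices for K-fold cross-validation: each fold is held out once as the validation set while the other K - 1 folds train the model. Shuffling is omitted for clarity (a real workflow would usually shuffle first), and the function names are illustrative.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def k_fold_splits(n, k):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = k_fold_indices(n, k)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_splits(n=10, k=3))
```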

Answer 78

A

Penalized regression (penalty for including additional variables)
Support vector machine: classification model
K nearest neighbour: classification model, finding similarities in inputs
CART (classification and regression tree): binary tree model
Ensemble learning / random forest: complex but low-variance model

Answer 79

A

Principal components analysis: reduces the features to the few most relevant (uncorrelated) components
K-means clustering: putting observations into K clusters
Hierarchical clustering: building a hierarchy of clusters by merging or dividing them

Answer 80

A

Very complex but very effective. Good for nonlinear relationships.

Answer 81

A

Assume that they equal zero, so y = error

Answer 82

A

Market conversion price = Convertible bond price/Conversion ratio