Regression Flashcards

1
Q

define regression analysis

A

regression analysis is the process of describing and evaluating the relationship between a given variable and one or more other variables.

Specifically, we always have one dependent variable, and we try to understand how this variable moves as a result of movements in a set of other variables.

2
Q

elaborate on correlation

A

correlation is a measure of the tendency to move together. It does not imply causality. Often there will be a third-party cause that has an effect on both variables, which makes it look like movement in one of them causes the other to move as well.
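A minimal numpy sketch (made-up variables, not from the cards) makes the third-party-cause point concrete: a hypothetical common driver z moves both x and y, so they correlate strongly without either causing the other.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=10_000)            # the common cause
x = z + 0.5 * rng.normal(size=10_000)  # x responds to z plus noise
y = z + 0.5 * rng.normal(size=10_000)  # y responds to z plus noise

# x and y never directly influence each other, yet they co-move:
corr_xy = np.corrcoef(x, y)[0, 1]
print(f"corr(x, y) = {corr_xy:.2f}")   # high, despite no direct causal link
```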

3
Q

elaborate on how we treat the variables in regression analysis

A

The dependent variable is a random variable subject to a probability distribution.

The independent variables are assumed to have “fixed values in repeated samples”.

4
Q

elaborate on the fixed regressor assumption

A

Same as “fixed in repeated samples”.

It is primarily a teaching simplification.
It assumes that the values of the independent variables are the same in every sample. Therefore, there is no uncertainty related to the sampling process.

This means that the only uncertainty is related to the error term. The error term basically includes everything that we are not able to see or measure.

5
Q

elaborate on the random disturbance term

A

The idea is that it is basically impossible to represent the relationship with an exact straight line. It is very likely that there are variables we have not accounted for, or that there is measurement error.

Since we are trying to model a relationship that is not perfectly linear, we cannot use a perfect line to do it.

Therefore, we add a random disturbance term, u_t, which is specific to each observation. Doing this allows us to keep the exact line a + bx_t while accounting for the differences between the line and the various points:

y_t = a + bx_t + u_t

6
Q

how do we determine alpha and beta?

A

By minimizing the vertical distances between each sample point and the fitted line.

7
Q

why minimize vertical distances? Why not horizontal? Why not perpendicular?

A

We are assuming fixed regressors, so all the uncertainty lies in the y-direction. Therefore, the task becomes minimizing the vertical distances.

Perpendicular distances come into play later, when we include error in the sampling process.

8
Q

elaborate on OLS

A

Ordinary Least Squares: minimizing the sum of squared errors. Squaring the errors penalizes outliers harder.

y_t is the observed value.

^y_t ("y hat") is the prediction from the line.

The residual û_t represents the error: û_t = y_t - ^y_t

9
Q

what is RSS

A

Residual Sum of Squares

∑(y_t - ^y_t)^2
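A minimal sketch of the RSS computation, with made-up data and a hypothetical candidate line (not from the cards):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
a, b = 0.1, 1.95                  # some candidate intercept and slope

y_hat = a + b * x                 # predictions from the line
rss = np.sum((y - y_hat) ** 2)    # ∑(y_t - ^y_t)^2
print(f"RSS = {rss:.4f}")
```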

10
Q

elaborate on how we derive the OLS estimators

A

Differentiate the expression with respect to each estimator, set the derivatives to zero, and solve:

∑(y_t - ^y_t)^2 = ∑(y_t - â - ^bx_t)^2

Recall why this works: the loss function is convex, so the first-order conditions give the global minimum.
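A sketch of the closed-form estimators that come out of this derivation, on made-up sample data; the same answer also falls out of a general least-squares solver:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Closed-form OLS: ^b = ∑(x - mean(x))(y - mean(y)) / ∑(x - mean(x))^2,
# â = mean(y) - ^b * mean(x)
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()

# Cross-check against numpy's least-squares solver:
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)
print(a_hat, b_hat)  # matches coef[0], coef[1]
```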

11
Q

elaborate on the reliability of the intercept

A

In many cases, all the observed data points are far away from the intercept. This means that we have no data close to the intercept, and so we cannot trust such predictions.

This generalizes to any area along the fitted line where we have missing data. We should be aware of the intervals covered by our data points, thereby defining a sort of "operating range" of values within which we can be fairly confident.

12
Q

elaborate on the PRF

A

PRF = Population Regression Function

It is the model that we "consider" to be the true data generating process:
y_t = a + bx_t + u_t

The PRF represents the true relationship between the independent variables and the dependent variable.

13
Q

elaborate on the SRF

A

Sample regression function.

It is the estimated version of the population regression function.

The SRF has no error term.

^y_t = â + ^bx_t

14
Q

elaborate on linearity we require

A

linearity in parameters (not necessarily variables).

This is the requirement to use OLS.

15
Q

estimator vs estimate

A

An estimator is a function.

An estimate is an output from an estimator.

16
Q

elaborate on CLRM

A

Classical Linear Regression Model.

CLRM is the classical line y_t = a + bx_t + u_t.

y_t depends on u_t; therefore, we must specify some assumptions on how this random disturbance term is generated:

E(u_t) = 0

var(u_t) = sigma^2 < infinity (constant, finite variance)

cov(u_i, u_j) = 0 for i ≠ j (errors independent of each other)

cov(u_t, x_t) = 0 (error independent of the regressor)

u_t ~ N(0, sigma^2) (normally distributed)

17
Q

what happens if assumptions 1-4 hold?

A

BLUE: Best Linear Unbiased Estimators.

The estimators will have desirable properties:

1) Best: lowest variance among the class of linear unbiased estimators
2) Linear
3) Unbiased
4) Estimators of the true values

18
Q

how do we know that the estimators have the lowest variance in class?

A

Gauss-Markov theorem

19
Q

elaborate on consistency

A

Consistency refers to an estimator approaching the true value as the number of samples grows large, but not necessarily being accurate for small sample sizes.

20
Q

what property do we say that consistency is?

A

An asymptotic property

21
Q

are all unbiased estimators consistent?

A

Not necessarily. If the variance does not shrink as the sample size increases, the estimator will not be consistent: the probability of observing large differences between the estimated value and the true value remains relatively large.

22
Q

elaborate a little on standard errors

A

The standard error is a precision measure used to indicate how reliable an estimate is.

It is found by taking the standard deviation of a statistic.

Note that the standard error doesn't tell us anything about the goodness of a specific estimate. It only tells us what we can expect of an arbitrary estimate: a very low standard error means that we can consider estimates to be of generally good precision, and so on.

The standard error is a function of x, the sample variance, and the size of the sample.

23
Q

give the matrix form multivariable linear regression

A

y = Xb + u

y and u are vectors of size Tx1.
X is a matrix of size TxK
b is a vector of size Kx1
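A minimal numpy sketch of these shapes (made-up data; T = 6, K = 3 with an intercept column), plus the standard OLS solution b = (X'X)^{-1} X'y:

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 6, 3
# X is T x K: a column of ones (intercept) plus two made-up regressors
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
true_b = np.array([1.0, 2.0, -0.5])
u = 0.1 * rng.normal(size=T)
y = X @ true_b + u                        # y = Xb + u, shape (T,)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS: (X'X)^{-1} X'y
print(X.shape, y.shape, b_hat.shape)       # (6, 3) (6,) (3,)
```

A defining property of the OLS solution is that the residuals are orthogonal to every column of X, i.e. X'(y - Xb_hat) = 0.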

24
Q

for the general linear regression model, how do we represent the standard errors of the parameters?

A

we use the formula SE(^b_i) = sqrt(s^2 [(X'X)^{-1}]_{ii}), where s^2 = û'û / (T - k) is the estimated residual variance, i.e. the square roots of the diagonal of var(^b) = s^2 (X'X)^{-1}.
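A sketch of the standard-error formula var(^b) = s^2 (X'X)^{-1} on made-up data (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 50, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])  # intercept + 1 regressor
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b_hat
s2 = u_hat @ u_hat / (T - k)                  # estimated residual variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # one SE per parameter
print(se)
```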

25
Q

motivation of the F-test

A

We can test relationships that concern more than one variable at a time. For instance, is it the case that X_1 and X_2 together can say something about the effect on Y that they are unable to say independently?

26
Q

elaborate on the F-test

A

The idea is to compare the residual variance of the unrestricted regression vs the restricted regression.

By taking RRSS - URSS and dividing by URSS, we get 0, or close to 0, if the proposed restriction has a negligible effect on the fit. If the numerator is very large, the restriction has a large effect, which basically means that the restriction made the model worse. This would simply mean that the restriction IS NOT backed by the data.
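A sketch of the standard F statistic built from these quantities (the numbers for RRSS, URSS, m, T, and k are made up); note the usual form also scales numerator and denominator by their degrees of freedom:

```python
# m restrictions, T observations, k regressors in the unrestricted model
rrss, urss = 120.0, 100.0
m, T, k = 2, 60, 5

# F = [(RRSS - URSS) / m] / [URSS / (T - k)]
f_stat = ((rrss - urss) / m) / (urss / (T - k))
print(f"F = {f_stat:.2f}")  # compare against the F(m, T-k) critical value
```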

27
Q

how many restrictions do we say we have?

A

We look for the number of equalities (informally).

It is actually just the degrees of freedom related to the (RRSS-URSS) chi-squared variable.

28
Q

what is “THE” F-test in regression?

A

Testing for junk regressors using the null hypothesis that all slope parameters are equal to 0, i.e. everything except the intercept.

The goal is to see if any of the parameters actually affect the dependent variable.

29
Q

elaborate on dummies and the dummy trap

A

Dummies are binary variables used to encode categorical variables.

The important part is to make sure that we never fall into the dummy variable trap. The trap makes the regressor matrix singular, so there is no unique solution. This happens if one of the columns (vectors) is a linear combination of the others.

The intuition is that if we have, say, 3 dummies with the constraint that exactly one of them must equal 1, we get redundancy if we include all three dummies and the intercept. This is typically solved by removing one of the dummies and letting the intercept absorb the case where the third scenario applies.

Note that if there is no constraint forcing exactly one of the mutually exclusive dummies to be 1, the dummy trap no longer applies.
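A small numpy sketch of the trap with made-up category indicators: the three dummies sum to the intercept column, so including all four makes the matrix rank-deficient.

```python
import numpy as np

# Six observations, each in exactly one of three categories
d1 = np.array([1, 0, 0, 1, 0, 0])
d2 = np.array([0, 1, 0, 0, 1, 0])
d3 = np.array([0, 0, 1, 0, 0, 1])   # d1 + d2 + d3 = column of ones
intercept = np.ones(6)

X_trap = np.column_stack([intercept, d1, d2, d3])  # 4 columns, rank 3
X_ok = np.column_stack([intercept, d1, d2])        # drop one dummy: full rank

print(np.linalg.matrix_rank(X_trap))  # 3, despite 4 columns
print(np.linalg.matrix_rank(X_ok))    # 3, with 3 columns
```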

30
Q

elaborate on R^2

A

The square of the correlation coefficient between y and ^y.

Correlations are restricted to the interval [-1, 1], and thus the square lies in [0, 1].

A low correlation means a poor fit, and so on.

31
Q

what is the total sum of squares?

A

TSS = ∑(y_t - mean(y))^2

The total sum of squares can be split into two parts:
1) The part that is explained by the model
2) The part that the model was not able to explain

The part the model is able to explain is called the explained sum of squares, ESS. The unexplained part is the residual sum of squares.

TSS = ESS + RSS

∑(y_t - mean(y))^2 = ∑(^y_t - mean(y))^2 + ∑û_t^2

The explained sum of squares is the sum of squared differences between each prediction and the mean.

32
Q

what is R^2 when we use TSS

A

R^2 = ESS / TSS
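A sketch checking the decomposition and the ratio on made-up data; the identity TSS = ESS + RSS holds exactly for an OLS fit that includes an intercept.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9, 6.2])

# OLS fit with intercept
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

tss = np.sum((y - y.mean()) ** 2)       # total variation around the mean
ess = np.sum((y_hat - y.mean()) ** 2)   # variation the model explains
rss = np.sum((y - y_hat) ** 2)          # variation left in the residuals

print(np.isclose(tss, ess + rss))       # True
print(f"R^2 = {ess / tss:.4f}")
```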

33
Q

motivation for NOT using R^2

A

1) It doesn't make sense to use it for comparisons across models with different dependent variables, because different dependent variables produce different values of TSS and hence different R^2 values. So it is invalid for comparison when the models use different dependent variables.

2) Adding more regressors never reduces R^2; it can only keep it the same or increase it, making it difficult to know whether many of the regressors are useless.

34
Q

What is type 1 error

A

The probability of rejecting the null hypothesis when it is actually true.

It equals alpha, the significance level we set when conducting the test. We select, for instance, alpha = 0.05, and say that if our observed value has a less-than-alpha probability of being observed under the null distribution, we reject the null hypothesis.

35
Q

what is type 2 error

A

not rejecting the null hypothesis when it is false

36
Q

elaborate on the t-ratio

A

The t-ratio is a test where the null-hypothesis parameter value is 0. This means we are basically testing whether our observed values indicate that the parameter is zero or not: how likely is it to observe our values given that the actual parameter is 0?

This is used for testing whether a variable has any effect at all on some other variable.

We collect a sample and use the estimator of the linear regression parameters to create the estimate. Then we test this estimate against the hypothesis that its value should be 0.
Intuitively, if the estimate is small, that in itself suggests the null hypothesis cannot be rejected, but it is difficult to say this with an exact probability. Therefore we check whether the observed estimate falls within the range of values we consider expected under the null. If the value is so large that it falls outside the probability window we have set, we reject the null hypothesis and can conclude that the variable corresponding to the parameter is significant in predicting the dependent variable.
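A sketch of the t-ratio ^b / SE(^b) on simulated data where x genuinely matters (made-up parameter values; 1.96 is the approximate two-sided 5% critical value for a large sample):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100
x = rng.normal(size=T)
y = 1.0 + 0.8 * x + rng.normal(size=T)   # true slope is 0.8, not 0

X = np.column_stack([np.ones(T), x])
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b_hat
s2 = u_hat @ u_hat / (T - 2)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

t_ratio = b_hat[1] / se[1]               # test H0: slope = 0
print(f"t = {t_ratio:.2f}")              # far outside +-1.96 -> reject H0
```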

37
Q

give the generalized linear regression line. Elaborate on it

A

y = Xb + u

y is a vector
X is a matrix
b is a vector
u is a vector

Given the b-vector that holds all of our estimation results, X maps b to the fitted values, and adding the disturbance vector u gives y.

38
Q

elaborate on “data mining” and why it is an issue and how to deal

A

They refer to data mining as the process of trying many variables in a regression without basing the selection on financial theory.

For instance, if we try 20 regressions, and the size of the test is 5%, and we find that 3 of the regressors are significant, what have we actually done?
The probability of observing an extreme value when doing 20 regressions with 5% is much higher than 5%, so the true size of the test is much larger.

The way we can deal with this problem is to use a separate test set of the data.
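A quick calculation (assuming the 20 tests are independent, which is a simplification) shows how far the true size drifts from the nominal 5%:

```python
# With 20 independent tests at the 5% level, the chance of at least one
# spurious "significant" result:
n_tests, alpha = 20, 0.05
p_at_least_one = 1 - (1 - alpha) ** n_tests
print(f"P(at least one false positive) = {p_at_least_one:.2f}")  # ~0.64
```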

39
Q

why is the simple RSS not a good “goodness of fit” statistic?

A

It doesn't provide a good measure because it is unbounded from above. How are we to interpret a value of, say, 150?

The value of the RSS also depends greatly on the scale of the dependent variable, which complicates interpretation further.

40
Q

elaborate on goodness of fit statistics

A

The most common one is R^2.

R^2 can be defined as the square of the correlation coefficient between y and y_pred.

Correlation lies between -1 and 1, and squaring it brings it to [0, 1].

So we need the correlation between each true value y and the corresponding predicted value y_pred as provided by our model. But how do we get the correlation?

Another definition of R^2 is found by considering what the model is actually trying to explain. The model attempts to explain the variability of y around its mean value, mean(y).

41
Q

elaborate on the “alternative way” of defining R^2

A

Consider what the model attempts to do.

If we had no model, we would estimate y to be y_pred, where y_pred is simply the mean of all values. This is not linear regression, but rather a simple average.
For stock returns, we'd look at a stock's daily returns, take the average, and say that the average is our prediction of the stock's return.

With a model, a regression model in this case, we add explanatory variables with the goal of providing more direct relationships between certain factors and the dependent variable. The goal of this is to increase our level of understanding, and increase what the model is actually able to explain. In a perfect model, the model would explain absolutely everything about the dependent variable. In such a case, variations in y would have direct relationships to certain variables in our model.
However, this is wishful thinking, as models usually don't come very close to perfect explanation.

But in regards to defining R^2, which is a goodness of fit statistic, we can make use of the model's "ability to explain movement in the dependent variable".
The idea is to separate the movement in y that the model was able to capture from the movement in y that the model was not able to capture.
Since we are using simple linear regression for now, the "movement in y that the model was not able to capture" is simply the difference between y_true and y_pred, which we know as the residuals. The movement in y that the model was able to capture can then be defined as what remains when we take the total sum of squares and remove the residual sum of squares (because that part is not explained by the model): the explained sum of squares.

The total sum of squares is defined as deviations around the mean of the dependent variable. Consider the stock’s daily returns again. We take the average daily return with a sample size of say 1 year, and use this average daily return as a baseline. Then we take each day’s return and compare against the average. The deviation is then squared, and summed over the sample. The result is the total sum of squares.

From intuition, we can also say that the explained sum of squares is the squared difference between the predictions and the mean.

So, now that we have these terms, we can define R^2 as a ratio telling us "how much of the total movement is explained by the model". To create this ratio, we take ESS and divide by TSS.

42
Q

what is required to create R^2 value = 0?

A

The numerator must be 0, and the only way this happens is if all ESS contributions equal 0 (since each term is non-negative). For this, the slope parameters of the model must all equal 0, and the intercept must equal the mean.

43
Q

elaborate on the problems with R^2

A

It cannot be used to compare models with different dependent variables.

R^2 never decreases when adding more independent variables to the model.

44
Q

how can we make R^2 better

A

There is a version that accounts for the loss in degrees of freedom from incorporating more variables in the model.

This new metric is known as adjusted R^2.

The intuition behind the formula goes like this:
We multiply (1 - R^2) by the factor (T-1)/(T-k), and then take 1 minus the result. Adding more parameters increases k and makes this factor larger, so the increase in R^2 from adding the parameters must be significantly large to improve adjusted R^2.
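A sketch of the adjusted R^2 formula with made-up values, holding R^2 fixed while k grows, to show the penalty at work:

```python
T = 60           # number of observations
r2 = 0.40        # unadjusted R^2, held fixed for illustration

ks = (2, 5, 10)  # increasing numbers of parameters
# adjusted R^2 = 1 - (1 - R^2) * (T - 1) / (T - k)
adj = [1 - (1 - r2) * (T - 1) / (T - k) for k in ks]
print(adj)  # decreasing: more parameters with no gain in fit -> lower value
```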