MODULE 10 Flashcards

1
Q

simple linear regression / least squares regression

A

Explains the variation in a dependent variable in terms of the variation in a single independent variable.
- the model has a y-intercept, a slope coefficient, and an error term at the end (the residual)
- residual = error = actual y minus predicted y
- the regression line minimizes the sum of squared residuals (Σe²)

2
Q

Regression line

A

One of many lines that can be drawn through the scatter plot. Estimating this line is the essence of linear regression. It is the line that minimizes the sum of the squared differences between the predicted y and the actual y, which is why the method is often called ordinary least squares (OLS) regression.

3
Q

sum of squared errors (SSE)

A

This is the UNEXPLAINED variation: the variation in the dependent variable that is not explained by the independent variable. The model is not explaining this variation.

For each observation, take the difference between the actual Y value and the Y value forecast by the regression line at that X. Square these differences and sum them across all the observations.

4
Q

What are the assumptions of linear regression? Draw them out.

A
  1. A linear relationship must exist between the dependent and independent variables.
  2. The residual term’s variance remains constant across all observations (homoskedasticity).
  3. The residual terms are independently distributed, meaning the residual of one observation does not correlate with another’s residual (in other words, the paired x and y observations are independent of each other).
    1. sometimes this doesn’t hold when there is seasonality in the data (like people tending to shop way more on Saturdays)
  4. The residual term follows a normal distribution. THIS CAN BE RELAXED IF THE SAMPLE SIZE IS LARGE
  5. The independent variable is uncorrelated with the residuals.
5
Q

SSR

A
  • EXPLAINED variation
  • for each observation, subtract the mean value of Y from the forecast value, square it, and sum
6
Q

SST

A

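TOTAL variation in the dependent variable: for each observation, subtract the mean value of Y from the actual value, square it, and sum. SSE + SSR = SST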
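
A minimal sketch of the decomposition, using a three-point toy dataset invented purely for illustration. The intercept and slope below are the least squares estimates for that toy data; the variable names are assumptions, not part of the deck.

```python
# Toy data (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0]

# Least squares estimates for this toy data: slope = Cov(x, y) / Var(x) = 0.5,
# intercept = mean(y) - slope * mean(x) = 1.0.
b0, b1 = 1.0, 0.5

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

# Unexplained variation: actual minus forecast, squared and summed.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
# Explained variation: forecast minus mean, squared and summed.
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
# Total variation: actual minus mean, squared and summed.
sst = sum((yi - y_bar) ** 2 for yi in y)

print(sse, ssr, sst)
```

The identity only holds when the line is the least squares line with an intercept; an arbitrary line through the scatter plot will generally not satisfy it.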

7
Q

Mean Squared Error

A

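MSE = unexplained variation / (n - k - 1) = SSE / (n - 2), because we only work with 1 independent variable here (k = 1)
- VARIANCE of the residuals around the forecast value
- take the square root to get the standard error of the estimate, SEE (kind of like an SD)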
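
As a rough sketch of the arithmetic, assuming a hypothetical SSE of 8.0 from 10 observations (these inputs and the function name are made up for illustration):

```python
import math

def mse_and_see(sse, n, k=1):
    """Return (MSE, SEE) for a regression with k independent variables."""
    mse = sse / (n - k - 1)   # residual variance; n - 2 when k = 1
    see = math.sqrt(mse)      # standard error of the estimate
    return mse, see

# Hypothetical inputs: SSE = 8.0, n = 10, one independent variable.
mse, see = mse_and_see(8.0, 10)
print(mse, see)
```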

8
Q

What do you do with this

A
  • calculate R^2 from the ANOVA table; in simple regression, R^2 = correlation^2
  • R squared = explained variation / total variation = SSR / SST - what % of the movement in the dependent variable is explained by the independent variable
  • R squared = .0076 / (.0406 + .0076) = .1577
    • R2 high = good because the observations sit more tightly around our line
  • Standard error of the estimate (SEE) = SQRT (MSE)
    • THE LOWER THE SEE, THE BETTER THE MODEL FIT
    • R2 goes up, SEE goes down
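
The R squared figure on this card can be checked with a few lines. This is only a sketch: it takes SSR = .0076 and SSE = .0406 from the card, and the variable names are assumptions.

```python
import math

ssr = 0.0076   # explained variation, from the card
sse = 0.0406   # unexplained variation, from the card
sst = ssr + sse

r_squared = ssr / sst
# In simple regression, R^2 is the squared correlation, so its square root
# gives the magnitude (not the sign) of the correlation between x and y.
abs_corr = math.sqrt(r_squared)

print(round(r_squared, 4), round(abs_corr, 4))
```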
9
Q

what is the test for significance

A
  • F statistic - this is the global test
    • tests whether the independent variables explain variation in the dependent variable (overall model significance)
      • H0 = all our slope coefficients are zero
      • Ha = at least one of our slope coefficients is nonzero
      • one-tailed test
      • Reject H0 if the F statistic exceeds the critical value
      • df = k (numerator) and n - k - 1 (denominator); with one independent variable that is 1 and n - 2
    • F stat calc = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))
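
A small sketch of the F calculation. The SSR and SSE values are the ones from the ANOVA card; the sample size n = 36 is an assumption made only so the example runs, since that card does not state n. The function name is also made up.

```python
def f_statistic(ssr, sse, n, k=1):
    """F = MSR / MSE for a regression with k independent variables."""
    msr = ssr / k              # mean square regression
    mse = sse / (n - k - 1)    # mean squared error
    return msr / mse

# SSR and SSE from the ANOVA card; n = 36 is a hypothetical sample size.
f_stat = f_statistic(0.0076, 0.0406, n=36)
print(round(f_stat, 2))
```

The result would then be compared with a one-tailed critical F value for 1 and n - 2 degrees of freedom, taken from an F table.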
10
Q

How to test individual slope coefficients

A
  • Regression coefficient t-test
    • which of the independent variables are actually significant?
    • if the F-test fails to reject the null = none are significant
    • if the F-test rejects the null = at least 1 is significant - T TEST TO SEE WHICH ONE
  • Null: b1 = 0
  • Alt: b1 ≠ 0
  • df = n - 2
  • t stat = (estimated b1 - hypothesized b1) / SE of b1
  • Can also test the intercept this way
11
Q

Do the test for ABC: slope = .64, SE = .26, n = 36, significance = 5%

A
  • t = (.64 - 0) / .26 = 2.46
  • critical t = 2.03 (two-tailed, df = 36 - 2 = 34)
  • 2.46 > 2.03, so reject the null: the slope is significantly different from zero
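
The same test as a short sketch. The inputs and the critical value 2.03 come from the card; the function name is an assumption, and the critical value is typed in rather than looked up.

```python
def slope_t_stat(b1_hat, b1_hyp, se_b1):
    """t = (estimated slope - hypothesized slope) / standard error of the slope."""
    return (b1_hat - b1_hyp) / se_b1

t_stat = slope_t_stat(0.64, 0.0, 0.26)
t_crit = 2.03   # two-tailed, 5% significance, df = 36 - 2 = 34 (from the card)

# Two-tailed test: reject H0 when |t| exceeds the critical value.
reject_null = abs(t_stat) > t_crit
print(round(t_stat, 2), reject_null)
```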
12
Q

How do you calculate excess return according to a regression model?

A

the difference between the actual return and the return predicted by the regression model.

13
Q
A

R stock = -2.3% + .64 (10%) = 4.1%

14
Q

why do we need to calculate the confidence interval

A

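The intercept and the slope are sample estimates, not true population values, so a forecast built from them carries estimation error.
The point forecast also has no error term in it, so a confidence interval is needed to express that uncertainty.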

15
Q

confidence interval information

A
  • We need to have a conf interval to solve this!
    • need to use standard error of forecast
    • Critical t is TWO TAILED with n-2 df
16
Q
A

4.1% ± 2.03 (critical t) × 3.67 (sf) = -3.4% to 11.6%
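
The interval can be reproduced in a few lines. This sketch reuses the numbers from the cards (intercept -2.3%, slope .64, X = 10%, critical t = 2.03, sf = 3.67); the variable names are assumptions.

```python
# All values are in percent, taken from the cards.
intercept = -2.3
slope = 0.64
x_value = 10.0

forecast = intercept + slope * x_value   # point forecast, about 4.1

t_crit = 2.03   # two-tailed critical t with n - 2 df
s_f = 3.67      # standard error of the forecast

lower = forecast - t_crit * s_f
upper = forecast + t_crit * s_f
print(round(lower, 1), round(upper, 1))   # roughly -3.4 and 11.6
```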

17
Q

types of lin reg relationships

A
  • log-lin: taking natural log of y only
  • lin-log: taking natural log of the x variable only
  • log-log: taking natural log of both x and y variables
18
Q

what does ANOVA mean?

A

Analysis of variance. It shows how good x is at explaining y.

19
Q

how do you calculate the slope in a regression line

A

Cov(X, Y) / SD(X)^2, i.e. the covariance of X and Y divided by the variance of X
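
A minimal sketch of the slope formula on a three-point toy dataset invented for illustration. The intercept line uses the standard result that the fitted line passes through the point of means; the function and variable names are assumptions.

```python
def ols_slope_intercept(x, y):
    """Slope = sample Cov(x, y) / sample Var(x); intercept from the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    slope = cov_xy / var_x
    intercept = y_bar - slope * x_bar   # line passes through (x_bar, y_bar)
    return slope, intercept

# Toy data (hypothetical).
slope, intercept = ols_slope_intercept([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
print(slope, intercept)
```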

20
Q

What is the R squared value

A

It’s the share of the variation in Y that is explained by X. The formula is SSR / SST.

21
Q

What is the SEE?

A

It’s the volatility of the residuals: the square root of the MSE. This is kind of like the SD.

22
Q

what happens in a dataset when the sample size is really large

A

For linear regression, the standard error of the forecast (sf) can be approximated by the SEE when the sample is large, because the extra terms in the sf equation shrink toward zero as n grows.
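
The equation this card points to is not reproduced in the deck. The usual textbook form for simple linear regression is shown here as an assumption about what was meant, with $s_X^2$ the sample variance of X and $\bar{X}$ its mean:

```latex
s_f^2 = \mathrm{SEE}^2 \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n - 1)\, s_X^2} \right]
```

As n gets large, both $1/n$ and the last term go toward zero, which leaves $s_f \approx \mathrm{SEE}$.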

23
Q

when the relationship between x and y is not linear, we can’t fit a linear model. explain what to do and the different situations and outcomes

A

log-lin - means that we take ln(y) against plain x. Relative change in Y for an absolute change in X. Forecast Y is e^(forecast ln y).

lin-log - means that we take normal Y and ln(x). Absolute change in Y for a relative change in X.

log-log - means that we take ln(y) and ln(x). Relative change in Y for a relative change in X. Forecast Y is e^(forecast ln y).
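
A sketch of how a forecast would be produced under each form, assuming the intercept b0 and slope b1 have already been estimated on the transformed data. The function names and any inputs are hypothetical.

```python
import math

def forecast_log_lin(b0, b1, x):
    """Model: ln(y) = b0 + b1 * x. Exponentiate to get back to y."""
    return math.exp(b0 + b1 * x)

def forecast_lin_log(b0, b1, x):
    """Model: y = b0 + b1 * ln(x). No back-transform needed."""
    return b0 + b1 * math.log(x)

def forecast_log_log(b0, b1, x):
    """Model: ln(y) = b0 + b1 * ln(x). Exponentiate to get back to y."""
    return math.exp(b0 + b1 * math.log(x))
```

For example, `forecast_log_log(0.0, 1.0, 5.0)` gives back 5.0 (up to floating-point rounding), since the fitted model in that case is simply ln(y) = ln(x).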

24
Q

Give the intuitive definition of SSE, SSR, SEE, R2, SE, F test, t test

A

SSE: How much variation the model failed to explain.
SSR: How much variation the model successfully explained.
SEE: Average size of the prediction error.
R2: Proportion of variation explained by the model.
SE: Precision of the slope estimate.
F-Test: Whether the overall model is significant.
t-Test: Whether individual predictors are significant.