MODULE 10 Flashcards
simple linear regression / least squares regression
Explains the variation in a dependent variable in terms of the variation in a single independent variable.
- estimates a y-intercept (b0), a slope coefficient (b1), and an error term at the end (the residual)
- residual = error
- the regression line minimizes the sum of squared residuals (Σe²)
Regression line
One of many lines that can be drawn through the scatter plot; estimating this line is the essence of linear regression. This line minimizes the sum of the squared differences between the predicted y and the actual y, which is why the method is sometimes referred to as ordinary least squares (OLS) regression.
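The fit described above can be sketched with the closed-form least-squares formulas (the data below is made up purely for illustration):

```python
# Ordinary least squares for simple linear regression:
# slope b1 = cov(x, y) / var(x), intercept b0 = mean(y) - b1 * mean(x).
# Data is hypothetical, purely for illustration.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# cross-products and squared deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

b1 = s_xy / s_xx          # slope coefficient
b0 = y_bar - b1 * x_bar   # y-intercept

# residuals = actual y minus the value on the fitted line
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
```

Any other line through the same points would give a larger sum of squared residuals; that minimization is what makes this *the* regression line.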
sum of squared errors (SSE)
This is the UNEXPLAINED variation!! - variation in the dependent variable not explained by the independent variable. The model is not explaining this variation.
For each observation, take the actual y value and subtract the value predicted by the regression line at that x; square the differences and sum them across all observations: SSE = Σ(y − ŷ)².
What are the assumptions of a linear regression line and draw them out
- A linear relationship must exist between the dependent and independent variables.
- The residual term’s variance remains constant across all observations (homoskedasticity).
- The residual terms are independently distributed, meaning the residual of one observation is not correlated with the residual of any other (in other words, the paired x and y observations are independent of each other).
- This sometimes fails when there is seasonality in the data (e.g., people tend to shop far more on Saturdays).
- The residual term follows a normal distribution. THIS CAN BE RELAXED IF THE SAMPLE SIZES ARE LARGE
- The independent variable is uncorrelated with the residuals.
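One consequence worth knowing: the assumptions concern the true error terms, but for any fitted OLS line the sample residuals automatically sum to zero and are uncorrelated with x by construction. A quick numeric check (hypothetical data):

```python
# For OLS residuals e_i = y_i - (b0 + b1*x_i), the normal equations force
# sum(e_i) = 0 and sum(x_i * e_i) = 0, i.e. sample corr(x, e) = 0.
# Data is made up for illustration.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sum_e = sum(e)                                  # ~0 by construction
sum_xe = sum(x * ei for x, ei in zip(xs, e))    # ~0 by construction
```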
SSR
- EXPLAINED variation
- subtract the mean value from the forecast value, square it, and sum across observations: SSR = Σ(ŷ − ȳ)²
SST
SSE + SSR = SST (unexplained + explained = total variation); SST = Σ(y − ȳ)²
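The decomposition can be verified numerically on any OLS fit with an intercept (hypothetical data; sums of squares computed straight from the definitions above):

```python
# Verify SST = SSE + SSR on a small OLS fit. Data is hypothetical.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained
sst = sum((y - y_bar) ** 2 for y in ys)               # total
# sse + ssr == sst (up to floating-point noise)
```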
Mean Squared Error
unexplained variation / (n − k − 1) = SSE / (n − 2) here, because we only work with 1 independent variable (k = 1)
- VARIANCE around the forecast value
- take the square root to get the standard error of estimate, SEE (kinda like a SD of the residuals)
What do you do with this
- calculated from the ANOVA table; for simple regression, R² = correlation²
- R² = explained variation / total variation = SSR / SST = what % of the movement in the dependent variable is explained by the independent variable
- R² = .0076 / (.0406 + .0076) = .1577
- high R² = good, because the observations sit more tightly around our line
- Standard error of the estimate (SEE) = SQRT (MSE)
- LOWER THE SEE THE BETTER THE MODEL FIT
- as R² goes up, SEE goes down (better fit)
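Using the SSR = .0076 and SSE = .0406 figures from the card above (n = 36 is an assumed sample size for illustration; the card does not state it):

```python
ssr, sse = 0.0076, 0.0406   # explained / unexplained variation (from the card)
sst = ssr + sse
n, k = 36, 1                # n is assumed; k = 1 independent variable

r_squared = ssr / sst       # share of variation explained
mse = sse / (n - k - 1)     # = SSE / (n - 2) for simple regression
see = mse ** 0.5            # standard error of estimate

print(round(r_squared, 4))  # 0.1577, matching the card
```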
what is the test for significance
- F statistic - this is the global test
- tests whether the independent variable(s) explain the variation in the dependent variable (overall model significance)
- H0 = all our coefficients are zero
- Ha = at least one of our coefficients or slope is non zero
- one tailed tests
- Reject H0 = F statistic exceeds the critical value
- two df values: df = (k, n − k − 1); with one independent variable, that is (1, n − 2)
- F stat = MSR / MSE, where MSR = SSR / k and MSE = SSE / (n − k − 1)
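A worked F-test with the same ANOVA figures (n = 36 is assumed, as before; the 4.13 cutoff is the approximate 5% critical value for F(1, 34)):

```python
ssr, sse = 0.0076, 0.0406        # from the earlier card
n, k = 36, 1                     # n assumed for illustration

msr = ssr / k                    # mean square regression
mse = sse / (n - k - 1)          # mean square error, df = n - 2 here
f_stat = msr / mse

print(round(f_stat, 2))          # 6.36
# One-tailed test: compare to the F critical value with df = (1, 34),
# roughly 4.13 at 5%. 6.36 > 4.13, so we would reject H0.
```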
How to test individual slope coefficients
- Regression coefficient t-Test
- how many of the independent variables are actually significant?
- if the F-test fails to reject the null = all are insignificant
- if it rejects the null = at least 1 is significant - run a T TEST on each coefficient TO SEE WHICH ONE
- Null = b1 = 0
- alt = b1 ≠ 0
- df = n-2
- T stat = (b1 − hypothesized b1) / SE of b1
- can also test the intercept this way
do the test for ABC: b1 = .64, SE = .26, n = 36, significance = 5%
- t = (.64 − 0) / .26 = 2.46
- critical stat (two-tailed 5%, df = 34) = 2.03
- 2.46 > 2.03, so reject H0: the slope is significant
how do you calculate excess return according to a regression model
the difference between the actual return and the return predicted by the regression model.
predicted: R stock = -2.3% + .64 (10%) = 4.1%; excess return = actual return - 4.1%
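The prediction step as arithmetic (coefficients from the card; returns in percent; "market return = 10%" is the card's assumed x value):

```python
intercept, slope = -2.3, 0.64    # regression coefficients (in %)
x = 10.0                         # independent variable value (in %)

predicted = intercept + slope * x
print(round(predicted, 1))       # 4.1
# excess (abnormal) return = actual return - predicted return
```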
why do we need to calculate the confidence interval
the intercept and the slope are sample estimates, so they carry sampling error; they are not the true population values
- the point forecast also contains no error term, even though actual observations do
confidence interval information
- we need a confidence interval around the forecast to capture this uncertainty!
- need to use the standard error of the forecast
- critical t is TWO-TAILED with n − 2 df
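The same two-tailed logic, sketched as a confidence interval for the slope from the earlier ABC numbers (a forecast interval would use the standard error of the forecast in place of the coefficient's SE):

```python
b1, se, t_crit = 0.64, 0.26, 2.03   # coefficient, its SE, two-tailed 5% t (df = 34)

lower = b1 - t_crit * se
upper = b1 + t_crit * se

print(round(lower, 2), round(upper, 2))   # 0.11 1.17
# The interval excludes zero, consistent with rejecting H0: b1 = 0.
```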