Quant Methods Flashcards
Dependent and independent variables // Graph function
Yi = b0 + b1Xi + εi
- Dependent variable is Yi
- Independent variable is Xi
- Error term is εi
- Coefficients are b0 (intercept) and b1 (slope coefficient)
Scatter plot types
Correlation coefficient (ρ or r) (Formula)
Correlation standardizes covariance by dividing it by the product of the standard deviations
Perfect positive correlation: +1
Perfect negative correlation: -1
No correlation: 0
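In symbols (this is what "standardizes covariance" means above):

$$ r_{xy} = \frac{\mathrm{Cov}(X,Y)}{s_x \, s_y} $$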
Covariance (Formula)
A statistical measure of the degree to which two variables move together
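The sample formula the card refers to:

$$ \mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1} $$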
(Sample) Standard Deviation Formula
$$ s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} $$
Easier with calculator!!
Using calculator for Data Series to get Sx, Sy, r
- Add Data Series: [2nd] + [7]
- View Stats / Results: [2nd] + [8] > LIN [Down arrow]
Does not calculate Covariance!
BUT
Cov = rxy * Sx * Sy
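A minimal numpy sketch with made-up data, confirming the identity above: the covariance recovered from r, Sx and Sy matches the direct sample covariance.

```python
import numpy as np

# Hypothetical paired data series (stand-in for calculator [DATA] entries)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

sx = x.std(ddof=1)             # sample standard deviation Sx (n - 1 divisor)
sy = y.std(ddof=1)             # Sy
r = np.corrcoef(x, y)[0, 1]    # correlation coefficient r

cov_from_r = r * sx * sy                   # Cov = r * Sx * Sy
cov_direct = np.cov(x, y, ddof=1)[0, 1]    # direct sample covariance
print(cov_from_r, cov_direct)              # both ≈ 4.98 for this data
```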
Limitations of correlation analysis
- Correlation coefficient assumes a linear relationship (not parabolic etc.)
- Presence of outliers can be distortive
- Spurious correlation
- Correlation does not imply causation (rain in NYC has no effect on London bus routes, although there might be a statistical correlation)
- Correlations without sound basis are suspect
Assumptions underlying simple linear regression
- Linear relationship – might need transformation to make linear
- Independent variable is not random – assume expected values of independent variable are correct
- Expected value of error term is zero
- Variance of error term is same across all observations (homoskedasticity)
- Error terms uncorrelated (no serial/auto correlation) across observations
- Error terms normally distributed
Standard error of the estimate (SEE)
Standard error of the distribution of the errors about the regression line
The smaller the SEE, the better the fit of the estimated regression line: the tighter the points to the line
k = # of independent variables (single regression: 1)
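With k as defined above, the standard formula is:

$$ \mathrm{SEE} = \sqrt{\frac{\mathrm{SSE}}{n-k-1}} $$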
Sum of squared errors (SSE)
UNEXPLAINED: Actual (yi) - Prediction (ŷ)
The estimated regression equation will not predict the values of y exactly; it only estimates them
A measure of this error is SSE (ŷ is the predicted value)
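In symbols:

$$ \mathrm{SSE} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 $$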
The coefficient of determination (R2)
Describes the percentage variation in the dependent variable explained by movements in the independent variable
Just r² (loses the + / - sign); add the sign back (from the slope) when calculating r again
R² = 80% = 0.8 → r = 0.8^(1/2) = ±0.89 → -0.89 here (see below)
ŷ (predicted) = 0.4 - 0.3x → b1 = -0.3, so r takes the negative sign
Alternatively: R² = RSS / TSS (if RSS = TSS, R² = 1 → perfect fit)
R² = 1 - SSE / TSS (if SSE = 0, R² = 1 → perfect fit)
Total sum of the squares (TSS)
ACTUAL (yi) - MEAN
Alternatively, TSS = RSS + SSE
Regression sum of the squares (RSS)
EXPLAINED: PREDICTION (ŷ) - MEAN
Difference between the estimated values for y and the mean value of y
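In symbols (mirroring SSE above):

$$ \mathrm{TSS} = \sum_{i=1}^{n}(y_i - \bar{y})^2 \qquad \mathrm{RSS} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 $$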
Graphic: Relationship between TSS, RSS and SSE
Relationship between TSS, RSS and SSE
- Using SSE, TSS and RSS to measure the goodness of fit of the estimated regression equation
- The estimated regression equation would be a perfect fit if every value of the dependent variable yi happened to lie on the estimated regression line. This would result in SSE=0 and RSS=TSS
- RSS/TSS is known as the coefficient of determination and is denoted by R²: R² = RSS / TSS
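A short numpy sketch with made-up data, illustrating the decomposition: fit a line, compute all three sums of squares, and check that TSS = RSS + SSE and R² = RSS/TSS = 1 - SSE/TSS.

```python
import numpy as np

# Hypothetical data; fit a least-squares line and decompose the variation
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)           # slope, intercept
y_hat = b0 + b1 * x                    # predictions on the fitted line

tss = np.sum((y - y.mean()) ** 2)      # ACTUAL - MEAN (total)
rss = np.sum((y_hat - y.mean()) ** 2)  # PREDICTION - MEAN (explained)
sse = np.sum((y - y_hat) ** 2)         # ACTUAL - PREDICTION (unexplained)

print(np.isclose(tss, rss + sse))      # True: TSS = RSS + SSE
print(rss / tss, 1 - sse / tss)        # identical: both are R²
```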
Hypothesis testing on regression parameters
- Confidence Interval on b0 and b1
- For a 90% confidence interval, 10% significance, 5% (α/2) in each tail (formula below)
- More hypothesis testing under Multiple Regression below
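For the slope of a simple linear regression (df = n - 2) the interval is:

$$ \hat{b}_1 \pm t_{\alpha/2,\, n-2} \cdot s_{\hat{b}_1} $$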
ANOVA tables
- ANOVA stands for ANalysis Of VAriance
- It is a summary table produced by statistical software such as Excel
- Using the ANOVA table, calculate the coefficient of determination (see formulas below)
- The global test for the significance of the slope coefficient
- Use of the F-statistic
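Two standard quantities recoverable from the table:

$$ R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} \qquad \mathrm{SEE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n-k-1}} $$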
Prediction intervals on the dependent variable
- Range of dependent variable (Y) values for a given value of the independent variable (X) and a given level of probability
- Two sources of error: Regression line and SEE
e.g. an interval such as 20 to 40
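For reference, the standard forecast interval for simple regression (the two error sources above combine into the forecast variance s_f²):

$$ \hat{Y} \pm t_{\alpha/2,\, n-2} \cdot s_f \qquad s_f^2 = \mathrm{SEE}^2 \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n-1)\, s_x^2} \right] $$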
Limitations of regression analysis
- Parameter instability - Regression relationships can change over time
- Public knowledge of relationships - If a number of analysts identify a regression relationship that works, prices will change to reflect the inflow of funds, possibly removing the trading opportunity
- Assumption violation - If regression assumptions are violated then hypothesis tests and predictions will be invalid
Multiple Regression
Assumptions
- The relationship between the dependent variable and each independent variable is linear
- The independent variables are not random and there is no multicollinearity (the X's are not highly correlated with each other)
- The expected value of the error term is zero
- Error term is homoskedastic (error variance constant; having the same scatter across observations)
- No serial correlation
- Error term is normally distributed
ANOVA
Work out:
- Degrees of freedom (df), with k = # independent variables and n = sample size: regression df = k, error df = n - k - 1, total df = n - 1
- Sum of squares: two of the three will be given (derive the third via TSS = RSS + SSE)
Using the regression equation to estimate the value
Becomes: Ŷ = 0.163 - (0.28 × 11) + (1.15 × 18) + (0.09 × 215) = 37.13
But this is only an estimate, we will want to apply confidence intervals to this
Individual test: T-test
Testing the significance of each of the individual regression coefficients and the intercept
Tcalc = bi / S.E.(bi), testing H0: bi = 0
Tcrit ≈ 2 (given in the CFA exam)
|Tcalc| > Tcrit = REJECT NULL (H0: bi = 0)
then bi not equal to 0 = SIGNIFICANT
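A small scipy sketch of this test, with made-up coefficient, standard error and sample size:

```python
from scipy import stats

# Hypothetical regression output: one slope coefficient and its standard error
b_i, se_b_i = 1.15, 0.42
n, k = 65, 3                                       # observations, independent variables

t_calc = b_i / se_b_i                              # tests H0: b_i = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - k - 1)   # 5% two-tailed critical value (~2.0)

if abs(t_calc) > t_crit:                           # compare in absolute value
    print("Reject H0: coefficient is significant") # t_calc ≈ 2.74 > 2.0 here
else:
    print("Fail to reject H0")
```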
Global F-Test: Testing the validity of the whole regression
Testing whether or not all of the slope coefficients as a group are insignificant
FCalc > FCrit = REJECT NULL: at least one slope coefficient does not equal zero (F is always positive, so no absolute value is needed)
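In symbols (a one-tailed test):

$$ H_0: b_1 = b_2 = \cdots = b_k = 0 \qquad F = \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{\mathrm{RSS}/k}{\mathrm{SSE}/(n-k-1)} $$

with k and n - k - 1 degrees of freedom.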