Session 3 - Quantitative Methods Flashcards
Covariance
Statistical measure of the degree to which 2 variables move together. Infinite range.
= Σ[(X - mean of X)(Y - mean of Y)] / (n - 1)
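A minimal Python sketch of this formula, assuming numpy is available; the x and y arrays are made-up illustration data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical X observations
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])  # hypothetical Y observations

n = len(x)
# Sum of the cross-products of deviations from the means, divided by n - 1
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
print(cov_xy)
print(np.cov(x, y, ddof=1)[0, 1])  # cross-check with numpy's sample covariance
```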
Correlation Coefficient
A measure of the strength of the linear relationship (correlation) between 2 variables. Range of -1 to 1.
= (covariance of X and Y) / [(sample SD of X)(sample SD of Y)]
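A sketch of the same idea in Python: divide the sample covariance by the two sample standard deviations (same made-up data, numpy assumed available).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))
print(r)
print(np.corrcoef(x, y)[0, 1])  # cross-check with numpy's correlation coefficient
```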
T-test
Used to determine whether a correlation coefficient, r, is statistically significant. The test has n - 2 degrees of freedom.
= r√(n - 2) / √(1 - r²)
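A sketch of the significance test in Python, assuming scipy is available; n and r below are made-up values.

```python
import numpy as np
from scipy import stats

n, r = 30, 0.45  # hypothetical sample size and correlation
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
t_crit = stats.t.ppf(0.975, df=n - 2)  # two-tailed test at the 5% significance level
print(t_stat, t_crit, abs(t_stat) > t_crit)  # True -> r is statistically significant
```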
Slope Coefficient
The change in the dependent variable for a one-unit change in the independent variable.
= (covariance of X and Y) / (variance of X)
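A sketch showing that the slope of a simple linear regression equals the covariance divided by the variance of X (same made-up data, numpy assumed available).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()  # the regression line passes through the means
print(slope, intercept)
print(np.polyfit(x, y, 1))  # cross-check: returns [slope, intercept]
```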
Sum of Squared Errors (SSE)
The sum of the squared vertical distances between the estimated and actual Y-values.
Standard Error of Estimate (SEE)
Gauges the fit of the regression line. Smaller error = better fit.
= √[SSE / (n - 2)]
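A sketch of SSE and SEE for a fitted simple regression (same made-up data, numpy assumed available).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept            # predicted Y-values

sse = np.sum((y - y_hat) ** 2)           # sum of squared residuals
see = np.sqrt(sse / (len(x) - 2))        # standard error of estimate
print(sse, see)
```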
Regression Sum of Squares (RSS)
Measures the variation in the dependent variable that is explained by the independent variable.
It is the sum of the squared vertical distances between the predicted Y-values and the mean of Y.
Total Sum of Squares (SST)
Measures the total variation in the dependent variable. It is equal to the sum of the squared differences between the actual Y-values and the mean of Y.
Coefficient of Determination (R²)
The % of the total variation in the dependent variable explained by the independent variable.
= RSS/SST
= (SST - SSE)/SST
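A sketch of the sum-of-squares decomposition and both forms of R² (same made-up data, numpy assumed available).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)      # total variation
rss = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

print(rss / sst, (sst - sse) / sst)    # both expressions give R²
```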
Total Variation (ANOVA)
= Explained variation (RSS) + Unexplained variation (SSE)
F-statistic
Assesses how well a set of independent variables, as a group, explains the variation in the dependent variable.
= (RSS / k) / [SSE / (n - k - 1)]
where k = # of independent variables
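A sketch of the F-test in Python, assuming scipy is available; RSS, SSE, n, and k are made-up numbers.

```python
from scipy import stats

rss, sse = 120.0, 80.0  # hypothetical explained and unexplained sums of squares
n, k = 50, 3            # hypothetical observations and independent variables

f_stat = (rss / k) / (sse / (n - k - 1))
f_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)  # one-tailed test at 5% significance
print(f_stat, f_crit, f_stat > f_crit)  # True -> the variables are jointly significant
```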
P-value
The smallest level of significance for which the null hypothesis can be rejected.
An alternative method of doing hypothesis testing of the coefficients is to compare the p-value to the significance level.
If p-value is less than the significance level, the null hypothesis can be rejected.
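A sketch of the p-value decision rule for a coefficient's t-statistic, assuming scipy is available; the t-statistic and degrees of freedom are made-up values.

```python
from scipy import stats

t_stat, df, alpha = 2.41, 46, 0.05                # hypothetical test statistic and df
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))  # two-tailed p-value
print(p_value, p_value < alpha)                   # True -> reject the null hypothesis
```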
Confidence Interval
Estimated regression coefficient +/- (critical t-value)(coefficient standard error)
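A sketch of the confidence interval calculation, assuming scipy is available; the coefficient, standard error, n, and k are made up.

```python
from scipy import stats

b_hat, se_b = 0.76, 0.22  # hypothetical estimated coefficient and its standard error
n, k = 50, 3              # hypothetical observations and independent variables

t_crit = stats.t.ppf(0.975, df=n - k - 1)  # 95% confidence level
lower, upper = b_hat - t_crit * se_b, b_hat + t_crit * se_b
print(lower, upper)
```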
Adjusted R²
Overcomes the problem of overestimating the impact of additional variables on the explanatory power of a regression model.
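A sketch of the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - k - 1); the R², n, and k values are made up.

```python
r2, n, k = 0.82, 50, 3  # hypothetical R², observations, independent variables
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2)  # slightly below R², penalizing the extra variables
```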
Dummy Variables
Independent variables that are binary in nature. They are often used to quantify the impact of qualitative events. They are assigned a value of 0 or 1.
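A sketch of building a dummy variable in Python (numpy assumed available); the quarter labels are made-up illustration data.

```python
import numpy as np

quarter = np.array([1, 2, 3, 4, 1, 2, 3, 4])  # hypothetical quarterly observations
q4_dummy = (quarter == 4).astype(int)         # 1 in the fourth quarter, 0 otherwise
print(q4_dummy)
```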
Heteroskedasticity
Occurs when the variance of the residuals is not the same across all observations in the sample. This happens when there are subsamples that are more spread out than the rest of the sample.
Conditional Heteroskedasticity
Related to the level of independent variables. For example, it exists if the variance of the residual term increases as the value of the independent variable increases.
Unconditional Heteroskedasticity
Not related to the level of the independent variables, which means it doesn't systematically increase or decrease with changes in the value of the independent variables.
How to detect Heteroskedasticity
- Examine a scatter plot of the residuals
- Use the Breusch-Pagan Chi-square test
* To correct: calculate robust standard errors (see the sketch below)
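A minimal sketch of the Breusch-Pagan test and the robust-standard-error correction using statsmodels, assuming it is installed; the data are simulated so that the residual variance grows with the independent variable.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200) * (1 + np.abs(x))  # residual spread rises with x

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, _, _ = het_breuschpagan(fit.resid, X)
print(lm_stat, lm_pvalue)  # small p-value suggests conditional heteroskedasticity

robust_fit = sm.OLS(y, X).fit(cov_type="HC1")  # robust (White-type) standard errors
print(robust_fit.bse)
```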
Serial Correlation (auto-correlation)
Refers to the situation in which the residual terms are correlated with one another.
Positive Vs. Negative Serial Correlation
Positive: a positive regression error in one period increases the probability of observing a positive regression error in the next period.
Negative: a positive regression error in one period increases the probability of observing a negative regression error in the next period.
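A sketch of checking the sign of the lag-1 correlation in the residuals (numpy assumed available; the residuals are made up).

```python
import numpy as np

resid = np.array([0.4, 0.3, 0.5, -0.2, -0.3, -0.1, 0.2, 0.4, 0.3, -0.4])  # hypothetical residuals
lag1_corr = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(lag1_corr)  # > 0 suggests positive serial correlation, < 0 suggests negative
```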
Multicollinearity
The inclusion of two or more independent variables that are highly correlated with each other.
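A sketch illustrating the problem with made-up data in which one independent variable is nearly a copy of another (numpy assumed available).

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)  # nearly collinear with x1
print(np.corrcoef(x1, x2)[0, 1])  # close to 1 -> multicollinearity concern
```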
Computing a Test Statistic
= coefficient / standard error of the coefficient
If the test statistic exceeds the critical t-value, the coefficient is statistically significant.
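A sketch of the coefficient t-test, assuming scipy is available; the coefficient, standard error, n, and k are made-up values.

```python
from scipy import stats

coef, std_err = 0.76, 0.22  # hypothetical coefficient and its standard error
n, k = 50, 3                # hypothetical observations and independent variables

t_stat = coef / std_err
t_crit = stats.t.ppf(0.975, df=n - k - 1)  # two-tailed test at 5% significance
print(t_stat, t_crit, abs(t_stat) > t_crit)
```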