Quant Flashcards
Correlation & Regression
Terminology - Define the following:
- Coefficient
- Correlation coefficient
- Confidence interval
Coefficient - a numerical or constant quantity placed before and multiplying the variable in an algebraic expression (e.g., 4 in 4x^y). It is usually a number, but may be any expression. In the latter case, the variables appearing in the coefficients are often called parameters, and must be clearly distinguished from the other variables.
The correlation coefficient (r for a sample, ρ for a population) is a measure of the strength of the linear relationship (correlation) between two variables.
A confidence interval is an interval of values that we believe includes the true parameter value, b1, with a given degree of confidence. To compute a confidence interval, we must select the significance level for the test and know the standard error of the estimated coefficient.
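As a rough illustration of the confidence interval definition above, the interval can be sketched as estimate ± (critical t-value)(standard error). The function name and the input numbers below are hypothetical, and the critical t-value is assumed to come from a t-table for the chosen significance level.

```python
def coefficient_confidence_interval(estimate, std_error, critical_t):
    """Return (lower, upper) for estimate +/- critical_t * std_error."""
    if std_error < 0:
        raise ValueError("standard error cannot be negative")
    margin = critical_t * std_error
    return estimate - margin, estimate + margin


# Hypothetical inputs: estimated slope b1 = 0.64, standard error = 0.26,
# two-tailed critical t = 2.03 (assumed to be looked up in a t-table).
lower, upper = coefficient_confidence_interval(0.64, 0.26, 2.03)
print(f"Confidence interval for b1: {lower:.4f} to {upper:.4f}")
```

If zero falls outside the interval, the coefficient is statistically different from zero at that significance level.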
Correlation & Regression
Everything in simple linear regression can be applied to multiple linear regression except the three items below. Define each of the items unique to simple linear regression:
- Correlation coefficient (+ apply to test statistic calc)
- Regression assumptions
- Forming prediction interval for dependent (Y) variable
- The correlation coefficient
- Answers the question: is X correlated with Y?
- (“r” for a sample and “ρ” for a population) is a measure of the strength of the linear relationship (correlation) between two variables.
- Ranges from +1 (perfect positive correlation) to -1 (perfect negative correlation).
- The test statistic for the significance of a correlation coefficient (null is ρ = 0) has a t-distribution with n – 2 degrees of freedom and is calculated as:
t = r√(n - 2) / √(1 - r²)
- Regression Assumptions
- Linear relationship exists between the dependent and independent variables.
- Residual term:
- Independent variable is uncorrelated with the residual term.
- Expected value = zero
- Variance is constant
- Independently distributed; that is, the residual term for one observation is not correlated with that of another observation (a violation of this assumption is called autocorrelation).
- Normally distributed.
- Note that five of the six assumptions are related to the residual term. The residual terms are independently (of each other and the independent variable), identically, and normally distributed with a zero mean.
- Confidence Interval for a Predicted Y-Value (also applies to multiple regression, but you don't need to know it for the test)
- In simple linear regression, you have to know how to calculate a confidence interval for the predicted Y value:
- Confidence interval = predicted Y value ± (critical t-value)(standard error of forecast)
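The test statistic for the correlation coefficient described on this card can be sketched in a few lines. This is only a minimal example: the function name and the sample values (r = 0.6, n = 27) are hypothetical.

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical sample: correlation of 0.6 measured over 27 observations.
t_stat = correlation_t_stat(0.6, 27)
degrees_of_freedom = 27 - 2
print(f"t = {t_stat:.2f} with {degrees_of_freedom} degrees of freedom")
```

Compare the absolute value of the result with the two-tailed critical t-value for n – 2 degrees of freedom; if it is larger, reject the null that ρ = 0.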
Correlation & Regression
R²ₐ (Adjusted R²)
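The card gives no formula, so the sketch below uses the standard textbook definition, adjusted R² = 1 - [(n - 1) / (n - k - 1)] × (1 - R²), where n is the number of observations and k the number of independent variables. The function name and the example inputs are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjust R-squared for the number of independent variables k."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r_squared)


# Hypothetical regression: R-squared of 0.80, 61 observations, 5 regressors.
adj = adjusted_r_squared(0.80, n=61, k=5)
print(f"Adjusted R-squared: {adj:.4f}")
```

Adjusted R² is never larger than R² when k ≥ 1, and unlike R² it can fall when a variable with little explanatory power is added.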
Correlation & Regression
What is a p-value and how is it used in hypothesis testing?
The p-value is the smallest level of significance at which the null hypothesis can be rejected. An alternative way to test hypotheses about the coefficients is to compare the p-value to the significance level:
- P-value less than the significance level: reject the null.
- P-value greater than the significance level: cannot reject the null.
- Remember: small Ps and big Ts to reject the null!
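The decision rule above is simple enough to write as a one-line check. In this minimal sketch the function name and the p-value of 0.03 are hypothetical.

```python
def reject_null(p_value, significance_level):
    """Reject the null when the p-value is below the significance level."""
    return p_value < significance_level


# Hypothetical p-value of 0.03 tested at two significance levels.
at_five_percent = reject_null(0.03, 0.05)
at_one_percent = reject_null(0.03, 0.01)
print(f"Reject at 5%: {at_five_percent}; reject at 1%: {at_one_percent}")
```

The same p-value leads to rejection at 5% but not at 1%, which is what "smallest level of significance for which the null can be rejected" means in practice.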
Correlation & Regression
Coefficient of Determination, R²
R² = RSS / SST
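A small sketch of the ratio on this card, assuming RSS denotes the regression (explained) sum of squares and SST the total sum of squares. The card does not spell this out, but it is the reading under which RSS / SST lies between 0 and 1. The data are hypothetical; the predicted values are the least-squares fitted values for x = 1, 2, 3, 4.

```python
def r_squared(actual, predicted):
    """R-squared = explained variation / total variation = RSS / SST."""
    mean_y = sum(actual) / len(actual)
    sst = sum((y - mean_y) ** 2 for y in actual)         # total variation
    rss = sum((y_hat - mean_y) ** 2 for y_hat in predicted)  # explained variation
    if sst == 0:
        raise ValueError("dependent variable has no variation")
    return rss / sst


# Hypothetical data: actual Y values and fitted values from Y-hat = 1.9 * X.
actual_y = [2.0, 4.0, 5.0, 8.0]
fitted_y = [1.9, 3.8, 5.7, 7.6]
r2 = r_squared(actual_y, fitted_y)
print(f"R-squared: {r2:.4f}")
```

This ratio equals 1 - SSE / SST only when the fitted values come from a least-squares regression with an intercept, which holds for the example data.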
Correlation & Regression
F-Statistic
- F-test assesses the effectiveness of the model as a whole in explaining the dependent variable.
- Assesses how well the set of independent variables, as a group, explains the variation in the dependent variable. That is, the F-statistic is used to test whether at least one of the independent variables explains a significant portion of the variation of the dependent variable.
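The card does not state the formula; the sketch below uses the standard definition F = MSR / MSE = (RSS / k) / (SSE / (n - k - 1)), where RSS is taken to be the regression sum of squares, SSE the sum of squared errors, n the number of observations, and k the number of independent variables. The function name and inputs are hypothetical.

```python
def f_statistic(rss, sse, n, k):
    """F = mean regression sum of squares / mean squared error."""
    if k <= 0 or n - k - 1 <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    msr = rss / k              # numerator degrees of freedom: k
    mse = sse / (n - k - 1)    # denominator degrees of freedom: n - k - 1
    return msr / mse


# Hypothetical ANOVA figures: RSS = 460, SSE = 140, 65 observations, 4 regressors.
f_stat = f_statistic(rss=460.0, sse=140.0, n=65, k=4)
print(f"F = {f_stat:.2f} with 4 and 60 degrees of freedom")
```

The F-test is one-tailed: reject the null that all slope coefficients equal zero when F exceeds the critical F-value.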
Correlation & Regression
Multiple Regression flow chart of issues to know for exam.
- The t-test assesses the statistical significance of the individual regression parameters.
- The F-test assesses the effectiveness of the model as a whole in explaining the dependent variable.
- Understand the effect that heteroskedasticity, serial correlation, and multicollinearity have on regression results.
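For the t-test in the first item above, a minimal sketch: the statistic is (estimated coefficient - hypothesized value) / standard error, with n - k - 1 degrees of freedom in a multiple regression. The function names, the inputs, and the critical value are hypothetical; the critical t is assumed to come from a t-table.

```python
def coefficient_t_stat(estimate, std_error, hypothesized=0.0):
    """t-statistic for a single regression coefficient."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (estimate - hypothesized) / std_error


def is_significant(t_stat, critical_t):
    """Two-tailed test: significant when |t| exceeds the critical value."""
    return abs(t_stat) > critical_t


# Hypothetical coefficient: estimate 0.64, standard error 0.26,
# critical t of 2.03 assumed from a t-table.
coef_t = coefficient_t_stat(0.64, 0.26)
significant = is_significant(coef_t, 2.03)
print(f"t = {coef_t:.3f}, significant: {significant}")
```

Heteroskedasticity, serial correlation, and multicollinearity matter here because they distort the standard error in the denominator, which makes the t-statistic unreliable.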
Regression analysis
Define each of the following problems in regression analysis, along with its effects and how it is identified and corrected.
- Conditional Heteroskedasticity
- Serial Correlation
- Multicollinearity
Regression
ANOVA table for multiple regression
Correlation and Regression
What are the six types of Model Misspecification and what’s their impact?