Quantitative Methods Flashcards
Describe a simple linear regression model and the roles of the dependent and independent variables in the model.
Linear regression provides an estimate of the linear relationship between an independent variable (the explanatory variable) and a dependent variable (the predicted variable).
Describe the least squares criterion, how it is used to estimate regression coefficients, and their interpretation.
The estimated intercept, b̂0, represents the value of the dependent variable where the regression line crosses the dependent-variable axis (usually the vertical axis), i.e., the predicted value of Y when X equals zero.
The estimated slope coefficient, b̂1, is interpreted as the change in the dependent variable for a one-unit change in the independent variable.
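A minimal sketch of the least squares criterion, using hypothetical data (the x and y values below are made up for illustration):

```python
# Least squares estimates for simple linear regression:
# b1-hat = Cov(X, Y) / Var(X), b0-hat = y-bar - b1-hat * x-bar.
# Data are hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable
y = [2.1, 3.9, 6.2, 8.1, 9.8]   # dependent variable

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of cross-deviations over sum of squared x-deviations
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
# Intercept: forces the line through (x-bar, y-bar)
b0_hat = y_bar - b1_hat * x_bar
```

With these numbers the line works out to ŷ = 0.14 + 1.96x, so each one-unit increase in X is associated with a 1.96-unit increase in predicted Y.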
Explain the assumptions underlying the simple linear regression model, and describe how residuals and residual plots indicate if these assumptions may have been violated.
Assumptions made regarding simple linear regression include the following:
- A linear relationship exists between the dependent and the independent variable.
- The variance of the residual term is constant (homoskedasticity).
- The residual term is free from serial correlation.
- The residual term is normally distributed.
Residual Term = Error Term
Interpret the coefficient of determination in a simple linear regression.
The coefficient of determination, R², is the proportion of the total variation of the dependent variable explained by the regression.
R² = RSS ÷ SST = (SST − SSE) ÷ SST
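A sketch of the R² calculation, using hypothetical actual and fitted values:

```python
# R^2 = (SST - SSE) / SST. The actual and fitted values below are
# hypothetical (fitted values come from an assumed regression line).
y     = [2.1, 3.9, 6.2, 8.1, 9.8]        # actual values
y_hat = [2.10, 4.06, 6.02, 7.98, 9.94]   # fitted values (assumed)
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
r2 = (sst - sse) / sst
```

Here R² is about 0.998, meaning nearly all of the variation in Y is explained by the regression.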
Interpret the F-statistic in a simple linear regression.
In simple linear regression, because there is only one independent variable (k = 1), the F-test tests the same null hypothesis as testing the statistical significance of b1 using the t-test:
H0: b1 = 0 versus
Ha: b1 ≠ 0.
With only one independent variable, F is calculated as:
F-Stat = MSR ÷ MSE with 1 and n − 2 degrees of freedom
What calculations are used in the analysis of variance (ANOVA) in regression analysis?
RSS = Σ(ŷi − ȳ)²
SSE = Σ(yi − ŷi)²
SST = RSS + SSE
MSR = RSS ÷ k
MSE = SSE ÷ (n − k − 1)
F = MSR ÷ MSE
SEE = √[SSE ÷ (n − 2)]
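The ANOVA quantities can be sketched for a simple regression (k = 1), again with hypothetical actual and fitted values:

```python
import math

# ANOVA quantities for a simple regression (k = 1).
# Actual and fitted values are hypothetical.
y     = [2.1, 3.9, 6.2, 8.1, 9.8]
y_hat = [2.10, 4.06, 6.02, 7.98, 9.94]
n, k = len(y), 1
y_bar = sum(y) / n

rss = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
sst = rss + sse

msr = rss / k                      # mean square regression
mse = sse / (n - k - 1)            # mean square error
f_stat = msr / mse                 # F with k and n - k - 1 df
see = math.sqrt(sse / (n - 2))     # standard error of estimate
```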
Formulate a null and an alternative hypothesis about a population value of a regression coefficient, and determine whether the null hypothesis is rejected at a given level of significance.
We can assess a regression model by testing whether the population value of a regression coefficient is equal to a specific hypothesized value.
A t-test with n − 2 degrees of freedom is used to conduct hypothesis tests of the estimated regression parameters:
t = (b̂1 − b1) ÷ sb̂1
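A sketch of the t-test of H0: b1 = 0, using hypothetical values for the slope estimate and standard error of estimate:

```python
import math

# t-test of H0: b1 = 0 in simple regression. The slope estimate and
# SEE below are hypothetical; the standard error of the slope is
# s_b1 = SEE / sqrt(sum (xi - x_bar)^2).
b1_hat = 1.96       # estimated slope (assumed)
see = 0.17512       # standard error of estimate (assumed)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
x_bar = sum(x) / len(x)

s_b1 = see / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
t_stat = (b1_hat - 0) / s_b1
# Compare |t_stat| to the critical t with n - 2 = 3 degrees of freedom
# (about 3.182 at the 5% level, two-tailed); reject H0 if it is larger.
```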
Calculate and interpret the predicted value for the dependent variable, and a prediction interval for it, given an estimated linear regression model and a value for the independent variable.
A predicted value of the dependent variable, ŷp, is determined by inserting the forecasted value of the independent variable, Xp, into the regression equation and calculating ŷp = b̂0 + b̂1Xp.
The prediction interval for a predicted Y-value is ŷp − (tc × sf) < Y < ŷp + (tc × sf),
where sf is the standard error of the forecast.
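A sketch of the point forecast and prediction interval, with hypothetical coefficient estimates and SEE; the standard error of the forecast formula used here is sf = SEE × √[1 + 1/n + (Xp − x̄)² ÷ Σ(xi − x̄)²]:

```python
import math

# Point forecast and prediction interval for a simple regression.
# Coefficient estimates and SEE are hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

b0_hat, b1_hat, see = 0.14, 1.96, 0.17512   # assumed estimates
x_p = 6.0                                   # forecasted X value
y_p = b0_hat + b1_hat * x_p                 # point forecast

# Standard error of the forecast
s_f = see * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)

t_crit = 3.182                # t with n - 2 = 3 df, 5% two-tailed (table value)
lower = y_p - t_crit * s_f
upper = y_p + t_crit * s_f
```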
What is a Cross-Sectional Regression?
Uses many observations of X and Y for a single time period. These observations could come from different companies, asset classes, investment funds, countries, etc.
What is Time-Series Regression?
Uses many observations from different time periods for the same subject. For example, using monthly data from many years to test whether a country's inflation rate determines its short-term interest rate.
Formulate a multiple regression equation to describe the relation between a dependent variable and several independent variables, and determine the statistical significance of each independent variable.
The multiple regression equation specifies a dependent variable as a linear function of two or more independent variables:
Yi = b0 + b1X1i + b2X2i + … + bkXki + εi
The intercept term is the value of the dependent variable when the independent variables are equal to zero. Each slope coefficient is the estimated change in the dependent variable for a one-unit change in that independent variable, holding the other independent variables constant.
Interpret estimated regression coefficients and their p-values.
The p-value is the smallest level of significance for which the null hypothesis can be rejected.
- If the p-value is less than the significance level, the null hypothesis can be rejected.
- If the p-value is greater than the significance level, the null hypothesis cannot be rejected.
Formulate a null and an alternative hypothesis about the population value of a regression coefficient, calculate the value of the test statistic, and determine whether to reject the null hypothesis at a given level of significance.
A t-test is used for hypothesis testing of regression parameter estimates:
tbj = (b̂j − bj) ÷ sb̂j
with n − k − 1 degrees of freedom
Testing for statistical significance means testing H0: bj = 0 vs. Ha: bj ≠ 0.
Interpret the results of hypothesis tests of regression coefficients.
For a two-tailed test of a regression coefficient, if the t-statistic is between the upper and lower critical t-values, we cannot reject the null hypothesis. We cannot conclude that the regression coefficient is statistically significantly different from the null hypothesis value at the chosen significance level.
If the t-statistic is greater than the upper critical t-value or lower than the lower critical t-value, we can reject the null hypothesis and conclude that the regression coefficient is statistically significantly different from the null hypothesis value at the specified significance level.
Calculate and interpret a predicted value for the dependent variable, given an estimated regression model and assumed values for the independent variables.
The value of dependent variable Y is predicted as:
Ŷ = b̂0 + b̂1X1 + b̂2X2 + … + b̂kXk
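A minimal sketch of the prediction, with hypothetical coefficient estimates and assumed values for the independent variables:

```python
# Predicted Y from an estimated multiple regression with k = 3
# independent variables. Coefficients and X values are hypothetical.
b = [1.5, 0.8, -0.3, 2.0]    # [b0, b1, b2, b3] (assumed estimates)
x = [1.0, 4.0, 10.0, 0.5]    # [1 for the intercept, X1, X2, X3]

# Y-hat = b0*1 + b1*X1 + b2*X2 + b3*X3
y_pred = sum(bj * xj for bj, xj in zip(b, x))
```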
Explain the assumptions of a multiple regression model.
Assumptions of multiple regression mostly pertain to the error term, εi.
- A linear relationship exists between the dependent and independent variables.
- The independent variables are not random, and there is no exact linear relation between any two or more independent variables.
- The expected value of the error term is zero.
- The variance of the error terms is constant.
- The error for one observation is not correlated with that of another observation.
- The error term is normally distributed.
Calculate and interpret the F-statistic, and describe how it is used in regression analysis.
The F-distributed test statistic can be used to test the significance of all (or any subset of) the independent variables (i.e., the overall fit of the model) using a one-tailed test:
F = MSR ÷ MSE = (RSS ÷ k) ÷ (SSE ÷ [n − k − 1])
with k and n − k − 1 degrees of freedom
Contrast and interpret the R2 and adjusted R2 in multiple regression.
The coefficient of determination, R2, is the percentage of the variation in Y that is explained by the set of independent variables.
R2 increases as the number of independent variables increases—this can be a problem.
The adjusted R2 adjusts the R2 for the number of independent variables.
R2a=1 − [(n−1) ÷ (n−k−1) × (1−R2)]
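A sketch of the adjustment, with hypothetical sample size, number of independent variables, and R²:

```python
# Adjusted R^2 for a hypothetical model: n = 60 observations,
# k = 5 independent variables, R^2 = 0.80.
n, k, r2 = 60, 5, 0.80

# R2a = 1 - [(n - 1) / (n - k - 1) * (1 - R^2)]
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)
```

Adjusted R² is always less than or equal to R², and the penalty grows as more independent variables are added.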
Formulate and interpret a multiple regression, including qualitative independent variables.
Qualitative independent variables (dummy variables) capture the effect of a binary independent variable:
The slope coefficient on a dummy variable is interpreted as the change in the dependent variable when the dummy variable equals one (relative to the omitted base category), holding the other variables constant.
Use one less dummy variable than the number of categories.
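The encoding rule can be sketched for a hypothetical 4-category variable (quarter of the year), which needs 4 − 1 = 3 dummy variables:

```python
# Encoding a 4-category qualitative variable (quarter of the year)
# with 3 dummy variables; the omitted category (Q1) is the base case
# captured by the intercept. Observations are hypothetical.
quarters = [1, 2, 3, 4, 1, 2]

d2 = [1 if q == 2 else 0 for q in quarters]
d3 = [1 if q == 3 else 0 for q in quarters]
d4 = [1 if q == 4 else 0 for q in quarters]
# Regression: Y = b0 + b1*d2 + b2*d3 + b3*d4 + e
# b1 is the difference in mean Y between Q2 and the base case Q1.
```

Using a dummy for every category would create an exact linear relation with the intercept (the dummy-variable trap), violating the no-multicollinearity assumption.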
Explain how Conditional Heteroskedasticity affects statistical inference.
Conditional Heteroskedasticity: Residual variance related to level of independent variables
The Effect: Coefficients are consistent. Standard errors are underestimated. Too many Type I errors.
Detection: Breusch-Pagan chi-square test, BP = n × R², where R² is from a regression of the squared residuals on the independent variables (k degrees of freedom)
Correction: Use White-corrected standard errors
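A sketch of the Breusch-Pagan decision rule, with hypothetical inputs (the auxiliary-regression R² is assumed, not computed here):

```python
# Breusch-Pagan test: BP = n * R^2 of a regression of squared
# residuals on the independent variables, chi-square with k df.
# All numbers are hypothetical.
n = 50              # observations
k = 2               # independent variables
r2_resid = 0.15     # R^2 from the auxiliary regression (assumed)

bp_stat = n * r2_resid
chi2_crit = 5.991   # chi-square critical value, 2 df, 5% level (table value)
reject_homoskedasticity = bp_stat > chi2_crit
```

If BP exceeds the critical value, conclude the residual variance is related to the independent variables and use White-corrected standard errors.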