4. Linear Regression Flashcards
Type II Error
Fail to reject a null that should be rejected; false negative
Explanation of Assumption 6
Assumption 6, that the error term is normally distributed, allows us to easily test a particular hypothesis about a linear regression model.
Effect on Size of Interval When Increasing Confidence
Confidence interval will expand
Explanation of Assumption 4
Assumption 4, that the variance of the error term is the same for all observations, is also known as the homoskedasticity assumption. The reading on multiple regression discusses how to test for and correct violations of this assumption.
Necessity of Assumption 2 and 3
Assumptions 2 and 3 ensure that linear regression produces the correct estimates of b0 and b1.
Hypothesis testing
A way to test the results of a survey or experiment to see whether they are meaningful. Essentially, it tests whether your results are valid by estimating the probability that they occurred by chance. If the results could easily have occurred by chance, the experiment won't be repeatable and so has little use.
Type I Error
Reject a null that should not be rejected; false positive
Dependent variable
The variable whose variation about its mean is to be explained by the regression; the left-hand-side variable in a regression equation.
Standard error of estimate/Standard error of the regression
Like the standard deviation for a single variable, except that it measures the standard deviation of the residual term in the regression.
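As a sketch of the definition above, the standard error of estimate can be computed from the regression residuals; the residual values and sample size here are hypothetical:

```python
import math

# Hypothetical residuals from a fitted one-independent-variable regression.
residuals = [0.5, -1.2, 0.8, -0.3, 1.0, -0.8]
n = len(residuals)

sse = sum(e**2 for e in residuals)  # sum of squared errors (residuals)
see = math.sqrt(sse / (n - 2))      # n - 2 df: two parameters estimated (b0, b1)
```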
Necessity of Assumption 1
Assumption 1 is critical for a valid linear regression. If the relationship between the independent and dependent variables is nonlinear in the parameters, then estimating that relation with a linear regression model will produce invalid results. For example, Yi = b0e^(b1Xi) + εi is nonlinear in b1, so we could not apply the linear regression model to it. Even if the dependent variable is nonlinear, linear regression can be used as long as the regression is linear in the parameters.
Classic normal linear regression model assumptions
- The relationship between the dependent variable, Y, and the independent variable, X, is linear in the parameters b0 and b1. This requirement means that b0 and b1 are raised to the first power only and that neither b0 nor b1 is multiplied or divided by another regression parameter (as in b0/b1, for example). The requirement does not exclude X from being raised to a power other than 1.
- The independent variable, X, is not random.
- The expected value of the error term is 0: E(ε) = 0.
- The variance of the error term is the same for all observations: E(εi²) = σε², i = 1, …, n.
- The error term, ε, is uncorrelated across observations. Consequently, E(εiεj) = 0 for all i not equal to j.
- The error term, ε, is normally distributed.
4 steps to determine the prediction interval for the prediction
- Make the prediction.
- Compute the variance of the prediction error using Equation 12.
- Choose a significance level, α, for the forecast (for example, 0.05). That level, given the degrees of freedom in the regression, determines the critical value for the forecast interval, tc.
- Compute the 100(1 − α)% prediction interval for the prediction.
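The four steps above can be sketched as follows. All fitted values are hypothetical, and the prediction-error variance uses the standard one-independent-variable formula (assumed here to correspond to the Equation 12 cited above):

```python
import math

# Hypothetical fitted regression and sample statistics.
b0, b1 = 1.0, 2.0        # estimated intercept and slope
s = 0.5                  # standard error of estimate
n = 20                   # number of observations
x_bar, s_x2 = 4.0, 2.25  # sample mean and variance of X
t_c = 2.101              # two-tailed 5% critical t with n - 2 = 18 df
x = 5.0                  # value of X used for the forecast

# Step 1: make the prediction.
y_hat = b0 + b1 * x
# Step 2: variance of the prediction error (standard one-variable formula).
s_f2 = s**2 * (1 + 1/n + (x - x_bar)**2 / ((n - 1) * s_x2))
# Steps 3-4: with alpha = 0.05 chosen above, form the (1 - alpha) interval.
lower = y_hat - t_c * math.sqrt(s_f2)
upper = y_hat + t_c * math.sqrt(s_f2)
```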
Degrees of Freedom
The number of observations minus the number of parameters estimated.
Estimated variance of the prediction error depends on:
- the squared standard error of estimate, s^2
- the number of observations, n
- the value of the independent variable, X, used to predict the dependent variable
- the estimated mean of the independent variable
- the variance of the independent variable.
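Assuming the standard one-independent-variable result, the elements listed above combine into a single formula for the estimated variance of the prediction error:

```latex
s_f^2 = s^2 \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n - 1)\, s_x^2} \right]
```

Here s² is the squared standard error of estimate, X̄ the estimated mean and s_x² the variance of the independent variable; the variance grows as X moves away from X̄.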
Elements necessary to calculate test statistic for ANOVA
- the total number of observations (n)
- the total number of parameters to be estimated (in a one-independent-variable regression, this number is two: the intercept and the slope coefficient)
- the sum of squared errors or residuals (SSE, residual sum of squares)
- the regression sum of squares (RSS, total variation in Y explained in the regression equation)
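A sketch of how these elements produce the F-statistic in a one-independent-variable regression; the SSE and RSS values are hypothetical:

```python
# Hypothetical ANOVA inputs.
n = 25        # total number of observations
k = 1         # number of slope coefficients (one independent variable)
rss = 120.0   # regression sum of squares (explained variation)
sse = 60.0    # sum of squared errors (unexplained variation)

msr = rss / k            # mean regression sum of squares
mse = sse / (n - k - 1)  # mean squared error, n - 2 df here
f_stat = msr / mse       # F-test of the regression as a whole
```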
95% Confidence Interval
The interval, based on the sample value (estimated), that we would expect to include the population (true) value with a 95% degree of confidence.
Limitations of Regression Analysis
- Regression relations can change over time, just as correlations can. This fact is known as the issue of parameter instability, and its existence should not be surprising as the economic, tax, regulatory, political, and institutional contexts in which financial markets operate change.
- A second limitation to the use of regression results specific to investment contexts is that public knowledge of regression relationships may negate their future usefulness.
- Finally, if the regression assumptions listed in Section 2.2 are violated, hypothesis tests and predictions based on linear regression will not be valid.
Independent variable
A variable used to explain the dependent variable in a regression; a right-hand-side variable in a regression equation.
P-Value
The p-value is used as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means that there is stronger evidence in favor of the alternative hypothesis.
Coefficient of Determination
Fraction of the total variation that is explained by the regression.
R² = 1 − (unexplained variation / total variation)
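A minimal numeric sketch of the formula above, with hypothetical variation totals:

```python
total_variation = 100.0  # sum of squared deviations of Y about its mean
unexplained = 35.0       # sum of squared errors (hypothetical)

# Coefficient of determination: share of total variation explained.
r_squared = 1 - unexplained / total_variation
```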
Unbiased
Even though forecasts may be inaccurate, we hope at least that they are unbiased, that is, that the expected value of the forecast error is zero. An unbiased forecast can be expressed as E(Actual change − Predicted change) = 0. In fact, most evaluations of forecast accuracy test whether forecasts are unbiased.
Regression coefficients
The intercept and slope coefficient(s) of a regression.
Analysis of variance (ANOVA)
The analysis of the total variability of a dataset (such as observations on the dependent variable in a regression) into components representing different sources of variation; with reference to regression, ANOVA provides the inputs for an F-test of the significance of the regression as a whole.
Error term
The portion of the dependent variable that is not explained by the independent variable(s) in the regression.
Elements of Confidence Interval Hypothesis Test
1) the estimated parameter value
2) the hypothesized value of the parameter, b0 or b1
3) a confidence interval around the estimated parameter.
F -statistic
Measures how well the regression equation explains the variation in the dependent variable. The F-statistic is the ratio of the average regression sum of squares to the average sum of the squared errors.
Estimated parameters/Fitted parameters
With reference to a regression analysis, the estimated values of the population intercept and population slope coefficient(s) in a regression.
Confidence Interval
An interval of values that we believe includes the true parameter value, b1, with a given degree of confidence.
To compute a confidence interval, we must select the significance level for the test and know the standard error of the estimated coefficient.
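A sketch of the interval construction described above, with a hypothetical estimated slope, standard error, and critical value:

```python
b1_hat = 0.80  # estimated slope coefficient (hypothetical)
se_b1 = 0.15   # standard error of the estimated coefficient
t_c = 2.0      # critical t for the chosen significance level (assumed)

# Confidence interval: estimate plus/minus critical value times standard error.
lower = b1_hat - t_c * se_b1
upper = b1_hat + t_c * se_b1
```

A hypothesized value of b1 outside (lower, upper) would be rejected at that significance level.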
Explanation of Assumption 5
Assumption 5, that the errors are uncorrelated across observations, is also necessary for correctly estimating the variances of the estimated parameters b0 and b1. The reading on multiple regression discusses violations of this assumption.