4. Linear Regression Flashcards
Type II Error
Fail to reject a null that should be rejected; false negative
Explanation of Assumption 6
Assumption 6, that the error term is normally distributed, allows us to easily test a particular hypothesis about a linear regression model.
Effect on Size of Interval When Increasing Confidence
Confidence interval will expand
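A minimal sketch of why the interval expands, assuming a normal-approximation interval (regression intervals normally use a t critical value, but the direction of the effect is the same). The estimate and standard error are hypothetical numbers chosen only for illustration.

```python
from statistics import NormalDist

def z_interval(mean, std_error, confidence):
    """Two-sided normal-approximation confidence interval."""
    # Critical value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z * std_error, mean + z * std_error

# Hypothetical estimate and standard error, for illustration only.
estimate, std_error = 1.5, 0.2

widths = {}
for confidence in (0.90, 0.95, 0.99):
    low, high = z_interval(estimate, std_error, confidence)
    widths[confidence] = high - low
    print(f"{confidence:.0%} interval: [{low:.3f}, {high:.3f}], width = {high - low:.3f}")
```

A higher confidence level requires a larger critical value, so the width grows even though the estimate and its standard error are unchanged.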
Explanation of Assumption 4
Assumption 4, that the variance of the error term is the same for all observations, is also known as the homoskedasticity assumption. The reading on multiple regression discusses how to test for and correct violations of this assumption.
Necessity of Assumption 2 and 3
Assumptions 2 and 3 ensure that linear regression produces the correct estimates of b0 and b1.
Hypothesis testing
A way to test the results of a survey or experiment to see whether they are meaningful. It asks how likely it is that the results arose by chance. If chance is a plausible explanation, the experiment is unlikely to be repeatable and so has little use.
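As a rough illustration of "the odds that your results have happened by chance", the sketch below runs a two-sided z-test on a sample mean. It assumes a known population standard deviation and uses hypothetical numbers; the function name `two_sided_z_test` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Return (z statistic, p-value, reject?) for H0: mean == null_mean."""
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    # Probability of a statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical sample: mean 10.4 from 25 observations, sigma assumed known.
z, p_value, reject = two_sided_z_test(sample_mean=10.4, null_mean=10.0, sigma=1.0, n=25)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

A small p-value means a result this extreme would rarely occur by chance under the null hypothesis, which is the basis for rejecting it.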
Type I Error
Reject a null that should not be rejected; false positive
Dependent variable
The variable whose variation about its mean is to be explained by the regression; the left-hand-side variable in a regression equation.
Standard error of estimate/Standard error of the regression
Like the standard deviation for a single variable, except that it measures the standard deviation of the residual term in the regression.
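A minimal sketch of the calculation for a one-independent-variable regression, assuming the usual definition: the square root of the sum of squared residuals divided by n − 2. The data and helper names are hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standard_error_of_estimate(xs, ys):
    """Standard deviation of the residuals, using n - 2 degrees of freedom."""
    b0, b1 = fit_line(xs, ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
see = standard_error_of_estimate(xs, ys)
print(f"standard error of estimate = {see:.4f}")
```

The divisor is n − 2 rather than n − 1 because two parameters, the intercept and the slope, are estimated from the data.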
Necessity of Assumption 1
Assumption 1 is critical for a valid linear regression. If the relationship between the independent and dependent variables is nonlinear in the parameters, then estimating that relation with a linear regression model will produce invalid results. For example, a model such as Yi = b0 e^(b1 Xi) + εi is nonlinear in b1, so we could not apply the linear regression model to it. Even if the dependent variable is nonlinear, linear regression can be used as long as the regression is linear in the parameters.
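To illustrate a regression that is nonlinear in X but still linear in the parameters, the sketch below fits Y = b0 + b1 X^2 by regressing Y on the transformed variable Z = X^2. The data are hypothetical and noise-free, generated from b0 = 1 and b1 = 2, so the fit should recover those values.

```python
def fit_line(zs, ys):
    """Ordinary least squares estimates (b0, b1) for Y = b0 + b1 * Z."""
    n = len(zs)
    z_bar = sum(zs) / n
    y_bar = sum(ys) / n
    szy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    szz = sum((z - z_bar) ** 2 for z in zs)
    b1 = szy / szz
    b0 = y_bar - b1 * z_bar
    return b0, b1

# Hypothetical data generated from Y = 1 + 2 * X^2 with no noise.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x ** 2 for x in xs]

# Transform X; b0 and b1 still enter the model to the first power only.
zs = [x ** 2 for x in xs]
b0, b1 = fit_line(zs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The transformation changes the regressor, not the parameters, which is why ordinary linear regression still applies.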
Classic normal linear regression model assumptions
- The relationship between the dependent variable, Y, and the independent variable, X, is linear in the parameters b0 and b1. This requirement means that b0 and b1 are raised to the first power only and that neither b0 nor b1 is multiplied or divided by another regression parameter (as in b0/b1, for example). The requirement does not exclude X from being raised to a power other than 1.
- The independent variable, X, is not random.
- The expected value of the error term is 0: E(ε) = 0.
- The variance of the error term is the same for all observations: E(εi^2) = σε^2, i = 1, …, n.
- The error term, ε, is uncorrelated across observations. Consequently, E(εiεj) = 0 for all i not equal to j.
- The error term, ε, is normally distributed.
Four steps to determine the prediction interval for a prediction
- Make the prediction.
- Compute the variance of the prediction error using Equation 12.
- Choose a significance level, α, for the forecast. For example, the 0.05 level, given the degrees of freedom in the regression, determines the critical value for the forecast interval, tc.
- Compute the 100(1 − α) percent prediction interval for the prediction.
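The four steps above can be sketched as follows. This assumes the standard prediction-error variance for a one-independent-variable regression (presumably what the source's Equation 12 refers to, though that equation is not reproduced in these cards). The data are hypothetical, and the critical value 3.182 is the two-tailed 0.05 value for 3 degrees of freedom taken from a t-table rather than computed.

```python
from math import sqrt

def prediction_interval(xs, ys, x_new, t_crit):
    """Return (prediction, lower bound, upper bound) for Y at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # equals (n - 1) * s_x^2
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar

    # Step 1: make the prediction.
    y_hat = b0 + b1 * x_new

    # Step 2: variance of the prediction error (assumed standard formula).
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)  # squared standard error of estimate
    var_f = s2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

    # Step 3: the caller supplies t_crit for the chosen alpha and n - 2 df.
    # Step 4: prediction plus or minus t_crit times the standard error.
    half_width = t_crit * sqrt(var_f)
    return y_hat, y_hat - half_width, y_hat + half_width

# Hypothetical data; n = 5, so there are 3 degrees of freedom.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat, low, high = prediction_interval(xs, ys, x_new=6, t_crit=3.182)
print(f"prediction = {y_hat:.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could compute it instead.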
Degrees of Freedom
The number of observations minus the number of parameters estimated.
Estimated variance of the prediction error depends on:
- the squared standard error of estimate, s^2
- the number of observations, n
- the value of the independent variable, X, used to predict the dependent variable
- the estimated mean of the independent variable
- the variance of the independent variable.
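For reference, these inputs combine in the expression usually given for a one-independent-variable regression. It is stated here as an assumption about what the source has in mind, and the symbols s_f^2 (estimated variance of the prediction error), X̄ (estimated mean of X) and s_x^2 (variance of X) are not defined in the cards themselves.

```latex
s_f^2 = s^2 \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n - 1)\, s_x^2} \right]
```

The last term shows that the variance grows as the chosen X moves further from the mean of the independent variable.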
Elements necessary to calculate test statistic for ANOVA
- the total number of observations (n)
- the total number of parameters to be estimated (in a one-independent-variable regression, this number is two: the intercept and the slope coefficient)
- the sum of squared errors or residuals (SSE, residual sum of squares)
- the regression sum of squares (RSS, total variation in Y explained in the regression equation)
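A minimal sketch of how these elements combine, assuming the usual F-statistic for a one-independent-variable regression: the regression sum of squares per slope coefficient divided by the sum of squared errors per residual degree of freedom. Note that RSS here follows the cards' usage (regression sum of squares, not residual sum of squares). The data are hypothetical.

```python
def anova_f_statistic(xs, ys):
    """Return (RSS, SSE, F) for the regression Y = b0 + b1 * X."""
    n = len(xs)
    num_params = 2  # intercept and slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]

    # Regression sum of squares: variation in Y explained by the regression.
    rss = sum((f - y_bar) ** 2 for f in fitted)
    # Sum of squared errors: variation left unexplained.
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))

    f_stat = (rss / (num_params - 1)) / (sse / (n - num_params))
    return rss, sse, f_stat

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
rss, sse, f_stat = anova_f_statistic(xs, ys)
print(f"RSS = {rss:.3f}, SSE = {sse:.3f}, F = {f_stat:.3f}")
```

The resulting statistic would be compared with a critical F value for 1 and n − 2 degrees of freedom.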