Chapter 17 Multiple Regression Flashcards
Response surface
Graphical depiction of a regression model equation
Can only be drawn when the number of independent variables is 2
Required conditions for error variable
1) probability distribution of e is normal
2) the mean of the distribution is 0
3) the standard deviation of e is sigma_e, which is constant (the same for every value of the independent variables)
4) the errors are independent
Multiple regression equation
Calculated (predicted) value of y = b0 + b1x1 + b2x2 + … + bkxk
Where k is the number of independent variables
Some independent variables may be functions of others
Must assess how well the model fits to decide whether the analysis is worth continuing or the model needs improving
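As a minimal sketch of how the equation is applied, the predicted value is the intercept plus each coefficient times its variable. The coefficients and the helper name `predict` below are made up for illustration and do not come from any data set.

```python
def predict(b, x):
    """Return b0 + b1*x1 + ... + bk*xk for one observation.

    b: [b0, b1, ..., bk] (intercept first); x: [x1, ..., xk].
    """
    b0, slopes = b[0], b[1:]
    if len(slopes) != len(x):
        raise ValueError("need exactly one coefficient per independent variable")
    return b0 + sum(bj * xj for bj, xj in zip(slopes, x))

# Hypothetical coefficients for k = 2 independent variables
b = [1.0, 2.0, 3.0]
y_hat = predict(b, [4.0, 3.0])  # 1 + 2*4 + 3*3
print(y_hat)  # 18.0
```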
Multiple regression step 1
Select the independent variables that you believe may be related to the dependent variable
Reasons not to include every available independent variable in a multiple regression model
- the objective is to determine whether a hypothesized model is valid (whether there is a relationship between the variables), so include only independent variables that may affect the dependent variable
- increasing the number of independent variables increases the number of coefficient tests, and so the probability of Type I errors (rejecting a true null hypothesis)
- multicollinearity (independent variables correlated with one another) makes it possible to conclude that none of the independent variables is linearly related to the dependent variable when 1 or more actually are
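One rough way to spot possible multicollinearity before fitting is to look at the correlation between pairs of independent variables. The sketch below uses two hypothetical variables, x1 and x2, with made-up values. A correlation close to +1 or -1 is only a warning sign, not a formal test.

```python
import math

def correlation(u, v):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

# Made-up values for two candidate independent variables
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
r = correlation(x1, x2)
print(round(r, 3))  # 0.829
```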
Multiple regression step 2: Excel calculations
Input data into the sheet so that the independent variables are in adjacent columns
- Data, Data Analysis, Regression
- specify Input Y Range (dependent variable) and Input X Range (independent variables)
The Coefficients column gives the intercept b0 and the b value for each x variable
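Outside Excel, the same kind of coefficients can be obtained by least squares. The sketch below is an illustration under stated assumptions, not a description of Excel's internal procedure. It solves the normal equations (X'X)b = X'y in plain Python on a small made-up data set, and `fit_least_squares` is a hypothetical helper name.

```python
def fit_least_squares(xs, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors.

    xs: list of rows, each holding the k independent-variable values.
    y:  list of dependent-variable values.
    Fails with ZeroDivisionError if the columns are perfectly collinear.
    """
    rows = [[1.0] + list(r) for r in xs]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Made-up data: two independent variables in adjacent "columns"
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [10, 8, 17, 18, 30, 28]
b = fit_least_squares(xs, y)
print([round(c, 6) for c in b])  # approximately [1.0, 2.0, 3.0]
```

The first value corresponds to the Intercept row of the Coefficients column and the rest to the x variables, in column order.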
Multiple regression step 3: assess the model
Three assessments
- standard error of estimate
- coefficient of determination
- F-test of the analysis of variance
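Standard error of estimate
- reported as Standard Error in the Regression Statistics section of the Excel output
s_e = square root of (SSE / (n - k - 1))
n = sample size
k = number of independent variables
SSE = sum of squares for error = sum of (y_i - calculated y_i)^2 over all observations
With only 1 independent variable, SSE = (n - 1) * (sample variance of the dependent variable - (sample covariance^2 / sample variance of the independent variable))
SSE itself appears in the Excel ANOVA table as the Residual SS; the value labelled Standard Error is s_e, not SSE
Judged against the values of the dependent variable (especially its mean): s_e that is small relative to the mean of y indicates a good fit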
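A minimal sketch of the calculation, assuming the calculated (fitted) values are already available. The observed and fitted values below are made up, and `standard_error_of_estimate` is a hypothetical helper name.

```python
import math

def standard_error_of_estimate(y, y_hat, k):
    """s_e = sqrt(SSE / (n - k - 1)), where SSE = sum of squared errors."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - k - 1))

# Made-up observed values and fitted values from a model with k = 2
y = [10, 8, 17, 18, 30, 28]
y_hat = [9, 8, 19, 18, 29, 28]
s_e = standard_error_of_estimate(y, y_hat, k=2)  # SSE = 6, n - k - 1 = 3
mean_y = sum(y) / len(y)

print(round(s_e, 3))           # 1.414
print(round(s_e / mean_y, 3))  # s_e relative to the mean of y
```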
Coefficient of determination
R^2 = 1 - (SSE / sum of (y_i - mean of y)^2)
Or in Excel regression output: the R Square value
Gives the proportion of total variation in the dependent variable explained by the independent variables (usually read as a percentage)
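The formula can be sketched as follows, again with made-up observed and fitted values; `r_squared` is a hypothetical helper name.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE / (total variation in y)."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / total

# Made-up observed and fitted values
y = [10, 8, 17, 18, 30, 28]
y_hat = [9, 8, 19, 18, 29, 28]
r2 = r_squared(y, y_hat)  # SSE = 6, total variation = 407.5
print(round(r2, 4))  # 0.9853
```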
Adjusted R square
Coefficient of determination adjusted for degrees of freedom
Takes into account sample size and the number of independent variables (if the number of independent variables is large relative to the sample size, the unadjusted R^2 may be unrealistically high)
Adjusted R^2 = 1 - ((SSE / (n - k - 1)) / (sum of (y_i - mean of y)^2 / (n - 1)))
Aka 1 - (MSE / sample variance of y)
MSE: mean square for error = SSE / (n - k - 1)
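A short sketch of the adjustment, using made-up values for SSE and the total variation in y. The helper name `adjusted_r_squared` is hypothetical. The last lines check the algebraically equivalent form 1 - (1 - R^2)(n - 1)/(n - k - 1).

```python
def adjusted_r_squared(sse, total, n, k):
    """1 - MSE / (sample variance of y)."""
    mse = sse / (n - k - 1)        # mean square for error
    var_y = total / (n - 1)        # sample variance of y
    return 1 - mse / var_y

# Made-up values: SSE = 6, total variation = 407.5, n = 6, k = 2
sse, total, n, k = 6.0, 407.5, 6, 2
adj = adjusted_r_squared(sse, total, n, k)
print(round(adj, 4))  # 0.9755

# Equivalent form written in terms of the unadjusted R^2
r2 = 1 - sse / total
adj_from_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

The adjusted value is always at or below the unadjusted R^2, and the gap widens as k grows relative to n.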
Testing validity of multiple regression model
H0: β1 = β2 = … = βk = 0 (none of the independent variables is linearly related to y, so the model is invalid)
H1: at least one coefficient βi does not equal 0 (at least one independent variable is related to y)
Total variation in y
Made up of SSR + SSE
SSR = sum of squares for regression = variation explained by the regression model
SSE = sum of squares for error = unexplained variation
If SSR is large relative to SSE, the model fits relatively well
Mean square
Sum of squares / degrees of freedom
F statistic
F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))
The ratio of two mean squares is F-distributed as long as the underlying population is normal
Shown in the ANOVA table of the Excel regression output
Large value of F = model is valid (most of the variation in y is explained by the regression equation)
Small value of F = most of the variation is unexplained
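The pieces of the ANOVA table can be sketched as below, assuming SSE and the total variation in y are already known. The numbers are made up and `f_statistic` is a hypothetical helper name. Judging whether F is large enough still requires a critical value or p-value from the F distribution with k and n - k - 1 degrees of freedom, which this sketch does not compute.

```python
def f_statistic(sse, total, n, k):
    """F = MSR / MSE, where SSR = total variation - SSE."""
    ssr = total - sse              # explained variation
    msr = ssr / k                  # mean square for regression
    mse = sse / (n - k - 1)        # mean square for error
    return msr / mse

# Made-up values: SSE = 6, total variation = 407.5, n = 6, k = 2
f = f_statistic(sse=6.0, total=407.5, n=6, k=2)
print(f)  # 100.375
```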