chapter 14: multiple regression and model building Flashcards
what are multiple regression models?
regression models that employ more than one independent variable
what can the multiple regression model formula look like?
y = B0 + B1x1 + B2x2 + E
why are there 2 xs?
because the mean level (Uy) now is B0 + B1x1 + B2x2
this means that there are two different independent variables that can correlate with or influence the dependent variable y
E still remains the error term that causes y to deviate from the mean level
what is the new name for the mean level:
Uy = B0 + B1x1 + B2x2
the plane of means
it is in a three-dimensional space
what is B0 in Uy = B0 + B1x1 + B2x2?
it is still the y intercept
what is B1 in Uy = B0 + B1x1 + B2x2?
the regression parameter for the variable x1
the slope of the plane of the x1 direction
what is B2 in Uy = B0 + B1x1 + B2x2?
the regression parameter for the variable x2
the slope of the plane of the x2 direction
what is the error term E in y = B0 + B1x1 + B2x2 + E?
the error term
it describes the effects on y other than x1 and x2
what is the formula for the point estimate or prediction of
y = B0 + B1x1 + B2x2 + E
what is this equation called?
y^ = b0 + b1x1 + b2x2
called the least squares plane, the estimate of the plane of means
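As a sketch of how the least squares plane can be computed, here is a minimal numpy example; the data values and variable names are hypothetical, made up purely for illustration:

```python
import numpy as np

# Hypothetical data: n = 6 observations of two independent variables and y.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([5.1, 6.9, 12.2, 13.8, 20.1, 21.0])

# Design matrix with a leading column of 1s for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# b = (b0, b1, b2), the least squares estimates of B0, B1, B2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Point predictions y^ = b0 + b1*x1 + b2*x2 (no error term).
y_hat = X @ b
```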
why is there no error term when we use
y^ = b0 + b1x1 + b2x2
to predict a point of
y = B0 + B1x1 + B2x2 + E
because the error term has a mean of 0 (a 50% chance of being positive and a 50% chance of being negative), its predicted value is 0
what is the residual?
the difference between the observed and predicted values
what is SSE
the unexplained variation
the sum of the squared residuals
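A minimal sketch of the residuals and SSE in Python, using hypothetical observed and predicted values:

```python
import numpy as np

# Hypothetical observed values y and predicted values y^.
y = np.array([5.0, 7.0, 12.0, 14.0])
y_hat = np.array([5.5, 6.5, 12.5, 13.5])

residuals = y - y_hat          # observed minus predicted
sse = np.sum(residuals ** 2)   # SSE: the sum of the squared residuals
```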
what is the multiple coefficient of determination?
the proportion of the total variation in the n observed values of the dependent variable that is explained by the overall regression model
R^2
R^2 = explained variation / total variation
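The ratio R^2 = explained variation / total variation can be sketched as follows, again with hypothetical values:

```python
import numpy as np

# Hypothetical observed values and model predictions.
y = np.array([5.0, 7.0, 12.0, 14.0])
y_hat = np.array([5.5, 6.5, 12.5, 13.5])

total = np.sum((y - y.mean()) ** 2)   # total variation
sse = np.sum((y - y_hat) ** 2)        # unexplained variation (SSE)
explained = total - sse               # explained variation
r_squared = explained / total         # multiple coefficient of determination
```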
what is the multiple correlation coefficient
R
R = (R^2)^(1/2)
what is the adjusted R^2
the adjusted multiple coefficient of determination used to avoid overestimating the importance of independent variables
adjusted R^2 =
(R^2 - (k / (n - 1))) * ((n - 1) / (n - (k + 1)))
n is the number of observations
k the number of independent variables in the model
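The adjusted R^2 formula above translates directly into a small helper function (the example values passed to it are hypothetical):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = (R^2 - k/(n - 1)) * ((n - 1)/(n - (k + 1))).

    n: number of observations, k: number of independent variables.
    """
    return (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))
```

Because the formula penalizes extra independent variables, the adjusted value is smaller than the raw R^2 (e.g. `adjusted_r_squared(0.9, 12, 2)` is below 0.9).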
what are the four assumptions of the error term values in the multiple regression model?
- at any given combination of x1, x2, …, xk, the population of potential error terms has a mean value of 0
- constant variance assumption
- normality assumption
- independence assumption
what is the error term constant variance assumption?
population of error term values has a variance that does not depend on the combination of values of x1, x2, …, xk
the different population of potential error terms corresponding to different combinations of values x1, x2, …, xk have equal variances
the constant variance is the population variance
what is the error term normality assumption?
at any given combination of x1, x2, …, xk, the population of potential error terms has a normal distribution
what is the error term independence assumption?
any one value of the error term E is statistically independent of any other value of E
an error term of a certain y has nothing to do with an error term of another y
what is the point estimate of the constant variance of the different populations of error terms?
formula too
the mean square error
s^2
s^2 = SSE / (n - (k + 1))
what is the point estimate of the standard deviation of the different populations of error terms?
formula too
the standard error
s
s = (SSE / (n - (k + 1)))^(1/2)
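The mean square error and standard error can be sketched with the stdlib alone; the SSE, n, and k values below are hypothetical:

```python
import math

# Hypothetical values: SSE from a fitted model, n observations, k variables.
sse = 1.0
n, k = 10, 2

s_squared = sse / (n - (k + 1))   # mean square error: point estimate of the constant variance
s = math.sqrt(s_squared)          # standard error: point estimate of the standard deviation
```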
in the mean square error and the standard error (the point estimate of the constant variance and the standard deviation of the different populations of error terms),
what is the meaning of the following
n - (k + 1)
degrees of freedom associated with SSE
is testing the significance of the relationship between y and x1, x2, …, xk a proper way of assessing the utility of the regression model?
yes
how do you test the significance of the relationship between y and x1, x2, …, xk?
with the F test
what is the null hypothesis (H0) of the significance of the relationship between y and x1, x2, …, xk?
H0: B1 = B2 = … = Bk = 0
none of the independent variables x1, x2, … xk are significantly related to y
the regression relationship is not significant
what is the alternative hypothesis (Ha) of the significance of the relationship between y and x1, x2, …, xk?
Ha: at least one of B1, B2 … Bk =/= 0
at least one of the independent variables x1, x2, … xk is significantly related to y
the regression relationship is significant
how do you calculate the F statistic?
F = ((explained variation) / k) / ((unexplained variation) / (n - (k + 1)))
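The overall F statistic is a direct translation of that ratio (the example inputs are hypothetical):

```python
def f_statistic(explained, unexplained, n, k):
    """Overall F = (explained variation / k) / (unexplained variation / (n - (k + 1)))."""
    return (explained / k) / (unexplained / (n - (k + 1)))
```

For example, with explained variation 52.0, unexplained variation 1.0, n = 10, and k = 2, the numerator is 52/2 = 26 and the denominator is 1/7, giving F = 182.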
how do the R^2 and adjusted R^2 differ?
R̅^2 differs from R^2 by taking into consideration the number of independent variables in the model
Using R̅^2 helps avoid overestimating the importance of the independent variables
why would we test the significance of a single independent variable?
to gain further information about which independent variables significantly affect y
when you test the significance of a single independent variable, how do you refer to it?
what else must you assume?
xj
you assume that in the model it is multiplied by the parameter Bj
what is the null hypothesis when you test xj?
Bj = 0
here, we say that xj is not significantly related to y
what is the alternate hypothesis when you test xj?
Bj =/= 0
here, we say that xj is significantly related to y in the regression model under consideration
what is sbj?
the standard error of the estimate bj
the point estimate of the population standard deviation of bj
what test do you use to test the significance of xj?
the t test
what is the formula of the t test to test the significance of xj?
t = bj / sbj
using the t test to test the significance of xj, when do we reject H0 in favor of Ha?
|t| > t(alpha/2), the critical value for a two-sided test
p value < significance level (alpha)
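The t test for a single xj can be sketched as follows; the estimate, standard error, and degrees of freedom are hypothetical, and 2.447 is the t table critical value t(alpha/2) for alpha = 0.05 with 6 degrees of freedom:

```python
def t_statistic(bj, sbj):
    """t = bj / sbj for testing H0: Bj = 0."""
    return bj / sbj

# Hypothetical estimate bj = 3.2 with standard error sbj = 1.1.
t = t_statistic(3.2, 1.1)

# Two-sided rejection rule: reject H0 when |t| exceeds t(alpha/2).
reject_h0 = abs(t) > 2.447
```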