Module 5 (Multiple Regression) Flashcards
A sporting goods store manager wants to forecast annual sneaker revenues based on the type of sport (running, tennis, or walking), color (red, blue, white, black, or violet) and its target audience (men or women).
How many independent variables should the manager include in her multiple regression analysis? Please enter your answer as an integer, that is, with no decimal point.
7
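The count follows from coding each categorical variable with k levels as k - 1 dummy variables, with the omitted level acting as the base case. A minimal sketch of that arithmetic; the dictionary and variable names are chosen here only for illustration:

```python
# Levels of each categorical variable, taken from the question.
levels = {
    "sport": ["running", "tennis", "walking"],
    "color": ["red", "blue", "white", "black", "violet"],
    "audience": ["men", "women"],
}

# A categorical variable with k levels needs k - 1 dummy variables;
# the level left out serves as the base case.
dummies_per_variable = {name: len(values) - 1 for name, values in levels.items()}
total_dummies = sum(dummies_per_variable.values())

print(dummies_per_variable)  # {'sport': 2, 'color': 4, 'audience': 1}
print(total_dummies)         # 7
```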
An airport shuttle company forecasts the number of hours its drivers will work based on the distance to be driven (in miles) and the number of jobs (each job requires the pickup and drop-off of one set of passengers) using the following regression equation:
Travel time = -0.60 + 0.05(distance) + 0.75(number of jobs)
On a given day, Victor and Sofia drive approximately the same distance but Sofia has two more jobs than Victor.
If Victor worked for 4 hours, for how long can the company expect Sofia to work? Please enter your answer rounded to one digit to the right of the decimal point. For example, if you think Sofia would work 236.7134 hours, enter 236.7.
5.5
The only difference between the workloads of the two drivers is the number of jobs each has; Sofia has two additional jobs. Therefore the company can expect Sofia to work the four hours Victor worked, plus an additional 0.75 hours for each of the two additional jobs, that is, 4+0.75(2)=5.5 hours.
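The same reasoning can be checked numerically. In the sketch below, the regression coefficients come from the flashcard, while the distance of 62 miles and Victor's 2 jobs are hypothetical values picked only so that Victor's predicted time comes out to 4 hours; the 1.5-hour gap does not depend on them.

```python
def travel_time(distance, jobs):
    """Predicted hours worked, using the regression equation from the card."""
    return -0.60 + 0.05 * distance + 0.75 * jobs

# Hypothetical inputs (not given in the card), chosen so Victor works 4 hours.
distance = 62      # miles, same for both drivers
victor_jobs = 2
sofia_jobs = victor_jobs + 2  # Sofia has two more jobs

victor_hours = travel_time(distance, victor_jobs)
sofia_hours = travel_time(distance, sofia_jobs)

print(round(victor_hours, 1))                # 4.0
print(round(sofia_hours, 1))                 # 5.5
print(round(sofia_hours - victor_hours, 1))  # 1.5
```

Because distance is held constant, it cancels out of the difference, leaving only 0.75 hours per additional job.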
True or false: There appears to be heteroskedasticity in this plot.
True. There appears to be heteroskedasticity because the residual plot for “Pieces of Mail Sent the Preceding Week” exhibits a funnel shape.
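A funnel shape means the spread of the residuals changes as the independent variable increases. As a rough illustration only, the sketch below uses made-up residuals (not the data behind the plot on this card) and compares the spread in the lower and upper halves of the x range.

```python
from statistics import pstdev

# Hypothetical residuals, ordered by increasing value of the independent
# variable. They are invented for illustration and fan out as x grows.
residuals = [0.1, -0.2, 0.3, -0.5, 0.6, -1.0, 1.4, -1.9, 2.5, -3.0]

half = len(residuals) // 2
low_spread = pstdev(residuals[:half])   # spread at small x values
high_spread = pstdev(residuals[half:])  # spread at large x values

# A ratio well above 1 is consistent with a funnel-shaped residual plot.
spread_ratio = high_spread / low_spread
print(spread_ratio > 1)  # True
```

This is only an informal check; the card itself relies on visual inspection of the residual plot.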
Which of the following statements about multicollinearity is TRUE? SELECT ALL THAT APPLY.
A - Multicollinearity is usually not an issue when the regression model is only being used to understand net relationships
B - Multicollinearity occurs when two or more independent variables are highly correlated
C - Multicollinearity is usually not an issue when the regression model is only being used for forecasting
D - Multicollinearity can typically be reduced by decreasing the sample size
E - Multicollinearity can typically be reduced by adding more independent variables
B - Multicollinearity occurs when two or more independent variables are highly correlated
Multicollinearity means that two or more of the independent variables are collinear, meaning they are highly correlated. One or more of the independent variables may not be significant because the variable with which it is correlated serves as a proxy variable.
C - Multicollinearity is usually not an issue when the regression model is only being used for forecasting
Multicollinearity is typically not a problem when the model is being used for forecasting, especially if the predictive power of the model is increased by the additional variable(s).
A - On weekdays, the average number of chairs produced per shift is 70.97 greater than on weekends.
B - During the morning shift, the average number of chairs produced is 47.85 greater than during the evening shift.
C - On weekends, the average number of chairs produced per shift is 70.97 less than on weekdays, provided the shift time remains constant.
D - During the morning shift, the average number of chairs produced is 47.85 less than during the evening shift, provided that whether or not it is a weekday remains constant.
C - On weekends, the average number of chairs produced per shift is 70.97 less than on weekdays, provided the shift time remains constant.
70.97 is the net effect of the day being a weekday on the number of chairs produced per shift. On weekdays the Weekday variable is set to “1” in the regression equation. On weekends the Weekday variable is set to “0”, so its term drops out of the equation. Therefore, we can conclude that on weekdays, the average number of chairs produced per shift is 70.97 greater than on weekends, provided that the shift time remains constant. Equivalently, we can conclude that on weekends, the average number of chairs produced per shift is 70.97 less than on weekdays, provided that the shift time remains constant.
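The dummy-variable logic can be sketched in a few lines. Only the Weekday coefficient of 70.97 comes from the card; the full regression equation is not shown there, so the value standing in for the intercept plus the shift-time term is a hypothetical placeholder.

```python
WEEKDAY_COEF = 70.97  # coefficient on the Weekday dummy, from the card

def predicted_chairs(weekday, rest_of_model):
    """Predicted chairs per shift.

    weekday: 1 on weekdays, 0 on weekends.
    rest_of_model: hypothetical stand-in for the intercept plus the
    shift-time term, which is held constant in the comparison.
    """
    return rest_of_model + WEEKDAY_COEF * weekday

rest = 100.0  # hypothetical value; any constant gives the same difference

weekday_output = predicted_chairs(1, rest)
weekend_output = predicted_chairs(0, rest)
difference = weekday_output - weekend_output

print(round(difference, 2))  # 70.97
```

Whatever the held-constant part of the model is, it cancels in the subtraction, which is why the coefficient is read as a net effect.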
Which of the following is TRUE about the difference between analyzing residual plots for single variable regression models and analyzing residual plots for multiple regression models?
A - Multiple regression residual plots give insight into multicollinearity across the independent variables, whereas multicollinearity cannot occur in a single variable regression model or its residual plot.
B - Single variable regression plots give insight into the gross relationship between the independent and dependent variable, whereas multiple regression plots give insight into the net relationship, controlling for the other independent variables included in the regression model.
C - Multiple regression plots give insight into both heteroskedasticity and non-linearity, whereas single regression plots only give insight into heteroskedasticity.
D - In single regression plots, residuals are measured as the shortest distance from the regression line, whereas in multiple regression residuals are measured along the vertical axis.
B - Single variable regression plots give insight into the gross relationship between the independent and dependent variable, whereas multiple regression plots give insight into the net relationship, controlling for the other independent variables included in the regression model.
In multiple regression, we see the effect of each independent variable, controlling for all the other variables in the model, or the net effect. This is reflected in the residual plots.
True or false: Multicollinearity can be reduced by increasing the sample size.
True
True or false: Multicollinearity can be reduced by removing one or more of the collinear variables.
True
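One informal way to spot collinear variables before deciding which to remove is to look at the correlation between them. The sketch below uses two made-up predictors (x1 and x2 are hypothetical, not data from this module) and computes their correlation along with the variance inflation factor, which for exactly two predictors equals 1 / (1 - r^2).

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical predictors that move almost in lockstep.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.2, 7.9]

r = pearson_r(x1, x2)
vif = 1 / (1 - r ** 2)  # valid as written only for the two-predictor case

print(r > 0.9)   # True: the predictors are highly correlated
print(vif > 10)  # True: a commonly used rule-of-thumb warning level
```

Dropping either x1 or x2 from such a model leaves a single predictor, so the collinearity between the pair disappears.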