Module 5 – Multiple Regression Flashcards

Question

Dummy or Quantitative Variable? GENDER

Answer 1

DUMMY

Footnote: Time to run a marathon, height, size of flat-screen television, hours spent studying CORe, and calories in desserts are quantitative variables. Shoe color, number on an athlete’s jersey, gender, and ice cream flavor are categorical (qualitative) variables and need to be transformed into dummy variables. Note that although athletes’ jerseys have numbers, those values cannot be interpreted as real numbers. For example, Eli Manning’s number is 10, whereas Peyton Manning’s was 18, but you cannot interpret that to mean that Peyton is 80% more than Eli in some way.
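As a rough illustration of how a categorical variable is turned into dummy variables, the sketch below encodes a column with k categories as k-1 columns of 0s and 1s. The function name `dummy_encode`, the choice of baseline, and the sample flavors are assumptions made for this example, not part of the course material.

```python
def dummy_encode(values, baseline):
    """Encode a categorical column as k-1 dummy (0/1) columns.

    The baseline category gets no column of its own: a row in the
    baseline category has 0 in every dummy column.
    """
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError(f"baseline {baseline!r} not found in data")
    dummy_names = [c for c in categories if c != baseline]
    return {
        name: [1 if v == name else 0 for v in values]
        for name in dummy_names
    }


# Hypothetical sample data (three flavors, so two dummy columns).
flavors = ["vanilla", "chocolate", "mint", "vanilla"]
encoded = dummy_encode(flavors, baseline="vanilla")
print(encoded)
```

A two-level variable such as the one on this card would produce a single 0/1 column under the same scheme.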

Answer 2

DUMMY

Answer 3

QUANTITATIVE

Answer 4

QUANTITATIVE

Answer 5

QUANTITATIVE

Answer 6

QUANTITATIVE

Answer 7

QUANTITATIVE

Answer 8

7

Sales revenue is the dependent variable. Type of sport, color, and target audience are categorical variables, which must be represented using dummy variables. Recall that each category needs one fewer dummy variable than it has options. Thus, type of sport is represented by 3-1=2 dummy variables, color by 5-1=4 dummy variables, and target audience by 2-1=1 dummy variable, for a total of 2+4+1=7 independent variables.
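The count above can be checked with a few lines of code. This is only a sketch; the dictionary keys and the helper name `dummy_count` are chosen for illustration, while the option counts (3, 5, and 2) come from the answer.

```python
def dummy_count(levels_per_category):
    """Each category with k options needs k-1 dummy variables."""
    return sum(levels - 1 for levels in levels_per_category.values())


# Number of options per categorical variable, as stated in the answer.
categories = {"type of sport": 3, "color": 5, "target audience": 2}
total = dummy_count(categories)
print(total)  # 7
```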

Answer 9

5.5

The only difference between the workloads of the two drivers is the number of jobs each has; Sofia has two additional jobs. Therefore, the company can expect Sofia to work the four hours Victor worked plus an additional 0.75 hours for each of the two additional jobs, that is, 4+0.75(2)=5.5 hours.
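The same arithmetic, written out as a short sketch. The variable names are illustrative; the numbers are the ones given in the answer.

```python
victor_hours = 4.0    # hours Victor worked
hours_per_job = 0.75  # net effect of one additional job (regression coefficient)
extra_jobs = 2        # Sofia has two more jobs than Victor

# Everything else about the two workloads is the same, so only the
# job coefficient contributes to the difference.
sofia_hours = victor_hours + hours_per_job * extra_jobs
print(sofia_hours)  # 5.5
```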

Answer 10

Intercept
The intercept cannot be removed from the regression model unless there is good reason to believe that it is zero.

**Number of Male Visitors**
**The p-value of “Number of Male Visitors”, 0.2016, is greater than 0.1, so the organizer should consider removing this variable from the regression model.**

Number of Female Visitors
The p-value of “Number of Female Visitors”, 0.0018, is less than 0.1.

Number of Retail Stands
The p-value of “Number of Retail Stands”, 0.0000, is less than 0.1.

Number of Food Stands
The p-value of “Number of Food Stands”, 0.0390, is less than 0.1.

**Number of Performances**
**The p-value of “Number of Performances”, 0.5412, is greater than 0.1, so the organizer should consider removing this variable from the regression model.**
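The screening rule used here (flag any independent variable whose p-value exceeds 0.1) can be sketched as a simple filter. The p-values are those quoted in the answer; the dictionary layout and variable names are assumptions for the example.

```python
# p-values of the independent variables, as quoted in the answer.
# The intercept is left out because it is not a candidate for removal.
p_values = {
    "Number of Male Visitors": 0.2016,
    "Number of Female Visitors": 0.0018,
    "Number of Retail Stands": 0.0000,
    "Number of Food Stands": 0.0390,
    "Number of Performances": 0.5412,
}
threshold = 0.1

# Variables the organizer should consider removing.
candidates = [name for name, p in p_values.items() if p > threshold]
print(candidates)
# ['Number of Male Visitors', 'Number of Performances']
```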

Answer 11

From the Data menu, select Data Analysis, then select Regression. The Input Y Range is F1:F40 and the Input X Range is B1:D40. You must check the Labels box to ensure that the regression output table is appropriately labeled. You must also check the Residuals and Residual Plots boxes so that you are able to analyze the residuals.

Answer 12

0.9637
0.9637 is the Multiple R value.

0.9287
0.9287 is the R² value.

**0.9225**
**It is important to use the Adjusted R² to compare two regression models that have a different number of independent variables. 0.9225 is the Adjusted R² of the new model.**

0.0025
0.0025 is the p-value of the independent variable “Number of Food Stands”.
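For reference, the standard Adjusted R² formula penalizes R² for each additional independent variable, which is why it is the fairer basis for comparing models of different sizes. The sketch below implements that textbook formula; the function name and the sample inputs are hypothetical and are not the values from this exercise.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_observations - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_observations - 1) / dof


# Hypothetical inputs: R^2 = 0.5 from 11 observations and 2 predictors.
print(adjusted_r_squared(0.5, n_observations=11, n_predictors=2))  # 0.375
```

With the same R², a model with more predictors gets a lower Adjusted R², so an extra variable has to improve the fit enough to pay for itself.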

Answer 13

The expected daily revenue is B15+(1500\*B16)+(10\*B17)+(15\*B18)=$49,485. You must link directly to values in order to obtain the correct answer.
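The spreadsheet formula is the regression equation evaluated at specific inputs: the intercept plus each coefficient times its variable's value. A minimal sketch follows. The coefficients are made-up round numbers (the answer does not list the full set), and the mapping of 1500, 10, and 15 to particular variables is an assumption, so the result will not match the $49,485 in the answer.

```python
def predict(intercept, coefficients, values):
    """Evaluate a fitted regression equation at the given inputs."""
    return intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )


# Hypothetical coefficients, for illustration only.
intercept = 1000.0
coefficients = {"female_visitors": 2.0, "retail_stands": 100.0, "food_stands": 50.0}

# Input values from the formula; which variable each belongs to is assumed.
values = {"female_visitors": 1500, "retail_stands": 10, "food_stands": 15}

print(predict(intercept, coefficients, values))  # 5750.0
```

Referencing the stored coefficients rather than retyping rounded values is the code analogue of linking directly to the cells.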

Answer 14

As the number of retail stands increases by one, the daily revenue increases by $2,298.36 on average.
This option describes a gross relationship, whereas the multiple regression model describes a net relationship.

**As the number of retail stands increases by one, the daily revenue increases by $2,298.36 on average, provided that the numbers of female visitors and food stands remain constant.**
**2,298.36 is the net effect of the number of retail stands on daily revenue. Thus, it describes the average increase in daily revenue as the number of retail stands increases by one while the other variables’ values remain constant.**

As the number of female visitors increases by ten, the daily revenue increases by $29.34 on average.
This option describes a gross relationship, whereas the multiple regression model describes a net relationship.

As the number of female visitors increases by ten, the daily revenue increases by $29.34 on average, provided that the numbers of retail stands and food stands remain constant.
The net effect on daily revenue of increasing the number of female visitors by ten would be $293.40, not $29.34.

Answer 15

Intercept
The intercept is not an independent variable.

**Variable A**
**The 95% confidence interval for the variable’s coefficient does not contain 0, which indicates that Variable A is significant at the 95% confidence level. The p-value (not shown) of Variable A is 0.0001. Since it is less than 1-0.95=0.05, its value confirms that the variable is significant at the 95% confidence level.**

Variable B
The 95% confidence interval for the variable’s coefficient contains 0, which indicates that Variable B is not significant at the 95% confidence level. The p-value (not shown) of Variable B is 0.0735. Since it is greater than 1-0.95=0.05, its value confirms that the variable is not significant at the 95% confidence level.

Variable C
The 95% confidence interval for the variable’s coefficient contains 0, which indicates that Variable C is not significant at the 95% confidence level. The p-value (not shown) of Variable C is 0.3357. Since it is greater than 1-0.95=0.05, its value confirms that the variable is not significant at the 95% confidence level.

**Variable D**
**The 95% confidence interval for the variable’s coefficient does not contain 0, which indicates that Variable D is significant at the 95% confidence level. The p-value (not shown) of Variable D is 0.0028. Since it is less than 1-0.95=0.05, its value confirms that the variable is significant at the 95% confidence level.**

None of the variables are significant at the 95% confidence level
Variables A and D are significant at the 95% confidence level.

Answer 16

To create the lagged variable, copy the data from C2:C32 and paste it into D3:D33. For example, 93,460 should be in cell D3; 85,220 should be in cell D4, and so on. Cell D2 should be blank and there is a new row of data that has an entry only in cell D33.
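Outside a spreadsheet, the same lagged variable can be built by shifting the series down one position. The sketch below is one possible way to do it: the helper name `lag` is hypothetical, the first two values are the ones quoted in the answer, and the remaining values are invented for illustration.

```python
def lag(values, periods=1):
    """Shift a series down by `periods` rows, padding the top with None.

    The result has the same length as the input, so the value that would
    spill into an extra row (cell D33 in the spreadsheet) is dropped.
    """
    if periods < 0:
        raise ValueError("periods must be non-negative")
    return [None] * periods + list(values[: len(values) - periods])


# First two values come from the answer; the last two are hypothetical.
mail_sent = [93460, 85220, 90100, 88750]
mail_sent_prev_week = lag(mail_sent)
print(mail_sent_prev_week)  # [None, 93460, 85220, 90100]

# Only rows with a value in both columns can be used in the regression.
usable_rows = [
    (current, previous)
    for current, previous in zip(mail_sent, mail_sent_prev_week)
    if previous is not None
]
print(len(usable_rows))  # 3
```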

Answer 17

From the Data menu, select Data Analysis, then select Regression. The Input Y Range is B1:B31 and the Input X Range is C1:D31. You must check the Labels box to ensure that the regression output table is appropriately labeled. You must also check the Residuals and Residual Plots boxes so that you are able to analyze the residuals.

Answer 18

**Both variables are significant at the 5% significance level.**
**“Pieces of Mail Sent the Preceding Week” is not significant because its p-value, 0.5446, is greater than 0.05.**

There appears to be heteroskedasticity.
There appears to be heteroskedasticity because the residual plot for “Pieces of Mail Sent the Preceding Week” exhibits a funnel shape.

**On average, the net effect of both independent variables is negative.**
**The coefficients for the independent variables are both positive, so the net effect of these variables is not negative.**

On average, the net effect of both independent variables is positive.
The coefficients for the independent variables are both positive, so the net effect of these variables is positive.

Answer 19

**Multicollinearity occurs when two or more independent variables are highly correlated**
Multicollinearity means that two or more of the independent variables are collinear, meaning they are highly correlated. One or more of the independent variables may not be significant because the variable with which it is correlated serves as a proxy variable.

**Multicollinearity is usually not an issue when the regression model is only being used for forecasting**
Multicollinearity is typically not a problem when the model is being used for forecasting, especially if the predictive power of the model is increased by the additional variable(s).

Multicollinearity is usually not an issue when the regression model is only being used to understand net relationships
Multicollinearity affects the estimates of the coefficients, thereby distorting the net relationships.

Multicollinearity can typically be reduced by decreasing the sample size
Multicollinearity can be reduced by increasing the sample size.

Multicollinearity can typically be reduced by adding more independent variables
Multicollinearity can be reduced by removing one or more of the collinear variables.
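One simple way to look for multicollinearity is to compute the correlation between pairs of independent variables. The sketch below computes a Pearson correlation by hand. The function name and the data are hypothetical, and how high a correlation counts as "highly correlated" is a judgment call that the flashcard does not specify.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical independent variables that move almost in lockstep.
variable_1 = [100, 200, 300, 400]
variable_2 = [110, 190, 310, 390]
r = pearson_correlation(variable_1, variable_2)
print(round(r, 3))
```

A correlation close to 1 or -1 between two independent variables suggests that one may be acting as a proxy for the other.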

Answer 20

From the Data menu, select Data Analysis, then select Regression. The Input Y Range is A1:A62 and the Input X Range is B1:C62. You must check the Labels box to ensure that the regression output table is appropriately labeled. You must also check the Residuals and Residual Plots boxes so that you are able to analyze the residuals.

Answer 21

On weekdays, the average number of chairs produced per shift is 70.97 greater than on weekends.
This option describes a gross effect, whereas the multiple regression model describes a net relationship.

**On weekends, the average number of chairs produced per shift is 70.97 less than on weekdays, provided the shift time remains constant.**
70.97 is the net effect of the day being a weekday on the number of chairs produced per shift. On weekdays the Weekday variable is set to “1” in the regression. On weekends the Weekday variable is set to “0” in the regression equation, so we essentially exclude this variable. Therefore, we can conclude that on weekdays, the average number of chairs produced is 70.97 greater than on weekends, provided that the shift time remains constant. Equivalently, we can conclude that on weekends, the average number of chairs produced is 70.97 less than on weekdays, provided that the shift time remains constant.

During the morning shift, the average number of chairs produced is 47.85 greater than during the evening shift.
This option describes a gross effect, but the multiple regression model describes a net relationship.

During the morning shift, the average number of chairs produced is 47.85 less than during the evening shift, provided that whether or not it is a weekday remains constant.
47.85 is the net effect of a morning shift on the number of chairs produced, controlling for whether or not it is a weekday. Thus, when the morning shift occurs, the average number of chairs produced is 47.85 greater than when the evening shift occurs, provided that whether or not it is a weekday remains constant.
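The "provided the shift time remains constant" condition can be seen by evaluating the regression equation with the Weekday dummy switched on and off while the shift dummy is held fixed. In the sketch below the two coefficients are the ones quoted in the answer; the intercept is a made-up placeholder (the answer does not state it) and it cancels out of the difference anyway.

```python
INTERCEPT = 400.0     # hypothetical placeholder, not from the source
WEEKDAY_COEF = 70.97  # net effect of a weekday (dummy = 1) vs. a weekend (0)
MORNING_COEF = 47.85  # net effect of a morning shift (dummy = 1) vs. evening (0)


def expected_chairs(weekday, morning):
    """Regression equation with two 0/1 dummy variables."""
    return INTERCEPT + WEEKDAY_COEF * weekday + MORNING_COEF * morning


# Hold the shift fixed and flip only the Weekday dummy.
gaps = []
for morning in (0, 1):
    gap = expected_chairs(1, morning) - expected_chairs(0, morning)
    gaps.append(round(gap, 2))
print(gaps)  # [70.97, 70.97]
```

The weekday gap is the same for both shifts, which is what it means for 70.97 to be a net effect.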

Answer 22

The expected number of chairs that will be produced is B15+(1\*B16)+(0\*B17)=B15+B16=477. You must link directly to values in order to obtain the correct answer.

Answer 23

The expected number of chairs that will be produced is B15+(0\*B16)+(1\*B17)=B15+B17=453. You must link directly to values in order to obtain the correct answer.

Answer 24

That we cannot reject the null hypothesis
The significance level of the F-value provides information on whether we can reject the null hypothesis. The R-square value indicates what percentage of the variability in the dependent variable is explained by the regression line.

That there is multicollinearity between the independent variables
Multicollinearity involves correlation among individual independent variables; R-square provides information about the relationship between the regression line and the dependent variable.

That on average, 0.7059 more chairs are produced during weekday shifts than during weekend shifts.
The coefficient for the Weekday variable, 70.97, tells us how many more chairs are produced on weekday shifts than on weekend shifts.

**That 71% of the variability in the number of chairs produced can be explained by whether the shift is in the morning or evening and whether it is a weekday shift or weekend shift.**
R-square indicates what percentage of the variability in the dependent variable is explained by the regression line.

Answer 25

-14.52, -3.25
The p-value, 0.07, is greater than 0.05, so the independent variable is not significant at the 5% significance level. Therefore, the 95% confidence interval for the coefficient of the independent variable must include zero. The interval between -14.52 and -3.25 does not contain zero.

**-14.52, 3.25**
**The p-value, 0.07, is greater than 0.05, so the independent variable is not significant at the 5% significance level. Therefore, the 95% confidence interval for the coefficient of the independent variable must include zero. The interval between -14.52 and 3.25 contains zero.**

3.25, 14.52
The p-value, 0.07, is greater than 0.05, so the independent variable is not significant at the 5% significance level. Therefore, the 95% confidence interval for the coefficient of the independent variable must include zero. The interval between 3.25 and 14.52 does not contain zero.

The answer cannot be determined without further information
The p-value, 0.07, is greater than 0.05, so the independent variable is not significant at the 5% significance level. Therefore, we know that the 95% confidence interval for the coefficient of the independent variable must include zero.
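The reasoning reduces to one test: a p-value above 0.05 means the 95% confidence interval must contain zero. A small sketch of that check over the three candidate intervals follows; the helper name `contains_zero` is made up for the example.

```python
def contains_zero(lower, upper):
    """True if the interval [lower, upper] includes zero."""
    return lower <= 0 <= upper


# The three candidate intervals from the answer options.
intervals = [(-14.52, -3.25), (-14.52, 3.25), (3.25, 14.52)]

# With p = 0.07 > 0.05, only an interval containing zero is consistent.
consistent = [ci for ci in intervals if contains_zero(*ci)]
print(consistent)  # [(-14.52, 3.25)]
```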

Answer 26

Multiple regression residual plots give insight into multicollinearity across the independent variables, whereas multicollinearity cannot occur in a single variable regression model or its residual plot.
Residual plots cannot provide information about multicollinearity.

**Single variable regression plots give insight into the gross relationship between the independent and dependent variable, whereas multiple regression plots give insight into the net relationship, controlling for the other independent variables included in the regression model.**
In multiple regression, we see the effect of each independent variable, controlling for all the other variables in the model, or the net effect. This is reflected in the residual plots.

Multiple regression plots give insight into both heteroskedasticity and non-linearity, whereas single regression plots only give insight into heteroskedasticity.
All residual plots give insight into both heteroskedasticity and non-linearity.

In single regression plots, residuals are measured as the shortest distance from the regression line, whereas in multiple regression residuals are measured along the vertical axis.
Residuals are measured the same way in both single and multiple regression; they are measured as distance from the regression line along the vertical axis.

(52 cards)