Module 4 (Single Variable Linear Regression) Flashcards
The owner of an ice cream shop wants to determine whether there is a relationship between ice cream sales and temperature. The owner collects data on temperature and sales for a random sample of 30 days and runs a regression to determine if there is a relationship between temperature (in degrees) and ice cream sales. The p-value for the two-sided hypothesis test is 0.04.
How would you interpret the p-value?
A - If there is no relationship between temperature and sales, the chance of selecting a sample this extreme would be 4%.
B - If there is a relationship between temperature and sales, the chance of seeing a regression coefficient this large would be 4%.
C - There is a 4% chance that there is a relationship between temperature and sales.
D - There is a 4% chance that there is no relationship between temperature and sales.
A - If there is no relationship between temperature and sales, the chance of selecting a sample this extreme would be 4%.
Correct. The null hypothesis is that there is no relationship. The p-value indicates how likely we would be to select a sample this extreme if the null hypothesis is true.
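The conditional reading of the p-value ("if there is no relationship...") can be made concrete with a permutation test. This is a resampling technique, not the t-test that regression output reports, but it builds the null hypothesis directly: shuffling sales across days destroys any link to temperature, and the p-value is the share of shuffled samples whose slope is at least as extreme as the observed one. This is a minimal sketch. The temperatures and sales are hypothetical values generated only for illustration, and the function names are our own.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffled samples (no relationship by
    construction) whose |slope| is at least as large as the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # breaks any link between x and y
        if abs(slope(x, y_shuffled)) >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical data for 30 days (not from the course).
data_rng = random.Random(42)
temps = [data_rng.uniform(60, 95) for _ in range(30)]
sales = [200 + 3 * t + data_rng.gauss(0, 40) for t in temps]
print(permutation_p_value(temps, sales))
```

A small result means that, if the null hypothesis were true, samples this extreme would rarely occur. It does not give the probability that the null hypothesis itself is true.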
Based on the residual plot, do you think that this regression model is a good fit?
A - Yes
B - No
B - No
The linear model does not appear to be a good fit because the residuals are not randomly distributed. The residuals form a funnel shape, which indicates that they are heteroskedastic.
How much variation in production volume can be explained by the number of factory workers?
A - 55.21%
B - 57.56%
C - 75.87%
D - 0.01%
B - 57.56%
The percent of variation in production volume that can be explained by the number of factory workers is represented by the R² value, which is 57.56%.
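R² is one minus the ratio of unexplained (residual) variation to total variation. The sketch below computes it from a list of actual values and a list of predictions. The numbers are made up for illustration and are not the factory data behind the 57.56% figure.

```python
def r_squared(actual, predicted):
    """Share of the total variation in `actual` explained by the model."""
    mean_actual = sum(actual) / len(actual)
    # Unexplained variation: squared residuals around the predictions.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # Total variation: squared deviations around the mean.
    sst = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - sse / sst

# Hypothetical values (not from the course output).
actual = [10, 12, 15, 19]
predicted = [10.5, 12, 14.5, 19]
print(f"{r_squared(actual, predicted):.2%}")  # 98.91%
```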
What is the expected change in production volume, on average, as the number of factory workers decreases by five?
-8194.9
Since the slope represents the average change in production volume as the number of factory workers increases by one, the average change in production volume as the number of factory workers decreases by five is 1,638.98 × (-5) = -8,194.9 units.
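The same arithmetic as a quick check. The slope 1,638.98 comes from the explanation above, and the variable names are illustrative.

```python
slope_per_worker = 1638.98   # average change in units per additional worker
change_in_workers = -5       # five fewer workers
expected_change = slope_per_worker * change_in_workers
print(f"{expected_change:,.1f}")  # -8,194.9
```

Because the model is linear, the effect of any change in workers is the slope multiplied by that change, whatever the starting number of workers.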
Based on the regression model, the expected daily production volume with 112 factory workers is 118,846 units. The human resources department noted that 123,415 units were produced on the most recent day on which there were 112 factory workers. What is the residual of this data point?
A - -4,569 units
B - -2,163 units
C - -41 units
D - 41 units
E - 2,163 units
F - 4,569 units
F - 4,569 units
The residual is equal to the observed value minus the regression’s predicted value (ε = y - ŷ). On that day, 112 factory workers actually produced 123,415 units, whereas the regression model predicts that 112 workers would produce 118,846 units. The residual is the difference between these two values: 123,415 units - 118,846 units = 4,569 units.
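A one-line check of ε = y - ŷ using the two figures from the question (variable names are illustrative):

```python
observed_units = 123_415   # actual production on the day with 112 workers
predicted_units = 118_846  # regression prediction for 112 workers
residual = observed_units - predicted_units  # epsilon = y - y_hat
print(residual)  # 4569
```

The residual is positive because the observed value lies above the regression line. Reversing the order of subtraction gives the distractor -4,569.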
If the expected production volume when there are 120 workers is approximately 131,958 units, which of the following equations would provide a reasonable estimate of the 68% prediction interval for the output of those 120 workers?
A - 120±14,994.93
B - 131,958±14,994.93
C - 131,958±29,989.56
D - 120±331.69
E - 131,958±331.69
F - 131,958±663.38
B - 131,958±14,994.93
A reasonable estimate of the prediction interval is the point forecast (131,958) plus or minus the z-value times the standard error of the regression (14,994.93). As usual, the z-value is based on the desired level of confidence. Since we want a 68% prediction interval, the z-value is approximately 1 (roughly 68% of a normal distribution lies within one standard deviation of the mean). Therefore, 131,958±14,994.93 is the best option.
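A sketch of the rule-of-thumb interval above, using Python's standard-library `statistics.NormalDist` to show how close the exact 68% z-value is to 1. The forecast and standard error come from the question. This approximation ignores the extra uncertainty in the estimated line itself, so a full prediction interval would be slightly wider, especially far from the mean number of workers.

```python
from statistics import NormalDist

forecast = 131_958         # point forecast for 120 workers
se_regression = 14_994.93  # standard error of the regression

# Exact z for a central 68% interval; the explanation rounds this to 1.
z_exact = NormalDist().inv_cdf(0.5 + 0.68 / 2)
z = 1

lower = forecast - z * se_regression
upper = forecast + z * se_regression
print(f"z_exact = {z_exact:.3f}")
print(f"68% prediction interval: {lower:,.2f} to {upper:,.2f}")
```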
A - -6.01
B - -2.45
C - 1.78
D - 2.45
E - The answer cannot be determined without further information
B - -2.45
Since the p-value, 0.3956, is greater than 0.05, the linear relationship is not significant at the 95% confidence level. Therefore, the 95% confidence interval of the slope must contain zero. The confidence interval is centered on the slope of 1.78, so the lower and upper bounds must be equally distant from the slope. The Upper 95% minus the slope is 6.01 - 1.78 = 4.23, so the Lower 95% is 1.78 - 4.23 = -2.45.
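The interval is symmetric because it is built as the estimate plus or minus a fixed margin. That lets us mirror the upper bound around the slope to recover the lower bound. This minimal sketch uses only the two numbers from the explanation; the variable names are illustrative.

```python
slope = 1.78     # coefficient estimate (center of the interval)
upper_95 = 6.01  # Upper 95% bound from the output

half_width = upper_95 - slope   # distance from center to either bound
lower_95 = slope - half_width   # mirror the upper bound around the slope
print(f"Lower 95% = {lower_95:.2f}")  # Lower 95% = -2.45

# Consistency check: a non-significant slope (p = 0.3956) should give an
# interval that contains zero.
contains_zero = lower_95 <= 0 <= upper_95
print(contains_zero)  # True
```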
The sports bar owner runs a regression to test whether there is a relationship between Red Sox away games and daily revenue. Which of the following statements about the regression output is true? SELECT ALL THAT APPLY.
A - The average daily revenue for days when the Red Sox do not play away is $1,768.32.
B - The average daily revenue for days when the Red Sox play away is $2,264.57.
C - The average daily revenue for days when the Red Sox play away is $1,768.32.
D - On average, the bar’s revenue is $496.25 higher on days when the Red Sox play away than on days when they do not.
E - The average daily revenue for days when the Red Sox do not play away is $1,272.07.
A - The average daily revenue for days when the Red Sox do not play away is $1,768.32.
B - The average daily revenue for days when the Red Sox play away is $2,264.57.
D - On average, the bar’s revenue is $496.25 higher on days when the Red Sox play away than on days when they do not.
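With a 0/1 dummy variable, the intercept is the average for the 0 group, and intercept + coefficient is the average for the 1 group. The sketch below assumes a hypothetical dummy named `away_game` (1 on Red Sox away-game days, 0 otherwise), with the intercept and coefficient implied by the correct answers A, B, and D.

```python
intercept = 1768.32  # mean daily revenue when away_game = 0
away_coef = 496.25   # average difference when away_game = 1

def expected_revenue(away_game: int) -> float:
    """Prediction from revenue = intercept + away_coef * away_game."""
    return intercept + away_coef * away_game

print(f"{expected_revenue(0):,.2f}")  # 1,768.32
print(f"{expected_revenue(1):,.2f}")  # 2,264.57
```

Distractor E ($1,272.07) is the intercept minus the coefficient, which does not correspond to either group.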
Based on the regression output, what proportion of the variability in revenue can be accounted for by whether the Red Sox are playing away? Enter the value of the percentage with exactly ONE digit to the right of the decimal place.
The R Square value measures how much of the total variation in the dependent variable (in this case, revenue) is explained by the independent variable (in this case, whether the Red Sox are playing away). As shown in the regression output, the R Square value is 0.2252, or approximately 22.5%.
Is the relationship between Red Sox away games and average daily revenues significant at the 95% confidence level? Choose the correct answer with the corresponding correct reasoning.
A - No, because R Square is less than 50%
B - Yes, because the slope is positive.
C - Yes, because the p-value of the independent variable is less than 0.05
D - Yes, because the p-value of the intercept is less than 0.05
E - Yes, because the p-value of the intercept is less than 0.95
F - Yes, because the p-value of the independent variable is less than 0.95
C - Yes, because the p-value of the independent variable is less than 0.05
Since the p-value of the independent variable, 0.0005, is less than 0.05, the relationship is significant at the 5% significance level and, equivalently, at the 95% confidence level.
The scientist performs additional analyses and observes that the number of major earthquakes does appear to be decreasing but wonders whether the relationship is statistically significant.
Based on the partial regression output below and a 5% significance level, is the year statistically significant in determining the number of earthquakes above magnitude 7.0?
Yes
Since the p-value is not provided, use the confidence interval for the coefficient instead. The 95% confidence interval, from -0.11 to -0.04, does not contain zero, so the coefficient for year is statistically significant at the 5% significance level.
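The confidence-interval test reduces to a single check: does the interval exclude zero? A minimal sketch follows (the function name is our own). It is applied to the earthquake interval and, for contrast, to the slope interval from the earlier card (-2.45 to 6.01), which does contain zero.

```python
def significant_from_ci(lower: float, upper: float) -> bool:
    """A coefficient is significant at the matching level
    (95% interval <-> 5% significance) when its interval excludes zero."""
    return not (lower <= 0 <= upper)

print(significant_from_ci(-0.11, -0.04))  # True: year is significant
print(significant_from_ci(-2.45, 6.01))   # False: interval contains zero
```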
What is the correlation coefficient of the relationship between the average weekly hours spent studying and the score on the final exam?
The correlation coefficient is 0.5049, the Multiple R value. Remember that for single variable linear regression, Multiple R, which is the square root of R², is equal to the absolute value of the correlation coefficient. The regression coefficient for Average Weekly Hours Studying (0.03, as shown in the bottom table of the output) is positive, so the slope of the regression line is positive. Therefore, the correlation coefficient must also be positive.
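The rule above is "magnitude from Multiple R, sign from the slope." A minimal sketch follows; the function name is hypothetical, and it applies only to single variable regression.

```python
import math

def correlation_from_multiple_r(multiple_r: float, slope: float) -> float:
    """Single variable regression only: |r| = Multiple R = sqrt(R^2),
    and r takes the sign of the slope."""
    return math.copysign(multiple_r, slope)

print(correlation_from_multiple_r(0.5049, 0.03))  # 0.5049
```

If only R² were reported, `math.sqrt(r_squared)` would supply the magnitude in place of Multiple R.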