Reading 7: Introduction to Linear Regression Flashcards
Which of the following is least likely a necessary assumption of simple linear regression analysis?
The residuals are normally distributed.
There is a constant variance of the error term.
The dependent variable is uncorrelated with the residuals.
The model does not assume that the dependent variable is uncorrelated with the residuals. It does assume that the independent variable is uncorrelated with the residuals. (LOS 7.c)
What is the most appropriate interpretation of a slope coefficient estimate of 10.0?
The predicted value of the dependent variable when the independent variable is zero is 10.0.
For every 1-unit change in the independent variable, the model predicts that the dependent variable will change by 10 units.
For every 1-unit change in the independent variable, the model predicts that the dependent variable will change by 0.1 units.
The slope coefficient is best interpreted as the predicted change in the dependent variable for a 1-unit change in the independent variable. If the slope coefficient estimate is 10.0 and the independent variable changes by 1 unit, the dependent variable is expected to change by 10 units. The intercept term is best interpreted as the value of the dependent variable when the independent variable is equal to zero. (LOS 7.b)
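The distinction between the slope and the intercept can be sketched in a few lines. The slope of 10.0 comes from the question; the intercept value of 2.0 and the function names are assumptions made only for illustration.

```python
def predict(intercept: float, slope: float, x: float) -> float:
    """Predicted Y under the simple linear model Y = intercept + slope * X."""
    return intercept + slope * x


def predicted_change(slope: float, delta_x: float) -> float:
    """Predicted change in Y for a change of delta_x in X."""
    return slope * delta_x


slope = 10.0      # slope estimate from the question
intercept = 2.0   # hypothetical: the question does not give an intercept

# Slope: predicted change in Y for a 1-unit change in X.
print(predicted_change(slope, 1.0))    # 10.0

# Intercept: predicted Y when X equals zero.
print(predict(intercept, slope, 0.0))  # 2.0
```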
Which of the following is closest to the value of R² for this regression and gives its most likely interpretation?
The R² is 0.048, indicating that the variability of industry sales explains about 4.8% of the variability of company sales.
The R² is 0.952, indicating that the variability of industry sales explains about 95.2% of the variability of company sales.
The R² is 0.952, indicating that the variability of company sales explains about 95.2% of the variability of industry sales.
Intercept: coefficient = -94.88, standard error of coefficient = 32.97
Slope (industry sales): coefficient = 0.2796, standard error of coefficient = 0.0363
The R² is computed as the correlation squared: (0.9757)² = 0.952.
The interpretation of this R2 is that 95.2% of the variation in Company XYZ’s sales is explained by the variation in industry sales. The independent variable (industry sales) explains the variation in the dependent variable (company sales). This interpretation is based on the economic reasoning used in constructing the regression model. (LOS 7.d)
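As a quick arithmetic check, the calculation in the explanation can be reproduced as follows. The correlation of 0.9757 is the figure quoted above; the function name is an assumption for illustration. The shortcut of squaring the correlation applies to simple linear regression with one independent variable.

```python
def r_squared_from_correlation(r: float) -> float:
    """In a one-variable linear regression, R-squared is the squared correlation."""
    return r ** 2


r = 0.9757  # correlation between company sales and industry sales, as quoted above
r2 = r_squared_from_correlation(r)

print(round(r2, 3))      # 0.952 -> share of variation in company sales explained
print(round(1 - r2, 3))  # 0.048 -> share left unexplained (the first answer choice)
```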
Based on the regression results, XYZ Company’s market share of any increase in industry sales is expected to be closest to:
4%.
28%.
45%.
Intercept: coefficient = -94.88, standard error of coefficient = 32.97
Slope (industry sales): coefficient = 0.2796, standard error of coefficient = 0.0363
The slope coefficient of 0.2796 indicates that a $1 million increase in industry sales will result in an increase in firm sales of approximately 28% of that amount ($279,600). (LOS 7.b)
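The "share of any increase" reading of the slope can be sketched like this. The slope of 0.2796 and the $1 million increase come from the card; the function name is an assumption for illustration.

```python
def firm_sales_increase(slope: float, industry_increase: float) -> float:
    """Predicted increase in firm sales for a given increase in industry sales."""
    return slope * industry_increase


slope = 0.2796                 # slope on industry sales from the regression output
industry_increase = 1_000_000  # a $1 million increase in industry sales

firm_increase = firm_sales_increase(slope, industry_increase)
share = firm_increase / industry_increase  # equals the slope, whatever the increase

print(f"${firm_increase:,.0f}")  # $279,600
print(f"{share:.0%}")            # 28%
```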
The estimated increase in travel time for a motorcycle commuter planning to move 8 km farther from his workplace in London is closest to:
31 minutes.
15 minutes.
0.154 hours.
The slope coefficient is 1.93, indicating that each additional kilometer increases travel time by 1.93 minutes:
1.93 × 8 = 15.44 minutes, which is closest to 15 minutes.
(LOS 7.b)
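A minimal sketch of the same calculation, which also puts all three answer choices into minutes before comparing them. The variable names are assumptions for illustration; the slope and distance come from the card.

```python
slope_minutes_per_km = 1.93  # motorcycle slope coefficient from the card
extra_km = 8                 # additional commuting distance

extra_minutes = slope_minutes_per_km * extra_km
print(extra_minutes)  # 15.44

# Answer choices expressed in minutes (the third choice is given in hours).
choices_in_minutes = {
    "31 minutes": 31.0,
    "15 minutes": 15.0,
    "0.154 hours": 0.154 * 60,
}
closest = min(choices_in_minutes, key=lambda k: abs(choices_in_minutes[k] - extra_minutes))
print(closest)  # 15 minutes
```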
Based on the regression results, which model is more reliable? (London passenger cars vs. motorcycles)
The passenger car model because 3.86 > 1.93.
The motorcycle model because 1.93 < 3.86.
The passenger car model because 0.758 > 0.676.
The higher R² for the passenger car model indicates that regression results are more reliable. Distance is a better predictor of travel time for cars. Perhaps the aggressiveness of the driver is a bigger factor in travel time for motorcycles than it is for autos. (LOS 7.d)
Consider the following statement: In a simple linear regression, the appropriate degrees of freedom for the critical t-value used to calculate a confidence interval around both a parameter estimate and a predicted Y-value is the same as the number of observations minus two. The statement is:
justified.
not justified, because the appropriate degrees of freedom used to calculate a confidence interval around a parameter estimate is the number of observations.
not justified, because the appropriate degrees of freedom used to calculate a confidence interval around a predicted Y-value is the number of observations.
In simple linear regression, the appropriate degrees of freedom for both confidence intervals is the number of observations in the sample (n) minus two. (LOS 7.d)
What is the appropriate alternative hypothesis to test the statistical significance of the intercept term in the following regression?
Y = a1 + a2(X) + ε
HA: a1 ≠ 0.
HA: a1 > 0.
HA: a2 ≠ 0.
In this regression, a1 is the intercept term. To test the statistical significance means to test the null hypothesis that a1 is equal to zero, versus the alternative that a1 is not equal to zero. (LOS 7.d)
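The two-tailed test described above might look like the sketch below. This card gives no estimates, so the intercept and standard error are borrowed from the Company XYZ regression output elsewhere in this set purely for illustration. The critical value of 2.0 is a hypothetical stand-in (roughly the 5% two-tailed value with about 60 degrees of freedom), and the function names are assumptions.

```python
def t_statistic(estimate: float, std_error: float, hypothesized: float = 0.0) -> float:
    """t-statistic for H0: parameter = hypothesized."""
    return (estimate - hypothesized) / std_error


def reject_two_tailed(t_stat: float, t_critical: float) -> bool:
    """Reject H0 in favor of HA: parameter != hypothesized when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical


a1_hat = -94.88   # illustrative intercept estimate (borrowed from the XYZ output)
se_a1 = 32.97     # illustrative standard error of the intercept
t_critical = 2.0  # hypothetical critical t-value; depends on n - 2 and the significance level

t_stat = t_statistic(a1_hat, se_a1)
print(round(t_stat, 2))                       # -2.88
print(reject_two_tailed(t_stat, t_critical))  # True
```

Because the alternative is a1 ≠ 0, the sign of the t-statistic does not matter; only its absolute value is compared with the critical value.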
The variation in the dependent variable explained by the independent variable is measured by:
the mean squared error.
the sum of squared errors.
the regression sum of squares.
The regression sum of squares measures the amount of variation in the dependent variable explained by the independent variable (i.e., the explained variation). The sum of squared errors measures the variation in the dependent variable not explained by the independent variable. The mean squared error is equal to the sum of squared errors divided by its degrees of freedom. (Module 7.2, LOS 7.e)
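The relationship among these three measures can be illustrated with a small sketch. The data points are hypothetical and chosen only to show the decomposition; the variable names are assumptions. The degrees of freedom for the sum of squared errors are taken as n minus two, as in simple linear regression.

```python
# Hypothetical sample, used only to illustrate the decomposition.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares estimates for Y = b0 + b1 * X.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)               # total variation in Y
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)     # explained variation
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # unexplained variation
mse = ss_error / (n - 2)                                    # SSE divided by its degrees of freedom

# Total variation splits into explained plus unexplained variation.
print(abs(ss_total - (ss_regression + ss_error)) < 1e-9)  # True
```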
The appropriate regression model for a linear relationship between the relative change in an independent variable and the absolute change in the dependent variable is a:
log-lin model.
lin-log model.
lin-lin model.
The appropriate model would be a lin-log model, in which the values of the dependent variable (Y) are regressed on the natural logarithms of the independent variable (X), Y = b0 + b1 ln X. (LOS 7.h)
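To see why the lin-log form fits this description, the sketch below applies the same 1% relative change in X at two different levels of X and compares the absolute change in Y. The coefficients b0 = 2.0 and b1 = 50.0 and the function name are hypothetical, chosen only for illustration.

```python
import math


def lin_log_predict(b0: float, b1: float, x: float) -> float:
    """Predicted Y under the lin-log model Y = b0 + b1 * ln(X)."""
    return b0 + b1 * math.log(x)


b0, b1 = 2.0, 50.0  # hypothetical coefficients

# A 1% increase in X, starting from two very different levels of X.
change_low = lin_log_predict(b0, b1, 101.0) - lin_log_predict(b0, b1, 100.0)
change_high = lin_log_predict(b0, b1, 1010.0) - lin_log_predict(b0, b1, 1000.0)

# Both absolute changes in Y equal b1 * ln(1.01), roughly 0.01 * b1.
print(round(change_low, 4), round(change_high, 4))
```

The absolute change in Y depends only on the relative change in X, not on the starting level of X.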
For a regression model of Y = 5 + 3.5X, the analysis (based on a large data sample) provides the standard error of the forecast as 2.5 and the standard error of the slope coefficient as 0.8. A 90% confidence interval for the estimate of Y when the value of the independent variable is 10 is closest to:
35.1 to 44.9.
35.6 to 44.4.
35.9 to 44.1.
The estimate of Y, given X = 10 is: Y = 5 + 3.5(10) = 40. The critical value for a 90% confidence interval with a large sample size (z-statistic) is approximately 1.65. Given the standard error of the forecast of 2.5, the confidence interval for the estimated value of Y is 40 ± 1.65(2.5) = 35.875 to 44.125. (LOS 7.g)
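The interval calculation can be reproduced with a short sketch. All inputs come from the card; the function name is an assumption for illustration. The standard error of the slope coefficient (0.8) is not needed for an interval around a predicted Y-value, so it does not appear.

```python
def forecast_interval(b0: float, b1: float, x: float,
                      se_forecast: float, critical_value: float) -> tuple[float, float]:
    """Confidence interval for predicted Y: y_hat +/- critical_value * se_forecast."""
    y_hat = b0 + b1 * x
    margin = critical_value * se_forecast
    return y_hat - margin, y_hat + margin


# Y = 5 + 3.5X at X = 10, standard error of forecast 2.5,
# and a large-sample 90% critical value of about 1.65.
lower, upper = forecast_interval(5.0, 3.5, 10.0, 2.5, 1.65)
print(round(lower, 3), round(upper, 3))  # 35.875 44.125
```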