Ch 3. Linear Regression Flashcards
Why is linear regression important to understand? Select all that apply:
- The linear model is often correct
- Linear regression is very extensible and can be used to capture nonlinear effects
- Simple methods can outperform more complex ones if the data are noisy
- Understanding simpler methods sheds light on more complex ones
- Linear regression is very extensible and can be used to capture nonlinear effects
- Simple methods can outperform more complex ones if the data are noisy
- Understanding simpler methods sheds light on more complex ones
Explanation:
The linear model (like every other model) is hardly ever exactly true, but it is an important building block of many more complex methods.
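To illustrate the extensibility point, here is a minimal Python sketch (standard library only) that fits a quadratic curve with ordinary least squares. The helper names `solve` and `fit_polynomial` and the noise-free data are hypothetical, chosen only for illustration. The key idea is that y = b0 + b1·x + b2·x^2 is nonlinear in x but still linear in the coefficients, so the usual least-squares normal equations apply.

```python
def solve(a, b):
    """Solve the square system a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copies rows so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y on the features 1, x, x^2, ..., x^degree."""
    features = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(k)]
    return solve(xtx, xty)


xs = [i / 2 for i in range(21)]                # 0.0, 0.5, ..., 10.0
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]    # hypothetical, exactly quadratic relationship
coef = fit_polynomial(xs, ys, degree=2)
print(coef)  # values very close to [1.0, 2.0, -0.5]
```

Solving the normal equations directly is fine for a toy example like this; for real data a QR-based least-squares routine is numerically more stable.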
You may want to reread the paragraph on confidence intervals on page 66 of the textbook before trying this question (the distinctions are subtle).
Which of the following are true statements? Select all that apply:
- A 95% confidence interval is a random interval that contains the true parameter 95% of the time
- The true parameter is a random value that has 95% chance of falling in the 95% confidence interval
- I perform a linear regression and get a 95% confidence interval from 0.4 to 0.5. There is a 95% probability that the true parameter is between 0.4 and 0.5.
- The true parameter (unknown to me) is 0.5. If I sample data and construct a 95% confidence interval, the interval will contain 0.5 95% of the time.
- A 95% confidence interval is a random interval that contains the true parameter 95% of the time
- The true parameter (unknown to me) is 0.5. If I sample data and construct a 95% confidence interval, the interval will contain 0.5 95% of the time.
Explanation:
Confidence intervals are a “frequentist” concept: the interval, and not the true parameter, is considered random.
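A small simulation can make the frequentist reading concrete. The sketch below (Python, standard library only) keeps the true slope fixed and draws many data sets; each data set yields a different interval, and roughly 95% of those intervals cover the fixed value. The true slope of 0.5, the intercept, sample size, noise level, seed, and the helper name `slope_ci` are all assumptions for illustration.

```python
import random
import statistics


def slope_ci(xs, ys, z=1.96):
    """Return an approximate 95% CI for the OLS slope: slope +/- z * SE(slope)."""
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = (rss / (n - 2) / sxx) ** 0.5  # SE(slope) = sqrt(sigma_hat^2 / Sxx)
    return slope - z * se, slope + z * se


random.seed(1)
true_slope = 0.5  # fixed, but "unknown" to the analyst
trials, n, hits = 2000, 100, 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [2.0 + true_slope * x + random.gauss(0, 1) for x in xs]
    lo, hi = slope_ci(xs, ys)  # the interval is what changes from sample to sample
    hits += (lo <= true_slope <= hi)
coverage = hits / trials
print(coverage)  # close to 0.95
```

Because the sketch uses the normal critical value 1.96 rather than the t quantile, the coverage with n = 100 tends to land slightly below 95%.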
We run a linear regression and the slope estimate is 0.5 with an estimated standard error of 0.2. What is the largest value of b for which we would NOT reject the null hypothesis that Beta1 = b? (Assume a normal approximation to the t-distribution and a two-sided test at the 5% significance level; two significant digits of accuracy are needed.)
0.892
Explanation:
The 95% confidence interval Beta1(hat) ± 1.96 · SE(Beta1(hat)) contains all parameter values that would not be rejected at the 5% significance level. Its upper endpoint is 0.5 + 1.96 × 0.2 = 0.892.
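The same arithmetic as a short Python sketch. Here `NormalDist().inv_cdf(0.975)` from the standard library supplies the two-sided 5% critical value (about 1.96), and `rejects` is a hypothetical helper that applies the z-test directly, so you can check that values just inside the interval are not rejected while values just outside are.

```python
from statistics import NormalDist

estimate, se = 0.5, 0.2
z_crit = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96

lower, upper = estimate - z_crit * se, estimate + z_crit * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (0.108, 0.892)


def rejects(b, estimate=0.5, se=0.2, z_crit=z_crit):
    """Hypothetical helper: does a two-sided 5% z-test reject H0: Beta1 = b?"""
    return abs(estimate - b) / se > z_crit


print(rejects(0.89), rejects(0.90))  # False True
```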
Which of the following indicates a fairly strong relationship between X and Y?
- R^2 = 0.9
- The p-value for the null hypothesis Beta1 = 0 is 0.0001
- The t-statistic for the null hypothesis Beta1 = 0 is 30
- R^2 = 0.9
Explanation:
In simple linear regression, R^2 equals the squared correlation between X and Y, so it measures how closely the two variables are associated (the fraction of the variance in Y explained by X). The p-value and t-statistic merely measure how strong the evidence is that there is a nonzero association. Even a weak effect can be extremely significant given enough data.
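To see that significance and strength are different things, the sketch below simulates a hypothetical weak effect (slope 0.05 against unit-variance noise) with 100,000 observations; the slope, noise level, sample size, and seed are assumptions. Under these settings R^2 stays tiny while the t-statistic is large and the p-value falls far below 0.0001. The sketch also confirms that, in simple linear regression, R^2 equals the squared sample correlation.

```python
import math
import random
import statistics

random.seed(0)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical weak effect: the slope is small relative to the noise.
ys = [0.05 * x + random.gauss(0, 1) for x in xs]

x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = sxy / sxx
r = sxy / math.sqrt(sxx * syy)  # sample correlation
rss = sum((y - y_bar - slope * (x - x_bar)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - rss / syy  # fraction of the variance in Y explained by X

se = math.sqrt(rss / (n - 2) / sxx)  # SE(slope)
t_stat = slope / se  # t-statistic for H0: Beta1 = 0
p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided, normal approximation

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}, t = {t_stat:.1f}, p = {p_value:.1e}")
```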
Suppose we are interested in learning about a relationship between X1 and Y, which we would ideally like to interpret as causal.
True or False? The estimate Beta1(hat) in a linear regression that controls for many variables (that is, a regression with many predictors in addition to X1) is usually a more reliable measure of a causal relationship than Beta1(hat) from a univariate regression on X1.
False.
Explanation:
Adding lots of extra predictors to the model can just as easily muddy the interpretation of Beta1(hat) as it can clarify it. One often reads in media reports of academic studies that “the investigators controlled for confounding variables,” but be skeptical!
Causal inference is a difficult and slippery topic: causal questions cannot be answered with observational data alone, without additional assumptions.
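One way an extra predictor can muddy Beta1(hat) is sketched below as a hypothetical simulation: X1 truly affects Y with coefficient 1.0, and an extra variable Z is itself caused by both X1 and Y. Under these assumptions the univariate slope is close to 1.0, but "controlling for" Z pushes Beta1(hat) toward 0. This is only one mechanism, chosen for illustration; the variable names, coefficients, sample size, and seed are all assumptions.

```python
import random
import statistics


def centered(values):
    """Subtract the mean, so regressions below implicitly include an intercept."""
    m = statistics.fmean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


random.seed(2)
n = 20_000
x1 = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 * a + random.gauss(0, 1) for a in x1]           # true effect of X1 on Y is 1.0
z = [a + b + random.gauss(0, 1) for a, b in zip(x1, y)]  # hypothetical extra predictor caused by X1 and Y

c1, cz, cy = centered(x1), centered(z), centered(y)

# Univariate regression of Y on X1.
simple_slope = dot(c1, cy) / dot(c1, c1)

# Regression of Y on X1 and Z: solve the 2x2 normal equations by Cramer's rule.
s11, szz, s1z = dot(c1, c1), dot(cz, cz), dot(c1, cz)
s1y, szy = dot(c1, cy), dot(cz, cy)
det = s11 * szz - s1z ** 2
controlled_slope = (szz * s1y - s1z * szy) / det

print(f"univariate Beta1(hat):       {simple_slope:.2f}")      # close to 1.0
print(f"controlling for Z, Beta1(hat): {controlled_slope:.2f}")  # close to 0.0
```

With these assumed variances the population value of Beta1 in the two-predictor regression is exactly 0, even though X1 has a real effect on Y.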
According to the balance vs ethnicity model, what is the predicted balance for an Asian in the data set? (within 0.01 accuracy)
[Hint: see Ch. 3 slides handout, p. 33]
$512.31
Explanation:
For an Asian, the predicted balance is the intercept plus the Asian ethnicity effect: $531.00 + (-$18.69) = $512.31.
What is the predicted balance for an African American? (within 0.01 accuracy)
$531.00
Explanation:
For an African American, the predicted balance is just the intercept, because African American is the baseline level (all of the ethnicity dummy variables equal 0).
Note that although the predictions differ, the difference is not statistically significant.
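The dummy-variable logic behind these two answers can be sketched in Python. The only coefficients used are those implied by the answers above: an intercept of 531.00 for the baseline level (African American) and an Asian effect of 512.31 - 531.00 = -18.69. The Caucasian level is deliberately left out because its coefficient does not appear in these cards; the names `predicted_balance`, `INTERCEPT`, and `EFFECTS` are hypothetical.

```python
# Coefficients implied by the answers above (other levels intentionally omitted).
INTERCEPT = 531.00               # baseline level: African American
EFFECTS = {"Asian": -18.69}      # 512.31 - 531.00
BASELINE = "African American"


def predicted_balance(ethnicity):
    """Intercept plus the effect of whichever 0/1 dummy variable is switched on."""
    if ethnicity != BASELINE and ethnicity not in EFFECTS:
        raise ValueError(f"no coefficient recorded here for {ethnicity!r}")
    # One dummy per non-baseline level; for the baseline every dummy is 0.
    dummies = {level: 1.0 if ethnicity == level else 0.0 for level in EFFECTS}
    return INTERCEPT + sum(EFFECTS[level] * d for level, d in dummies.items())


print(round(predicted_balance("Asian"), 2))   # 512.31
print(predicted_balance("African American"))  # 531.0
```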