30: Regressions Flashcards
A researcher examined age as a predictor of memory, using a memory test scored 1 to 100. The analysis resulted in an equation of Y = 10 + 0.66X. A 72-year-old individual scored 85 on the test. Which statement is not an accurate conclusion?
a. This equation would underestimate the test score.
b. The residual for this individual would be small.
c. The predicted memory score would be 57.5.
d. The slope of the regression line would be .66.
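B
Rationale: The predicted score is 10 + 0.66(72) = 57.5, so the equation underestimates this individual's observed score of 85. The residual (observed minus predicted) is 85 - 57.5 = 27.5, which is large on a 100-point test, so statement b is not accurate.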
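The arithmetic behind this item can be checked with a minimal Python sketch. The equation and the observed score come from the question; the function and variable names are illustrative assumptions.

```python
# Regression equation from the question: Y = 10 + 0.66X
INTERCEPT = 10.0
SLOPE = 0.66


def predict(age):
    """Return the predicted memory score for a given age."""
    return INTERCEPT + SLOPE * age


observed = 85.0                  # score the 72-year-old actually earned
predicted = predict(72)          # score the equation predicts
residual = observed - predicted  # observed minus predicted

print(f"predicted = {predicted:.2f}")
print(f"residual  = {residual:.2f}")
```

A positive residual means the equation underestimated the observed score.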
Calling a regression line the “line of best fit” indicates that it will:
a. Result in the smallest sum of squared residuals for the scores.
b. Fit a theoretical model.
c. Show a perfect correlation between X and Y.
d. Be significant.
ANS: A
Rationale: The regression line minimizes the sum of squared residuals across all scores to best “fit” the data.
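The idea of "best fit" can be illustrated with a short least-squares sketch in plain Python. The data are hypothetical and the helper names (fit_line, sse) are assumptions made for this example. It fits the line, then shows that nudging the slope or the intercept away from the fitted values makes the sum of squared residuals larger.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
ages = [20, 30, 40, 50, 60, 70]
scores = [25, 28, 38, 41, 52, 55]

a, b = fit_line(ages, scores)
best = sse(ages, scores, a, b)
worse_slope = sse(ages, scores, a, b + 0.05)     # slightly steeper line
worse_intercept = sse(ages, scores, a + 1.0, b)  # line shifted upward

print(f"fitted line: Y = {a:.2f} + {b:.3f}X")
print(f"SSE at fit: {best:.2f}")
print(f"SSE with altered slope: {worse_slope:.2f}")
print(f"SSE with altered intercept: {worse_intercept:.2f}")
```

The fitted line still leaves nonzero residuals; it only makes their squared sum as small as possible.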
The significance of a regression line does not refer to:
a. The slope of the line being significantly greater than zero.
b. The significance of the correlation coefficient associated with X and Y.
c. The significance of F in an analysis of variance.
d. The significance of the regression constant.
D
Rationale: Although a regression model will generate a p value for the constant, it is not a meaningful value. Significance refers to the regression coefficient (b), which is the slope of the line, indicating that it is significantly different from zero. A slope of zero would indicate no association between X and Y.
In a regression analysis, the value of R² tells us:
a. The amount of variance in Y that can be explained by X.
b. The correlation of X and Y.
c. The size of the residual in predicting Y from X.
d. The significance of the correlation between X and Y.
A
Rationale: The coefficient of determination (R²) is the square of the correlation coefficient and refers to the proportion of variance in Y that can be explained by X. The larger the value, the smaller the residuals, although the value of a residual cannot be estimated from R² alone. R² has an associated p value that is the same as the significance of the correlation, but the value of R² by itself cannot give this information.
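A minimal Python sketch, using hypothetical data and assumed helper names, shows the two routes to the coefficient of determination in simple regression: squaring the correlation coefficient, and computing 1 minus the ratio of residual to total sum of squares. The two values agree.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_from_fit(xs, ys):
    """R squared as 1 - SSE/SST for the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data, for illustration only
x = [20, 30, 40, 50, 60, 70]
y = [25, 28, 38, 41, 52, 55]

r = pearson_r(x, y)
r2 = r_squared_from_fit(x, y)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}, 1 - SSE/SST = {r2:.3f}")
```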
In a multiple regression equation, several independent variables (X) are used to predict an outcome (Y). Researchers often want to determine which of the independent variables are the best predictors. Which of the following does not need to be considered when making that judgment?
a. The significance of the regression coefficient for each independent variable.
b. The collinearity between the independent variables.
c. The standardized beta weight for each independent variable.
d. The regression equation constant.
D
Rationale: The regression constant is not relevant. Each of the other choices bears on how much an independent variable contributes to prediction: the significance of its regression coefficient, its standardized beta weight, and the degree to which the independent variables are related to each other (collinearity).
A researcher decides to use stepwise regression with backward selection to determine which variables should enter the equation. Which of the following statements is true?
a. The analysis will enter the X variable with the highest correlation with Y on the first step.
b. The analysis will enter the X variable with the highest beta weight in the first step.
c. The analysis will discard the one X variable with the lowest correlation with Y on the first step.
d. The analysis will result in no X variables being entered into the equation.
C
Rationale: By definition, backward selection starts with all variables in the equation and then removes the weakest predictor, here the one with the smallest correlation with Y. The analysis stops when no further removal will improve the model.
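A minimal Python sketch of the first step as this card describes it, assuming hypothetical predictor names and data. It ranks the predictors by their simple correlation with Y and discards the weakest. Statistical packages usually base removal on each variable's contribution with the other predictors already in the model (for example, a p value or partial F test), which this sketch does not attempt.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical outcome and predictors, for illustration only
y = [10, 12, 15, 18, 20, 25]
predictors = {
    "age": [1, 2, 3, 4, 5, 6],
    "education": [2, 1, 4, 3, 6, 5],
    "sleep": [3, 1, 2, 3, 1, 2],
}

# Backward selection starts with every predictor in the equation.
correlations = {name: pearson_r(xs, y) for name, xs in predictors.items()}

# First step: discard the predictor with the weakest correlation with Y.
dropped = min(correlations, key=lambda name: abs(correlations[name]))
remaining = [name for name in predictors if name != dropped]

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
print("dropped on first step:", dropped)
print("still in the equation:", remaining)
```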
The value of R² in a multiple regression analysis is:
a. The average correlation between each independent variable and the dependent variable.
b. A measure of collinearity.
c. The amount of variance in Y that is explained by the full set of independent variables.
d. The amount of variance in Y that is explained by the X variable with the highest correlation.
C
Rationale: R² tells us the degree to which the full set of X variables helps explain Y.
When coding dichotomous independent variables for logistic regression, the typical format is:
a. 1 = the group with the more adverse condition; 2 = the group without the adverse condition.
b. 1 = the group with the more adverse condition; 0 = the group without the adverse condition.
c. 1 = the group without the adverse condition; 0 = the group with the adverse condition.
d. 1 = the group with the more desirable condition; 0 = the group with the less desirable condition.
ANS: B
Rationale: The target condition (adverse) is given the value 1, indicating the presence of the adverse condition. The reference group (not adverse) is given the value 0, indicating the absence of the target condition.
Researchers studied the influence of having had a myocardial infarction (MI) and having hypertension (HTN) as predictors of the occurrence of stroke. The reported results of logistic regression are shown in the table. Which of the following statements is an accurate interpretation of the data?
a. An individual who has HTN is about three times more likely to have a stroke than someone who does not have HTN.
b. An individual who had an MI is about two times less likely to have a stroke than someone who has not had an MI.
c. The odds ratio associated with MI is significant.
d. The odds ratio associated with HTN is not significant.
A
Rationale: The OR associated with HTN is significant (1.0 not included in CI). The OR and the adjusted OR are around 3.0. The OR for MI shows stroke is more likely, but it is not significant (CI contains 1.0).
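The rule used in this rationale (an odds ratio is significant when its confidence interval excludes 1.0) can be sketched in Python. The counts below are hypothetical and are not the study's data. The sketch computes an unadjusted odds ratio from a 2 x 2 table with the usual large-sample log-based 95% interval, which is a simpler stand-in for the adjusted odds ratios a logistic regression would report.

```python
import math


def odds_ratio_ci(exp_event, exp_no_event, unexp_event, unexp_no_event, z=1.96):
    """Unadjusted odds ratio with a log-based confidence interval."""
    odds_ratio = (exp_event * unexp_no_event) / (exp_no_event * unexp_event)
    se_log = math.sqrt(
        1 / exp_event + 1 / exp_no_event + 1 / unexp_event + 1 / unexp_no_event
    )
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


def is_significant(low, high):
    """An odds ratio is significant when the interval excludes 1.0."""
    return not (low <= 1.0 <= high)


# Hypothetical counts: (exposed with stroke, exposed without,
#                       unexposed with stroke, unexposed without)
htn = odds_ratio_ci(30, 70, 12, 88)
mi = odds_ratio_ci(12, 38, 30, 120)

for label, (oratio, low, high) in (("HTN", htn), ("MI", mi)):
    verdict = "significant" if is_significant(low, high) else "not significant"
    print(f"{label}: OR = {oratio:.2f}, 95% CI {low:.2f} to {high:.2f}, {verdict}")
```

With these made-up counts, the HTN interval lies entirely above 1.0 while the MI interval spans 1.0, mirroring the pattern the rationale describes.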
The intent of analysis of covariance (ANCOVA) in the comparison of two groups is to:
a. Correlate the dependent variable values for the two groups.
b. Decrease the difference between the two group means.
c. Adjust an outcome variable for differences in a covariate to make groups look similar on the dependent variable.
d. Eliminate differences between groups on the dependent variable.
C
Rationale: ANCOVA uses a covariate to adjust the group means on the outcome variable, so the groups are compared as if they were equal on the covariate.
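A minimal Python sketch of the adjustment itself, using hypothetical data and assumed names. Each group mean on the outcome is shifted by the pooled within-group slope times the distance between that group's covariate mean and the grand covariate mean. A full ANCOVA would also test the adjusted difference for significance, which is omitted here.

```python
def mean(values):
    return sum(values) / len(values)


def pooled_within_slope(groups):
    """Slope of outcome on covariate, pooled across groups."""
    num = 0.0
    den = 0.0
    for covariate, outcome in groups:
        mx, my = mean(covariate), mean(outcome)
        num += sum((x - mx) * (y - my) for x, y in zip(covariate, outcome))
        den += sum((x - mx) ** 2 for x in covariate)
    return num / den


def adjusted_means(groups):
    """Group outcome means adjusted to the grand mean of the covariate."""
    slope = pooled_within_slope(groups)
    grand_x = mean([x for covariate, _ in groups for x in covariate])
    return [
        mean(outcome) - slope * (mean(covariate) - grand_x)
        for covariate, outcome in groups
    ]


# Hypothetical data: (covariate = age, outcome = test score) per group
group_a = ([60, 65, 70, 75], [50, 53, 57, 60])
group_b = ([50, 55, 60, 65], [46, 49, 52, 56])

raw = [mean(group_a[1]), mean(group_b[1])]
adjusted = adjusted_means([group_a, group_b])

print(f"raw means:      A = {raw[0]:.2f}, B = {raw[1]:.2f}")
print(f"adjusted means: A = {adjusted[0]:.2f}, B = {adjusted[1]:.2f}")
```

In this made-up example group A is older on average, so its mean is adjusted downward and group B's upward.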
The square of the correlation coefficient (r²) is called the:
a. Coefficient of determination
b. Multiple correlation coefficient
c. Partial correlation
d. Regression coefficient
A
Rationale: Coefficient of determination is the correct term for r².
How much variance in Y has been explained by X if r = .8?
a. 8%
b. 16%
c. 64%
d. 2%
C
Rationale: The explained variance is the square of the correlation.
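As a quick check of this arithmetic, a minimal Python sketch (the function name is an assumption) squares a few correlation values, including the r = .8 from the question, and reports each as a percentage of variance explained.

```python
def explained_variance(r):
    """Proportion of variance in Y explained by X, given correlation r."""
    return r ** 2


for r in (0.2, 0.5, 0.8):
    print(f"r = {r:.1f} -> {explained_variance(r) * 100:.0f}% of variance explained")
```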
All of the following refer to a difference between regression analysis and correlation except:
a. Regression enables prediction of the dependent variable.
b. Regression allows for estimation of a line of best fit.
c. Regression provides an estimate of the amount of variance in Y explained by X.
d. Regression measures the strength of the association between two variables.
D
Rationale: Correlation measures the association between two variables. All other choices are unique to regression.
All of the following express a basic assumption for regression analysis except:
a. The regression line will minimize the sum of squared residuals.
b. The line of best fit will go through every point on the scatterplot.
c. The value of R² indicates the proportion of variance in the dependent variable that is explained by the independent variables.
d. A plot of standardized residuals that is spread around the zero point indicates that normality assumptions are met.
B
Rationale: The line of best fit will not go through every point; it passes through the set of points so that the sum of squared residuals is minimized.