Correlation and Regression Flashcards
are the IV and DV for correlations and regression continuous, categorical, or both?
continuous
what does a correlation describe?
the relationship bw 2 variables via a correlation coefficient
t/f: a correlation measures the extent to which 2 variables are associated
true
what is the correlation coefficient?
a number that quantifies the strength and direction of the relationship bw 2 variables
what is the measure of if scores inc/dec together?
correlation
does correlation reflect sameness of scores?
no, only the nature of how they change
what are the 2 correlation coefficients?
Pearson’s and Spearman’s
what is the parametric correlation coefficient?
Pearson’s
what is the nonparametric correlation coefficient?
Spearman’s
what is the Pearson’s correlation coefficient?
the pattern of change/association of the 2 variables over different values of the variables
describes the strength and direction of the relationship
what letter represents the Pearson’s correlation coefficient?
r
what are the assumptions of Pearson’s correlation?
Linearity
Independence
Normality (n of 30 or more, or Shapiro-Wilk (SW) test)
Equality of variances
how do we determine normality for Pearson’s correlation?
if the n is equal to or greater than 30
if not, run the Shapiro-Wilk (SW) test
if you have a Pearson’s correlation coefficient of r=-1, is the correlation perfect pos/neg?
perfect neg
if you have a Pearson’s correlation coefficient of r=1, is the correlation perfect pos/neg?
perfect pos
what is the range of Pearson’s correlation coefficient?
-1 to 1
if you have a Pearson’s correlation coefficient of r=0, is this a perfect pos/neg correlation?
neither, it has no linear relationship
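A minimal sketch of how r behaves at the ends and middle of its range, assuming the standard product-moment formula (sum of cross-products divided by the square root of the product of the sums of squares). The function name `pearson_r` and all data are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only
perfect_pos = pearson_r([1, 2, 3], [2, 4, 6])   # y rises exactly with x
perfect_neg = pearson_r([1, 2, 3], [6, 4, 2])   # y falls exactly as x rises
# y = x squared: strongly related, but not linearly, so r is 0
u_shaped = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The U-shaped case is why r=0 is read as "no linear relationship" rather than "no relationship".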
describe the Pearson’s correlation coefficient of r=.20
low positive
describe the Pearson’s correlation coefficient of r=.50
moderate positive
describe the Pearson’s correlation coefficient of r=.80
high positive
describe the Pearson’s correlation coefficient of r=-.8
high negative
what is the interpretation of r=0.9 to 1.0/-0.9 to -1.0?
very high correlation
what is the interpretation of r=0.7 to 0.89/-0.7 to -0.89?
high correlation
what is the interpretation of r=0.5 to 0.69/-0.5 to -0.69?
moderate correlation
what is the interpretation of r=0.26 to 0.49/-0.26 to -0.49?
low correlation
what is the interpretation of r=0 to 0.25/0 to -0.25?
very low correlation
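The interpretation bands above can be sketched as a lookup. This is an illustration, not a standard: the function name `describe_r` is hypothetical, and placing values that fall between the listed band edges (for example 0.255) in the higher band is an assumption.

```python
def describe_r(r):
    """Label a correlation coefficient using the five bands from the cards."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear relationship"
    size = abs(r)
    if size >= 0.9:
        strength = "very high"
    elif size >= 0.7:
        strength = "high"
    elif size >= 0.5:
        strength = "moderate"
    elif size > 0.25:  # assumption: values just above 0.25 count as low
        strength = "low"
    else:
        strength = "very low"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

labels = [describe_r(r) for r in (0.95, 0.80, 0.50, 0.30, -0.80)]
```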
what are the hypotheses for Pearson’s correlation coefficient?
H0: rho=0
Ha: rho is not equal to 0
what does the H0 of rho=0 mean?
x and y are not correlated
what does the Ha of rho not equal to 0 mean?
x and y are correlated
what is the test statistic of Pearson’s correlation coefficient?
r
what are the df of Pearson’s correlation coefficient?
n-2
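For the significance test, r is commonly converted to a t statistic with n-2 df, using t = r * sqrt((n-2)/(1-r^2)). A minimal sketch of that conversion follows; the function name `correlation_t` and the example values are made up, and the p value itself would come from a t table or statistics software.

```python
import math

def correlation_t(r, n):
    """Convert a Pearson r from n pairs into a t statistic and its df."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical example: r = 0.5 from 27 pairs
t_stat, df = correlation_t(0.5, 27)
```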
are the hypotheses of Pearson’s correlation coefficient one or two sided?
two sided only
how do you report Pearson’s correlation coefficient results?
A Pearson correlation was conducted to determine the relationship bw [variables]. There was a statistically significant [strong/moderate/weak, positive/negative] correlation bw [variables] (Pearson r=[], p=[]).
what is the conclusion of the following data?
r=0.15
p<0.01
the correlation is statistically significant, but has low strength with limited clinical importance
when do we use the nonparametric correlation (Spearman’s)?
when the Pearson’s correlation assumptions fail
what are the hypotheses of the Spearman’s correlation coefficient?
H0: rho=0 (no correlation)
Ha: rho is not equal to 0
what is the procedure of the Spearman’s correlation coefficient?
1) separately rank the x and y data (Rx and Ry)
2) apply the Pearson product moment formula to the ranks
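The two-step procedure above can be sketched as follows. Giving tied values the average of their ranks is an assumption (the cards do not say how ties are handled), and the function names and data are made up for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Step 1: rank x and y separately. Step 2: Pearson formula on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: y = x cubed, which always increases but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)   # perfect, because the ordering matches exactly
r = pearson_r(x, y)        # below 1, because the trend is curved
```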
t/f: the interpretation of Spearman’s correlation is the same as Pearson’s
true
t/f: the Spearman’s “perfect” correlation looks different than Pearson’s since linearity is not necessary
true
how do we report the results of the Spearman’s correlation coefficient?
A Spearman correlation was conducted to determine the relationship bw [variables]. There was a statistically significant [strong/moderate/weak, positive/negative] correlation bw [variables] (Spearman rho=[], p=[]).
what does the Spearman’s correlation evaluate?
the relative ordering of x and y rather than actual distances from the mean
are nonlinear (but consistently increasing or decreasing) trends better detected by Pearson’s or Spearman’s correlation?
Spearman’s correlation
what correlation coefficient is better when the assumptions of Pearson’s fail?
Spearman’s correlation
t/f: Spearman’s correlation is slightly less powerful than Pearson’s when assumptions (LINE) are met
true
are there more type 1 or type 2 errors when we use Spearman’s even though Pearson’s assumptions are met?
type 2 errors (Spearman’s has slightly less power, so real correlations are missed more often)
t/f: correlation is causation
FALSE
what is the intraclass correlation coefficient (ICC)?
the assessment of the reliability of measurement scales (test-retest, intrarater, interrater, etc)
the association of 2 or more measures and the amount of association
ICC reflects both the _____ __ _____ and ______ bw measurements
degree of correlation, agreement
what is the reliability index?
the true variance divided by the true variance plus the error variance
what are the ICC assumptions?
normality and stable variance
if there is a true variance of 9.6 and error variance of 12.8, what is the ICC?
9.6/(9.6+12.8)=0.43
what is the range for ICC?
0-1
t/f: there are no negative values for ICC
true
what does ICC<0.5 mean?
poor reliability
what does ICC 0.5-0.75 mean?
moderate reliability
what does ICC 0.75-0.9 mean?
good reliability
what does ICC >0.9 mean?
excellent reliability
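A small sketch tying the reliability index to the interpretation bands above, using the true variance of 9.6 and error variance of 12.8 from the earlier card. The function names are hypothetical, and assigning the exact boundary values 0.75 and 0.9 to "good" is an assumption, since the cards list overlapping edges.

```python
def reliability_index(true_variance, error_variance):
    """True variance divided by (true variance + error variance)."""
    total = true_variance + error_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total

def interpret_icc(icc):
    """Map an ICC value onto the four reliability bands from the cards."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:  # assumption: 0.75 and 0.9 themselves count as good
        return "good"
    return "excellent"

icc = reliability_index(9.6, 12.8)   # about 0.43
label = interpret_icc(icc)
```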
when do we use linear regression?
when we want to predict DV using IVs
what is an example of linear regression testing?
researchers want to know how weekly energy expenditure impacts LDL cholesterol levels in ppl who are in their 40s
the blood samples were collected from the antecubital vein bw 8-10am, in a sitting position after 12 hours of fasting and avoiding alcohol and the LDL cholesterol was measured (in mg/dL)
researchers also estimated an index of weekly energy expenditure by examining the type, frequency, duration, and intensity of sports-related physical activity in the past year. The index ranges from 0-50.
what is a linear relationship?
when 2 variables are related linearly, the relationship can be summarized by a single, straight line
simple linear regression is also called what?
bivariate regression
what is the simple linear regression?
the process of identifying a line that best characterizes a linear relationship bw 2 continuous variables
what variable is the predictor in regression?
IV
is the predictor/IV x or y?
x
what variable is the outcome?
DV
is the outcome/DV x or y?
y
when there is 1 IV and 1DV, what is run?
simple linear regression
every line is characterized by what two things?
a slope (a) and an intercept (b)
in the equation y = a(x) + b
what is the equation of a line?
y = a(x) + b
using y = a(x) + b, if x=1, a=4, b=2, what is y?
6, since 4(1) + 2 = 6
linear regression requires what two things?
intercept and slope
what is the procedure for least squares criterion?
1) draw a line through the data
2) calculate the distance of every observation to the line, square it, and add up over all observations
t/f: the line with the smallest SS is the best regression line
true
based on the SS, what is the best regression line?
the line with the smallest SS
t/f: there is only 1 line that is the best regression line
true
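The least squares idea can be sketched by scoring candidate lines with their sum of squares (SS) and comparing them with the closed-form best line. This assumes "distance" means vertical distance from each observation to the line, as in ordinary least squares; the function names and data are made up for illustration.

```python
def sum_of_squares(x, y, slope, intercept):
    """Add up the squared vertical distances from each point to the line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

def least_squares_line(x, y):
    """Slope (a) and intercept (b) of the line with the smallest SS."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so no slope can be estimated")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)               # a = 0.6, b = 2.2
best_ss = sum_of_squares(x, y, a, b)          # about 2.4
guess_1 = sum_of_squares(x, y, 1.0, 1.0)      # 4.0, a worse line
guess_2 = sum_of_squares(x, y, 0.5, 2.5)      # 2.5, close but still worse
```

No hand-drawn guess can score below the fitted line, which is why there is only one best regression line.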
what is the linear regression equation of point estimation?
y = ax + b + epsilon
in the equation for point estimation, what does each letter mean?
y=value of DV
x=value of IV
a=slope
b=intercept
epsilon=error term
when x increases by 1 unit, y increases by ___ units
a (slope)
what is the linear regression equation?
mu of y = ax + b
the linear regression equation predicts the mean of _ based on _
y, x
what is the interpretation of the following: the linear regression equation predicts the mean of y based on x?
the mean of y for a given value of x is obtained from the linear equation ax + b
is there an error term in the linear regression equation?
nope
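A short sketch of the difference between the two equations: the regression equation gives the mean of y at a given x, and the error term epsilon is whatever is left over for one individual observation. The slope, intercept, and observed value here are hypothetical.

```python
def predicted_mean(x, slope, intercept):
    """Mean of y for a given x: mu_y = a*x + b (no error term)."""
    return slope * x + intercept

# Hypothetical fitted values, for illustration only
a, b = 0.6, 2.2
x_value = 4
observed_y = 5.0   # one individual's actual y at x = 4

mu_y = predicted_mean(x_value, a, b)   # about 4.6
epsilon = observed_y - mu_y            # about 0.4, this observation's error
```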
what is the primary research question in evaluating linear regression?
are x and y related?
the linear regression estimates and tests what?
the x-y relationship
the linear regression predicts _ using _
y, x
the x-y relationship is described by what?
the slope (a)
what are the hypotheses of the simple linear regression?
H0: a=0
Ha: a is not equal to 0
are the hypotheses for simple linear regression about the slope (a) or the intercept (b)?
the slope (a)
what are the assumptions of the linear regression?
Linearity
Independence
Normality of residuals
Equality of variances
is there a nonparametric test for linear regression if the assumption of normality of residuals is violated?
nope
what is an unspoken assumption of regression?
that you have to know which variable is IV and which is DV bc regression requires a clear distinction bw them
if it is unclear which variable is the DV or IV, what testing should be done?
correlation
t/f: it is possible to reject H0 w/a nonlinear trend in regression
true
what df do we use from the regression F test?
regression(1) and residual (n-2)
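A minimal sketch of where the F statistic and its two df come from in simple linear regression: the regression mean square (regression SS over 1) divided by the residual mean square (residual SS over n-2). The function name and data are made up, and the p value would still come from an F table or statistics software.

```python
def regression_f(x, y):
    """F statistic with its regression and residual df for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (slope * xi + intercept)) ** 2
                      for xi, yi in zip(x, y))
    if ss_residual == 0:
        raise ValueError("perfect fit: F is undefined (division by zero)")
    ss_regression = ss_total - ss_residual
    df_regression, df_residual = 1, n - 2
    f = (ss_regression / df_regression) / (ss_residual / df_residual)
    return f, df_regression, df_residual

# Made-up data for illustration only
f_stat, df1, df2 = regression_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```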
if the p value of the F test is less than or equal to alpha, what does this mean?
x and y are linearly related (reject H0)
if the p value of the F test is greater than alpha, what does this mean?
x and y are not linearly related
where do we get the significance level for regression?
the ANOVA table significance
where do we get the y intercept and slope for the regression equation?
from the coefficients table
the constant unstandardized B is the y intercept
the variable unstandardized B is the slope
if the variable unstandardized B value is -2.483, what does this mean?
for each one-unit increase in x, there is a 2.483-unit decrease in y
t/f: the inference based on the confidence interval for the slope will match the inference of the F-test
true
if H0:a=0 is rejected will the confidence interval contain 0?
no
if H0:a=0 is failed to be rejected, will the CI contain 0?
yes
if the confidence interval is -7.891 to 0.001, does it contain 0? what does this mean?
yes, fail to reject H0
if the confidence interval is 0.001 to 7.891, does it contain 0? what does this mean?
no, reject H0
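The confidence interval rule can be written as a tiny check, shown here with the two intervals from the cards. The function name `slope_ci_decision` is hypothetical.

```python
def slope_ci_decision(lower, upper):
    """Decide on H0: a=0 from the confidence interval for the slope."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    contains_zero = lower <= 0 <= upper
    return "fail to reject H0" if contains_zero else "reject H0"

first = slope_ci_decision(-7.891, 0.001)   # interval includes 0
second = slope_ci_decision(0.001, 7.891)   # interval excludes 0
```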
what is the coefficient of determination, or goodness of fit?
r^2
a proportion of the total variation in y that is explained by x in regression
what does r^2 tell us?
the proportion of the total variation in y that is explained by x in regression
if r^2=0 what does this mean?
there is no linear relationship (x explains none of the variation in y)
if r^2=1 what does this mean?
there is a perfect relationship
what does the R coefficient tell us?
the strength and direction of the linear relationship
if the R is negative, is the slope positive or negative?
negative
t/f: r^2 tells us the % of y explained by x
true
if the r^2=.565, what % of the y is explained by x?
56.5%
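A sketch of r^2 as "explained variation over total variation", computed as 1 minus residual SS over total SS for a fitted line. The function name and data are made up; the last line only shows the conversion of 0.565 to a percentage.

```python
def r_squared(x, y):
    """Proportion of the total variation in y explained by x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    if ss_total == 0:
        raise ValueError("y has no variation to explain")
    ss_residual = sum((yi - (slope * xi + intercept)) ** 2
                      for xi, yi in zip(x, y))
    return 1 - ss_residual / ss_total

# Made-up data for illustration only
r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # about 0.6
as_percent = f"{0.565:.1%}"                         # '56.5%'
```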
if the data is linear, should we use Pearson or Spearman?
Pearson
how do we report simple linear regression?
Simple linear regression was conducted to investigate the linear relationship between [IV] and [DV].
Results indicated a [significant OR no] linear relationship between [IV] and [DV] (R2 = [], F(regression df, residual df) = [], p = []).
The slope coefficient was [], indicating the [DV] [increased/decreased] by [slope] for each one-unit increase in [IV].
The R2 indicated [R2*100]% of the variation in [DV] can be explained by the regression model with [IV].
what is the general goal of simple regression?
fit a “best” line to 2 continuous variables x and y
what are the specific goals of simple regression?
test the linear x-y relationship via slope (a)
estimate a and b
use estimated regression equation to predict y bar for values of x
estimate a proportion of total variation in y explained by x in regression
what is the difference bw correlation and regression?
correlation is a more general method that can be done any time regression can be done, but not vice versa bc regression requires prediction bw IV and DV
regression provides more info w/x-y prediction.
t/f: in simple linear regression, the correlation coefficient is a “companion” to the slope and provides related info regarding the slope
true
what does multiple regression test?
whether more than 1 IV can be used to predict a DV (y)
t/f: the more predictors (IVs) the more accuracy in predicting the y
true
what is the multiple regression equation?
y_i = a1(x1_i) + a2(x2_i) + … + ap(xp_i) + b + epsilon_i
what does a1 represent in the multiple regression equation?
the regression coefficient for its predictor x1
a slope
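A rough sketch of how the slopes a1, a2, … and the intercept b could be estimated, assuming ordinary least squares solved through the normal equations with plain Gauss-Jordan elimination. The function name and data are made up, and the data are noise-free (y = 2*x1 + 3*x2 + 1) so the known coefficients can be recovered. In practice statistics software does this step.

```python
def fit_multiple_regression(rows, y):
    """rows: one list of predictor values per case. Returns (slopes, intercept)."""
    n = len(rows)
    # Design matrix with a trailing column of 1s for the intercept b
    design = [list(row) + [1.0] for row in rows]
    k = len(design[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix
    aug = [
        [sum(design[i][r] * design[i][c] for i in range(n)) for c in range(k)]
        + [sum(design[i][r] * y[i] for i in range(n))]
        for r in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    beta = [aug[r][k] / aug[r][r] for r in range(k)]
    return beta[:-1], beta[-1]

# Made-up, noise-free data generated from y = 2*x1 + 3*x2 + 1
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
outcome = [9, 8, 19, 18, 26]
slopes, intercept = fit_multiple_regression(predictors, outcome)
```

Each slope is read like the simple regression slope, except that it describes the change in y for a one-unit change in its own predictor with the other predictors held fixed.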
what is the process of multiple regression testing?
1) conduct the F test of H0: a1=a2=a3…=ap=0
–> 1st determine if any of the coefficients are non-zero (F test)
–> H0 means none of the variables are linearly related to the DV
2) if p is less than or equal to alpha in #1, run t tests on each individual coefficient (a1=0, a2=0, a3=0, etc)
–> 1 sample t test (comparing each coefficient to a constant (0))
–> some coefficients are non-zero, so determine which ones
3) if p > alpha in #1, no further testing is needed
–> none of the variables are related to the DV
–> all coefficients are treated as zero, so there is no need to test further
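The testing sequence above can be sketched as a decision function that takes p values already produced by statistics software. The function name, the predictor names, the p values, and the default alpha of 0.05 are all assumptions for illustration.

```python
def significant_predictors(f_test_p, coefficient_ps, alpha=0.05):
    """Follow the F test first, then the individual t tests if warranted.

    f_test_p: p value of the overall F test.
    coefficient_ps: dict mapping each predictor name to its t test p value.
    Returns the predictors whose coefficients are judged non-zero.
    """
    if f_test_p > alpha:
        # Overall test not significant: stop, no individual t tests
        return []
    # Overall test significant: check which coefficients are non-zero
    return [name for name, p in coefficient_ps.items() if p <= alpha]

# Hypothetical p values for illustration only
t_test_ps = {"x1": 0.003, "x2": 0.40, "x3": 0.02}
stop_early = significant_predictors(0.20, t_test_ps)
follow_up = significant_predictors(0.01, t_test_ps)
```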