Week 10: Tests of Relationships - Correlation; Regression Flashcards
Tests of difference (is group A different from group B / does this treatment cause this outcome) are used for what type of outcomes?
continuous outcomes
Tests of proportions (is group A different from group B / does this treatment cause this outcome) are used for what type of outcomes?
categorical outcomes
How do you test the relationship between two variables, A and B?
1. Measure correlation between A and B
2. Fit a regression model for A and B
Example: Does the male group differ from the female group in BMI level?
Test the difference in mean BMI between the male and female groups
Example: Is gender linked to a certain age group?
Compare the proportions of people older than 50 years between the male and female groups
Example: What is the relationship between BMI and DBP at baseline?
Does DBP increase as BMI increases?
Test relationship between DBP and BMI
When you look at the relationship between two variables -
Correlation
How do you assess the correlation between two variables in a data set?
1. Draw a scatter plot to visualize
2. Compute a correlation coefficient (r) to quantify
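The "compute r" step can be sketched in a few lines. This is a minimal illustration of the Pearson formula using only the standard library; the function name `pearson_r` and the exercise/weight-loss numbers are hypothetical and are not taken from the course material.

```python
import math

def pearson_r(x, y):
    """Pearson r: sum of cross-products of deviations, scaled by both spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: how X and Y vary together around their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: how much X and Y each vary on their own
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: weekly exercise time (hours) and weight loss (kg)
exercise_hours = [1, 2, 3, 4, 5, 6]
weight_loss_kg = [0.5, 1.1, 1.4, 2.2, 2.4, 3.1]

r = pearson_r(exercise_hours, weight_loss_kg)
print(round(r, 3))
```

A value close to +1 here would be read the same way as the r = 0.96 card: weight loss rises as exercise time rises.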
If r = 0.96, how are weight loss and exercise time correlated?
- Weight loss is highly correlated with exercise time
- Weight loss increases as exercise time increases
Way of visualizing the relationship between two variables -
scatter plot
T/F A scatter plot can visually clarify the strength and shape of a relationship
True
T/F Correlation coefficient r is a measure of the linear association between two variables and ranges from 0 to 1
False, it ranges from -1 to 1
T/F The sign of r indicates the direction of the correlation
True
T/F The absolute value of r indicates the strength of the correlation
True
Interpretation of the correlation coefficient r: what do the ranges 0-.25, .25-.5, .5-.75, and .75-1 indicate?
- 0-.25 = little to no relationship
- .25-.5 = fair
- .5-.75 = moderate to good
- .75-1 = good to excellent
T/F Correlation Coefficient r values can be used as strict cutoff points.
False, values should NOT be used as strict cutoff points, as they are affected by:
- sample size
- measurement error
- the types of variables being studied
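The descriptive bands for r can be written as a small lookup, shown here only as a rough guide since the values are not strict cutoffs. The function name `describe_r` is hypothetical, and placing each boundary value (.25, .5, .75) in the higher band is an assumption; the cards do not say which band a boundary belongs to.

```python
def describe_r(r):
    """Return (strength label, direction) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    strength = abs(r)  # strength depends on the absolute value only
    # Assumption: a boundary value falls into the higher band
    if strength < 0.25:
        label = "little to no relationship"
    elif strength < 0.5:
        label = "fair"
    elif strength < 0.75:
        label = "moderate to good"
    else:
        label = "good to excellent"
    # Direction comes from the sign of r
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return label, direction

print(describe_r(0.96))
print(describe_r(-0.30))
```

The label is a starting point for interpretation; sample size, measurement error, and the kind of variables studied should still inform the final judgment.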
Correlation coefficient r is a measure of what type of relationship only?
- Linear relationship only
- A curvilinear relationship won’t be described accurately using the linear correlation coefficient r
For interpretation, what two things should you look at to determine whether the association in a data set is linear or curvilinear?
1. r value
2. scatter plot
What is the most commonly reported measure of correlation?
Pearson (product-moment) correlation coefficient
When is it appropriate to use the Pearson (product-moment) correlation coefficient?
when X and Y are continuous variables with underlying normal distributions
What is a nonparametric analog of the Pearson r?
Spearman rank correlation coefficient
When is it appropriate to use the Spearman rank correlation coefficient?
when X and Y are ordinal variables
When would you use phi coefficient?
- appropriate for use when both X and Y are dichotomous variables (each takes only 2 values)
- special case of the Pearson correlation coefficient, given only two values of X and Y
When would you use the point biserial correlation coefficient?
- appropriate when a dichotomous X is correlated with a continuous variable Y
- a special case of the Pearson correlation coefficient
Example: What coefficient would you use?
The data represent the motor and verbal skills of a group of 60 adults with traumatic brain injury. Scores are graded pass or fail, with 1 assigned to pass and 0 to fail.
Phi coefficient
Example: What coefficient would you use?
The data represents the developmental scores on tests of proximal (reaching) and distal (prehensile skill) behaviors in 12 normal infants, 30 weeks of age
Pearson (Product-Moment) Correlation Coefficient
Example: What coefficient would you use?
The data represent ratings of elbow flexor spasticity (resistive force in kilograms) for patients who have had a stroke on the right (1) or left (0) side
Point Biserial Correlation Coefficient
Example: What coefficient would you use?
The data represents the scores of verbal and reading comprehension for a sample of 10 children with learning disability
Spearman Rank Correlation Coefficient
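The four "What coefficient would you use?" examples follow one decision rule based on the variable types. A minimal sketch of that rule is below; the function name `choose_coefficient` and the type labels are hypothetical, and combinations the cards do not cover are rejected rather than guessed.

```python
def choose_coefficient(x_type, y_type):
    """Pick a correlation coefficient from the types of X and Y."""
    types = {x_type, y_type}  # correlation is symmetric, so order does not matter
    if types == {"continuous"}:
        return "Pearson product-moment"
    if types == {"ordinal"}:
        return "Spearman rank"
    if types == {"dichotomous"}:
        return "phi"
    if types == {"dichotomous", "continuous"}:
        return "point biserial"
    raise ValueError(f"combination not covered by these cards: {x_type}, {y_type}")

# The four examples, in card order
print(choose_coefficient("dichotomous", "dichotomous"))  # pass/fail motor and verbal skills
print(choose_coefficient("continuous", "continuous"))    # infant developmental scores
print(choose_coefficient("dichotomous", "continuous"))   # side of stroke vs. force in kg
print(choose_coefficient("ordinal", "ordinal"))          # verbal and reading comprehension
```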
Precautions in Using Correlation Coefficient:
Nonlinearity -
- Pearson correlation coefficient is all about linear relationship
- A strong curvilinear relationship may be identified as no correlation (r=0)
Precautions in Using Correlation Coefficient:
Outlier -
A single point can have a large influence on the correlation
Precautions in Using Correlation Coefficient:
Interpretation is subjective -
- No hard-and-fast rules determine whether an r value is strong, moderate, or weak
- Interpretation should be based on the nature of the data, the purpose of the research, and the researcher’s knowledge of the subject matter
Precautions in Using Correlation Coefficient:
Causation -
- No causal relationship can be determined based on the correlation coefficient value
- Correlation of A and B is the same as correlation of B and A
- Correlation cannot be used to establish a cause-and-effect situation
What do you use when you look at the relationship between two variables in a cause-and-effect situation?
Regression
How do you calculate a regression line?
1. Draw a scatter plot with a regression line to visualize
2. Compute a coefficient of determination (R^2) to quantify
Example: If r = 0.96 and R^2 = 0.92, what is the relationship between weight loss and exercise time?
- Weight loss is highly correlated with exercise time
- Exercise time predicts weight loss
Regression is used to predict values of one variable from another: x -> y. What are x and y?
x = independent (predictor/explanatory) variable
y = dependent (outcome/response) variable
What is used to examine the causal relationship between two variables, X and Y, that are linearly related?
Linear regression
Y = a + bX
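The cards give the form of the line but not how a and b are obtained. The usual approach is ordinary least squares, sketched here under that assumption: b is the sum of cross-products divided by the sum of squares of X, and a makes the line pass through the two means. The function name `fit_line` and the age/blood pressure numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (a, b) for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x  # intercept: forces the line through (mean_x, mean_y)
    return a, b

# Hypothetical data: age in years (X) and systolic blood pressure in mmHg (Y)
age = [30, 40, 50, 60, 70]
sbp = [118, 124, 131, 139, 143]

a, b = fit_line(age, sbp)
predicted_at_55 = a + b * 55  # predicted Y for a 55-year-old
print(round(a, 2), round(b, 2), round(predicted_at_55, 2))
```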
What is used to examine the causal relationship between two variables, X and Y, when Y is binary?
Logistic regression
ln(p / (1 - p)) = a + bX, where p is the probability that Y = 1
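With a binary Y, logistic regression is usually written with the linear part a + bX on the log-odds scale, so a predicted probability comes from the inverse of that transformation. The sketch below shows only this conversion, not model fitting; the coefficient values and the function name are hypothetical.

```python
import math

def predicted_probability(a, b, x):
    """Convert the linear predictor a + bX (log-odds) to a probability."""
    log_odds = a + b * x
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical coefficients for a model of fall history (1 = fell, 0 = did not)
a, b = -4.0, 0.8

for x in (2, 5, 8):
    print(x, round(predicted_probability(a, b, x), 3))
```

Whatever the value of X, the result stays between 0 and 1, which is why this form suits a binary outcome better than a straight line does.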
Example: Linear or logistic regression
Mobility and functional characteristics could predict whether a person did or did not have a history of falls.
Logistic
Example: Linear or logistic regression
The data consist of systolic blood pressure (Y) and age (X) in a sample of 10 women. Use them to predict a woman’s blood pressure from her age
Linear
Example: Linear or logistic regression
Physical/psychological function as predictors of successful return to work one year after traumatic brain injury
Logistic
Logistic regression reports odds ratio (OR) to estimate what?
the odds of membership in the target group linked to the predictor variable
T/F Confidence intervals can also be determined for each OR for logistic regression
True
A significant OR will not contain what within its CI?
null value of 1.0
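The link between the OR, its confidence interval, and significance can be sketched numerically. This assumes the usual construction: the OR is the exponential of the logistic coefficient b, and an approximate 95% CI is the exponential of b plus or minus 1.96 standard errors. The coefficient and standard error values are hypothetical.

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """Return (OR, lower, upper, significant) from a coefficient and its SE."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    # Significant when the interval excludes the null value of 1.0
    significant = not (lower <= 1.0 <= upper)
    return odds_ratio, lower, upper, significant

# Hypothetical predictors with the same standard error but different coefficients
print(odds_ratio_ci(b=0.8, se=0.3))
print(odds_ratio_ci(b=0.2, se=0.3))
```

The first interval lies entirely above 1.0 and the second straddles it, so only the first predictor would be reported as significant.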
How to determine if assumptions for regression analysis have been met?
Patterns of residuals (Y-axis) plotted against predicted scores (X-axis).
If the residuals (Y-axis) plotted against predicted scores (X-axis) form a horizontal band, what does that mean?
demonstrates that assumptions for linear regression have been met
If the residuals (Y-axis) plotted against predicted scores (X-axis) form a curvilinear pattern, what does that mean?
indicates nonlinear relationship.
Coefficient of determination (R^2)
A proportion that indicates the accuracy of prediction based on X
What does R^2 represent?
percentage of the total variance in the Y scores that can be explained by the X scores
1-R^2 represents?
percentage of the total variance in Y not explained by the X scores
Example: For the regression of blood pressure on age, r = 0.87; R^2 = 0.76
- 76% (R^2) of the variance in systolic blood pressure is explained by age
- 24% (1 - R^2) of the variance in systolic blood pressure is not explained by age
What do the researchers use correlation for?
Identify the relationship between two variables
What do the researchers use regression analysis for?
Predict the effect of one variable upon another
T/F Correlation can be used to establish a cause-and-effect relationship
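False. Correlation cannot establish cause and effect; regression, not correlation, is used to model how a predictor affects an outcome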
T/F A hard-and-fast rule exists for interpreting a correlation coefficient as strong, moderate, or weak.
False, no hard and fast rule
T/F It is important to include all the data points to compute the correlation coefficient correctly
False
T/F A strong linear relationship can be identified by a high correlation coefficient value
True
T/F Correlation coefficient is the ratio of risks of exposure vs. non-exposure groups
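False. A ratio of risks is a risk ratio (relative risk), and the odds ratio is the corresponding ratio of odds; neither is a correlation coefficient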
T/F Coefficient of determination (R^2) is the percentage of variance in the predictor variable accounted for by the outcome variable
False, it is the variance in the outcome variable accounted for by the predictor variable
T/F Coefficient of determination (R^2) is the proportion of variance in the outcome accounted for by the predictor variable
True
T/F Correlation coefficient (r) value ranges from 0 to +1
False, -1 to +1
T/F Within the same study, the correlation coefficient (r) value can be computed by squaring the coefficient of determination (R^2) value
False, the opposite: R^2 is computed by squaring r
T/F Coefficient of determination (R^2) value ranges from -1 to 1
False, 0 to 1
Which of the following statistical procedures is appropriate when you want to study the relationship between two variables and are interested in knowing which would have caused the other?
Regression analysis
T/F Logistic regression is used to predict a continuous outcome variable
False, linear regression -> continuous and logistic -> categorical
T/F Coefficient of determination (R^2) is a good measure for an index of predicted variance
True
T/F Coefficient of determination (R^2) is a measure of linear association between two variables
False, r is
T/F Linear regression is used to predict a binary outcome variable
False, continuous outcome variable