Unit 12: Correlation Flashcards
explain the difference between t and z test
- 1-sample t-test: you have the value you’re testing against (the population mean) but NO population SD
- z-test: you have the population mean & SD
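A minimal sketch of the difference, using only the Python standard library. The function names and the sample scores are made up for illustration; the only point is that the z statistic uses a known population SD, while the t statistic has to estimate it from the sample.

```python
import math
import statistics

def z_statistic(sample, pop_mean, pop_sd):
    """z-test statistic: population mean AND population SD are known."""
    n = len(sample)
    return (statistics.mean(sample) - pop_mean) / (pop_sd / math.sqrt(n))

def t_statistic(sample, pop_mean):
    """1-sample t statistic: no population SD, so the sample SD stands in."""
    n = len(sample)
    sample_sd = statistics.stdev(sample)  # divides by n - 1
    return (statistics.mean(sample) - pop_mean) / (sample_sd / math.sqrt(n))

scores = [102, 98, 110, 105, 95, 108, 101, 99]  # hypothetical sample
print(z_statistic(scores, pop_mean=100, pop_sd=15))
print(t_statistic(scores, pop_mean=100))
```

The t statistic is compared against a t distribution with n - 1 degrees of freedom, the z statistic against the standard normal distribution.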
Correlations are between what kinds of variables
- between two continuous variables
- between a dichotomous variable and continuous one
- between an ordinal variable and a continuous one
not looking at group differences, but looking at the associations between variables.
* If two variables are correlated it means that they co-vary.
* Does not imply causation
response variable
dependent
explanatory variable
independent
A researcher would like to know if a mother’s height
can explain how tall her child will be. Which is the
response variable?
a. child’s height
b. mother’s height
c. father’s height
a. child’s height
what do we use correlations for
Two variables that correlate means that as one variable changes, so
does the other. They co-vary.
- A statistically significant correlation indicates that a relation is present
- Null hypothesis: there is no correlation between the variables (or the correlation between the variables is 0)
- Correlations are very flexible
- When two variables are correlated…
- The correlation coefficient quantifies what is common between variables.
The Scatter Plot
Shows the relationship between two quantitative variables measured on the same individuals.
- The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis.
- Each individual corresponds to one point on the graph.
The scatter plot is a visual representation of data, plotting two data distributions in one figure (i.e., two values or scores for each individual)
what does a scatter plot line mean
- The amount of scatter in the points that are plotted suggests the strength of the relationship between variables.
- A positive relationship emerges when the data scatters from the lower left to the upper right.
- A negative relationship emerges when the data scatters from the upper left to the lower right.
After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern …
- Form: linear, curved, clusters, no pattern
- Direction: positive, negative, no direction
- Strength: how closely the points fit the “form” (tightly clustered versus widely scattered)
[Figure: example scatter plots labeled “negative” and “zero”]
how do we interpret scatterplots? explain negative and positive association
Correlation Values
- Correlations range from -1.0 to +1.0.
- Values of exactly +/- 1 are perfect correlations; values closer to +/- 1 indicate stronger correlations.
- A positive correlation is indicated by a positive value
- A negative correlation is indicated by a negative value
- The correlation coefficient is a measure of the direction and
strength of a linear relationship.
How do you get the correlation coefficient?
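One way to get it by hand: divide the sum of the cross-products of the deviations from each mean by the square root of the product of the two sums of squared deviations. A minimal sketch follows; `pearson_r` and the height data are hypothetical names and values chosen for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson r: how much x and y vary together, relative to how much
    each varies on its own."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations (the co-variation)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either variable never changes
    return sum_xy / math.sqrt(sum_xx * sum_yy)

mother_height = [60, 62, 64, 66, 68]  # made-up data, inches
child_height = [61, 63, 64, 68, 69]
print(round(pearson_r(mother_height, child_height), 3))
```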
The Strength of a Correlation
The sign of the relationship between two variables has nothing to do
with its strength.
* Rule of thumb to determine the strength of a correlation, based on the absolute value of r (Visual Statistics, 2009):
* 0 to .3 are considered “weak” correlations
* .3 to .7 are considered “moderate” correlations
* .7 and above are considered “high” correlations
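The rule of thumb can be sketched as a small function. `strength_label` is a hypothetical helper; the quoted ranges overlap at exactly .3 and .7, so assigning those boundary values to the higher category is an assumption made here.

```python
def strength_label(r):
    """Label a correlation with the rule-of-thumb cutoffs quoted above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign gives direction only, not strength
    if size < 0.3:
        return "weak"
    if size < 0.7:  # assumption: exactly .3 counts as moderate
        return "moderate"
    return "high"   # assumption: exactly .7 counts as high

for r in (-0.85, 0.85, 0.45, -0.10):
    print(r, strength_label(r))
```

Note that -0.85 and 0.85 get the same label: they differ in direction, not in strength.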
Correlation: properties of r
- -1 ≤ r ≤ 1
- The sign indicates the direction of association
- positive association: r > 0
- negative association: r < 0
- no linear association: r ≈ 0
- The closer r is to ±1, the stronger the linear association
- r has no units and does not depend on the units of measurement
- The correlation between X and Y is the same as the correlation between Y and X
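Two of these properties are easy to check numerically. In the sketch below (made-up height and weight data, hypothetical `pearson_r` helper), converting inches to centimeters and pounds to kilograms leaves r unchanged, and swapping X and Y gives the same value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviations around each mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

height_in = [60, 62, 64, 66, 68]       # inches (made-up)
weight_lb = [115, 120, 130, 128, 140]  # pounds (made-up)
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.4536 for w in weight_lb]

r_original = pearson_r(height_in, weight_lb)
r_converted = pearson_r(height_cm, weight_kg)  # different units
r_swapped = pearson_r(weight_lb, height_in)    # X and Y exchanged

print(math.isclose(r_original, r_converted))
print(math.isclose(r_original, r_swapped))
```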
The Coefficient of Determination
- The proportion of variance in either variable explained by the other variable.
- This value is the Pearson correlation value squared: r²xy
- For r = .7, r² = .49
- 49% of variation in x explained by y OR
- 49% of variation in y explained by x
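The arithmetic as a short sketch (`coefficient_of_determination` is a hypothetical helper name). Because r is squared, a correlation of -.7 explains the same share of variation as +.7.

```python
def coefficient_of_determination(r):
    """Square the correlation to get the proportion of shared variation."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2

for r in (0.7, -0.7, 0.3):
    share = coefficient_of_determination(r)
    print(f"r = {r:+.1f}: about {share:.0%} of the variation is shared")
```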
why is correlation useful
Establishing reliability and validity
* Test-retest reliability
* If you just run a t-test between the two occasions, you may find a statistically significant difference even on a reliable exam. A “testing effect” can lead people to do better the second time they take it.
* However, even if they do better the second time around, if the exam is reliable, scores from the two occasions should be strongly correlated.
* How different constructs are related to each other
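The test-retest point can be illustrated with made-up scores in which every person improves by 4 or 5 points on the second occasion. The mean shifts, which is what a t-test would pick up, yet the ordering of people is preserved, so the correlation stays high. The `pearson_r` helper and all of the data are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation computed from deviations around each mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

first_attempt = [70, 75, 80, 85, 90]   # made-up exam scores
second_attempt = [74, 80, 84, 90, 95]  # everyone does a bit better

gains = [s - f for f, s in zip(first_attempt, second_attempt)]
mean_gain = statistics.mean(gains)
r = pearson_r(first_attempt, second_attempt)

print("mean gain:", mean_gain)  # scores shift upward (testing effect)
print("test-retest r:", round(r, 3))  # rank order is preserved
```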
The Pearson Correlation
The most commonly-used correlation value is the Pearson Correlation
(formally the Pearson Product Moment Correlation), with the following characteristics (think assumption checks):
* The correlation is bivariate—there are just two variables involved.
* Both variables are measured on at least an interval scale (i.e., continuous data).
* The variables have a linear relationship.
* No significant outliers.
* The sampling distribution is approximately normal.
* (Usually satisfied when your sample size is large)
Bivariate correlations
refer to the relationship between two variables.
* There can be correlations between:
* nominal variables (think dichotomous variables),
* ordinal variables,
* interval/ratio variables,
* and variables of different data scales.
The Point-Biserial Correlation
- The point-biserial correlation: the relationship between a dichotomous variable and a continuous variable
- Dichotomous variable must be coded 0 and 1
- Additional assumption:
- Equal variances in Y for each of the categories of the dichotomous variable
- Calculated the same way as the Pearson correlation
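A sketch of why the calculation is the same: running the ordinary Pearson formula on a 0/1-coded variable gives the same number as the group-means form of the point-biserial, (M1 - M0) / SD × sqrt(p × q), where SD is the population SD of all scores and p and q are the proportions in each group. Function names and data are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation computed from deviations around each mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def point_biserial_from_means(group, score):
    """Group-means form; `group` holds only the codes 0 and 1."""
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    p = len(ones) / len(score)   # proportion coded 1
    q = len(zeros) / len(score)  # proportion coded 0
    mean_diff = statistics.mean(ones) - statistics.mean(zeros)
    return mean_diff / statistics.pstdev(score) * math.sqrt(p * q)

group = [0, 0, 0, 1, 1, 1]    # dichotomous variable, coded 0 and 1
score = [2, 4, 6, 8, 10, 12]  # continuous variable (made-up)

print(round(pearson_r(group, score), 4))
print(round(point_biserial_from_means(group, score), 4))
```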
What happens when you fail the assumption checks?
- Spearman Rho and Kendall’s tau
- No need to be “normal”
- Ordinal scale is okay!
- Not too many ties
- Kendall’s tau-b corrects for ties
- Puts them into rank order and looks for monotonic relations
- Positive: Lower on variable A is associated with lower on variable B
- Negative: lower on 1 variable is associated with higher on another variable
Spearman Rho
It assesses how well the relationship between two variables can be described using a monotonic function
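A minimal sketch of the idea: replace each score with its rank, then compute the Pearson correlation on the ranks. Giving tied values the average of their ranks is a common convention and is an assumption here, as are the helper names and the data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviations around each mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # always increasing, but curved rather than linear

print(round(pearson_r(x, y), 3))
print(spearman_rho(x, y))
```

Here the relationship is perfectly monotonic but not linear, so rho reaches 1 while Pearson r falls short of it.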
Kendall’s tau
Kendall’s Tau is a non-parametric measure of the relationship between columns of ranked data. The Tau correlation coefficient returns a value from -1 to 1, where 0 is no relationship, 1 is a perfect positive relationship, and -1 is a perfect negative relationship.
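A sketch of the simplest version, often called tau-a: count the pairs of individuals ordered the same way on both variables (concordant) and the pairs ordered in opposite ways (discordant), then divide the difference by the total number of pairs. This version does not correct for ties the way tau-b does, and `kendall_tau_a` is a hypothetical helper name. It returns values from -1 to +1.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 values")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:    # both variables order this pair the same way
            concordant += 1
        elif direction < 0:  # the variables order this pair in opposite ways
            discordant += 1
        # direction == 0 means a tie, which tau-a counts as neither
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # identical ordering
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # fully reversed ordering
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # one pair out of order
```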
monotonic function
A monotonic function is a function which is either entirely nonincreasing or nondecreasing
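As a rough check of the definition, the sketch below takes a function's outputs listed in order of increasing input and tests whether they never go down or never go up. `is_monotonic` is a hypothetical helper name.

```python
def is_monotonic(values):
    """True if the sequence is entirely nondecreasing or entirely nonincreasing."""
    pairs = list(zip(values, values[1:]))  # each value with its successor
    nondecreasing = all(a <= b for a, b in pairs)
    nonincreasing = all(a >= b for a, b in pairs)
    return nondecreasing or nonincreasing

print(is_monotonic([1, 2, 2, 5]))  # flat stretches are allowed
print(is_monotonic([5, 3, 3, 1]))
print(is_monotonic([1, 3, 2]))     # goes up, then down
```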