Lecture 13: The Correlational Research Design Flashcards
correlational research
- Intended to demonstrate the existence of a relationship between two variables
- It does not determine cause-and-effect relationships
experimental research
demonstrates a cause-and-effect relationship between two variables
what do correlations describe?
the nature of the relationship
the nature of a relationship
includes its direction and degree
correlational data collection
- No manipulations
- Just measures variables
external validity of correlational research
high
examples of correlational research
- The price of a box of chocolates and its quality (marketing)
- Caffeine intake and alertness (basic research)
- Movie topics and music preferences (art design)
unit of analysis of correlational research
- The unit of analysis can be either a time point or a person
- Usually, in psychological research, it is a person
assumptions of scatter plots
- Each item/person is represented by only one data point
- Each point in a dataset is independent of other points
visualizing correlational associations
The closer the points are to the line, the greater the association between variables
predicting correlations
Knowing the score on one dimension allows us to predict the score on the other dimension
quantitative representation of correlations
The correlation coefficient (r), which ranges from -1.0 to +1.0
when do we use Spearman’s rho
if one of the variables being correlated is ordinal
when do we use Pearson’s r
When the two variables are on a ratio or interval scale
For both Spearman and Pearson correlations, we want to know:
- Form (linear or nonlinear)
- Sign (+ or -)
- Strength (absolute value between 0 and 1)
linear correlation
- Change in one variable is consistent with change in another variable
- Makes a straight line
nonlinear correlation
change in one variable is not consistent with change in another variable
Spearman’s correlation
- Measures monotonic relationships, where there is a consistent directional relationship between x and y but not a constant amount of change
- Computed on rank values (smallest to largest)
- Used most with ordinal scale data
- Range= -1 to 1
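The "computed on rank values" step can be sketched in Python. This is only an illustration, not the lecture's own procedure: the helper names (`pearson_r`, `ranks`, `spearman_rho`) and the sample scores are hypothetical, and tied values are assumed to share the average of the ranks they span, which is a common convention.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

def ranks(values):
    """Rank values from smallest (1) to largest (assumed: ties share the average rank)."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for v in values:
        first = ordered.index(v) + 1        # 1-based position of the first copy of v
        last = n - ordered[::-1].index(v)   # 1-based position of the last copy of v
        result.append((first + last) / 2)
    return result

def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two sets of ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical ordinal data: two judges ranking the same five items
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))
```

Because only the ordering of the scores enters the calculation, any transformation that preserves the ordering leaves the result unchanged.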
monotonic relationship
a relationship where each of the two variables has values that continue in one direction or stay the same (neither variable can reverse direction)
You should use the Spearman correlation when:
- The data are on an ordinal scale
- The relationship is monotonic
- There are at least 5 pairs of data; preferably > 8 pairs
when are ranks meaningful?
when there are not too many or too few pairs
what does Spearman’s correlation coefficient measure?
the strength and direction of the association between two ranked variables
interpreting Spearman’s Rho values
weak= 0.21-0.40
moderate= 0.41-0.60
strong= 0.61-0.80
very strong= 0.81-1.00
the Pearson correlation
- Measures linear relationships, where scores cluster around a straight line
- Y changes consistently and constantly with x
- Used most with interval and ratio scale data
- Range= -1 to 1
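A minimal sketch of how a Pearson r can be computed, assuming the standard formula (sum of products of deviations divided by the square root of the product of the two sums of squares). The function name `pearson_r` and the scores are hypothetical and chosen so that every point falls exactly on a straight line.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: SP / sqrt(SS_x * SS_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # SP: sum of products of deviations from the two means
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # SS: sum of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4]
y_up = [3, 5, 7, 9]     # y rises by a constant 2 for each 1-point rise in x
y_down = [9, 7, 5, 3]   # y falls by a constant 2 for each 1-point rise in x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(r_up, r_down)
```

The two results sit at opposite ends of the range: the first line slants up to the right, the second slants down.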
what does the term correlation usually refer to?
a Pearson correlation because most behavioural research uses interval or ratio scale data
positive r value
- when larger values of one variable are associated with larger values of another variable (or smaller with smaller)
- r > 0 when x increases, y increases (or when x decreases, y decreases)
negative r value
- when larger values of one variable are associated with smaller values of another
- r < 0 when x increases, y decreases (or when x decreases, y increases)
what does the direction of the correlation indicate?
the nature of the change in the variables
positive linear correlation
- High scores on one variable are matched by high scores on another
- The line slants up to the right
negative linear correlation
- High scores on one variable are matched by low scores on another
- The line slants down to the right
no linear correlation
- No line, straight or otherwise, can be fit to the relationship between the two variables
- Two variables are uncorrelated
linear relationship
the data points in the scatter plot tend to cluster around a straight line
positive linear relationship
- each time the x variable increases by one point, the y variable increases in a consistently predictable amount
- A Pearson correlation describes and measures linear relationships when both variables are numerical scores from interval or ratio scales
nonlinear relationship
- The data points do not cluster around a straight line
- A Spearman correlation describes and measures monotonic nonlinear relationships when one or more variables are ordinal
strength of the correlation
- The degree of association or consistency tells us the strength of the relationship (correlation)
- It is expressed mathematically as a correlation coefficient from -1 to +1
- The stronger the association the closer to -1/+1
interpreting Pearson correlations
no relationship= < 0.10
weak relationship= 0.10-0.30
moderate relationship= 0.30-0.70
strong relationship= 0.70-1.00
Spearman vs. Pearson correlation
- Spearman correlation of 1 results when the two variables are monotonically related, even if their relationship is not linear
- This means that all data points with greater x values than that of a given data point will have greater y values as well
- When the data are roughly elliptically distributed (variance on both factors) and there are no prominent outliers, the Spearman correlation and Pearson correlation give similar values
correlation coefficients for nonmonotonic relationships
Both Pearson and Spearman fail (yield values close to 0) for nonmonotonic relationships
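The contrast between monotonic and nonmonotonic relationships can be checked numerically. The sketch below is illustrative only: the helpers (`pearson_r`, `ranks`, `spearman_rho`) are hypothetical names, ties are assumed to share the average rank, and the data are made up (a cubic curve for the monotonic case, a U-shaped curve for the nonmonotonic case).

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

def ranks(values):
    ordered = sorted(values)
    n = len(ordered)
    # Average of the first and last 1-based positions, so ties share a rank
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2 for v in values]

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

# Monotonic but nonlinear: y always increases with x, by varying amounts
x_mono = [1, 2, 3, 4, 5]
y_mono = [v ** 3 for v in x_mono]
r_mono, rho_mono = pearson_r(x_mono, y_mono), spearman_rho(x_mono, y_mono)

# Nonmonotonic (U-shaped): y falls, then rises
x_u = [-2, -1, 0, 1, 2]
y_u = [v ** 2 for v in x_u]
r_u, rho_u = pearson_r(x_u, y_u), spearman_rho(x_u, y_u)

print(f"monotonic:    Pearson {r_mono:.3f}, Spearman {rho_mono:.3f}")
print(f"nonmonotonic: Pearson {r_u:.3f}, Spearman {rho_u:.3f}")
```

In the U-shaped case y is completely determined by x, yet both coefficients come out at zero, which is why a scatter plot should be inspected before trusting either number.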
outlier
- A data point that differs significantly from others in the set
- Can be an outlier on the X variable or the Y variable
outliers and the strength of the correlation
Outliers can greatly affect the strength of the correlation
how are correlations usually defined?
by the variance in the variable
outliers in Spearman vs. Pearson correlations
- The Spearman correlation is less sensitive to outliers than the Pearson correlation
- This is because Spearman’s ρ (rho) replaces each score with its rank, and a rank cannot be an extreme value
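A small sketch of the difference in outlier sensitivity, using made-up scores in which the last y value is far larger than the rest. The helper names are hypothetical and the ranking helper assumes ties share the average rank.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

def ranks(values):
    ordered = sorted(values)
    n = len(ordered)
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2 for v in values]

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y_clean = [1, 2, 3, 4, 5, 6]     # perfectly linear
y_outlier = [1, 2, 3, 4, 5, 60]  # same ordering, but the last score is extreme

r_clean = pearson_r(x, y_clean)
r_outlier = pearson_r(x, y_outlier)
rho_outlier = spearman_rho(x, y_outlier)

print(f"Pearson without outlier:  {r_clean:.2f}")
print(f"Pearson with outlier:     {r_outlier:.2f}")
print(f"Spearman with outlier:    {rho_outlier:.2f}")
```

The extreme score of 60 is simply the largest value, so it receives rank 6 just as the score of 6 did, and the rank-based coefficient does not move.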
significance of correlations
- Statistical significance suggests that a relationship is unlikely to be the result of chance (typical p < .05)
- The probability (alpha) is < 5% that this correlation would have been this large (or larger) due to chance alone
- Most likely represents a real relationship that exists in the population
sample size and correlations
- Small sample sizes are prone to producing large correlations, so the criterion for statistical significance becomes more stringent
- As n increases, so does the likelihood that a relationship found in the sample exists in the population
how is statistical significance determined?
by consulting a table that takes into account sample size and alpha (p) level
statistical significance
- Related to the p-value, which depends on n, df, and the size of the correlation
- Possible to have a small r but it can be statistically significant if the sample size is big enough
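To see how the same small r can be non-significant in a small sample and significant in a large one, the sketch below uses the Fisher z approximation rather than a lookup table. This is a swapped-in technique, not the lecture's method: the function name `approx_p_value` is hypothetical, the p-value is approximate, and it assumes n > 3.

```python
from math import atanh, sqrt
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-tailed p-value for 'no correlation in the population'.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated as
    a standard normal score. Assumes n > 3 and -1 < r < 1.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same small correlation evaluated at two hypothetical sample sizes
p_small_n = approx_p_value(0.10, 30)
p_large_n = approx_p_value(0.10, 1000)

print(f"r = 0.10, n = 30:   p = {p_small_n:.3f}")
print(f"r = 0.10, n = 1000: p = {p_large_n:.4f}")
```

Whether a significant r of 0.10 matters in practice is a separate question from whether it is statistically significant.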
practical significance
Related to any meaningful, real-world consequences of the observed correlation
correlations with a small n
- It is easy to obtain strong correlations with small samples when there is no relationship between the variables
- When n = 2, we always get a correlation of 1.0 or -1.0
- As the sample size increases, it becomes more likely that correlation from the sample reflects a real relationship in the population
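The n = 2 case can be checked directly: two points always lie exactly on a straight line, so r can only be +1 or -1. The sketch below is illustrative; `pearson_r` is a hypothetical helper, and the random scores are drawn so that the two x values differ and the two y values differ (otherwise r is undefined).

```python
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
results = set()
for _ in range(1000):
    x = rng.sample(range(100), 2)  # two distinct, unrelated x scores
    y = rng.sample(range(100), 2)  # two distinct, unrelated y scores
    results.add(round(pearson_r(x, y), 9))

print(results)
```

Even though x and y are generated independently here, every sample of two produces a "perfect" correlation, which is why small-sample correlations need a stricter significance criterion.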
coefficient of determination
the shared variance; the percentage of variability in one variable (x) that can be accounted for by variability in the other variable (y)
what measurement scale are correlations on?
ordinal; they do not increase in equal increments
correlations and variability
- Correlations help explain some parts of the variability in x and y scores
- Other different variables can also explain variability in x, y
how do we represent the portion in variability shared by two variables?
- with Venn diagrams
- The larger the degree of overlap, the greater the strength of the correlation
r²
proportion of shared variance/variance accounted for/ coefficient of determination
sign of r²
it is never negative, because squaring removes the sign of r
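A short worked example of the arithmetic, using hypothetical r values. The function name `coefficient_of_determination` is made up for illustration.

```python
def coefficient_of_determination(r):
    """Proportion of shared variance: the square of the correlation coefficient."""
    return r ** 2

for r in (0.10, -0.30, 0.50, -0.90):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:+.2f}  ->  r squared = {r_squared:.2f}  ({r_squared:.0%} of variance shared)")
```

A negative and a positive correlation of the same size share the same proportion of variance; the sign of r only describes direction.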
statistical evaluation of correlations
- You can use the coefficient of determination to measure the percentage of variability in one variable that is determined by its relationship with the other
- Sometimes a variance of 3% is a lot but other times it’s meaningless
- In the behavioural sciences, it is usual to predict only a small proportion of the variance (< 70% of the variance)
interpreting the coefficient of determination
small= 0.01
medium= 0.09
large= 0.25
advantages of correlational methods
- Often quick and efficient
- Often the only method available for practical or ethical reasons
- High external validity
limitations of correlational methods
- Does not tell us why the two variables are related
- Low internal validity
- Very sensitive to outliers
- Directionality problem
- Third variable problem
directionality problem
we don’t know which variable causes which
third variable problem
there might be a third unidentified variable responsible for producing changes in x and y
assuming directionality from correlations
- The frequency of tub bathing was associated with a lower risk of cardiovascular disease among adults
- But, you can’t assume directionality from correlations
- People with lower-stress lifestyles may have time to take more baths
- SES may influence bathing rates and eating patterns
- High-temperature baths can exacerbate cardiovascular disease