W6 - Correlation Analysis Flashcards
What does a bivariate scatter plot show?
The relationship between 2 variables, both measured on the same sample of subjects.
What are the 3 main types of relationship that can be identified from a scatter plot, especially when using a line of best fit?
High-high low-low / +ive correlation
High-low low-high / -ive correlation
Little systematic tendency / 0 or NO correlation
What does it mean when (on a scatter plot) the points cluster around the line of best fit?
Stronger relationship between the 2 variables
What is a correlation coefficient?
Numerical value indicating the extent to which 2 variables are related.
A numerical summary of a bivariate relationship.
How does the correlation coefficient indicate the strength of the relationship between the 2 variables?
The closer the coefficient gets to 1 or -1 = the stronger the relationship.
So the further away from 0 = the stronger the relationship.
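On the correlation coefficient scale (-1 to 1), where is High-low low-high / -ive correlation found?
Towards -1 (exactly -1 = perfect -ive correlation)
On the correlation coefficient scale (-1 to 1), where is High-high low-low / +ive correlation found?
Towards 1 (exactly 1 = perfect +ive correlation)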
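Worked example: a minimal Python sketch of how a correlation coefficient could be computed by hand, assuming the standard Pearson formula (sum of cross-products of deviations, divided by the square root of the product of the two sums of squares). The function name pearson_r and the speed/temperature numbers are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: running speed vs final body temp
speed = [8, 9, 10, 11, 12]
temp_up = [37.0, 37.4, 37.5, 37.9, 38.2]    # high-high, low-low
temp_down = [38.2, 37.9, 37.5, 37.4, 37.0]  # high-low, low-high

r_pos = pearson_r(speed, temp_up)
r_neg = pearson_r(speed, temp_down)
print(round(r_pos, 3), round(r_neg, 3))
```

r_pos lands near the +1 end of the scale and r_neg near the -1 end.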
What must be included in a hypothesis?
Effect of interest (difference/relationship)
Variables
Population of interest
What is another way of saying we have homogeneity of variance?
Saying the data is HOMOSCEDASTIC
What is another way of saying we don't have homogeneity of variance?
Saying the data is HETEROSCEDASTIC
How can you work out how much of the variance in one variable (e.g. final body temp) is explained by another (e.g. running speed)?
Variance explained = coefficient of determination (r^2, also written R^2)
Then express it as a decimal or %
Does correlation mean causation?
Not necessarily
Does correlation mean agreement?
Not necessarily
The correlation coefficient between two variables is r = .90. How much common variance do they share? i.e. what is the coefficient of determination?
r^2 = 0.90^2 = 0.81 = 81%
What is the correlation coefficient also known as?
r value
What is the common variance or variance shared between 2 variables if the relationship between them is represented by a correlation coefficient of 0.8?
r^2 = 0.8^2 = 0.64 = 64%
What is the % of UNaccounted variance (variance not shared) between 2 variables if the relationship between them is represented by a correlation coefficient of 0.35?
r = 0.35
r^2 = 0.1225
Variance shared = 12.25%
So variance NOT shared = 100% - 12.25% = 87.75%
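Worked example: the r to r^2 arithmetic from the last three cards, as a short Python sketch. The function names are made up for illustration; the r values are the ones used in the cards.

```python
def shared_variance_pct(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    return r ** 2 * 100

def unshared_variance_pct(r):
    """Percentage of variance NOT shared between the 2 variables."""
    return 100 - shared_variance_pct(r)

# r values taken from the cards above
for r in (0.90, 0.8, 0.35):
    shared = shared_variance_pct(r)
    unshared = unshared_variance_pct(r)
    print(f"r = {r:.2f}: shared = {shared:.2f}%, not shared = {unshared:.2f}%")
```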
If there’s a strong correlation between the values given by 2 variables, what are the 2 things we can’t say for certain when interpreting that relationship?
CAN’T know for certain that the relationship is CAUSAL.
OR
that (in the case of 2 measures of the same thing) the measures agree w/ each other.
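Worked example: a minimal Python sketch of why a strong correlation does not mean agreement. The two thermometers and their readings are hypothetical: thermometer B always reads 0.5 higher than A, so the correlation is perfect even though the two never give the same value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings from 2 measures of the same thing
thermometer_a = [36.5, 37.0, 37.5, 38.0, 38.5]
thermometer_b = [a + 0.5 for a in thermometer_a]  # constant offset

r = pearson_r(thermometer_a, thermometer_b)
# Average disagreement between the 2 measures
mean_diff = sum(b - a for a, b in zip(thermometer_a, thermometer_b)) / len(thermometer_a)
print(r, mean_diff)
```

Here r comes out as a perfect correlation while the mean difference stays at 0.5, so the measures are related but do not agree.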
Which assumptions can be examined using only a scatterplot?
Linearity
Homogeneity of variance
How can you find what the common variance between variables is?
Find the r value
Square it = r^2 value
r^2 value (e.g. 0.444 or 44.4%) = variance shared
What are the assumptions of Pearson's correlation?
Normality
Linearity
Homogeneity of variance
No obvious outliers
Using knowledge of Pearson's correlation, how do you test its assumption of normality?
Look at the skewness + kurtosis values for both variables.
If both are within their acceptable ranges = there's normality!
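Worked example: a minimal Python sketch of where skewness and kurtosis values come from, assuming the simple moment-based formulas. Stats packages often apply a small-sample correction, so their output may differ slightly from this. The function name and the scores are made up, and no cut-off is hard-coded because the cards do not state the acceptable ranges.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (0 and 0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    return skew, excess_kurt

# Hypothetical, perfectly symmetric scores
scores = [1, 2, 3, 4, 5]
skew, kurt = skewness_and_kurtosis(scores)
print(skew, kurt)
```

Run it once per variable, then compare each value against whatever acceptable range your course or software specifies.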
Using knowledge of Pearson's correlation, how do you test its assumption of linearity?
Look at whether the pattern of points on the scatter plot appears linear (straight) rather than curved.
Using knowledge of Pearson's correlation, how do you test its assumption of homogeneity of variance?
Look at the scatterplot: the spread of y-axis values should seem similar across the range of x-axis values.
So there's no fanning.
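Worked example: a rough numeric stand-in for eyeballing fanning, sketched in Python. This is not the method in the cards (which is just to look at the scatterplot): it fits a least-squares line, then compares the spread of the residuals in the lower half of the x range with the upper half. The function name and the data are made up, and the data are built to fan out on purpose.

```python
import math

def residual_spread_by_half(x, y):
    """Spread (RMS) of residuals around the best-fit line, for low-x and high-x halves."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares line of best fit
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals ordered from lowest x to highest x
    pairs = sorted(zip(x, y))
    residuals = [b - (intercept + slope * a) for a, b in pairs]
    half = n // 2

    def rms(vals):
        return math.sqrt(sum(v * v for v in vals) / len(vals))

    return rms(residuals[:half]), rms(residuals[half:])

# Hypothetical data where the scatter widens as x increases
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_fan = [1.1, 1.9, 3.2, 3.7, 5.6, 5.2, 8.0, 6.8]

low_spread, high_spread = residual_spread_by_half(x, y_fan)
print(round(low_spread, 2), round(high_spread, 2))
```

A much larger spread in one half than the other is the numeric version of a fan shape (heteroscedastic); similar spreads suggest homoscedastic data.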
Using knowledge of Pearson's correlation, how do you test its assumption of no obvious outliers?
By looking at the scatterplot for points that sit far away from the rest.