CH20: Data Coefficients Flashcards
What does the line of best fit show?
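The general trend in the relationship between 2 variables; how closely the points cluster around the line gives a rough visual idea of how strong the correlation is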
What does Pearson’s correlation coefficient show and why is it better than the line of best fit?
Measures how strong the correlation between 2 variables is.
More precise than the line of best fit because it gives a numerical value, the correlation coefficient ‘r’, rather than a visual estimate
What do the following represent in the formula for Pearson’s correlation coefficient?
x, y, n, r
x = variable 1; y = variable 2; n = number of data entries of each variable; r = the correlation coefficient
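A minimal sketch of how x, y, n and r fit together, assuming the card refers to the standard sum-based form of Pearson's formula, r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)). The function name and the sample data are hypothetical and only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the sum-based formula."""
    n = len(xs)                                  # number of data entries
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of x*y products
    sum_x2 = sum(x * x for x in xs)              # sum of x squared
    sum_y2 = sum(y * y for y in ys)              # sum of y squared

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up data: y is exactly 2 * x, so the points lie on a straight line
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print(pearson_r(hours, scores))  # 1.0 (perfect positive correlation)
```

The sketch does not handle the case where every x (or every y) is identical, because the denominator would then be zero.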
Understanding the correlation coefficient; what does each value of ‘r’ signify?
r=1, r=-1, r=0?
r = 1: perfect positive correlation; r = 0: no correlation; r = -1: perfect negative correlation
Note: the correlation coefficient can only ever be between -1 and 1
What does the Coefficient of Determination show?
Calculates the proportion of the change in Y that is determined by the change in X
i.e. how directly related 2 variables are / the magnitude of the impact of one variable on another
Formula: coefficient of determination
r^2
1 = perfect correlation = the change in Y value was solely due to the change in X value
0 = no correlation = the change in X value had nothing to do with the change in Y value
*If the r value is not given in the exam, calculate it first using Pearson’s correlation coefficient, then square it
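As a small illustration of the r^2 formula, the sketch below squares a given r and checks that r is in the valid range first. The function name and the sample r values are hypothetical.

```python
def coefficient_of_determination(r):
    """Square r to get the coefficient of determination."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    return r ** 2

# A negative r still gives a positive coefficient of determination
print(coefficient_of_determination(-0.5))  # 0.25
```

For example, r = 0.8 gives r^2 of about 0.64, which is read as roughly 64% of the change in Y being determined by the change in X.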
Definition: Spearman’s Rank Correlation Coefficient (aka the rank correlation coefficient), ‘Rs’
*takes two sets of data, ranks them and then looks at the correlation
Determines the correlation (if any) between rankings of 2 distributions
e.g. the chance that the person who tops the weight ranking also tops the height ranking.
Spearman’s Rank Correlation Coefficient - what do the below represent?
Rs, n, d?
Rs = Spearman’s rank correlation coefficient; n = number of points in the data; d = difference between the two rankings of a data point
Note: step one of the calculation should be to work out the difference between the two rankings for each data point
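A minimal sketch of the calculation, assuming the card refers to the standard formula Rs = 1 − 6Σd² / (n(n² − 1)) and that there are no tied values in either data set. The function names and the weight/height figures are made up for illustration.

```python
def ranks(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman's rank correlation coefficient for untied data."""
    n = len(xs)                      # number of data points
    rank_x = ranks(xs)
    rank_y = ranks(ys)
    # Step one: difference d between the two rankings of each data point
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
    sum_d2 = sum(diff ** 2 for diff in d)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Made-up weights (kg) and heights (cm) for five people
weights = [82, 75, 68, 90, 60]
heights = [180, 178, 165, 185, 170]
print(spearman_rs(weights, heights))  # about 0.9
```

Here only two people swap places between the two rankings, so Σd² = 2 and Rs comes out close to 1.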
What are two facts about the value of the Spearman’s rank correlation coefficient?
It CANNOT be less than -1 (it can be negative, which means the two rankings tend to run in opposite directions)
It CANNOT be greater than 1
Does correlation always signify a direct relationship between the variables?
No; correlation could be due to a third hidden factor
Definition: what is spurious correlation?
When 2 unrelated variables coincidentally have the same trend pattern (e.g. marmalade consumption and which country wins the Eurovision song contest).
What is extrapolating and what is the limitation?
Extrapolation is making predictions about results outside the range of the data set.
It may result in inaccuracies, because the trend seen in the data may not continue outside that range
What are important factors to consider about the data set when using correlation coefficient calculations?
(i.e. to help avoid inaccuracies)
It is important to verify the connection, and not to use a small or narrow (i.e. skewed) sample of data
Is it possible to have a negative coefficient of determination?
No, as the coefficient of determination is calculated by squaring r (the correlation coefficient), and a squared number cannot be negative