Lecture 3 Flashcards
What is a correlation?
A measure of the relationship between two variables, X and Y. It is the simplest way of quantifying the relationship between 2 numeric variables
Are correlations useful?
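Only to a point: r summarises the direction and strength of a relationship but gives no equation for predicting one variable from the other, so we follow correlations up with regression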
What is the value of a perfect positive relationship?
r=1
What does r=0 mean?
A complete absence of a linear relationship
What is the value of a perfect negative relationship?
r=-1
Name the two features of the correlation
Direction and magnitude
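A minimal sketch of the three benchmark values, assuming small made-up data sets and a hand-written helper (`pearson_r` is a name chosen here for illustration, not something from the lecture). The sign of r carries the direction and its absolute size carries the magnitude.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of deviation products. Denominator: root of the two sums of squares.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [3, 5, 7, 9, 11])        # y rises steadily with x
r_negative = pearson_r(x, [11, 9, 7, 5, 3])        # y falls steadily as x rises
r_zero = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])   # deviation products cancel out

print(r_positive, r_negative, r_zero)
```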
What does it mean if there is a weak relationship?
Someone’s performance on the Y variable is only weakly predicted by their performance on X, so other variables must be involved
What should you always do first?
Visualise the data. A mean and SD alone tell you little; you need to plot the data points to see how they are distributed
Name the different types of scatterplot and what they look like, and would you use r?
- Normal - points follow a straight-line (linear) trend; yes, use r
- Curvilinear - semi-circle shape; r would underestimate the strength of the relationship because it only measures linear relationships
- A perfect relationship between X and Y - all points lie on a straight line except one outlier; r would underestimate the strength of the relationship
- No relationship - vertical line with an extreme outlier; r would overestimate the strength of the relationship
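The patterns above can be checked numerically. This is a sketch on invented data, with a hand-written `pearson_r` helper (a name assumed here). One substitution to note: a strictly vertical line has no variance in X, so r is undefined for it; a small patternless cluster stands in for the "no relationship" case.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about each mean."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Curvilinear: y is fully determined by x (inverted U), yet r is 0.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [-(x ** 2) for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Perfect line plus one outlier: the last point drags r far below 1 (about 0.14).
r_line_outlier = pearson_r([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 0])

# Patternless cluster: r is 0 on its own, about 0.98 once an extreme point is added.
r_cluster = pearson_r([1, 2, 1, 2], [1, 1, 2, 2])
r_cluster_outlier = pearson_r([1, 2, 1, 2, 10], [1, 1, 2, 2, 10])

print(r_curve, r_line_outlier, r_cluster, r_cluster_outlier)
```

In each case the number alone is misleading, which is why the plot comes first.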
Why do outliers pose a big threat when the sample is small?
Because they exert a strong influence on the results: a single outlier can change both the value and the interpretation of the correlation. You need to plot your data to see whether any points lie apart from the rest of the data set
What would qualify as an outlier?
There is no absolute definition, but in general, a score that is more than 3 SDs from the mean would qualify as an outlier
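A sketch of the 3 SD rule of thumb, assuming invented scores and a hypothetical helper called `flag_outliers`. The threshold is a parameter because the cut-off is a convention rather than a fixed definition.

```python
import statistics

def flag_outliers(scores, threshold=3.0):
    """Return the scores lying more than `threshold` sample SDs from the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (N-1 in the denominator)
    return [s for s in scores if abs(s - mean) / sd > threshold]

# Nineteen scores close to 50 and one extreme score of 120 (made-up data).
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 50, 52, 48, 50, 49, 51, 50, 50, 120]
outliers = flag_outliers(scores)
print(outliers)
```

One caveat for small samples: with fewer than 11 scores, no single score can be more than 3 sample SDs from the mean, so this rule cannot replace plotting the data.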
What do z-scores do?
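Convert individuals’ scores on a variable to standard scores with a mean of 0 and an SD of 1. Standardising does not change the shape of the distribution, so z-scores are normally distributed only if the raw scores are. You need to standardise a score to understand where it falls in the distribution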
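A short sketch of standardising, assuming made-up raw scores and the sample SD (the helper name `z_scores` is chosen here for illustration). After conversion the scores have a mean of 0 and an SD of 1, and each z-score says how many SDs a raw score sits above or below the mean.

```python
import statistics

def z_scores(xs):
    """Standardise scores: subtract the mean, then divide by the sample SD."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]

raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)

print([round(value, 2) for value in z])
print(statistics.mean(z), statistics.stdev(z))  # approximately 0 and 1
```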
When are deviation products positive/negative, respectively?
Positive when the X and Y scores are both above their means or both below their means; negative when one score is above its mean and the other is below
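The sign rule can be seen on four invented (X, Y) pairs whose means are both 3. This is only an illustration, and the variable names are assumptions.

```python
# Four made-up (x, y) pairs; both means work out to 3.
pairs = [(1, 2), (2, 5), (4, 1), (5, 4)]

mean_x = sum(x for x, _ in pairs) / len(pairs)
mean_y = sum(y for _, y in pairs) / len(pairs)

# Deviation product for each individual: (x - mean of X) * (y - mean of Y).
products = [(x - mean_x) * (y - mean_y) for x, y in pairs]

for (x, y), product in zip(pairs, products):
    side_x = "above" if x > mean_x else "below"
    side_y = "above" if y > mean_y else "below"
    print(f"x={x} ({side_x}), y={y} ({side_y}) -> product {product:+.1f}")
```

Pairs on the same side of both means give positive products; pairs on opposite sides give negative ones.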
How can the correlation between X and Y be obtained?
- Multiplying each individual’s z-score on X by their z-score on Y
- Summing these products
- Dividing this sum by N (if the z-scores were computed with the population SD) or by N-1 (if they were computed with the sample SD)
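The three steps above can be sketched as follows, assuming invented data, sample SDs for the z-scores, and therefore N-1 as the divisor. The function name `correlation_from_z` is hypothetical.

```python
import statistics

def correlation_from_z(xs, ys):
    """Pearson r as the sum of z-score products divided by N-1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs
    # Step 1: multiply each individual's z-score on X by their z-score on Y.
    products = [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys)]
    # Step 2: sum the products. Step 3: divide by N-1 to match the sample SDs.
    return sum(products) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation_from_z(x, y)
print(round(r, 3))
```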
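Why do we divide by N-1?
Because sample z-scores are computed with the sample SD, which itself divides by N-1, so the sum of products must use the same divisor. If the divisors do not match, the correlation is mis-estimated; for example, dividing by N-1 when the SDs used N gives an over-estimated correlation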
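A sketch of how the divisor interacts with the SD used for the z-scores, on made-up perfectly linear data where the true r is 1. The helper `r_from_z` and its parameters are assumptions for illustration. Using the population SD with N, or the sample SD with N-1, gives the same r; mixing a population SD with an N-1 divisor inflates it.

```python
import statistics

def r_from_z(xs, ys, sd_func, divisor):
    """Sum of z-score products over `divisor`, with z-scores built from `sd_func`."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = sd_func(xs), sd_func(ys)
    total = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys))
    return total / divisor

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # perfectly linear in x
n = len(x)

r_sample = r_from_z(x, y, statistics.stdev, n - 1)      # sample SD with N-1
r_population = r_from_z(x, y, statistics.pstdev, n)     # population SD with N
r_mismatched = r_from_z(x, y, statistics.pstdev, n - 1) # population SD with N-1

print(r_sample, r_population, r_mismatched)
```

With five scores the mismatched version comes out near 1.25, an impossible value for a correlation, which shows the over-estimation directly.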