Week 6 Correlation Flashcards
Correlation
A simple form of relationship between 2 variables. It assesses the degree and direction of a relationship.
A positive correlation is where as the 1st variable increases, so too does the 2nd variable.
A negative correlation is where, as the 1st variable increases, the 2nd variable decreases.
Correlations range from -1 to 1. If the 2 variables are unrelated (for example, the data are random), the correlation is at or near zero.
Regression
Is an extension of correlation. It allows for the prediction of one variable, based on the scores from another variable.
X axis
Typically is for the predictor variable. This variable is the one used to make a prediction for the other variable (Y).
X is usually the independent variable.
Y axis
Typically used for the criterion variable, the one being predicted.
Y is usually the dependent variable.
Regression Line
Also known as the “line of best fit”. Drawn on a scatterplot. Once we have a regression line, we are able to make a prediction for Y, given X.
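A minimal Python sketch of fitting a regression line and using it to predict Y from X. The card does not give the formulas, so this assumes the usual least-squares slope and intercept. The data and the names fit_line and predict are made up for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares line: returns (slope, intercept) for Y = intercept + slope * X."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation of X and Y
    sxx = sum((x - mx) ** 2 for x in xs)                    # variation of X
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted Y for a given X, read off the regression line."""
    return intercept + slope * x

hours = [1, 2, 3, 4, 5]   # made-up predictor (X) scores
marks = [2, 4, 5, 4, 5]   # made-up criterion (Y) scores
slope, intercept = fit_line(hours, marks)
print(round(slope, 2), round(intercept, 2))        # 0.6 2.2
print(round(predict(6, slope, intercept), 2))      # 5.8
```

So for these made-up scores, an X of 6 gives a predicted Y of about 5.8.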
Pearson’s Correlation Coefficient (r)
r = COVxy / (Sx × Sy)
where COVxy = covariance of X and Y
Sx = standard deviation of X
Sy = standard deviation of Y.
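A short Python sketch of the formula for r, assuming sample statistics (dividing by N - 1 for both the covariance and the standard deviations). The scores and the function names are made up for illustration.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance: sum of deviation cross-products divided by N - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """r = COVxy / (Sx * Sy); stdev() also divides by N - 1."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

hours = [1, 2, 3, 4, 5]   # made-up X scores
marks = [2, 4, 5, 4, 5]   # made-up Y scores
r = pearson_r(hours, marks)
print(round(r, 3))        # 0.775
```

Here COVxy = 1.5, Sx is about 1.58 and Sy is about 1.22, so r = 1.5 / (1.58 × 1.22), which is about 0.77.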
The null hypothesis states that there is no correlation between the 2 variables in the population: rho (ρ) = 0.
The 2-tailed hypothesis states that rho does not equal zero.
The 1-tailed hypothesis states either rho < 0 (negative correlation) or rho > 0 (positive correlation).
To determine the significance of r, we use a t statistic:
t = r × sqrt(N - 2) / sqrt(1 - r²)
Degrees of freedom for this t are N - 2.
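A small Python sketch of this t statistic. The values r = 0.75 and N = 20 are made up, and t_from_r is a hypothetical helper name.

```python
import math

def t_from_r(r, n):
    """Return (t, df) where t = r * sqrt(N - 2) / sqrt(1 - r^2) and df = N - 2."""
    if abs(r) >= 1:
        raise ValueError("t is undefined when r is exactly -1 or 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

t, df = t_from_r(0.75, 20)
print(f"t({df}) = {t:.2f}")   # t(18) = 4.81
```

The resulting t would then be compared with the critical t value for 18 degrees of freedom at the chosen alpha level.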
Note that r = 0.75 is considered a strong positive relationship.
Values of r between 0 and 0.3 are considered small.
Covariance
A number that indicates the degree to which 2 variables vary together.
COVxy = Σ(X - X̄)(Y - Ȳ) / (N - 1), where X̄ and Ȳ are the means of X and Y.
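A quick Python sketch of the covariance formula on made-up scores, showing that the sign of the covariance follows the direction of the relationship. The function name covariance is just an illustrative choice.

```python
from statistics import mean

def covariance(xs, ys):
    """Sum of (X - mean of X)(Y - mean of Y), divided by N - 1."""
    mx, my = mean(xs), mean(ys)
    cross_products = [(x - mx) * (y - my) for x, y in zip(xs, ys)]
    return sum(cross_products) / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                 # tends to rise as X rises
print(covariance(xs, ys))            # 1.5
print(covariance(xs, ys[::-1]))      # -1.5 (same Y scores in reverse order)
```

Reversing the Y scores makes Y tend to fall as X rises, so the covariance changes sign.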
r²
r² is the predicted or explained variance. It represents the percentage of variance in one variable that is accounted for by the other variable. So if r = 0.75, then r² = 0.5625, or about 0.56, meaning about 56% of the variability in the 1st variable is explained by the 2nd variable.
Factors that can affect the Correlation Coefficient
- Restricted range. If the range is restricted, i.e. the standard deviation of the variables is very small, then usually (but not always) the magnitude of the correlation is reduced.
- Outliers or extreme data points can have a big impact on the correlation coefficient. Depending on where they fall relative to the rest of the data, they can reduce it or inflate it.
- Heterogeneous subsamples. This most commonly occurs when the data unknowingly contain 2 subsets (e.g. male and female). Their combined correlation coefficient may be substantially different from the coefficients calculated for each subset separately.
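A rough Python check of two of the factors listed above, using made-up numbers chosen to make the effects obvious. The helper pearson_r and all the data are hypothetical, and real data will rarely be this clear-cut.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r from sums of deviation products (the N - 1 terms cancel out)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Heterogeneous subsamples: two made-up groups, each perfectly correlated.
group_a_x, group_a_y = [1, 2, 3, 4], [5, 6, 7, 8]
group_b_x, group_b_y = [5, 6, 7, 8], [1, 2, 3, 4]
r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_combined = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)
print(round(r_a, 2), round(r_b, 2), round(r_combined, 2))   # 1.0 1.0 -0.52

# Outlier: one extreme point added to weakly related made-up data.
x, y = [1, 2, 3, 4, 5], [3, 1, 4, 2, 3]
r_without = pearson_r(x, y)
r_with = pearson_r(x + [20], y + [20])
print(round(r_without, 2), round(r_with, 2))                # 0.14 0.97
```

In this example each group on its own has r = 1, yet the combined data give a negative r, and a single extreme point moves r from about 0.14 to about 0.97.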