Covariance & Correlation Flashcards
What is covariance?
Reflects the degree to which two variables vary together
How does covariance relate to correlation?
They both measure the linear relationship between two variables; covariance is calculated from the original scale scores, so it is affected by the measurement scale; correlation is a standardised measure, so it is not
How do you interpret a covariance coefficient?
It’s the average product of the deviation scores of two variables
How do you interpret a correlation coefficient?
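Pearson’s r; it tells us the strength & direction of the linear relationship between two variables (absolute value from 0 to 1 indicates magnitude; sign (+/-) indicates direction); calculated by dividing the covariance by SxSy (product of the standard deviations of X & Y); whether the relationship is likely to have occurred by chance is a separate question, answered by a significance test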
Why do we need to test r for significance?
To determine whether the linear relationship between two variables in a sample is large enough to infer a linear relationship in the population, or if the correlation is due to sampling error
What’s the best predictor of the criterion (variable on Y axis)?
The mean of the criterion, if we have no information about the predictor (or if r = 0)
Describe the line of best fit
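A straight line through the scatterplot that summarises the linear relationship between the variables; it minimises the sum of squared vertical distances between the data points & the line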
What does the numerator of covariance involve?
Finding the extent to which each score differs from the mean of its variable (for both X & Y); multiplying the two deviation scores for each participant; adding up these products of deviation scores across all participants (SPxy)
What does the denominator of covariance involve?
Dividing SPxy by N-1 so that the covariance is independent of the number of scores
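The numerator and denominator steps above can be sketched in a few lines of Python. The function name and the scores are made up for illustration; the second call re-expresses X in different units to show how the covariance changes with the measurement scale.

```python
def covariance(x, y):
    """Sample covariance: SPxy / (N - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of the products of paired deviation scores (SPxy).
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: N - 1.
    return sp_xy / (n - 1)

# Hypothetical scores for five participants (illustrative only).
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]

cov_xy = covariance(hours, marks)

# Same data with X converted from hours to minutes: the relationship is
# unchanged, but the covariance is 60 times larger.
cov_minutes = covariance([h * 60 for h in hours], marks)

print(cov_xy, cov_minutes)
```

With these scores SPxy is 6, so the covariance is 6 / 4 = 1.5 in hours and 90 in minutes.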
How do we calculate Pearson’s r?
Divide covariability of X & Y (COVxy) by the separate variabilities of X & Y (SxSy)
If we use the standard score formula (ΣZxZy / (N − 1)), what don’t we need to do?
Calculate the covariance (but still must calculate standard scores for X & Y)
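As a rough check that the two routes to Pearson’s r agree, the sketch below computes r once as COVxy / (SxSy) and once as ΣZxZy / (N − 1). The helper names and the data are hypothetical, chosen only for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (denominator N - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def r_via_covariance(x, y):
    """Pearson's r as COVxy divided by SxSy."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (sample_sd(x) * sample_sd(y))

def r_via_z_scores(x, y):
    """Pearson's r as the sum of ZxZy divided by N - 1 (no covariance step)."""
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    zx = [(xi - mx) / sx for xi in x]
    zy = [(yi - my) / sy for yi in y]
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical scores (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_from_cov = r_via_covariance(x, y)
r_from_z = r_via_z_scores(x, y)
print(round(r_from_cov, 4), round(r_from_z, 4))
```

Both routes give the same value, up to floating-point rounding.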
State the statistical & conceptual hypotheses for testing the significance of r
Null: rho (correlation in population) = 0; there is no linear relationship between X & Y in the population; Alternative: rho ≠ 0; there is a linear relationship between X & Y in the population
What formula do we use to test the significance of r?
t = r × √(N − 2) / √(1 − r²)
What are the degrees of freedom?;
Which table do we look up?
N - 2;
t table
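A minimal sketch of the significance test, assuming a hypothetical sample with r = .60 and N = 27. The function name is invented for illustration, and the critical value is the two-tailed .05 entry for df = 25 copied from a standard t table.

```python
import math

def t_for_r(r, n):
    """Return (t, df) for testing H0: rho = 0, where df = N - 2."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical sample values (illustrative only).
t_obs, df = t_for_r(0.60, 27)

# Two-tailed critical t for df = 25 at alpha = .05, from a standard t table.
T_CRIT_DF25 = 2.060

significant = abs(t_obs) > T_CRIT_DF25
print(round(t_obs, 2), df, significant)
```

Here t = .60 × 5 / .80 = 3.75 with df = 25, which exceeds the critical value, so the null hypothesis would be rejected.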
The larger the N, the smaller the what?
Absolute value of r needed for significance
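To see why, the t formula can be rearranged to give the smallest |r| that reaches a given critical t: r = t / √(t² + df). The sketch below assumes a two-tailed test at alpha = .05, with critical t values copied from a standard t table; the three sample sizes are arbitrary.

```python
import math

# Two-tailed critical t values at alpha = .05 for df = N - 2,
# taken from a standard t table (keys are the sample size N).
CRITICAL_T = {10: 2.306, 20: 2.101, 30: 2.048}

def critical_r(t_crit, n):
    """Smallest |r| that is significant, from t = r*sqrt(df)/sqrt(1 - r^2)."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

critical_rs = {n: critical_r(t, n) for n, t in CRITICAL_T.items()}
for n, r_crit in critical_rs.items():
    print(f"N = {n}: |r| must exceed about {r_crit:.3f}")
```

The required |r| falls from roughly .63 at N = 10 to roughly .36 at N = 30.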
What are the assumptions of Pearson’s correlation coefficient r?
A linear relationship between the two variables; a bivariate normal distribution; both variables measured on interval or ratio scales
What does r squared mean?;
What does k squared mean?
Coefficient of determination; proportion of variance in one variable that is explained by the variance in the other;
Coefficient of non-determination, 1 − r squared (aka error or residual variance); proportion of variance that can’t be predicted from the other variable
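A short worked example, assuming a hypothetical r of .60, shows how the two coefficients split the variance between them.

```python
# Hypothetical correlation (illustrative only).
r = 0.60

r_squared = r ** 2         # coefficient of determination (shared variance)
k_squared = 1 - r_squared  # coefficient of non-determination (unexplained)

print(f"r squared = {r_squared:.2f}, k squared = {k_squared:.2f}")
```

The two proportions always sum to 1: here 36% of the variance is shared and 64% is left unexplained.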
What do the point-biserial & Spearman’s rank (Spearman’s rho) correlations test?
Point-biserial examines the relationship between a dichotomous variable & a continuous variable; Spearman’s rho assesses the degree of agreement between two sets of ranks; it suits monotonic relationships (linear or not), skewed data & data with extreme outliers; it can be tested for significance, but with less power if N<10
Describe the process of conducting a point-biserial correlation;
What other test could we use to get the same result?
Assign/dummy code values to the levels of the IV (e.g. 0 & 1); calculate SPxy, SSx & SSy; use Pearson’s r to calculate r pb; test for significance (t formula); compare with crit t (df = N-2); interpret result;
Independent groups t-test
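The equivalence with the independent groups t-test can be checked numerically. The sketch below uses made-up scores for two groups of four, dummy codes the groups as 0 and 1, converts r pb to t, and compares that with a pooled-variance t computed directly from the group means. All names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r from SPxy, SSx and SSy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

def pooled_t(group_a, group_b):
    """Independent groups t (group_b minus group_a), pooled variance."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    ss_a = sum((v - ma) ** 2 for v in group_a)
    ss_b = sum((v - mb) ** 2 for v in group_b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical scores for two groups (illustrative only).
control = [4, 5, 6, 5]
treatment = [7, 8, 6, 9]

# Dummy code the dichotomous variable: control = 0, treatment = 1.
codes = [0] * len(control) + [1] * len(treatment)
scores = control + treatment

r_pb = pearson_r(codes, scores)
n = len(scores)
t_from_r = r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)
t_from_groups = pooled_t(control, treatment)

print(round(r_pb, 3), round(t_from_r, 3), round(t_from_groups, 3))
```

Both t values come out the same, with df = N − 2 = 6 in each case.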
In a point-biserial correlation, what depends on the scoring of the dichotomous variable?;
What does not?;
How is the direction of the difference observed?
The sign;
Absolute value of r pb;
By looking at the means
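A small sketch of this point, using made-up scores: swapping which group is coded 0 and which is coded 1 flips the sign of r pb but leaves its absolute value alone, so the direction of the difference has to be read from the group means. Names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r from SPxy, SSx and SSy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical scores for two groups (illustrative only).
control = [4, 5, 6, 5]
treatment = [7, 8, 6, 9]
scores = control + treatment

# Coding A: control = 0, treatment = 1. Coding B: the reverse.
coding_a = [0] * len(control) + [1] * len(treatment)
coding_b = [1] * len(control) + [0] * len(treatment)

r_a = pearson_r(coding_a, scores)
r_b = pearson_r(coding_b, scores)

# The means do not depend on the coding, so they show the direction.
mean_control = sum(control) / len(control)
mean_treatment = sum(treatment) / len(treatment)

print(round(r_a, 3), round(r_b, 3), mean_control, mean_treatment)
```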
Describe the process of conducting a Spearman’s rho correlation
If needed, convert the X & Y scores to ranks (ranking each variable separately); calculate SPxy, SSx & SSy on the ranks; use Pearson’s r formula to calculate r(s); interpret result
If the two variables are not normally distributed, can we determine a correlation coefficient?
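The ranking and Pearson steps can be sketched as follows. The data are invented and include one extreme X score to show that only the rank order matters. Giving tied scores the average of their positions is a common convention, assumed here rather than taken from the cards.

```python
import math

def to_ranks(values):
    """Rank from 1 (smallest); tied scores share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r from SPxy, SSx and SSy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical scores; 1000 is a deliberate extreme outlier.
x = [10, 20, 30, 40, 1000]
y = [1, 2, 3, 5, 4]

r_s = pearson_r(to_ranks(x), to_ranks(y))
print(round(r_s, 2))
```

Because the outlier is simply the highest rank, it has no more influence on r(s) than any other top score would.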
Yes
If r = .50, what is the amount of variance that the two variables share?
.25 (r squared = .50 x .50), i.e. 25% of the variance
If we use descriptive statistics to describe relationships or differences in samples, what do we use inferential statistics for?
To determine if there is sufficient evidence to conclude that a relationship or difference is likely to genuinely exist in the population
What allows you or doesn’t allow you to make causal inferences?
Research design, not the statistical test