Unit 11 Correlation vs Regression Flashcards
Correlation analysis examines…
The strength (intensity) of the relationship between two experimental variables.
Two things correlation analysis assesses
If the amount of variation in one variable is explained by another variable
If the explained variation is greater than can be expected by chance alone, i.e. is it significant?
Regression Analysis…
Measures the form of the relationship and describes
the functional relationship between variables
Where does regression usually apply?
Applies when we have control of the X variable
(independent) and can measure it essentially without error
The centroid is…
The equivalent of the mean for a double statistical series
The covariance is
The covariance between X and Y (Sxy) is the equivalent of the variance (S²) for a double statistical series
(two continuous variables, X and Y)
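As a minimal sketch of the two cards above, the centroid and the covariance can be computed by hand. The function names and the small data set are made up for illustration, and the covariance assumes the usual sample (n - 1) denominator.

```python
def centroid(xs, ys):
    """Return the centroid (mean of x, mean of y) of paired data."""
    n = len(xs)
    return sum(xs) / n, sum(ys) / n


def covariance(xs, ys):
    """Sample covariance Sxy, assuming the n - 1 denominator."""
    x_bar, y_bar = centroid(xs, ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)


# Hypothetical paired measurements, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(centroid(x, y))    # (3.0, 4.0)
print(covariance(x, y))  # 1.5
print(covariance(x, x))  # 2.5, the covariance of x with itself is its variance
```

The last line shows why the covariance is described as the two-variable equivalent of the variance: applied to a single variable twice, it reduces to that variable's variance.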
What does covariance describe?
Describes the dispersion of two or more quantitative variables
Is symmetric: the covariance Sxy equals the covariance Syx (mirrored across the diagonal of the covariance matrix)
❑ Variances are never negative (they are built from squared deviations)
❑ Covariances can be negative: vary from -∞ to + ∞
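A short sketch of these properties, using a hand-rolled covariance matrix. The helper names and the data are hypothetical, and the n - 1 denominator is an assumption.

```python
def cov(a, b):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


def covariance_matrix(variables):
    """S matrix: variances on the diagonal, covariances off the diagonal."""
    return [[cov(a, b) for b in variables] for a in variables]


# Hypothetical data where y falls as x rises.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]

S = covariance_matrix([x, y])
print(S)  # [[2.5, -5.5], [-5.5, 12.5]]
```

The two off-diagonal entries are equal (symmetry), they are negative here because y decreases as x increases, and the diagonal entries (the variances) are not negative.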
Variance is…
The dispersion of a single variable around its mean; the variances form the diagonal of the covariance matrix or dispersion matrix (S matrix)
Covariance measures…
Covariance measures the joint dispersion of two quantitative variables around their centroid
Correlation is…
Correlation is a statistical technique used to determine the degree to which two variables are related
Two correlation coefficients are considered:
Pearson product moment correlation coefficient (r)
employed with interval or ratio scaled variables (parametric)
Spearman rank order correlation coefficient (rho) employed with ordinal or ranked data (non-parametric)
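To illustrate the difference between the two coefficients, here is a rough sketch that computes Spearman's coefficient as Pearson's r applied to ranks, with tied values given their average rank. All function names and the data are illustrative assumptions, not part of the original cards.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank order coefficient: Pearson's r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(pearson_r(x, y))     # below 1, the relationship is not linear
print(spearman_rho(x, y))  # 1.0, the rank order agrees perfectly
```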
Pearson Correlation Coefficient (rxy) measures
Only the linear relationship between two quantitative variables
It is the covariance of standardized variables
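The statement that r is the covariance of standardized variables can be checked numerically. This is a sketch under the assumption that both the standard deviation and the covariance use the n - 1 denominator; the names and data are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (n - 1 denominator)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the std dev."""
    m, s = mean(values), sqrt(cov(values, values))
    return [(v - m) / s for v in values]


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_from_formula = cov(x, y) / sqrt(cov(x, x) * cov(y, y))
r_from_z_scores = cov(standardize(x), standardize(y))

print(round(r_from_formula, 4))   # 0.7746
print(round(r_from_z_scores, 4))  # 0.7746
```

Both routes give the same value, which is why r has no units and always lies between -1 and +1.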
In correlation a positive relationship means…
Means that ‘individuals’ obtaining high scores on one variable tend to obtain high scores on a second variable.
The converse is also true, i.e. individuals scoring low on one variable tend to score low on a second variable.
In correlation a negative relationship means…
Means that ‘individuals’ scoring low on one variable tend to score high on a second variable
Conversely, individuals scoring high on one variable tend to score low on a second variable
Residuals are…
The vertical distances between the observed points and the values predicted by the fitted line.
They are the error terms.
If the residuals are small, then there is a strong correlation.
The total variance of y can be partitioned into:
- explained variance
- residual variance
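The partition can be verified on a small example. The sketch below assumes an ordinary least-squares line and works with sums of squares (dividing each by the same degrees of freedom would give the corresponding variances); the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def partition(xs, ys):
    """Return (total, explained, residual) sums of squares for y."""
    slope, intercept = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [intercept + slope * x for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((f - y_bar) ** 2 for f in fitted)
    residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return total, explained, residual


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

total, explained, residual = partition(x, y)
print(round(total, 4), round(explained, 4), round(residual, 4))  # 6.0 3.6 2.4
print(round(explained / total, 4))  # 0.6, the share of variation explained
```

For a simple linear fit the explained share (0.6 here) equals the square of Pearson's r for the same data.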
Linear regression is:
Linear regression is a form of statistical modelling (we are predicting the future)
Linear regression can aim to:
Describe a functional model linking y and x
Test hypothesis regarding model parameters.
Predict y as a function of x.
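A minimal sketch of the first and last aims (describing a model and predicting from it), assuming an ordinary least-squares fit of a straight line. The function names and data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates for the model y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through the centroid
    return slope, intercept


def predict(x_new, slope, intercept):
    """Predicted y for a new x value, using the fitted line."""
    return intercept + slope * x_new


# Hypothetical data: x is set by the experimenter, y is measured.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
print(round(slope, 4), round(intercept, 4))     # 0.6 2.2
print(round(predict(6, slope, intercept), 4))   # 5.8
```

Predicting at x = 6 extrapolates beyond the observed range of x, so such a prediction rests on the assumption that the same straight line continues to hold there.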
How to quantify correlation between variables:
Pearson’s correlation coefficient
What does Pearson’s Correlation Coefficient assess?
The sign of r denotes the nature of association (positive or negative)
The magnitude of r denotes the strength of the association; r ranges from -1 to +1
The further r is from 0 and the larger the sample size, the less likely it is that the association could have occurred by chance if there were no real association between the variables considered.
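One common way to see how r and the sample size combine is the t statistic used to test r against zero, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The cards do not name this test, so treating it as the intended one is an assumption; the function name is hypothetical.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing H0: no correlation, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Same r, different sample sizes: the larger sample gives the larger t.
print(round(t_statistic(0.5, 5), 3))   # 1.0
print(round(t_statistic(0.5, 30), 3))  # 3.055

# Same sample size, r further from 0: again a larger t.
print(t_statistic(0.8, 30) > t_statistic(0.5, 30))  # True
```

A larger absolute t corresponds to a smaller p-value, so both a stronger r and a larger n make a chance explanation less plausible.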
The closer the r value is to 0…
The greater the variation around the line
When do we use Pearson r significance test?
To test if a relationship exists between two variables, and determine the magnitude and direction.
To test if two sets of paired measurements, neither of which is clearly independent of the other, are linearly associated.
Assumptions of Pearson’s r significance test
- For each subject in the study there must be related pairs of scores: if a subject has a score on variable x, then the same subject must have a score on variable y.
- The variables must have a linear relationship, i.e. the relationship can be most accurately represented by a straight line.
Rules for Pearson’s correlation hypothesis testing
- Significance (two-tailed) or p-value > 0.05: fail to reject H0. There is no evidence of a correlation between the two variables.
- Significance (two-tailed) or p-value ≤ 0.05: reject H0. Assume there is a linear correlation between the variables; there is a significant association between them.
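The decision rule can be written as a small helper. This is only a sketch: the function name is hypothetical, the 0.05 threshold is taken from the card, and the p-value is assumed to come from statistical software.

```python
def pearson_decision(p_value, alpha=0.05):
    """Apply the two-tailed decision rule to a p-value for H0: no correlation."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject H0: significant linear association"
    return "fail to reject H0: no evidence of a correlation"


print(pearson_decision(0.03))  # reject H0: significant linear association
print(pearson_decision(0.20))  # fail to reject H0: no evidence of a correlation
```

Only a p-value at or below the threshold leads to rejecting H0; a larger p-value means H0 is retained, not that it has been proven.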
Word of caution for correlation and causation
Correlation doesn’t imply causation
Regression:
While correlation is concerned with the magnitude and direction of the relationship, regression focuses on using the relationship as a prediction model