Topic 5 Flashcards
What is correlation?
The degree of relationship between two variables.
How is correlation different to a Paired T-test?
Correlation asks if one variable changes when another does, not if they are significantly different to one another.
What is the degree of correlation quantified by?
r: the correlation coefficient which ranges from -1 to +1.
How is covariance calculated?
The calculation of r is based on covariance.
The mean of X is subtracted from each X value, and the same is done for Y. Each X deviation is multiplied by its corresponding Y deviation, and these products are summed. This sum is divided by n - 1.
There is no squaring, just multiplying by the corresponding value of the other variable; it is therefore covariance, not variance.
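Worked example: a minimal Python sketch of the covariance calculation described above. The function name `covariance` and the x and y values are illustrative assumptions, not part of the original cards.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance: sum of products of paired deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Multiply each X deviation by its corresponding Y deviation (no squaring).
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Illustrative values only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
cov_xy = covariance(x, y)
print(cov_xy)            # 1.5
print(covariance(x, x))  # 2.5, the covariance of X with itself is its variance
```

Pairing X with itself turns each product into a squared deviation, which is why the same formula returns the variance in that case.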
How is covariance dictated?
Dictated by whether an increase in X goes with an increase in Y (positive) or with a decrease in Y (negative). If X and Y are independent, the covariance, and therefore r, is expected to be 0.
What did Pearson Product Moment show?
Showed that covariance needs to be scaled (by the standard deviations of X and Y) and that, once scaled, r always ranges from -1 to +1.
How is r calculated?
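The standard deviations of X and Y (Sx and Sy) are multiplied together. The covariance is then divided by this value.
(The relationship is significant when the size of r, ignoring its sign, is greater than r crit.)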
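Worked example: a minimal sketch of r as covariance divided by Sx multiplied by Sy. The function name `pearson_r` and the data are illustrative assumptions. The critical value 0.878 is the standard tabulated value for a two-tailed test at the 0.05 level with n - 2 = 3 degrees of freedom; it is not taken from the cards.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Illustrative values only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Assumed critical value from a standard table (two-tailed, alpha = 0.05, df = 3)
r_crit = 0.878
significant = abs(r) > r_crit

print(round(r, 3))   # 0.775
print(significant)   # False: with only 5 points, r = 0.775 is below r crit
```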
What is the difference between two-tailed and one-tailed?
One-tailed if previous work has established the expected direction of covariance, e.g. positive or negative.
Two-tailed if no direction has been predicted.
Note: think of normal distribution curve and the tails of extreme values.
What are the assumptions of Pearson correlation?
X and Y should be normally distributed.
When do you use Pearson correlation?
Can use when you want to determine the relationship between 2 variables if they are both continuous and approximately normal.
What is Spearman’s r?
Can use this instead of Pearson correlation if only ranked data are available.
Sometimes used even when continuous data are available, since it avoids the assumption that X and Y are normally distributed. It is not always used, however, because ranking discards the detail in the data and therefore reduces the power of the test.
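Worked example: one common way to compute Spearman's coefficient is to rank each variable and then calculate Pearson's r on the ranks. The sketch below assumes that approach, with tied values sharing the average of their ranks. The helper names and the data are illustrative assumptions.

```python
from statistics import mean, stdev

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

def spearman_r(xs, ys):
    """Spearman's coefficient as Pearson's r calculated on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Illustrative values only: Y always rises with X, but along a curve
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_r(x, y))  # the ranks agree perfectly
print(pearson_r(x, y))   # lower, because the relationship is not a straight line
```

Ranking keeps only the order of the values, which is the loss of detail mentioned in the card.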
What is the main warning whilst working with r values?
Completely different data distributions, including ones with no real linear relationship, can give the same r value; always plot the data!
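Worked example: the sketch below uses the first two data sets of Anscombe's quartet, a well-known published illustration of this warning (the data are not from the cards). One set is roughly linear with scatter and the other follows a smooth curve, yet both give almost the same r.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Anscombe's quartet, sets I and II (same X values)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(round(r_linear, 2), round(r_curved, 2))  # 0.82 0.82
```

Only a scatter plot of each set (not shown here) reveals that the second relationship is curved rather than linear.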
What is the difference between correlation and regression?
Both test the relationship between two variables, but regression assumes causation (Y depends on X) and fits a line of best fit, which allows values of Y to be predicted or extrapolated from X.
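Worked example: a minimal sketch of a least-squares line of best fit, assuming simple linear regression with one explanatory variable. The function name `fit_line` and the data are illustrative assumptions.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Illustrative values only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)

# Extrapolating one step beyond the observed X range
predicted = slope * 6 + intercept
print(round(slope, 2), round(intercept, 2), round(predicted, 2))  # 0.6 2.2 5.8
```

Unlike r, the fitted line is not symmetrical: swapping X and Y gives a different line, which reflects the assumption about which variable depends on which.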
What statistical methods must you use if there are multiple response variables and multiple explanatory variables? e.g. human size (multiple ways to measure size and multiple outputs).
Multivariate statistics.
Or, amalgamate variables with a formula.
The former is usually used, as the latter assumes that all variables have equal influence. However, the former does increase the risk of Type I errors.
Why do you need a strong element of design for MV statistics?
You are not testing a hypothesis, so you need to be clear in advance about what you want to find out.
What are the uses of Principal Components Analysis?
Data reduction, exploring relationships between variables, and isolating sources of variation. It is also not dependent on P-values.
What are PC1 and PC2?
Principal component 1 is the re-orientation of data to maximise the variation along a new axis.
PC2 is the addition of an orthogonal axis (it also maximises the variation, but on the condition that the new axis is at 90 degrees to the first).
Both are chosen so that the axes capture the maximum variation.
Note: Just rotating the data.
What does re-orientation mean for data points?
Nothing! No information is lost, points all stay at the same relative distance compared to one another.
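Worked example: a minimal sketch of PCA on two variables, written as a plain rotation of the centred data. It assumes the 2-D case, where the rotation angle has a closed form; general PCA uses an eigen-decomposition instead. The function name `pca_2d` and the points are illustrative assumptions.

```python
from itertools import combinations
from math import atan2, cos, sin, dist
from statistics import mean, variance

def pca_2d(points):
    """Rotate centred 2-D data so the first new axis has maximum variance."""
    n = len(points)
    x_bar = mean([p[0] for p in points])
    y_bar = mean([p[1] for p in points])
    cx = [p[0] - x_bar for p in points]
    cy = [p[1] - y_bar for p in points]
    var_x = sum(a * a for a in cx) / (n - 1)
    var_y = sum(b * b for b in cy) / (n - 1)
    cov_xy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)
    # Angle of the direction of maximum variance (PC1) from the X axis
    theta = 0.5 * atan2(2 * cov_xy, var_x - var_y)
    c, s = cos(theta), sin(theta)
    # PC1 score: projection onto the new axis; PC2 score: the axis at 90 degrees
    scores = [(a * c + b * s, -a * s + b * c) for a, b in zip(cx, cy)]
    return theta, scores

# Illustrative values only
points = [(1, 1), (2, 3), (3, 2), (4, 5), (5, 4), (6, 6)]
theta, scores = pca_2d(points)

pc1 = [s[0] for s in scores]
pc2 = [s[1] for s in scores]
print(variance(pc1) >= variance(pc2))  # True: PC1 carries the most variation

# Largest change in any point-to-point distance after the rotation
max_change = max(
    abs(dist(points[i], points[j]) - dist(scores[i], scores[j]))
    for i, j in combinations(range(len(points)), 2)
)
print(max_change < 1e-9)  # True: relative distances are unchanged
```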
How is data reduced in PCA?
Because the new axes are ordered by how much variation they capture, the most important PCs can be chosen; in the simplest case only one variable (PC1) has to be analysed.
But the PCs depend on the shape of the data!
Note: there is no limit to the number of dimensions.
Reducing data down to a single PC for 3D data is not as useful as doing the same for 1D data.
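Worked example: a minimal sketch of how much of the total variation PC1 captures for two variables, and how that depends on the shape of the data. It assumes the 2-D case, where the variances along PC1 and PC2 (the eigenvalues of the covariance matrix) have a closed form. The function name `pc1_share` and the points are illustrative assumptions.

```python
from math import sqrt
from statistics import mean

def pc1_share(points):
    """Fraction of total variance that lies along PC1 (2-D data only)."""
    n = len(points)
    x_bar = mean([p[0] for p in points])
    y_bar = mean([p[1] for p in points])
    var_x = sum((p[0] - x_bar) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - y_bar) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - x_bar) * (p[1] - y_bar) for p in points) / (n - 1)
    # Variances along PC1 and PC2: eigenvalues of the 2 x 2 covariance matrix
    centre = (var_x + var_y) / 2
    half_gap = sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    pc1_var, pc2_var = centre + half_gap, centre - half_gap
    return pc1_var / (pc1_var + pc2_var)

# Illustrative values only
elongated = [(1, 2), (2, 4), (3, 6)]         # points on a straight line
square = [(0, 0), (0, 1), (1, 0), (1, 1)]    # no preferred direction

print(pc1_share(elongated))  # 1.0: PC1 alone describes the data
print(pc1_share(square))     # 0.5: dropping PC2 would lose half the variation
```

Keeping only PC1 is a good reduction for the elongated cloud and a poor one for the square cloud, even though both are 2-D.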
What are the criticisms of MV statistics?
Usually used without a clear question in mind.
No identification of causation, just patterns.
But it is important to remember it is a useful method to try and decipher what might be happening.