Lecture 7 - Statistical Tests III: Correlations & Comparing Two Groups Flashcards
what are the two statistical tests for correlative studies, one for parametric and one for non-parametric data?
correlative parametric data - pearson’s correlation
correlative non-parametric data - spearman’s rank correlation
what is pearson’s correlation used for?
pearson’s correlation is used for two continuous variables; the correlation coefficient r describes the strength and direction of their linear association and lies between -1 and 1
what does correlation describe?
correlation describes the amount of variation or scatter in a scatter plot, i.e. how tightly the points cluster around a straight line
the higher the scatter…
the lower the strength of correlation
r values for positive, negative & no correlations:
positive correlation: r > 0
negative correlation: r < 0
no correlation: r = 0
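a minimal R sketch of how the sign of r follows the direction of the trend. the numbers are made up for illustration and are not from the lecture:

```r
# Made-up illustrative data (not from the lecture)
x      <- c(1, 2, 3, 4, 5, 6)
y_up   <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0)   # rises as x rises
y_down <- c(12.0, 10.1, 7.8, 6.2, 3.9, 2.1)   # falls as x rises

r_pos <- cor(x, y_up)     # pearson is the default method of cor()
r_neg <- cor(x, y_down)

print(r_pos)   # greater than 0: positive correlation
print(r_neg)   # less than 0: negative correlation
```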
what is the difference between a linear regression and a pearson’s correlation?
with a pearson’s correlation no line is fitted, whereas a linear regression fits a regression line to the data
pearson’s assumptions:
both continuous variables are normally distributed
random sampling
independence of observations
pearson’s null hypothesis:
there is no correlation between the two variables: ρ (rho) = 0
if the p-value is larger than 0.05, the correlation is not significant, so it is not worth interpreting the r value
regression or correlation?
how are x & y related? how much does y change with x? = regression
how well are x & y related? = correlation
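a small R sketch of the two questions side by side, using made-up numbers: lm() gives the slope (how much y changes with x) and cor() gives r (how well x and y are related):

```r
# Made-up illustrative data (not from the lecture)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.3, 4.1, 5.8, 8.4, 9.9, 12.2, 13.8, 16.1)

# Regression: fit a line and read off the slope
fit   <- lm(y ~ x)
slope <- coef(fit)[["x"]]   # change in y per one-unit change in x

# Correlation: no line is fitted, only the strength of association
r <- cor(x, y)

print(slope)
print(r)
```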
it is correlation rather than regression if:
neither of the two continuous variables is predicted to depend on the other (e.g. there may be no biological reason to assume such dependence, so there is no clear causal reasoning behind the association)
it is regression rather than correlation if:
your data come from an EXPERIMENT: in an experiment there is usually a direct relationship between the two variables (we assume y is dependent on x), so a linear regression should be fitted
how can we check whether it is safe to use pearson’s correlation?
after first deciding that the data call for a correlation rather than a regression (i.e. there is no direct relationship imposed by an experiment), check that both variables are normally distributed using the shapiro.test command in R
how can we check that the variables are normally distributed before deciding whether we can use pearson’s correlation?
we attach our data frame and run names(data) to list the variable names
then for each variable we input:
shapiro.test(variable_1_name)
shapiro.test(variable_2_name)
provided the p-values for both variables are ABOVE 0.05, we can treat the data as normally distributed (the test found no significant departure from normality)
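a sketch of the normality check on a hypothetical data frame. the data frame `data`, its columns `height` and `mass`, and all the values are invented for illustration. it uses data$column instead of attach(), which gives the same result without attaching anything:

```r
# Hypothetical data frame; names and values are invented for illustration
data <- data.frame(
  height = c(150, 155, 158, 160, 163, 165, 168, 170, 173, 178),
  mass   = c(52, 55, 60, 58, 63, 66, 64, 70, 72, 78)
)

names(data)   # lists the variable names

# Shapiro-Wilk test on each variable
sw_height <- shapiro.test(data$height)
sw_mass   <- shapiro.test(data$mass)

print(sw_height$p.value)
print(sw_mass$p.value)

# TRUE only if BOTH p-values are above 0.05
both_normal <- sw_height$p.value > 0.05 && sw_mass$p.value > 0.05
print(both_normal)
```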
how do you get R to calculate pearson’s correlation?
cor.test(variable_1, variable_2, method = "pearson")
note: it doesn’t matter which way round your variables are - the answer will be the same either way
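a quick check, on made-up numbers, that swapping the two variables gives the same coefficient and p-value:

```r
# Made-up illustrative data (not from the lecture)
a <- c(4.2, 5.1, 6.3, 7.0, 8.4, 9.1, 10.5, 11.2)
b <- c(20.1, 19.4, 17.8, 18.2, 15.9, 15.1, 13.0, 12.6)

res_ab <- cor.test(a, b, method = "pearson")
res_ba <- cor.test(b, a, method = "pearson")

# Both comparisons should print TRUE
same_r <- isTRUE(all.equal(unname(res_ab$estimate), unname(res_ba$estimate)))
same_p <- isTRUE(all.equal(res_ab$p.value, res_ba$p.value))
print(same_r)
print(same_p)
```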
how do we write up the results of a pearson’s cor.test in R?
the (variable one) and (variable two) of (object) were negatively/positively correlated (pearson’s correlation; r = value, p = value, N = value)
what do we receive from a pearson’s cor.test command and how do we interpret it?
you will get a p-value and the correlation coefficient, which is found underneath "cor" at the bottom of the output (the test statistic itself is reported as t)
(1) if the p-value is smaller than 0.05, we can conclude that the two variables are significantly correlated
(2) if the cor value is positive there is a positive correlation; if the cor value is negative there is a negative correlation
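a sketch of pulling these values out of the cor.test result instead of reading them off the printed output. the data are made up, and the variable names in the write-up sentence are placeholders:

```r
# Made-up illustrative data (not from the lecture)
x <- c(2.1, 3.4, 4.0, 5.2, 6.1, 7.3, 8.0, 9.4, 10.2, 11.5)
y <- c(30.2, 28.9, 27.5, 27.9, 25.1, 24.0, 22.8, 21.0, 20.5, 18.2)

res <- cor.test(x, y, method = "pearson")

r <- unname(res$estimate)   # the value printed under "cor"
p <- res$p.value            # the p-value
n <- length(x)              # sample size for the write-up

# (1) significance, (2) direction
significant <- p < 0.05
direction   <- if (r > 0) "positively" else "negatively"

# Placeholder variable names; swap in the real ones
writeup <- sprintf(
  "variable one and variable two were %s correlated (pearson's correlation; r = %.2f, p = %.3f, N = %d)",
  direction, r, p, n
)
print(writeup)
```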
if the shapiro.test p-value is greater/lower than 0.05 we conclude:
> 0.05: data IS normally distributed (no significant departure from normality)
< 0.05: data IS NOT normally distributed
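the decision rule can be sketched as a small helper. `choose_method` is a hypothetical function written for these notes, not part of R, and the data are made up:

```r
# Hypothetical helper (not a built-in R function): returns "pearson" if both
# variables pass the Shapiro-Wilk check, otherwise "spearman"
choose_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Made-up illustrative data; x is strongly right-skewed
x <- c(1.0, 1.2, 1.5, 2.1, 2.8, 3.9, 6.0, 11.5, 24.0, 61.0)
y <- c(3, 5, 4, 8, 7, 10, 9, 13, 12, 15)

method <- choose_method(x, y)
print(method)

res <- cor.test(x, y, method = method)
print(res$estimate)
print(res$p.value)
```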
what is the non-parametric equivalent of pearson’s correlation?
spearman’s rank
spearman’s rank overall function and assumptions:
- ranks both the x and y variables, then uses the ranks to calculate a measure of correlation
- assumptions: none about distribution of variables; random sampling; independence of observations
what does spearman’s rank correlation coefficient, r_s, describe?
it describes the strength and direction of the linear association between the ranks of the two variables (i.e. how consistently one variable increases or decreases as the other increases), and lies between -1 and 1
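a sketch of spearman’s rank in R on made-up data: cor.test with method = "spearman" reports the coefficient under "rho", and it matches pearson’s correlation calculated on the ranks. here y always increases with x but along a curve, so the ranks agree perfectly while pearson’s r on the raw values is below 1:

```r
# Made-up illustrative data: y rises with x, but not in a straight line
x <- 1:8
y <- exp(x)

res_s <- cor.test(x, y, method = "spearman")
r_s   <- unname(res_s$estimate)   # the value printed under "rho"

# Same thing by hand: pearson's correlation of the ranks
r_ranks <- cor(rank(x), rank(y))

# Pearson on the raw values, for comparison
r_p <- cor(x, y)

print(r_s)
print(r_ranks)
print(r_p)
```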