Bivariate data Flashcards
what type of data do SRCC and PMCC work on
random bivariate data
how does PMCC work
plot the points on a 4-quadrant grid centred on the mean point (bar x, bar y), then average the products of each point's x and y distances from that centre to get the covariance - products are positive in the 1st and 3rd relative quadrants and negative in the 2nd and 4th, so a strong linear correlation gives a covariance of large magnitude and hence a strong correlation coefficient
then divide by the product of the standard deviations of x and y to scale it to between -1 and 1
equation of PMCC
Sxy / root(Sxx × Syy)
or Σ(xi - bar x)(yi - bar y) / root(Σ(xi - bar x)^2 × Σ(yi - bar y)^2)
or use formula booklet version
Sxx, Sxy, Syy
(Sab format)
Sab = Σ(ai bi) - (Σai)(Σbi)/n, so for example Sxx = Σ(xi^2) - (Σxi)^2/n
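A minimal Python sketch of the Sab route to the PMCC, as a check on the formulas above. The function names s_ab and pmcc and the five data points are made up for illustration.

```python
from math import sqrt

def s_ab(a, b):
    """S_ab = sum(a_i * b_i) - sum(a_i) * sum(b_i) / n."""
    n = len(a)
    return sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / n

def pmcc(x, y):
    """r = Sxy / root(Sxx * Syy)."""
    return s_ab(x, y) / sqrt(s_ab(x, x) * s_ab(y, y))

# Illustrative data only (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Here Sxy = 6, Sxx = 10, Syy = 6, so r = 6 / root(60), about 0.775.
r = pmcc(x, y)
print(round(r, 3))
```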
r
statistic to estimate ρ (rho, the population correlation coefficient)
if |r| is less than the critical value, there is insufficient evidence to reject H0
how does SRCC work
rank the values of each variable separately, then compare the two ranks of each data point - the closer the two rankings agree (or exactly oppose), the stronger the association
so a perfect 1 or -1 doesn't have to be linear, only consistently increasing or decreasing
SRCC formula
1 - 6Σ(di^2) / (n(n^2 - 1))
where di is the difference between the two ranks of the i-th bivariate data point
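A small Python sketch of the SRCC formula, assuming there are no tied values (ties need averaged ranks, which this does not handle). The helper names ranks and srcc and the data are illustrative. The data follow y = x^3, which is not linear but still gives a coefficient of 1.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def srcc(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Illustrative data only: y = x^3, a curve rather than a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rs = srcc(x, y)  # rankings agree exactly, so every d_i is 0 and rs is 1.0
print(rs)
```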
Association and correlation
Association is any multi-variable relationship
Correlation is a special case of association: a linear relationship between two random variables
in a hypothesis test conclusion following PMCC or SRCC you mostly refer to association
when to use what regression lines
if x is controlled use y on x
if y is controlled use x on y
if both are random (e.g. data from a scatter graph), either line is usable
but not appropriate for points on a curve
or the line may fit certain data ranges but not others
y on x regression line
sum of vertical distances squared from lobf as low as possible
used to predict y values based on given x values
y = a + bx where a = bar y - b bar x
and b = Sxy / Sxx
or y - bar y = b(x - bar x)
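A rough Python sketch of fitting the y on x line from b = Sxy / Sxx and a = bar y - b bar x. The names s_ab and y_on_x and the data are invented for illustration.

```python
def s_ab(a, b):
    """S_ab = sum(a_i * b_i) - sum(a_i) * sum(b_i) / n."""
    n = len(a)
    return sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / n

def y_on_x(x, y):
    """Return (a, b) for the line y = a + b*x."""
    b = s_ab(x, y) / s_ab(x, x)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return a, b

# Illustrative data only (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = y_on_x(x, y)      # b = 6/10 = 0.6, a = 4 - 0.6*3 = 2.2
y_pred = a + b * 3.5     # predict y for x = 3.5, inside the data range
print(a, b, y_pred)
```

The prediction is made at x = 3.5, within the observed x values, since the line may not hold outside the range of the data.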
x on y regression line
sum of horizontal distances squared from lobf as low as possible
used to predict x values based on given y values
x = a + by where a = bar x - b bar y
and b = Sxy / Syy
or x - bar x = b(y - bar y)
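A short Python sketch showing that the x on y line is generally not just the y on x line rearranged. The helper names and the data are illustrative assumptions.

```python
def s_ab(a, b):
    """S_ab = sum(a_i * b_i) - sum(a_i) * sum(b_i) / n."""
    n = len(a)
    return sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / n

def x_on_y(x, y):
    """Return (a, b) for the line x = a + b*y."""
    b = s_ab(x, y) / s_ab(y, y)
    a = sum(x) / len(x) - b * sum(y) / len(y)
    return a, b

# Illustrative data only (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_xy, b_xy = x_on_y(x, y)            # b = 6/6 = 1.0, a = 3 - 1*4 = -1.0

# Gradient dx/dy obtained by rearranging the y on x line instead:
b_yx = s_ab(x, y) / s_ab(x, x)       # y on x gradient, 0.6
rearranged_gradient = 1 / b_yx       # about 1.667, not 1.0

print(a_xy, b_xy, rearranged_gradient)
```

Both lines pass through (bar x, bar y), but they only coincide when the correlation is perfect.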
Residuals
measure of error from regression line (horizontal distance for x on y and vertical distance for y on x)
for y on x : ri = yi - a - bxi
for x on y : ri = xi - a - byi
Residual importance
Sum of all residuals = 0
Sum of all residuals squared used to measure accuracy of regression line
alternatively r^2 (the PMCC squared) as another measure of how well the line fits
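A Python sketch checking these residual facts for a y on x line on made-up data. The link r^2 = 1 - (sum of squared residuals) / Syy is a standard identity for a least-squares line rather than something stated on the card, and all names here are illustrative.

```python
def s_ab(a, b):
    """S_ab = sum(a_i * b_i) - sum(a_i) * sum(b_i) / n."""
    n = len(a)
    return sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / n

# Illustrative data only (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit y on x: b = Sxy / Sxx, a = bar y - b * bar x.
b = s_ab(x, y) / s_ab(x, x)
a = sum(y) / len(y) - b * sum(x) / len(x)

# Residuals r_i = y_i - a - b*x_i (vertical distances from the line).
residuals = [yi - a - b * xi for xi, yi in zip(x, y)]

total = sum(residuals)                       # 0, up to rounding error
ss_res = sum(e ** 2 for e in residuals)      # about 2.4 for this data

# r^2 from the PMCC, and from the residuals via 1 - SS_res / Syy.
r_squared = s_ab(x, y) ** 2 / (s_ab(x, x) * s_ab(y, y))   # about 0.6
r_squared_from_residuals = 1 - ss_res / s_ab(y, y)

print(total, ss_res, r_squared, r_squared_from_residuals)
```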
Hypothesis tests with correlation coefficients
for PMCC H0 is no correlation H1 is correlation
for SRCC H0 is no association H1 is association
for PMCC, alternatively:
H0: ρ = 0
H1: ρ ≠ 0 (or ρ > 0 or ρ < 0 for a one-tailed test), where ρ is the population correlation coefficient