Bivariate data Flashcards

1
Q

what type of data do SRCC and PMCC work on

A

random bivariate data

2
Q

how does PMCC work

A

draw a 4 quadrant grid centred on the mean point (bar x, bar y), then average the products of each point's x and y distances from that centre to get the covariance - points in the 1st and 3rd relative quadrants give positive products and points in the 2nd and 4th give negative ones, so a strong positive correlation nets a large positive covariance and a strong negative correlation a large negative one
then divide by the product of the standard deviations of x and y to scale the result to between -1 and 1

3
Q

equation of PMCC

A

Sxy / root(Sxx times Syy)
or Σ(xi - bar x)(yi - bar y) / root(Σ(xi - bar x)^2 times Σ(yi - bar y)^2)
or use formula booklet version
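
As a rough illustration, the Sxy / root(Sxx times Syy) form can be sketched in Python. The function name pmcc and the sample data are made up for this example and are not from any formula booklet.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean point (bar x, bar y)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)  # Sxy = 6, Sxx = 10, Syy = 6, so r = 6 / sqrt(60), about 0.775
```

Points lying exactly on a rising straight line, such as pmcc([1, 2, 3], [2, 4, 6]), should give a value of 1 up to rounding.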

4
Q

sum of xx, xy, yy
(sum of ab format)

A

Sab = Σ(ai bi) - (Σai)(Σbi)/n
so Sxx = Σ(xi^2) - (Σxi)^2/n, and Syy likewise
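
A quick check, assuming the shortcut form Sab = Σ(ai bi) - (Σai)(Σbi)/n, that it agrees with the deviation form Σ(ai - bar a)(bi - bar b). Both function names and the data are invented for this sketch.

```python
def s_ab_deviation(a, b):
    """Sab from deviations: sum of (ai - mean a)(bi - mean b)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

def s_ab_shortcut(a, b):
    """Sab from raw sums: sum of ai*bi minus (sum a)(sum b)/n."""
    n = len(a)
    return sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / n

# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_xy = s_ab_shortcut(xs, ys)  # 66 - 15*20/5 = 6
s_xx = s_ab_shortcut(xs, xs)  # 55 - 15^2/5 = 10
s_yy = s_ab_shortcut(ys, ys)  # 86 - 20^2/5 = 6
```

Passing the same list twice gives Sxx or Syy, which is why one "ab" formula covers all three.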

5
Q

r

A

sample statistic used to estimate ρ (rho), the population correlation coefficient
if mod r is less than the critical value then there is insufficient evidence to reject H0

6
Q

how does SRCC work

A

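rank the values of each variable separately, then compare the two ranks of each data point - small differences in rank imply positive association, large differences imply negative association
so a perfect 1 or -1 coefficient doesn't need a linear relationship, only one that is always increasing or always decreasing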

7
Q

SRCC formula

A

1 - 6Σ(di^2) / (n(n^2 - 1))

where di is the difference between the two ranks of the ith bivariate data point and n is the number of data points
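
A minimal Python sketch of the rank-difference formula, assuming there are no tied values (ties need averaged ranks, which this does not handle). The names ranks and srcc and the data are hypothetical.

```python
def ranks(values):
    """Rank each value from 1 (smallest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def srcc(xs, ys):
    """Spearman's rank correlation: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# y = x^3 is not linear, but the ranks agree exactly
cubic = srcc([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])   # 1.0
# ranks exactly reversed: sum(d^2) = 40
reverse = srcc([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])     # -1.0
# partly matching ranks: sum(d^2) = 10
mixed = srcc([10, 20, 30, 40, 50], [3, 1, 4, 2, 5])  # 0.5
```

The cubic case shows a coefficient of 1 without the points lying on a straight line.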

8
Q

Association and correlation

A

Association is any relationship between two or more variables
Correlation is a special case of association: a linear relationship between two random variables
in a hypothesis test conclusion following PMCC or SRCC you mostly refer to association

9
Q

when to use what regression lines

A

if x is controlled use y on x
if y is controlled use x on y
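if both are random (e.g. a scatter graph of two measured variables) either line is usable, chosen by which variable is being predicted
a regression line is not appropriate for points that lie on a curve
and a line may fit certain ranges of the data but not others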

10
Q

y on x regression line

A

sum of vertical distances squared from lobf (line of best fit) as low as possible
used to predict y values based on given x values

y = a + bx where a = bar y - b bar x
and b = Sxy / Sxx
or y - bar y = b(x - bar x)
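
A short sketch of fitting y on x with b = Sxy / Sxx and a = bar y - b bar x. The function name fit_y_on_x and the data are invented for illustration.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for the y on x line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = mean_y - b * mean_x  # intercept, so the line passes through (bar x, bar y)
    return a, b

# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(xs, ys)    # b = 6/10 = 0.6, a = 4 - 0.6*3 = 2.2
predicted_y = a + b * 3.5    # predict y for a given x inside the data range
```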

11
Q

x on y regression line

A

sum of horizontal distances squared from lobf as low as possible
used to predict x values based on given y values

x = a + by where a = bar x - b bar y
and b = Sxy / Syy
or x - bar x = b(y - bar y)
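
A comparable sketch for x on y, with b = Sxy / Syy and a = bar x - b bar y. It also shows that this line is generally not the y on x line rearranged. The function name fit_x_on_y and the data are hypothetical.

```python
def fit_x_on_y(xs, ys):
    """Return (a, b) for the x on y line x = a + b*y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    b = s_xy / s_yy          # gradient of x with respect to y
    a = mean_x - b * mean_y  # intercept, so the line passes through (bar x, bar y)
    return a, b

# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_x_on_y(xs, ys)    # b = 6/6 = 1.0, a = 3 - 1.0*4 = -1.0
predicted_x = a + b * 4.5    # predict x for a given y

# For the same data the y on x gradient is Sxy/Sxx = 0.6; rearranging that
# line for x would give a gradient of 1/0.6, not the 1.0 found here.
rearranged_gradient = 1 / 0.6
```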

12
Q

Residuals

A

measure of error from regression line (horizontal distance for x on y and vertical distance for y on x)
for y on x : ri = yi - a - bxi
for x on y : ri = xi - a - byi

13
Q

Residual importance

A

Sum of all residuals = 0
Sum of all residuals squared measures how well the regression line fits the data (smaller is better)
alternatively r^2 = PMCC^2 (the coefficient of determination) as another measure of fit, the closer to 1 the better
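
These properties can be checked numerically for a y on x line. The sketch below uses made-up data and a hypothetical helper fit_y_on_x; it also relies on the standard identity that, for y on x, the sum of squared residuals equals Syy(1 - r^2).

```python
def fit_y_on_x(xs, ys):
    """Return (a, b, s_xx, s_yy, s_xy) for the y on x line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b, s_xx, s_yy, s_xy

# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, s_xx, s_yy, s_xy = fit_y_on_x(xs, ys)

# Residual for each point: ri = yi - a - b*xi (vertical distance from the line)
residuals = [y - a - b * x for x, y in zip(xs, ys)]
residual_sum = sum(residuals)                     # 0 up to rounding error
residual_sum_sq = sum(r ** 2 for r in residuals)  # about 2.4 for this data

r_squared = s_xy ** 2 / (s_xx * s_yy)             # PMCC squared = 36/60 = 0.6
```

Here residual_sum_sq matches s_yy * (1 - r_squared) = 6 * 0.4, so a larger r^2 goes with a smaller sum of squared residuals.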

14
Q

Hypothesis tests with correlation coefficients

A

for PMCC H0 is no correlation, H1 is correlation
for SRCC H0 is no association, H1 is association
for PMCC, in symbols:
H0: ρ = 0
H1: ρ ≠ 0 (or ρ > 0 or ρ < 0 for a one-tailed test), where ρ is the population correlation coefficient
