Bivariate Data Flashcards

1
Q

What are properties of bivariate data?

A

Two variables are considered together for each data point
Either or both variables could be random
Usually displayed on a scatter graph
The relationship could be linear, modelled by a line of best fit (LoBF)

2
Q

What is the PMCC?

A

The Product Moment Correlation Coefficient (PMCC) measures the strength and direction of the linear correlation between two variables

3
Q

What is the formula for covariance?

A

Sum((xi - xbar)(yi - ybar)) / n
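
As a minimal sketch of the covariance formula, the snippet below averages the products of the deviations from each mean. The function name `covariance` and the sample data are assumptions made for illustration only.

```python
def covariance(xs, ys):
    """Population-style covariance: Sum((xi - xbar)(yi - ybar)) / n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Multiply each x deviation by the matching y deviation, then average.
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys))  # 1.2
```

A positive result suggests the variables tend to increase together; a negative result suggests one tends to fall as the other rises.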

4
Q

What is covariance?

A

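Covariance measures the extent to which two variables vary together: it is positive when they tend to increase together and negative when one tends to decrease as the other increases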

5
Q

What is the formula for PMCC?

A

Covariance / (Standard deviation of x × Standard deviation of y)

Sxy / sqrt(Sxx × Syy)
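
The calculation can be sketched in a few lines, using the summary statistics Sxy, Sxx and Syy. The function name `pmcc` and the data are hypothetical; this is an illustration rather than a reference implementation.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    # Undefined (division by zero) if either variable has no spread.
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
print(round(r, 3))  # 0.775
```

Here Sxy = 6, Sxx = 10 and Syy = 6, so r = 6 / sqrt(60). The factor of n in the covariance and in the standard deviations cancels, which is why the two forms of the formula agree.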

6
Q

What do the values of PMCC mean?

A

PMCC = -1 (Perfect Negative Correlation)
PMCC = 0 (No Linear Correlation)
PMCC = 1 (Perfect Positive Correlation)

7
Q

What values of PMCC are considered strong correlation?

A

r < -0.7 (strong negative correlation)
r > 0.7 (strong positive correlation)

8
Q

What is the formula for Sxy?

A

Sum(xi yi) - n(xbar)(ybar)
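
For reference, this shortcut form follows from expanding the sum of products of deviations. The derivation below is a sketch that uses only the definitions of the two means:

```latex
\begin{aligned}
S_{xy} &= \sum (x_i - \bar{x})(y_i - \bar{y}) \\
       &= \sum x_i y_i - \bar{x}\sum y_i - \bar{y}\sum x_i + n\bar{x}\bar{y} \\
       &= \sum x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} \\
       &= \sum x_i y_i - n\bar{x}\bar{y}
\end{aligned}
```

The third line uses Sum(yi) = n(ybar) and Sum(xi) = n(xbar).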

9
Q

What is the formula for Sxx?

A

Sum(xi^2) - n(xbar)^2

10
Q

How is population correlation coefficient estimated?

A

The correlation coefficient r calculated from a sample (often a small one) is used as an estimate of the population correlation coefficient ρ

11
Q

What are the steps for testing ρ (the population correlation coefficient) using r?

A

Write down the null and alternative hypotheses
Write down the significance level
Calculate the PMCC (r)
Find the critical value from the table
Compare r to the critical value and state the conclusion
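
The final comparison step might be sketched as follows. The function `pmcc_test` and its `tail` parameter are hypothetical names, and the critical value is deliberately left as an input because it must be read from a table for the given sample size, significance level and number of tails.

```python
def pmcc_test(r, critical_value, tail="two"):
    """Return True if the null hypothesis (rho = 0) is rejected.

    critical_value must come from a PMCC critical-value table; it is
    not computed here.
    """
    if tail == "two":      # H1: rho != 0
        return abs(r) > critical_value
    if tail == "upper":    # H1: rho > 0
        return r > critical_value
    if tail == "lower":    # H1: rho < 0
        return r < -critical_value
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

# Placeholder numbers only: 0.6 is NOT taken from a real table.
placeholder_critical_value = 0.6
print(pmcc_test(0.8, placeholder_critical_value))           # True
print(pmcc_test(0.3, placeholder_critical_value, "upper"))  # False
```

Rejecting the null hypothesis means there is sufficient evidence, at the chosen significance level, of correlation in the population.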

12
Q

What is the formula for Spearman’s Rank Correlation Coefficient?

A

rs = 1 - 6 Sum(di^2) / (n(n^2 - 1))
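
A small sketch of the calculation, where di is the difference between the two ranks of item i. It assumes there are no tied values, and the helper names `ranks` and `spearman_rank` and the data are made up for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rank(xs, ys):
    """rs = 1 - 6 * Sum(di^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores from two judges for five items.
judge_1 = [10, 20, 30, 40, 50]
judge_2 = [1, 3, 2, 5, 4]
rs = spearman_rank(judge_1, judge_2)
print(round(rs, 3))  # 0.8
```

The rank differences here are 0, -1, 1, -1, 1, so Sum(di^2) = 4 and rs = 1 - 24/120 = 0.8.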

13
Q

What do the values of SRCC mean?

A

rs = 1 is perfect agreement
rs = -1 is perfect disagreement
rs = 0 is no agreement or disagreement

14
Q

What does perfect agreement mean?

A

The rankings are identical in the two data sets

15
Q

What does perfect disagreement mean?

A

The rankings are exactly reversed in the two data sets

16
Q

What is association?

A

Association is any relationship between two variables

17
Q

What is correlation?

A

Correlation is a subset of association where the relationship is linear and the variables are random

18
Q

Why is SRCC different to PMCC?

A

In SRCC, association replaces correlation because a linear relationship cannot be assumed
rs = ±1 indicates perfect agreement/disagreement of the ranks, but does not imply a perfect linear relationship
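
To illustrate the difference, the sketch below uses hypothetical data where y = x^3. The ranks agree perfectly, so rs = 1, yet the points do not lie on a straight line and the PMCC falls short of 1. All function names are assumptions, and ties are not handled.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) to n, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def pmcc(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # increasing, but curved

rs = spearman(xs, ys)
r = pmcc(xs, ys)
print(rs)           # 1.0
print(round(r, 3))  # 0.943
```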

19
Q

Why are hypothesis tests used on PMCC?

A

Hypothesis tests check whether an r value is statistically significant as an indicator of ρ

20
Q

What is a constant in all hypothesis tests of PMCC?

A

The null hypothesis states that ρ = 0

21
Q

What is the least squares regression line of y on x?

A

This is the regression line for which the sum of the squares of the vertical distances from each point to the line is as small as possible

22
Q

What is the formula for the least squares regression line of y on x?

A

y = a + bx

Where:
b = Sxy / Sxx
a = ybar - b(xbar)
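
A minimal sketch of fitting the y on x line, taking the gradient as b = Sxy / Sxx and the intercept as a = ybar - b(xbar). The function name `regression_y_on_x` and the data are hypothetical.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # intercept: the line passes through (xbar, ybar)
    return a, b

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")  # y = 2.2 + 0.6x
```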

23
Q

Why might a LOBF not be appropriate in some situations?

A

If the relationship is non-linear
It may not be a valid model over some ranges of values, for example outside the range of the data (extrapolation)

24
Q

What is the least squares regression line of x on y?

A

The line for which the sum of the squares of the horizontal distances from each point to the line is as small as possible

25
Q

What is the formula for least squares regression line of x on y?

A

x = a + by

Where:
b = Sxy / Syy
a = xbar - b(ybar)
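
As a rough illustration, the x on y line can be fitted with the same routine as y on x by swapping the roles of the two variables. The sketch below uses a hypothetical helper `least_squares` and the names `c` and `d` for the x on y coefficients, to keep them distinct from the y on x values. The two lines generally differ, but both pass through the mean point (xbar, ybar).

```python
def least_squares(us, vs):
    """Return (intercept, gradient) for the line v = intercept + gradient*u."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uv = sum(u * v for u, v in zip(us, vs)) - n * u_bar * v_bar
    s_uu = sum(u * u for u in us) - n * u_bar ** 2
    gradient = s_uv / s_uu
    return v_bar - gradient * u_bar, gradient

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)  # y on x: y = a + b*x
c, d = least_squares(ys, xs)  # x on y: x = c + d*y
print(f"y = {a:.1f} + {b:.1f}x")  # y = 2.2 + 0.6x
print(f"x = {c:.1f} + {d:.1f}y")  # x = -1.0 + 1.0y
```

With xbar = 3 and ybar = 4, substituting x = 3 into the first line gives y = 4, and substituting y = 4 into the second gives x = 3.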

26
Q

Why is only one regression line used in real life?

A

Only one regression line is appropriate when one of the variables is controlled (not random)
Where ‘x’ is controlled, the line is y on x

27
Q

What is the residual?

A

The residual of a data point is a measure of the “error” from the regression line

28
Q

What is the formula for the residual for y on x?

A

ri = yi - a - bxi
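
The residuals for a fitted y on x line can be computed directly from this formula. The sketch below uses hypothetical data and a hypothetical helper `fit_y_on_x`; for a least squares line the residuals sum to zero, up to rounding.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(xs, ys)

# ri = yi - a - b*xi: positive when the point lies above the line.
residuals = [y - a - b * x for x, y in zip(xs, ys)]
print([round(r, 1) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```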

29
Q

What is the formula for the residual for x on y?

A

ri = xi - a - byi

30
Q

What is the residual for y on x?

A

For y on x, the residual is the vertical distance between the point and the regression line

31
Q

What is the residual for x on y?

A

For x on y, the residual is the horizontal distance between the point and the regression line

32
Q

What is the coefficient of determination?

A

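The coefficient of determination measures how closely the points fit the regression line: for y on x it is the proportion of the variation in y explained by the line, 1 - Sum(ri^2) / Syy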

33
Q

What does the value of the coefficient of determination mean?

A

The closer to 0, the worse the model
The closer to 1, the better the model

34
Q

What does the coefficient of determination also equal?

A

Coefficient of determination = PMCC^2 (that is, r^2)
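
A quick numerical check of how the coefficient of determination relates to the PMCC for a y on x line: computing 1 - Sum(ri^2) / Syy from the residuals gives the same number as squaring r. The data and variable names below are assumptions for illustration.

```python
# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
s_xx = sum(x * x for x in xs) - n * x_bar ** 2
s_yy = sum(y * y for y in ys) - n * y_bar ** 2

# Least squares line of y on x.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Route 1: from the sum of squared residuals.
rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
from_residuals = 1 - rss / s_yy

# Route 2: the square of the PMCC.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(round(from_residuals, 3), round(r_squared, 3))  # 0.6 0.6
```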

35
Q

How can a graph show a set of data is appropriate for PMCC?

A

If the points form an approximately elliptical cloud on the graph, it suggests a bivariate normal distribution, meaning the data are appropriate for PMCC