Bivariate Data Flashcards
What are properties of bivariate data?
Both variables are considered
Both could be random
Usually displayed on scatter graph
Relationship could be linear (LoBF)
What is the PMCC?
The Product Moment Correlation Coefficient (r) measures the strength and direction of a linear correlation
What is the formula for covariance?
Sum((xi - xbar)(yi - ybar)) / n
What is covariance?
Covariance indicates to what extent the two variables vary together: positive when the points are positively correlated, negative when they are negatively correlated
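As a rough illustration of the covariance formula above, the sketch below computes it directly from the definition. The function name `covariance` and the data values are made up for the example and are not part of the flashcards.

```python
def covariance(xs, ys):
    """Covariance as Sum((xi - xbar)(yi - ybar)) / n."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n

# Made-up illustrative data (an assumption, not from the flashcards)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_up = covariance(xs, ys)                    # y tends to rise with x
cov_down = covariance(xs, [-y for y in ys])    # y tends to fall as x rises
print(cov_up, cov_down)
```

The sign of the result gives the direction of the correlation; the size depends on the units of the data, which is why the PMCC rescales it.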
What is the formula for PMCC?
Covariance / (Standard Deviation of x × Standard Deviation of y)
Sxy / Sqrt(Sxx × Syy)
What do the values of PMCC mean?
PMCC = -1 (Perfect Negative Correlation)
PMCC = 0 (No Linear Correlation)
PMCC = 1 (Perfect Positive Correlation)
What values of PMCC are considered strong correlation?
r < -0.7
r > 0.7
What is the formula for Sxy ?
Sum(xi yi) - n(xbar)(ybar)
What is the formula for Sxx ?
Sum(xi^2) - n(xbar)^2
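The summary statistics above combine into the PMCC as r = Sxy / Sqrt(Sxx × Syy). The following is a minimal sketch under that assumption; the helper names `s_xy` and `pmcc` and the data are hypothetical.

```python
import math

def s_xy(xs, ys):
    """Sxy = Sum(xi yi) - n(xbar)(ybar); calling s_xy(xs, xs) gives Sxx."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar

def pmcc(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy(xs, ys) / math.sqrt(s_xy(xs, xs) * s_xy(ys, ys))

# Made-up illustrative data (an assumption, not from the flashcards)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sxy = s_xy(xs, ys)   # 66 - 5 * 3 * 4
sxx = s_xy(xs, xs)   # 55 - 5 * 3^2
r = pmcc(xs, ys)
print(sxy, sxx, round(r, 4))
```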
How is population correlation coefficient estimated?
Often, a small sample is used to calculate the correlation coefficient ‘r’, which is then used as an estimate of the population coefficient ‘ρ’
What are the steps to estimating ‘ρ’ using ‘r’?
Write down the null and alternative hypotheses
Write down the significance level
Calculate the PMCC (r)
Find the critical value from the table
Compare ‘r’ to the critical value and state the conclusion
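The final comparison step can be sketched as below. This is only an outline: the function `pmcc_test` is hypothetical, the sample value r = 0.62 is made up, and the critical value must come from your own tables. The 0.5494 used here is the value commonly tabulated for n = 10 at the 5% one-tailed level, but check it against the table you are given.

```python
def pmcc_test(r, critical_value, alternative):
    """Return True if H0 (rho = 0) is rejected.

    alternative: "positive"  for H1: rho > 0,
                 "negative"  for H1: rho < 0,
                 "two-sided" for H1: rho != 0.
    critical_value: positive value read from a PMCC table for the
    sample size and significance level (supplied by the user).
    """
    if alternative == "positive":
        return r > critical_value
    if alternative == "negative":
        return r < -critical_value
    if alternative == "two-sided":
        return abs(r) > critical_value
    raise ValueError("alternative must be 'positive', 'negative' or 'two-sided'")

# Hypothetical example: H0: rho = 0, H1: rho > 0, 5% significance level
r = 0.62                  # made-up sample PMCC
critical_value = 0.5494   # assumed table value for n = 10; verify in your tables
reject_h0 = pmcc_test(r, critical_value, "positive")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

For a two-sided alternative, remember to look up the two-tailed critical value, which is larger than the one-tailed value at the same significance level.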
What is the formula for Spearman’s Rank Correlation Coefficient?
rs = 1 - 6 Sum(di^2) / (n(n^2 - 1))
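A small sketch of the calculation, using the standard form of the formula with the factor of 6 in the numerator. It assumes there are no tied values; the helper names `ranks` and `spearman` and the judges' scores are made up for illustration.

```python
def ranks(values):
    """Rank 1 = largest value. Assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """rs = 1 - 6 * Sum(di^2) / (n(n^2 - 1)), where di is the rank difference."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Made-up scores from two judges for five competitors (an assumption)
judge_a = [7, 5, 9, 3, 6]
judge_b = [6, 4, 8, 5, 7]

rs = spearman(judge_a, judge_b)
print(rs)
```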
What do the values of SRCC mean?
rs = 1 is perfect agreement
rs = -1 is perfect disagreement
rs = 0 is no agreement or disagreement
What does perfect agreement mean?
The rankings are identical in the two data sets
What does perfect disagreement mean?
The rankings are exactly opposite in the two data sets
What is association?
Association is any relationship between two variables
What is correlation?
Correlation is a subset of association where the relationship is linear and both variables are random
Why is SRCC different to PMCC?
In SRCC, association replaces correlation, as a linear relationship cannot be assumed
rs = ±1 means perfect agreement/disagreement of the rankings, but does not imply a perfect linear correlation
Why are hypothesis tests used on PMCC?
Hypothesis tests test whether an ‘r’ value is statistically significant as an indicator for ‘ρ’
What is a constant in all hypothesis tests of PMCC?
The null hypothesis states that ρ = 0
What is the least squares regression line of y on x?
This is the line for which the sum of the squares of the vertical distances from each point to the line is as small as possible
What is the formula for the least squares regression line of y on x?
y = a + bx
Where:
b = Sxy / Sxx
a = ybar - b(xbar)
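The formulas for a and b can be tried out with a short sketch, taking b as Sxy / Sxx. The function name `regression_y_on_x` and the data are made up for the example.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar ** 2
    b = sxy / sxx          # gradient
    a = ybar - b * xbar    # intercept: the line passes through (xbar, ybar)
    return a, b

# Made-up illustrative data (an assumption, not from the flashcards)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_y_on_x(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

Swapping the roles of the two lists gives the coefficients for the x on y line, since the formulas are the same with x and y interchanged.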
Why might a LoBF not be appropriate in some situations?
If the relationship is non-linear
It may not model some ranges of values well (for example, when extrapolating beyond the data)
What is the least squares regression line of x on y?
The line for which the sum of the squares of the horizontal distances from each point to the line is as small as possible
What is the formula for least squares regression line of x on y?
x = a + by
Where:
b = Sxy / Syy
a = xbar - b(ybar)
Why is only one regression line used in real life?
Only one regression line is appropriate when one of the variables is controlled (not random)
Where ‘x’ is controlled, the line is y on x
What is the residual?
The residual of a data point is a measure of the “error” from the regression line
What is the formula for the residual for y on x?
ri = yi - a - bxi
What is the formula for the residual for x on y?
ri = xi - a - byi
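To see the residual formula in action for y on x, here is a brief sketch. The function name `residuals_y_on_x` is hypothetical, and the data and coefficients are assumptions: a = 2.2 and b = 0.6 are the least squares values for this particular made-up data set.

```python
def residuals_y_on_x(xs, ys, a, b):
    """Residual of each point: ri = yi - a - b*xi (signed vertical distance)."""
    return [y - a - b * x for x, y in zip(xs, ys)]

# Made-up illustrative data and its least squares line y = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

res = residuals_y_on_x(xs, ys, a, b)
print([round(r, 2) for r in res])

# For a least squares line the residuals cancel out overall
print(round(sum(res), 10))
```

A positive residual means the point lies above the line, a negative one means it lies below.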
What is the residual for y on x?
For y on x, the residual is the vertical distance between point and regression line
What is the residual for x on y?
For x on y, the residual is the horizontal distance between point and regression line
What is the coefficient of determination?
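The coefficient of determination measures how close the points are to the regression line; for y on x it is 1 - Sum(ri^2) / Syy, the proportion of the variation in y explained by the line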
What does the value of the coefficient of determination mean?
The closer to 0, the worse the model
The closer to 1, the better the model
What does the coefficient of determination also equal?
Coefficient of determination = PMCC^2 (r^2)
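The coefficient of determination equals the square of the PMCC, and for the y on x line it also equals 1 - Sum(ri^2) / Syy. The sketch below checks that the two routes agree on one made-up data set; all variable names and values are assumptions for illustration.

```python
import math

# Made-up illustrative data (an assumption, not from the flashcards)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx = sum(x * x for x in xs) - n * xbar ** 2
syy = sum(y * y for y in ys) - n * ybar ** 2
sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar

# Route 1: square the PMCC
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Route 2: use the residuals of the y on x regression line
b = sxy / sxx
a = ybar - b * xbar
sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
from_residuals = 1 - sse / syy

print(round(r_squared, 4), round(from_residuals, 4))
```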
How can a graph show a set of data is appropriate for PMCC?
If the points form an approximately elliptical shape on the graph, it suggests a bivariate normal distribution, meaning the data is appropriate for PMCC