Ch. 16 Flashcards
correlation coefficient
Measures the strength and direction of the linear relationship between two numerical variables
Parameter and estimate of the correlation coefficient?
rho (ρ) (parameter) and r (estimate)
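A minimal Python sketch of computing the estimate r from paired observations, assuming the data are two equal-length lists of numbers. The function name `pearson_r` and the sample values are illustrative, not from the text. It uses the standard formula r = Σ(Xi - X̄)(Yi - Ȳ) / √[Σ(Xi - X̄)² Σ(Yi - Ȳ)²].

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r, the estimate of the parameter rho."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 pairs")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(pearson_r(x, y), 3))
```

The sign of r gives the direction; its distance from 0 (at most 1) gives the strength.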
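What is the covariance?
The covariance is a measure of how X and Y vary together: positive when they tend to increase together, negative when one tends to decrease as the other increases
Cov(X, Y) = Σ(Xi - X̄)(Yi - Ȳ) / (n - 1)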
See pg. 521
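A short sketch of the covariance formula, assuming plain Python lists; `sample_covariance`, `sample_sd`, and the data are hypothetical names and values chosen for illustration. It also shows that dividing the covariance by both sample standard deviations gives r.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of cross-products of deviations divided by (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_sd(v):
    """Sample standard deviation: square root of the covariance of v with itself."""
    return math.sqrt(sample_covariance(v, v))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
cov = sample_covariance(x, y)
# r is the covariance rescaled by both standard deviations
r = cov / (sample_sd(x) * sample_sd(y))
print(cov, r)
```

Unlike r, the covariance depends on the units of X and Y, which is why it is rescaled before being used as a measure of strength.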
t-test for the null hypothesis of no relationship? (ρ = 0)
t = r / SEr, where SEr = √[(1 - r²) / (n - 2)]
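A sketch of the test statistic, assuming r and n are already known. It uses SEr = √[(1 - r²)/(n - 2)] and n - 2 degrees of freedom; the function name `correlation_t_test` and the values r = 0.5, n = 27 are made up for illustration. The statistic is undefined when r = ±1.

```python
import math

def correlation_t_test(r, n):
    """Return the t statistic and degrees of freedom for H0: rho = 0."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    df = n - 2
    se_r = math.sqrt((1 - r ** 2) / df)  # standard error of r
    t = r / se_r
    return t, df

t, df = correlation_t_test(r=0.5, n=27)
print(t, df)
```

The resulting t is compared with the t distribution with n - 2 df. If SciPy happens to be available (an assumption, not part of these notes), a two-sided P-value could be obtained as `2 * scipy.stats.t.sf(abs(t), df)`.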
Coefficient of determination (r²)
Describes the proportion of variation in one variable that can be predicted from the other
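To see why r² is a "proportion of variation predicted", this sketch fits a least-squares line of Y on X (a regression tool not covered on these cards) and checks that 1 - (residual sum of squares / total sum of squares of Y) matches r². The function `r_squared_two_ways` and the data are hypothetical.

```python
import math

def r_squared_two_ways(x, y):
    """Return (r squared, 1 - SS_residual / SS_total) for a least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)  # total variation in y
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    # Least-squares line predicting y from x
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Proportion of the variation in y accounted for by the line
    return r ** 2, 1 - ss_residual / syy

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2, explained = r_squared_two_ways(x, y)
print(round(r2, 4), round(explained, 4))
```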
Assumptions of correlation?
Random sample
Bivariate normal distribution (i.e. X is normally distributed with equal variance for all values of Y, and Y is normally distributed with equal variance for all values of X)
Bivariate normal distribution
X is normally distributed with equal variance for all values of Y
Y is normally distributed with equal variance for all values of X
(i.e., a bell-shaped probability distribution in 2-D space)
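One way to get a feel for bivariate normality is to simulate it. This sketch assumes standard normal X and Y and builds Y = ρ·Z1 + √(1 - ρ²)·Z2 from two independent standard normal draws, which gives the pair correlation ρ. The function names, the seed, and ρ = 0.6 are illustrative choices.

```python
import math
import random

def simulate_bivariate_normal(n, rho, seed=None):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # x is standard normal; mixing z1 and z2 gives y unit variance
        # and correlation rho with x
        pairs.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return pairs

def sample_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

pairs = simulate_bivariate_normal(5000, rho=0.6, seed=1)
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]
r = sample_r(xs, ys)
print(round(r, 3))  # close to 0.6; the exact value depends on the seed
```

A scatter plot of `xs` against `ys` should show an elliptical cloud, and a histogram of each variable should look bell-shaped.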
What does a bivariate normal distribution look like?
Partial list
- Linear relationship
- The cloud of points in a scatter plot is circular or elliptical in shape
- Frequency distributions of X and Y (individually) are normal
- Histograms of X and Y should both appear normal
Methods to get around the assumption of bivariate normality?
Transform the data (e.g., log, square-root, or arcsine transformation)
Use non-parametric methods
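A sketch of the three transformations, assuming the data are in Python lists. The arcsine transformation is usually applied to proportions as arcsin(√p). The function names are hypothetical, and whether a transformation actually brings the data close to bivariate normal still has to be checked afterward (e.g., with scatter plots and histograms).

```python
import math

def log_transform(values):
    # Natural log; every value must be > 0
    return [math.log(v) for v in values]

def sqrt_transform(values):
    # Square root; every value must be >= 0
    return [math.sqrt(v) for v in values]

def arcsine_transform(proportions):
    # Arcsine square-root transformation, arcsin(sqrt(p)); each p must be in [0, 1]
    return [math.asin(math.sqrt(p)) for p in proportions]

print(log_transform([1.0, 10.0, 100.0]))
print(sqrt_transform([0.0, 4.0, 9.0]))
print(arcsine_transform([0.0, 0.5, 1.0]))
```

After transforming, r is simply recomputed on the transformed values.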
Spearman’s rank correlation
Measures the strength and direction of the linear association between the ranks of two variables
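Spearman's rank correlation can be computed as the ordinary correlation coefficient applied to the ranks of X and Y, with tied values given the average of the ranks they span. A minimal sketch with hypothetical helper names; the demo uses y = x³, a monotonic but nonlinear relationship.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Ordinary (Pearson) correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but not linear
print(pearson_r(x, y), spearman_rs(x, y))
```

Here the ranks of X and Y line up perfectly, so rs is 1, while r on the raw values is below 1 because the raw relationship is curved.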
Assumptions of Spearman’s rank correlation?
Individuals are randomly chosen from a population
Assumes the relationship between the ranks of X and Y is linear (i.e., X and Y have a monotonic relationship, which need not be linear)
Measurement error
The difference between the true value of a variable for an individual and its measured value
Attenuation
Bias in the correlation estimate caused by measurement error in X or Y (or both)
The more measurement error, the greater the attenuation: on average, r is biased toward 0, so its magnitude underestimates that of ρ.
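A small simulation of attenuation, assuming true values drawn from a bivariate normal with ρ = 0.8 and independent normal measurement error (SD 1, an arbitrary choice) added to both X and Y. The names, seed, and settings are illustrative.

```python
import math
import random

def sample_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
n = 5000
rho = 0.8

# True values: standard bivariate normal with correlation rho
true_x, true_y = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    true_x.append(z1)
    true_y.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)

# Measured value = true value + independent measurement error (SD 1, arbitrary)
measured_x = [v + rng.gauss(0.0, 1.0) for v in true_x]
measured_y = [v + rng.gauss(0.0, 1.0) for v in true_y]

r_true = sample_r(true_x, true_y)
r_measured = sample_r(measured_x, measured_y)
print(round(r_true, 3), round(r_measured, 3))
```

With error variance equal to the true variance of each variable, the errors double the variance of X and Y without changing their covariance, so the measured correlation should come out near 0.4 rather than 0.8.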
How does the correlation coefficient depend on range?
The correlation is expected to be weaker when only a narrow range of X values is represented
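A simulation sketch of the range effect, assuming a bivariate normal population with ρ = 0.7 and keeping only the pairs with X between -0.5 and 0.5 to mimic a narrow range. The cutoffs, seed, and names are illustrative.

```python
import math
import random

def sample_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
rho = 0.7
xs, ys = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(z1)
    ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)

# Keep only pairs whose X falls in a narrow window
narrow = [(x, y) for x, y in zip(xs, ys) if -0.5 < x < 0.5]

r_full = sample_r(xs, ys)
r_narrow = sample_r([p[0] for p in narrow], [p[1] for p in narrow])
print(round(r_full, 3), round(r_narrow, 3))
```

The restricted sample's r should be well below the full sample's, even though both come from the same underlying relationship.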
Degrees of freedom for the t-test?
n - 2 (because two quantities estimated from the data, X̄ and Ȳ, are used when calculating r)