Lecture 9 - Correlation & Regression Flashcards
In a bivariate relationship, what are x and y?
x = explanatory variable; y = response variable
Give an example of a response variable that cannot be put on the y-axis because it is dichotomous (yes/no).
x = smoking; y = lung cancer
*Lung cancer can't go on the y-axis because it is either yes or no.
A scatter plot can only be used when both variables are _______
numerical
How can you describe the overall pattern of a scatterplot ?
by the form, direction, and strength of the relationship between the data
Describe the form and direction of the relationship
- Linear relationships, where the points roughly follow a straight line, are especially important.
- Relationships can be negative (A) or positive (B) in direction.
- Curvilinear relationships (C) and clusters (D) are other things to watch for.
- An important kind of deviation is an OUTLIER (E): an individual value that falls outside the overall pattern.
The strength of a relationship is determined by ??
how close the points in the scatter plot lie to a simple form such as a line
We use the _____ to evaluate the variability of univariate data, where n is the sample size.
variance
For bivariate data, we use the _______ (how x and y vary together), where n is the number of (x, y) pairs.
covariance
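The two cards above can be checked numerically. Below is a minimal sketch, assuming the usual sample formulas with an n - 1 denominator (the cards do not show the formula, so that denominator is an assumption). The height and weight values are made up for illustration.

```python
def sample_variance(xs):
    """Univariate variability: sum of squared deviations from the mean over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_covariance(xs, ys):
    """Bivariate variability: sum of cross-products of deviations over n - 1."""
    n = len(xs)  # n is the number of (x, y) pairs
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: five people, height in cm and weight in kg.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 69, 77, 91]

print(sample_variance(heights_cm))                # 250.0 (cm^2)
print(sample_covariance(heights_cm, weights_kg))  # 252.5 (cm * kg)
```

Note the units in the comments: the covariance comes out in cm * kg, a composite unit.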
The covariance is limited as ?
a tool for measuring and describing relationships because its composite units are difficult to interpret
Standardized values of a variable have ____ units.
no (they are the same whatever units x is measured in, e.g. cm or mg)
What do standardized values express?
Deviations from the mean expressed in units of the standard deviation (s), which avoids the problem of trying to interpret unit-dependent covariance values.
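A quick way to see that standardized values carry no units: standardize the same measurements recorded in two different units and compare. This is a sketch with made-up heights, assuming z = (x - mean) / s with s computed using n - 1.

```python
def standardize(xs):
    """Return z-scores: each deviation from the mean divided by the sample SD."""
    n = len(xs)
    mean = sum(xs) / n
    s = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return [(x - mean) / s for x in xs]


# Hypothetical heights, recorded once in centimetres and once in metres.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

z_cm = standardize(heights_cm)
z_m = standardize(heights_m)

# The two lists agree (up to floating-point rounding), so the unit has dropped out.
for a, b in zip(z_cm, z_m):
    print(round(a, 4), round(b, 4))
```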
What is the correlation coefficient ?
The strength of a relationship is quantified by the standardized covariance of the two continuous measures, which is termed the correlation coefficient ρ (rho), or more formally the Pearson correlation coefficient.
The sample correlation coefficient is r
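"Standardized covariance" can be made concrete: divide the sample covariance by the product of the two sample standard deviations. The sketch below does that on made-up height and weight data. The function name `pearson_r` is just an illustrative choice.

```python
def pearson_r(xs, ys):
    """Sample correlation r = cov(x, y) / (s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = (sum((x - mean_x) ** 2 for x in xs) / (n - 1)) ** 0.5
    s_y = (sum((y - mean_y) ** 2 for y in ys) / (n - 1)) ** 0.5
    return cov_xy / (s_x * s_y)


# Hypothetical data: height in cm and weight in kg.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 69, 77, 91]

r = pearson_r(heights_cm, weights_kg)
print(round(r, 4))  # close to +1: a strong positive linear relationship

# Reversing the order of y flips the direction, so r becomes negative.
r_reversed = pearson_r(heights_cm, weights_kg[::-1])
print(round(r_reversed, 4))
```

Unlike the covariance, r always lies between -1 and +1 and has no units.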
What does the correlation coefficient measure?
It measures both the strength and direction of a linear relationship between 2 continuous variables.
What is the square of the correlation coefficient?
r^2 is referred to as the coefficient of determination
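One common reading of r^2, not stated on the card, is the proportion of the variability in y accounted for by a least-squares straight line in x. The sketch below checks that reading on made-up data by computing r^2 directly and as 1 - SSE/SST. All names are illustrative.

```python
def r_squared_two_ways(xs, ys):
    """Return (r**2, 1 - SSE/SST) for the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares (SST)

    r = sxy / (sxx * syy) ** 0.5

    # Least-squares line: y_hat = intercept + slope * x
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

    return r ** 2, 1 - sse / syy


# Hypothetical data: height in cm and weight in kg.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 69, 77, 91]

direct, from_fit = r_squared_two_ways(heights_cm, weights_kg)
print(round(direct, 4), round(from_fit, 4))  # the two values agree
```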
When are two variables x and y positively associated ?
when above-average values of one variable tend to go with above-average values of the other; in this case r will be positive
When are two variables x and y negatively associated ?
when above-average values of one variable tend to go with below-average values of the other; in this case r will be negative
For the correlation coefficient, what is the null and alternative hypothesis ?
Null hypothesis: ρ = 0 (there is no linear relationship)
Alternative hypothesis: ρ ≠ 0 (there is a linear relationship, e.g. height and weight are linearly related)
**We can test the correlation at alpha = 0.05 with n - 2 degrees of freedom, where n is the number of (x, y) pairs.
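The card does not say which test statistic goes with the n - 2 degrees of freedom. A standard choice, assumed here, is t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against a two-sided critical value from a t-table. The sketch below uses made-up data with n = 5, so df = 3. The critical value 3.182 is the tabulated two-sided value for alpha = 0.05 and df = 3. For any other n, look up the matching value.

```python
def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0, assuming |r| < 1."""
    df = n - 2
    t = r * df ** 0.5 / (1 - r ** 2) ** 0.5
    return t, df


# Hypothetical data: height in cm and weight in kg.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 69, 77, 91]

r = pearson_r(heights_cm, weights_kg)
t, df = correlation_t_statistic(r, len(heights_cm))

# Two-sided critical value for alpha = 0.05 and df = 3, taken from a t-table.
T_CRITICAL_DF3 = 3.182

reject_null = abs(t) > T_CRITICAL_DF3
print(f"r = {r:.4f}, t = {t:.2f}, df = {df}, reject H0: {reject_null}")
```

If |t| exceeds the critical value, the null hypothesis ρ = 0 is rejected at alpha = 0.05.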