Chapter 7: Scatterplots, Associations, and Correlation Flashcards
Define ‘Scatterplots’.
A scatterplot shows the relationship between two quantitative variables measured on the same cases.
Define ‘Associations’.
- Direction: A positive direction or association means that, in general, as one variable increases, so does the other. When increases in one variable generally correspond to decreases in the other, the association is negative.
- Form: The simplest form is straight, but you should also describe any other pattern, such as a curve, that you see in the scatterplot.
- Strength: A scatterplot is said to show a strong association if there is little scatter around the underlying relationship.
Define ‘Outlier’.
A point that does not fit the overall pattern seen in the scatterplot.
Define ‘Response variable, explanatory variable, y-variable, x-variable’.
In a scatterplot, you must choose a role for each variable. Assign to the y-axis the response variable that you hope to predict or explain. Assign to the x-axis the explanatory or predictor variable that accounts for, explains, predicts, or is otherwise responsible for the y-variable.
Define ‘Correlation coefficient’.
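A numerical measure of the direction and strength of a linear association:
r = Σ(z_x · z_y) / (n − 1), where z_x and z_y are the standardized values (z-scores) of x and y and n is the number of cases.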
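The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `correlation` and the `hours`/`scores` data are made up for the example, and the standard deviations use the n − 1 divisor so that they match the n − 1 in the formula.

```python
import math

def correlation(xs, ys):
    """Correlation r computed as sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 divisor).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    # Standardize each value, multiply the paired z-scores, and average.
    z_products = (
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = correlation(hours, scores)
print(round(r, 3))
```

Points that fall exactly on an upward-sloping line give r = 1, and points exactly on a downward-sloping line give r = -1.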
Define ‘Common response’.
Changes in both x and y are caused by a lurking variable.
Define ‘Lurking variable’.
A variable not present in our analysis that may influence our understanding of the relationship between x and y.
Define ‘Confounding variables’.
Variables whose effects on the response variable, y, are entangled and difficult to distinguish.
Define ‘Re-expression’.
We re-express data by taking the log, the sqrt, the reciprocal, or some other mathematical operation of all values of a variable.
Define ‘Ladder of Powers’.
Places in order the magnitude of effects that many re-expressions have on the data.
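As a rough sketch of how re-expression along the Ladder of Powers works, the Python below applies several powers to a curved data set and checks which one gives the straightest relationship. The helper names `re_express` and `pearson_r` and the data are invented for the example. Two conventions are assumed here: power 0 stands for the logarithm, and negative powers are negated so that the original ordering of the values is preserved.

```python
import math

def re_express(values, power):
    """Re-express positive values using one rung of the Ladder of Powers."""
    if power == 0:
        # By convention, the "0" rung of the ladder is the logarithm.
        return [math.log10(v) for v in values]
    if power < 0:
        # Negate reciprocal-type powers so larger values stay larger.
        return [-(v ** power) for v in values]
    return [v ** power for v in values]

def pearson_r(xs, ys):
    """Correlation coefficient (algebraically equal to sum(z_x z_y)/(n-1))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with exponential growth: y doubles for each step in x.
x = [1, 2, 3, 4, 5, 6]
y = [2.0 ** v for v in x]

# Try several rungs: raw data, square root, log, and (negated) reciprocal.
for power in (1, 0.5, 0, -1):
    r = pearson_r(x, re_express(y, power))
    print(f"power {power}: r = {r:.4f}")
```

For exponential growth like this, the log rung straightens the relationship, so its correlation is the closest to 1. The log and negative powers require strictly positive values.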
What are some features of the correlation, r?
- The sign of the correlation gives the direction of the relationship.
- -1 ≤ r ≤ 1. A correlation of 1 or -1 indicates a perfect linear relationship. A correlation of 0 indicates that there is no linear relationship.
- Correlation has no units, so shifting the data, rescaling it by a positive factor, standardizing, or even swapping the variables has no effect on the numerical value.
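The unit-free property in the last bullet can be checked numerically. The sketch below is illustrative only: `pearson_r` is a helper written for this example, and the temperature and sales figures are made up. It converts one variable from Celsius to Fahrenheit (a shift plus a positive rescaling) and swaps the two variables, then compares the resulting correlations.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient (algebraically equal to sum(z_x z_y)/(n-1))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily temperature (Celsius) and ice cream sales.
temp_c = [10, 15, 20, 25, 30]
sales = [120, 150, 210, 230, 300]

# Changing units: Fahrenheit is a shift plus a positive rescaling of Celsius.
temp_f = [c * 9 / 5 + 32 for c in temp_c]

r_original = pearson_r(temp_c, sales)
r_rescaled = pearson_r(temp_f, sales)   # same data in different units
r_swapped = pearson_r(sales, temp_c)    # x and y roles exchanged
r_flipped = pearson_r([-c for c in temp_c], sales)  # multiplied by -1

print(r_original, r_rescaled, r_swapped, r_flipped)
```

The first three values agree up to floating-point rounding. Multiplying one variable by a negative constant keeps the magnitude of r but reverses its sign, because it reverses the direction of the association.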
Is a large correlation a sign of a causal relationship?
No.
What are the assumptions and conditions of correlation?
- Quantitative Variables Condition
- Straight Enough Condition
- No Outliers Condition