Chapter 3 - Scatterplots and Correlation Flashcards
-may help explain or predict changes in a response variable;
-goes on the x axis of a graph
ex- car weight and number of cigarettes smoked
explanatory variable
-measures an outcome of a study;
-goes on the y axis of a graph
ex- accident death rates and life expectancies
response variable
-shows relationship between two quantitative variables measured on the same individuals;
ex- percent of students taking the SAT & the mean math score
scatter plot
how to describe a scatter plot
Identify DCFS
direction
correlation
form
strength
when above average values of one variable tend to accompany above average values of the other and when below average values occur together
positive association
when above average values of one tend to accompany below average values of the other
negative association
measures the direction and strength of the linear relationship between two quantitative variables
correlation
value between -1 and 1
correlation (r)
r>0
positive association
r<0
negative association
interpreting correlation
- correlation makes no distinction between explanatory and response variables
- r does not change when we change the unit of measurement of x, y, or both
- correlation does not equal causation
- both variables need to be quantitative
- correlation is not resistant: r is affected by outliers
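A minimal Python sketch of these properties. The data are made up for illustration, and the helper name `correlation` is hypothetical; it computes r as the average product of standardized values (n - 1 divisor).

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / sx) * ((y - mean_y) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up data, for illustration only
weight_kg = [900, 1100, 1300, 1500, 1700]
fuel_use = [5.1, 6.0, 6.8, 8.1, 8.9]

r = correlation(weight_kg, fuel_use)

# Swapping explanatory and response variables leaves r unchanged
r_swapped = correlation(fuel_use, weight_kg)

# Changing units (kg to lb) leaves r unchanged
weight_lb = [w * 2.20462 for w in weight_kg]
r_pounds = correlation(weight_lb, fuel_use)

# One outlier can change r a lot: r is not resistant
r_outlier = correlation(weight_kg + [1900], fuel_use + [3.0])

print(r, r_swapped, r_pounds, r_outlier)
```

With these numbers r is close to 1 until the single outlier is added, which pulls it down to about zero.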
determined by how closely the points in the scatter plot follow a clear form, such as a straight line
strength
- points show a straight line pattern
- watch out for curved relationships and clusters
form
- summarizes the relationship between two variables when one of the variables helps explain or predict the other
- requires explanatory and response variable
- describes how variable y changes as variable x changes
- used to predict value of y for a given value of x
regression line
equation of the regression line
y hat = a+bx
in y hat=a+bx
y hat is the ____
predicted value of the response variable y for a given value of the explanatory variable x
in y hat=a+bx
b represents ____
slope
in y hat=a+bx
a is the ____
y intercept;
predicted value of y when x=0
coefficient of x is always the ____, no matter what symbol is used
slope
use of a regression line for prediction far outside the interval of values of the explanatory variable used to obtain the line
extrapolation
difference between observed value of the response variable and the value predicted by the regression line;
residual = observed y - predicted y
=y - y hat
residual
if, after adding up the residuals, the positive and negative values cancel out, you should..
square the residuals
- y on x
- line that makes the sum of the squared residuals as small as possible
least squares regression line
the mean of the least squares residuals is always ____
zero
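A short Python sketch of the last few cards, using made-up data. The helper names `fit_line` and `sum_squared_residuals` are hypothetical. It checks that the least squares residuals average to zero and that nudging the line away from the least squares fit makes the sum of squared residuals larger.

```python
def fit_line(xs, ys):
    """Least squares line y hat = a + b*x (b = Sxy / Sxx, same as r * sy / sx)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line y hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)  # zero, up to rounding

# Any other line has a larger sum of squared residuals
best = sum_squared_residuals(a, b, xs, ys)
shifted = sum_squared_residuals(a + 0.5, b, xs, ys)
tilted = sum_squared_residuals(a, b + 0.2, xs, ys)

print(mean_residual, best, shifted, tilted)
```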
- scatter plot of the residuals against the explanatory variable
- help us assess whether a linear model is appropriate
residual plot
turns the regression line horizontal
residual plot
find form of a residual plot
form of residual plot = form of association - form of model
- gives the approximate size of a typical prediction error (residual)
- applies when we use the least squares regression line to predict the values of a response variable y from an explanatory variable x
standard deviation of the residuals (s)
- coefficient of determination
- measures how well the least squares regression line predicts values of the response variable y
- fraction of the variation in the values of y that is accounted for by the least squares regression line of y on x
r^2
relationship between the standard deviation of the residuals (s) and the coefficient of determination r^2
- both calculated from the sum of squared residuals
- assess how well the line fits the data
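A rough Python sketch of how both numbers come from the sum of squared residuals. The data are made up, and the n - 2 divisor for s is an assumption (it is the usual textbook convention for simple linear regression, but the cards do not state it).

```python
import math

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

# Sum of squared residuals around the regression line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Total variation of y around its mean
sst = sum((y - mean_y) ** 2 for y in ys)

# Standard deviation of the residuals (assumes the n - 2 divisor)
s = math.sqrt(sse / (n - 2))

# Fraction of the variation in y accounted for by the line
r_squared = 1 - sse / sst

print(s, r_squared)
```

Here s is about 0.89 (in the units of y) and r^2 is about 0.6 (no units), so the line accounts for roughly 60% of the variation in y.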
how to calculate least squares regression line
-calculate the means x bar and y bar, the standard deviations sx and sy, and the correlation r
slope: b=r(sy/sx)
y intercept: a=y bar - b(x bar)
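The calculation can be sketched in Python as below. The data are made up, and the function name `least_squares_line` is hypothetical. The intercept uses a = y bar - b(x bar), which follows from the least squares line passing through the point (x bar, y bar).

```python
import math

def least_squares_line(xs, ys):
    """Return (a, b) for y hat = a + b*x, built from the summary statistics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    r = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / ((n - 1) * sx * sy))
    b = r * sy / sx            # slope
    a = mean_y - b * mean_x    # y intercept
    return a, b

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)  # a is about 2.2, b is about 0.6

# Predicted value of y for x = 3 (inside the range of the data)
y_hat = a + b * 3

print(a, b, y_hat)
```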
when the correlation isn't r=1 or -1, the predicted value of y is closer to its mean y bar than the value of x is to its mean x bar (measured in standard deviations)
regression to the mean;
values of y “regress” to their mean
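A small Python check of this idea, using made-up data. Distances from the mean are measured in standard deviations (z-scores), and the variable names are hypothetical. The sketch relies on the standard fact that the standardized prediction equals r times the standardized x.

```python
import math

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
r = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / ((n - 1) * sx * sy))

b = r * sy / sx
a = mean_y - b * mean_x

x = 5                           # a value well above x bar
y_hat = a + b * x               # its predicted y

z_x = (x - mean_x) / sx         # how far x is from x bar, in sx units
z_y_hat = (y_hat - mean_y) / sy # how far y hat is from y bar, in sy units

# z_y_hat equals r * z_x, so it is closer to 0 whenever |r| < 1
print(z_x, z_y_hat, r * z_x)
```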
least squares regression lines are not ____
resistant
points that are outliers in the __ direction but not the __ direction of a scatter plot have large residuals
y, x
an observation is ____ for a statistical calculation if removing it would change the result of the calculation
influential
how to verify that a point is influential
find the regression line both with and without the questionable point and compare the results
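A minimal Python sketch of that check, with made-up data in which the last point is an outlier in the x direction. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least squares line y hat = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Made-up data; the last point (15, 3) is the questionable one
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 5, 4, 5, 3]

a_with, b_with = fit_line(xs, ys)
a_without, b_without = fit_line(xs[:-1], ys[:-1])

# A large change in slope or intercept suggests the point is influential
print("with point:   ", a_with, b_with)
print("without point:", a_without, b_without)
```

In this made-up example the slope is positive without the point and slightly negative with it, so the point would count as influential.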