4: Exploratory Data Analysis—Simple Regression Flashcards
regression lines
summarizes relationship b/w two variables ONLY WHEN one of the variables helps explain or predict the other
that is, describes relationship b/w explanatory variable and response variable
y = b0 + b1x
b1 = slope; b0 = intercept, the value of y when x = 0
least squares regression line
least squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible
equation of the least-squares regression line
y-hat = b0 + b1x
slope = b1 = r * (Sy/Sx)
intercept = b0 = y-bar - b1*x-bar
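A minimal sketch of these formulas in Python, using slope b1 = r * (Sy/Sx) and intercept b0 = y-bar - b1*x-bar. The function name `least_squares_line` and the five data points are made up for illustration.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    n = len(xs)
    # correlation r: average product of standardized x and y (divide by n - 1)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x           # slope
    b0 = y_bar - b1 * x_bar      # intercept: line passes through (x-bar, y-bar)
    return b0, b1

# illustrative data, not from the flashcards
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(round(b0, 3), round(b1, 3))
```

Because b0 is chosen as y-bar - b1*x-bar, the fitted line always passes through the point (x-bar, y-bar).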
r^2
the fraction of the variation in the value of y that is explained by the least-squares regression of y on x
r^2 = (variance of predicted values y-hat) / (variance of observed values y)
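A small check, as a sketch, that the variance ratio matches the square of the correlation r. The data points and variable names are illustrative assumptions.

```python
from statistics import mean, variance

# illustrative data, not from the flashcards
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

b1 = s_xy / s_xx               # least-squares slope
b0 = y_bar - b1 * x_bar        # least-squares intercept
y_hat = [b0 + b1 * x for x in xs]

r = s_xy / (s_xx * s_yy) ** 0.5                   # correlation
r2_from_variances = variance(y_hat) / variance(ys)
r2_from_r = r ** 2
print(round(r2_from_variances, 3), round(r2_from_r, 3))
```

The two numbers agree for any least-squares fit of y on x, which is why r^2 can be read as the fraction of variation in y explained by the regression.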
residual plot
scatterplot of the regression residuals against the explanatory variable. residual plots help us assess the fit of a regression line
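A sketch of how the residuals behind such a plot are computed: residual = observed y minus predicted y-hat. The data are illustrative, and the (x, residual) pairs are printed rather than plotted to keep the example free of plotting libraries.

```python
from statistics import mean

# illustrative data, not from the flashcards
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# residual = observed y - predicted y-hat
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# a residual plot would put x on the horizontal axis and these values on the vertical axis
for x, e in zip(xs, residuals):
    print(x, round(e, 2))
```

Least-squares residuals always sum to zero, so a good fit shows points scattered around the horizontal line at 0 with no clear pattern.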
points that are outliers in the ____ direction of a scatterplot are often influential for the least-squares regression line
x
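A rough illustration of why x-direction outliers tend to be influential: the same base data are refit after adding one point far out in x, and then after adding one point that is extreme only in y. The helper name `slope` and all data points are assumptions chosen for the example.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b1 of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# illustrative data, not from the flashcards
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope_base = slope(xs, ys)
slope_x_outlier = slope(xs + [15], ys + [2])   # point far from the others in x
slope_y_outlier = slope(xs + [3], ys + [12])   # point extreme in y, at the mean of x

print(round(slope_base, 3), round(slope_x_outlier, 3), round(slope_y_outlier, 3))
```

In this example the x-outlier flips the sign of the slope, while the y-outlier placed at x-bar leaves the slope unchanged and only shifts the intercept.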
causation
Correlation is NOT causation
even a very strong association b/w 2 variables is not by itself good evidence that there is a cause-and-effect link b/w the two variables.
reverse causation
does x cause y or y cause x?
common response
z causes both x and y
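A simulated sketch of common response: x and y are each generated from z plus independent noise, with no direct link between x and y, yet they come out strongly correlated. The seed, sample size, and noise levels are arbitrary assumptions.

```python
import random
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    n = len(xs)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))

random.seed(0)  # fixed seed so the run is repeatable
z = [random.gauss(0, 1) for _ in range(1000)]
x = [zi + random.gauss(0, 0.5) for zi in z]   # x depends only on z
y = [zi + random.gauss(0, 0.5) for zi in z]   # y depends only on z

r_xy = correlation(x, y)
print(round(r_xy, 2))
```

Changing x in this setup would do nothing to y; the association exists only because both respond to z.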
confounding
z is correlated with x and with y, so the effect of x on y cannot be separated from the effect of z.