Chapter 5: Regression Flashcards
ŷ = a + bx
This is the equation of the least squares regression line, where "y-hat" (ŷ) gives the predicted response for any x, a is the y-intercept, and b is the slope.
slope b (different from the other b) in the least squares line is calculated as:
b = r · (s_y / s_x): the slope of the regression line is the correlation r times the standard deviation of y divided by the standard deviation of x.
a = ȳ - b·x̄
This gives the y-intercept of the least squares regression line, where a is the mean of y minus the slope times the mean of x.
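The slope and intercept formulas above can be tried out in a few lines of Python. This is only a sketch: the data values are made up for illustration, and the helper names `correlation` and `least_squares_line` are hypothetical, not from the chapter.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + bx using b = r * (s_y / s_x) and a = y-bar - b * x-bar."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Hypothetical data, made up for this example
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x
```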
True or false: we should always give r² along with our line to show how well it fits
True, because a least squares regression line can always be computed, regardless of how well it actually describes the data
To extrapolate
is to use the line to predict values outside the range of our x-values, which we should avoid
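One way to guard against extrapolation is to check each x against the observed range before trusting the prediction. The sketch below assumes a hypothetical fitted line and a hypothetical helper `predict`; neither comes from the chapter.

```python
def predict(a, b, x, x_min, x_max):
    """Return the predicted response and a flag that is True when x is outside the observed range."""
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Hypothetical line y-hat = 2.2 + 0.6x, assumed to be fitted on x-values from 1 to 5
a, b = 2.2, 0.6

inside = predict(a, b, 3, x_min=1, x_max=5)
outside = predict(a, b, 50, x_min=1, x_max=5)

print(inside[1])   # False: x = 3 is within the observed range
print(outside[1])  # True: x = 50 is an extrapolation, so the prediction is suspect
```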
a residual
is the vertical distance between a data point and the least-squares regression line, counted as positive when the point lies above the line and negative when it lies below
influential points
observations whose values have more effect on the calculations than the rest, or that would drastically change the calculations if they were removed. If the discrepancy is only in the x direction (the explanatory variable), then the influential point may affect the regression line; if it is in either the x or y direction (either variable), then the influential point may affect the correlation and/or the slope of the line
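A simple check for influence is to fit the line with and without the suspect point and compare. The sketch below uses made-up data and a hypothetical helper `fit`; it computes the slope as Sxy / Sxx, which is algebraically the same as r times (s_y / s_x).

```python
def fit(xs, ys):
    """Least squares intercept a and slope b, with b = Sxy / Sxx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data; the last point lies far from the others in the x direction
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 5, 4, 5, 3]

a_with, b_with = fit(xs, ys)
a_without, b_without = fit(xs[:-1], ys[:-1])

print(f"with the point:    slope = {b_with:.3f}")     # slope = -0.031
print(f"without the point: slope = {b_without:.3f}")  # slope = 0.600
```

In this made-up example, removing one observation changes the slope from slightly negative to clearly positive, which is what marks the point as influential.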
criteria to determine whether or not it's likely that there is a causal relationship
- The association is strong.
- The association is consistent across different datasets.
- Higher values of the explanatory variable are associated with higher values of the response variable.
- The explanatory variable precedes the response variable (in time).
- The explanatory variable is plausible as a cause of the response variable.
r²
gives the percentage of variation in y that is explained by the least squares regression line.
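To see what "variation explained" means, r² can be computed as 1 minus the ratio of leftover variation (squared residuals) to total variation in y. This is a sketch on made-up data, with a hypothetical helper name `r_squared`.

```python
def r_squared(xs, ys):
    """Fraction of the variation in y explained by the least squares line: 1 - SSE / SST."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained
    sst = sum((y - my) ** 2 for y in ys)                        # total variation in y
    return 1 - sse / sst

# Hypothetical data, made up for this example
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(f"r^2 = {r_squared(xs, ys):.2f}")  # r^2 = 0.60
```

Here the line explains about 60% of the variation in y.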
ecological correlation
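correlation based on averages rather than on individuals (not to be trusted, because it tends to overstate how strong the association is among individuals)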
lurking variables
Lurking variables are always potential problems in observational studies. Experiments are necessary to exclude the effect of lurking variables so that we can draw conclusions about the explanatory variable causing changes in the response variable.
residual formula
observed y - predicted y
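The residual formula can be applied point by point. In this sketch the data are made up, and the coefficients a and b are assumed to be those of the least squares line for that data.

```python
# Hypothetical data and the coefficients of its least squares line (y-hat = 2.2 + 0.6x)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    print(f"x = {x}: observed {y}, predicted {a + b * x:.1f}, residual {e:+.1f}")

# Residuals from a least squares line sum to zero, up to floating-point rounding
print(abs(sum(residuals)) < 1e-9)  # True
```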
the four statistics related to regression
slope, y-intercept, correlation, and r²
Interchanging x and y changes:
the slope and y-intercept of the regression line, but not the correlation
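Swapping the roles of the two variables can be checked directly. The sketch below uses made-up data and a hypothetical helper `summarize` that returns the slope, intercept, and correlation for predicting the second list from the first.

```python
from math import sqrt

def summarize(xs, ys):
    """Return (slope, intercept, r) for the least squares line that predicts ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    return b, my - b * mx, sxy / sqrt(sxx * syy)

# Hypothetical data, made up for this example
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, a_yx, r_yx = summarize(xs, ys)  # predict y from x
b_xy, a_xy, r_xy = summarize(ys, xs)  # predict x from y

print(f"y on x: slope {b_yx:.2f}, intercept {a_yx:.2f}, r {r_yx:.3f}")  # slope 0.60, intercept 2.20, r 0.775
print(f"x on y: slope {b_xy:.2f}, intercept {a_xy:.2f}, r {r_xy:.3f}")  # slope 1.00, intercept -1.00, r 0.775
```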
to calculate correlation…
take the square root of r² (the fraction of variation in y that x explains) and give it the same sign as the slope of the regression line
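A square root alone gives only the size of r; the sign has to come from the direction of the association, which matches the sign of the slope. A minimal sketch, with a hypothetical helper name and made-up input values:

```python
from math import copysign, sqrt

def correlation_from_r_squared(r_squared, slope):
    """Recover r from r^2: take the square root and give it the sign of the slope."""
    return copysign(sqrt(r_squared), slope)

print(round(correlation_from_r_squared(0.64, 1.5), 3))   # 0.8
print(round(correlation_from_r_squared(0.64, -1.5), 3))  # -0.8
```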