Chapter 3: Exploring Two-Variable Quantitative Data Flashcards
Define Response Variable
a quantitative variable that measures the outcome of a study. It is the y-variable.
Define Explanatory Variable
a quantitative variable that tries to predict or explain changes in the response variable. It is the x-variable.
Define Scatterplot
a graph that shows the relationship between two quantitative variables measured on the same individuals. Each individual is a point (x, y) on the graph.
Describing a Scatterplot
DUFS + context
- Direction (positive/negative/no association)
- Unusual Features (points that don’t fit the overall pattern, or distinct clusters)
- Form (linear/non-linear)
- Strength (strong/moderate/weak)
Define Correlation
measures the strength and direction of a LINEAR association
symbol = r (equation on the formula sheet)
Facts about Correlation [r]
- -1 ≤ r ≤ 1
- indicates direction by its sign
- r = ±1 = perfect linear relationship; r = 0 = no linear relationship (values near 0 = weak)
- both variables must be quantitative to calculate r
- does not matter which variable is x or y
- does not depend on the units of measure (changing units does not change r)
- has no unit of measurement
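A minimal sketch of these facts, assuming made-up data (minutes jogged and heart rate) and a hand-written helper named correlation; it computes r as the average product of z-scores and checks that swapping x and y or changing units leaves r unchanged.

```python
import math

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: minutes jogged (x) and heart rate in beats per minute (y).
minutes = [1, 2, 3, 4, 5]
heart_rate = [70, 75, 78, 84, 88]

r = correlation(minutes, heart_rate)
r_swapped = correlation(heart_rate, minutes)                     # x and y swapped
r_seconds = correlation([60 * m for m in minutes], heart_rate)   # minutes -> seconds

print(r, r_swapped, r_seconds)  # all three agree up to rounding
```

The sign of r here is positive because heart rate rises with minutes jogged in the made-up data.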
Cautions about Correlation [r]
- correlation ≠ causation
- does not measure form
- should only be used to describe linear relationships
- not a resistant measure of strength
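Two of these cautions can be illustrated with a small sketch. The data and the helper name correlation are assumptions made up for this example: a perfect curved relationship can still give r = 0, and a single unusual point can change r dramatically.

```python
import math

def correlation(xs, ys):
    """Sample correlation r computed from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# r does not measure form: y = x^2 is a perfect non-linear relationship.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)

# r is not resistant: add one unusual point to nearly linear data.
xs = [1, 2, 3, 4, 5]
ys = [70, 75, 78, 84, 88]
r_clean = correlation(xs, ys)
r_outlier = correlation(xs + [6], ys + [60])

print(r_curve, r_clean, r_outlier)
```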
Define Regression Line
a line that models how a response variable changes as an explanatory variable changes
y hat (predicted y) = a + bx, where a = y-intercept and b = slope
Define Extrapolation
using the regression line to predict for x-values outside the range of x-values that were used to calculate the line.
these predictions are often extremely unreliable
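As a rough sketch of why extrapolation is risky, assume a hypothetical line y hat = 60 + 4.3x for heart rate (beats per minute) versus minutes jogged, fitted on x-values from 1 to 5. The function name, intercept, and x-range are illustrative assumptions.

```python
def predict_heart_rate(minutes, x_min=1, x_max=5):
    """Return (prediction, is_extrapolation) for the assumed line y_hat = 60 + 4.3x."""
    a, b = 60.0, 4.3  # hypothetical intercept and slope
    y_hat = a + b * minutes
    # A prediction is an extrapolation when x is outside the fitted x-range.
    is_extrapolation = not (x_min <= minutes <= x_max)
    return y_hat, is_extrapolation

inside = predict_heart_rate(3)    # x within the data used to fit the line
outside = predict_heart_rate(60)  # x far outside that range

print(inside, outside)
```

The prediction at 60 minutes is well above any plausible heart rate, which is the kind of result extrapolation can produce.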
Define Residual
the difference between the actual value of y and the predicted value of y: residual = actual y - predicted y (y - y hat).
- (+) = actual point is above the regression line; under prediction
- (-) = actual point is below the regression line; over prediction
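A short sketch of the residual calculation, assuming a hypothetical line y hat = 60 + 4.3x and a made-up observation (5 minutes, 86 beats per minute); the names residual and label are illustrative.

```python
def residual(actual_y, predicted_y):
    """Residual = actual y - predicted y."""
    return actual_y - predicted_y

a, b = 60.0, 4.3           # hypothetical intercept and slope
x = 5                      # minutes run
actual = 86.0              # observed heart rate (made up)
predicted = a + b * x      # value on the regression line at x = 5

res = residual(actual, predicted)
if res > 0:
    label = "above the line (underprediction)"
elif res < 0:
    label = "below the line (overprediction)"
else:
    label = "on the line"

print(round(res, 2), label)
```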
Interpretation of Correlation [r]
The linear association between __x-context__ and __y-context__ is __weak/moderate/strong (strength)__ and __positive/negative (direction)__.
ex. the linear association between __student absences__ and __final grades__ is __strong__ and __negative__. (r = -0.93)
Interpretation of Residual
The actual __y-context__ was __residual__ __above/below__ the predicted value when __x-context = #__.
ex. the actual __heart rate__ was __4.5 beats per minute__ __above__ the predicted value when __Matt ran for 5 minutes__.
Interpretation of Y-Intercept
The predicted __y-context__ when __x=0 context__ is __y-intercept__.
- the interpretation has to make sense when x=0. When it does not, write “does not make sense when x=0”
ex. the predicted __time to checkout at the grocery store__ when there are __0 customers in line__ is __72.95 seconds__.
Interpretation of Slope
The predicted __y-context__ __increases/decreases__ by __slope__ for each additional __x-context__.
ex. the predicted __heart rate__ __increases__ by __4.3 beats per minute__ for each additional __minute jogged__.
Interpretation of Standard Deviation of Residuals [s]
The actual __y-context__ is typically about __s__ away from the value predicted by the LSRL.
ex. The actual __SAT score__ is typically about __14.3 points__ away from the value predicted by the LSRL.
Interpretation of Coefficient of Determination [r^2]
About __r^2 %__ of the variation in __y-context__ can be explained by the linear relationship with __x-context__.
ex. About __87.3 %__ of variation in __electricity production__ is explained by the linear relationship with __wind speed__.
Define Least-Squares Regression (LSR) Line
the line that makes the sum of the squared residuals as small as possible
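A minimal sketch of what "as small as possible" means, using made-up data and hand-written helpers (least_squares_line and sum_squared_residuals are illustrative names): any other line tried gives a larger sum of squared residuals than the least-squares line.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (actual y - predicted y)^2 for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: minutes jogged (x) and heart rate (y).
xs = [1, 2, 3, 4, 5]
ys = [70, 75, 78, 84, 88]

a, b = least_squares_line(xs, ys)
ssr_best = sum_squared_residuals(xs, ys, a, b)

# Nudge the intercept or slope: each alternative line fits worse.
ssr_shifted = sum_squared_residuals(xs, ys, a + 1, b)
ssr_steeper = sum_squared_residuals(xs, ys, a, b + 0.5)

print(a, b, ssr_best, ssr_shifted, ssr_steeper)
```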
Define Residual Plot
a scatterplot that shows the residuals on the y-axis and the explanatory variable on the x-axis.
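- if there is no leftover pattern (random scatter around 0), the regression model is appropriate
- if there is a leftover pattern (such as a curve), the regression model is not appropriate -> a different form of regression model is needed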
- menu - 4 - 7 - 2
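To see what a leftover pattern looks like without a calculator, here is a rough sketch that fits a line to made-up curved data (y = x^2) and lists the residuals in x-order. The helper name least_squares_line is an assumption for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Made-up curved data: a straight line is the wrong form of model here.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# These (x, residual) pairs are the points of the residual plot.
for x, res in zip(xs, residuals):
    print(x, res)
```

The residuals run positive, negative, negative, negative, positive, a U-shape that signals a non-linear form.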
Other measures of how well the LSR Line fits the data
- Standard Deviation of the Residuals, s
- Coefficient of Determination, r^2
Define Standard Deviation of the Residuals, s
measures the typical distance between the actual y-values and the predicted y-values.
equation on the formula sheet
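A minimal sketch of the calculation, assuming the usual formula s = sqrt(sum of squared residuals / (n - 2)) and made-up data; the helper names are illustrative.

```python
import math

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

def residual_std_dev(xs, ys):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    a, b = least_squares_line(xs, ys)
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ssr / (len(xs) - 2))

# Hypothetical data: minutes jogged (x) and heart rate (y).
xs = [1, 2, 3, 4, 5]
ys = [70, 75, 78, 84, 88]

s = residual_std_dev(xs, ys)
print(round(s, 3))  # typical distance between actual and predicted heart rate
```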
Define Coefficient of Determination, r^2
measures the percent reduction in the sum of squared residuals when using the LSR line to make predictions, instead of just using the mean of the y-values.
r^2 is the percent of the variability in the response variable (y in context) that is accounted for by the LSR line.
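The "percent reduction" idea can be sketched as r^2 = 1 - (sum of squared residuals from the LSR line) / (sum of squared deviations from the mean of y). The data and function name below are made up for illustration.

```python
def r_squared(xs, ys):
    """1 - SSR/SST: reduction in squared error from using the line instead of y bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # errors using the line
    sst = sum((y - y_bar) ** 2 for y in ys)                    # errors using y bar only
    return 1 - ssr / sst

# Hypothetical data: minutes jogged (x) and heart rate (y).
xs = [1, 2, 3, 4, 5]
ys = [70, 75, 78, 84, 88]

r2 = r_squared(xs, ys)
print(round(100 * r2, 1), "% of the variation in y is explained")
```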
Calculate LSR Line
- use the b equation -> find b: b = r(Sy/Sx)
- use the y bar equation -> find a: y bar = a + b(x bar), so a = y bar - b(x bar)
- use the y hat equation -> plug in a and b: y hat = a + bx
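The three steps above can be sketched from summary statistics. All of the numbers below (x bar, Sx, y bar, Sy, r) are made up for illustration.

```python
# Hypothetical summary statistics.
x_bar, s_x = 10.0, 2.0
y_bar, s_y = 50.0, 8.0
r = 0.75

# Step 1: slope from the b equation, b = r * (Sy / Sx).
b = r * s_y / s_x

# Step 2: intercept from the y bar equation, y bar = a + b * x bar.
a = y_bar - b * x_bar

# Step 3: plug a and b into y hat = a + b * x to make a prediction.
def predict(x):
    return a + b * x

print(f"y hat = {a} + {b}x")
print(predict(12))
```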
Facts about the LSR Line
- distinction between the explanatory and response variables is essential
- close connection between the correlation and slope of the LSR Line (same sign)
- the LSR Line always passes through (x bar, y bar) -> (y hat - y bar) = r(Sy/Sx)(x - x bar)
- the LSRL is not a resistant measure (outliers can strongly influence it)
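A rough sketch checking these facts on made-up data (the helper name and the added unusual point are assumptions): the line passes through (x bar, y bar), swapping the explanatory and response variables gives a different line, and one unusual point can change the slope a lot.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: minutes jogged (x) and heart rate (y).
xs = [1, 2, 3, 4, 5]
ys = [70, 75, 78, 84, 88]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

a, b = least_squares_line(xs, ys)

# Fact: the line passes through (x bar, y bar).
passes_through_mean = abs((a + b * x_bar) - y_bar) < 1e-9

# Fact: explanatory vs. response matters. Regressing x on y and solving
# for y gives slope 1 / b_swapped, which differs from b.
a_swapped, b_swapped = least_squares_line(ys, xs)
slope_if_swapped = 1 / b_swapped

# Fact: not resistant. One unusual point (6, 60) flips the sign of the slope.
a_out, b_out = least_squares_line(xs + [6], ys + [60])

print(passes_through_mean, b, slope_if_swapped, b_out)
```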
Define Outlier in regression
a point that does not follow the pattern of the data and has a large residual