Chapter 5 Flashcards
Linear Regression and Linear Correlation
analyze the relationship between two quantitative variables
Linear Regression
Used to analyze the relationship between two quantitative variables when one variable (the response) responds to the other (the explanatory variable)
Linear Correlation
Used to analyze the relationship between two quantitative variables to determine whether a change in one variable is associated with a change in the other variable.
* So, the two variables are co-related.
Direction
Positive correlation: as one variable increases, the other also increases (r has + sign)
Negative correlation: as one variable increases, the other decreases (r has − sign)
Form
May be a straight-line relationship (linear) or curved (Here we only deal with linear)
Strength
The magnitude of r indicates the strength of the linear relationship between the two variables.
r close to -1 or 1 indicates a strong linear relationship
r close to 0 indicates no relationship or a weak linear relation
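The direction and strength described above can be seen by computing r directly. A minimal sketch, using a small invented data set (the numbers are illustrative only):

```python
# Compute the correlation coefficient r by hand for made-up data.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]  # y roughly increases with x -> expect positive r

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# r = sum((x - x̄)(y - ȳ)) / sqrt(sum((x - x̄)²) * sum((y - ȳ)²))
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)
print(round(r, 4))  # ≈ 0.8528: positive sign, fairly strong linear relation
```

Here r ≈ 0.85, so the relationship is positive (upward) and fairly strong.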
Lurking Variable (=Extraneous Variable)
Variable that is hidden or not measured and that may cause a change in the measured variables
Confounding Variable
A variable whose effect on the response is mixed up (confounded) with the effect of the explanatory variable, making it difficult to determine which variable is the causal one
Regression equation
the equation of the regression line
Analysis of Residuals
Regression tries to minimize the errors due to deviations not explained by the regression equation (least squares criterion)
Residual = error (e)
vertical distance from the regression line to a data point (may be + or –)
Residual Sum of Squares = Error Sum of Squares (SSE)
the variation in the observed values of the response variable that is not explained by the regression
Least-squares criterion
Tries to minimize the Residual or Error Sum of Squares (SSE) in order to get the “best fit” line
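The least-squares line and its SSE can be computed with the usual closed-form formulas (b1 = Sxy/Sxx, b0 = ȳ − b1·x̄). A minimal sketch on an invented data set:

```python
# Fit the least-squares line ŷ = b0 + b1*x and compute SSE for made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

b1 = sxy / sxx              # slope
b0 = mean_y - b1 * mean_x   # intercept

# Residual e = observed y − predicted ŷ (vertical distance to the line)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
print(b1, b0, round(sse, 4))  # slope 0.8, intercept 1.8, SSE = 2.4
```

No other straight line through these points gives a smaller SSE; that is exactly what the least-squares criterion guarantees.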
Predictor variable (= explanatory variable)
= x-variable, which can be used to make predictions about the other variable
Response variable
y-variable, whose values respond to changes in the predictor variable
Interpolation
using the regression equation to make predictions about the response variable, within the range of the observed values of x
Extrapolation
using the regression equation to make predictions about the response variable, outside the range of the observed values of x
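The distinction between interpolation and extrapolation is just whether the new x lies inside the observed x-range. A sketch, reusing an invented data set and its fitted line (ŷ = 1.8 + 0.8x; observed x runs from 1 to 5):

```python
# Classify a prediction as interpolation or extrapolation for made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

def predict(x_new):
    """Return ŷ and whether the prediction interpolates or extrapolates."""
    kind = "interpolation" if min(x) <= x_new <= max(x) else "extrapolation"
    return b0 + b1 * x_new, kind

print(predict(3.5))  # inside the observed x-range
print(predict(10))   # outside the observed x-range -> less trustworthy
```

Extrapolated predictions rely on the linear pattern continuing beyond the data, which the data cannot confirm, so they should be treated with caution.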
The coefficient of determination (R²) = [correlation coefficient]²
the fraction or percentage of variation in the observed values of the response variable that is accounted for by the regression analysis
Always: 0 ≤ R² ≤ 1
OR 0% ≤ R² ≤ 100%
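Both descriptions of R² give the same number: squaring r, or computing the explained fraction 1 − SSE/SST. A sketch checking this on an invented data set:

```python
# Verify R² = r² = 1 - SSE/SST for made-up data.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares (SST)

r = sxy / math.sqrt(sxx * syy)

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = r ** 2
explained = 1 - sse / syy  # fraction of variation explained by the line
print(round(r_squared, 4), round(explained, 4))  # both ≈ 0.7273
```

So for this data, about 73% of the variation in y is accounted for by the regression, and 27% remains unexplained (the SSE).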
Population regression line (linearity) (Conditions) for Regression Inferences
The relationship between the two variables must be approximately linear. In other words, there are constants β0 and β1 such that, for each value x of the predictor variable, the conditional mean of the response variable is β0 + β1x.
Equal standard deviations (homoscedasticity)
(Conditions) for Regression Inferences
The standard deviations of the y-values must be approximately the same for all values of x
Normal populations
(Conditions) for Regression Inferences
For each value of x, the corresponding y-values must be normally distributed
No Serious Outliers
(Conditions) for Regression Inferences
Significant outliers can drastically change the regression model
Independent observations
(Conditions) for Regression Inferences
The observations of the response variable must be independent of one another. (The observations of the predictor variable need not be independent.)