7: Correlation & Linear Regression and Multiple Linear Regression Flashcards
___ is a group of techniques to measure the relationship between two variables.
Correlation analysis
The typical first step of a correlation analysis is a ___.
Scatter diagram
___ is a measure of the strength of the linear association between two interval-level or ratio-level variables.
Correlation coefficient
A correlation coefficient can range from ___ to ___.
-1.00 to 1.00
If there is absolutely no linear relationship between the two variables, Pearson’s r is ___.
Zero
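The three cards above can be checked with a short sketch. It computes Pearson's r from deviations about the means; the function name `pearson_r` and the data are made up for illustration, and the function assumes neither variable is constant (otherwise the denominator is zero).

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from each mean.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only to hit the ends and middle of the range.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # perfect increasing line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # perfect decreasing line
r_none = pearson_r(x, [2, 1, 3, 1, 2])   # no linear trend
```

With this data, `r_up` is 1.00, `r_down` is -1.00, and `r_none` is zero up to floating-point rounding.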
___ is an equation that expresses the linear relationship between two variables.
Regression equation
___ is a mathematical procedure that uses the data to position a line with the objective of minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.
Least squares principle
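A minimal sketch of the least squares principle for one predictor, assuming the usual closed-form solution for the slope and intercept. The names `least_squares_line` and `sse` and the data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (a, b) for the line y_hat = a + b*x with the smallest
    sum of squared vertical distances to the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

def sse(x, y, a, b):
    """Sum of squared vertical distances between actual y and predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
```

For this data the fitted line is ŷ = 2.2 + 0.6x, and its sum of squared distances is 2.4. Shifting the intercept or slope away from these values makes `sse` larger.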
___ is a measure of the dispersion, or scatter, of the observed values around the line of regression for a given value of x.
Standard error of estimate (sy·x)
___ is the proportion of the total variation in the dependent variable Y that is explained, or accounted for, by the variation in the independent variable X.
Coefficient of determination (r²)
Regression analysis provides two statistics to evaluate the predictive ability of a regression equation: ___ and ___.
- The standard error of estimate (sy·x)
- The coefficient of determination (r²)
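Both statistics can be computed from the same fit. The sketch below assumes a single predictor, so the standard error of estimate divides by n - 2. The name `fit_summary` and the data are hypothetical.

```python
import math

def fit_summary(x, y):
    """Return (s_yx, r_squared) for the least squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    # Unexplained variation: squared distances from the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation: squared distances from the mean of y.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    s_yx = math.sqrt(sse / (n - 2))   # standard error of estimate
    r_squared = 1 - sse / ss_total    # coefficient of determination
    return s_yx, r_squared

# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_yx, r_squared = fit_summary(x, y)
```

Here `s_yx` is about 0.894 and `r_squared` is 0.60, meaning 60% of the variation in y is accounted for by x in this made-up sample.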
What are the four assumptions underlying linear regression?
- ___
- ___
- ___
- ___
- For each value of x, there are corresponding y values. These y values follow the normal distribution.
- The means of these normal distributions lie on the regression line.
- The standard deviations of these normal distributions are all the same. The best estimate we have of this common standard deviation is the standard error of estimate (sy·x).
- The y values are statistically independent. This means that in selecting a sample, the y values observed for a particular x do not depend on the y values for any other x. This assumption is particularly important when data are collected over a period of time. In such situations, the errors for a particular time period are often correlated with those of other time periods.
A relationship exists between the predicted values ŷ and the standard error of estimate (sy·x).
___ will include the middle 68% of the observations.
___ will include the middle 95% of the observations.
___ will include virtually all the observations.
ŷ ± sy·x
ŷ ± 2sy·x
ŷ ± 3sy·x
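The three ranges above can be checked on simulated data. This is only a sketch: the true line (3 + 2x), the error standard deviation (1.5), the sample size, and the seed are arbitrary assumptions, and the errors are drawn from a normal distribution so that the regression assumptions hold.

```python
import math
import random

def share_within(x, y, k):
    """Share of observations whose y lies within y_hat ± k * s_yx."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_yx = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return sum(abs(e) <= k * s_yx for e in residuals) / n

# Simulated sample: hypothetical line y = 3 + 2x plus normal errors.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(10000)]
y = [3.0 + 2.0 * xi + rng.gauss(0, 1.5) for xi in x]
shares = {k: share_within(x, y, k) for k in (1, 2, 3)}
```

With a large sample like this one, `shares[1]`, `shares[2]`, and `shares[3]` should land close to 0.68, 0.95, and 1.00. Small samples, or errors that are not normal, can give noticeably different shares.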
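Regression analysis yields two interval estimates for a given value of x:
The first interval estimate, for the mean value of y at a given x, is called a ___ interval.
The second interval estimate, for an individual value of y at a given x, is called a ___ interval.
Confidence
Prediction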
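For reference, the two intervals are usually written as follows for simple linear regression. The symbols n (sample size), x̄ (mean of the x values), and t (the t value for the chosen confidence level with n - 2 degrees of freedom) do not appear in the cards and are assumed here.

```latex
% Confidence interval: mean value of y at a given x
\hat{y} \pm t \, s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}

% Prediction interval: individual value of y at a given x
\hat{y} \pm t \, s_{y \cdot x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}
```

The extra 1 under the square root makes the prediction interval wider than the confidence interval at the same x, because an individual value varies more than a mean.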
The ___ variable is scaled on the Y-axis and is the variable being estimated.
Dependent
The ___ variable is scaled on the X-axis and is the variable used as the predictor.
Independent