7: Correlation & Linear Regression and Multiple Linear Regression Flashcards
___ is a group of techniques to measure the relationship between two variables.
Correlation analysis
The typical first step of a correlation analysis is a ___.
Scatter diagram
___ is a measure of the strength of the linear association between two interval-level or ratio-level variables.
Correlation coefficient
A correlation coefficient can range from ___ to ___.
-1.00 to 1.00
If there is absolutely no relationship between the two sets of variables, Pearson’s r is ___.
Zero
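As a rough illustration of the cards above, Pearson's r can be computed directly from its definition. This is a minimal sketch using only the Python standard library; the function name `pearson_r` and the sample data are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

The result always falls between -1.00 and 1.00; a perfectly increasing straight-line pattern gives 1.00 and a perfectly decreasing one gives -1.00.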
___ is an equation that expresses the linear relationship between two variables.
Regression equation
___ is a mathematical procedure that uses the data to position a line with the objective of minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.
Least squares principle
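The least squares principle can be sketched in a few lines. The code below assumes a simple regression line of the form ŷ = a + bx; the function names and sample data are hypothetical and only meant to show that the fitted line has a smaller sum of squared vertical distances than a nearby line.

```python
def least_squares_line(x, y):
    """Return (a, b) for the line y_hat = a + b * x that minimizes the SSE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b

def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical distances between actual y and predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
best = sum_squared_errors(x, y, a, b)
worse = sum_squared_errors(x, y, a, b + 0.1)  # same intercept, slightly different slope
print(a, b, best, worse)
```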
___ is a measure of the dispersion, or scatter, of the observed values around the line of regression for a given value of x.
Standard error of estimate (Sy⋅x)
___ is the proportion of the total variation in the dependent variable Y that is explained, or accounted for, by the variation in the independent variable X.
Coefficient of determination (r²)
Regression analysis provides two statistics to evaluate the predictive ability of a regression equation: ___ and ___.
- The standard error of the estimate (Sy⋅x)
- The coefficient of determination (r²)
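Both statistics can be computed from the residuals of a fitted line. The sketch below assumes simple regression (one independent variable), so the standard error of estimate divides by n − 2; the function name `fit_statistics` and the data are made up for illustration.

```python
import math

def fit_statistics(x, y):
    """Return (s_yx, r_squared) for a simple least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    ss_total = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    s_yx = math.sqrt(sse / (n - 2))      # standard error of estimate
    r_squared = 1 - sse / ss_total       # coefficient of determination
    return s_yx, r_squared

# Made-up sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_yx, r_squared = fit_statistics(x, y)
print(round(s_yx, 4), round(r_squared, 4))
```

A smaller standard error of estimate and an r² closer to 1 both point to better predictive ability.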
What are the 4 Assumptions Underlying Linear Regression?
- ___
- ___
- ___
- ___
- For each value of x, there are corresponding y values. These y values follow the normal distribution.
- The means of these normal distributions lie on the regression line.
- The standard deviations of these normal distributions are all the same. The best estimate we have of this common standard deviation is the standard error of estimate (sy·x).
- The y values are statistically independent. This means that in selecting a sample, the y value observed for a particular x does not depend on the y value observed for any other x. This assumption is particularly important when data are collected over a period of time. In such situations, the errors for a particular time period are often correlated with those of other time periods.
A relationship exists between the predicted values ŷ and the standard error of estimate (sy⋅x).
___ will include the middle 68% of the observations.
___ will include the middle 95% of the observations.
___ will include virtually all the observations.
ŷ ± sy⋅x
ŷ ± 2sy⋅x
ŷ ± 3sy⋅x
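One way to check these bands on a data set is to count how many observations fall inside each one. The sketch below is only illustrative: the function name `share_within` is hypothetical, and the observed values, predicted values, and standard error are made-up numbers from a five-point example, so the shares only loosely track the 68% and 95% figures, which apply to large samples that meet the regression assumptions.

```python
def share_within(y, y_hat, s_yx, k):
    """Share of observations with |y - y_hat| <= k * s_yx."""
    inside = sum(1 for yi, pi in zip(y, y_hat) if abs(yi - pi) <= k * s_yx)
    return inside / len(y)

# Made-up observed values, predicted values, and standard error of estimate
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
s_yx = 0.8944

share_1 = share_within(y, y_hat, s_yx, 1)  # band y_hat ± s_yx
share_2 = share_within(y, y_hat, s_yx, 2)  # band y_hat ± 2 s_yx
print(share_1, share_2)
```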
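In constructing interval estimates from a regression equation:
The first interval estimate, which reports the mean value of y for a given x, is called a ___ interval.
The second interval estimate, which reports the range of values of y for one particular value of x, is called a ___ interval.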
Confidence interval
Prediction interval
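The two intervals share the same center, ŷ, but differ in width. The sketch below uses the standard simple-regression formulas: the confidence interval half-width is t · sy⋅x · √(1/n + (x − x̄)² / Σ(x − x̄)²), and the prediction interval adds 1 under the square root. The function name, the data, and the standard error are made up for illustration, and the critical t value is an assumption taken from a t table (two-tailed, 95%, n − 2 = 3 degrees of freedom).

```python
import math

def interval_half_widths(x, x_new, s_yx, t_crit):
    """Return (confidence, prediction) interval half-widths at x_new."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    core = 1 / n + (x_new - mean_x) ** 2 / sxx
    ci_half = t_crit * s_yx * math.sqrt(core)      # mean of y at x_new
    pi_half = t_crit * s_yx * math.sqrt(1 + core)  # individual y at x_new
    return ci_half, pi_half

# Made-up inputs, for illustration only
x = [1, 2, 3, 4, 5]
s_yx = math.sqrt(0.8)
t_crit = 3.182  # assumed table value: two-tailed 95%, df = 3

ci_half, pi_half = interval_half_widths(x, 3, s_yx, t_crit)
print(round(ci_half, 4), round(pi_half, 4))
```

The prediction interval is always wider, because an individual observation varies more than a mean does.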
The ___ variable is scaled on the Y-axis and is the variable being estimated.
Dependent
The ___ variable is scaled on the X-axis and is the variable used as the predictor.
Independent
The number of degrees of freedom associated with the error term is equal to ___
The total degrees of freedom, n − 1, minus the regression degrees of freedom, k; that is, n − (k + 1)
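The degrees of freedom in a multiple regression ANOVA table split into a regression part and an error part. A minimal sketch, assuming n observations and k independent variables (the function name and the example values are hypothetical):

```python
def anova_degrees_of_freedom(n, k):
    """Degrees of freedom for a regression with n observations and k predictors."""
    total = n - 1
    regression = k
    error = total - regression  # same as n - (k + 1)
    return {"regression": regression, "error": error, "total": total}

# Made-up example: 20 observations, 3 independent variables
df = anova_degrees_of_freedom(n=20, k=3)
print(df)
```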
___ is the percent of variation in the dependent variable, y, explained by the set of independent variables, x1, x2, x3, …
Coefficient of multiple determination
The characteristics of the coefficient of multiple determination are:
- ___
- ___
- ___
- ___
- It is symbolized by a capital R squared. In other words, it is written as R² because it is calculated as the square of a correlation coefficient.
- It can range from 0 to 1. A value near 0 indicates little association between the set of independent variables and the dependent variable. A value near 1 means a strong association.
- It cannot assume negative values. Any number that is squared or raised to the second power cannot be negative.
- It is easy to interpret. Because R² is a value between 0 and 1, it is easy to interpret, compare, and understand.
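In terms of an ANOVA table, the coefficient of multiple determination is the regression sum of squares divided by the total sum of squares. The sketch below assumes those two sums of squares are already known; the function name and the numbers are made up for illustration.

```python
def coefficient_of_multiple_determination(ssr, sse):
    """R squared = SSR / SS total, where SS total = SSR + SSE."""
    ss_total = ssr + sse
    return ssr / ss_total

# Made-up sums of squares, for illustration only
r_squared = coefficient_of_multiple_determination(ssr=300.0, sse=160.0)
print(round(r_squared, 3))
```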
___ is a test used to determine if any of the set of independent variables has regression coefficients different from zero.
Global test
To test the null hypothesis that the multiple regression coefficients are all zero, we employ ___.
The F distribution
The decision rule can be based on either of two methods:
- ___
- ___
(1) comparing the test statistic to a critical value
(2) calculating a p-value based on the test statistic and comparing the p-value to the significance level
The critical value method using the F-statistic requires three pieces of information:
- ___
- ___
- ___
(1) the numerator degrees of freedom
(2) the denominator degrees of freedom
(3) the significance level
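The global test with the critical value method can be sketched as follows. The F statistic is (SSR / k) / (SSE / (n − (k + 1))), with k numerator and n − (k + 1) denominator degrees of freedom. The function name and the sums of squares are made up, and the critical value 3.24 is an assumption read from an F table for 3 and 16 degrees of freedom at the 0.05 significance level.

```python
def global_f_test(ssr, sse, n, k, f_crit):
    """Return (F statistic, numerator df, denominator df, reject H0?)."""
    df_num = k
    df_den = n - (k + 1)
    msr = ssr / df_num   # mean square regression
    mse = sse / df_den   # mean square error
    f_stat = msr / mse
    return f_stat, df_num, df_den, f_stat > f_crit

# Made-up ANOVA values; f_crit is an assumed table lookup for (3, 16) df at 0.05
f_stat, df_num, df_den, reject = global_f_test(
    ssr=300.0, sse=160.0, n=20, k=3, f_crit=3.24
)
print(f_stat, df_num, df_den, reject)
```

Rejecting the null hypothesis here means at least one regression coefficient differs from zero; it does not say which one.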
The ___ distributions of the coefficients follow the t distribution with n − (k + 1) degrees of freedom.
Sampling
The sampling distributions of the coefficients follow the t distribution with ___ degrees of freedom.
n − (k + 1)
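A test of an individual coefficient uses t = (b − 0) / s_b with n − (k + 1) degrees of freedom. The sketch below is a hedged illustration: the function name, the coefficient, and its standard error are made up, and the critical value 2.120 is an assumption read from a t table (two-tailed, 0.05 significance level, 16 degrees of freedom).

```python
def coefficient_t_test(b_i, se_b_i, n, k, t_crit):
    """Return (t statistic, degrees of freedom, reject H0: coefficient = 0?)."""
    df = n - (k + 1)
    t_stat = (b_i - 0) / se_b_i
    return t_stat, df, abs(t_stat) > t_crit

# Made-up coefficient and standard error; t_crit is an assumed table lookup
t_stat, df, reject = coefficient_t_test(
    b_i=2.5, se_b_i=1.0, n=20, k=3, t_crit=2.120
)
print(t_stat, df, reject)
```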