7: Correlation & Linear Regression and Multiple Linear Regression Flashcards

1
Q

___ is a group of techniques to measure the relationship between two variables.

A

Correlation analysis

2
Q

The typical first step of a correlation analysis is a ___.

A

Scatter diagram

3
Q

___ is a measure of the strength of the linear association between two interval-level or ratio-level variables.

A

Correlation coefficient
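
As a rough illustration, Pearson's r can be computed directly from the deviations of each variable about its mean. The sample below is made up for easy arithmetic, and the function name `pearson_r` is an assumption for this sketch, not something taken from the deck.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical sample, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```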

4
Q

A correlation coefficient can range from ___ to ___.

A

-1.00 to 1.00

5
Q

If there is absolutely no linear relationship between the two variables, Pearson’s r is ___.

A

Zero

6
Q

___ is an equation that expresses the linear relationship between two variables.

A

Regression equation

7
Q

___ is a mathematical procedure that uses the data to position a line with the objective of minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.

A

Least squares principle
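
A minimal sketch of the least squares principle for one independent variable, assuming the fitted line is written as ŷ = a + b·x. The data and the function name `least_squares_line` are hypothetical and used only for illustration.

```python
def least_squares_line(x, y):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope that minimizes the sum of squared residuals
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical sample, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```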

8
Q

___ is a measure of the dispersion, or scatter, of the observed values around the regression line for a given value of x.

A

Standard error of estimate (Sy⋅x)
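
For simple linear regression, the standard error of estimate is the square root of the sum of squared residuals divided by n − 2. The sketch below assumes a line ŷ = a + b·x that has already been fitted; the data, the coefficients, and the function name are hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(x, y, a, b):
    """sqrt(SSE / (n - 2)) for the fitted line y_hat = a + b * x."""
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(sse / (len(x) - 2))

# Hypothetical sample and its least squares coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
s_yx = standard_error_of_estimate(x, y, a, b)
print(round(s_yx, 4))  # 0.8944
```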

9
Q

___ is the proportion of the total variation in the dependent variable Y that is explained, or accounted for, by the variation in the independent variable X.

A

Coefficient of determination (r²)
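
One way to see the "proportion of variation explained" is to compute r² as 1 − SSE/SST. This is a sketch under the assumption that the line ŷ = a + b·x was fitted by least squares; the data, coefficients, and function name are hypothetical.

```python
def r_squared(x, y, a, b):
    """Coefficient of determination for the fitted line y_hat = a + b * x."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)                     # total variation
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # unexplained variation
    return 1 - sse / sst

# Hypothetical sample and its least squares coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y, 2.2, 0.6)
print(round(r2, 2))  # 0.6
```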

10
Q

Regression analysis provides two statistics to evaluate the predictive ability of a regression equation: ___ and ___.

A
  1. The standard error of estimate (sy·x)
  2. The coefficient of determination (r²)

11
Q

What are the four assumptions underlying linear regression?

  1. ___
  2. ___
  3. ___
  4. ___
A
  1. For each value of x, there are corresponding y values. These y values follow the normal distribution.
  2. The means of these normal distributions lie on the regression line.
  3. The standard deviations of these normal distributions are all the same. The best estimate we have of this common standard deviation is the standard error of estimate (sy·x).
  4. The y values are statistically independent. This means that in selecting a sample, the y values chosen for a particular x do not depend on the y values for any other x. This assumption is particularly important when data are collected over a period of time. In such situations, the errors for a particular time period are often correlated with those of other time periods.
12
Q

A relationship exists between the predicted values ŷ and the standard error of estimate (sy·x).

___ will include the middle 68% of the observations.
___ will include the middle 95% of the observations.
___ will include virtually all the observations.

A

ŷ ± sy·x
ŷ ± 2sy·x
ŷ ± 3sy·x

13
Q

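When a regression equation is used to estimate y for a given x, two interval estimates are possible:

The first, for the mean value of y at a given x, is called a ___ interval.
The second, for an individual value of y at a given x, is called a ___ interval.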

A

Confidence interval

Prediction interval
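
The two intervals differ only by an extra "1 +" under the square root, which is why a prediction interval is always wider than the confidence interval at the same x. The sketch below uses the standard simple-regression formulas; the data, the coefficients, and the value `t_crit` (read from a t table for 95% confidence and n − 2 = 3 degrees of freedom) are assumptions for illustration.

```python
from math import sqrt

# Hypothetical sample and its least squares coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

n = len(x)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
s_yx = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
t_crit = 3.182  # assumed table value: two-sided 95%, 3 degrees of freedom

def interval_half_widths(x0):
    """Return (confidence, prediction) interval half-widths at x0."""
    core = 1 / n + (x0 - mean_x) ** 2 / sxx
    ci = t_crit * s_yx * sqrt(core)        # for the mean of y at x0
    pi = t_crit * s_yx * sqrt(1 + core)    # for an individual y at x0
    return ci, pi

x0 = 3
y_hat = a + b * x0
ci, pi = interval_half_widths(x0)
print(round(y_hat, 2), round(ci, 2), round(pi, 2))  # 4.0 1.27 3.12
```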

14
Q

The ___ variable is scaled on the Y-axis and is the variable being estimated.

A

Dependent

15
Q

The ___ variable is scaled on the X-axis and is the variable used as the predictor.

A

Independent

16
Q

The number of degrees of freedom associated with the error term is equal to ___

A

The total degrees of freedom, n − 1, minus the regression degrees of freedom, k; that is, n − (k + 1)

17
Q

___ is the percent of variation in the dependent variable, y, explained by the set of independent variables, x1, x2, x3, …

A

Coefficient of multiple determination

18
Q

The characteristics of the coefficient of multiple determination are:

  1. ___
  2. ___
  3. ___
  4. ___
A
  1. It is symbolized by a capital R squared. In other words, it is written as R² because it is calculated as the square of a correlation coefficient.
  2. It can range from 0 to 1. A value near 0 indicates little association between the set of independent variables and the dependent variable. A value near 1 means a strong association.
  3. It cannot assume negative values. Any number that is squared or raised to the second power cannot be negative.
  4. It is easy to interpret. Because R² is a value between 0 and 1, it is easy to interpret, compare, and understand.
19
Q

___ is a test used to determine whether any of the independent variables in the set has a regression coefficient different from zero.

A

Global test

20
Q

To test the null hypothesis that the multiple regression coefficients are all zero, we employ ___.

A

The F distribution

22
Q

The decision rule can be based on either of two methods:

  1. ___
  2. ___
A

(1) comparing the test statistic to a critical value

(2) calculating a p-value based on the test statistic and comparing the p-value to the significance level

23
Q

The critical value method using the F-statistic requires three pieces of information:

  1. ___
  2. ___
  3. ___
A

(1) the numerator degrees of freedom
(2) the denominator degrees of freedom
(3) the significance level
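
A small sketch of the critical value method for the global test. It assumes a simple regression (k = 1) on made-up data, and it hard-codes the critical value `f_crit` from an F table (significance level 0.05, 1 numerator and 3 denominator degrees of freedom) rather than computing it.

```python
# Hypothetical sample and its least squares coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

n = len(x)  # number of observations
k = 1       # number of independent variables
mean_y = sum(y) / n

ssr = sum(((a + b * xi) - mean_y) ** 2 for xi in x)           # explained variation
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # unexplained variation

# F = (SSR / numerator df) / (SSE / denominator df)
f_stat = (ssr / k) / (sse / (n - (k + 1)))

f_crit = 10.13  # assumed table value: alpha = 0.05, df = (1, 3)
reject_h0 = f_stat > f_crit
print(round(f_stat, 2), reject_h0)  # 4.5 False
```

With this sample the test statistic falls below the critical value, so the null hypothesis that all regression coefficients are zero is not rejected.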

24
Q

The ___ distributions of the coefficients follow the t distribution with n − (k + 1) degrees of freedom.

A

Sampling

25
Q

The sampling distributions of the coefficients follow the t distribution with ___ degrees of freedom.

A

n − (k + 1)

26
Q

A residual plot supports the linearity assumption when:

  1. ___
  2. ___
  3. ___
A
  1. Residuals are plotted on the vertical (y) axis and centered at zero.
  2. Residual plots show a random distribution of positive and negative values across the horizontal (x) axis.
  3. Plots are scattered and there is no obvious pattern.
27
Q

___ compares all possible models using a specified set of predictors, and displays the best-fitting models that contain one predictor, two predictors, and so on.

A

Best subset regression

28
Q

A graph that helps to evaluate the assumption of normally distributed residuals is called a ___ and is often included in statistical software. The assumption is supported when the plotted points fall close to a straight line.

A

Normal Probability Plot

29
Q

___ means that the variation around the regression equation is the same for all values of the independent variables.

A

Homoscedasticity

31
Q

___ is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.

A

Multicollinearity

32
Q

Several clues that indicate problems with multicollinearity include the following:

  1. ___
  2. ___
  3. ___
A
  1. An independent variable known to be an important predictor ends up having a regression coefficient that is not significant.
  2. A regression coefficient that should have a positive sign turns out to be negative, or vice versa.
  3. When an independent variable is added or removed, there is a drastic change in the values of the remaining regression coefficients.
33
Q

A general rule: if the correlation between two independent variables is ___, then there likely ___ a problem using both of the independent variables.

A

between −0.70 and 0.70

is not
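
The rule of thumb above can be sketched as a simple screen on the correlation between two predictors. The predictor values and the helper name `pearson_r` are hypothetical; the 0.70 threshold is the one stated on the card.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Two hypothetical independent variables
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

r12 = pearson_r(x1, x2)
# Outside the range -0.70 to 0.70 suggests a possible multicollinearity problem
possible_problem = abs(r12) > 0.70
print(round(r12, 2), possible_problem)  # 0.8 True
```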

34
Q

___ is the term used when successive residuals are correlated.

A

Autocorrelation

35
Q

___ is a step-by-step method to determine a regression equation that begins with a single independent variable and adds or deletes independent variables one by one. Only independent variables whose regression coefficients are significantly different from zero are included in the regression equation.

A

Stepwise regression

35
Q

___ is a variable in which there are only two possible outcomes. For analysis, one of the outcomes is coded a 1 and the other a 0.

A

Dummy variable
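
A brief illustration of the 0/1 coding, using a made-up qualitative variable (whether a home has a garage). The variable name and values are hypothetical.

```python
# Hypothetical qualitative variable with two possible outcomes
garage = ["yes", "no", "no", "yes"]

# Code one outcome as 1 and the other as 0
garage_dummy = [1 if g == "yes" else 0 for g in garage]
print(garage_dummy)  # [1, 0, 0, 1]
```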

36
Q

The null hypothesis and the alternate hypothesis regarding the Significance of the Correlation Coefficient are:

H0: ___
H1: ___

A

H0: ρ = 0 (The correlation in the population is zero.)
H1: ρ ≠ 0 (The correlation in the population is different from zero.)
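
These hypotheses are commonly tested with the statistic t = r·√(n − 2) / √(1 − r²), which has n − 2 degrees of freedom. The sketch below assumes a sample correlation of 0.7746 from n = 5 made-up observations and a critical value `t_crit` read from a t table (two-sided, 0.05 significance level, 3 degrees of freedom).

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r = 0.7746      # hypothetical sample correlation
n = 5           # hypothetical sample size
t_stat = correlation_t_statistic(r, n)

t_crit = 3.182  # assumed table value: two-sided, alpha = 0.05, 3 df
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 2), reject_h0)  # 2.12 False
```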

36
Q

The number of degrees of freedom in the regression is equal to ___ in the multiple regression equation.

A

The number of independent variables

36
Q

The regression degrees of freedom is ___.

A

k

37
Q

The residual or error degrees of freedom is ___, and is the same as n − (k + 1).

A

(n − 1) − k

39
Q

The total variation of the dependent variable, y, is divided into two components:

  1. ___
  2. ___
A

(1) regression, or the variation of y explained by all the independent variables
(2) the error or residual, or unexplained variation of y
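
The split of total variation, and the matching split of degrees of freedom, can be checked numerically. This sketch assumes a simple regression (k = 1) fitted by least squares to made-up data; all names are illustrative.

```python
# Hypothetical sample and its least squares coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

n = len(x)  # number of observations
k = 1       # number of independent variables
mean_y = sum(y) / n

sst = sum((yi - mean_y) ** 2 for yi in y)                     # total variation
ssr = sum(((a + b * xi) - mean_y) ** 2 for xi in x)           # regression (explained)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # error (unexplained)

df_total = n - 1
df_regression = k
df_error = n - (k + 1)

print(round(sst, 2), round(ssr, 2), round(sse, 2))  # 6.0 3.6 2.4
print(df_total, df_regression, df_error)            # 4 1 3
```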