7: Correlation & Linear Regression and Multiple Linear Regression Flashcards by Jourdan S

___ is a group of techniques to measure the relationship between two variables.

Correlation analysis

The typical first step of a correlation analysis is a ___.

Scatter diagram

___ is a measure of the strength of the linear association between two interval-level or ratio-level variables.

Correlation coefficient

A correlation coefficient can range from ___ to ___.

-1.00 to 1.00
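
A minimal sketch of how Pearson's r could be computed from its definition, using only the Python standard library. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the deck. The two calls show the extremes of the range: a perfect positive and a perfect negative linear relationship.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for x and for y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises (then falls) in exact proportion to x
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

The function divides by zero if either variable is constant, since r is undefined in that case.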

If there is absolutely no linear relationship between the two variables, Pearson’s r is ___.

Zero

___ is an equation that expresses the linear relationship between two variables.

Regression equation

___ is a mathematical procedure that uses the data to position a line with the objective of minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.

Least squares principle
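
The least squares principle can be sketched in a few lines of plain Python. The names `least_squares_line` and `sum_squared_errors` and the data are assumptions made for illustration. The slope is the sum of cross-products divided by the sum of squared x deviations, and the intercept makes the line pass through the point of means.

```python
def least_squares_line(x, y):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical distances between actual and predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical sample
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(a, b)
# Any other line, such as one with a slightly different slope, gives a larger sum
print(sum_squared_errors(x, y, a, b) < sum_squared_errors(x, y, a, b + 0.1))
```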

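___ is a measure of the dispersion, or scatter, of the observed y values around the regression line.

Standard error of estimate (sy·x)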

___ is the proportion of the total variation in the dependent variable Y that is explained, or accounted for, by the variation in the independent variable X.

Coefficient of determination (r²)
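
As a rough sketch, both the standard error of estimate and the coefficient of determination can be computed from the observed and fitted values of a simple linear regression. The function name `fit_statistics` and the data are hypothetical. The fitted values are assumed to come from a least squares line with an intercept, which is what makes r² = 1 − SSE / SS total valid.

```python
import math

def fit_statistics(y, y_hat):
    """Return (s_yx, r_squared) for a simple linear regression fit."""
    n = len(y)
    if n != len(y_hat) or n < 3:
        raise ValueError("need matching lists with at least 3 observations")
    mean_y = sum(y) / n
    # Unexplained variation: squared distances from the regression line
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total variation: squared distances from the mean of y
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    s_yx = math.sqrt(sse / (n - 2))   # n - 2 degrees of freedom
    r_squared = 1 - sse / ss_total
    return s_yx, r_squared

# Hypothetical observed values and least squares fitted values
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(fit_statistics(y, y_hat))
```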

Regression analysis provides two statistics to evaluate the predictive ability of a regression equation: ___ and ___.

1. The standard error of estimate (sy·x)
2. The coefficient of determination (r²)

What are the 4 Assumptions Underlying Linear Regression?

1. For each value of x, there are corresponding y values. These y values follow the normal distribution.
2. The means of these normal distributions lie on the regression line.
3. The standard deviations of these normal distributions are all the same. The best estimate we have of this common standard deviation is the standard error of estimate (sy·x).
4. The y values are statistically independent. This means that in selecting a sample, the y values chosen for a particular x value do not depend on the y values for any other x value. This assumption is particularly important when data are collected over a period of time. In such situations, the errors for a particular time period are often correlated with those of other time periods.

A relationship exists between the predicted values ŷ and the standard error of estimate (sy·x). If the sample is large and the scatter around the regression line is approximately normally distributed:

___ will include about the middle 68% of the observations.
___ will include about the middle 95% of the observations.
___ will include virtually all the observations.

ŷ ± sy·x
ŷ ± 2sy·x
ŷ ± 3sy·x

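In constructing interval estimates from a regression equation:

The interval estimate for the mean value of y at a given x is called a ___ interval.
The interval estimate for an individual value of y at a given x is called a ___ interval.

Confidence interval
Prediction interval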
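
For reference, the usual textbook forms of these two intervals in simple linear regression are given below. The first estimates the mean of y at a given x, the second an individual y at that x. The notation is an assumption of this note rather than something stated on the cards: t is the value from the t distribution with n − 2 degrees of freedom for the chosen confidence level, and the sum runs over the sample values x_i.

```latex
% Confidence interval for the mean of y at a given x
\hat{y} \pm t \, s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}

% Prediction interval for an individual y at a given x
\hat{y} \pm t \, s_{y \cdot x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}
```

The extra 1 under the square root is why a prediction interval is always wider than the confidence interval at the same x.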

The ___ variable is scaled on the Y-axis and is the variable being estimated.

Dependent

The ___ variable is scaled on the X-axis and is the variable used as the predictor.

Independent

The number of degrees of freedom associated with the error term is equal to ___.

The total degrees of freedom, n − 1, minus the regression degrees of freedom, k; that is, n − (k + 1)

___ is the percent of variation in the dependent variable, y, explained by the set of independent variables, x1, x2, x3, …

Coefficient of multiple determination

The characteristics of the coefficient of multiple determination are:

It is symbolized by a capital R squared. In other words, it is written as R² because it is calculated as the square of a correlation coefficient.
It can range from 0 to 1. A value near 0 indicates little association between the set of independent variables and the dependent variable. A value near 1 means a strong association.
It cannot assume negative values. Any number that is squared or raised to the second power cannot be negative.
It is easy to interpret. Because R² always lies between 0 and 1, its values are easy to compare and understand.

___ is a test used to determine if any of the set of independent variables has regression coefficients different from zero.

Global test

To test the null hypothesis that the multiple regression coefficients are all zero, we employ ___.

The F distribution
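
A small sketch of the test statistic for the global test, assuming the regression and error sums of squares are already available from an ANOVA table. The function name `global_f_statistic` and the numbers are hypothetical. The statistic is the regression mean square divided by the error mean square, with k and n − (k + 1) degrees of freedom.

```python
def global_f_statistic(ssr, sse, n, k):
    """Return (F, numerator df, denominator df) for the global test.

    ssr: variation explained by the regression
    sse: unexplained (error) variation
    n:   number of observations
    k:   number of independent variables
    """
    df_regression = k
    df_error = n - (k + 1)
    if df_error <= 0:
        raise ValueError("need more observations than estimated coefficients")
    msr = ssr / df_regression   # mean square regression
    mse = sse / df_error        # mean square error
    return msr / mse, df_regression, df_error

# Hypothetical ANOVA values: 24 observations, 3 independent variables
f_stat, df1, df2 = global_f_statistic(ssr=300.0, sse=100.0, n=24, k=3)
print(f_stat, df1, df2)
```

The sketch stops at the statistic. The critical value or p-value would still come from an F table or statistical software using the two degrees of freedom returned here.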

The decision rule can be based on either of two methods:

(1) comparing the test statistic to a critical value

(2) calculating a p-value based on the test statistic and comparing the p-value to the significance level

The critical value method using the F-statistic requires three pieces of information:

(1) the numerator degrees of freedom
(2) the denominator degrees of freedom
(3) the significance level

The ___ distributions of the coefficients follow the t distribution with n − (k + 1) degrees of freedom.

Sampling

The sampling distributions of the coefficients follow the t distribution with ___ degrees of freedom.

n − (k + 1)

A residual plot supports the linearity assumption when: 1. ___ 2. ___ 3. ___

1. Residuals are plotted on the vertical (y) axis and centered at zero. 2. Residual plots show a random distribution of positive and negative values across the horizontal (x) axis. 3. Plots are scattered and there is no obvious pattern.

___ compares all possible models using a specified set of predictors, and displays the best-fitting models that contain one predictor, two predictors, and so on.

Best subset regression

A graph that helps to evaluate the assumption of normally distributed residuals is called a ___ and is often included in statistical software. The assumption is supported when the plotted points lie close to a straight line.

Normal probability plot

___ is the condition in which the variation around the regression equation is the same for all values of the independent variables.

Homoscedasticity

___ is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.

Multicollinearity

Several clues that indicate problems with multicollinearity include the following: 1. ___ 2. ___ 3. ___

1. An independent variable known to be an important predictor ends up having a regression coefficient that is not significant. 2. A regression coefficient that should have a positive sign turns out to be negative, or vice versa. 3. When an independent variable is added or removed, there is a drastic change in the values of the remaining regression coefficients.

A general rule is: if the correlation between two independent variables is 1. ___, then there likely 2. ___ a problem using both of the independent variables.

1. between −0.70 and 0.70 2. is not
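
The ±0.70 rule of thumb could be applied with a short screening function like the one below, a sketch in plain Python. The function names and the predictor data are hypothetical. It computes the correlation for every pair of independent variables and reports the pairs that fall outside the −0.70 to 0.70 band.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated_pairs(predictors, threshold=0.70):
    """Return (name_a, name_b, r) for each pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical predictors: x3 is nearly a copy of x1
predictors = {
    "x1": [35, 29, 36, 60, 65, 30],
    "x2": [3, 4, 7, 6, 5, 5],
    "x3": [36, 30, 35, 61, 66, 29],
}
for name_a, name_b, r in flag_correlated_pairs(predictors):
    print(name_a, name_b, round(r, 3))
```

A pairwise check like this is only a first screen. It can miss cases where one predictor is a combination of several others.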

___ is the term used when successive residuals are correlated.

Autocorrelation
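
One simple way to look for correlated successive residuals is the sample lag-1 autocorrelation, sketched below. This particular diagnostic, the function name `lag1_autocorrelation`, and the residual values are assumptions of this example and are not named in the deck. It assumes the residuals are listed in time order.

```python
def lag1_autocorrelation(residuals):
    """Sample correlation between each residual and the one before it."""
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least 3 residuals")
    mean = sum(residuals) / n
    # Cross-products of each residual with its immediate predecessor
    numerator = sum(
        (residuals[t] - mean) * (residuals[t - 1] - mean) for t in range(1, n)
    )
    denominator = sum((e - mean) ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals in time order
trending = [-3, -2, -1, 0, 1, 2, 3]      # neighbors tend to share a sign
alternating = [1, -1, 1, -1, 1, -1]      # neighbors tend to flip sign
print(lag1_autocorrelation(trending))
print(lag1_autocorrelation(alternating))
```

A value well above zero suggests positive autocorrelation, and a value well below zero suggests negative autocorrelation.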

___ is a step-by-step method to determine a regression equation that begins with a single independent variable and adds or deletes independent variables one by one. Only independent variables whose regression coefficients differ significantly from zero are included in the regression equation.

Stepwise regression

___ is a variable in which there are only two possible outcomes. For analysis, one of the outcomes is coded a 1 and the other a 0.

Dummy variable
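
A minimal sketch of the 0/1 coding, assuming a qualitative variable with two outcomes. The function name `to_dummy` and the garage data are hypothetical.

```python
def to_dummy(values, coded_as_one):
    """Code a two-outcome variable as 1 (for coded_as_one) or 0 (otherwise)."""
    if len(set(values)) > 2:
        raise ValueError("a dummy variable allows only two possible outcomes")
    return [1 if v == coded_as_one else 0 for v in values]

# Hypothetical data: whether each home has an attached garage
garage = ["yes", "no", "no", "yes"]
print(to_dummy(garage, "yes"))  # prints [1, 0, 0, 1]
```

Which outcome gets the 1 is arbitrary. It only changes the sign of the fitted coefficient, not the quality of the fit.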

The null hypothesis and the alternate hypothesis regarding the Significance of the Correlation Coefficient are: H0: ___ H1: ___

H0: ρ = 0 (The correlation in the population is zero.) H1: ρ ≠ 0 (The correlation in the population is different from zero.)
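
These hypotheses are usually tested with a t statistic, t = r·√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below computes that statistic. The function name `correlation_t_statistic` and the sample values are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical sample: r = 0.6 from 18 paired observations
t_stat, df = correlation_t_statistic(r=0.6, n=18)
print(t_stat, df)
```

The computed t is then compared with the critical value of the t distribution for the chosen significance level, or converted to a p-value.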

The number of degrees of freedom in the regression is equal to ___ in the multiple regression equation.

The number of independent variables

The regression degrees of freedom is ___.

k, the number of independent variables

The residual or error degrees of freedom is ___, and is the same as n − (k + 1).

(n − 1) − k

The total variation of the dependent variable, y, is divided into two components: 1. ___ 2. ___

(1) regression, or the variation of y explained by all the independent variables (2) the error or residual, or unexplained variation of y

(39 cards)