Chapter 13: Correlation and Regression Flashcards

1
Q

correlation

A

a statistical procedure used to describe the strength and direction of the linear relationship between two factors.

2
Q

Linear regression

A

also called regression, is a statistical procedure used to determine the equation of a regression line to a set of data points and to determine the extent to which the regression equation can be used to predict values of one factor, given known values of a second factor in a population.

3
Q

scatter plot

A

also called a scattergram, is a graphical display of discrete data points (x, y) used to summarize the relationship between two variables.

4
Q

Data points

A

the x- and y-coordinates for each point plotted in a scatter plot.

5
Q

correlation coefficient (r)

A

used to measure the strength and direction of the linear relationship, or correlation, between two factors. The value of r ranges from −1.0 to +1.0.

6
Q

positive correlation

A

(0 < r ≤ +1.0) is a positive value of r that indicates that the values of two factors change in the same direction: As the values of one factor increase, the values of the second factor also increase; as the values of one factor decrease, the values of the second factor also decrease.

7
Q

negative correlation

A

(−1.0 ≤ r < 0) is a negative value of r that indicates that the values of two factors change in different directions, meaning that as the values of one factor increase, the values of the second factor decrease.

8
Q

regression line

A

the best-fitting straight line to a set of data points. A best-fitting line is the line that minimizes the distance that the data points fall from it, measured as the sum of the squared vertical distances between the points and the line.

9
Q

Pearson correlation coefficient (r)

A

also called the Pearson product-moment correlation coefficient, is a measure of the direction and strength of the linear relationship of two factors in which the data for both factors are measured on an interval or ratio scale of measurement.
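
As a minimal sketch of the definition above, r can be computed as the sum of products (SP) divided by the square root of the product of the two sums of squares. The function name pearson_r and the scores are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math

def pearson_r(x, y):
    """Pearson r = SP / sqrt(SS_X * SS_Y) for paired interval/ratio scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of paired deviations (numerator).
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squares for each factor (denominator).
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores, not taken from the chapter.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Here SP = 6, SS_X = 10, and SS_Y = 6, so r = 6 / sqrt(60), a fairly strong positive correlation.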

10
Q

sum of products (SP)

A

the sum of squares for two factors, X and Y, also represented as SSXY. SP is the numerator of the Pearson correlation formula. To compute SP, we multiply the deviation of each X value from the mean of X by the deviation of its paired Y value from the mean of Y, then sum these products.
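
The computation can be sketched in a few lines. The scores below are hypothetical and the variable names are only illustrative.

```python
# Hypothetical paired scores, not taken from the chapter.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)  # 3.0
mean_y = sum(y) / len(y)  # 4.0

# Multiply each X deviation by the paired Y deviation, then sum the products.
products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
sp = sum(products)
print(sp)  # 6.0
```

A positive SP means the paired deviations tend to share a sign, so r will be positive as well.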

11
Q

coefficient of determination (r² or R²)

A

a formula that is mathematically equivalent to eta-squared and is used to measure the proportion of variance of one factor (Y) that can be explained by known values of a second factor (X).
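
A minimal sketch, assuming r is the Pearson correlation for the same two factors: squaring r gives the proportion of variance in Y explained by X. The scores are hypothetical.

```python
import math

# Hypothetical paired scores, not taken from the chapter.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

r = sp / math.sqrt(ss_x * ss_y)
r_squared = r ** 2  # same value as SP**2 / (SS_X * SS_Y)
print(round(r_squared, 2))  # 0.6
```

In this example, 60% of the variance in Y can be explained by known values of X.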

12
Q

Homoscedasticity

A

the assumption that there is an equal (“homo”) variance or scatter (“scedasticity”) of data points dispersed along the regression line.

13
Q

Linearity

A

the assumption that the best way to describe a pattern of data is using a straight line.

14
Q

Reverse causality

A

a problem that arises when the causality between two factors can be in either direction.

15
Q

confound variable

A

or third variable, is an unanticipated variable not accounted for in a research study that could be causing or associated with observed changes in one or more measured variables.

16
Q

Restriction of range

A

a problem that arises when the range of data for one or both correlated factors in a sample is limited or restricted, compared to the range of data in the population from which the sample was selected.

17
Q

Spearman rank-order correlation coefficient (rs)

A

or Spearman’s rho, is a measure of the direction and strength of the linear relationship of two ranked factors on an ordinal scale of measurement.
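
A minimal sketch using the common shortcut formula, r_s = 1 − 6ΣD² / (n(n² − 1)), where D is the difference between paired ranks. This form assumes the data are already ranked and contain no tied ranks; the ranks and the function name spearman_rs are hypothetical.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman r_s from paired ranks; assumes no tied ranks."""
    n = len(rank_x)
    # Sum of squared differences between paired ranks.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical ranks for five participants on two ordinal measures.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
r_s = spearman_rs(rank_x, rank_y)
print(round(r_s, 2))  # 0.8
```

With ΣD² = 4 and n = 5, the result is 1 − 24/120 = 0.8.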

18
Q

point-biserial correlation coefficient (rpb)

A

a measure of the direction and strength of the linear relationship of one factor that is continuous (on an interval or ratio scale of measurement) and a second factor that is dichotomous (on a nominal scale of measurement).
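
One way to sketch this, assuming the dichotomous factor is coded 0 and 1: r_pb = ((M1 − M0) / s_Y) × sqrt(pq), where s_Y is the standard deviation of all Y scores computed with n in the denominator, and p and q are the proportions of scores in each group. The same value results from applying the Pearson formula to the 0/1 codes. The data and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def point_biserial(group, y):
    """r_pb for a 0/1 coded factor and a continuous factor."""
    n = len(y)
    y1 = [yi for g, yi in zip(group, y) if g == 1]
    y0 = [yi for g, yi in zip(group, y) if g == 0]
    p, q = len(y1) / n, len(y0) / n
    mean_y = sum(y) / n
    # Standard deviation of all Y scores, dividing by n (not n - 1).
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    mean_diff = sum(y1) / len(y1) - sum(y0) / len(y0)
    return (mean_diff / s_y) * math.sqrt(p * q)

# Hypothetical data: group membership (0 or 1) and a continuous score.
group = [0, 0, 0, 1, 1, 1]
score = [1, 2, 3, 4, 5, 6]
r_pb = point_biserial(group, score)
print(round(r_pb, 4))  # 0.8783
```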

19
Q

phi correlation coefficient (rϕ)

A

a measure of the direction and strength of the linear relationship of two dichotomous factors on a nominal scale of measurement.
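
A minimal sketch for a 2 × 2 table of frequencies, assuming cells a and b form the first row and c and d the second: phi = (ad − bc) / sqrt of the product of the two row totals and the two column totals. The cell counts and the function name are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 frequency table laid out as [[a, b], [c, d]]."""
    row_totals = (a + b) * (c + d)
    col_totals = (a + c) * (b + d)
    return (a * d - b * c) / math.sqrt(row_totals * col_totals)

# Hypothetical counts for two dichotomous factors.
phi = phi_coefficient(8, 2, 2, 8)
print(phi)  # 0.6
```

Swapping the two columns (or the two rows) reverses the sign of phi, so the direction depends on how the categories are ordered in the table.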

20
Q

slope (b)

A

The slope (b) of a straight line is used to measure the change in Y relative to the change in X. When X and Y change in the same direction, the slope is positive. When X and Y change in opposite directions, the slope is negative.

21
Q

y-intercept (a)

A

The y-intercept (a) of a straight line is the value of the criterion variable (Y) when the predictor variable (X) equals 0.

22
Q

method of least squares

A

a statistical procedure used to compute the slope (b) and y-intercept (a) of the best-fitting straight line to a set of data points.
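
A minimal sketch of the usual least squares solution, b = SP / SS_X and a = M_Y − b·M_X. The scores and the function name least_squares are hypothetical.

```python
def least_squares(x, y):
    """Return (b, a): slope and y-intercept of the best-fitting line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x             # change in Y per one-unit change in X
    a = mean_y - b * mean_x   # predicted Y when X equals 0
    return b, a

# Hypothetical scores, not taken from the chapter.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = least_squares(x, y)
print(round(b, 2), round(a, 2))  # 0.6 2.2
```

The resulting regression equation for these scores is Ŷ = 0.6X + 2.2.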

23
Q

Analysis of regression

A

or regression analysis, is a statistical procedure used to test hypotheses for one or more predictor variables to determine whether the regression equation for a sample of data points can be used to predict values of the criterion variable (Y) given values of the predictor variable (X) in the population.

24
Q

Regression variation

A

the variance in Y that is related to or associated with changes in X. The closer data points fall to the regression line, the larger the value of regression variation.

25
Q

Residual variation

A

the variance in Y that is not related to changes in X. This is the variance in Y that is left over or remaining. The farther data points fall from the regression line, the larger the value of residual variation.
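
The split between regression and residual variation can be sketched with sums of squares: SS_regression is based on how far the predicted values fall from the mean of Y, SS_residual on how far the observed values fall from the predicted values, and the two add up to SS_Y. The scores are hypothetical.

```python
# Hypothetical scores, not taken from the chapter.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

# Least squares line, then predicted Y for each X.
b = sp / ss_x
a = mean_y - b * mean_x
predicted = [a + b * xi for xi in x]

ss_regression = sum((yh - mean_y) ** 2 for yh in predicted)
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, predicted))
print(round(ss_regression, 2), round(ss_residual, 2))  # 3.6 2.4
```

Here 3.6 + 2.4 = 6, which is SS_Y, so the regression share is 3.6 / 6 = 0.6.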

26
Q

standard error of estimate (se)

A

an estimate of the standard deviation or distance that a set of data points falls from the regression line. The standard error of estimate equals the square root of the mean square residual.
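
A minimal sketch, assuming one predictor so that the residual degrees of freedom are n − 2: the mean square residual is SS_residual / (n − 2), and its square root is the standard error of estimate. The scores are hypothetical.

```python
import math

# Hypothetical scores, not taken from the chapter.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)

b = sp / ss_x
a = mean_y - b * mean_x

# Residual sum of squares around the regression line.
ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ms_residual = ss_residual / (n - 2)  # df residual = n - 2 with one predictor
s_e = math.sqrt(ms_residual)
print(round(s_e, 4))  # 0.8944
```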

27
Q

Three key characteristics of the correlation coefficient illustrated in this exercise (adding a constant, etc.)

A
  1. Adding or subtracting a constant to one set of scores (X or Y) does not change the correlation coefficient.
  2. Multiplying or dividing one set of scores (X or Y) by a positive constant does not change the correlation coefficient.
  3. Multiplying or dividing one set of scores (X or Y) by a negative constant changes only the sign (or direction) of the correlation. The strength of the correlation coefficient remains unchanged.
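
These three characteristics can be checked numerically. The sketch below uses hypothetical scores and a hypothetical helper, pearson_r, and transforms only the X scores.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores, not taken from the chapter.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_original = pearson_r(x, y)
r_added = pearson_r([xi + 10 for xi in x], y)     # 1. add a constant
r_positive = pearson_r([xi * 3 for xi in x], y)   # 2. multiply by a positive constant
r_negative = pearson_r([xi * -3 for xi in x], y)  # 3. multiply by a negative constant

print(round(r_original, 4), round(r_added, 4),
      round(r_positive, 4), round(r_negative, 4))
```

The first three values match, and the fourth has the same magnitude with the opposite sign.
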
28
Q
Univariate outlier
A

An observation that is unusual in either x or y is called a univariate outlier. While interesting, these are not as important in the regression context.

29
Q
Regression outliers
A

Data points that are not well predicted by the model (they deviate from the regression line) and so have a large residual. Regression outliers can decrease model fit (e.g., r²) and affect parameter estimates (slope and intercept).

30
Q
Influence
A

When a data point is extreme on x and an outlier on y, it can greatly change the model parameter estimates relative to what they would be if the point were not in the sample. Such a point is considered an influential case.
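
As a rough illustration, the sketch below fits a least squares line to hypothetical scores with and without one added point that is extreme on x and far from the trend on y. The data, the added point, and the function name are assumptions made for the example.

```python
def least_squares(x, y):
    """Return (b, a): slope and y-intercept of the best-fitting line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x
    return b, mean_y - b * mean_x

# Hypothetical scores, not taken from the chapter.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_without, a_without = least_squares(x, y)

# Add one hypothetical case: extreme on x (15) and well off the trend on y (0).
b_with, a_with = least_squares(x + [15], y + [0])

print(round(b_without, 4), round(b_with, 4))  # 0.6 -0.2615
```

A single case reverses the sign of the slope here, which is what makes it an influential case rather than only an outlier.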