Regression Flashcards

1
Q

What is the SD line?

A

The SD line goes through the point of averages and has slope SDY/SDX if the correlation coefficient r is greater than or equal to zero. If r is negative, the slope is −SDY/SDX.

That is, the SD line climbs by SDY (or falls by SDY, if r is negative) when you move to the right by SDX.

In standard units, the slope of the SD line is one if r is greater than or equal to zero, and minus one if r is negative. If SDX is zero, the SD line is not defined.
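As a rough illustration, the definition can be sketched in Python. The helper name `sd_line` and the sample data are hypothetical, and the sketch assumes SDs are computed with n in the denominator (`statistics.pstdev`).

```python
from statistics import fmean, pstdev

def sd_line(xs, ys):
    """Return (slope, intercept) of the SD line for paired data.

    Hypothetical helper; assumes SDs use n (not n - 1) in the denominator.
    """
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    if sd_x == 0:
        raise ValueError("SDX is zero, so the SD line is not defined")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # The average product of deviations has the same sign as r.
    cov = fmean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])
    slope = -sd_y / sd_x if cov < 0 else sd_y / sd_x
    # The line passes through the point of averages (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = sd_line(xs, ys)
print(slope, intercept)
```

Reversing the order of the Y values flips the sign of r, and with it the sign of the slope.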

2
Q

What is a point of averages?

A

In a scatterplot, the point whose coordinates are the arithmetic means of the corresponding variables. For example, if the variable X is plotted on the horizontal axis and the variable Y is plotted on the vertical axis, the point of averages has coordinates (mean of X, mean of Y).

3
Q

What is a graph of averages?

A

A graph of averages divides a scatterplot into class intervals of the horizontal (X) variable and plots the averages of the Y values in those intervals against the midpoints of the intervals. That is, it plots a typical value of Y in each interval of values of X. If we wanted to summarize the Y values of points whose X values fall in some range, the average Y value of those points would be a reasonable summary. That is what the graph of averages displays.
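A minimal sketch of that computation, assuming half-open class intervals [lo, hi) supplied by the caller. The name `graph_of_averages`, the `edges` parameter, and the data are hypothetical.

```python
from statistics import fmean

def graph_of_averages(xs, ys, edges):
    """Return (interval midpoint, average Y) for each non-empty class interval.

    `edges` lists the interval boundaries in increasing order; each
    interval is treated as [lo, hi), which is an assumption of this sketch.
    """
    points = []
    for lo, hi in zip(edges, edges[1:]):
        ys_in_slice = [y for x, y in zip(xs, ys) if lo <= x < hi]
        if ys_in_slice:  # skip intervals that contain no data
            points.append(((lo + hi) / 2, fmean(ys_in_slice)))
    return points

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 3, 7, 6, 8]
print(graph_of_averages(xs, ys, edges=[1, 3, 5, 7]))
# → [(2.0, 3.0), (4.0, 5.0), (6.0, 7.0)]
```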

4
Q

How does an SD line typically split the points in a scatterplot?

A

The SD line typically does not split the points in the scatterplot evenly. When the correlation coefficient r is positive, in vertical slices to the left of the point of averages, most of the values of Y are above the SD line, and in vertical slices to the right of the point of averages, most of the Y values are below the SD line. When r is negative, in vertical slices to the left of the point of averages, most of the values of Y are below the SD line, and in vertical slices to the right of the point of averages, most of the Y values are above the SD line.

5
Q

How does the graph of averages climb when compared to the SD line?

A

The SD line rises by SDY for each run of SDX. The graph of averages typically rises by less than SDY for each run of SDX. In fact, it rises by about r × SDY for each run of SDX. The average of Y near the average of X is roughly the overall average of Y if the scatterplot is football-shaped, so the point of averages tends to be close to the graph of averages for football-shaped scatterplots.

This suggests that a line that passes through the point of averages and has slope r × SDY/SDX would fit the graph of averages pretty well, giving a reasonable summary of the scatterplot. The line that passes through the point of averages and has slope r × SDY/SDX is called the regression line.

6
Q

What is the regression line?

A

The regression line is a smoothed version of the graph of averages. It goes through the point of averages and rises by exactly r × SDY for each run of SDX to the right.

Its slope is r × SDY/SDX, compared with SDY/SDX (or −SDY/SDX if r is negative) for the SD line. Because |r| ≤ 1, the regression line is never steeper than the SD line, and it is less steep whenever |r| < 1.
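The comparison of the two slopes can be checked numerically. This is a sketch under stated assumptions: r is computed as the average product of the variables in standard units, SDs use n in the denominator, and the names `correlation` and `slopes` and the data are hypothetical.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """Correlation coefficient r: average product of X and Y in standard units."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return fmean([((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                  for x, y in zip(xs, ys)])

def slopes(xs, ys):
    """Return (r, slope of the SD line, slope of the regression line)."""
    r = correlation(xs, ys)
    sd_ratio = pstdev(ys) / pstdev(xs)
    sd_slope = sd_ratio if r >= 0 else -sd_ratio
    regression_slope = r * sd_ratio
    return r, sd_slope, regression_slope

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, sd_slope, regression_slope = slopes(xs, ys)
# The regression slope is never larger in magnitude than the SD-line slope.
print(r, sd_slope, regression_slope)
```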

7
Q

What is the vertical residual?

A

The vertical residual of a datum from the regression line is the difference between the value of Y for the datum and the height of the regression line at the value of X of the datum. That is, the residual is the signed vertical distance by which the regression line misses the datum:

Vertical residual = (measured value of Y) - (estimated value of Y)

The regression line is the line for which the rms of vertical residuals is smallest.
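A small numerical check of that claim, as a sketch only: it compares the rms of the vertical residuals for the regression line with the rms for two nearby lines. The helper names and data are hypothetical, and SDs are assumed to use n in the denominator.

```python
from math import sqrt
from statistics import fmean, pstdev

def regression_line(xs, ys):
    """Return (slope, intercept): slope r * SDY/SDX through the point of averages."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    r = fmean([((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)])
    slope = r * sd_y / sd_x
    return slope, mean_y - slope * mean_x

def residuals(xs, ys, slope, intercept):
    """Vertical residuals: measured Y minus the height of the line at X."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def rms(values):
    """Root mean square of a list of numbers."""
    return sqrt(fmean([v * v for v in values]))

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = regression_line(xs, ys)
rms_regression = rms(residuals(xs, ys, slope, intercept))
rms_tilted = rms(residuals(xs, ys, slope + 0.1, intercept))
rms_shifted = rms(residuals(xs, ys, slope, intercept + 0.1))
print(rms_regression, rms_tilted, rms_shifted)
```

Tilting or shifting the line in either direction increases the rms of the residuals for these data.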

8
Q

What is an independent variable in regression?

A

In regression, the independent variable is the one that is supposed to explain the other; the term is a synonym for “explanatory variable”. Usually, one regresses the “dependent variable” on the “independent variable”. There is not always a clear choice of the independent variable. The independent variable is usually plotted on the horizontal axis. “Independent” in this context does not mean the same thing as statistically independent.

9
Q

What is a dependent variable in regression?

A

In regression, the variable whose values are supposed to be explained by changes in the other variable (the independent or explanatory variable). Usually one regresses the dependent variable on the independent variable. Independent in this context does not mean the same thing as statistically independent.

10
Q

When is estimating Y from X using regression reasonable?

A

Estimating Y from X using regression is reasonable only if the following conditions hold:

The scatterplot of Y versus X is roughly football-shaped.

The value of X for which the estimate of Y is sought is within the range of measured values of X.

11
Q

What is the equation of the regression line?

A

y = (r × SDY/SDX) × x + [mean(Y) − (r × SDY/SDX) × mean(X)].

If r = 0, the regression line is horizontal: Its slope is zero.

If r = 1, all the points fall on a line with positive slope. The regression line and the SD line are the same.

If r = −1, all the points fall on a line with negative slope. The regression line and the SD line are the same.
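The equation can be evaluated directly from the five summary statistics. The sketch below uses a hypothetical function name and made-up summary values.

```python
def regression_estimate(x, mean_x, mean_y, sd_x, sd_y, r):
    """Height of the regression line at x, from summary statistics.

    Hypothetical helper: slope is r * SDY/SDX and the line passes
    through the point of averages (mean_x, mean_y).
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope * x + intercept

# Made-up summary statistics: mean(X) = 3, mean(Y) = 4, SDX = 2, SDY = 1.
print(regression_estimate(5, 3, 4, 2, 1, r=0.5))  # → 4.5
print(regression_estimate(5, 3, 4, 2, 1, r=0.0))  # → 4.0 (horizontal line at mean(Y))
print(regression_estimate(5, 3, 4, 2, 1, r=1.0))  # → 5.0 (same as the SD line)
```

Here x = 5 is one SDX above mean(X), so the estimate is r × SDY above mean(Y) in each case.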

12
Q

What are extrapolation and interpolation?

A

Estimating the value of Y associated with a value of X that is larger than any of those observed, or smaller than any of those observed, is called extrapolation. (Estimating the value of X associated with a value of Y larger than any of those observed, or smaller than any of those observed, is also extrapolation.)

Estimating the value of Y associated with a value of X that is within the range of the observed values of X but is not equal to any of the observed values of X is called interpolation; so is estimating the value of X associated with a value of Y that is within the range of measured values of Y.

Extrapolation is extremely suspect—without data in the range in which the estimate is wanted, there is no reason to believe that the relationship between X and Y is the same as it is in the region in which there are data. Interpolation is sometimes reasonable when the scatterplot is football-shaped, especially if there are many data near the value of X or Y at which the estimate is sought.
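One way to make the distinction concrete is a guard that labels a requested X value before any estimate is made. This is only a sketch for estimating Y from X; the function name and labels are hypothetical.

```python
def classify_estimate(x_new, observed_xs):
    """Label an estimate at x_new relative to the observed X values."""
    lo, hi = min(observed_xs), max(observed_xs)
    if x_new < lo or x_new > hi:
        return "extrapolation"   # outside the range of the observed X values
    if x_new in observed_xs:
        return "observed value"  # equal to one of the observed X values
    return "interpolation"       # inside the range, but not observed

# Made-up X values for illustration only.
observed = [1, 2, 3, 4, 5]
print(classify_estimate(2.5, observed))  # → interpolation
print(classify_estimate(7, observed))    # → extrapolation
print(classify_estimate(3, observed))    # → observed value
```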

13
Q

What do residual plots show?

A

Residual plots make some aspects of the data easier to see.

Residuals have heteroscedasticity, nonlinearity, or outliers only if the original data do too.

It is easier to see heteroscedasticity, nonlinearity, and outliers in a residual plot than in a scatterplot of the original data.

Heteroscedasticity shows up in a residual plot as a difference in the scatter of the residuals for different ranges of values of the independent variable.

Nonlinearity shows up in a residual plot as a tendency for the residuals to be predominantly positive for some ranges of values of the independent variable and predominantly negative for other ranges.

Outliers show up in a residual plot as unusually large positive or negative values.
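The sign pattern that signals nonlinearity can be seen without drawing a plot. The sketch below fits the regression line to made-up data lying exactly on the curve y = x² and lists the signs of the residuals in order of X. The helper name is hypothetical, and SDs are assumed to use n in the denominator.

```python
from statistics import fmean, pstdev

def regression_residuals(xs, ys):
    """Vertical residuals of the data from the regression line of Y on X."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    r = fmean([((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)])
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Made-up nonlinear data: the points lie on the curve y = x**2.
xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]
signs = ["+" if e > 0 else "-" for e in regression_residuals(xs, ys)]
print(signs)  # → ['+', '-', '-', '-', '+']
```

The residuals are positive at both ends of the range of X and negative in the middle, which is the pattern described for nonlinearity.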
