3 Scatterplots and Regression Flashcards

1
Q

Interpreting Correlation

A

Correlation is just r. It gives the direction and strength of a linear relationship. Ex. r = 0.7 means there is a moderately strong positive linear relationship between the two variables. NOTE: r should only be used when the data is roughly linear, and knowing r (even if it is high) doesn't guarantee that the relationship is linear.
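
The NOTE is easy to check numerically. The sketch below computes r with the standard Pearson formula. It uses plain Python, the helper name `correlation` is illustrative, and the data are hypothetical points on the curve y = x². It gets a value close to 1 even though the pattern is clearly curved.

```python
from math import sqrt

# illustrative helper (not from any library)
def correlation(xs, ys):
    """Pearson correlation r for paired data xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical curved data: y = x^2 for x = 1, 2, ..., 10
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(round(r, 3))  # prints 0.975: a "high" r even though the pattern is a curve
```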

2
Q

Explanatory Variable

A

x, input, independent variable. Usually, this is what is changed. In an experiment, the treatments are the explanatory variable. Ex. The number of rubber bands explains the distance traveled.

3
Q

Response Variable

A

y, output, dependent variable. This is usually what is measured as a result of changes in the explanatory variable. Ex. We added rubber bands and measured the distance traveled (the response variable).

4
Q

Describe a relationship or scatterplot

A

DUFS (direction, unusual points, form, strength). Usually in one sentence: There is a strong, positive, linear relationship…

5
Q

Scatterplot

A

Each dot represents the values of 2 variables for one individual. In the example graph for this card, the "individuals" are married couples.

6
Q

LSRL

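A

Least-squares regression line: the line of best fit, written ŷ = a + bx (a = y-intercept, b = slope).
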
7
Q

Positive Association

A

As x increases, y increases

As x decreases, y decreases

Positive correlation

8
Q

Negative Association

A

As x increases, y decreases

As x decreases, y increases

Negative correlation

9
Q

What is Correlation?

A

“r”
Always between -1 and 1.
As r gets closer to 1 (or -1), the correlation becomes stronger.
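
For reference, the usual formula for r is a standard formula that is not stated on the card. It averages the products of the z-scores of x and y, where x̄ and ȳ are the means and s_x and s_y are the standard deviations. Each variable's z-scores are standardized, so the sum can never exceed n - 1 in absolute value. That is why r always lands between -1 and 1.

```latex
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
```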

10
Q

How do you find r²?

A

Usually: it's on the computer printout.
Sometimes: the problem gives you r, and you just square it.
Or: the problem gives you the coefficient of determination directly (that's just the fancy name for r²).

11
Q

Coefficient of Determination

A

r², written as a decimal or percentage.
Interpretation: (r² as a percent) of the variation in (y-context) is explained/accounted for by the LSRL (using x-context).
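
The sketch below shows where "percent of variation explained" comes from. It assumes the standard identity r² = 1 - (sum of squared residuals)/(total sum of squares) for an LSRL, which is not stated on the card. The rubber-band data and the helper name `fit_lsrl` are hypothetical.

```python
# illustrative helper (not from any library)
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data: number of rubber bands (x), distance traveled in inches (y)
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 14, 15, 21, 22, 27]

a, b = fit_lsrl(xs, ys)
mean_y = sum(ys) / len(ys)
ss_total = sum((y - mean_y) ** 2 for y in ys)                   # all the variation in y
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation the line misses
r_squared = 1 - ss_resid / ss_total                             # share the line accounts for
print(f"About {r_squared:.0%} of the variation in distance traveled "
      f"is accounted for by the LSRL using number of rubber bands.")
```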

12
Q

Why is it called a least-squares regression line?

A

Of all possible lines of fit, it is the one that minimizes the sum of the squared residuals. Remember, a residual is the difference between an actual y-value and the predicted y-value (not x-value).
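
The sketch below shows "least squares" in action with hypothetical data. It uses an illustrative helper `fit_lsrl` that computes b = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² and a = ȳ - b·x̄. Nudging the slope or intercept of the LSRL always makes the sum of squared residuals bigger.

```python
# illustrative helper (not from any library)
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def sum_sq_resid(a, b, xs, ys):
    """Sum of squared residuals (y - y-hat)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_lsrl(xs, ys)
best = sum_sq_resid(a, b, xs, ys)
print(f"LSRL: y-hat = {a:.1f} + {b:.1f}x, sum of squared residuals = {best:.2f}")

# Other lines (small changes to the intercept and/or slope) always do worse
for da, db in [(0.5, 0), (-0.5, 0), (0, 0.2), (0, -0.2), (0.3, -0.1)]:
    other = sum_sq_resid(a + da, b + db, xs, ys)
    print(f"y-hat = {a + da:.1f} + {b + db:.1f}x -> {other:.2f}")
    assert other > best
```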

13
Q

How do you find and interpret a residual?

A

Actual minus predicted, or observed minus predicted (y-ŷ).
To get the predicted value, plug the x-value into the LSRL. The actual value has to be given in the problem.
Interpretation: The actual (y-context) for this (specific x-value) was (residual) more/less than predicted.
Ex. The actual distance traveled for Barbie with 5 rubber bands was 1.47 in. more than predicted.
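
Here is the residual arithmetic as a short sketch. The card does not give the Barbie LSRL, so the equation 20.5 + 6.1x and the measured distance of 52.47 in. below are made up. They were chosen only so the residual matches the card's 1.47 in.

```python
# Hypothetical LSRL (made-up coefficients, not from the card):
# predicted distance (in.) = 20.5 + 6.1 * (number of rubber bands)
def predict(rubber_bands):
    return 20.5 + 6.1 * rubber_bands

x = 5            # number of rubber bands
actual = 52.47   # hypothetical measured distance in inches
predicted = predict(x)
residual = actual - predicted  # residual = actual - predicted (y - y-hat)

direction = "more" if residual > 0 else "less"
print(f"The actual distance traveled for Barbie with {x} rubber bands "
      f"was {abs(residual):.2f} in. {direction} than predicted.")
```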

14
Q

How do you find and interpret the slope of the LSRL?

A

“b” value in ŷ=a+bx.
Interpret the slope:
For each additional (x-context) the predicted (y-context) (increases/decreases) by (slope).
“For each additional mile driven, the predicted sales price of a truck decreases by $15.” OR “We predict that the sales price of a truck will lose $15 for each additional mile driven.”
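
Sometimes a problem gives summary statistics instead of a printout. Then the slope can be computed with the standard formula below, which is not stated on the card. Here r is the correlation and s_x and s_y are the standard deviations of x and y. The formula also shows why the slope always has the same sign as r.

```latex
b = r\,\frac{s_y}{s_x}
```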

15
Q

How do you find and interpret the y-intercept of the LSRL?

A

“a” value in ŷ=a+bx
Interpret the y-intercept:
When (x=0 context) the predicted (y-context) is (y-int).
“A truck with 0 miles on it is predicted to sell for $45,000”
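
A standard result that is not on the card: the LSRL always passes through the point (x̄, ȳ). So the y-intercept can be found from the means and the slope:

```latex
\bar{y} = a + b\,\bar{x} \quad\Longrightarrow\quad a = \bar{y} - b\,\bar{x}
```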

16
Q

What is a residual plot?

A

A scatterplot of the residuals (y-ŷ) against the x-values.
It's like taking the LSRL, making it horizontal, and zooming in on all of the differences.
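
The sketch below draws a residual plot as text, using hypothetical data and an illustrative helper `fit_lsrl`. Each row is one x-value. The `|` column plays the role of the flattened LSRL (residual = 0), and `*` marks the residual. The scale of 5 characters per unit is an arbitrary display choice.

```python
# illustrative helper (not from any library)
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_lsrl(xs, ys)
points = [(x, y - (a + b * x)) for x, y in zip(xs, ys)]  # (x, residual) pairs

for x, res in points:
    # '|' is the zero line (the LSRL made horizontal), '*' is the residual
    row = [" "] * 11
    row[5] = "|"
    row[max(0, min(10, 5 + round(res * 5)))] = "*"  # 5 chars per unit, clipped to the row
    print(f"x = {x}  {''.join(row)}  residual = {res:+.2f}")

# Residuals from an LSRL always add up to 0 (up to rounding error)
print(f"sum of residuals: {sum(res for _, res in points):.10f}")
```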

17
Q

How do you tell if a linear model is appropriate?

A

If given a residual plot: if there is no clear pattern in the residuals (random scatter around 0), a linear model is appropriate. If there is a clear pattern in the residual plot (for example, a curve, or residuals that are mostly positive on one side and mostly negative on the other), a linear model is not appropriate.
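
The sketch below shows what a "clear pattern" looks like. It fits an LSRL to hypothetical curved data (y = x²) and lists the residuals in order of x. They go positive, then negative, then positive: the curved pattern that says a linear model is not appropriate. The helper name `fit_lsrl` is illustrative.

```python
# illustrative helper (not from any library)
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical curved data: y = x^2
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"LSRL: y-hat = {a:g} + {b:g}x")  # prints "LSRL: y-hat = -12 + 8x"
print("residuals:", [round(res, 2) for res in residuals])
# prints [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]: a U-shaped pattern
```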

18
Q

How do you find and interpret S (the standard deviation of the residuals)?

A

It has to be given in the problem (usually in the lower left of a computer printout).
Interpretation:
“The actual (y-context) is typically about (s units) away from the number predicted by the LSRL.”
OR “When using the LSRL to predict (y-context) we will typically be off by (s units).”
“The actual sales price of a truck is typically about $1,000 away from the price predicted by the LSRL relating sales price and miles driven” OR “When predicting sales price from miles driven, we will typically be off by $1,000”
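
If you ever need S without a printout, it is usually computed as S = sqrt(Σ(y-ŷ)² / (n - 2)). That formula is standard but not stated on the card. Here is a minimal sketch with hypothetical data and an illustrative helper `fit_lsrl`:

```python
from math import sqrt

# illustrative helper (not from any library)
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_lsrl(xs, ys)
n = len(xs)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
s = sqrt(sse / (n - 2))  # divide by n - 2, not n
print(f"When using the LSRL to predict y, we will typically be off by about {s:.2f} units.")
```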

19
Q

How do you read a computer printout for LSRL problems?

A
20
Q

How do you answer a question that asks about how confident you are about a prediction?

A

First, if the x-value you are predicting for is far outside the range of the data, this is extrapolation and we shouldn't be confident.
Otherwise, if the linear fit is strong (|r| and r² close to 1), you can be confident in your predictions.

21
Q

If you switch the x and y (explanatory and response) what would change?

A

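The correlation r, r², and the standard deviations of x and y all stay the same (the two standard deviations just switch roles), but the LSRL will be completely different. The slope keeps the same sign (if it was positive, it will still be positive), but it won't be the same number. The y-intercept will change too, and S (the standard deviation of the residuals) will generally change as well.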
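
Here is a quick check with hypothetical data, using illustrative helpers `correlation` and `fit_lsrl`. The value of r is identical both ways, but the two LSRLs differ. The last line uses a standard identity that is not on the card: the product of the two slopes equals r².

```python
from math import sqrt

# illustrative helpers (not from any library)
def correlation(xs, ys):
    """Pearson correlation r for paired data xs, ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy, r_yx = correlation(xs, ys), correlation(ys, xs)
a1, b1 = fit_lsrl(xs, ys)  # predict y from x
a2, b2 = fit_lsrl(ys, xs)  # predict x from y (variables switched)

print(f"r: {r_xy:.3f} vs {r_yx:.3f}")   # same value both ways
print(f"y-hat = {a1:.1f} + {b1:.1f}x")  # original LSRL
print(f"x-hat = {a2:.1f} + {b2:.1f}y")  # switched LSRL: a different line
print(f"b1 * b2 = {b1 * b2:.2f}, r^2 = {r_xy ** 2:.2f}")
```

With these numbers the switched slope is 1.0, not 1/0.6, so the second line is not just the first one solved for x.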

22
Q

In regression, what is the difference between an outlier and a high-leverage point?

A

An outlier has a large residual (above or below the line). A high-leverage point is far away from the data to the right or left.
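
Leverage has a standard formula for simple linear regression that is not on the card: h = 1/n + (x - x̄)²/Σ(x - x̄)². It depends only on the x-values. That matches the card: leverage is about being far to the left or right, while an outlier is about a large residual. Here is a sketch with hypothetical x-values, where the helper name `leverage` is illustrative:

```python
# illustrative helper (not from any library)
def leverage(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum((x - x_bar)^2) for each x-value."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical x-values: the last one sits far to the right of the rest
xs = [1, 2, 3, 4, 20]
h = leverage(xs)
for x, h_i in zip(xs, h):
    print(f"x = {x:>2}: leverage = {h_i:.3f}")
```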