UNIT 2 - REGRESSION Flashcards

1
Q

How do you describe a scatterplot?

A

DIRECTION

UNUSUAL FEATURES

FORM

STRENGTH

2
Q

How do you describe a scatterplot’s strength?

A

Give the r value (if the form is straight),

or describe it in words:

“tightly packed” (strong) or “loosely packed” (weak)

3
Q

How do you describe direction?

A

positive or negative

4
Q

How do you describe the form of a scatterplot?

A

straight or curved?

5
Q

What is the difference between association and correlation?

A

Association means there is some relationship between the variables.

If you see a pattern in the scatterplot, there is an association.

Correlation is an actual calculated number, r, that measures a LINEAR relationship between two quantitative variables.

6
Q

Why is it called the “least squares regression line” (the LSRL)?

A

Because, after you find the mean-mean point (x̄, ȳ), you tilt the line through it so that it minimizes the sum of the squared vertical distances from each point to that line.

It minimizes the sum of the squared residuals: the least squares.
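A minimal Python sketch of the idea, using made-up data: it finds the slope and intercept with the usual least-squares formulas (b1 = Sxy/Sxx, b0 = ȳ − b1·x̄), then checks that tilting the line to any other slope through (x̄, ȳ) makes the sum of squared residuals bigger. The helper names `least_squares_line` and `sum_sq_resid` are hypothetical, not from any calculator or textbook.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1*x using the least-squares formulas."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # this forces the line through (x-bar, y-bar)
    return b0, b1

def sum_sq_resid(xs, ys, b0, b1):
    """Sum of squared residuals, (actual - predicted)^2, for y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(xs, ys)
best = sum_sq_resid(xs, ys, b0, b1)
print(f"y-hat = {b0:.2f} + {b1:.2f}x, sum of squared residuals = {best:.2f}")

# Any other slope through (x-bar, y-bar) leaves a bigger sum of squared residuals
x_bar, y_bar = mean(xs), mean(ys)
for other_b1 in (b1 - 0.2, b1 + 0.2):
    other_b0 = y_bar - other_b1 * x_bar
    print(f"slope {other_b1:.2f}: {sum_sq_resid(xs, ys, other_b0, other_b1):.2f}")
```

The LSRL here is y-hat = 2.2 + 0.6x; both of the tilted lines come out worse.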

7
Q

How do you find outliers in regression?

A

They don’t follow the “flow.”

(Pinky trick: cover the point with your pinky, then uncover it. Does it follow the flow?)

8
Q

What values can r be?

A

from -1 to +1

(r near 0 means a WEAK linear relationship)

9
Q

What is the line that you plot?

A

IT IS A MODEL!

It is the LSRL, and it is the model we are talking about.

10
Q

What is a linear model?

A

It is an equation you can use, or a line on a graph,

but it is just a model that describes roughly what happens,

and it can be used to ESTIMATE WHAT MIGHT HAPPEN.

11
Q

What does r² tell us?

(r-squared)

A

It tells us the percent of the variability in y that is explained by the linear model with x.
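One way to see where that percent comes from, sketched in Python with made-up data: r² equals 1 minus (variability left over in the residuals ÷ total variability in y). The data and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line
x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Total variability in y: squared deviations from y-bar
sst = sum((y - y_bar) ** 2 for y in ys)
# Variability the model does NOT explain: squared residuals
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.2f}, so about {r_squared:.0%} of the variability in y is explained")
```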

12
Q

What if a scatterplot goes straight across horizontally?

A

NO ASSOCIATION.

That would be like height and IQ: they are unrelated, so the typical IQ is about the same at every height.

13
Q

Does r² tell direction?

A

NO

r² is never negative, so you can’t use it to see if the relationship is negative.

14
Q

Can there be a correlation between grade and music preference?

A

No. Correlation needs two quantitative variables, and music preference is categorical.

There can still be an association, however.

15
Q

Does the regression line (LSRL) go through a lot of points?

A

No, often it goes through NONE of them!

It goes through the center of the cloud of points, the mean-mean point (x̄, ȳ).

16
Q

Does a high r value mean anything?

(Can it look strong, but not be?)

A

Sure, it can. r only tells you the strength of a LINEAR relationship.

BUT

CHECK THE SCATTERPLOT. One outlier can make a relationship look STRONG.

17
Q

What is the LSRL?

A

The “least squares regression line”:

the line you plot

OR

that equation.

18
Q

What does r tell us?

A

The direction (+/−) and strength of a LINEAR relationship between two QUANTITATIVE variables. (Only use it when the form is linear.)
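A minimal Python sketch of how r is computed from z-scores, r = Σ(z_x · z_y) / (n − 1), using made-up data. The helper name `correlation` is hypothetical; Python 3.10+ also has `statistics.correlation`, which should agree.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    return sum(((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]    # y tends to go up as x goes up
falling = [10, 8, 7, 5, 2]  # y tends to go down as x goes up

print(f"r (rising)  = {correlation(xs, rising):.3f}")   # positive direction
print(f"r (falling) = {correlation(xs, falling):.3f}")  # negative direction, stronger
```

The sign gives the direction; how close r is to +1 or −1 gives the strength.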

19
Q

Which is the response variable?

A

The y variable,

on the vertical axis.

It “responds” to x.

20
Q

What are b1 and b0?

A

In ŷ = b0 + b1x:

b1 is the SLOPE,

b0 is the y-intercept.

21
Q

What’s wrong: Age and height have a correlation of 2.7

A

WRONG.

Correlation must be between 1 and -1

22
Q

What should we look for in a residual plot?

A

Any curve or pattern (there should be none).

Also, it should have roughly equal scatter from left to right.

It should look RANDOM.

23
Q

What if the scatterplot is curved?

A

Straighten the scatterplot by re-expressing the data, then fit a line to the straightened data,

and recheck the graph and the r.

24
Q

What is extrapolation?

A

Making predictions outside of the x values you have.

25
Q

Does correlation mean causation?

A

NO WAY DUDE

26
Q

What’s up with extrapolation? Is it OK?

A

Not ideal. Sometimes it’s all you can do, but state CAUTION.

27
Q

Will residual plots always show outliers?

(Will outliers always have large residuals?)

A

Not necessarily. Some points have so much leverage that they pull the line toward themselves, so their residuals end up small.

28
Q

How can you check for “straight enough?”

A

Residuals plot, fool!

Check the resids.

29
Q

What are some strong r values and some weak r values?

A

Strong r values are close to 1 or -1, like -0.83 or 0.94. Weak r values are close to zero like 0.10 or -0.06

30
Q

What point is on every regression line?

A

The mean-mean point, (x̄, ȳ).

This point is generally not one of the points on the scatterplot.

Usually none of the scatterplot points are on the regression line.

31
Q

Which is the explanatory variable?

A

The x variable,

on the horizontal axis.

It “explains” what happens to y.

32
Q

What do we look for in a residuals plot?

A

To proceed, it should look random.

If there is a pattern, find a new model or proceed with caution.

33
Q

What is a residual?

A

The vertical distance from a data point to the LSRL.

ACTUAL − PREDICTED,

A − P, like this class, AP (get it?)

Take the actual y value from the data and subtract the predicted y you get by plugging that x into the model (equation).
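A tiny Python sketch of ACTUAL − PREDICTED, assuming a hypothetical fitted model y-hat = 2.2 + 0.6x (the numbers and the `residual` helper are made up for illustration). A positive residual means the point is above the line; a negative one means it is below.

```python
def residual(actual_y, x, b0, b1):
    """Residual = ACTUAL - PREDICTED, where PREDICTED = b0 + b1 * x."""
    predicted = b0 + b1 * x
    return actual_y - predicted

# Hypothetical model: y-hat = 2.2 + 0.6x
b0, b1 = 2.2, 0.6

above = residual(5, 3, b0, b1)  # actual 5, predicted 2.2 + 0.6(3) = 4.0
below = residual(4, 4, b0, b1)  # actual 4, predicted 2.2 + 0.6(4) = 4.6

print(f"{above:+.1f}: point is above the line (model underpredicted)")
print(f"{below:+.1f}: point is below the line (model overpredicted)")
```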

34
Q

Is r sensitive to outliers?

A

Yes. A single outlier (especially one way out in the x direction) can make it seem like there is a strong relationship, or can even make it seem like there is no relationship.
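A minimal Python sketch with made-up data: five points in a U shape have r = 0, and adding ONE point far out in the x direction pushes r close to 1. The data and the `correlation` helper are hypothetical.

```python
from statistics import mean

def correlation(xs, ys):
    """r from sums of deviations: Sxy / sqrt(Sxx * Syy)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical cloud with no linear pattern at all
xs = [1, 2, 3, 4, 5]
ys = [4, 2, 3, 2, 4]
print(f"without outlier: r = {correlation(xs, ys):.2f}")

# Add ONE point far out in the x direction
print(f"with outlier:    r = {correlation(xs + [20], ys + [20]):.2f}")
```

This is why you check the scatterplot before trusting r.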

35
Q

What is leverage?

A

A point has high leverage if it is far to the right or left of the middle.

High leverage just means its x value is far from x̄.

Some high-leverage points are not influential if they go along with the flow of the scatter.

36
Q

Interpret residual: Points below the line/negative resid

A

“the model overpredicted”

or

“The actual value was below the expected (or predicted) value”

37
Q

Interpret residual: Points above the line/positive resid

A

“The model underpredicted” or “actual performance was above the expected performance”

38
Q

Does high r squared mean a good model?

A

CHECK THE SCATTERPLOT FIRST.

Make sure the model “FITS” the data.

Check your scatterplot and residuals plot to make sure the model is appropriate and no outliers are present. Then a high r² means something.

So YES, but only after you check the resids.

39
Q

How do you interpret slope?

A

For each increase of 1 [unit of x], the model predicts an (increase/decrease) of [SLOPE] [units of y].

It helps to write the slope as “[SLOPE] [units of y] per 1 [unit of x]”.

40
Q

What does influential mean?

A

It has a big impact on the model, especially the SLOPE.

It means that adding the point to the data, or removing it, noticeably changes the SLOPE.

Generally these are outliers in the x direction (far left or right) that don’t follow the flow.
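A small Python sketch with made-up data: a far-right point that sits right on the existing line y-hat = 2.2 + 0.6x leaves the slope unchanged (high leverage, not influential), while a far-right point off the flow drags the slope from positive to negative (influential). The data and the `slope` helper are hypothetical.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b1 = Sxy / Sxx."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"original slope:        {slope(xs, ys):.2f}")

# High-leverage point that FOLLOWS the flow (on the line y-hat = 2.2 + 0.6x)
print(f"with (15, 11.2) added: {slope(xs + [15], ys + [11.2]):.2f}")

# High-leverage point that does NOT follow the flow: influential
print(f"with (15, 0) added:    {slope(xs + [15], ys + [0]):.2f}")
```

Running the analysis with and without a suspicious point, like this, is the same idea as the “crazy outlier” card.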

41
Q

If you switch x and y, does r change?

A

NO. r stays the same, so the strength stays the same. (The regression line does change, though.)
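A quick Python check with made-up data: swapping the roles of x and y gives the same r, but the least-squares slope changes (Sxy/Sxx one way, Sxy/Syy the other way). The helpers `sums`, `correlation`, and `slope` are hypothetical.

```python
from statistics import mean

def sums(xs, ys):
    """Return Sxy, Sxx, Syy (sums of deviation products and squared deviations)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy, sxx, syy

def correlation(xs, ys):
    sxy, sxx, syy = sums(xs, ys)
    return sxy / (sxx * syy) ** 0.5

def slope(xs, ys):
    sxy, sxx, _ = sums(xs, ys)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

print(f"r:     y on x = {correlation(xs, ys):.3f}, x on y = {correlation(ys, xs):.3f}")
print(f"slope: y on x = {slope(xs, ys):.3f}, x on y = {slope(ys, xs):.3f}")
```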

42
Q

Interpret r²

A

[r²]% of the variability in [y] can be explained by the linear model with [x]. The rest is left in the residuals.

43
Q

If there is a crazy outlier, what can you do?

A

Run the analysis with and without the outlier and write about both.

44
Q

How do you interpret the y-intercept?

A

The model predicts that if there were no [x stuff] (x = 0), you’d have [y-intercept] [y stuff]. This only makes sense if x = 0 is realistic for the data.

45
Q

How do you get the equation from computer output?

Dependent variable: age

Variable    Coeff
Constant    7.2
Height      3.5

A

For this case:

predicted age = 7.2 + 3.5(height)

46
Q

Computer output:

What does “constant” mean?

A

It is the y-intercept, b0.

47
Q

Computer Output:

What is “S”?

A

The standard deviation of the residuals:

the typical size of a residual (not the average residual, which is always 0),

the typical distance from an actual value to the model’s prediction.

About how far off your prediction is likely to be.
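A minimal Python sketch with made-up data. It assumes the usual regression convention of dividing the sum of squared residuals by n − 2 (because two coefficients, b0 and b1, were estimated). The residuals of a least-squares line always add up to 0, which is why S is a standard deviation rather than a plain average.

```python
from statistics import mean

# Hypothetical data and its least-squares line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(xs), mean(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
n = len(residuals)

# S: standard deviation of the residuals (divide by n - 2 for regression)
s = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5
print(f"s = {s:.2f}")
```

So predictions from this made-up model are typically off by a bit less than 1 unit of y.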

48
Q

How can you straighten data?

A

Do stuff to the y (square it, take the square root, take the log, etc.) and recheck the plot. Remember to put the transformation into your equation. Example: predicted √y = 4.33 − 2.03x
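A minimal Python sketch with made-up data that curve, but whose square roots fall exactly on a line. It fits the line to √y, then squares the prediction to get back to y. The data, `fit_line`, and `x_new` are hypothetical.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares (b0, b1) for y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b1 * x_bar, b1

# Hypothetical curved data
xs = [0, 1, 2, 3, 4]
ys = [25, 16, 9, 4, 1]

# Re-express: fit the line to sqrt(y) instead of y
sqrt_ys = [math.sqrt(y) for y in ys]
b0, b1 = fit_line(xs, sqrt_ys)
print(f"predicted sqrt(y) = {b0:.2f} + ({b1:.2f})x")

# To predict y itself, undo the transformation by squaring
x_new = 2.5
sqrt_y_hat = b0 + b1 * x_new
print(f"at x = {x_new}: predicted y = {sqrt_y_hat ** 2:.2f}")
```

Squaring back only makes sense while the predicted √y is not negative (here, for x up to 5).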