DECK 5: UNIT 2 - REGRESSION Flashcards

1
Q

How do you describe a scatterplot?

A

DIRECTION

FORM

STRENGTH

and STRANGE (anything unusual, such as outliers)

2
Q

How do you describe a scatterplot’s strength?

A

give the r value (if straight),

or say…

“tightly packed… loosely packed”

3
Q

how do you describe direction?

A

positive or negative

4
Q

how do you describe form of a scatterplot?

A

straight or curved?

5
Q

Difference between association and correlation?

A

association is talking about a relationship.

If you see a pattern in the scatterplot, there is an association.

Correlation is an actual calculated number (two quantitative variables)

6
Q

Why is it called the “least squares regression line?”

the LSRL?

A

Because, after you find the mean-mean point, you fix the line so that it minimizes the sum of the squared vertical distances from each point to that line.

It minimizes the squared residuals, the least squares….
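A quick way to see the "least squares" idea is to fit a line and then nudge it. The sketch below uses made-up data (not from this deck) and NumPy's `polyfit`; any small change to the fitted slope or intercept should make the sum of squared residuals larger.

```python
import numpy as np

# Made-up illustrative data (an assumption, not from the deck).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.polyfit with degree 1 returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, 1)

def sum_sq_resid(b0, b1):
    """Sum of squared vertical distances from the points to the line b0 + b1*x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

best = sum_sq_resid(intercept, slope)

# Nudge the intercept or slope a little in each direction.
nudged = [
    sum_sq_resid(intercept + 0.1, slope),
    sum_sq_resid(intercept - 0.1, slope),
    sum_sq_resid(intercept, slope + 0.1),
    sum_sq_resid(intercept, slope - 0.1),
]
print(best, min(nudged))
```

Every nudged line has a larger total than the fitted line, which is what "least squares" means.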

7
Q

How do you find outliers in regression?

A

they don’t follow the “flow”

(pinky trick: cover the point with your pinky, then uncover it. Does it follow the flow?)

8
Q

What is homoscedasticity?

A

equal scatter along the regression line

9
Q

What values can r be?

A

from -1 to +1

(r near 0 is WEAK)

10
Q

What is the line that you plot?

A

IT IS A MODEL!

It is the LSRL and it is the model we are talking about

11
Q

what is a linear model?

A

It is an equation you can use or a line of a graph,

but it is just a model that says what kind of happens,

and can be used to ESTIMATE WHAT MIGHT HAPPEN

12
Q

What does r² tell us?

(r-squared)

A

It tells us the percent of variability in y that is explained by the model with x.

13
Q

What if a scatterplot goes straight across horizontally?

A

NO ASSOCIATION.

That would be like height and IQ, they are independent so each height has about the same IQ.

14
Q

Does r² tell direction?

A

NO

r² is never negative, so you can’t use it to see if the relationship is negative.

15
Q

Can there be a correlation between grade and music preference?

A

No, music preference is categorical.

There is an association, however.

16
Q

Does the regression line (LSRL) go through a lot of points?

A

No, usually it goes through NONE!

It just goes through the center of the cloud of points.

17
Q

Does a high r value mean anything?

(can it look strong, but not be?)

A

Sure. It can. It tells you strength of LINEAR relationship.

BUT

CHECK THE SCATTER. One outlier or typo can make it look STRONG.

18
Q

what is the LSRL

A

the “least squares regression line”

that line you plot

OR

That equation

19
Q

What does r tell us?

A

The direction (+/-) and how strong a LINEAR relationship is between two QUANTITATIVE variables… (when linear)

20
Q

which is response?

A

y variable,

the Vertical axis..

It “responds” to the x

21
Q

Lurking variable: Why are there more ice cream sales on days that there are more surfing accidents? Is the ice cream putting surfers at risk?

A

The WEATHER is the lurking variable.

When it is a nice day, there are more surfers and more ice cream is sold.

So, the WEATHER causes both to go up and down together.

22
Q

Give example of incorrectly using the word “correlation”

A

“there is a correlation between gender and video game playing”

This person should say “association.”

You can’t say correlation because gender is categorical.

23
Q

What are b1 and b0?

A

b1 is the SLOPE,

b0 is the y-intercept.

24
Q

What’s wrong: Age and height have a correlation of 2.7

A

WRONG.

Correlation must be between -1 and 1.

25
Q

What should we look for in resid plot?

A

A curve or pattern (a bad sign: the line doesn’t fit).

Also, it should have equalish scatter from left to right

It should look RANDOM

26
Q

What if the scatterplot is curved?

A

Either straighten the scatter and fit a line,

or keep it and fit a curve

Try quadreg, cubicreg, lnreg, logreg and check the graph and the r.

27
Q

What is extrapolation?

A

Making predictions outside of the x values you have.

28
Q

does correlation mean causation?

A

NO WAY DUDE

29
Q

What’s up with extrapolation? Is it OK?

A

Not ideal. Sometimes it’s all you can do, but state CAUTION.

30
Q

If something is associated, is it correlated?

A

Not necessarily.

It can be associated and have a zero correlation

(parabolic scatterplot)

or categorical variables.
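The parabola case is easy to check numerically. This is a small sketch with made-up points on y = x², centered on zero so the positive and negative products cancel; y depends completely on x, yet r comes out to zero.

```python
import numpy as np

# Made-up symmetric points on a parabola (illustrative only).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2  # y is completely determined by x: a clear association

# Correlation only measures LINEAR association.
r = float(np.corrcoef(x, y)[0, 1])
print(r)
```

So a strong curved association can still have a correlation of zero.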

31
Q

will residual plots always show outliers?

(will outliers always have large residuals?)

A

Not necessarily. Some points have so much leverage that they pull the line toward themselves, so their residuals end up small…

32
Q

How is r calculated?

A

r = sum(ZxZy) / (n-1)

It is the sum of the rectangle areas on the standardized z axes, divided by n-1.
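The formula can be checked by hand against a library. The sketch below uses made-up data and assumes the sample standard deviation (dividing by n-1) for the z-scores, then compares the result with NumPy's `corrcoef`.

```python
import numpy as np

# Made-up illustrative data (not from the deck).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# Standardize each variable using the sample standard deviation (ddof=1).
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# r = sum of the z-score products, divided by n - 1.
r = float(np.sum(zx * zy) / (n - 1))

# Library value for comparison.
r_check = float(np.corrcoef(x, y)[0, 1])
print(r, r_check)
```

The two values agree, and r stays between -1 and +1.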

33
Q

How can you check for “straight enough?”

A

Residuals plot fool!

check the resids

34
Q

Give example of correlation without causation and explain the lurking variable.

A

Ski accidents are higher on days with more hot chocolate sales, so someone might claim that hot chocolate causes ski accidents. The lurking variable is the number of people on the mountain: on days when the mountain is crowded, there are more hot chocolate sales and more ski accidents. So the crowd on the mountain makes both rise and fall together.

35
Q

How do you make a residuals plot? (find RESID?)

A

In STAT PLOT, make a scatterplot, but instead of L1 vs L2, replace L2: put the cursor on it, go to 2nd > LIST, and scroll down to RESID.

You can plot L1 vs RESID

or you can plot L2 vs RESID

36
Q

What are some strong r values and some weak r values

A

Strong r values are close to 1 or -1, like -0.83 or 0.94. Weak r values are close to zero like 0.10 or -0.06

37
Q

What point is on every regression line?

A

the mean-mean point. (x bar, y bar).

This point is generally not one of the points on the scatterplot.

Usually none of the scatterplot points are on the regression line.

38
Q

Which is explanatory variable?

A

the x

the horizontal axis.

it “explains” what happens to y

39
Q

What do we look for in a residuals plot?

A

To proceed, it should look random.

if there is a pattern, then find a new model or proceed with caution.

40
Q

What is a residual?

A

Vertical distance to the LSRL.

ACTUAL-PREDICTED,

A-P, like this class AP (get it?)

Take y data found and from that, subtract the y you get from plugging the x into the model (equation).
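As a tiny worked example of ACTUAL - PREDICTED, here is a sketch with a hypothetical model (the coefficients 3 and 2 are made up for illustration): one point above the line and one below it.

```python
def predict(x, b0=3.0, b1=2.0):
    """Hypothetical model y-hat = b0 + b1*x (coefficients are made up)."""
    return b0 + b1 * x

def residual(x, y_actual):
    """Residual = actual y minus the y predicted by the model."""
    return y_actual - predict(x)

# At x = 4 the model predicts 3 + 2*4 = 11.
resid_above = residual(4.0, 12.5)  # actual point is above the line
resid_below = residual(4.0, 10.0)  # actual point is below the line
print(resid_above, resid_below)
```

A positive residual means the model underpredicted; a negative residual means it overpredicted.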

41
Q

If something is correlated is it associated?

A

Yes.

If it is correlated then it must be associated.

However, if it is associated, it may not be correlated.

42
Q

is r sensitive to outliers?

A

Yes. A single outlier can make it seem like there is a relationship (if it is way out in the x direction), or even make it seem like there is no relationship.
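To see how much one point can do, here is a sketch with made-up data: five points with no linear relationship at all, plus one point far out in the x direction.

```python
import numpy as np

# Five made-up points with zero correlation.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 1.0, 4.0, 1.0, 3.0])
r_without = float(np.corrcoef(x, y)[0, 1])

# Add a single point far to the right (and far up).
x_out = np.append(x, 20.0)
y_out = np.append(y, 20.0)
r_with = float(np.corrcoef(x_out, y_out)[0, 1])

print(r_without, r_with)
```

Without the outlier r is essentially 0; with it, r jumps above 0.9, so always check the scatterplot.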

43
Q

what is leverage?

A

Far right or left from the middle.

leverage just means it is far away from x-bar

Some leverage points are not influential if they go along with the flow of the scatter.

44
Q

Interpret residual: Points below the line/negative resid

A

“the model overpredicted”

or

“Actual value was below the expected (or predicted) value”

45
Q

Interpret residual: Points above the line/positive resid

A

“the model underpredicted” or “actual performance was above the expected performance”

46
Q

If r = 0.8:

An x value that is 2 standard deviations above the mean will have a predicted y value that is _______

A

1.6 standard deviations above the mean in the Y direction

47
Q

Does high r squared mean a good model?

A

CHECK SCATTER FIRST..

Make sure model “FITS” the data.

You should check your scatterplot and residuals plot to make sure model is appropriate and no outliers present… then it means something

So YES, but after you check the resids.

48
Q

How do you interpret slope?

A

For an increase of 1 [unit of x] there is an (increase/decrease) of [SLOPE] [units of y].

You can write “SLOPE UNITS Y/ ONE UNITS X” to help

49
Q

How do you interpret slope EQUATION?

rSy/Sx

A

for each increase of 1 st dev in x direction,

you go r st dev in y direction.

2st dev in x, you go 2r st. dev in y.

3st dev in x, you go 3r st. dev in y.
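The "r standard deviations in y per standard deviation in x" rule can be checked numerically. The sketch below uses made-up data, builds the line from slope = r·sy/sx and the mean-mean point, and then predicts at an x value 2 standard deviations above the mean.

```python
import numpy as np

# Made-up illustrative data (not from the deck).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = np.corrcoef(x, y)[0, 1]
sx, sy = x.std(ddof=1), y.std(ddof=1)

b1 = r * sy / sx               # slope from the formula
b0 = y.mean() - b1 * x.mean()  # the line passes through (x-bar, y-bar)

# Predict at an x that is 2 SDs above the mean, then express the
# prediction in SDs of y above the mean of y.
x_new = x.mean() + 2 * sx
y_hat = b0 + b1 * x_new
z_y_hat = (y_hat - y.mean()) / sy
print(z_y_hat, 2 * r)
```

The predicted y lands 2r standard deviations above its mean, matching the pattern on this card.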

50
Q

what does influential mean?

A

It impacts the SLOPE.

It means that the point, when added to or removed from the data, will influence the SLOPE.

Generally these are outliers in the x direction. Far left or right.

51
Q

if you switch x and y does r change?

A

NO. The strength stays the same.

52
Q

Can you predict an X by using a Y?

A

NOT WITH THE SAME EQUATION!

BE CAREFUL!! Don’t just solve for x…

You have to change the entire equation and start from scratch.

(run LinReg L2, L1)

53
Q

Interpret r squared

A

r squared % of variability in y can be explained by the model with x. The rest is in residuals…
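One way to make "explained variability" concrete is to compare the leftover variation (the squared residuals) with the total variation in y. This is a sketch with made-up data, not a required method.

```python
import numpy as np

# Made-up illustrative data (not from the deck).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b1, b0 = np.polyfit(x, y, 1)        # least-squares slope and intercept
resid = y - (b0 + b1 * x)

sse = np.sum(resid ** 2)            # variation left over in the residuals
sst = np.sum((y - y.mean()) ** 2)   # total variation in y

r_squared = float(1 - sse / sst)    # fraction explained by the model
r = float(np.corrcoef(x, y)[0, 1])
print(r_squared, r ** 2)
```

Here the model explains about 60% of the variability in y, and the other 40% is in the residuals.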

54
Q

If there is a crazy outlier, what can you do?

A

Run the analysis with and without the outlier and write about both.

55
Q

how do you interpret y intercept?

A

The model predicts that if there were no [x stuff] this is how much [y stuff] you’d have

56
Q

First step in interpreting slope

A

Write “slope units y over 1 unit x” and look at it.

57
Q

How do you get equation from computer output?

Dependent variable: age

Variable    Coeff
constant    7.2
Height      3.5

A

For this case:

age = 7.2 + 3.5 (height)

58
Q

If you switch x and y will slope change?

A

YES (but not just reciprocal)

slope is r·sy/sx,

to get new slope you can use shortcut:

r²/old slope

(reciprocal times r²)
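The shortcut can be verified by fitting both ways. The sketch below uses made-up data and NumPy's `polyfit`; it compares the slope from regressing x on y with r² divided by the original slope, and with the plain reciprocal.

```python
import numpy as np

# Made-up illustrative data (not from the deck).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = float(np.corrcoef(x, y)[0, 1])
slope_y_on_x = float(np.polyfit(x, y, 1)[0])  # predict y from x
slope_x_on_y = float(np.polyfit(y, x, 1)[0])  # predict x from y (a new fit)

shortcut = r ** 2 / slope_y_on_x
reciprocal = 1 / slope_y_on_x
print(slope_x_on_y, shortcut, reciprocal)
```

The refit slope matches the shortcut, not the reciprocal, which is why you can't just solve the original equation for x.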

59
Q

Computer Output:

What does “constant” mean?

A

It is the y intercept

60
Q

Computer Output:

What is “S”?

A

The average, or typical residual..

Standard deviation of the residuals

typical distance from actual value to the model’s prediction.

About how far off your prediction is likely to be.
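For a straight-line fit, software usually computes S by dividing the sum of squared residuals by n - 2 before taking the square root. The sketch below assumes that convention and uses made-up data.

```python
import numpy as np

# Made-up illustrative data (not from the deck).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)   # least-squares slope and intercept
resid = y - (b0 + b1 * x)      # actual - predicted for each point

# Standard deviation of the residuals, using n - 2 for a line fit.
s = float(np.sqrt(np.sum(resid ** 2) / (n - 2)))
print(s)
```

Here S is about 0.89, so predictions from this model are typically off by a little under one unit of y.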

61
Q

How do you undo squares or cubes?

like if you have x² = stuff

or x³ = stuff

A

^(1/2) or ^(1/3)

(raise stuff to these powers to get x)

62
Q

How do you undo sqrt when solving?

like

sqrt(x) = stuff

A

^2

(raise stuff to power of 2 to get x)

63
Q

How do you undo a log when solving?

log x = stuff

or

log x = m

A

10^stuff

that will get you x

or

x = 10^m

64
Q

How can you straighten data?

A

Do stuff to the y (square it, root it, log it, etc) and recheck the plot. Remember to put the transformation into your equation. Example Sqrt y = 4.33 - 2.03 x
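Once y has been transformed, the model predicts the transformed value, so the transformation has to be undone to get y back. This sketch uses the example equation from this card; the x value of 1 is made up for illustration.

```python
def predict_y(x):
    """Predict y from the card's example model sqrt(y) = 4.33 - 2.03*x."""
    sqrt_y_hat = 4.33 - 2.03 * x  # the model predicts sqrt(y), not y
    # Undo the square root by squaring. This only makes sense while
    # sqrt_y_hat is not negative.
    return sqrt_y_hat ** 2

y_hat = predict_y(1.0)  # sqrt(y) = 2.30, so y = 2.30 squared
print(round(y_hat, 2))
```

For x = 1 the model gives sqrt(y) = 2.30, so the predicted y is about 5.29.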

65
Q

How do you undo an ln (natural log) when solving?

ex: ln x = stuff
or: ln x = m

A

e^stuff

or

e^m

66
Q

If you add to or multiply the x’s or y’s (shift/scale), does r change?

A

No. The strength remains the same. (If you log or square it, r will change, but just adding or multiplying won’t change the strength. Multiplying by a negative number flips the sign of r.)
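This is easy to test on made-up data. The sketch below shifts and rescales the variables (the constants 2.54, 10 and 1000 are arbitrary), then tries a negative multiplier and a squaring transformation for comparison.

```python
import numpy as np

# Made-up illustrative data (not from the deck).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r_original = float(np.corrcoef(x, y)[0, 1])

# Shift and rescale: r should not move.
r_shift_scale = float(np.corrcoef(2.54 * x + 10, y / 1000)[0, 1])

# Multiply by a negative number: same strength, opposite sign.
r_negated = float(np.corrcoef(-x, y)[0, 1])

# A nonlinear transformation (squaring y) does change r.
r_squared_y = float(np.corrcoef(x, y ** 2)[0, 1])

print(r_original, r_shift_scale, r_negated, r_squared_y)
```

Changing units leaves r alone; only a nonlinear re-expression such as squaring or taking logs changes its size.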

67
Q

What other regressions does your calculator do?

A

Quadreg, cubicreg, lnreg, etc.

just be careful when substituting while writing the equation given.

68
Q

Height and weight have an r value of 0.7. You would expect a person with a height that is 2 st. dev above the mean height to have a weight that is only ___ st. dev above the mean weight.

A

only 1.4 S.D above the mean for weight.

(for each SD in the x direction you change r SD in the y direction)

69
Q

How do you get equation from computer output?

Dependent variable: doc

Variable    Coeff
constant    0.005
genet       -0.233

A

doc = 0.005 - 0.233 (genet)

70
Q

How do you undo an exponent?

Example

stuff^x = other

a^x = b

A

log other / log stuff

that gives you x

or

x = (log b) / (log a)
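The inverse operations can be sanity-checked with Python's `math` module. The numbers below (2, 32 and 3) are made up for illustration.

```python
import math

# Undo an exponent: solve a^x = b with x = log(b) / log(a).
a, b = 2.0, 32.0
x = math.log10(b) / math.log10(a)

# Undo a base-10 log: if log x = m, then x = 10^m.
m = 3.0
x_from_log = 10 ** m

# Undo a natural log: if ln x = m, then x = e^m.
x_from_ln = math.exp(m)

print(x, x_from_log, x_from_ln)
```

Plugging each answer back in recovers the original equation: 2 raised to x gives 32, the log of 1000 is 3, and the natural log of e^3 is 3.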