DECK 4: UNIT 2 - REGRESSION Flashcards
How do you describe a scatterplot?
DIRECTION
FORM
STRENGTH
and STRANGE
How do you describe a scatterplot’s strength?
give the r value (if straight),
or say…
“tightly packed… loosely packed”
how do you describe direction?
positive or negative
how do you describe form of a scatterplot?
straight or curved?
Difference between association and correlation?
association is talking about a relationship.
If you see a pattern in the scatterplot, there is an association.
Correlation is an actual calculated number (two quantitative variables)
Why is it called the “least squares regression line?”
the LSRL?
Because, after you find the mean-mean point, you fix the line so that it minimizes the sum of the squared vertical distances to that line from each point.
It minimizes the squared residuals, the least squares….
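A minimal Python sketch of the "least squares" idea, using made-up data. The function names `lsrl` and `sse` and the data values are assumptions for illustration only. It fits the line, then checks that nudging the intercept or slope always makes the sum of squared residuals bigger.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # forces the line through (x-bar, y-bar)
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals (squared vertical distances) for a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = lsrl(xs, ys)
best = sse(xs, ys, b0, b1)

# Try eight slightly different lines; each should do worse than the LSRL
nudged = [
    sse(xs, ys, b0 + d0, b1 + d1)
    for d0 in (-0.5, 0, 0.5)
    for d1 in (-0.2, 0, 0.2)
    if (d0, d1) != (0, 0)
]
print("LSRL sum of squares:", best)
print("smallest sum of squares among nudged lines:", min(nudged))
```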
How do you find outliers in regression?
they don’t follow the “flow”
(pinky trick: cover the point with your pinky... then uncover... does it follow the flow?)
What is homoscedasticity?
equal scatter along the regression line
What values can r be?
from -1 to +1
(r near 0 is WEAK)
What is the line that you plot?
IT IS A MODEL!
It is the LSRL and it is the model we are talking about
what is a linear model?
It is an equation you can use or a line of a graph,
but it is just a model that says what kind of happens,
and can be used to ESTIMATE WHAT MIGHT HAPPEN
What does r^2 tell us?
(r-squared)
It tells us the percent of variability of y that is explained by the model with x.
What if a scatterplot goes straight across horizontally?
NO ASSOCIATION.
That would be like height and IQ, they are independent so each height has about the same IQ.
Does r^2 tell direction?
NO
r^2 is never negative, so you can’t use it to see if the relationship is negative.
Can there be a correlation between grade and music preference?
No, music preference is categorical.
There is an association, however.
Does the regression line (LSRL) go through a lot of points?
No, usually it goes through NONE!
It just goes through the center of the cloud of points.
Does a high r value mean anything?
(can it look strong, but not be?)
Sure. It can. It tells you strength of LINEAR relationship.
BUT
CHECK THE SCATTER. One outlier or typo can make it look STRONG.
what is the LSRL
the “least squares regression line”
that line you plot
OR
That equation
What does r tell us?
The direction (+/-) and how strong a LINEAR relationship is between two QUANTITATIVE variables… (when linear)
which is response?
y variable,
the Vertical axis..
It “responds” to the x
Lurking variable: Why are there more ice cream sales on days that there are more surfing accidents? Is the ice cream putting surfers at risk?
The WEATHER is the lurking variable.
When it is a nice day, more surfers and more ice creams are sold.
So, the WEATHER causes both to go up and down together.
Give example of incorrectly using the word “correlation”
“there is a correlation between gender and video game playing”
This person should say “association.”
You can’t say correlation because gender is categorical.
What are b1 and b0?
b1 is the SLOPE,
b0 is the intercept.
What’s wrong: Age and height have a correlation of 2.7
WRONG.
Correlation must be between 1 and -1
What should we look for in resid plot?
Curve or pattern.
Also, it should have equalish scatter from left to right
It should look RANDOM
What if the scatterplot is curved?
Either straighten the scatter and fit a line,
or keep it and fit a curve
Try quadreg, cubicreg, lnreg, logreg and check the graph and the r.
What is extrapolation?
Making predictions outside of the x values you have.
does correlation mean causation?
NO WAY DUDE
What’s up with extrapolation? Is it OK?
Not ideal. Sometimes it’s all you can do, but state CAUTION.
If something is associated, is it correlated?
Not necessarily.
It can be associated and have a zero correlation
( parabolic scatterplot)
or categorical variables.
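A quick Python check of the parabola case, as a sketch with made-up points (the helper name `correlation` is an assumption). The points follow y = x^2 perfectly, so there is a clear association, yet r comes out as zero because the pattern is not linear.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up parabolic scatterplot: y depends completely on x, but not linearly
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_parabola = correlation(xs, ys)
print("r for the parabola:", r_parabola)
```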
will residual plots always show outliers?
(will outliers always have large residuals?)
Not necessarily. Some points have so much leverage, they pull the line up to it…
How is r calculated?
r = sum(ZxZy) / (n-1)
It is the sum of the rectangle areas on the standardized (z) axes, divided by n-1.
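The formula on this card can be sketched in Python as follows. The function name `correlation` and the data are made up for illustration. Each product zx * zy is one "rectangle area" on the standardized axes.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Compute r as the sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)          # sample SDs (divide by n - 1)
    zx = [(x - x_bar) / s_x for x in xs]     # standardized x values
    zy = [(y - y_bar) / s_y for y in ys]     # standardized y values
    rectangles = [a * b for a, b in zip(zx, zy)]
    return sum(rectangles) / (n - 1)

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
r = correlation(xs, ys)
print("r =", r)
```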
How can you check for “straight enough?”
Residuals plot fool!
check the resids
Give example of correlation without causation and explain the lurking variable.
Ski accidents are higher on days with more hot chocolate sales, therefore hot chocolate must cause ski accidents. (Lurking variable: the number of people on the mountain.) What is happening is that on days when the mountain is crowded, there are more hot chocolate sales and more ski accidents. So the population on the mountain is causing both to rise and fall together.
How do you make a residuals plot? (find RESID?)
In stat>plot, make a scatterplot, but instead of L1 vs L2, change L2: put the cursor on it, go to 2nd>lists, and scroll down to RESID.
You can plot L1 vs RESID
or you can plot L2 vs RESID
What are some strong r values and some weak r values
Strong r values are close to 1 or -1, like -0.83 or 0.94. Weak r values are close to zero like 0.10 or -0.06
What point is on every regression line?
the mean-mean point. (x bar, y bar).
This point is generally not one of the points on the scatterplot.
Usually none of the scatterplot points are on the regression line.
Which is explanatory variable?
the x
the horizontal axis.
it “explains” what happens to y
What do we look for in a residuals plot?
To proceed, it should look random.
if there is a pattern, then find a new model or proceed with caution.
What is a residual?
Vertical distance to the LSRL.
ACTUAL-PREDICTED,
A-P, like this class AP (get it?)
Take the actual y from the data and subtract the y you get from plugging the x into the model (equation).
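As a small Python sketch of ACTUAL - PREDICTED: the model coefficients and the data point below are hypothetical, chosen only to show the arithmetic.

```python
def residual(actual_y, x, b0, b1):
    """Residual = actual y minus the y predicted by the model b0 + b1 * x."""
    predicted_y = b0 + b1 * x
    return actual_y - predicted_y

# Hypothetical model: y-hat = 2.0 + 0.5x, and one hypothetical data point
resid = residual(actual_y=6.0, x=10, b0=2.0, b1=0.5)
print("residual:", resid)  # predicted is 7.0, actual is 6.0
```

A negative result means the point sits below the line; a positive result means it sits above the line.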
If something is correlated is it associated?
Yes.
If it is correlated then it must be associated.
However, if it is associated, it may not be correlated.
is r sensitive to outliers?
Yes. A single outlier can make it seem like there is a relationship (if it is way out in the x direction), or even seem like there is no relationship.
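A rough Python illustration with made-up numbers (the helper `correlation` and all data values are assumptions): five points with almost no linear pattern, then the same five plus one point far out in the x direction.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up cloud with a very weak linear relationship
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_without = correlation(xs, ys)

# Same cloud plus one point way out in the x direction
r_with = correlation(xs + [20], ys + [20])

print("r without the outlier:", r_without)
print("r with the outlier:", r_with)
```

This is why the deck says to check the scatterplot before trusting a strong-looking r.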
what is leverage?
Far right or left from the middle.
leverage just means it is far away from x-bar
Some leverage points are not influential if they go along with the flow of the scatter.
Interpret residual: Points below the line/negative resid
“the model overpredicted”
or
“Actual value was below the expected (or predicted)”
Interpret residual: Points above the line/positive resid
“the model underpredicted” or “actual performance was above the expected performance”
If r= 0.8.
An x value that is 2 standard deviations above the mean will have a predicted y value that is _______
1.6 standard deviations above the mean in the Y direction
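The arithmetic on this card, sketched in Python. The function names and the mean and standard deviation in the second example are hypothetical; only r = 0.8 and the 2 standard deviations come from the card.

```python
def predicted_z_y(r, z_x):
    """Predicted z-score of y: move r standard deviations per SD of x."""
    return r * z_x

def predicted_y(r, z_x, y_bar, s_y):
    """Convert the predicted z-score back to the original y units."""
    return y_bar + predicted_z_y(r, z_x) * s_y

z_hat = predicted_z_y(r=0.8, z_x=2)
print("predicted SDs above the mean of y:", z_hat)

# Hypothetical y distribution: mean 150, standard deviation 20
print("predicted y:", predicted_y(r=0.8, z_x=2, y_bar=150, s_y=20))
```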
Does high r squared mean a good model?
CHECK SCATTER FIRST..
Make sure model “FITS” the data.
You should check your scatterplot and residuals plot to make sure model is appropriate and no outliers present… then it means something
So YES, but after you check the resids.
How do you interpret slope?
For an increase of 1 [unit of x] there is an (increase/decrease) of [SLOPE] [units of y].
You can write “SLOPE UNITS Y/ ONE UNITS X” to help
How do you interpret slope EQUATION?
b1 = r(Sy/Sx)
for each increase of 1 st dev in x direction,
you go r st dev in y direction.
2st dev in x, you go 2r st. dev in y.
3st dev in x, you go 3r st. dev in y.
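A Python sketch, with made-up data, checking that the slope built from r, Sy, and Sx matches the least squares slope, and that moving 2 standard deviations in x moves the prediction 2r standard deviations in y. All variable names and data values are assumptions for illustration.

```python
from statistics import mean, stdev

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)

# Correlation from z-scores
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

# Slope two ways: from r, and directly from least squares
slope_from_r = r * s_y / s_x
slope_least_squares = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
                       / sum((x - x_bar) ** 2 for x in xs))

# Prediction at 2 SDs above the mean of x, measured in SDs of y
intercept = y_bar - slope_from_r * x_bar
y_hat = intercept + slope_from_r * (x_bar + 2 * s_x)
sds_in_y = (y_hat - y_bar) / s_y

print(slope_from_r, slope_least_squares)
print("SDs moved in y:", sds_in_y, "versus 2r =", 2 * r)
```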
what does influential mean?
It impacts the SLOPE.
It means that the point, when added to or removed from the data, will influence the SLOPE.
Generally these are outliers in the x direction. Far left or right.
if you switch x and y does r change?
NO. The strength stays the same.
Can you predict an X by using a Y?
NOT WITH THE SAME EQUATION!
BE CAREFUL!! Don’t just solve for x…
You have to change the entire equation and start from scratch.
(run LinReg L2, L1)
Interpret r squared
r squared % of variability in y can be explained by the model with x. The rest is in residuals…
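One way to see "the rest is in residuals" is the identity r^2 = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total squared variation of y around its mean. The Python sketch below uses made-up data and assumed variable names to check it.

```python
from statistics import mean

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Least squares line
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Variation left over in the residuals vs total variation in y
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sst = syy

r = sxy / (sxx * syy) ** 0.5
r_squared_from_residuals = 1 - sse / sst

print("r^2 from r:", r ** 2)
print("r^2 from residuals:", r_squared_from_residuals)
```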
If there is a crazy outlier, what can you do?
Run the analysis with and without the outlier and write about both.
how do you interpret y intercept?
The model predicts that if there were no [x stuff] this is how much [y stuff] you’d have
First step in interpreting slope
Write “slope units y over 1 unit x” and look at it.
How do you get equation from computer output?
Dependent variable: age
Variable    Coeff
Constant    7.2
Height      3.5
For this case:
age = 7.2 + 3.5 (height)
If you switch x and y will slope change?
YES (but not just reciprocal)
slope is r(sy/sx),
to get the new slope you can use the shortcut:
r^2 / old slope
(reciprocal times r^2)
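A Python sketch of the shortcut, using made-up data (the helper name `ls_slope` is an assumption). It fits y on x, then x on y, and compares the new slope with r^2 / old slope and with the plain reciprocal.

```python
from statistics import mean

def ls_slope(explanatory, response):
    """Least squares slope for predicting response from explanatory."""
    e_bar, r_bar = mean(explanatory), mean(response)
    s_er = sum((e - e_bar) * (v - r_bar) for e, v in zip(explanatory, response))
    s_ee = sum((e - e_bar) ** 2 for e in explanatory)
    return s_er / s_ee

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

old_slope = ls_slope(xs, ys)   # predicting y from x
new_slope = ls_slope(ys, xs)   # predicting x from y (variables switched)

# r^2 equals the product of the two slopes
x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r_squared = sxy ** 2 / (sxx * syy)

print("new slope:", new_slope)
print("r^2 / old slope:", r_squared / old_slope)
print("plain reciprocal (not the same):", 1 / old_slope)
```

The two only agree with the plain reciprocal when r is exactly 1 or -1.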
Computer output:
What does “constant” mean?
It is the y intercept
Computer Output:
What is “S”?
The average, or typical residual..
Standard deviation of the residuals
typical distance from actual value to the model’s prediction.
About how far off your prediction is likely to be.
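A Python sketch of how S can be computed, assuming the usual regression-output definition: the square root of the sum of squared residuals divided by n - 2. The data and variable names are made up for illustration.

```python
import math
from statistics import mean

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
n = len(xs)

# Least squares line
x_bar, y_bar = mean(xs), mean(ys)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Residuals: actual minus predicted
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals (assumes the n - 2 divisor)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print("S =", s)
```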
How do you undo squares or cubes?
like if you have x^2 = stuff
or x^3 = stuff
^(1/2) or ^(1/3)
(raise stuff to these powers to get x)
How do you undo sqrt when solving?
like
sqrt(x) = stuff
^2
(raise stuff to power of 2 to get x)
How do you undo a log when solving?
log x = stuff
or
log x = m
10^stuff
that will get you x
or
x = 10^m
How can you straighten data?
Do stuff to the y (square it, root it, log it, etc) and recheck the plot. Remember to put the transformation into your equation. Example Sqrt y = 4.33 - 2.03 x
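Using the card's example model, Sqrt y = 4.33 - 2.03 x, here is a Python sketch of predicting y itself. The function names are made up; the point is that the model predicts sqrt(y), so you square the result to get back to y.

```python
def predicted_sqrt_y(x):
    """The straightened model from the card: sqrt(y-hat) = 4.33 - 2.03x."""
    return 4.33 - 2.03 * x

def predicted_y(x):
    """Undo the square root by squaring the model's output."""
    return predicted_sqrt_y(x) ** 2

print("predicted sqrt(y) at x = 1:", predicted_sqrt_y(1))
print("predicted y at x = 1:", predicted_y(1))
```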
How do you undo an ln (natural log) when solving?
ex: ln x = stuff
or: ln x = m
e^stuff
or
e^m
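Undoing log and ln can be checked quickly in Python. This is a sketch; the function names and the value 250 are made up for illustration.

```python
import math

def undo_log10(m):
    """If log x = m (base 10), then x = 10^m."""
    return 10 ** m

def undo_ln(m):
    """If ln x = m, then x = e^m."""
    return math.exp(m)

x = 250  # made-up value
print(undo_log10(math.log10(x)))  # should land back near 250
print(undo_ln(math.log(x)))       # should land back near 250
```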
If you add, subtract, multiply, or divide the x’s or y’s (shift/scale), does r change?
No. The strength remains the same. (If you log or square it, it will change, but just adding or multiplying won’t change it. Multiplying by a negative number flips the sign of r, but not the strength.)
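A Python sketch with made-up data (the helper `correlation` and the conversion numbers are assumptions): rescaling x and shifting y leaves r alone, while taking the log of x changes it.

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

r_original = correlation(xs, ys)

# Scale x (like inches to centimeters) and shift y by a constant
r_shift_scale = correlation([2.54 * x for x in xs], [y + 10 for y in ys])

# A log transformation is not a shift or scale, so r can change
r_log_x = correlation([math.log(x) for x in xs], ys)

print(r_original, r_shift_scale, r_log_x)
```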
What other regressions does your calculator do?
Quadreg, cubicreg, lnreg, etc.
just be careful when substituting while writing the equation given.
Height and weight have an r value of 0.7. You would expect a person with a height that is 2 st. dev above the mean in height to have a weight that is only ___ st. dev above the mean weight.
only 1.4 S.D above the mean for weight.
(for each SD in the x direction you change r SD in the y direction)
How do you get equation from computer output?
Dependent variable: doc
Variable    Coeff
Constant    0.005
genet       -0.233
doc = 0.005 - 0.233 (genet)
How do you undo an exponent?
Example
stuff^x = other
a^x = b
(log other) / (log stuff)
that gives you x
or
x = (log b) / (log a)
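The same step as a Python sketch. The function name `solve_exponent` and the example numbers are made up for illustration.

```python
import math

def solve_exponent(a, b):
    """Solve a^x = b for x using x = (log b) / (log a)."""
    return math.log10(b) / math.log10(a)

x = solve_exponent(2, 8)   # 2^x = 8
print("x =", x)
print("check: 2^x =", 2 ** x)
```

Because of floating-point rounding, the printed x may differ from a whole number in the last decimal place.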