DECK 5: UNIT 2 - REGRESSION Flashcards
How do you describe a scatterplot?
DIRECTION
FORM
STRENGTH
and STRANGE (anything unusual, such as outliers)
How do you describe a scatterplot’s strength?
give the r value (if straight),
or say…
“tightly packed… loosely packed”
how do you describe direction?
positive or negative
how do you describe form of a scatterplot?
straight or curved?
What is the difference between association and correlation?
Association means there is a relationship between two variables.
If you see a pattern in the scatterplot, there is an association.
Correlation is an actual calculated number, and it requires two quantitative variables.
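To make "an actual calculated number" concrete, here is a minimal sketch that computes r as the sum of the z-score products divided by n - 1. The data (hours studied and test score) are made up for illustration, and the helper name `correlation` is hypothetical.

```python
import numpy as np

# Hypothetical data: hours studied (x) and test score (y), both quantitative.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 60.0, 61.0, 70.0, 77.0])

def correlation(x, y):
    """Sum of z-score products divided by n - 1."""
    zx = (x - x.mean()) / x.std(ddof=1)  # z-scores of x (sample SD)
    zy = (y - y.mean()) / y.std(ddof=1)  # z-scores of y (sample SD)
    return float(np.sum(zx * zy) / (len(x) - 1))

r = correlation(x, y)
print(round(r, 3))
```

The result should agree with NumPy's built-in `np.corrcoef(x, y)[0, 1]`.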
Why is it called the “least squares regression line?”
the LSRL?
Because, after you find the mean-mean point, you fix the line so that it minimizes the sum of the squared vertical distances from each point to that line.
It minimizes the sum of the squared residuals, the “least squares.”
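The "least squares" idea can be checked numerically. This sketch, on made-up data, computes the LSRL and then nudges the slope and the intercept to see that the sum of squared residuals only gets bigger. The helper name `sse` is an assumption for illustration.

```python
import numpy as np

# Hypothetical data, made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Slope and intercept of the least squares regression line.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

def sse(slope, intercept):
    """Sum of squared residuals (vertical distances) for a given line."""
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))

best = sse(b1, b0)            # the LSRL
tilted = sse(b1 + 0.1, b0)    # same intercept, slightly steeper
shifted = sse(b1, b0 + 0.1)   # same slope, slightly higher

print(best, tilted, shifted)
```

Both nudged lines give a larger sum than the LSRL, and the LSRL passes through the mean-mean point: `b0 + b1 * x.mean()` equals `y.mean()`.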
How do you find outliers in regression?
they don’t follow the “flow”
(Pinky trick: cover the point with your pinky, then uncover it. Does it follow the flow?)
What is homoscedasticity?
equal scatter along the regression line
What values can r be?
from -1 to +1
(r near 0 is WEAK)
What is the line that you plot?
IT IS A MODEL!
It is the LSRL and it is the model we are talking about
what is a linear model?
It is an equation you can use or a line of a graph,
but it is just a model that says what kind of happens,
and can be used to ESTIMATE WHAT MIGHT HAPPEN
What does r2 tell us?
(r-squared)
It tells us the percent of the variability in y that is explained by the model with x.
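One way to see "percent of variability explained" is to compare the leftover variation around the line with the total variation in y. A minimal sketch on made-up data, using NumPy's `polyfit` for the line:

```python
import numpy as np

# Hypothetical data, made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b1, b0 = np.polyfit(x, y, 1)          # slope first, then intercept
predicted = b0 + b1 * x

sse = np.sum((y - predicted) ** 2)    # variation left over after the model
sst = np.sum((y - y.mean()) ** 2)     # total variation in y
r_squared = 1 - sse / sst

r = np.corrcoef(x, y)[0, 1]
print(round(r_squared, 2), round(r ** 2, 2))
```

For this data the two numbers match at about 0.6, so the model with x explains about 60% of the variability in y.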
What if a scatterplot goes straight across horizontally?
NO ASSOCIATION.
That would be like height and IQ, they are independent so each height has about the same IQ.
Does r2 tell direction?
NO
r2 is never negative, so you can’t use it to see if the relationship is negative.
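A quick check of this, assuming made-up data: flipping y upside down flips the sign of r but leaves r2 unchanged.

```python
import numpy as np

# Hypothetical data: one upward trend and its mirror image.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_down = -y_up

r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

# r has opposite signs, but squaring hides the sign.
print(round(r_up, 3), round(r_down, 3))
print(round(r_up ** 2, 3), round(r_down ** 2, 3))
```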
Can there be a correlation between grade and music preference?
No, music preference is categorical.
There can be an association, however.
Does the regression line (LSRL) go through a lot of points?
No, usually it goes through NONE!
It just goes through the center of the cloud of points.
Does a high r value mean anything?
(can it look strong, but not be?)
Sure. It can. It tells you strength of LINEAR relationship.
BUT
CHECK THE SCATTER. One outlier or typo can make it look STRONG.
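Here is a small sketch of how one stray point can fake a strong r. The data are made up: a 3-by-3 grid of points with no trend at all, plus a single far-away point such as a typo.

```python
import numpy as np

# Hypothetical cloud with no trend: every x is paired with every y.
x_cloud = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
y_cloud = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0])
r_cloud = np.corrcoef(x_cloud, y_cloud)[0, 1]

# Add one outlier far from the cloud (for example, a typo).
x_all = np.append(x_cloud, 20.0)
y_all = np.append(y_cloud, 20.0)
r_with_outlier = np.corrcoef(x_all, y_all)[0, 1]

print(round(r_cloud, 3), round(r_with_outlier, 3))
```

The cloud alone has r of essentially 0, while the same cloud plus one point has r above 0.9. The number looks strong, but the scatterplot shows it is not.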
what is the LSRL
the “least squares regression line”
that line you plot
OR
That equation
What does r tell us?
The direction (+/-) and how strong a LINEAR relationship is between two QUANTITATIVE variables… (when linear)
which is response?
y variable,
the Vertical axis..
It “responds” to the x
Lurking variable: Why are there more ice cream sales on days that there are more surfing accidents? Is the ice cream putting surfers at risk?
The WEATHER is the lurking variable.
When it is a nice day, more surfers and more ice creams are sold.
So, the WEATHER causes both to go up and down together.
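The ice cream story can be imitated with made-up numbers. In this sketch, temperature drives both ice cream sales and surfing accidents, and neither one is computed from the other. All values and coefficients are assumptions for illustration only.

```python
import numpy as np

# Hypothetical daily temperatures (the lurking variable).
temperature = np.array([15.0, 18.0, 21.0, 24.0, 27.0, 30.0, 33.0])

# Each response depends on temperature plus its own small wiggle.
ice_cream = 5.0 * temperature + np.array([3.0, -2.0, 1.0, -3.0, 2.0, -1.0, 0.0])
accidents = 0.2 * temperature + np.array([0.5, -0.5, 0.0, 0.5, -0.5, 0.0, 0.0])

r_ice_accidents = np.corrcoef(ice_cream, accidents)[0, 1]
r_temp_ice = np.corrcoef(temperature, ice_cream)[0, 1]
r_temp_accidents = np.corrcoef(temperature, accidents)[0, 1]

print(round(r_ice_accidents, 3))
```

Ice cream and accidents come out strongly correlated even though the code never lets one affect the other.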
Give example of incorrectly using the word “correlation”
“there is a correlation between gender and video game playing”
This person should say “association.”
You can’t say correlation because gender is categorical.
What are b1 and b0?
b1 is the SLOPE,
b0 is the intercept.
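A minimal sketch of where the slope and intercept come from, assuming the usual formulas (slope = r times sy over sx, and intercept chosen so the line passes through the mean-mean point). The data are made up, and NumPy's `polyfit` is used only as a cross-check.

```python
import numpy as np

# Hypothetical data, made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = np.corrcoef(x, y)[0, 1]
b1 = r * y.std(ddof=1) / x.std(ddof=1)  # slope
b0 = y.mean() - b1 * x.mean()           # intercept

# Cross-check against NumPy's own straight-line fit.
slope_check, intercept_check = np.polyfit(x, y, 1)

print(round(b1, 3), round(b0, 3))
```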
What’s wrong: Age and height have a correlation of 2.7
WRONG.
Correlation must be between -1 and +1
What should we look for in a residual plot?
A curve or pattern (that means the linear model is a poor fit).
Also, it should have equalish scatter from left to right
It should look RANDOM
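To see what a pattern in the residuals looks like, this sketch fits a straight line to data that are actually curved (made-up values following y = x squared) and lists the residuals from left to right.

```python
import numpy as np

# Hypothetical curved data: y = x squared.
x = np.arange(7, dtype=float)   # 0, 1, ..., 6
y = x ** 2

b1, b0 = np.polyfit(x, y, 1)    # force a straight line onto a curve
residuals = y - (b0 + b1 * x)   # actual minus predicted

print(np.round(residuals, 2))
```

The residuals are positive at both ends and negative in the middle, a U shape rather than random scatter. That pattern is the warning sign that a line is the wrong model.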
What if the scatterplot is curved?
Either straighten the scatter and fit a line,
or keep it and fit a curve
Try quadreg, cubicreg, lnreg, logreg and check the graph and the r.
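The "straighten the scatter" option can be sketched in NumPy (this is not the regression commands named on the card, just an illustration). The made-up data grow exponentially, so taking the natural log of y makes the pattern linear. The helper name `predict` is hypothetical.

```python
import numpy as np

# Hypothetical exponential data: y triples at 0 and doubles each step.
x = np.arange(7, dtype=float)
y = 3.0 * 2.0 ** x

r_raw = np.corrcoef(x, y)[0, 1]        # line fit to the curved scatter

log_y = np.log(y)                      # straighten the scatter
r_log = np.corrcoef(x, log_y)[0, 1]

b1, b0 = np.polyfit(x, log_y, 1)       # LSRL on (x, ln y)

def predict(x_new):
    """Predict y by undoing the log after using the linear model."""
    return float(np.exp(b0 + b1 * x_new))

print(round(r_raw, 3), round(r_log, 3))
```

After the transformation r rises to 1 for this perfectly exponential data. Real data will not be this clean, so still check the graph and the residuals.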
What is extrapolation?
Making predictions outside the range of x values you have.
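A small sketch of flagging extrapolation, assuming made-up data. The hypothetical helper `predict` returns the model's estimate along with a flag saying whether the x value lies outside the data used to build the line.

```python
import numpy as np

# Hypothetical data, made up for illustration; x runs from 1 to 5.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
b1, b0 = np.polyfit(x, y, 1)

def predict(x_new):
    """Return (prediction, is_extrapolation)."""
    is_extrapolation = bool(x_new < x.min() or x_new > x.max())
    return float(b0 + b1 * x_new), is_extrapolation

inside = predict(3.0)    # within the observed x values
outside = predict(50.0)  # far beyond the observed x values

print(inside, outside)
```

The model will happily produce a number for x = 50, but nothing in the data says the straight-line pattern continues that far.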
does correlation mean causation?
NO WAY DUDE