UNIT 2 - REGRESSION Flashcards
How do you describe a scatterplot?
DIRECTION
FORM
STRENGTH
and STRANGE (anything unusual, such as outliers)
How do you describe a scatterplot’s strength?
give the r value (if straight),
or say…
“tightly packed… loosely packed”
How do you describe direction?
positive or negative
How do you describe the form of a scatterplot?
straight or curved
What is wrong with “for each additional hour studied, a person’s test score will go up by five points?”
This implies CAUSATION, but the data only show a correlation. You should say “on average, students with an additional hour of study time tend to score five points higher.” These are all different students, and we don’t know whether the extra study time CAUSED the higher scores. We can only show causation with an EXPERIMENT, and this is an observational study. If students were randomly assigned hours of study time, then you could discuss causation, because that would be an experiment.
Difference between association and correlation?
Association is talking about a relationship.
If you see a pattern in the scatterplot, there is an association.
Correlation is an actual calculated number, r, between two quantitative variables.
Why is it called the “least squares regression line”
(the LSRL)?
Because the line goes through the mean-mean point and is tilted so that it minimizes the sum of the squared vertical distances from each point to the line.
It minimizes the sum of the squared residuals, hence “least squares.”
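As a rough illustration of the idea above, this Python sketch fits a least squares line by hand. The data (hours studied and test scores) and the function names `fit_lsrl` and `sum_squared_residuals` are made up for this example. It checks that the fitted line passes through the mean-mean point and that nudging the line away from the fit makes the sum of squared residuals bigger.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Add up the squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 71, 85, 98, 119]

b0, b1 = fit_lsrl(hours, scores)
best = sum_squared_residuals(hours, scores, b0, b1)

# Any other line (shifted up, or tilted more) has a larger sum of squares.
shifted = sum_squared_residuals(hours, scores, b0 + 1, b1)
tilted = sum_squared_residuals(hours, scores, b0, b1 + 0.5)
print(best < shifted, best < tilted)
```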
How do you find outliers in regression?
They don’t follow the “flow”
(pinky trick: cover the point with your pinky, then uncover it. Does it follow the flow?)
What is homoscedasticity?
equal scatter along the regression line
What values can r be?
from -1 to +1
(r near 0 is WEAK)
What is the line that you plot?
IT IS A MODEL!
It is the LSRL and it is the model we are talking about
What is a linear model?
It is an equation you can use or a line on a graph,
but it is just a model that says roughly what happens,
and can be used to ESTIMATE WHAT MIGHT HAPPEN
What does r² tell us?
(r-squared)
It tells us the percent of the variability in y that is explained by the model with x.
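One way to see where that percent comes from is to compare the leftover variation around the line with the total variation in y. The sketch below does that by hand; the data and the function name `r_squared` are invented for this example.

```python
def r_squared(xs, ys):
    """Fraction of the variability in y explained by the LSRL on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Variation left over after using the line (squared residuals).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation in y around its mean.
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 71, 85, 98, 119]

r2 = r_squared(hours, scores)
print(f"{r2:.1%} of the variability in score is explained by the model with hours")
```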
If the equation for study time vs. test score is
predicted score = 40 + 15 (study time),
how would you interpret the slope?
The model finds that on average, for each additional hour of study time a student has, they tend to score about 15 points more.
If the equation for study time vs. test score is
predicted score = 40 + 15 (study time),
how would you interpret the y-intercept?
The model predicts that a person who studies 0 hours would score around 40 points.
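The two interpretations above can be checked by plugging numbers into the equation. This small sketch uses the card's equation; the function name `predicted_score` is just a label chosen for the example.

```python
INTERCEPT = 40  # predicted score at 0 hours of study
SLOPE = 15      # predicted points gained per additional hour


def predicted_score(study_hours):
    """Model's predicted test score for a given number of study hours."""
    return INTERCEPT + SLOPE * study_hours


# y-intercept: the prediction for someone who studies 0 hours.
print(predicted_score(0))  # 40

# Slope: the difference in prediction between students one hour apart.
print(predicted_score(3) - predicted_score(2))  # 15
```

Remember these are predictions from the model, not guarantees for any one student.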
If a linear association between study time and test score has an
r² = 0.85,
how do you interpret this?
(r² is a.k.a. the coefficient of determination)
85% of the variability in test score can be explained by the model with study time.
What if a scatterplot goes straight across horizontally?
NO ASSOCIATION.
That would be like height and IQ. They are independent, so each height has about the same average IQ.
What is the “coefficient of determination?”
A fancy name for r²
Does r² tell direction?
NO
r² is never negative, so you can’t use it to see if the relationship is negative.
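A quick way to convince yourself: take an upward trend, flip it into a downward trend, and compare. In this sketch the data and the helper name `correlation` are made up; r changes sign, but squaring it gives the same value both times.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r between two quantitative variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the same y values, once rising and once falling.
xs = [1, 2, 3, 4]
ys_up = [2, 4, 5, 8]
ys_down = [8, 5, 4, 2]

r_up = correlation(xs, ys_up)
r_down = correlation(xs, ys_down)

# r has opposite signs, but r squared is the same for both.
print(r_up > 0, r_down < 0)
print(math.isclose(r_up ** 2, r_down ** 2))
```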
Can there be a correlation between grade and music preference?
No, music preference is categorical.
There can be an association, however.
Does the regression line (LSRL) go through a lot of points?
No, usually it goes through NONE!
It just goes through the center of the cloud of points.
If r = -0.9, is there a strong, negative linear relationship?
Maybe not.
CHECK THE SCATTERPLOT. One outlier or typo can make the r value look STRONG.
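Here is a small made-up example of that trap. Six points with no linear relationship at all get one extreme point added (think of a data-entry typo), and r jumps to a strong negative value. The data and the helper name `correlation` are assumptions for this sketch.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r between two quantitative variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cloud of points with no linear pattern.
xs = [1, 2, 3, 1, 2, 3]
ys = [1, 2, 3, 3, 2, 1]
r_without = correlation(xs, ys)

# Same cloud plus a single far-away point.
r_with = correlation(xs + [20], ys + [-20])

print(f"r without the outlier: {r_without:.2f}")
print(f"r with the outlier:    {r_with:.2f}")
```

The r value alone would suggest a strong negative linear relationship, but the scatterplot would show one stray point doing all the work.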
What is the LSRL?
the “least squares regression line”
that line you plot
OR
That equation
What does r tell us?
(r is a.k.a. the correlation coefficient)
The direction (+/-) and the strength of a LINEAR relationship between two QUANTITATIVE variables (it only makes sense when the form is linear)
What is the “correlation coefficient?”
The r value
Which variable is the response?
The y variable,
on the vertical axis.
It “responds” to x
Lurking variable: Why are there more ice cream sales on days when there are more surfing accidents? Is the ice cream putting surfers at risk? Are people buying ice cream because they got hurt?
The WEATHER is the lurking variable.
When it is a nice day, more people go surfing and more ice cream is sold.
So, the WEATHER causes both to go up and down together.
Give an example of incorrectly using the word “correlation”
“there is a correlation between gender and video game playing”
This person should say “association.”
You can’t say correlation because gender is categorical.