Unit 2 Flashcards
What is a scatterplot used for
Immediate visual impression of a possible relationship between 2 variables.
What is the correlation coefficient
A numerical measure (r) of the strength and direction of a linear relationship between 2 quantitative variables.
Symbol for the correlation coefficient and the range of its values
r, where -1 ≤ r ≤ 1
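A minimal Python sketch of how r can be computed, assuming the usual textbook definition: the sum of the products of the z-scores of x and y, divided by n - 1 (with sample SDs). The function name `correlation` and the study-hours/test-score data are hypothetical, made up for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)


# Hypothetical data: hours studied (x) and test score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
print(round(correlation(hours, scores), 3))  # 0.981 (strong, positive)
print(correlation([1, 2, 3], [6, 4, 2]))      # -1.0 (points on a line with negative slope)
```

Only data lying exactly on a straight line reach r = 1 or r = -1.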
What is r^2
Coefficient of determination - % of the variation in the resp var that is explained by the linear relationship with the explanatory var
How is r^2 represented
As a %
What does r^2 = 100% mean
Perfect fit (every point lies on the line)
All of the variation in y is explained by the linear relationship with x
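One way to see why r^2 reads as "% of variation explained": it equals 1 - (unexplained variation) / (total variation in y). A minimal Python sketch; the data are hypothetical, and ŷ = 46 + 6x is the least-squares line for those five points (worked out by hand with b = r * sy/sx). For a least-squares line this also equals r squared (0.981² ≈ 0.963).

```python
from statistics import mean


def r_squared(ys, y_hats):
    """Fraction of the variation in y explained by the model: 1 - SS_res / SS_tot."""
    y_bar = mean(ys)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                # total variation in y
    return 1 - ss_res / ss_tot


# Hypothetical data and its least-squares line y-hat = 46 + 6x
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]
y_hats = [46 + 6 * x for x in xs]
print(f"{r_squared(ys, y_hats):.1%}")  # 96.3%

# Perfect fit: every point is on the line, so r^2 = 100%
print(f"{r_squared([2, 4, 6], [2, 4, 6]):.0%}")  # 100%
```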
How to describe a scatterplot (with an example)
Direction
Form
Strength
E.g. strong positive linear association
What are the boundaries of r (what value of r is strong, etc.)
|r| > 0.8 = strong
0.7 < |r| < 0.8 = moderately strong
|r| < 0.7 = weak
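The cutoffs above can be written as a small helper that describes r in words. This is a sketch: `describe_strength` is a hypothetical name, the cutoffs are applied to |r| (the sign only gives direction), and because the card does not say where exactly 0.8 or 0.7 belong, the sketch puts those boundary values in the lower category (an assumption). Strength from r alone says nothing about form, so still check the scatterplot.

```python
def describe_strength(r):
    """Describe a linear association using the rule-of-thumb cutoffs on these cards."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear association"
    size = abs(r)  # strength depends only on |r|
    if size > 0.8:
        strength = "strong"
    elif size > 0.7:
        strength = "moderately strong"
    else:
        strength = "weak"  # assumption: exactly 0.7 counts as weak
    direction = "positive" if r > 0 else "negative"  # the sign of r gives direction
    return f"{strength} {direction} linear association"


print(describe_strength(0.981))  # strong positive linear association
print(describe_strength(-0.75))  # moderately strong negative linear association
print(describe_strength(0.4))    # weak positive linear association
```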
What is a residual
Difference between an observed and predicted value
Residual = actual - predicted = y - ŷ
Sum and mean of the residuals (for the LSRL) are always equal to
0
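A quick numerical check of this card, assuming the line is the least-squares line fitted with an intercept (the property need not hold for other lines). The helper names `lsrl` and `residuals` are hypothetical, and so is the data.

```python
from statistics import mean


def lsrl(xs, ys):
    """Least-squares intercept a and slope b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys):
    """Residual = actual - predicted for each point."""
    a, b = lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]
res = residuals(xs, ys)
print(res)                  # [0.0, 2.0, -3.0, 0.0, 1.0]
print(sum(res), mean(res))  # 0.0 0.0
```

With messier data the sum may come out as a tiny number like 1e-14 instead of exactly 0; that is floating-point rounding, not a real nonzero sum.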
Equation of the regression line and what each variable is
ŷ = a + bx (ŷ, "y hat", is the predicted y)
a = y intercept, b = slope
How to calculate b
What does each variable mean
b = r * sy/sx
r = correlation coeff, sy = SD of y, sx = SD of x
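The slope formula from summary statistics, sketched in Python. The intercept uses the standard companion formula a = ȳ - b·x̄ (the LSRL always passes through (x̄, ȳ)), which is not on this card. Function names and data are hypothetical.

```python
from statistics import mean, stdev


def slope_from_summary(r, s_x, s_y):
    """LSRL slope from summary statistics: b = r * s_y / s_x."""
    return r * s_y / s_x


def intercept_from_summary(b, x_bar, y_bar):
    """The LSRL passes through (x_bar, y_bar), so a = y_bar - b * x_bar."""
    return y_bar - b * x_bar


xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample SDs
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((len(xs) - 1) * s_x * s_y)

b = slope_from_summary(r, s_x, s_y)
a = intercept_from_summary(b, x_bar, y_bar)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 46.0 + 6.0x
```

Because s_y and s_x are both positive, b always has the same sign as r.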
How to check if a linear model is a good fit
r^2 should be high
Residual plot shouldn’t show any pattern
How and why to transform data
Take the log or ln of one of the variables; this may straighten a curved pattern so a linear model can be used
How to assess the effectiveness of transformation
Checking if randomness in residual plot has increased
Checking if r^2 value has increased
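A sketch of both checks on hypothetical data that doubles every time period: fit a line to the raw y values and to ln(y), then compare r^2 and the residual pattern. The helper names `fit_line` and `residuals` are made up here, and ln of the response is only one possible transformation.

```python
import math
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept a, slope b and r^2 for ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy ** 2 / (sxx * syy)


def residuals(xs, ys):
    a, b, _ = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data: a quantity that doubles every time period
periods = [1, 2, 3, 4, 5]
counts = [2, 4, 8, 16, 32]
log_counts = [math.log(c) for c in counts]  # transform the response with ln

_, _, r2_raw = fit_line(periods, counts)
_, _, r2_log = fit_line(periods, log_counts)
print(f"r^2 raw: {r2_raw:.1%}, r^2 after ln: {r2_log:.1%}")  # r^2 raw: 87.1%, r^2 after ln: 100.0%

# Raw residuals follow a curve (+, -, -, -, +); after ln they are all close to 0
print([round(e, 2) for e in residuals(periods, counts)])
print([round(e, 2) for e in residuals(periods, log_counts)])
```

The raw fit already has a fairly high r^2, yet its residuals show a clear curved pattern, which is why both checks are worth doing.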
What is the LSRL
Full form
Least-squares regression line
Linear model that minimizes the sum of the squared residuals
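To illustrate "minimizes the sum of squared residuals", this sketch brute-forces a grid of candidate lines and keeps the one with the smallest sum of squared residuals (SSE). The grid search is only a demonstration, not how the LSRL is computed in practice (the b = r * sy/sx formula gives it directly). The name `sse` and the data are hypothetical; for this data the LSRL is ŷ = 46 + 6x.

```python
def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data whose LSRL is y-hat = 46 + 6x
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

# Candidate lines: a from 40 to 52 and b from 4 to 8, in steps of 0.5
grid = [(40 + 0.5 * i, 4 + 0.5 * j) for i in range(25) for j in range(9)]
best_a, best_b = min(grid, key=lambda line: sse(xs, ys, *line))
print(best_a, best_b, sse(xs, ys, best_a, best_b))  # 46.0 6.0 14.0
```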
If a residual is negative then
If a residual is positive then
Negative - the model overestimated the response var (actual < predicted)
Positive - the model underestimated the response var (actual > predicted)
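A tiny sketch that turns the sign rule into code. The function name `residual_verdict` and the model ŷ = 46 + 6x (test score from hours studied) are hypothetical.

```python
def residual_verdict(actual, predicted):
    """Residual = actual - predicted, plus what its sign says about the model."""
    residual = actual - predicted
    if residual < 0:
        verdict = "overestimated"   # prediction was above the actual value
    elif residual > 0:
        verdict = "underestimated"  # prediction was below the actual value
    else:
        verdict = "exact"
    return residual, verdict


# Hypothetical model y-hat = 46 + 6x
print(residual_verdict(61, 46 + 6 * 3))  # (-3, 'overestimated')
print(residual_verdict(60, 46 + 6 * 2))  # (2, 'underestimated')
```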
What is the response var
The dependent var
What is the explanatory var
The independent var
How to use a residual plot to assess fit (criteria)
Random
Centered at 0
No clear patterns
What does a residual plot do
Helps decide if you should use a linear model or should consider others.
What is extrapolation
Why is it bad
A prediction made for an x value outside the range of the observed x values.
The trend may not continue beyond the observed data, so the prediction may be inaccurate.
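A sketch of a prediction helper that flags extrapolation by checking whether x falls outside the observed x range. The name `predict_with_check` and the model ŷ = 46 + 6x fitted on 1 to 5 hours of study are hypothetical. If the test is scored out of 100 (also an assumption), the prediction of 166 at x = 20 is impossible, which is exactly the risk this card describes.

```python
def predict_with_check(a, b, x, observed_xs):
    """Return y-hat = a + b*x and whether x lies outside the observed x range."""
    y_hat = a + b * x
    is_extrapolation = not (min(observed_xs) <= x <= max(observed_xs))
    return y_hat, is_extrapolation


# Hypothetical model fitted on 1 to 5 hours of study
hours = [1, 2, 3, 4, 5]
print(predict_with_check(46, 6, 3.5, hours))  # (67.0, False): inside the data range
print(predict_with_check(46, 6, 20, hours))   # (166, True): extrapolation
```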
How to interpret slope
For every 1-unit increase in ‘explanatory var’, the model predicts an average increase (or decrease, if the slope is negative) of ‘slope value’ ‘resp var units’ in ‘resp var’
How to interpret y intercept
When ‘explanatory var’ = 0, the model predicts that ‘resp var’ would be ‘y int value’
How to interpret r^2
‘r^2’ percent of the variation in ‘resp var’ can be explained by the linear relationship with ‘explanatory var’
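The three interpretation templates can be filled in mechanically. A sketch with a hypothetical function `interpretations` and a hypothetical model (test score = 46 + 6 × hours of study, r^2 = 360/374 ≈ 96.3%); the wording follows the cards, with "decrease" swapped in when the slope is negative.

```python
def interpretations(a, b, r2, x_name, x_unit, y_name, y_unit):
    """Fill in the slope, y-intercept and r^2 sentence templates from these cards."""
    change = "increase" if b >= 0 else "decrease"
    slope = (f"For each additional {x_unit} of {x_name}, the model predicts "
             f"an average {change} of {abs(b):g} {y_unit} in {y_name}.")
    intercept = f"When {x_name} = 0, the model predicts {y_name} would be {a:g} {y_unit}."
    r_sq = (f"{r2:.1%} of the variation in {y_name} can be explained by "
            f"the linear relationship with {x_name}.")
    return slope, intercept, r_sq


# Hypothetical model: test score = 46 + 6 * (hours of study), r^2 = 360/374
for sentence in interpretations(46, 6, 360 / 374, "study time", "hour", "test score", "points"):
    print(sentence)
```

The intercept sentence is only meaningful when x = 0 is inside (or near) the observed x values; otherwise it is an extrapolation.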
If r^2 is closer to 0 then
If r^2 is closer to 1 then
r^2 closer to 0 = weak linear relationship
r^2 closer to 1 = strong linear relationship
If slope is positive then r is
Positive
What is a high leverage point
If removed, then what
What does it change
A point with an unusually large or small x value, far from x̄ (the mean of the x values)
Removing it can cause a substantial shift in the model
The slope or y int could change
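One common way to put a number on "far from x̄" is the leverage of each point in simple regression, h = 1/n + (x - x̄)² / Σ(x - x̄)². Both this formula and the 4/n cutoff (the 2p/n rule of thumb with p = 2 parameters) go beyond these cards and are assumptions here; the function names and data are hypothetical.

```python
from statistics import mean


def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum((x - x_bar)^2) for simple regression."""
    n = len(xs)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


def high_leverage_indices(xs):
    """Flag points whose leverage exceeds 4/n (rule-of-thumb cutoff, an assumption)."""
    cutoff = 4 / len(xs)
    return [i for i, h in enumerate(leverages(xs)) if h > cutoff]


# Hypothetical x values: one point (x = 20) sits far from the rest
xs = [1, 2, 3, 4, 5, 20]
print([round(h, 2) for h in leverages(xs)])
print(high_leverage_indices(xs))  # [5]
```

Leverage depends only on the x values, so a high leverage point is not necessarily influential; that depends on whether its y value follows the trend of the other points.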
What does an outlier affect
r
r^2
Strength of the model
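A sketch of how a single outlier weakens the model: compute r, r^2 and the slope before and after adding one hypothetical point far from the trend. The outlier is deliberately placed at x = x̄ = 3, so in this particular case the slope does not move while r and r^2 drop sharply. The function name `r_and_slope` and the data are hypothetical.

```python
from statistics import mean


def r_and_slope(xs, ys):
    """Correlation r and least-squares slope b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5, sxy / sxx


# Hypothetical data, then the same data plus one outlier at x = 3, y = 95
xs, ys = [1, 2, 3, 4, 5], [52, 60, 61, 70, 77]
r_before, b_before = r_and_slope(xs, ys)
r_after, b_after = r_and_slope(xs + [3], ys + [95])

print(f"r: {r_before:.3f} -> {r_after:.3f}")          # r: 0.981 -> 0.554
print(f"r^2: {r_before**2:.1%} -> {r_after**2:.1%}")  # r^2: 96.3% -> 30.6%
print(f"slope: {b_before:g} -> {b_after:g}")          # slope: 6 -> 6
```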
What is an influential point
3 types
What do they affect
A point that, when removed, substantially changes the slope, y int and/or correlation coeff
Outliers (change the correlation coeff)
High leverage points (change the slope/y int)
Points that are both outliers and high leverage
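The "remove it and see what changes" definition can be checked directly: refit without each point in turn and record how much a, b and r move. A sketch with hypothetical names (`fit`, `influence_report`) and hypothetical data in which the last point is both high leverage (x = 20) and off the trend. What counts as a "substantial" change is a judgment call, so no cutoff is built in.

```python
from statistics import mean


def fit(xs, ys):
    """Least-squares intercept a, slope b and correlation r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / (sxx * syy) ** 0.5


def influence_report(xs, ys):
    """For each point, refit without it and record the change in a, b and r."""
    a_all, b_all, r_all = fit(xs, ys)
    report = []
    for i in range(len(xs)):
        a_i, b_i, r_i = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        report.append((i, a_i - a_all, b_i - b_all, r_i - r_all))
    return report


# Hypothetical data: the last point has an unusual x value AND sits off the trend
xs = [1, 2, 3, 4, 5, 20]
ys = [52, 60, 61, 70, 77, 60]
for i, da, db, dr in influence_report(xs, ys):
    print(f"drop point {i}: change in a = {da:+.2f}, b = {db:+.2f}, r = {dr:+.2f}")
```

Dropping point 5 moves the slope from about 0 to 6, far more than dropping any other point, so it is the influential one here.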