Reading Quiz 3 Flashcards
explanatory variable
independent
x
can still be used even if the explanatory variable doesn't cause the response
response variable
dependent
y
scatterplots analyzed according to
direction
form
strength of relationship
outliers
direction
positive association, negative association, neither
form
clusters of points, linear pattern, etc
need to say if linear or not
strength of the relationship
how close to a straight line do the points appear to be
outliers
points that don’t follow the general pattern of the data
correlation coefficient
measures direction and strength of linear relationship between two quantitative variables
r
notes about correlation coefficient number
always between -1 and 1, inclusive
what does correlation coefficient measure
existence and strength of linear relationships
if r = 0 there is no linear relationship, but another (nonlinear) relationship could still exist
is formula for correlation coefficient sensitive to outliers
extremely sensitive
does correlation coefficient have units
no
does the correlation coefficient change based on which variable is explanatory and which is response
no
it is the same regardless of which variable you consider to be the explanatory and which you consider to be the response
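The properties on the last few cards (r stays between -1 and 1, has no units, and does not depend on which variable is called explanatory) can be checked numerically. The sketch below is only an illustration: the `correlation` helper and the hours/score data are made up for this example and are not part of the flashcards.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length samples."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

r = correlation(hours, score)
r_swapped = correlation(score, hours)                     # swap explanatory and response
r_minutes = correlation([h * 60 for h in hours], score)   # change the units of x

print(round(r, 4), round(r_swapped, 4), round(r_minutes, 4))
```

All three printed values agree, which is what "no units" and "same regardless of which variable is explanatory" mean in practice.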
extrapolation
use of a regression line for prediction outside the range of values of the explanatory variable x used to obtain the line
often not accurate
bad
least squares regression line
line that makes the sum of the squares of the residuals as small as possible
sum of the squares of the residuals
error sum of squares (SSE)
formula for least squares regression line
yhat = a + bx
b = slope = r(sy/sx)
a = y-intercept = (mean of y) - b(mean of x)
what point is on every regression line
(mean of x, mean of y)
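As a rough check of the slope and intercept formulas, and of the fact that the line passes through (mean of x, mean of y), here is a minimal sketch. The data and the `predict` helper are made up for illustration, and the sample standard deviation (dividing by n - 1) is assumed.

```python
from statistics import mean, stdev

# Made-up illustrative data (not from the flashcards).
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations

# Correlation coefficient: average product of standardized values (n - 1 divisor).
n = len(xs)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b = r * s_y / s_x        # slope = r(sy/sx)
a = y_bar - b * x_bar    # y-intercept = (mean of y) - b(mean of x)

def predict(x):
    """Predicted response yhat = a + bx."""
    return a + b * x

print(f"yhat = {a:.2f} + {b:.2f}x")
print(predict(x_bar), y_bar)   # the line passes through the point of means
```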
residual
y - yhat
observed value of y minus predicted value of y
positive = above regression line
negative = below regression line
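A small sketch of the residual calculation, using made-up data and the line yhat = 46 + 6x (which happens to be the least squares line for these particular numbers). The sign of each residual shows whether the point sits above or below the line.

```python
# Made-up illustrative data and its least squares line yhat = 46 + 6x.
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]
a, b = 46, 6

# residual = observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    position = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x = {x}: residual = {e:+d} ({position} the line)")

print(residuals)   # [0, 2, -3, 0, 1]
```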
coefficient of determination
r^2
measures the proportion of the variation in y that is explained by y's linear association with x
higher r^2 means the LSRL fits the data better
sentence for coefficient of determination
this means that r^2 (written as a percent) of the variation in the response variable y is explained by the linear relationship with the explanatory variable x
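One way to see what "variation explained" means is to compare the leftover variation around the line (SSE) with the total variation in y. The sketch below assumes made-up data and its least squares line yhat = 46 + 6x; the variable names are illustrative only.

```python
from statistics import mean

# Made-up illustrative data and its least squares line yhat = 46 + 6x.
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]
a, b = 46, 6

y_bar = mean(ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # variation left unexplained
sst = sum((y - y_bar) ** 2 for y in ys)                     # total variation in y

r_squared = 1 - sse / sst
print(f"{r_squared:.1%} of the variation in y is explained by the linear relationship with x")
```

For a least squares line, this value equals the square of the correlation coefficient r.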
residual plot
graphs the residuals on the vertical axis and either explanatory, response, or predicted response values on the horizontal axis
residuals from a LSRL always have a mean of
0
how data fits residual plots
good if points scattered evenly and closely to horizontal axis, no clear pattern
bad if plot is curved (not linear)
bad if the residuals fan out (predictions are less accurate on the side where the spread is wider)
outlier
observation that lies outside the overall pattern of the other observations in a scatterplot
can be outlier in x, y, or both directions
influential
observation is influential if removing it would markedly change the position of the regression line
points that are outliers in the x direction are often influential
are correlation coefficient (r) and LSRL resistant
no
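To illustrate why neither r nor the LSRL is resistant, the sketch below refits a line after adding a single point that is an outlier in the x direction. The data, the added point (20, 60), and the `fit` helper are all made up for this example.

```python
from math import sqrt
from statistics import mean

def fit(xs, ys):
    """Return (intercept a, slope b, correlation r) for the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / sqrt(sxx * syy)

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

a0, b0, r0 = fit(xs, ys)                  # without the unusual point
xs_out, ys_out = xs + [20], ys + [60]     # add one outlier in the x direction
a1, b1, r1 = fit(xs_out, ys_out)

print(f"without outlier: slope = {b0:.3f}, r = {r0:.3f}")
print(f"with outlier:    slope = {b1:.3f}, r = {r1:.3f}")

# Residuals from the refitted line; the added point is last in the list.
residuals = [y - (a1 + b1 * x) for x, y in zip(xs_out, ys_out)]
print(f"residual of the added point: {residuals[-1]:.2f}")
```

In this example the added point pulls the line toward itself, so its own residual ends up smaller than some of the other residuals even though it changes the slope and r dramatically.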
scatterplot
displays the relationship between two quantitative variables measured on the same individuals
least squares regression line
line of best fit
only used when one variable helps explain or predict the other
can be used to predict a y value given an x
minimizes the sum of the squared residuals
clusters
when describing a scatterplot and the values fall in two or more groups separated by gaps
for any given x value, the more widely the y values vary, the ___ the relationship is
weaker
three other scatterplot guidelines
- make the intervals uniform
- label both axes
- choose a scale that makes graph big enough
how to add categorical variable into graph
use different colors or symbols
which measures of center and spread do you use with the correlation coefficient
mean and standard deviation
true or false: in a regression line you get the same numbers (slopes and intercepts) no matter which variable is considered explanatory and which is considered response
false
the change in y per unit of x is different from the change in x per unit of y, so swapping the variables gives a different line
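A quick numerical check of this card: fitting the line twice on the same made-up data, once in each direction, gives two different slopes and intercepts. The `least_squares` helper and the data are illustrative assumptions, not part of the flashcards.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up illustrative data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

a_yx, b_yx = least_squares(hours, score)   # score predicted from hours
a_xy, b_xy = least_squares(score, hours)   # hours predicted from score

print(f"score on hours: yhat = {a_yx:.3f} + {b_yx:.3f}x")
print(f"hours on score: xhat = {a_xy:.3f} + {b_xy:.3f}y")
print(f"1 / slope of first line = {1 / b_yx:.4f}, slope of second line = {b_xy:.4f}")
```

The second slope is not simply the reciprocal of the first, because each fit minimizes squared errors in a different direction.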
for the LSRL, the sum of which squares is being minimized?
squares of errors for each data point (residuals)
SSE
sum of the squares of the deviations of the actual y values from the predicted y values (residuals)
true or false: the slope of the regression line tells how many unstandardized units the predicted value of y changes for each unstandardized unit change in x
true
true or false: the correlation coefficient tells how many standard deviations the predicted y changes for each standard deviation change in x
true
true or false: if both of two variables x and y are standardized so that the sd of both is 1 then the slope of the regression line and the correlation coefficient are equal
true
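The last two cards can be checked together: after converting x and y to z-scores, the least squares slope comes out equal to r. This is a minimal sketch on made-up data, with hypothetical helper names and the sample standard deviation assumed.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def slope(xs, ys):
    """Least squares slope for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

zx, zy = standardize(xs), standardize(ys)

# r as the average product of z-scores (n - 1 divisor).
r = sum(p * q for p, q in zip(zx, zy)) / (len(xs) - 1)

b_raw = slope(xs, ys)   # slope in original units
b_std = slope(zx, zy)   # slope after standardizing both variables

print(f"r = {r:.4f}, raw slope = {b_raw:.4f}, standardized slope = {b_std:.4f}")
```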
true or false: the LSRL is the line that minimizes the sum of the squares of the residuals
true
why can’t all values on residual plot be positive
the mean of the least squares residuals is zero, so if some residuals are positive, others must be negative
does influential point necessarily have large residual
no