Chapter 4 Flashcards
data in which two variables are measured on each individual
bivariate data
graph that shows the relationship between two quantitative variables measured on the same individual - each individual is represented by a point
scatter diagram
when above-average values of one variable are associated with above-average values of the other variable (and below-average values with below-average values)
positive association
measure of the strength and direction of the linear relation between two quantitative variables - not resistant to outliers
linear correlation coefficient
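A minimal Python sketch of the linear correlation coefficient, assuming the usual sample formula: the sum of the products of the z-scores of x and y, divided by n - 1. The data values are made up for illustration, and `correlation_coefficient` is a hypothetical helper name. The second call shows why r is called "not resistant": one added outlier reverses its sign.

```python
from statistics import mean, stdev

def correlation_coefficient(xs, ys):
    """Sample linear correlation r: sum of products of z-scores divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation_coefficient(xs, ys), 4))  # 0.7746

# r is not resistant: one outlying point (10, 0) reverses its sign here
print(round(correlation_coefficient(xs + [10], ys + [0]), 2))  # -0.55
```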
relation that creates a U shape (or an upside-down U) on the scatter diagram
quadratic relation
line that minimizes the sum of the squared residuals
least-squares regression line
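A sketch of fitting the least-squares line by hand, assuming the standard formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The data and the helper names `least_squares` and `sum_squared_residuals` are illustrative only. The loop checks that nudging either coefficient makes the sum of squared residuals larger.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def sum_squared_residuals(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b1, b0 = least_squares(xs, ys)
print(round(b1, 4), round(b0, 4))  # 0.6 2.2

best = sum_squared_residuals(xs, ys, b1, b0)
# Moving either coefficient away from the least-squares values increases the sum
for db1, db0 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_squared_residuals(xs, ys, b1 + db1, b0 + db0) > best
print(round(best, 4))  # 2.4
```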
difference between the observed (actual) value and the predicted value of the response variable (observed - predicted)
residual
point where the line intersects the vertical axis (the predicted value of y when x = 0)
y-intercept
predicting outside the range of the x-values in the sample data - not recommended, because the linear relation may not hold there and predictions can be poor
extrapolation
proportion of the variation in the response variable that is explained by the least-squares regression line - a number between 0 and 1 (0 means the line explains none of the variation, 1 means it explains all of it)
coefficient of determination (R²)
differences between the predicted value and the observed value that result from other incidental factors and random error
deviations
deviation due to the difference between the observed value and the mean value of the response variable (total deviation = explained deviation + unexplained deviation)
total deviation
deviation due to the difference between the predicted value and the mean value of the response variable (the part of the total deviation that the regression line accounts for)
explained deviation
deviation due to the difference between the observed value and the predicted value (the residual; the part of the total deviation that the regression line does not account for)
unexplained deviation
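The three deviations above can be checked numerically. This sketch, on made-up data, prints each point's total deviation split into explained and unexplained parts, then confirms that the squared deviations add the same way (total variation = explained variation + unexplained variation). That sum-of-squares identity relies on the line being the least-squares line.

```python
from statistics import mean

# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

for x, y in zip(xs, ys):
    y_hat = slope * x + intercept
    total = y - y_bar            # observed - mean
    explained = y_hat - y_bar    # predicted - mean
    unexplained = y - y_hat      # observed - predicted
    print(f"x={x}: total {total:+.2f} = explained {explained:+.2f} + unexplained {unexplained:+.2f}")

# Summing the squares: total variation = explained variation + unexplained variation
sst = sum((y - y_bar) ** 2 for y in ys)
ssr = sum((slope * x + intercept - y_bar) ** 2 for x in xs)
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
print(f"{sst:.4f} {ssr + sse:.4f}")  # 6.0000 6.0000
```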
scatter plot with the explanatory variable on the horizontal axis and the corresponding residual on the vertical axis - if it shows a discernible pattern (such as a curve), a linear model may not be appropriate
residual plot
constant error variance - if a residual plot shows the spread of the residuals increasing or decreasing as the explanatory variable increases, the linear model's requirement of constant error variance is violated
homoscedasticity
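An informal numerical check, not a formal test: compare the average size of the residuals for the lower half of the x-values with the upper half. The residuals here are invented to fan out as x increases, which is what a violation of constant error variance looks like on a residual plot.

```python
from statistics import mean

# Hypothetical residuals listed in order of increasing x (made-up values that fan out)
residuals = [0.5, -0.6, 1.1, -1.4, 2.0, -2.3, 2.9, -3.1]

half = len(residuals) // 2
lower_spread = mean(abs(e) for e in residuals[:half])  # typical size for small x
upper_spread = mean(abs(e) for e in residuals[half:])  # typical size for large x
print(f"lower half: {lower_spread:.2f}, upper half: {upper_spread:.2f}")
```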
an observation that significantly affects the least-squares regression line’s slope and/or y-intercept, or the correlation coefficient
influential observation
also called a two-way table - shows the relationship between two qualitative variables (the row variable and the column variable)
contingency table
boxes in a contingency table that represent the intersection of a value of the row variable with a value of the column variable
cell
frequency / relative frequency distribution of either the row or column variable in the contingency table
marginal distribution
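A sketch of a contingency table stored as nested dictionaries, with each inner number being one cell. The survey counts, age groups, and responses are hypothetical. Dividing the row totals and column totals by the grand total gives the two marginal relative frequency distributions.

```python
# Hypothetical survey counts: row variable = age group, column variable = response
counts = {
    "Under 30": {"Yes": 30, "No": 20},
    "30 and over": {"Yes": 40, "No": 10},
}
columns = ["Yes", "No"]

grand_total = sum(sum(cells.values()) for cells in counts.values())

# Marginal distribution of the row variable: row totals / grand total
row_marginal = {group: sum(cells.values()) / grand_total for group, cells in counts.items()}

# Marginal distribution of the column variable: column totals / grand total
col_marginal = {c: sum(cells[c] for cells in counts.values()) / grand_total for c in columns}

print(row_marginal)  # {'Under 30': 0.5, '30 and over': 0.5}
print(col_marginal)  # {'Yes': 0.7, 'No': 0.3}
```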
lists the relative frequency of each category of the response variable, given a specific value of the explanatory variable in the contingency table
conditional distribution
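A sketch of conditional distributions for the same kind of hypothetical table, treating age group as the explanatory variable: each cell is divided by its row total, so each row's relative frequencies sum to 1. If the conditional distributions differ across rows, the two variables appear to be associated.

```python
# Hypothetical survey counts: explanatory variable = age group, response = answer
counts = {
    "Under 30": {"Yes": 30, "No": 20},
    "30 and over": {"Yes": 40, "No": 10},
}

# Conditional distribution of the response given each age group:
# divide each cell by its row total
conditional = {
    group: {response: n / sum(cells.values()) for response, n in cells.items()}
    for group, cells in counts.items()
}
print(conditional)
# {'Under 30': {'Yes': 0.6, 'No': 0.4}, '30 and over': {'Yes': 0.8, 'No': 0.2}}
```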
when an association between two variables reverses or disappears once a third variable is introduced into the analysis
Simpson’s paradox
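A sketch of Simpson's paradox with invented counts for two hypothetical treatments, split by case severity (the third variable). Treatment A has the higher success rate within each severity group, yet B looks better once the groups are pooled, because A treated mostly severe cases. `Fraction` keeps the rates exact.

```python
from fractions import Fraction

# Hypothetical (successes, patients) for two treatments, split by case severity
results = {
    "A": {"mild": (19, 20), "severe": (40, 80)},
    "B": {"mild": (72, 80), "severe": (9, 20)},
}

def subgroup_rate(treatment, severity):
    successes, patients = results[treatment][severity]
    return Fraction(successes, patients)

def overall_rate(treatment):
    # Pool the subgroups, ignoring severity
    successes = sum(s for s, _ in results[treatment].values())
    patients = sum(n for _, n in results[treatment].values())
    return Fraction(successes, patients)

for t in results:
    print(t, float(subgroup_rate(t, "mild")), float(subgroup_rate(t, "severe")), float(overall_rate(t)))
# A 0.95 0.5 0.59
# B 0.9 0.45 0.81
```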