Chapter 10 - Relationships between Numeric Variables: Regression and Correlation Flashcards
What to look for in a scatter plot:
Trend (pattern), scatter, outliers, strength of the relationship, association, groupings
Association
A pattern that connects two (or more) variables and that would be unlikely to arise purely by chance. Conversely, there is no relationship when learning the value of one variable tells you nothing new about the likely value of the other.
Correlation
The strength and direction of the relationship between two numeric variables.
Positive correlation
The values of one variable tend to increase as the values of the other variable increase.
Negative correlation
The values of one variable tend to decrease as the values of the other variable increase.
Correlation coefficient
A number between -1 and 1 calculated so that the number represents the strength and direction of the linear relationship between two numeric variables.
A correlation coefficient of 1
Indicates a perfect linear relationship with positive slope.
A correlation coefficient of -1
Indicates a perfect linear relationship with negative slope.
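Worked example (a minimal sketch, assuming the coefficient meant here is the usual Pearson product-moment correlation; the function name and the sample data are made up for illustration). It computes the coefficient from scratch and checks the two extreme cases on points that lie exactly on a line.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (shared numerator).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
# Points exactly on a line with positive slope (illustrative data).
r_up = correlation(xs, [2 * x + 1 for x in xs])
# Points exactly on a line with negative slope (illustrative data).
r_down = correlation(xs, [10 - 3 * x for x in xs])
print(r_up, r_down)
```

The size of the slope does not matter here: any exact line with positive slope gives 1 and any exact line with negative slope gives -1.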
Direction
A descriptor of whether an association between two variables is positive or negative.
Explanatory variable
A variable that we want to use to try to explain or predict the behaviour of a response variable, or just to investigate whether this might be possible.
Extrapolate
To estimate the value of one variable based on knowing the value of the other variable, where the known value is outside the range of values of that variable for the data on which the estimation is based.
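Worked example (a small sketch; the helper name `is_extrapolation` and the numbers are hypothetical). It flags a prediction as an extrapolation when the known value falls outside the range of that variable in the data used for the estimate.

```python
def is_extrapolation(observed_xs, new_x):
    """Return True when new_x lies outside the range of the observed x values."""
    return new_x < min(observed_xs) or new_x > max(observed_xs)

observed_xs = [1, 2, 3, 4, 5]  # illustrative explanatory-variable values
outside = is_extrapolation(observed_xs, 8)    # beyond the largest observed value
inside = is_extrapolation(observed_xs, 3.5)   # between observed values
print(outside, inside)
```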
Least squares regression line
A line used to represent a linear trend between the two numeric variables displayed in a scatter plot where the line is chosen to minimise the sum of the squares of the residuals.
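Worked example (a minimal sketch using the standard closed-form solution for a line with one explanatory variable; the function names and data are illustrative assumptions). It fits the line, then shows that shifting the line away from the fitted one makes the sum of squared residuals larger.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]  # illustrative data
ys = [2, 4, 5, 4, 5]
intercept, slope = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
# Any other line, such as one shifted up by 0.5, fits worse by this measure.
shifted = sum_squared_residuals(xs, ys, intercept + 0.5, slope)
print(intercept, slope, best, shifted)
```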
(Simple) linear regression
A procedure used when one numeric variable (the explanatory variable) is used to predict or explain the behaviour of a second numeric variable (the response variable) and the overall pattern between the two variables can be represented by a line.
Linear trend
The overall pattern between the two numeric variables displayed in a scatter plot when that pattern can be represented by a line.
Outlier(s)
Value(s) that lie so far away from the bulk of the data that they look odd and make us wonder "is that a mistake?"