Chapter 13: Correlation and Regression Flashcards
correlation
a statistical procedure used to describe the strength and direction of the linear relationship between two factors.
Linear regression
also called regression, is a statistical procedure used to determine the equation of the regression line fit to a set of data points and to determine the extent to which the regression equation can be used to predict values of one factor, given known values of a second factor in a population.
scatter plot
also called a scattergram, is a graphical display of discrete data points (x, y) used to summarize the relationship between two variables.
Data points
the x- and y-coordinates for each point plotted in a scatter plot.
correlation coefficient (r)
used to measure the strength and direction of the linear relationship, or correlation, between two factors. The value of r ranges from −1.0 to +1.0.
positive correlation
(0 < r ≤ +1.0) is a positive value of r that indicates that the values of two factors change in the same direction: As the values of one factor increase, the values of the second factor also increase; as the values of one factor decrease, the values of the second factor also decrease.
negative correlation
(–1.0 ≤ r < 0) is a negative value of r that indicates that the values of two factors change in opposite directions: As the values of one factor increase, the values of the second factor decrease.
regression line
the best-fitting straight line to a set of data points. A best-fitting line is the line that minimizes the total distance of the data points from it, measured as the sum of the squared vertical distances between each data point and the line.
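A minimal sketch of how a best-fitting line can be found with the least-squares method. The function name fit_line and the five data points are hypothetical, chosen only for illustration; the slope is the sum of products divided by the sum of squares of X, and the intercept makes the line pass through the two means.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of the paired deviations from each mean
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of X from its mean
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data set, not taken from the chapter
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(f"Y = {slope:.1f}X + {intercept:.1f}")  # Y = 0.6X + 2.2
```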
Pearson correlation coefficient (r)
also called the Pearson product-moment correlation coefficient, is a measure of the direction and strength of the linear relationship of two factors in which the data for both factors are measured on an interval or ratio scale of measurement.
sum of products (SP)
the sum of the products of deviations for two factors, X and Y, also represented as SSXY. SP is the numerator of the Pearson correlation formula. To compute SP, we multiply the deviation of each X value from the mean of X by the deviation of its paired Y value from the mean of Y, and then sum these products.
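As an illustration, SP and the Pearson r can be computed by hand from deviation scores, using r = SP / √(SSX × SSY). This is a sketch on a hypothetical five-point data set; the function name pearson_r is an assumption made for this example.

```python
import math

def pearson_r(xs, ys):
    """Return (SP, r) for paired X and Y values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: multiply each pair of deviations, then sum the products
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SS for each factor: sum of squared deviations from its own mean
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    r = sp / math.sqrt(ss_x * ss_y)
    return sp, r

# Hypothetical data set, not taken from the chapter
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sp, r = pearson_r(xs, ys)
print(f"SP = {sp:.1f}, r = {r:.3f}")  # SP = 6.0, r = 0.775
```

Because SP is positive here, r is positive: larger X values tend to be paired with larger Y values.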
coefficient of determination (r² or R²)
a measure, mathematically equivalent to eta-squared, of the proportion of variance in one factor (Y) that can be explained by known values of a second factor (X).
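A short sketch, on a hypothetical data set, of why the squared correlation can be read as a proportion of variance: squaring r gives the same value as 1 minus the ratio of the residual variation around the least-squares line to the total variation in Y. All variable names below are assumptions made for this example.

```python
# Hypothetical data set, not taken from the chapter
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)

# Route 1: square the Pearson correlation, r^2 = SP^2 / (SSX * SSY)
r_squared = sp ** 2 / (ss_x * ss_y)

# Route 2: fit the least-squares line and compare residual to total variation
slope = sp / ss_x
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_residual / ss_y

print(f"r^2 = {r_squared:.2f}, explained proportion = {explained:.2f}")
```

Both routes give 0.60 for this data set, meaning 60% of the variability in Y can be explained by X.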
Homoscedasticity
the assumption that there is an equal (“homo”) variance or scatter (“scedasticity”) of data points dispersed along the regression line.
Linearity
the assumption that the best way to describe a pattern of data is using a straight line.
Reverse causality
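a problem that arises when the direction of causality between two factors is unclear: changes in the first factor may cause changes in the second, or changes in the second may cause changes in the first.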
confound variable
or third variable, is an unanticipated variable not accounted for in a research study that could be causing or associated with observed changes in one or more measured variables.