Module 6: Correlation and Regression Flashcards
Simpson’s Paradox
A counterintuitive situation in which a trend in different groups of data disappears or reverses when the groups are combined.
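A small numeric sketch can make the reversal concrete. The counts below are illustrative assumptions (they are not part of the flashcards), and the names `counts`, `group_rate`, and `combined_rate` are hypothetical helpers. Treatment A has the higher success rate within each group, yet the lower rate once the groups are combined.

```python
# Illustrative (successes, trials) counts per treatment and group.
# These numbers are assumed for this sketch only.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def group_rate(treatment, group):
    """Success rate for one treatment within one group."""
    successes, trials = counts[treatment][group]
    return successes / trials

def combined_rate(treatment):
    """Success rate for one treatment after pooling all groups."""
    successes = sum(s for s, _ in counts[treatment].values())
    trials = sum(t for _, t in counts[treatment].values())
    return successes / trials

# Within each group, A beats B ...
for group in ("small", "large"):
    print(group, round(group_rate("A", group), 3), round(group_rate("B", group), 3))

# ... but after combining the groups, B beats A.
print("combined", round(combined_rate("A"), 3), round(combined_rate("B"), 3))
```

The reversal happens here because the two treatments were applied to very different mixes of small and large cases, so group membership acts as a lurking variable.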
coordinate plane
A tool for graphing consisting of a horizontal x-axis and a vertical y-axis.
simple linear regression
The prediction of one response variable’s value from one explanatory variable’s value when there is a linear relationship between the two variables.
sampling frame
The list of all people or things that may be included in the statistical study.
significance level
The p-value cutoff for statistical significance. Any p-value below the set significance level is considered statistically significant.
cluster sample
Similar to a stratified sample, but researchers select entire chunks or clusters of the population to obtain the study sample.
observational study
The researcher observes if there is an association between variables. There is no treatment or control group.
significant difference
A measurable difference between two groups or samples that reflects a real difference rather than one that arose by chance.
correlation
An observed relationship between two quantitative variables. While this is most commonly a linear relationship, it does not need to be. Note that observing a relationship does NOT imply that there is a meaningful causal link between the variables.
p-value
The probability of observing a result at least as extreme as the one obtained if only chance were at work (that is, if the null hypothesis were true).
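As a rough illustration, a p-value can be computed exactly for a coin-flip experiment. This sketch assumes a null hypothesis of a fair coin and a one-sided test. The function name `binomial_tail_p_value` and the 0.05 significance level are assumptions made for the example, not something the flashcards specify.

```python
from math import comb

def binomial_tail_p_value(n_flips, n_heads, p_heads=0.5):
    """Probability of at least n_heads heads in n_flips flips,
    assuming each flip lands heads with probability p_heads."""
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )

# Assumed cutoff for this example.
significance_level = 0.05

# Observed result: 9 heads in 10 flips of a supposedly fair coin.
p_value = binomial_tail_p_value(10, 9)
print(round(p_value, 4))
print(p_value < significance_level)
```

Under these assumptions the p-value is 11/1024, or about 0.011, which falls below the 0.05 cutoff, so the result would be called statistically significant.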
regression analysis
A statistical analysis tool that quantifies the relationship between a response variable and one or more explanatory variables.
sampling method
The technique used to select people within the sampling frame.
causal relationship
A relationship between two variables that can be classified as cause-and-effect.
representative sample
A subset of the population with similar characteristics to the entire population.
population
The entire group of subjects that have the characteristics being evaluated in the study.
slope-intercept form
A common format for the equation of a line: y = mx + b, where m is the slope and b is the y-intercept.
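A minimal sketch of slope-intercept form follows, assuming the line is defined by two known points. The helper names `line_through` and `evaluate` are hypothetical and chosen only for this example.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)  # slope: rise over run
    b = y1 - m * x1            # y-intercept: value of y when x = 0
    return m, b

def evaluate(m, b, x):
    """Evaluate y = m*x + b at x."""
    return m * x + b

m, b = line_through((1, 3), (3, 7))
print(m, b)               # slope 2.0, intercept 1.0
print(evaluate(m, b, 5))  # 11.0
```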
voluntary sample
Researchers invite everyone in the sampling frame to participate. Individuals who voluntarily respond make up the study sample.
scatterplot
A graph that uses dots on a coordinate plane to show the relationship between variables.
lurking variable
A variable that is not included in an analysis but that is related to two (or more) other associated variables which were analyzed.
hypothesis test
A statistical test that uses sample data to decide whether a result is statistically significant.
sample
The subset of the population that is actually observed or measured in the study.
linear interpolation
Estimation using the linear regression equation in between known data points.
correlation coefficient
A measure of the linear relationship between two attributes. The numerical value demonstrates how closely the attributes vary together. Correlation coefficients near -1 or +1 indicate strong linear correlation, while a correlation coefficient near 0 indicates weak (or no) linear correlation.
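One common way to compute such a coefficient is the Pearson formula. The flashcards do not name a specific formula, so treating the correlation coefficient as Pearson's r is an assumption of this sketch, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # perfectly increasing line
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # perfectly decreasing line

# A clear but nonlinear (U-shaped) relationship can still give r = 0.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
print(pearson_r(curve_x, curve_y))
```

The last case shows why a coefficient near 0 means only that there is no linear correlation: the points follow a curve exactly, yet r is 0.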
least squares
A technique for finding the regression line by minimizing the sum of the squared vertical distances between the data points and the line.
positive correlation
A linear relationship between two quantitative variables in which the dependent variable increases as the independent variable increases.
regression line
The line of best fit showing the relationship between variables: the line that minimizes the sum of the squared vertical distances from the data points to the line.
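A minimal sketch of fitting a regression line by least squares, using the standard closed-form solution for one explanatory variable. The data points and the helper names `least_squares_line` and `sum_squared_errors` are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line for the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Illustrative data, assumed for this sketch.
xs = [1, 2, 3]
ys = [2, 2, 4]

slope, intercept = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)

# Any other line, such as y = x + 1, has a larger sum of squared errors.
other = sum_squared_errors(xs, ys, 1.0, 1.0)
print(slope, intercept)
print(best < other)
```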
association
A pattern or relationship between two variables.
experimental study
The researcher applies a treatment to one group and no treatment (or placebo) to a control group, to determine if there is causation between variables.
extrapolate
Using information from a data set to make predictions about data outside of the original set.
causation
A relationship of cause and effect between two or more variables.
regression equation
An equation used to model the relationship between the response and explanatory variables in a regression.
linear extrapolation
Estimation using the linear regression equation outside the range of known data points.
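The difference between linear interpolation and linear extrapolation can be sketched by checking whether the x-value being predicted lies inside the range of the observed data. The regression equation y = 2x + 1, the observed x-values, and the function name `predict` are all assumptions made for this example.

```python
# Assumed regression equation for this sketch: y = 2x + 1.
SLOPE = 2.0
INTERCEPT = 1.0

# Assumed x-values of the known data points used to fit the line.
observed_xs = [1, 2, 3, 4, 5]

def predict(x):
    """Return (predicted y, kind of estimate) for the given x."""
    y = SLOPE * x + INTERCEPT
    if min(observed_xs) <= x <= max(observed_xs):
        kind = "interpolation"   # inside the range of known data
    else:
        kind = "extrapolation"   # outside the range of known data
    return y, kind

print(predict(3.5))
print(predict(10))
```

Extrapolated predictions generally deserve more caution, since nothing in the data confirms that the linear pattern continues outside the observed range.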
degree
The largest exponent in a mathematical expression or equation.
statistically significant
The conclusion that a given result or relationship is unlikely to be due to random chance alone; typically, its p-value is below the significance level.
negative correlation
A linear relationship between two quantitative variables in which the dependent variable decreases as the independent variable increases.