STAT MOD 3: Chapter 4 Flashcards
Describing Bivariate Numerical Data
What is a scatter plot? What are the two variables?
Each point represents the combination of two measurements for one individual observation
- used for bivariate numeric data
- Explanatory variable (x axis) and response/dependent variable (y axis)
What is the form of a relationship? What are the two types of relationship?
The overall (average) pattern of the points in the scatter plot
Linear: pattern of relationship resembles a straight line
Curved: pattern of relationship resembles a curve
What is direction? What are the two types of direction?
If the form is linear, you can determine the direction of the relationship
Positive association: values of one variable increase as values of the other variable increase (move in the same direction)
Negative association: values of one variable decrease as values of the other variable increase (move in opposite directions)
What is strength of a relationship?
How closely the points follow a pattern (linear or curved); a strong relationship shows little scatter around the pattern
What is correlation coefficient (r)?
An objective numerical measure that indicates the
- strength
- direction
of the linear relationship between two numeric variables
- can range between -1 and 1
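As a rough illustration of the definition above, r can be computed by standardizing each variable and averaging the products of the z-scores. This is a minimal sketch; the function name `correlation` and the study-hours data are made up for the example, and it assumes sample standard deviations (dividing by n - 1).

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r computed from standardized values (z-scores)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # r is the sum of the products of z-scores, divided by n - 1.
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)
print(round(r, 2))  # about 0.98: a strong positive linear relationship
```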
How do you interpret correlation coefficient?
Sign of (r) indicates the direction of the relationship; the size of (r), ignoring the sign, indicates the strength
- can range between -1 and 1
- Between -0.5 and 0.5 = weak
- Between -0.8 and -0.5, or between 0.5 and 0.8 = moderate
- Between -1 and -0.8, or between 0.8 and 1 = strong
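The cutoffs on this card can be written as a small helper. This is only a sketch: the name `describe_r` is hypothetical, and which label the boundary values 0.5 and 0.8 receive is an assumption, since the card does not say.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    # The sign gives the direction.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    # The size (absolute value) gives the strength.
    size = abs(r)
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:        # assumption: exactly 0.5 counts as moderate
        strength = "moderate"
    else:                   # assumption: exactly 0.8 counts as strong
        strength = "strong"
    return strength, direction

print(describe_r(-0.65))  # ('moderate', 'negative')
```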
Is the correlation between x and y different from correlation between y and x?
The correlation between x and y is the same as the correlation between y and x
What kind of variables does correlation require?
Correlation requires that both variables be quantitative
(cannot compute a correlation between two categorical variables, or between a categorical variable and a quantitative variable)
Does correlation change when doing transformations/conversions between units?
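Correlation does not change when the units of either variable are converted (a linear change of scale, e.g. inches to centimeters).
This happens because all observations are standardized in the calculation of correlation.
Nonlinear transformations (e.g. taking logs) can change r.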
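The last few cards (swapping x and y, converting units) can be checked numerically. The sketch below uses made-up height and weight values and a hypothetical `correlation` helper; it uses the algebraically equivalent form r = Sxy / sqrt(Sxx * Syy) rather than explicit z-scores.

```python
from math import isclose, sqrt

def correlation(xs, ys):
    """Pearson correlation r (equivalent to averaging products of z-scores)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 71]
weights_lb = [110, 120, 140, 155, 170]

# Unit conversions: inches to centimeters, pounds to kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

r_original = correlation(heights_in, weights_lb)
r_converted = correlation(heights_cm, weights_kg)   # same r, different units
r_swapped = correlation(weights_lb, heights_in)     # same r, x and y swapped

print(isclose(r_original, r_converted), isclose(r_original, r_swapped))
```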
What is the unit of correlation?
The correlation (r) has no unit of measurement—it is just a number
Interpreting correlation:
What does positive or negative r indicate?
Positive (r) indicates positive association
Negative (r) indicates negative association
What is the range of correlation?
(r) is always between -1 and 1
- values of r near 0 indicate little or no linear association
- values of r close to -1 or 1 indicate strong linear association
- r = -1 or r = 1 means the points fall exactly on a straight line
What does correlation only measure?
Correlation only measures the strength of a linear relationship
- a curved relationship can have r near zero even when the pattern is strong, so r can be misleading for curved data
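To see why r should only be read as a measure of linear association, here is a small sketch with invented data where y is completely determined by x through a curve. The `correlation` helper is hypothetical. With x values symmetric around zero, r comes out as zero; with only non-negative x values, the same curve gives a large r, so r by itself does not reveal the form.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Perfect curved relationship y = x^2, with x symmetric around 0.
xs_symmetric = [-3, -2, -1, 0, 1, 2, 3]
ys_symmetric = [x ** 2 for x in xs_symmetric]
r_symmetric = correlation(xs_symmetric, ys_symmetric)

# Same curve, but only the right-hand half (x from 0 to 6).
xs_half = [0, 1, 2, 3, 4, 5, 6]
ys_half = [x ** 2 for x in xs_half]
r_half = correlation(xs_half, ys_half)

print(round(r_symmetric, 2), round(r_half, 2))
```

In both cases a scatter plot shows the curve immediately, which is why the form is checked before r is interpreted.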
Is correlation affected by outliers?
Correlation is a non-resistant measure (affected by outliers)
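A quick way to see the non-resistance is to compute r with and without a single unusual point. The data and the `correlation` helper below are invented for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Five points that fall exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = correlation(xs, ys)

# Add one outlier at (6, 0) and recompute.
r_with_outlier = correlation(xs + [6], ys + [0])

print(round(r_clean, 2), round(r_with_outlier, 2))  # 1.0 and about 0.14
```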
What are potential reasons for observed association between explanatory and response variable?
1) Causation (best way to establish is through randomized experiment)
2) Confounding variable (there may be causation, but confounding variables make causation hard to prove)
3) Lurking variable (no causation; association can be explained by other variables affecting both explanatory/response)
4) Response variable is causing change in explanatory variable
What is a confounding variable?
A variable that is not the main concern of the study but may be partially responsible for the observed results
- there may be causation, but it is tied up with the confounding variable
What is a lurking variable?
A variable that affects both the x and y variables, causing us to see an association
- no causation
- other variables affect both explanatory/response
What is a regression line?
A straight line that describes how values of the response variable (y) are related, on average, to values of the explanatory variable (x)
- used to estimate average value of y at specific value of x
- used to predict unknown value of y for an individual, given individual’s x value
What are components of regression line/equation?
ŷ = a + bx (ŷ, "y hat", is the predicted value of y)
What is a?
intercept
- the predicted value of y when x = 0
What is b?
slope
- the amount that the y variable changes when x increases by one unit
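The cards do not say how a and b are obtained. A common choice, assumed here, is the least-squares line, where b = Sxy / Sxx and a = (mean of y) - b * (mean of x). The function name `fit_line` and the data are made up for this sketch.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y hat = a + b*x (assumed method)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                 # slope: change in y hat per one-unit increase in x
    a = mean_y - b * mean_x       # intercept: y hat when x = 0
    return a, b

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

a, b = fit_line(hours, scores)
predicted = a + b * 6             # predicted score for 6 hours of study
print(a, b, predicted)            # 46.0 6.0 82.0
```

Here each extra hour of study is associated with a predicted score that is 6 points higher, on average.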
What does it mean if the slope of the regression line is positive? What if it’s negative?
- When slope is positive, direction of relationship is positive (y increases as x increases)
- When slope is negative, direction of relationship is negative (y decreases as x increases)
What is residual?
observed y value - predicted y value (using regression equation)
Interpreting residuals:
What does a positive residual mean?
data point falls above regression line
prediction was an underestimate of the observed value
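A residual can be computed directly from the definition: observed y minus predicted y. The line y hat = 46 + 6x and the data points below are hypothetical, chosen only to show one positive and one negative residual.

```python
def residual(x, y_observed, a, b):
    """Observed y minus the y predicted by the line y hat = a + b*x."""
    y_predicted = a + b * x
    return y_observed - y_predicted

a, b = 46, 6                            # hypothetical regression line

res_above = residual(2, 60, a, b)       # predicted 58, observed 60
res_below = residual(3, 61, a, b)       # predicted 64, observed 61

print(res_above)  # 2: point is above the line, prediction was an underestimate
print(res_below)  # -3: point is below the line, prediction was an overestimate
```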