AP Stat Ch 3 and 12.2 Flashcards
Explanatory variable
Attempts to explain or influence changes in a response variable.
Also called the independent variable. Plotted on the x axis.
Response variable
Measures an outcome of a study.
Also called the dependent variable. Plotted on the y axis.
Which is the explanatory variable?
- Scuba diving: depth and visibility
- World population vs. year
- Amount of rain vs. crop growth
- Height vs. GPA
- Depth
- Year
- Amount of rain
- No association
Scatter plot
The most effective way to display the relationship between two quantitative variables measured on the same individuals.
Tips for drawing a scatterplot by hand
- Plot explanatory variable on x axis
- Label both axes
- Scale the axes with uniform intervals
- Make plot large enough to see details
Four major features in interpreting scatter plots
Direction
Form
Scatter
Outliers
Direction
A pattern from the upper left to the lower right is said to have a negative direction. A pattern from lower left to upper right has a positive direction.
Form
Approx linear, curved, exponential…
Scatter
Strength of the relationship.
Rated on a scale from strong to weak.
Positive vs negative association
Positive when above-average values of one variable tend to accompany above-average values of the other. Slope is positive.
Negative when above-average values of one variable tend to accompany below-average values of the other. Slope is negative.
Correlation
The correlation, r, is a common measure used to numerically assess the association between two quantitative variables. It measures the direction and strength of a linear relationship, on a scale from -1 to 1.
Indicates direction by its sign and strength by how far r moves away from 0.
Obtained from stat menu, Calc, 8.
Don’t need to calculate by hand, but it is the sum of the products of the standardized scores (z-scores) of x and y, divided by n - 1: r = (1/(n - 1)) * sum of [(x - x bar)/s_x] * [(y - y bar)/s_y].
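A minimal Python sketch of the z-score formula for r, assuming a small made-up data set and hypothetical helper names (`mean`, `sample_sd`, `correlation`). On the calculator you would normally just read r from the regression output.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation: divide by n - 1
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # r = (1 / (n - 1)) * sum of z_x * z_y
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Made-up example data (not from the original notes)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 3))  # prints 0.775
```

Swapping the arguments gives the same value, which matches the "doesn't depend on which variable is x and y" property.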
What happens as r gets closer to 0?
The linear relationship gets weaker.
It gets stronger as r moves further from zero.
Properties of r
- No units
- Doesn’t depend on which variable is x and which is y, because the product of the z-scores z_x * z_y is the same as z_y * z_x.
- Correlation requires both variables to be quantitative
- -1 <= r <= 1. When r is greater than zero, the relationship is positive; when r is less than zero, the relationship is negative.
- r = 1 or r = -1 only when the data are perfectly linear.
- The value of r measures the strength of a linear relationship only: how closely the data fall along a straight line. An r value near zero doesn’t indicate no relationship, but rather no linear relationship.
- Not resistant: r is strongly affected by outliers.
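A short sketch of two of these properties, using made-up data: a perfect curved relationship can still give r = 0, and one added point far out in the x direction can flip the sign of r. The `correlation` helper is hypothetical and uses the algebraically equivalent form r = S_xy / sqrt(S_xx * S_yy).

```python
import math

def correlation(xs, ys):
    # Equivalent form of r: S_xy / sqrt(S_xx * S_yy)
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# A perfect curved (not linear) relationship: y = x^2
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
print(correlation(curve_x, curve_y))  # 0.0: strong relationship, but no linear one

# Not resistant: one added point far out in x flips the sign of r
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 3))               # 0.775
print(round(correlation(xs + [10], ys + [0]), 3))  # roughly -0.55
```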
Don’t confuse correlation with causation
Just because the number of students taking statistics has increased while murders have gone down doesn’t mean that one causes the other.
Regression line
A line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Stat Calc 8
LSRL: least-squares regression line
Y hat = a + bx
LSRL form
Y hat = a + bx
a is the y intercept
b is the slope
y hat is the value of y predicted by the model
When interpreting the slope, always say "according to the model": as x increases by one unit, the predicted y increases by b (or decreases, if b is negative).
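A minimal sketch of finding the LSRL by hand, assuming made-up data and a hypothetical `lsrl` helper. It uses the standard least-squares formulas b = S_xy / S_xx (equal to r * s_y / s_x) and a = y bar - b * x bar.

```python
def lsrl(xs, ys):
    """Return (a, b) for y hat = a + b*x using least squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # slope; equals r * (s_y / s_x)
    a = y_bar - b * x_bar  # the LSRL always passes through (x bar, y bar)
    return a, b

# Made-up example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = lsrl(xs, ys)
print(f"y hat = {a:.1f} + {b:.1f}x")                # y hat = 2.2 + 0.6x
print(f"Prediction at x = 3.5: {a + b * 3.5:.2f}")  # 4.30
```

Interpretation for this made-up fit: according to the model, as x increases by 1, the predicted y increases by 0.6. The prediction at x = 3.5 stays inside the observed x range (1 to 5), so it is not an extrapolation.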
Extrapolation
Using the regression line to predict y for x values outside the range of the observed data. These predictions are often unreliable. DO NOT EXTRAPOLATE.
Residuals
The difference between an observed value of the response variable and the value predicted by the regression line. The vertical distance from the point to the line.
Y minus y hat
What does it mean when a residual is positive or negative?
When positive, y is greater than y hat, so the observed value is above the prediction (above the LSRL).
When negative, y is less than y hat, so the observed value is below the prediction (below the LSRL).
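A sketch of computing residuals as y - y hat, assuming made-up data whose LSRL is y hat = 2.2 + 0.6x, and labeling each point as above or below the line:

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # LSRL for this made-up data: y hat = 2.2 + 0.6x

# Residual = observed y minus predicted y hat
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    where = "above" if e > 0 else ("below" if e < 0 else "on")
    print(f"x={x}: y={y}, residual={e:+.1f} ({where} the line)")
```

The residuals from a least-squares line always add up to zero (up to rounding).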
Important questions to consider with LSRL
- Is a linear model really appropriate, or would a curved model be better?
- Are there any unusual aspects of the data set?
- If we make predictions, how accurate will they be?
Residual plot
A scatterplot of the regression residuals against the explanatory variable. If there is a pattern, a linear model is not the best choice.
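Drawing the plot needs a graphics library, so this sketch just prints each residual next to its x value as a text stand-in for a residual plot. The made-up data follow y = x^2 exactly, so a straight-line fit leaves a curved +, -, -, -, + pattern:

```python
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]  # made-up curved data: y = x^2

# Fit the LSRL
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Text stand-in for a residual plot: residual at each x
for x, e in zip(xs, residuals):
    print(f"x={x}: residual={e:+.1f}")
```

The U-shaped sign pattern is the kind of pattern that says a linear model is not appropriate here.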
If an observation has a positive residual, then…
y minus y hat is positive, so the observed y is above the predicted value, above the line. The prediction is too low.
If an observation has a negative residual, then…
y minus y hat is negative, so y hat is larger than y. The predicted value is too high, and the observed point is below the line.
How to tell whether a linear model is the best choice…
Look for a pattern in the RESIDUAL PLOT!
Standard deviation of the residuals about the least-squares regression line
Measures how close the observations typically are to the line: the approximate size of a typical prediction error (residual).
Represented by s.
If s = 4 units (s has the same units as y), the typical deviation of an observed y from its predicted value is about 4 units.
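s is computed from the residuals as s = sqrt(sum of squared residuals / (n - 2)). A small sketch, assuming the residuals come from the made-up fit y hat = 2.2 + 0.6x on five points:

```python
import math

residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]  # from the made-up fit y hat = 2.2 + 0.6x
n = len(residuals)

# Divide by n - 2 (not n - 1) for the standard deviation of the residuals
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(round(s, 3))  # 0.894
```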
Coefficient of determination (r squared)
r squared is a measure of the proportion of variability in the y variable that can be explained by the linear relationship between x and y.
Also tells you how well the LSRL predicts values of y.
Say that r squared = 0.74 (no units).
Then 74% of the variability in y can be explained by the linear relationship with x (here, years of experience). Or: the LSRL’s sum of squared prediction errors is 74% smaller than if we predicted the mean y value every time.
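A sketch of r squared as 1 - SSE/SST, where SST is the sum of squared errors from always predicting the mean y value and SSE is the sum of squared residuals from the LSRL. The data and residuals are made up (from y hat = 2.2 + 0.6x):

```python
ys = [2, 4, 5, 4, 5]
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]  # from y hat = 2.2 + 0.6x (made-up data)

y_bar = sum(ys) / len(ys)
sst = sum((y - y_bar) ** 2 for y in ys)  # squared errors if we always predict y bar
sse = sum(e ** 2 for e in residuals)     # squared errors using the LSRL
r_squared = 1 - sse / sst
print(round(r_squared, 2))  # 0.6
```

For this made-up data, 60% of the variability in y is explained by the linear relationship with x.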
How to get r from r squared
Take the square root, then make r positive if the slope is positive and negative if the slope is negative.
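A one-function sketch with a hypothetical helper name and example slopes: take the square root and copy the sign of the slope with `math.copysign`.

```python
import math

def r_from_r_squared(r_squared, slope):
    # Square root, then give it the same sign as the slope
    return math.copysign(math.sqrt(r_squared), slope)

print(round(r_from_r_squared(0.74, 1.3), 3))   # 0.86
print(round(r_from_r_squared(0.74, -1.3), 3))  # -0.86
```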
Correlation and regression describe only…
LINEAR RELATIONSHIPS
Is correlation resistant?
No!
Outlier
An observation that lies outside the overall pattern of the other observations.
Influential point
An observation is influential if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential.
Influential points and outlier relationship
An influential point is always an outlier.
An outlier is not always influential.
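A sketch of checking influence by refitting with and without one point (made-up data; `slope` is a hypothetical helper). Adding a single point far out in the x direction changes the slope from positive to negative, so that point is influential:

```python
def slope(xs, ys):
    # Least-squares slope b = S_xy / S_xx
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(slope(xs, ys), 2))               # 0.6
print(round(slope(xs + [10], ys + [0]), 2))  # about -0.34 with the point (10, 0) added
```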
Lurking variable
A variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.
Summary of chapter 3:
- Graph data
- Generate LSRL
- r (correlation) tells us how closely the data fit the line; use it, along with the residual plot, to decide whether this is the most appropriate model
- Residual plot tells us whether a linear model is appropriate.
- s tells us the typical size of our prediction error if/when we use the LSRL to predict y
- r squared tells us how much better our LSRL is at predicting y than using the mean y value every time. It also gives the percent of the variability in y that is explained by the linear relationship with x
Exponential models
Take ln or log of just the y variable, then find the LSRL of log y on x
Then undo the log to solve for y hat:
y hat = a * b^x
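A minimal sketch of the log transformation, assuming made-up data that follow y = 3 * 2^x exactly and using natural logs (`math.log`). Fitting the LSRL of ln(y) on x and exponentiating the intercept and slope recovers a and b; `fit_line` is a hypothetical helper.

```python
import math

def fit_line(xs, ys):
    # Least-squares line; returns (intercept, slope)
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]  # made-up data that follow y = 3 * 2^x exactly

# Step 1: take ln of y only, then fit the LSRL of ln(y) on x
intercept, slope = fit_line(xs, [math.log(y) for y in ys])

# Step 2: undo the log to get y hat = a * b^x
a, b = math.exp(intercept), math.exp(slope)
print(f"y hat = {a:.2f} * {b:.2f}^x")  # y hat = 3.00 * 2.00^x
```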
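Base of exponential function
In form a*b^x:
b will be positive.
b = 1 +/- r, where r is the rate of growth (+) or decay (-) written as a decimal. This r is a percent rate, not the correlation.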
For example, if b = 1.057, then y hat increases by 5.7% for every one-unit increase in x.
If b = 0.79, then y hat decreases by 21% for every one-unit increase in x.
When b is less than one, y hat decreases; when b is greater than one, y hat increases.
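The percent change in y hat per one-unit increase in x is (b - 1) * 100. A tiny sketch, with a hypothetical function name, reproducing the two examples:

```python
def percent_change_per_unit(b):
    # y hat = a * b^x: each 1-unit increase in x multiplies y hat by b,
    # which is a change of (b - 1) * 100 percent
    return (b - 1) * 100

print(round(percent_change_per_unit(1.057), 1))  # 5.7
print(round(percent_change_per_unit(0.79), 1))   # -21.0
```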
Power model
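Take log or ln of both x and y, then find the LSRL of log y on log x
y hat = a*x^b
In the log-log fit, as log x increases by 1, the predicted log y increases by b. Equivalently, multiplying x by a factor k multiplies y hat by k^b.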
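A sketch of the power-model fit, assuming made-up data that follow y = 2 * x^3 exactly: take ln of both variables, fit the LSRL, then a = e^(intercept) and b = slope. It also checks that doubling x multiplies y hat by 2^b. `fit_line` is a hypothetical helper.

```python
import math

def fit_line(xs, ys):
    # Least-squares line; returns (intercept, slope)
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 3 for x in xs]  # made-up data that follow y = 2 * x^3 exactly

# Take ln of both x and y, then fit the LSRL of ln(y) on ln(x)
intercept, slope = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
a, b = math.exp(intercept), slope  # y hat = a * x^b
print(f"y hat = {a:.2f} * x^{b:.2f}")  # y hat = 2.00 * x^3.00

# Doubling x (from 2 to 4) multiplies y hat by 2^b
print(round((a * 4 ** b) / (a * 2 ** b), 2))  # 8.0
```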