Stats Test 3 / Final Flashcards
What category of relationships is needed for Linear Regression?
Q -> Q (quantitative explanatory variable, quantitative response variable)
What does the correlation r not represent?
Steepness of the relationship (that is the slope b)
What are x and y generally?
x = explanatory variable
y = response variable
What is regression notation?
y-hat = a + bx
a = intercept
b = slope
y-hat = predicted y-value (mean of y for given x)
Other names for the regression line
- best-fitting line
- least squares line
- least squares regression line
Linear regression minimizes what?
the sum of squared vertical residuals (vertical distances between each point and the line)
What is the equation for the residual (prediction error)?
residual = y - y-hat (observed y minus predicted y)
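A minimal sketch of residuals and the least squares idea, using a small made-up data set (the numbers and function names are illustrative assumptions, not from the cards). It compares the sum of squared residuals for the least squares line against two nearby lines:

```python
def residuals(xs, ys, a, b):
    """Residual for each point: observed y minus predicted y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sse(xs, ys, a, b):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# For this data the least squares line is y-hat = 2.2 + 0.6x
sse_least_squares = sse(xs, ys, 2.2, 0.6)

# Two other lines through (x-bar, y-bar), one steeper and one flatter
sse_steeper = sse(xs, ys, 1.9, 0.7)
sse_flatter = sse(xs, ys, 2.5, 0.5)

print(sse_least_squares, sse_steeper, sse_flatter)
```

Both alternative lines give a larger sum of squared residuals than the least squares line.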
“Simple” formulas for slope and intercept
b = r * (sd-y / sd-x)
a = y-bar - b * x-bar
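The two formulas can be checked numerically. This is a sketch on made-up data; the helper names `correlation` and `least_squares_line` are assumptions for illustration:

```python
from math import sqrt
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (a, b) using b = r * (sd-y / sd-x) and a = y-bar - b * x-bar."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(round(a, 3), round(b, 3))
```

For this data the intercept is about 2.2 and the slope about 0.6. Because a = y-bar - b * x-bar, the line always passes through the point (x-bar, y-bar).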
What is extrapolation?
using the regression line to estimate the mean of y for an x far outside the x-range of the data (unreliable, because the linear pattern may not hold there)
Simpson’s Paradox
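A trend that holds within each group weakens or reverses when the groups are combined. It is a bias introduced by failing to account for the lurking variable, and it comes from the arithmetic of proportions:
the pooled proportion (a + b) / (c + d) is a weighted average of a / c and b / d (weights c and d), so it can land near either one depending on the group sizes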
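A small numeric sketch of the reversal. The counts below are illustrative assumptions chosen to show the effect: treatment A has the higher success rate in each group, yet the lower rate once the groups are pooled, because A was used mostly on the harder group.

```python
def rate(successes, total):
    """Proportion of successes in one group."""
    return successes / total


def pooled(group1, group2):
    """Pooled proportion (a + b) / (c + d) for two (successes, total) groups."""
    return (group1[0] + group2[0]) / (group1[1] + group2[1])


# (successes, total) per group; illustrative counts only
a_easy, a_hard = (81, 87), (192, 263)
b_easy, b_hard = (234, 270), (55, 80)

a_wins_easy = rate(*a_easy) > rate(*b_easy)
a_wins_hard = rate(*a_hard) > rate(*b_hard)
a_wins_pooled = pooled(a_easy, a_hard) > pooled(b_easy, b_hard)

print(a_wins_easy, a_wins_hard, a_wins_pooled)
```

Here group difficulty is the lurking variable: ignoring it makes B look better overall even though A does better within each group.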
Formula for population regression line
µy = α + βx
CONDITIONS OF REGRESSION MODEL:
L I N E:
Linearity: scatterplot should have a linear form
Independence: data come from random samples or a randomized experiment
Normality: histogram of residuals is roughly symmetric and bell-shaped, with no outliers
Equal Population Standard Deviation: no megaphone (fan) pattern in the scatterplot or residual plot
Sampling distribution of b
t = (b - β) / SEb
follows a t distribution with df = n - 2
Inference for β
Confidence interval
b ± t* × SEb (t* = critical value of the t distribution with df = n - 2)
Test of significance:
t = (b - β0) / SEb, with df = n - 2
Hypotheses:
H0: β = β0
Ha: β ≠ β0 (two-sided)
or β < β0 (one-sided)
or β > β0 (one-sided)
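A sketch of the confidence interval and test statistic for the slope on made-up data. Two things here are not on the cards and are assumptions of this example: the standard formula SEb = s / sqrt(sum of (x - x-bar)^2) with s = sqrt(SSE / (n - 2)), and the critical value t* = 3.182 (95% confidence, df = 3), which is read from a t table rather than computed.

```python
from math import sqrt


def slope_inference(xs, ys, beta0, t_star):
    """Return (b, se_b, t, ci) for the least squares slope.

    beta0  : hypothesized slope under H0
    t_star : critical value looked up in a t table for df = n - 2
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx              # algebraically the same as r * (sd-y / sd-x)
    a = y_bar - b * x_bar

    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))    # residual standard deviation
    se_b = s / sqrt(sxx)       # standard error of the slope

    t = (b - beta0) / se_b     # test statistic, df = n - 2
    ci = (b - t_star * se_b, b + t_star * se_b)
    return b, se_b, t, ci


# Made-up illustrative data: n = 5, so df = 3
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, se_b, t, ci = slope_inference(xs, ys, beta0=0.0, t_star=3.182)
print(b, se_b, t, ci)
```

For this data the interval contains 0 and t is smaller than t*, so H0: β = 0 would not be rejected at the 5% level in a two-sided test.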
Properties of r
- Ranges between -1 and 1
- Measures the strength and direction of the linear relationship, not its steepness (r and the slope b share the same sign)
- Does not have units of measurement
- Does not change when units of measurement of either one of the variables change
- Makes no distinction between explanatory and response variables
- Heavily influenced by outliers
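Several of these properties can be checked with a short sketch on made-up data. The `correlation` helper, the unit conversions, and the added outlier point are assumptions for illustration:

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)

# Swapping explanatory and response variables leaves r unchanged
r_swapped = correlation(ys, xs)

# Changing units (e.g. meters to centimeters, Celsius to Fahrenheit) leaves r unchanged
xs_cm = [100 * x for x in xs]
ys_f = [1.8 * y + 32 for y in ys]
r_rescaled = correlation(xs_cm, ys_f)

# A single outlier can change r drastically, here even flipping its sign
r_outlier = correlation(xs + [10], ys + [0])

print(r, r_swapped, r_rescaled, r_outlier)
```

The first three values agree, while the single added point turns a positive correlation into a negative one.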