Linear Regression Flashcards
Regression analysis uses a _______ model to predict a _______ variable (dv) by using one or more _______ variables (iv).
Statistical
Response
Predictor
In regression analysis, β0 and β1 are called _______.
parameters
What are the four steps of hypothesis testing?
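Step 1: State the hypotheses.
one-sided: H0: μ ≤ μ0, Ha: μ > μ0 (or H0: μ ≥ μ0, Ha: μ < μ0)
two-sided: H0: μ = μ0, Ha: μ ≠ μ0
For the slope: H0: β1 = 0 (no linear association between x and y; x is not useful for predicting y), Ha: β1 ≠ 0
Step 2: Compute the test statistic.
t = (x̄ − μ0)/(s/√n) with df = n − 1
For the slope: t* = b1/s{b1} with df = n − 2
Step 3: Find the critical value: t{1 − α, df} (one-sided) OR t{1 − α/2, df} (two-sided)
Step 4: Two-sided: if t* ≥ +critical value or t* ≤ −critical value, reject H0. One-sided: reject only in the tail named by Ha.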
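The four steps can be traced in a short sketch of a two-sided one-sample t-test. The sample values and the function names `one_sample_t` and `two_sided_decision` are hypothetical, and the critical value is passed in as a number read from a t table rather than computed.

```python
import math

def one_sample_t(sample, mu0):
    """Step 2: t = (x̄ - μ0) / (s / √n), with df = n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

def two_sided_decision(t_stat, t_crit):
    """Step 4: reject H0 if t >= +t_crit or t <= -t_crit."""
    return t_stat >= t_crit or t_stat <= -t_crit

# Step 1 (hypothetical): H0: μ = 3.0 vs Ha: μ ≠ 3.0
sample = [3.1, 3.4, 2.9, 3.6, 3.5]
t_stat, df = one_sample_t(sample, mu0=3.0)
# Step 3: t{1 - α/2, n - 1} for α = 0.05 and df = 4, taken from a t table
t_crit = 2.776
print(round(t_stat, 3), df, "reject H0" if two_sided_decision(t_stat, t_crit) else "fail to reject H0")
```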
What is the simple linear regression model?
Y=β0+β1X1+ε
In linear regression, E(ε)=
0
In linear regression, σ²{ε} =
σ²
In linear regression, ε’s are/are not correlated and have covariance of ___.
ε’s are uncorrelated and have covariance of 0
The least squares estimates of the betas _____ the sum
∑ (i = 1 to n) [yi − (β0 + β1xi)]²
minimize
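To see what "minimize" means in practice, the sketch below evaluates the sum of squares Q(b0, b1) on a small made-up data set (x = 1..5, y = 2, 4, 5, 4, 5) and checks that nudging either coefficient away from the least squares values (b0 = 2.2, b1 = 0.6 for these data) makes Q larger. The helper name `sum_sq` is hypothetical.

```python
def sum_sq(b0, b1, xs, ys):
    """Q(b0, b1) = Σ [yi - (b0 + b1*xi)]², the quantity least squares minimizes."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
best = sum_sq(2.2, 0.6, xs, ys)   # Q at the least squares estimates

# Moving either coefficient away from the least squares values increases Q.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sum_sq(2.2 + db0, 0.6 + db1, xs, ys) > best
print(round(best, 4))
```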
Interpretation of β1
Y=β0+β1X1+ε
For each one-unit increase in x, the mean of y changes by β1 (an increase if β1 > 0, a decrease if β1 < 0).
(e.g., For each additional hour a student watches TV, his mean GPA drops by 0.2 points)
Interpretation of β0
Y=β0+β1X1+ε
The mean of y when x = 0 (meaningful only if x = 0 lies within the range of the observed x values)
(e.g., On average, first-year students who don't watch TV have a GPA of 3.9)
ŷ is the ____ regression line.
estimated
b1 and b0 are estimates for
β1 and β0
What is the equation for b1?
b1 = SSxy/SSxx
What is the equation for b0?
b0 = ȳ − b1x̄
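The two formulas above translate directly into code. This is a minimal sketch on a small hypothetical data set; the function name `least_squares` is illustrative, not from any library.

```python
def least_squares(xs, ys):
    """Return (b0, b1) using b1 = SSxy/SSxx and b0 = ȳ - b1·x̄."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

Computing b1 first is required, since b0 depends on it.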
What is the equation for SSxx?
All of the following expressions are equal:
∑(xi − x̄)²
(∑xi²) − n(x̄)²
(n − 1)sx²
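A quick numerical check that the three SSxx expressions agree, using hypothetical x values. `statistics.variance` from the Python standard library returns the sample variance sx² (divisor n − 1).

```python
import statistics

xs = [1, 2, 3, 4, 5]
n = len(xs)
xbar = sum(xs) / n

form1 = sum((x - xbar) ** 2 for x in xs)        # Σ(xi - x̄)²
form2 = sum(x ** 2 for x in xs) - n * xbar ** 2   # (Σxi²) - n·x̄²
form3 = (n - 1) * statistics.variance(xs)         # (n - 1)·sx²
print(form1, form2, form3)
```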
SSxx must be positive/negative.
positive (it is a sum of squares; it equals 0 only if every xi is the same, and then b1 cannot be computed)
What is the equation for SSxy?
∑(xi − x̄)(yi − ȳ)
(∑xiyi) − n x̄ ȳ
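The definitional and computational forms of SSxy can be compared on a hypothetical data set. Unlike SSxx, SSxy can be negative; its sign matches the sign of b1.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Definitional form: Σ(xi - x̄)(yi - ȳ)
ss_xy_def = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
# Computational form: (Σxi·yi) - n·x̄·ȳ
ss_xy_comp = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
print(ss_xy_def, ss_xy_comp)
```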
When creating a table for an estimated regression line, which 5 columns should you include?
xi | yi | xi² | yi² | xiyi
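The five-column table can be built and summed in a few lines; the column sums then feed the computational formulas for SSxx, SSyy, and SSxy. The data are hypothetical, and dividing the squared sum by n is used as an equivalent of n·x̄² (and n·x̄·ȳ for SSxy).

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# One row per observation: xi | yi | xi² | yi² | xiyi
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))

n = len(rows)
ss_xx = sum_x2 - sum_x ** 2 / n          # same as (Σxi²) - n·x̄²
ss_yy = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n       # same as (Σxiyi) - n·x̄·ȳ

print("xi | yi | xi2 | yi2 | xiyi")
for row in rows:
    print(" | ".join(str(v) for v in row))
print("sums:", sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print("SSxx =", ss_xx, "SSyy =", ss_yy, "SSxy =", ss_xy)
```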
What is the equation for the error term εi?
εi = yi − E(yi) = yi − (β0 + β1xi)
What is the equation for the residual ei?
ei = yi − ŷi
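Only residuals can be computed from data: εi needs the unknown E(yi) = β0 + β1xi, while ei uses the fitted line. The sketch assumes a hypothetical data set whose least squares line is ŷ = 2.2 + 0.6x.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6   # least squares estimates for these hypothetical data

fitted = [b0 + b1 * x for x in xs]                  # ŷi on the estimated line
residuals = [y - yh for y, yh in zip(ys, fitted)]   # ei = yi - ŷi
print([round(e, 2) for e in residuals])
# With an intercept in the model, least squares residuals sum to (numerically) zero.
print(abs(sum(residuals)) < 1e-9)
```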
s2 is the ________
estimate of σ², the variance of the error terms (also called the mean squared error, MSE)
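What is the equation for s²?
Both expressions below are equal:
SSE/(n − 2)
MSE
(Not the same as the sample variance of a single variable, ∑(xi − x̄)²/(n − 1), which divides by n − 1 because only one mean is estimated.)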
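A minimal sketch of s² = SSE/(n − 2) = MSE and s = √MSE, assuming a hypothetical data set whose fitted values come from the line ŷ = 2.2 + 0.6x.

```python
import math

ys = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]   # ŷi from ŷ = 2.2 + 0.6x at x = 1..5
n = len(ys)

sse = sum((y - yh) ** 2 for y, yh in zip(ys, fitted))
mse = sse / (n - 2)   # s² = SSE/(n - 2): two df are lost to estimating β0 and β1
s = math.sqrt(mse)    # s = √MSE
print(round(sse, 4), round(mse, 4), round(s, 4))
```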
s is the ____________
estimate of σ, the standard deviation of the error terms
What is the equation for s?
√MSE
√(SSE/(n − 2))
SSE is
The sum of the squared errors
What is the equation for SSE?
All of the expressions below are equal:
∑ei²
∑(yi − ŷi)²
SSyy − b1²SSxx
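A check that the residual form and the shortcut SSyy − b1²SSxx give the same SSE, on a hypothetical data set.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
ss_xx = sum((x - xbar) ** 2 for x in xs)
ss_yy = sum((y - ybar) ** 2 for y in ys)
ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = ss_xy / ss_xx
b0 = ybar - b1 * xbar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse_residuals = sum(e ** 2 for e in residuals)   # Σei² = Σ(yi - ŷi)²
sse_shortcut = ss_yy - b1 ** 2 * ss_xx           # SSyy - b1²·SSxx
print(round(sse_residuals, 6), round(sse_shortcut, 6))
```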
What does s² = 0.045 and s = 0.212 mean?
If the distribution of GPA for people who watch x hours of TV is approximately normal, then about 95% of them are expected to have GPAs within 2(0.212) ≈ 0.42 points of the value given by the simple linear regression line.
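The arithmetic behind this card, as a sketch. The fitted line ŷ = 3.9 − 0.2x is an assumption that combines the example numbers from the β0 and β1 cards; the choice of x = 2 hours is also hypothetical.

```python
import math

s2 = 0.045
s = math.sqrt(s2)        # s = √(s²) ≈ 0.212
band = 2 * s             # about 95% of GPAs fall within ±2s of the line

# Hypothetical fitted line assembled from the example cards: ŷ = 3.9 - 0.2x
hours = 2
y_hat = 3.9 - 0.2 * hours
print(f"s = {s:.3f}; at x = {hours}, about 95% of GPAs are in "
      f"{y_hat - band:.2f} to {y_hat + band:.2f}")
```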
You should assume ____ for hypothesis testing and confidence intervals
normality
b1 and b0 are _______ for β1 and β0
least squares estimators
Why do you want a wide range of x values?
The more spread there is in x, the larger SSxx is, so the standard error of the slope, s{b1} = s/√SSxx, is smaller and b1 estimates β1 more precisely.
What is the sampling distribution of (b1 − β1)/s{b1}?
A t-distribution with n − 2 degrees of freedom,
because two parameters (β0 and β1) are estimated
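A sketch of the slope t-statistic with its n − 2 degrees of freedom, on a hypothetical data set. It also shows where SSxx enters: s{b1} = s/√SSxx shrinks as the spread in x grows. The function name `slope_t` is illustrative.

```python
import math

def slope_t(xs, ys):
    """Return (b1, s{b1}, t*, df) for H0: β1 = 0 in simple linear regression."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(ss_xx)            # s{b1} = s / √SSxx
    return b1, se_b1, b1 / se_b1, n - 2     # df = n - 2: β0 and β1 are both estimated

b1, se_b1, t_star, df = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 3), round(se_b1, 4), round(t_star, 4), df)
```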
What does it mean to have a 95% CI?
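If we took 100 samples of size n and built a 95% CI for β1 from each, we would expect about 95 of the intervals to contain the true value of β1.
Interpretation: we are 95% confident that this interval contains the true slope β1 (β1 is fixed; it is the interval that varies from sample to sample).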
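The repeated-sampling idea can be checked by simulation: generate many data sets from a known line, build b1 ± t·s{b1} for each, and count how often the interval covers the true β1. Everything here is an assumption for illustration: the true β0 = 1, β1 = 0.5, σ = 1, x = 1..12, and t{0.975, 10} ≈ 2.228 taken from a t table (it must change if n changes).

```python
import math
import random

def slope_and_se(xs, ys):
    """Return (b1, s{b1}) for a least squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ss_xx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b1, math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)

def coverage(beta0=1.0, beta1=0.5, sigma=1.0, n=12, reps=2000, t_crit=2.228, seed=1):
    """Fraction of simulated 95% CIs for β1 that contain the true β1."""
    rng = random.Random(seed)
    xs = list(range(1, n + 1))
    hits = 0
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
        b1, se = slope_and_se(xs, ys)
        if b1 - t_crit * se <= beta1 <= b1 + t_crit * se:
            hits += 1
    return hits / reps

print(coverage())  # near 0.95, not exact: each single interval either covers β1 or it doesn't
```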
What is Interval Estimation?
CI for mean of Y when x=xh
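A sketch of the CI for the mean response at x = xh, using the standard SLR formula ŷh ± t{1 − α/2, n − 2} · s · √(1/n + (xh − x̄)²/SSxx). The data set, the function name `mean_response_ci`, and the table value t{0.975, 3} ≈ 3.182 are assumptions for illustration.

```python
import math

def mean_response_ci(xs, ys, xh, t_crit):
    """CI for E(Yh) at x = xh: ŷh ± t·s·√(1/n + (xh - x̄)²/SSxx)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ss_xx
    b0 = ybar - b1 * xbar
    s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = b0 + b1 * xh
    se_mean = s * math.sqrt(1 / n + (xh - xbar) ** 2 / ss_xx)
    return y_hat - t_crit * se_mean, y_hat + t_crit * se_mean

# n = 5, so df = 3; t{0.975, 3} ≈ 3.182 from a t table
print(mean_response_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], xh=3, t_crit=3.182))
```

The interval is narrowest at xh = x̄ and widens as xh moves away from it, because of the (xh − x̄)² term.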
SSTo
the error/variation when not using any model at all; it never changes when using a different model or adding new variables; the total variation around ȳ
SSE
error/variation when using SLR; the variation in y not explained by using x; too high means too much error
SSR
The variation in y explained by the regression on x (SSR = SSTo − SSE); we want this to be large
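The three sums of squares can be computed side by side to confirm SSTo = SSR + SSE, on a hypothetical data set. R² = SSR/SSTo is printed as well.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(ys)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in xs]

ss_to = sum((y - ybar) ** 2 for y in ys)                # total variation around ȳ
sse = sum((y - yh) ** 2 for y, yh in zip(ys, fitted))   # not explained by x
ssr = sum((yh - ybar) ** 2 for yh in fitted)            # explained by x
print(round(ss_to, 6), round(ssr, 6), round(sse, 6))    # SSTo = SSR + SSE
print("R^2 =", round(ssr / ss_to, 4))
```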
What are the components of the ANOVA table?
Source of Variation | SS | df | MS
Regression | SSR | 1 | MSR = SSR/1
Error | SSE | n − 2 | MSE = SSE/(n − 2)
Total | SSTo | n − 1 |
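A sketch that fills in the ANOVA table from SSR, SSE, and n. The sums of squares and the function name `anova_slr` are hypothetical.

```python
def anova_slr(ssr, sse, n):
    """Rows of the SLR ANOVA table: (source, SS, df, MS)."""
    msr = ssr / 1            # regression df is 1 with one predictor
    mse = sse / (n - 2)
    return [
        ("Regression", ssr, 1, msr),
        ("Error", sse, n - 2, mse),
        ("Total", ssr + sse, n - 1, None),   # no MS is reported for Total
    ]

for source, ss, df, ms in anova_slr(ssr=3.6, sse=2.4, n=5):
    ms_text = "" if ms is None else f"{ms:.2f}"
    print(f"{source:<10} | {ss:5.2f} | {df:2d} | {ms_text}")
```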
What does an F-test for model usefulness tell us?
Whether R² is statistically significant (i.e., whether β1 ≠ 0), but not whether the model is practically useful
What are the four steps in conducting an F-test
Step 1:
two-sided: H0:β1=0 Ha: β1 ≠ 0
Step 2: F* = MSR/MSE = SSR/MSE (since MSR = SSR/1 in SLR); F* is never negative, and values well above 1 are evidence against H0
Step 3: F {1-α, 1, n-2} (*numerator df always 1 in SLR)
Step 4: If F* > F{1 − α, 1, n − 2}, we reject H0 and have evidence that the SLR model is useful
- In SLR (one predictor variable), the t-test for β1 = 0 is equivalent to the F-test.
- In SLR only: F* = (t*)², and √F{1 − α, 1, n − 2} = t{1 − α/2, n − 2}.
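A sketch that runs the F-test and confirms F* = (t*)² on a hypothetical data set. SSR is computed as b1²·SSxx (equal to SSTo − SSE). The critical value F{0.95, 1, 3} ≈ 10.13 is an F-table value passed in by hand; note that t{0.975, 3} ≈ 3.182 and 3.182² ≈ 10.13. The function name `slr_tests` is illustrative.

```python
import math

def slr_tests(xs, ys):
    """Return (t*, F*) for H0: β1 = 0 in simple linear regression."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ss_xx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    msr = b1 ** 2 * ss_xx / 1               # SSR = b1²·SSxx, and MSR = SSR/1
    t_star = b1 / math.sqrt(mse / ss_xx)    # t* = b1 / s{b1}
    f_star = msr / mse                      # F* = MSR/MSE
    return t_star, f_star

t_star, f_star = slr_tests([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
f_crit = 10.13   # F{0.95, 1, 3} from an F table (n = 5)
print(f"t* = {t_star:.4f}, t*^2 = {t_star ** 2:.4f}, F* = {f_star:.4f}")
print("reject H0" if f_star > f_crit else "fail to reject H0")
```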