Topic — Simple Linear Regression Analysis Flashcards
Hypothesis testing in regression
Testing the statistical significance of the relationship.
What are we testing
Whether the slope coefficient β is different from 0.
β = 0: changes in x have no effect on y
β ≠ 0: x has a statistically significant influence on y
Null and alternate hypothesis in regression
H₀: β = 0
H₁: β ≠ 0
β = 0: x has no influence on y
β ≠ 0: x has a statistically significant influence on y
True population regression line equation
Y=α+βx+e
3 ways of hypothesis testing in regression
1. Confidence intervals
2. Hypothesis test (t statistic)
3. P values
CI formula for hypothesis testing in regression
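b ± t(crit) × se(b)
se(b) = standard error of the slope estimate b
t(crit) comes from the t distribution with n-2 degrees of freedom, because two parameters (α and β) are estimated from the data
Reject H₀ if 0 lies outside the interval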
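A minimal sketch of the confidence interval method, using hypothetical toy data. The cards do not say how se(b) is obtained; the sketch assumes the usual formula se(b) = √(s² / Σ(x - x bar)²) with s² = SSE/(n-2), and it takes the critical value 3.182 from a t table rather than computing it.

```python
import math

# Hypothetical toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx            # slope estimate
a = y_bar - b * x_bar    # intercept estimate

# Residual variance uses n - 2 degrees of freedom (two estimated parameters).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(sse / (n - 2) / sxx)

# Assumption: two-sided 5% critical value of t(n - 2) = t(3), read from a t table.
t_crit = 3.182

ci_low = b - t_crit * se_b
ci_high = b + t_crit * se_b
reject_h0 = not (ci_low <= 0 <= ci_high)  # reject only if 0 is outside the interval

print(f"b = {b:.3f}, se(b) = {se_b:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With these numbers the interval contains 0, so the sketch fails to reject H₀ at the 5% level.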
Method 2: hypothesis test (t statistic)
Set the null and alternative hypotheses
Choose a significance level
Use the t statistic, remembering that β = 0 is the value we are testing
t = (b - β)/se(b) ~ t(n-2), so under H₀ this is just b/se(b)
Reject H₀ if |t| > t(crit)
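The steps of Method 2 can be sketched as below. The values of b, se(b) and n are hypothetical fit results, and the critical value is assumed to come from a t table.

```python
# Hypothetical fit results, for illustration only.
b = 0.6          # estimated slope
se_b = 0.2828    # standard error of the slope
n = 5            # number of observations
beta_null = 0.0  # value of beta under H0

df = n - 2
t_stat = (b - beta_null) / se_b  # reduces to b / se_b when beta_null = 0

# Assumption: two-sided 5% critical value of t(3), read from a t table.
t_crit = 3.182

reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f} on {df} df, critical value = {t_crit}, reject H0: {reject_h0}")
```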
Method 3: p values
If the p value is less than the significance level, we reject the null. E.g. with a significance level of 0.05, reject H₀ if p < 0.05
Saves us looking up critical values in t distribution tables
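Statistical software normally reports the p value directly. As a rough illustration of where it comes from, the sketch below integrates the t density numerically with Simpson's rule; the function names and the input t = 2.1213 on 3 degrees of freedom are hypothetical, and a statistics library would be the practical choice.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=10_000):
    """P(|T| > |t_stat|) using Simpson's rule on [0, |t_stat|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t_stat|)
    return 1 - 2 * area

# Hypothetical test statistic and degrees of freedom (n = 5, so df = 3).
t_stat = 2.1213
df = 3
alpha = 0.05

p = two_sided_p_value(t_stat, df)
print(f"p = {p:.3f}:", "reject H0" if p < alpha else "fail to reject H0")
```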
Goodness of fit
How tightly the data points are scattered around the regression line.
Measure of goodness of fit
R²
Ranges from 0 to 1; closer to 1 = better fit. Doesn't show whether the relationship is positive or negative
R² formula
R² = SSR/SSTotal = 1 - SSE/SSTotal
SSR=regression sum of squares
SSE=residual (error) sum of squares
SSTotal=total sum of squares
SSTotal formula
Σ(y-y bar)²
y = actual (observed) value
y bar = mean of the observed y values
SSR formula
Σ(y^ - y bar)²
y^ = predicted y (on the regression line)
SSE formula
Σ(y-y^)²
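A small worked check of the three sums of squares, on hypothetical toy data. It fits the least squares line and confirms that SSTotal = SSR + SSE, so both forms of the R² formula agree.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n  # y bar is the sample mean of y

# Least squares slope and intercept.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]  # predicted values on the regression line

ss_total = sum((yi - y_bar) ** 2 for yi in y)             # total variation in y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # left in the residuals

r2_from_ssr = ssr / ss_total
r2_from_sse = 1 - sse / ss_total

print(f"SSTotal = {ss_total:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"R² = {r2_from_ssr:.2f} (SSR form), {r2_from_sse:.2f} (SSE form)")
```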
Note: for the test statistics of both correlation and regression, the critical region comes from the t distribution with n-2 degrees of freedom!
Correlation (0 = no correlation, ≠0 means there is a correlation)
t = r√(n-2) / √(1-r²)
Regression (0 = x has no statistical influence on y, ≠0 means x HAS a statistical influence on y)
t = (b - β)/se(b), which is b/se(b) under H₀: β = 0
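In simple linear regression the two statistics are not just compared against the same t(n-2) distribution; with β = 0 under the null they take the same value. A quick numerical check on hypothetical toy data, assuming the usual se(b) = √(SSE/(n-2) / Σ(x - x bar)²):

```python
import math

# Hypothetical toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2)
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Regression test statistic with beta = 0 under H0: t = b / se(b)
b = sxy / sxx
sse = syy - b * sxy  # residual sum of squares for the least squares line
se_b = math.sqrt(sse / (n - 2) / sxx)
t_reg = b / se_b

print(f"t (correlation) = {t_corr:.4f}, t (regression) = {t_reg:.4f}")
```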