Simple linear regression Flashcards
what is linear regression?
a simple approach to supervised learning. it models the relationship between a single input variable (x) and a continuous response variable (y)
assumed model?
Y = β0 + β1X + e (e is the random error term)
distance between observed and predicted values?
residual ei = Yi - predicted Yi = Yi - (β0 + β1Xi), using the estimated β0 and β1
residual sum squares (RSS)?
sum of the squared residuals over all data points: RSS = e1^2 + e2^2 + ... + en^2. residuals may be positive or negative, so they are squared to keep them from cancelling out
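A minimal Python sketch of the RSS idea. The data points and the helper name `rss` are invented for illustration; the only point is that a line closer to the data gives a smaller RSS.

```python
# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

def rss(b0, b1, xs, ys):
    """Residual sum of squares for the line y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))

rss_fit = rss(0.09, 1.97, x, y)   # the least squares line for this toy data
rss_other = rss(0.0, 1.5, x, y)   # an arbitrary alternative line
print(rss_fit, rss_other)         # the first value is much smaller
```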
to find β0 and β1, use least squares estimation
take the first-order partial derivatives of RSS with respect to β0 and β1 separately, set both to 0, and solve for β0 and β1
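Written out, the two first-order conditions look like this (a sketch in LaTeX, taking RSS as the sum of squared residuals over n data points):

```latex
\begin{aligned}
\mathrm{RSS}(\beta_0,\beta_1) &= \sum_{i=1}^{n} \bigl(Y_i - \beta_0 - \beta_1 X_i\bigr)^2 \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta_0} &= -2\sum_{i=1}^{n} \bigl(Y_i - \beta_0 - \beta_1 X_i\bigr) = 0 \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta_1} &= -2\sum_{i=1}^{n} X_i\bigl(Y_i - \beta_0 - \beta_1 X_i\bigr) = 0
\end{aligned}
```

Solving these two equations together gives the least squares estimates of β0 and β1.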
predicted β1?
cov(x,y)/var(x)
predicted β0?
mean(y) - predicted β1 * mean(x)
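A small Python sketch of the two closed-form estimates, using only the standard library. The data are invented for illustration, and the variable names (`b0`, `b1`, `cov_xy`, `var_x`) are just this sketch's choices.

```python
# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and variance; the 1/(n-1) factors cancel in the ratio.
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)

b1 = cov_xy / var_x          # slope estimate: cov(x, y) / var(x)
b0 = mean_y - b1 * mean_x    # intercept estimate: mean(y) - b1 * mean(x)
print(b0, b1)
```

For this toy data the slope comes out near 1.97 and the intercept near 0.09.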
what does the standard error (SE) estimate?
how much the estimates vary under repeated sampling (the standard deviation of the estimator across samples)
hypothesis testing for relationship between x and y?
H0: β1 = 0 (no relationship), H1: β1 ≠ 0 (some relationship)
t-statistic (to test the null hypothesis)?
t = (predicted β1 - 0)/SE(predicted β1), with n-2 degrees of freedom
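A hedged Python sketch of the t-statistic on toy data. The cards do not give a formula for SE(β1); the one used here, SE(β1) = RSE / sqrt(sum of (xi - mean(x))^2) with RSE = sqrt(RSS/(n-2)), is the standard one for simple linear regression and is an addition to these notes. The data are invented.

```python
import math

# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

# Residual sum of squares and residual standard error (n - 2 degrees of freedom).
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
rse = math.sqrt(rss / (n - 2))

se_b1 = rse / math.sqrt(sxx)   # standard error of the slope (standard formula, see lead-in)
t = (b1 - 0) / se_b1           # t-statistic for H0: beta1 = 0
print(se_b1, t)
```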
critical value and confidence level when n is large?
1.96 and 95% (as n increases, the t-distribution gets closer to the normal distribution, whose two-sided 95% critical value is 1.96)
p-value definition?
probability, assuming H0 is true, of observing a test statistic at least as extreme as the one computed, i.e. with absolute value >= |t|
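A minimal sketch of a two-sided p-value in Python, assuming n is large enough that the t-distribution can be replaced by the standard normal. For small n the t-distribution with n-2 degrees of freedom should be used instead. The function name `two_sided_p_normal` is made up for this example.

```python
import math

def two_sided_p_normal(t):
    """Two-sided p-value P(|Z| >= |t|) under a standard normal approximation."""
    # For Z ~ N(0, 1), P(|Z| >= a) = erfc(a / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2))

p = two_sided_p_normal(1.96)
print(p)   # close to 0.05
```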
how to calculate the 95% confidence interval for β1?
[predicted β1 - 1.96*SE(β1), predicted β1 + 1.96*SE(β1)]
when to reject null hypothesis?
when |t| is larger than 1.96 (equivalently, p-value < 0.05), we can reject H0 at the 5% significance level (95% confidence)
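A short Python sketch tying the 95% confidence interval to the rejection rule, again under the large-n assumption (critical value 1.96). The helper names `ci95` and `reject_h0` and the input numbers are hypothetical.

```python
def ci95(b1, se_b1, z=1.96):
    """Approximate 95% confidence interval for the slope (large n)."""
    return (b1 - z * se_b1, b1 + z * se_b1)

def reject_h0(b1, se_b1, z=1.96):
    """Reject H0: beta1 = 0 when |t| exceeds the critical value."""
    return abs(b1 / se_b1) > z

# Hypothetical estimate with a small standard error: interval excludes 0.
low, high = ci95(1.97, 0.055)
print(low, high, reject_h0(1.97, 0.055))

# Hypothetical estimate with a relatively large standard error: interval includes 0.
low2, high2 = ci95(0.05, 0.04)
print(low2, high2, reject_h0(0.05, 0.04))
```

Rejecting H0 with |t| > 1.96 is the same as the interval not containing 0.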
what does the residual standard error (RSE) mean?
RSE measures lack of fit. e.g. if RSE = 3.259, then on average the observed Y deviates from the regression line by about 3.259 (in the units of Y)
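A minimal Python sketch of computing RSE from a list of residuals. The formula RSE = sqrt(RSS/(n-2)) is the standard one for simple linear regression and is not stated on these cards; the residual values and the function name `rse` are invented for illustration.

```python
import math

def rse(residuals):
    """Residual standard error: sqrt(RSS / (n - 2)) for simple linear regression."""
    n = len(residuals)
    rss = sum(e ** 2 for e in residuals)
    return math.sqrt(rss / (n - 2))

# Hypothetical residuals from a fitted line.
residuals = [0.04, -0.13, 0.20, -0.17, 0.06]
value = rse(residuals)
print(value)   # typical size of a deviation from the line, in units of Y
```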