Linear Regression Flashcards
Assumptions of Linear Regression
E(y|X) = f(X) is a linear function of X
residuals are independent and normally distributed with mean 0 and constant variance
residuals are independent of X
no multicollinearity among the columns of X
number of samples is larger than the number of features
variability in X is positive
no auto-correlation in the residuals
http://r-statistics.co/Assumptions-of-Linear-Regression.html
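A minimal numpy sketch of spot-checking a few of these assumptions numerically on synthetic data (the data, true coefficients, and printed diagnostics are illustrative assumptions, not a formal test suite):

```python
import numpy as np

# Synthetic data: 200 samples, 3 features, true linear model plus Gaussian noise.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])              # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta_hat

print("residual mean (should be ~0):", resid.mean())
print("corr(residual, feature) (should be ~0):",
      [np.corrcoef(resid, X[:, j])[0, 1] for j in range(p)])
print("condition number of design (large => multicollinearity):", np.linalg.cond(Xd))
print("lag-1 autocorrelation of residuals (should be ~0):",
      np.corrcoef(resid[:-1], resid[1:])[0, 1])
```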
Estimation of Linear Regression parameters
OLS: minimize the Residual Sum of Squares
Normal equation: hat(beta) = (X^T X)^-1 X^T y
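A minimal sketch of the normal equation on synthetic data (the data and the true coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 features
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

# hat(beta) = (X^T X)^-1 X^T y, computed with solve() instead of an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
rss = np.sum((y - X @ beta_hat) ** 2)                        # the RSS that OLS minimizes
print(beta_hat, rss)
```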
Confidence interval of parameters
hat(beta) ~ N(beta, (X^T X)^-1 \sigma^2)
where sigma^2 is the variance of the residuals
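A sketch of 95% confidence intervals built from this sampling distribution, with sigma^2 estimated by RSS/(N-p-1); the synthetic data and the 95% level are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
dof = n - p - 1                                    # residual degrees of freedom
sigma2_hat = resid @ resid / dof                   # unbiased estimate of the residual variance
cov_beta = np.linalg.inv(X.T @ X) * sigma2_hat     # estimated covariance of hat(beta)
se = np.sqrt(np.diag(cov_beta))
t_crit = stats.t.ppf(0.975, dof)                   # two-sided 95% critical value
print(np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se]))
```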
t-test for Linear Regression parameter
null hypothesis: beta_i = 0
t_score = hat(beta_i) / (hat(sigma) * sqrt(v_i)), where v_i is the i-th diagonal element of (X^T X)^-1
t_{N-p-1} distribution
calculate p value
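A sketch of the per-coefficient t-test on synthetic data (the data are an assumption; the last coefficient is truly 0, so its p-value should be large):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=n)       # last coefficient is 0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p - 1))
v = np.diag(np.linalg.inv(X.T @ X))                          # the v_i from (X^T X)^-1
t_scores = beta_hat / (sigma_hat * np.sqrt(v))
p_values = 2 * stats.t.sf(np.abs(t_scores), df=n - p - 1)    # two-sided p-values
print(t_scores, p_values)
```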
F-test for Linear Regression parameters
null hypothesis: beta_{i+1} = beta_{i+2} = ... = beta_{i+k} = 0
F = \frac{(RSS_{small} - RSS_{large})/k}{RSS_{large}/(N-i-k-1)}, where the small model has i features and the large model has all i+k features
F_{k, N-i-k-1} distribution
calculate p value
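A sketch of the nested-model F-test; the model sizes (i = 2, k = 3) and the synthetic data are illustrative assumptions, with the k extra features truly irrelevant:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, i, k = 200, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, i + k))])
y = X[:, :i + 1] @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

def rss(Xm, y):
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    return np.sum((y - Xm @ b) ** 2)

rss_small = rss(X[:, :i + 1], y)            # intercept + first i features
rss_large = rss(X, y)                       # intercept + all i + k features
F = ((rss_small - rss_large) / k) / (rss_large / (n - i - k - 1))
print(F, stats.f.sf(F, k, n - i - k - 1))   # F statistic and its p-value
```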
bias vs. variance
expected prediction error: E[(y_0 - hat(y_0))^2] = MSE(hat(y_0)) + sigma^2
where sigma^2 is the variance of the residual and the true model is y = x^T beta + epsilon
MSE(x^T hat(beta)) = Var(x^T hat(beta)) + [E(x^T hat(beta)) - x^T beta]^2
first term is the variance, second term is the squared bias
(* OLS has the smallest variance among all linear unbiased estimates (Gauss-Markov), but a biased estimate can achieve a smaller MSE)
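A Monte Carlo sketch of the decomposition MSE = variance + bias^2 at a single test point, comparing OLS with a (biased) ridge estimate; the data-generating process and the ridge lambda are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, lam, reps = 50, 5, 10.0, 2000
beta = np.ones(p)
x0 = rng.normal(size=p)                       # fixed test point
X = rng.normal(size=(n, p))                   # fixed design; y is redrawn each repetition

preds_ols, preds_ridge = [], []
for _ in range(reps):
    y = X @ beta + rng.normal(size=n)
    preds_ols.append(x0 @ np.linalg.solve(X.T @ X, X.T @ y))
    preds_ridge.append(x0 @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y))

for name, preds in [("OLS", np.array(preds_ols)), ("ridge", np.array(preds_ridge))]:
    var = preds.var()
    bias2 = (preds.mean() - x0 @ beta) ** 2
    print(name, "var:", var, "bias^2:", bias2, "MSE:", var + bias2)
```

Typically OLS shows near-zero bias but higher variance, while ridge trades some bias for lower variance and can end up with a smaller total MSE.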
ways to reduce the variance of hat(beta) and lower the overall MSE (accepting some bias)
feature selection
shrinkage (ridge, lasso)
dimension reduction
Ridge
regularize with the l2 norm of the parameters
the penalty is proportional to the squared magnitude of each coefficient, so large coefficients are shrunk the most, but estimates are rarely set exactly to 0
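A minimal ridge sketch using the closed form hat(beta)_ridge = (X^T X + lambda I)^{-1} X^T y; the value of lambda and the data are assumptions, and X is taken without an intercept (as if already centered) to keep the penalty simple:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, lam = 100, 5, 5.0
X = rng.normal(size=(n, p))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ols)
print(beta_ridge)   # every coefficient is shrunk toward 0, but none is exactly 0
```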
Lasso
regularize with l1 norm of parameters
shrinks the estimates and sets them exactly to 0 when they fall below a threshold (soft thresholding), so lasso also performs feature selection
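A lasso sketch under the special case of an orthonormal design (X^T X = I), where minimizing 1/2 ||y - X beta||^2 + lambda ||beta||_1 reduces to soft-thresholding the OLS estimate; the orthonormal assumption, lambda, and the data are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, lam = 100, 5, 0.8
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))     # Q has orthonormal columns, Q^T Q = I
y = Q @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=n)

beta_ols = Q.T @ y                               # OLS estimate since Q^T Q = I
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)
print(beta_ols)
print(beta_lasso)   # coefficients below the threshold lam are set exactly to 0
```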
R^2
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
adjusted R^2
Adjusted R^2 = 1 - \frac{SS_{res}/df_{res}}{SS_{tot}/df_{tot}}
df_res = n - p - 1, df_tot = n - 1
an unbiased (or less biased) estimator of the population R^2; more appropriate for evaluating model fit and for comparing alternative models in the feature selection stage of model building
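A sketch computing both R^2 and adjusted R^2 from a fitted OLS model; the synthetic data are an assumption, and p counts the features excluding the intercept:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.0, -2.0, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = np.sum((y - X @ beta_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
print(r2, adj_r2)
```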