Regression Flashcards
Simple Linear Regression
y = w0 + w1*x
for each point i: yi = w0 + w1*xi + ei   (ei = error term)
RSS(w0,w1) = sum_i (yi - (w0 + w1*xi))^2
Goal: find weights that minimize RSS
w0,w1 = arg min RSS(w0,w1)
= arg min sum_i (yi - (w0 + w1*xi))^2
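
A minimal numpy sketch of RSS as the objective (the x/y arrays are made-up toy data):

import numpy as np

# toy data (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def rss(w0, w1):
    # RSS(w0, w1) = sum_i (yi - (w0 + w1*xi))^2
    return np.sum((y - (w0 + w1 * x)) ** 2)

print(rss(0.0, 2.0))   # decent fit -> small RSS (~0.11)
print(rss(5.0, 0.0))   # flat line  -> much larger RSS (~43.7)
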
Compute Best Weights
1) Set the gradient of RSS to zero (closed-form solution)
2) Gradient descent
w^(t+1) = w^(t) - eta * grad(RSS(w^(t)))   (eta = step size / learning rate)
dRSS/dw0 = -2 * sum_i (yi - (w0 + w1*xi))
dRSS/dw1 = -2 * sum_i (yi - (w0 + w1*xi)) * xi
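
A sketch of both approaches on the same toy data (eta and the iteration count are illustrative choices, not prescribed values):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # same toy data as above
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# 1) closed form: setting dRSS/dw0 = dRSS/dw1 = 0 gives
#    w1 = cov(x, y) / var(x),  w0 = y_bar - w1 * x_bar
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()
print("closed form:", w0, w1)

# 2) gradient descent using the two partial derivatives above
w0_gd, w1_gd, eta = 0.0, 0.0, 0.01
for _ in range(2000):
    resid = y - (w0_gd + w1_gd * x)
    w0_gd -= eta * (-2 * np.sum(resid))        # dRSS/dw0
    w1_gd -= eta * (-2 * np.sum(resid * x))    # dRSS/dw1
print("gradient descent:", w0_gd, w1_gd)       # converges to the same weights
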
Multiple Linear Regression
y = w0 + sum_j (wj * hj(x)) + e   (hj(x) = j-th feature of input x)
RSS(w) = sum_i (yi - (w0 + sum_j wj * hj(xi)))^2
Gradient descent
wj^(t+1) = wj^(t) + 2*eta * sum_i hj(xi) * (yi - y_hat_i(w^(t)))   (y_hat_i(w) = predicted value for xi)
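
The same update in matrix form, as a numpy sketch (the feature matrix H with h0 = 1, the toy data, eta, and the iteration count are all assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
H = np.column_stack([np.ones(100), X])          # columns: h0=1, h1, h2
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
eta = 0.001
for _ in range(5000):
    resid = y - H @ w
    grad = -2 * H.T @ resid        # dRSS/dwj = -2 * sum_i hj(xi) * resid_i
    w -= eta * grad
print(w)   # close to the true weights [3, 2, -1]
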
R2
TSS = sum_i (yi - y_bar)^2   (y_bar = mean of the yi)
R2 = 1 - RSS/TSS
How well the regression line approximates the real data points.
R2 = 1: perfect fit; R2 = 0: no better than always predicting y_bar
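
A quick numeric check of the R2 formula (observed values and predictions are made up):

import numpy as np

y     = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # observed
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # model predictions (made up)

rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss
print(r2)   # ~0.997, close to 1 -> the line fits the points well
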
Model Evaluation
Evaluate on data that was not used to build the model
Holdout: train on 1/2 or 2/3 of the data, test on the remainder (see the sketch below)
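
A holdout sketch in plain numpy (the 2/3 vs 1/3 split, toy data, and the lstsq fit are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=90)

# shuffle, then train on 2/3 of the data and hold out 1/3 for testing
idx = rng.permutation(len(y))
cut = 2 * len(y) // 3
train, test = idx[:cut], idx[cut:]

w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
test_rss = np.sum((y[test] - X[test] @ w) ** 2)
print(test_rss)   # error on data the model never saw
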
Cross Validation
1) Split the data into k subsets (folds)
2) Each subset in turn is used for testing, the remainder for training
Often stratified (each fold mirrors the overall target distribution)
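
A k-fold sketch without stratification (k = 5 and the toy data are assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=90)

k = 5
folds = np.array_split(rng.permutation(len(y)), k)
errors = []
for i in range(k):
    test = folds[i]                                   # fold i tests...
    train = np.concatenate(folds[:i] + folds[i + 1:]) # ...the rest trains
    w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    errors.append(np.mean((y[test] - X[test] @ w) ** 2))
print(np.mean(errors))   # average test error across the k folds
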
Overfitting: weights grow large to fit noise in the training data; regularization adds a penalty on weight size
Ridge (L2)
Cost(w) = RSS(w) + alpha * (||w||_2)^2
Gradient: dCost/dwj = -2 * sum_i hj(xi) * (yi - y_hat_i(w^(t))) + 2*alpha*wj

Lasso (L1)
Cost(w) = RSS(w) + alpha * ||w||_1   (not differentiable at 0; tends to drive weights exactly to zero)
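
A ridge gradient-descent sketch using the gradient above (alpha, eta, and the toy data are illustrative; note that in practice the intercept w0 is often left unpenalized, which this sketch skips for brevity):

import numpy as np

rng = np.random.default_rng(0)
H = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = H @ np.array([3.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=100)

alpha, eta = 1.0, 0.001
w = np.zeros(3)
for _ in range(5000):
    resid = y - H @ w
    grad = -2 * H.T @ resid + 2 * alpha * w   # RSS gradient + ridge penalty term
    w -= eta * grad
print(w)   # larger alpha shrinks the weights further toward zero
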
Choosing alpha
Validation set: fit with several candidate alphas on the training data, pick the alpha with the lowest validation error
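
A sketch of that selection loop, using the closed-form ridge solution w = (H^T H + alpha*I)^(-1) H^T y (the alpha grid, split sizes, and toy data are all made up):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=120)

idx = rng.permutation(120)
train, val = idx[:80], idx[80:]

def ridge_fit(X, y, alpha):
    # closed form: w = (X^T X + alpha*I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# pick the alpha with the lowest RSS on the held-out validation set
best = min(
    np.logspace(-3, 2, 12),
    key=lambda a: np.sum((y[val] - X[val] @ ridge_fit(X[train], y[train], a)) ** 2),
)
print(best)
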