Linear Regression Flashcards

1
Q

Assumptions of Linear Regression

A

E(y|X) = f(X) is a linear function of X
residuals are independently and normally distributed with zero mean and constant variance
residuals are independent of X
no multicollinearity among the columns of X
number of samples exceeds the number of features
variability in X is positive (no constant features)
no autocorrelation in the residuals

http://r-statistics.co/Assumptions-of-Linear-Regression.html
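A minimal NumPy/SciPy sketch of a few of these diagnostics computed on the residuals of a fit; the simulated data and variable names are illustrative placeholders, not a prescribed workflow:

```python
import numpy as np
from scipy import stats

# Illustrative data: X includes an intercept column of ones
rng = np.random.default_rng(0)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

# Normality of residuals (Shapiro-Wilk test)
_, p_normal = stats.shapiro(resid)

# Autocorrelation: Durbin-Watson statistic, values near 2 suggest none
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Multicollinearity: condition number of X (large values are a warning)
cond = np.linalg.cond(X)
```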

2
Q

Estimation of the Linear Regression parameters

A

OLS: minimize the Residual Sum of Squares (RSS)
Normal equation: \hat{\beta} = (X^T X)^{-1} X^T y
Sampling distribution of the parameters (basis for confidence intervals):
\hat{\beta} \sim N(\beta, (X^T X)^{-1} \sigma^2)
where \sigma^2 is the variance of the residuals
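A minimal NumPy sketch of the normal equation and the estimated covariance of \hat{\beta}; the simulated data are illustrative:

```python
import numpy as np

# Illustrative data: X includes an intercept column of ones
rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

# hat(beta) = (X^T X)^{-1} X^T y; solve() is more stable than inv()
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Unbiased estimate of the residual variance sigma^2
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - p - 1)

# Estimated covariance of hat(beta): (X^T X)^{-1} sigma^2
cov_beta = np.linalg.inv(X.T @ X) * sigma2_hat
```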

3
Q

t-test for Linear Regression parameter

A

hypothesis: H_0: \beta_i = 0
t = \hat{\beta}_i / (\hat{\sigma} \sqrt{v_i}), where v_i is the i-th diagonal element of (X^T X)^{-1}
under H_0, t follows a t_{N-p-1} distribution
calculate the p-value
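A minimal sketch of the per-coefficient t-tests in NumPy/SciPy; the data are illustrative:

```python
import numpy as np
from scipy import stats

# Illustrative data: X includes an intercept column of ones
rng = np.random.default_rng(0)
N, p = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (N - p - 1))

# v_i: i-th diagonal element of (X^T X)^{-1}
v = np.diag(np.linalg.inv(X.T @ X))
t_scores = beta_hat / (sigma_hat * np.sqrt(v))

# Two-sided p-values from the t_{N-p-1} distribution
p_values = 2 * stats.t.sf(np.abs(t_scores), df=N - p - 1)
```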

4
Q

F-test for Linear Regression parameters

A

hypothesis: H_0: \beta_{i+1} = \beta_{i+2} = \dots = \beta_{i+k} = 0 (the k extra coefficients of the larger model are zero)
F = \frac{(RSS_{small} - RSS_{large})/k}{RSS_{large}/(N-i-k-1)}
under H_0, F follows an F_{k, N-i-k-1} distribution
calculate the p-value
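A minimal sketch of the nested-model F-test; the function name and the convention that the intercept is a column of X are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def f_test(X_small, X_large, y):
    """F-test that the extra coefficients of the larger model are zero."""
    def rss(X):
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        r = y - X @ beta
        return r @ r

    N = len(y)
    k = X_large.shape[1] - X_small.shape[1]  # number of extra parameters
    # With the intercept counted among the columns, this equals N-i-k-1
    df_large = N - X_large.shape[1]
    F = ((rss(X_small) - rss(X_large)) / k) / (rss(X_large) / df_large)
    p_value = stats.f.sf(F, k, df_large)
    return F, p_value
```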

5
Q

bias vs. variance

A

Expected prediction error: E[(y - \hat{y})^2] = MSE(\hat{y}) + \sigma^2
where \sigma^2 is the variance of the residuals and f(x) = x^T \beta

MSE(x^T \hat{\beta}) = Var(x^T \hat{\beta}) + [E(x^T \hat{\beta}) - x^T \beta]^2
the first term is the variance, the second term is the squared bias

(*By the Gauss-Markov theorem, OLS has the smallest variance among all unbiased linear estimates, but a biased estimate, e.g. shrinkage, can achieve a smaller MSE)
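A minimal simulation sketch of the decomposition, comparing the unbiased OLS prediction with a (biased) ridge prediction at a fixed point; all settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam = 30, 10.0
beta = np.array([1.0, 2.0])
x0 = np.array([1.0, 1.0])  # fixed prediction point

preds_ols, preds_ridge = [], []
for _ in range(5000):
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    y = X @ beta + rng.normal(size=N)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
    preds_ols.append(x0 @ b_ols)
    preds_ridge.append(x0 @ b_ridge)

# MSE = variance + squared bias for each estimator
for name, preds in [("OLS", preds_ols), ("ridge", preds_ridge)]:
    preds = np.array(preds)
    bias2 = (preds.mean() - x0 @ beta) ** 2
    var = preds.var()
    print(name, "var =", var, "bias^2 =", bias2, "MSE =", var + bias2)
```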

6
Q

ways to reduce the variance of \hat{\beta} and lower the MSE

A

feature selection
shrinkage (ridge, lasso)
dimension reduction

7
Q

Ridge

A

regularize with the (squared) l2 norm of the parameters: minimize RSS + \lambda \sum_j \beta_j^2

the penalty shrinks each coefficient in proportion to its magnitude, so estimates move toward zero but are rarely exactly zero
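A minimal sketch of the ridge closed form, \hat{\beta}_{ridge} = (X^T X + \lambda I)^{-1} X^T y; centering the data in place of an explicit intercept is an illustrative choice (by convention the intercept is not penalized):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate on centered data (intercept unpenalized)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
```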

8
Q

Lasso

A

regularize with the l1 norm of the parameters: minimize RSS + \lambda \sum_j |\beta_j|

sets parameter estimates exactly to 0 when they fall below a threshold (soft-thresholding), so lasso also performs feature selection
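A minimal sketch of the soft-thresholding rule in the special case of an orthonormal design (X^T X = I), where the lasso solution is closed-form; for a general design matrix there is no closed form and the fit is typically done by coordinate descent (e.g. sklearn.linear_model.Lasso):

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso solution for orthonormal X, applied to b = X^T y:
    sign(b_j) * max(|b_j| - lambda, 0)."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)
```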

9
Q

R^2

A

R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
where SS_{res} = \sum_i (y_i - \hat{y}_i)^2 and SS_{tot} = \sum_i (y_i - \bar{y})^2
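A minimal sketch of the computation; y and y_hat are illustrative placeholders:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 from the residual and total sums of squares."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```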

10
Q

adjusted R^2

A
Adjusted R^2 = 1 - \frac{SS_{res}/df_{res}}{SS_{tot}/df_{tot}}
df_{res} = n - p - 1
df_{tot} = n - 1
a less biased estimator of the population R^2; more appropriate when evaluating model fit and when comparing alternative models in the feature selection stage of model building
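A minimal sketch extending the R^2 computation above with the degrees-of-freedom correction; the function name and arguments are illustrative:

```python
import numpy as np

def adjusted_r_squared(y, y_hat, p):
    """Adjusted R^2 for n samples and p features (intercept excluded from p)."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
```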