Regression Flashcards
What are 3 possible non-linear basis functions?
-Polynomial: ϕj(x) = x^j
-Gaussian: ϕj(x) = exp(-(x-μj)²/(2s²))
-Sigmoidal: ϕj(x) = σ((x-μj)/s), with σ(a) = 1/(1+exp(-a))
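A minimal sketch of how each family can be evaluated into a design matrix (NumPy; the degree, centres μj and width s below are arbitrary illustration values):

```python
import numpy as np

def polynomial_features(x, degree):
    # Columns x^0, x^1, ..., x^degree
    return np.vstack([x**j for j in range(degree + 1)]).T

def gaussian_features(x, centres, s):
    # One column exp(-(x - mu_j)^2 / (2 s^2)) per centre mu_j
    return np.exp(-(x[:, None] - centres[None, :])**2 / (2 * s**2))

def sigmoid_features(x, centres, s):
    # One column sigma((x - mu_j) / s) per centre mu_j
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centres[None, :]) / s))

x = np.linspace(-1.0, 1.0, 50)
Phi = gaussian_features(x, centres=np.linspace(-1.0, 1.0, 5), s=0.3)  # shape (50, 5)
```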
What is the geometrical interpretation of the least squares error?
The least squares solution is such that y^ is the orthogonal projection of y onto the subspace spanned by the features (the columns of the design matrix ϕ).
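This can be checked numerically: the residual y - y^ is orthogonal to every feature column (a sketch on synthetic data; shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 4))               # design matrix (one column per feature)
y = rng.normal(size=100)                      # targets

w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # least squares weights
y_hat = Phi @ w                               # projection of y onto the span of the columns
print(Phi.T @ (y - y_hat))                    # ~ 0: residual is orthogonal to every feature
```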
How to perform regression with multiple outputs?
- Use a different set of basis functions for each component of y (i.e. treat it as multiple independent regression problems).
- A better solution is to use the same set of basis functions to model all components. Then, the weights are contained in a matrix instead of a vector:
W* = (ϕ.T * ϕ)^-1 * ϕ.T * Y
=> wk = (ϕ.T * ϕ)^-1 * ϕ.T * yk
N.B.: Only one pseudo-inverse matrix needs to be computed, regardless of the number of outputs.
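A sketch of the shared-basis solution, assuming a design matrix ϕ of shape (N, M) and targets Y of shape (N, K); the pseudo-inverse is computed once and reused for every output column:

```python
import numpy as np

def fit_multi_output(Phi, Y):
    # W* = (Phi^T Phi)^-1 Phi^T Y: one pseudo-inverse, shared by all K outputs
    pinv = np.linalg.pinv(Phi)        # shape (M, N), computed only once
    return pinv @ Y                   # W of shape (M, K); column k contains wk

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 3))       # N = 100 examples, M = 3 basis functions
Y = rng.normal(size=(100, 2))         # K = 2 output components
W = fit_multi_output(Phi, Y)          # shape (3, 2)
```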
What is the gradient descent algorithm?
Repeat until convergence:
{
wj = wj - α * dL(w)/dwj   for j = 0, 1, …
}
(α is the learning rate; all weights wj are updated simultaneously)
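A minimal sketch of this update applied to the sum-of-squares loss L(w) = ||ϕw - y||² (the learning rate and iteration count are arbitrary choices, not prescribed by the card):

```python
import numpy as np

def gradient_descent(Phi, y, alpha=1e-3, n_iters=1000):
    # Minimise L(w) = ||Phi w - y||^2 by stepping against its gradient
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        grad = 2 * Phi.T @ (Phi @ w - y)   # dL(w)/dw for all components at once
        w = w - alpha * grad               # simultaneous update of every wj
    return w
```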
What is the stochastic gradient descent (SGD) algorithm?
Repeat until convergence (each pass over the data is one epoch):
{
for i = 1, …, N:
    wj = wj - α * d/dwj[(y^(x(i); w) - y(i))²]   for j = 0, 1, …
}
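A sketch of the same loss minimised with SGD, assuming one example per update and a re-shuffle of the data at each epoch:

```python
import numpy as np

def sgd(Phi, y, alpha=1e-3, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_epochs):                    # one epoch = one pass over the data
        for i in rng.permutation(len(y)):        # visit the examples in random order
            error = Phi[i] @ w - y[i]            # y^(x(i); w) - y(i)
            w = w - alpha * 2 * error * Phi[i]   # gradient of the single-example squared error
    return w
```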
What is a compromise between standard gradient descent and stochastic gradient descent?
Mini-batch gradient descent computes each update on a batch of k ≪ N examples (instead of all N examples for standard GD and a single example for stochastic GD).
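A sketch of the mini-batch variant; batch_size = N recovers standard GD and batch_size = 1 recovers SGD (the default batch size is an arbitrary illustration value):

```python
import numpy as np

def minibatch_gd(Phi, y, alpha=1e-3, batch_size=32, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_epochs):
        order = rng.permutation(N)
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]       # k << N examples per update
            # average gradient of the squared error over the mini-batch
            grad = 2 * Phi[idx].T @ (Phi[idx] @ w - y[idx]) / len(idx)
            w = w - alpha * grad
    return w
```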
What is early stopping?
It consists in stopping gradient descent before the minimum of the training error is reached, in order to avoid overfitting. To do so, we monitor the validation loss after each epoch and stop GD when it has not decreased for k consecutive epochs (“k-patience”).
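A sketch of k-patience wrapped around a generic per-epoch update; train_one_epoch and validation_loss are hypothetical placeholders for whichever training step and metric are being monitored:

```python
def fit_with_early_stopping(w, train_one_epoch, validation_loss, k=5, max_epochs=1000):
    # Stop when the validation loss has not improved for k consecutive epochs
    best_loss, best_w, epochs_without_improvement = float("inf"), w, 0
    for _ in range(max_epochs):
        w = train_one_epoch(w)
        loss = validation_loss(w)
        if loss < best_loss:
            best_loss, best_w, epochs_without_improvement = loss, w, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= k:
                break                  # patience exhausted: stop before the training minimum
    return best_w                      # weights from the best validation epoch
```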
What is the closed form solution of ridge regression?
w = (X.T * X + λ Id)^-1 * X.T * Y
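A sketch of this closed form (solving the regularized normal equations rather than inverting explicitly, which is numerically preferable):

```python
import numpy as np

def ridge_fit(X, Y, lam):
    # w = (X^T X + lambda * I)^-1 X^T Y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
```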
What is the constrained optimization formula for L2 regularization (used in ridge regression)?
Find w that minimizes (Y-Xw).T * (Y-Xw),
subject to w.T * w <= η.
N.B.: η ∝ 1/λ
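For reference, the same problem is usually written in its unconstrained (Lagrangian) form, where a larger λ corresponds to a smaller constraint radius η:

```latex
\min_{w} \; (Y - Xw)^\top (Y - Xw) + \lambda\, w^\top w
```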
What is the constrained optimization formula for L1 regularization (used in lasso regression)?
Find w that minimizes (Y-Xw).T * (Y-Xw),
subject to Σj |wj| <= η.
N.B.: η ∝ 1/λ
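Similarly, the lasso problem is usually written in the penalized form below; unlike ridge it has no closed-form solution and is typically solved iteratively (e.g. by coordinate descent):

```latex
\min_{w} \; (Y - Xw)^\top (Y - Xw) + \lambda \sum_{j} |w_j|
```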
What is the main property of convex functions?
Every (local) minimum is a global minimum.
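For reference, the definition behind this property (standard material, not stated on the card): f is convex if, for all w1, w2 and t ∈ [0, 1],

```latex
f(t\, w_1 + (1 - t)\, w_2) \;\le\; t\, f(w_1) + (1 - t)\, f(w_2)
```

The sum-of-squares loss and its L1/L2-regularized versions are convex in w, which is why gradient descent on them cannot get stuck in a spurious local minimum.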