Regression Flashcards

1
Q

What are 3 possible non-linear basis functions?

A

- Polynomial: ϕj(x) = x^j
- Gaussian: ϕj(x) = exp(-(x - μj)² / (2s²))
- Sigmoidal: ϕj(x) = σ((x - μj)/s), with σ(a) = 1/(1 + exp(-a))
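
A minimal sketch in Python/NumPy of the three basis functions above (function names, μj and s are illustrative):

import numpy as np

def polynomial_basis(x, j):
    return x ** j

def gaussian_basis(x, mu_j, s):
    return np.exp(-(x - mu_j) ** 2 / (2 * s ** 2))

def sigmoidal_basis(x, mu_j, s):
    return 1.0 / (1.0 + np.exp(-(x - mu_j) / s))

x = np.linspace(-3, 3, 7)
print(polynomial_basis(x, 2))
print(gaussian_basis(x, mu_j=0.0, s=1.0))
print(sigmoidal_basis(x, mu_j=0.0, s=1.0))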

2
Q

What is the geometrical interpretation of the least squares error?

A

The least squares solution is such that ŷ = Φw* is the orthogonal projection of y onto the subspace spanned by the columns of the feature matrix Φ (equivalently, the residual y − ŷ is orthogonal to every feature column).
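
A small NumPy check of this interpretation (the random design matrix Phi and targets y are illustrative): the residual y − ŷ should be orthogonal to every feature column.

import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 3))      # feature (design) matrix
y = rng.normal(size=50)             # targets

w_star, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w_star                # orthogonal projection of y onto span(Phi)

print(Phi.T @ (y - y_hat))          # ~ zeros: residual is orthogonal to each column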

3
Q

How to perform regression with multiple outputs?

A
  1. Use a different set of basis functions for each component of y (i.e. treat it as multiple independent regression problems).
  2. A better solution is to use the same set of basis functions to model all components. The weights then form a matrix instead of a vector:
    W* = (Φ.T * Φ)^-1 * Φ.T * Y
    => wk = (Φ.T * Φ)^-1 * Φ.T * yk for each output column yk
    N.B.: Only one pseudo-inverse matrix needs to be computed.
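
A NumPy sketch of the second approach (shapes and names are illustrative): one pseudo-inverse is shared by every output column.

import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(100, 4))     # N x M design matrix
Y = rng.normal(size=(100, 3))       # N x K targets (K = 3 output components)

pinv = np.linalg.pinv(Phi)          # computed once: (Phi^T Phi)^-1 Phi^T
W_star = pinv @ Y                   # M x K weight matrix

# Column k of W_star equals the single-output solution for Y[:, k]
print(np.allclose(W_star[:, 0], pinv @ Y[:, 0]))   # True
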
4
Q

What is the gradient descent algorithm?

A

Repeat until convergence:
{
wj := wj - α * ∂L(w)/∂wj, simultaneously for j = 0, 1, …
}
(α is the learning rate)
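
A minimal NumPy sketch of batch gradient descent on the least-squares loss (the loss, step size and data are illustrative assumptions):

import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / N    # dL/dw, computed on all N examples
        w = w - alpha * grad            # simultaneous update of every w_j
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(gradient_descent(X, y))           # ~ [1, -2, 0.5]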

5
Q

What is the stochastic gradient descent (SGD) algorithm?

A

Repeat until convergence:
for each epoch {
for i = 1, …, N (in random order):
wj := wj - α * ∂/∂wj [ (ŷ(x(i); w) - y(i))² ], simultaneously for j = 0, 1, …
}
The gradient is computed on a single example per update, instead of on the full training set.
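
A minimal NumPy sketch of SGD on the per-example squared error (shuffling, step size and data are illustrative assumptions):

import numpy as np

def sgd(X, y, alpha=0.01, n_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_epochs):                # one epoch = one pass over the data
        for i in rng.permutation(N):         # examples visited in random order
            err = X[i] @ w - y[i]            # y_hat(x(i); w) - y(i)
            w = w - alpha * 2 * err * X[i]   # gradient of the single-example loss
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=500)
print(sgd(X, y))                             # ~ [1, -2, 0.5]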

6
Q

What is a compromise between standard gradient descent and stochastic gradient descent?

A

Mini-batch gradient descent: each weight update uses the gradient of the loss computed on k ≪ N examples (instead of all N examples for standard GD and a single example for stochastic GD).
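
A NumPy sketch of a mini-batch update loop (batch size, step size and data are illustrative assumptions):

import numpy as np

def minibatch_gd(X, y, alpha=0.05, batch_size=16, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_epochs):
        idx = rng.permutation(N)
        for start in range(0, N, batch_size):
            b = idx[start:start + batch_size]            # current mini-batch (k examples)
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient on k examples only
            w = w - alpha * grad
    return w

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=300)
print(minibatch_gd(X, y))                                # ~ [1, -2, 0.5]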

7
Q

What is early stopping?

A

It consists in stopping gradient descent before the minimum training error is reached, in order to avoid overfitting. To do so, we monitor the validation loss after each epoch and stop GD when it has not decreased for k consecutive epochs (“k-patience”).
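
A sketch of the k-patience rule in Python; train_one_epoch and validation_loss are hypothetical placeholders for one pass of (S)GD and the validation metric:

import numpy as np

def fit_with_early_stopping(w, train_one_epoch, validation_loss,
                            patience=5, max_epochs=1000):
    best_loss = np.inf
    best_w = w.copy()
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        w = train_one_epoch(w)          # one pass of gradient descent over the training set
        loss = validation_loss(w)       # monitored after every epoch
        if loss < best_loss:
            best_loss, best_w = loss, w.copy()
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:   # no improvement for k epochs
                break
    return best_w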

8
Q

What is the closed form solution of ridge regression?

A

w* = (X.T * X + λI)^-1 * X.T * Y, where I is the identity matrix
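
A minimal NumPy sketch of this closed form (λ and the data are illustrative):

import numpy as np

def ridge_closed_form(X, Y, lam=1.0):
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ Y)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
Y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
print(ridge_closed_form(X, Y, lam=0.5))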

9
Q

What is the constrained optimization formula for L2 regularization (used in ridge regression)?

A

Find w that minimizes (Y - Xw).T * (Y - Xw),
subject to w.T * w <= η.
N.B.: η plays the inverse role of λ (a larger λ corresponds to a smaller η).
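
A sketch, assuming SciPy is available, that solves this constrained form numerically and compares it with the penalized (closed-form) ridge solution; setting η = ||w_ridge||² below makes the two solutions coincide.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=80)

lam = 10.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
eta = w_ridge @ w_ridge                       # constraint level matching this lambda

res = minimize(lambda w: (y - X @ w) @ (y - X @ w), x0=np.zeros(3),
               constraints=[{"type": "ineq", "fun": lambda w: eta - w @ w}])
print(w_ridge, res.x)                         # the two solutions should agree closely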

10
Q

What is the constrained optimization formula for L1 regularization (used in lasso regression)?

A

Find w that minimizes (Y - Xw).T * (Y - Xw),
subject to Σj |wj| <= η (the L1 norm of w).
N.B.: η plays the inverse role of λ (a larger λ corresponds to a smaller η).
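
A sketch, assuming scikit-learn is available, of the characteristic effect of the L1 penalty/constraint: some coefficients are driven exactly to zero (alpha and the data are illustrative).

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)   # only 2 relevant features

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)                 # coefficients of the irrelevant features are (typically) exactly 0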

11
Q

What is the main property of convex functions?

A

Every (local) minimum is a global minimum.
