Optimisation & Regularisation Flashcards

1
Q

How can learning be viewed as Optimisation?

A
Learning can be framed as searching for the parameter values that minimise an error (loss) function over the training data. The same process goes by several names:
  • Training
  • Model fitting
  • Parameter estimation
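
All three describe the same optimisation problem (a sketch; f is the model, L a generic loss, and θ the parameters; the notation is illustrative, not from the original card):

    \hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} L\big(y_i, f(x_i; \theta)\big)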
2
Q

How does prediction error decompose into bias and variance?

A

Error = bias^2 + variance + noise, where the noise term is the irreducible error in the data.
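
Written out in full (a sketch; f is the true function, \hat{f} the model learned from a random training set, and \sigma^2 the irreducible noise; expectations are taken over training sets):

    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
        = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2                       % bias^2
        + \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]     % variance
        + \sigma^2                                                        % noise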

3
Q

What is bias?

A

Bias is the error introduced by the model's simplifying assumptions: how far the model's average prediction (over different training sets) lies from the true values. High bias leads to underfitting.

4
Q

What is variance?

A

Variance measures how sensitive the learned model is to the particular training set: how much its predictions change when it is trained on different samples of the data. High variance leads to overfitting.
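
Both definitions can be made concrete by retraining the same model on many independent training sets and examining how its predictions at one test point spread around the truth. Everything in this sketch (the true function, sample sizes, and the deliberately simple model) is an illustrative assumption:

    import numpy as np

    rng = np.random.default_rng(0)

    def true_f(x):
        return np.sin(3 * x)

    x0 = 0.5                            # a single test point
    preds = []
    for _ in range(200):                # 200 independent training sets
        x = rng.uniform(-1, 1, 30)
        y = true_f(x) + rng.normal(scale=0.3, size=30)
        coeffs = np.polyfit(x, y, deg=1)        # a simple (high-bias) line
        preds.append(np.polyval(coeffs, x0))

    preds = np.array(preds)
    print((preds.mean() - true_f(x0)) ** 2)     # bias^2 at x0
    print(preds.var())                          # variance at x0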

5
Q

How to reduce overfitting?

A

We need to dampen the model's complexity, smoothing out the fitted function.

  • Regularisation
    • Restricts the degrees of freedom (the effective number of parameters) present in our model
  • We trade a higher training error for a lower test error (see the sketch below)
    • e.g. in SVMs, the slack variables provide the regularisation
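
A minimal sketch of this trade-off, assuming scikit-learn is available; the polynomial degree, alpha, and synthetic data are illustrative choices, not from the original card:

    # Compare an unregularised and an L2-regularised polynomial fit:
    # regularisation raises training error but lowers test error.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(40, 1))
    y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=40)
    X_tr, y_tr, X_te, y_te = X[:30], y[:30], X[30:], y[30:]

    for name, reg in [("unregularised", LinearRegression()),
                      ("ridge", Ridge(alpha=1.0))]:
        model = make_pipeline(PolynomialFeatures(degree=12), reg)
        model.fit(X_tr, y_tr)
        print(name,
              mean_squared_error(y_tr, model.predict(X_tr)),   # training error
              mean_squared_error(y_te, model.predict(X_te)))   # test error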
6
Q

What is L1 Regularisation?

A

L1 weight regularisation penalises weight values by adding the sum of their absolute values to the error term

L1 regularisation encourages solutions where many parameters are zero

e.g. Lasso algorithm
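
A minimal sketch of the sparsity effect, using scikit-learn's Lasso; the synthetic data, feature count, and alpha value are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    # Only features 0 and 3 actually carry signal; the rest are noise.
    y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=100)

    model = Lasso(alpha=0.1).fit(X, y)
    print(model.coef_)  # most coefficients are driven exactly to zero

With a suitable alpha, the L1 penalty zeroes out the uninformative features, which is why Lasso doubles as a feature-selection method.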

7
Q

What is L2 Regularisation?

A

L2 weight regularisation penalises weight values by adding the sum of their squared values to the error term

L2 regularisation encourages solutions where most parameter values are small.

e.g. Ridge Regression (L2-regularised linear regression)
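
A matching sketch for L2, using scikit-learn's Ridge on the same kind of synthetic data (alpha and the data are again illustrative):

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=100)

    model = Ridge(alpha=1.0).fit(X, y)
    print(model.coef_)  # coefficients shrink towards zero, but rarely reach it

Unlike L1, the squared penalty spreads shrinkage across all weights rather than eliminating some entirely.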

8
Q

Batch vs Stochastic Gradient Descent

A

Batch: the gradient of the error is evaluated over the entire data set at each iteration

  • Can be slow for large data sets
  • Cannot be used in incremental (online) settings
  • Guaranteed to converge to the global minimum for convex error surfaces

Stochastic: an update is performed for each individual training instance

  • The order of training instances must be random
  • Updates are noisy; the parameter values jump around from step to step
  • This “random walk” behaviour helps avoid getting stuck
  • Often requires only a small number of passes through the full data set
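
A minimal NumPy sketch contrasting the two variants on least-squares linear regression; the learning rates, iteration counts, and synthetic data are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=200)

    # Batch: the gradient uses the entire data set at every step.
    w = np.zeros(3)
    for _ in range(500):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= 0.1 * grad

    # Stochastic: one update per (randomly ordered) training instance.
    w_sgd = np.zeros(3)
    for epoch in range(5):
        for i in rng.permutation(len(y)):
            grad_i = 2 * X[i] * (X[i] @ w_sgd - y[i])
            w_sgd -= 0.01 * grad_i

    print(w, w_sgd)  # both end up close to true_w; SGD is noisier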
9
Q

How do we find the minimum of the (regularised) error function?

A

Use Gradient Descent to approximate the minimum iteratively rather than calculating it analytically: a closed-form solution is often unavailable or too expensive.

10
Q

Why might parameter tuning lead to overfitting?

A

Hyperparameters tuned repeatedly against the same held-out data become fitted to that particular sample, so the apparent gains may not generalise and the performance estimate is optimistically biased. Tuning should use a separate validation set (or cross-validation), leaving the final test set untouched.

11
Q

What is the Gradient Descent method, and why is it important?

A

Gradient Descent is a mechanism for finding the minimum of a (convex) multivariate function whose partial derivatives we can compute.

This is important because it allows us to determine the regression weights that minimise an error function over some training data set. The update rule is sketched below.
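
In symbols, each step moves the weights a small distance against the gradient (a sketch; η is the learning rate, E the error function, and w the weight vector; the notation is illustrative):

    w \leftarrow w - \eta \, \nabla_{w} E(w)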
