Bayesian optimization Flashcards

1
Q

Name some methods for hyperparameter search

A
  1. Grid search
  2. Random search
  3. Manual tuning
  4. Bayesian optimization
2
Q

What is a proxy model?

A

A model that is inexpensive to evaluate and approximates the true model.

3
Q

Why would we use a proxy model?

A

To guide the hyperparameter search: we minimize the cheap proxy model instead of the expensive real model.

4
Q

What can we use for a proxy model?

A

A Gaussian process.
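A minimal 1-D sketch of the GP posterior that plays the proxy role, in pure NumPy. The RBF kernel and fixed hyperparameters are simplifying assumptions; real BO libraries also fit the kernel hyperparameters.

```python
import numpy as np

def gp_posterior(X_train, y_train, X_test, length_scale=1.0, noise=1e-6):
    """Posterior mean and std of a zero-mean GP with an RBF kernel (1-D inputs)."""
    def rbf(a, b):
        # Squared-exponential (RBF) kernel; an illustrative choice.
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))  # train covariance
    K_s = rbf(X_train, X_test)                                # train/test covariance

    alpha = np.linalg.solve(K, y_train)
    mean = K_s.T @ alpha
    cov = rbf(X_test, X_test) - K_s.T @ np.linalg.solve(K, K_s)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std
```

The mean and std returned here are exactly what the acquisition functions on the later cards consume.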

5
Q

At what points do we want to evaluate the real model given the proxy model?

A

At points where the proxy model's mean is low (exploitation) and its standard deviation is high (exploration).
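The GP lower confidence bound from the later card combines these two criteria in one line; a minimal sketch (the weight `kappa` is an illustrative assumption):

```python
def lower_confidence_bound(mean, std, kappa=2.0):
    # Lower values are more attractive for minimization:
    # low predicted mean (exploitation) or high uncertainty (exploration).
    # kappa trades off the two terms.
    return mean - kappa * std
```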

6
Q

What is an acquisition function?

A

A function that quantifies how good it is to evaluate new points, trading off exploration vs. exploitation.

7
Q

Name some acquisition functions

A
  1. Probability of improvement
  2. Expected improvement
  3. GP lower confidence bound
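Expected improvement for minimization under a Gaussian posterior has a closed form; a self-contained sketch (stdlib only, function name is illustrative):

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def expected_improvement(mean, std, f_best):
    """EI for minimization: E[max(f_best - g(x), 0)] under N(mean, std^2)."""
    if std == 0.0:
        return max(f_best - mean, 0.0)
    z = (f_best - mean) / std
    # Closed form: std * (z * Phi(z) + phi(z))
    return std * (z * norm_cdf(z) + norm_pdf(z))
```

Unlike probability of improvement, EI weights candidates by *how much* they are expected to improve, which addresses the problem on the next card.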
8
Q

What is the problem with the probability of improvement acquisition function?

A

It doesn’t account for how much we improve, so the improvement might be tiny, and we might get stuck sampling points very close to the current best.

9
Q

How can we solve the problem of getting “stuck” with the probability of improvement acquisition function?

A

Introduce a slack variable, so that

a(x) = p(g(x) < g(x_best) - slack)
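Under a Gaussian posterior this probability is a single normal CDF evaluation; a stdlib-only sketch (the default slack value is an illustrative assumption):

```python
import math

def probability_of_improvement(mean, std, f_best, slack=0.01):
    """P[g(x) < g(x_best) - slack] under N(mean, std^2).

    The slack demands a minimum amount of improvement, discouraging
    tiny steps right next to the current best.
    """
    z = (f_best - slack - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # standard normal CDF
```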

10
Q

What are the limitations of Bayesian optimization using GPs?

A
  1. Getting the model assumptions (covariance function…) wrong might give bad results
  2. Limited by the number of dimensions and by the number of evaluations of the true function
11
Q

Which covariance kernels should usually be tried first?

A

A sufficiently flexible one, such as a Matérn kernel (not the Gaussian/squared-exponential kernel, which assumes very smooth functions).

12
Q

Maximising the marginal likelihood might fail for hyperparameter optimization, especially in the early stages where we have few data points. What can we do instead?

A

Integrate out hyperparameters using Markov Chain Monte Carlo.

13
Q

What is the problem with scaling Bayesian optimization to high dimensions?

A

Optimizing the acquisition function is hard in high dimensions, and reaching a good minimum might require many evaluations of the true model.
