Bayesian optimization Flashcards
Name some methods for hyperparameter search
- Grid search
- Random search
- Manual tuning
- Bayesian optimization
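As a rough illustration of the first two methods, the sketch below runs grid search and random search on a toy objective. The function validation_loss, its two hyperparameters, and the search ranges are all hypothetical stand-ins for an expensive training run, not part of the original cards.

```python
import itertools
import random


def validation_loss(learning_rate, num_layers):
    """Hypothetical stand-in for an expensive training run; lower is better."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (num_layers - 3) ** 2


def grid_search(learning_rates, layer_counts):
    """Evaluate every combination on a fixed grid and keep the best one."""
    candidates = itertools.product(learning_rates, layer_counts)
    return min(candidates, key=lambda c: validation_loss(*c))


def random_search(num_trials, seed=0):
    """Sample configurations uniformly at random and keep the best one."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(0.001, 1.0), rng.randint(1, 6)) for _ in range(num_trials)
    ]
    return min(candidates, key=lambda c: validation_loss(*c))


best_grid = grid_search([0.001, 0.01, 0.1, 1.0], [1, 2, 3, 4])
best_random = random_search(num_trials=16)
print("grid search best:", best_grid)
print("random search best:", best_random)
```

Both methods spend the same budget here (16 evaluations). Neither uses earlier results to decide where to look next, which is the gap Bayesian optimization tries to fill.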
What is a proxy model?
A model that is inexpensive to evaluate and approximates the true model.
Why would we use a proxy model?
To guide the hyperparameter search by minimizing the cheap proxy model instead of the expensive real model.
What can we use for a proxy model?
A Gaussian process (GP).
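To make the idea concrete, here is a minimal sketch of a Gaussian process used as a proxy: given a few evaluations of the real model, it returns a predicted mean and standard deviation at a new point. It assumes a zero prior mean, one-dimensional inputs, a squared-exponential kernel with unit length scale, and made-up training data; the helper names are illustrative only.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Return the GP posterior mean and standard deviation at x_query."""
    n = len(x_train)
    # Covariance of the observed points, with a small noise term on the diagonal.
    K = [[rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Covariance between the observed points and the query point.
    k_star = [rbf_kernel(x, x_query) for x in x_train]
    mean = sum(k * a for k, a in zip(k_star, solve(K, y_train)))
    variance = rbf_kernel(x_query, x_query) - sum(
        k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(variance, 0.0))


# Made-up observations of the expensive model (hypothetical values).
x_train = [0.0, 1.0, 2.0]
y_train = [1.0, 0.2, 0.8]

mean_near, std_near = gp_posterior(x_train, y_train, 1.0)  # at an observed point
mean_far, std_far = gp_posterior(x_train, y_train, 5.0)    # far from all data
print("near data:", mean_near, std_near)
print("far from data:", mean_far, std_far)
```

Close to an observed point the prediction follows the data with little uncertainty; far from the data the mean falls back towards the prior and the standard deviation grows.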
At what points do we want to evaluate the real model given the proxy model?
At points where the proxy model’s mean is low (exploitation) and its standard deviation is high (exploration).
What is an acquisition function?
A function that quantifies how good it is to evaluate new points, trading off exploration vs. exploitation.
Name some acquisition functions
- Probability of improvement
- Expected improvement
- GP Lower confidence bound
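The three functions above can be sketched as follows for a minimization problem, assuming the proxy predicts a Gaussian with the given mean and standard deviation (std > 0) at a candidate point. The two candidate points and the kappa value are made up for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def probability_of_improvement(mean, std, best):
    """P(g(x) < best) under the proxy's Gaussian prediction (requires std > 0)."""
    return STANDARD_NORMAL.cdf((best - mean) / std)


def expected_improvement(mean, std, best):
    """E[max(best - g(x), 0)] under the proxy's Gaussian prediction."""
    z = (best - mean) / std
    return (best - mean) * STANDARD_NORMAL.cdf(z) + std * STANDARD_NORMAL.pdf(z)


def lower_confidence_bound(mean, std, kappa=2.0):
    """Optimistic estimate of g(x); smaller values are more promising."""
    return mean - kappa * std


best = 1.0                 # lowest value of the real model seen so far
safe = (0.95, 0.05)        # (mean, std): slightly better, almost certain
uncertain = (1.2, 0.5)     # (mean, std): worse on average, very uncertain

pi_safe = probability_of_improvement(*safe, best)
pi_uncertain = probability_of_improvement(*uncertain, best)
ei_safe = expected_improvement(*safe, best)
ei_uncertain = expected_improvement(*uncertain, best)
lcb_safe = lower_confidence_bound(*safe)
lcb_uncertain = lower_confidence_bound(*uncertain)

print("PI :", pi_safe, pi_uncertain)    # maximized: prefers the safe point
print("EI :", ei_safe, ei_uncertain)    # maximized: prefers the uncertain point
print("LCB:", lcb_safe, lcb_uncertain)  # minimized: prefers the uncertain point
```

With these numbers, probability of improvement picks the safe point while expected improvement and the lower confidence bound pick the uncertain one, since they also account for how large the improvement could be.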
What is the problem with the probability of improvement acquisition function?
It doesn’t focus on how much we improve, meaning the reward might be very small, and we might get stuck “exploring” very close to the current best.
How can we solve the problem of getting “stuck” with the probability of improvement acquisition function?
Introduce a slack variable, so that
a(x) = P(g(x) < g(x_best) - slack)
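A small sketch of the effect of the slack, assuming the proxy's prediction at x is Gaussian with a known mean and standard deviation. The two candidates and the slack value of 0.1 are hypothetical.

```python
from statistics import NormalDist


def probability_of_improvement(mean, std, best, slack=0.0):
    """P(g(x) < best - slack), assuming g(x) ~ Normal(mean, std**2)."""
    return NormalDist().cdf((best - slack - mean) / std)


best = 1.0  # g(x_best), the lowest value seen so far
candidates = {
    "near current best": (0.99, 0.01),  # (proxy mean, proxy std)
    "uncertain region": (1.2, 0.5),
}


def pick(slack):
    """Return the candidate that maximizes the acquisition function."""
    scores = {
        name: probability_of_improvement(mean, std, best, slack)
        for name, (mean, std) in candidates.items()
    }
    return max(scores, key=scores.get)


for slack in (0.0, 0.1):
    print(f"slack={slack}: evaluate '{pick(slack)}'")
```

Without slack, the point right next to the current best wins because a tiny improvement is very likely. Requiring an improvement of at least 0.1 makes that point nearly worthless and moves the search to the uncertain region.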
What are the limitations of Bayesian optimization using a GP?
- Getting the function model (covariance function…) wrong might give bad results
- Limited by the number of dimensions and the number of evaluations of the true function
Which covariance kernels should usually be tried first?
A sufficiently flexible one, such as Matérn (not the Gaussian, i.e. squared-exponential, kernel).
Maximising the marginal likelihood to fit the GP’s hyperparameters might fail, especially in the early stages when we have few data points. What can we do instead?
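For comparison, the sketch below evaluates a Matérn kernel next to the Gaussian (squared-exponential) kernel as a function of the distance r between two inputs. It assumes the Matérn smoothness parameter ν = 5/2, which is one common choice; the card itself does not say which variant is meant.

```python
import math


def squared_exponential(r, length_scale=1.0):
    """Gaussian kernel: implies infinitely smooth sample functions."""
    return math.exp(-0.5 * (r / length_scale) ** 2)


def matern52(r, length_scale=1.0):
    """Matern kernel with nu = 5/2: sample functions are only twice differentiable."""
    s = math.sqrt(5.0) * abs(r) / length_scale
    return (1.0 + s + s * s / 3.0) * math.exp(-s)


for r in (0.0, 1.0, 3.0):
    print(f"r={r}: SE={squared_exponential(r):.4f}  Matern52={matern52(r):.4f}")
```

Both kernels equal 1 at r = 0, but the Matérn covariance decays more slowly at large distances and makes a weaker smoothness assumption about the modelled function.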
Integrate out hyperparameters using Markov Chain Monte Carlo.
What is the problem with scaling Bayesian optimization to high dimensions?
Optimizing the acquisition function is hard in high dimensions, and reaching a good minimum might require many evaluations of the true model.