Lecture 13 - Single Global Part 1 Flashcards
What is Hill Climbing with Restarts?
This is a global optimisation strategy that extends basic hill climbing by:
* Running multiple hill climbs, each from a new random starting point
* Each hill climb runs for a random amount of time, drawn from a distribution T
Instead of committing to one trajectory, you probe different areas of the search space, trying to avoid getting stuck in bad local optima.
Algorithm 10 - Hill Climbing with Restarts
REFER TO NOTES
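The full pseudocode is in the notes; below is a minimal Python sketch of the idea, not the lecture's exact algorithm. The names `quality`, `random_solution`, `tweak`, and `T` are hypothetical stand-ins for the objective function, the random initialiser, the tweak operator, and the run-length distribution.

```python
import random

def hill_climb_with_restarts(quality, random_solution, tweak, total_budget, T):
    """Repeatedly hill climb from fresh random starts; each climb runs
    for a random number of steps drawn from the distribution T."""
    best = random_solution()
    used = 0
    while used < total_budget:
        s = random_solution()                    # probe a new region of the space
        for _ in range(min(T(), total_budget - used)):
            candidate = tweak(s)
            if quality(candidate) > quality(s):  # greedy uphill move
                s = candidate
            used += 1
        if quality(s) > quality(best):           # keep the best solution across all runs
            best = s
    return best

# Toy usage: maximise f(x) = -x^2, with run lengths drawn uniformly from [50, 500]
result = hill_climb_with_restarts(
    quality=lambda x: -x * x,
    random_solution=lambda: random.uniform(-10, 10),
    tweak=lambda x: x + random.uniform(-0.5, 0.5),
    total_budget=10_000,
    T=lambda: random.randint(50, 500),
)
```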
What are the advantages/disadvantages of Hill Climbing with Restarts?
Advantages:
Escapes local minima
Balances exploration and exploitation
Disadvantages:
Requires time budgeting
Wasteful if the restart logic is poor
No learning from past runs
What are the limitations of Uniform Tweaks?
REFER TO NOTES -> Uses Algorithm 8
No chance of bigger jumps - the tweak is bounded by r, so the search can never step further than r
Exploration is capped
Locality is compromised - all step sizes within the bound are equally likely, so small moves are no more probable than large ones
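A minimal sketch of a bounded uniform tweak (Algorithm 8 itself is in the notes; function and parameter names here are illustrative). It makes the cap explicit: no single move can exceed r in any dimension, and every step size within the bound is equally likely.

```python
import random

def bounded_uniform_tweak(solution, r, p=1.0):
    """Perturb each dimension (with probability p) by noise drawn
    uniformly from [-r, r]; no step can ever exceed r per dimension."""
    return [
        x + random.uniform(-r, r) if random.random() < p else x
        for x in solution
    ]
```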
What are Gaussian Tweaks?
Gaussian tweaks are a type of probabilistic modification applied to candidate solutions in stochastic optimisation, where the change (or “tweak”) is sampled from a Gaussian (Normal) distribution centred at the current value.
What is the purpose of Gaussian Tweaks?
- Models realistic, gradual improvement in solution space.
- Captures the idea that small changes are more likely to be useful, but large changes should still occasionally happen to escape local optima.
- Respects locality (nearby solutions are more similar).
Why do we use Gaussian Tweaks?
Mimics natural behaviour of search:
Allows for many small changes (local exploitation) and occasional large jumps (global exploration)
How do Gaussian Tweaks work?
Replace uniform noise with normal noise.
σ becomes a hyper-parameter controlling the rate of exploration.
What are the advantages/disadvantages of Gaussian Tweaks?
Advantages:
- Respects locality.
- Enables scalable exploration.
Disadvantages:
- Choosing the right σ is hard — too small = stuck; too large = too random.
Algorithm 11 - Gaussian Tweaks
REFER TO NOTES - same as bounded uniform, but the uniform noise on [-r, r] is replaced by Gaussian noise drawn from N(0, σ²)
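As with Algorithm 10, the full pseudocode is in the notes; this is a minimal illustrative sketch, mirroring the uniform version above but with `random.gauss` supplying normal noise of standard deviation σ.

```python
import random

def gaussian_tweak(solution, sigma, p=1.0):
    """Perturb each dimension (with probability p) by noise drawn from
    N(0, sigma^2): small changes are likely, large jumps rare but possible."""
    return [
        x + random.gauss(0, sigma) if random.random() < p else x
        for x in solution
    ]
```

Unlike the uniform version, there is no hard bound: roughly 68% of moves land within ±σ of the current value, but arbitrarily large jumps keep a non-zero probability, which is what enables occasional global exploration.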
What is the difference between Gaussian Tweaks and Uniform Tweaks?
| | Uniform Tweaks | Gaussian Tweaks |
| --- | --- | --- |
| Range | Fixed bounds [-r, r] | Infinite range |
| Locality | All step sizes equally likely | Small changes more likely |
| Exploration | Capped by r | Large jumps always possible |
| Bounds | Needs artificial bounds | Tuned naturally by σ |
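A quick empirical check of the capped-versus-unbounded distinction, using arbitrary illustrative values r = σ = 0.1:

```python
import random

# Draw many one-dimensional tweak steps from each distribution
uniform_steps = [abs(random.uniform(-0.1, 0.1)) for _ in range(100_000)]
gauss_steps = [abs(random.gauss(0, 0.1)) for _ in range(100_000)]

print(max(uniform_steps))  # never exceeds r = 0.1
print(max(gauss_steps))    # occasionally lands far beyond 0.1 (unbounded tail)
```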
What is the Interdependence of Hyper-Parameters?
Hyper-parameters should not be treated as if they act independently. In practice they interact, and the interaction can be nonlinear, non-obvious, and highly problem-specific.
Why does the interdependence of hyper-parameters matter?
* In your optimiser, you might have:
    * σ: tweak size
    * n: number of candidates sampled per iteration
    * p: probability of tweaking each dimension
    * T: restart time distribution
* Tuning these one at a time may not work because a change in one hyper-parameter might amplify or cancel out the effect of another.
What are the core ideas of tweak noise (σ) interacting with samples per iteration (n)?
σ (large) - makes tweaks more extreme → better for exploration
n (large) - increases selective pressure → reduces exploration
Why is the interaction between tweak noise (σ) and samples per iteration (n) not always predictable?
Suppose you sample a large jump with σ = 1.0:
- It might explore a new peak (good!)
- But unless it’s immediately better, it could be culled if n is high
That means:
- Exploration is attempted, but not preserved
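A minimal sketch of this σ × n interaction, assuming a steepest-ascent style step that samples n Gaussian tweaks per iteration and keeps only the best (all names here are illustrative, not from the lecture):

```python
import random

def sample_n_step(quality, s, sigma, n):
    """One iteration: draw n Gaussian tweaks of the current solution and
    keep the single best if it improves. Larger n raises selective
    pressure: an exploratory jump survives only if it is immediately
    the best of all n candidates."""
    candidates = [
        [x + random.gauss(0, sigma) for x in s] for _ in range(n)
    ]
    best = max(candidates, key=quality)
    return best if quality(best) > quality(s) else s
```

With σ = 1.0 and a large n, most of the n samples land near the current solution; a rare large jump onto the foothills of a new peak is usually worse than the best local sample, so it is culled before it has a chance to pay off.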