Path4.Mod3.a - Perform Hyperparameter Tuning (Discrete vs Continuous) Flashcards
The difference between Parameters and Hyperparameters
Parameters are values learned from the training data/features; ML discovers them while modeling the relationships in that data.
Hyperparameters are values not derived from the training features; they are set before training to configure training behavior.
- Define Hyperparameter Tuning
- Azure ML uses this kind of job to tune Hyperparameters
- Train multiple models using the same algorithm and training data but different hyperparameter values, then evaluate each training run on the performance metric you want to optimize.
- A Sweep Job runs trials for each hyperparam combo to be tested, using a training script with parameterized hyperparam values to train a model, then logs the target metric achieved by that model.
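A minimal sketch of that trial loop in plain Python, assuming a made-up scoring function in place of real training. The names `train_and_score` and `run_trials` are hypothetical and are not part of the Azure ML SDK; the point is only to show "same algorithm, different hyperparameter values, one logged metric per trial".

```python
def train_and_score(learning_rate, batch_size):
    """Hypothetical stand-in for a training run; returns a made-up metric."""
    # Illustrative only: the score peaks at learning_rate=0.1, batch_size=32.
    return 1.0 - abs(learning_rate - 0.1) - abs(batch_size - 32) / 100


def run_trials(candidates):
    """Run one trial per hyperparameter combination and log its metric."""
    trials = []
    for params in candidates:
        metric = train_and_score(**params)
        trials.append({"params": params, "metric": metric})
    return trials


candidates = [
    {"learning_rate": 0.01, "batch_size": 16},
    {"learning_rate": 0.1, "batch_size": 32},
    {"learning_rate": 1.0, "batch_size": 64},
]
trials = run_trials(candidates)

# The best trial is the one with the highest logged metric.
best = max(trials, key=lambda t: t["metric"])
print(best["params"])
```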
Four general steps in a Sweep Job workflow
- Create a training script for hyperparam tuning (Jupyter)
- Configure and run the Sweep Job by creating a regular Command Job
- Call the Job’s sweep(...) function (don’t forget to call set_limits(...) to control how long the sweep runs)
- Monitor and review Sweep Jobs
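The first step, a training script with parameterized hyperparameter values, might look roughly like the sketch below. It assumes two hyperparameters (`--learning_rate`, `--batch_size`) and uses a placeholder `train` function with a made-up score; a real script would train an actual model and log the metric with whatever tracking tool the workspace uses, rather than printing it.

```python
import argparse


def parse_args(argv=None):
    """Expose each hyperparameter as a command-line argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser.parse_args(argv)


def train(learning_rate, batch_size):
    """Placeholder for real training; returns a made-up accuracy."""
    return max(0.0, 1.0 - abs(learning_rate - 0.1) - abs(batch_size - 32) / 100)


def main(argv=None):
    args = parse_args(argv)
    accuracy = train(args.learning_rate, args.batch_size)
    # Stand-in for logging the target metric of this trial.
    print(f"accuracy={accuracy:.4f}")
    return accuracy


# Each trial would invoke the script with a different set of values.
result = main(["--learning_rate", "0.1", "--batch_size", "32"])
```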
SS SM ET
Three things required for Hyperparameter Tuning
- Define a Search Space
- Configure a Sampling Method
- Configure Early Termination
one synonymous with Classification, the other Regression…
Search Space:
- What they are
- The two types of values a hyperparameter could be
A Search Space is a set of values tried during the tuning process.
Types of Values:
- Discrete - the value exists in finite space. Synonymous with Classification (a specific label or range)
- Continuous - the value exists in infinite space along a scale. Synonymous with Regression (finding a numeric value)
G R B
Configure a Sampling Method:
- What a Sweep Job needs one for
- The three types of Sampling
The values used in a Sweep Job depend on the sampling method used, which provides input values based on the sampling technique specified.
The three options for Sampling Method:
- Grid Sampling
- Random Sampling
- Bayesian Sampling
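As a rough illustration of how the first two methods pick values, here is a plain-Python sketch. The helpers `grid_sample` and `random_sample` are hypothetical, and the convention of a list for discrete values and a `(low, high)` tuple for a continuous range is an assumption made for this example only. Bayesian sampling is not sketched, since it chooses each new sample based on the results of earlier trials.

```python
import itertools
import random


def grid_sample(space):
    """Yield every combination; needs every hyperparameter to be a discrete list."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_sample(space, n_trials, seed=0):
    """Yield n_trials random combinations from discrete lists or (low, high) ranges."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        params = {}
        for name, spec in space.items():
            if isinstance(spec, tuple):
                low, high = spec
                params[name] = rng.uniform(low, high)  # continuous range
            else:
                params[name] = rng.choice(spec)  # discrete set
        yield params


discrete_space = {"batch_size": [16, 32, 64], "optimizer": ["sgd", "adam"]}
mixed_space = {"batch_size": [16, 32, 64], "learning_rate": (0.001, 0.1)}

grid_trials = list(grid_sample(discrete_space))       # 3 * 2 combinations
random_trials = list(random_sample(mixed_space, 5))   # 5 random draws
```

Grid sampling tries every combination, so it only fits fully discrete spaces; random sampling works with either kind of value.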
m_t and eet from autoML
Configure Early Termination means to stop a Sweep Job based on one of these two conditions.
When (and when NOT) to use an Early Termination Policy
Configure a Sweep Job to stop:
- After a maximum number of trials
- When new Models don’t produce significantly better results
When: Depending on your Search Space and Sampling Method, Early Termination may be beneficial when working with Continuous Hyperparameters (meaning infinite possible combinations, and you don’t want the sweep to go on forever).
When NOT: Conversely, it may be unnecessary to use Early Termination when using Discrete Hyperparameters (limited dimensions == finite set of combinations).
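The two stopping conditions can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how Azure ML implements its policies: `min_improvement` and `patience` are hypothetical names for "significantly better" and "how many unimpressive trials to tolerate".

```python
def run_with_early_stop(metrics, max_trials, min_improvement, patience):
    """Walk through trial metrics in order; return (trials run, stop reason)."""
    best = float("-inf")
    stale = 0  # consecutive trials without a significant improvement
    for n, metric in enumerate(metrics, start=1):
        if metric > best + min_improvement:
            best = metric
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            return n, "no significant improvement"
        if n >= max_trials:
            return n, "max trials reached"
    return len(metrics), "search space exhausted"


metrics = [0.70, 0.80, 0.805, 0.806, 0.804, 0.90]

# Stops once three trials in a row fail to beat the best by at least 0.01.
stalled = run_with_early_stop(metrics, max_trials=10, min_improvement=0.01, patience=3)

# Stops on the trial budget before the improvement check can trigger.
capped = run_with_early_stop(metrics, max_trials=3, min_improvement=0.01, patience=3)
```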
Discrete Hyperparameters:
- How to use the Choice function
- What values types it can take
- Example code for using it in a Sweep Job
Choice() is a function from the ML Python SDK that defines a discrete set of values; the Sweep Job picks one of the given inputs for each trial.
It can take:
- csv: batch_size=Choice(values="16, 32, 64"),
- a range object: batch_size=Choice(values=range(10, 20)),
- an arbitrary list object: batch_size=Choice(values=[16, 32, 64]),
Remember that a Sweep Job is just a Job configured to “sweep”, so we still need to create the Job instance:
from azure.ai.ml.sweep import Choice, Normal

# `job` is the regular Command Job created beforehand
command_job_for_sweep = job(
    batch_size=Choice(values=[16, 32, 64]),  # Discrete Hyperparameter
    learning_rate=Normal(mu=10, sigma=3),  # Continuous Hyperparameter
)
Discrete Hyperparameters: Hyperparameters can be set to one of four other Discrete Distribution functions
Math: explain what the q parameter is
Four other Discrete Distro functions you can use:
- QUniform(min_value, max_value, q)
- Returns a value like round(Uniform(min_value, max_value) / q) * q
- QLogUniform(min_value, max_value, q)
- Returns a value like round(exp(Uniform(min_value, max_value)) / q) * q
- QNormal(mu, sigma, q)
- Returns a value like round(Normal(mu, sigma) / q) * q
- QLogNormal(mu, sigma, q)
- Returns a value like round(exp(Normal(mu, sigma)) / q) * q
q is the quantization parameter and what makes each of the above Discrete. It acts like a step size: each sampled value is rounded to the nearest multiple of q, so the distribution can only land on q-sized steps.
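The four formulas above can be reproduced with the standard library to see what q does. This is a sketch of the math only, not the SDK's implementation; the function names and the explicit `rng` argument are choices made for this example.

```python
import math
import random


def quantize(value, q):
    """Round value to the nearest multiple of q."""
    return round(value / q) * q


def q_uniform(rng, min_value, max_value, q):
    return quantize(rng.uniform(min_value, max_value), q)


def q_log_uniform(rng, min_value, max_value, q):
    return quantize(math.exp(rng.uniform(min_value, max_value)), q)


def q_normal(rng, mu, sigma, q):
    return quantize(rng.gauss(mu, sigma), q)


def q_log_normal(rng, mu, sigma, q):
    return quantize(math.exp(rng.gauss(mu, sigma)), q)


rng = random.Random(0)

# With q=8, a value such as 37 snaps to 40, the nearest multiple of 8.
snapped = quantize(37, 8)

# Every draw lands on a multiple of 8 between 16 and 64.
samples = [q_uniform(rng, 16, 64, 8) for _ in range(20)]
print(samples)
```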
Continuous Hyperparameters require one of these four methods for defining a Search Space
Four Continuous Distro functions you can use:
- Uniform(min_value, max_value)
- uniform distro between min and max
- LogUniform(min_value, max_value)
- a value drawn from exp(Uniform(min_value, max_value)) so that the log of the return value is uniformly distributed
- Normal(mu, sigma)
- a real value normally distributed with mean mu and a standard deviation sigma
- LogNormal(mu, sigma)
- a value drawn from exp(Normal) so that the log of the return value is normally distributed
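A standard-library sketch of the same four definitions may help make the "log of the return value" wording concrete. The function names and the `rng` argument are assumptions for this example, not SDK calls. Note that for the log variants the bounds apply to the log of the result, so a range of 0.0001 to 0.1 is expressed as log(0.0001) to log(0.1).

```python
import math
import random


def uniform(rng, min_value, max_value):
    return rng.uniform(min_value, max_value)


def log_uniform(rng, min_value, max_value):
    # exp(Uniform): the log of the result is uniform on [min_value, max_value].
    return math.exp(rng.uniform(min_value, max_value))


def normal(rng, mu, sigma):
    return rng.gauss(mu, sigma)


def log_normal(rng, mu, sigma):
    # exp(Normal): the log of the result is normal with mean mu, std dev sigma.
    return math.exp(rng.gauss(mu, sigma))


rng = random.Random(0)

# Learning rates spread evenly across orders of magnitude from 1e-4 to 1e-1.
learning_rates = [log_uniform(rng, math.log(1e-4), math.log(1e-1)) for _ in range(10)]

# Log-normal draws are always positive, unlike plain normal draws.
positives = [log_normal(rng, 0.0, 1.0) for _ in range(10)]
```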