Path3.Mod1.e - Automated Machine Learning - Prep & Run AutoML Experiment Code Flashcards
- Order of operations between a Data Asset, an MLTable data asset and AutoML
- How an MLTable data asset is created
- You need to create the data asset first, then create the MLTable data asset that includes the schema used by AutoML to read that data.
- An MLTable data asset is created when your data is stored in a folder together with an MLTable file.
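As a rough illustration of the "folder together with an MLTable file" layout, the sketch below writes a data file and an MLTable file side by side. The folder name, CSV file name, sample rows, and the specific `read_delimited` settings are assumptions made for illustration, not taken from the course material; check them against the MLTable schema reference before relying on them.

```python
from pathlib import Path
import tempfile

# Assumed MLTable definition: points at one CSV file in the same folder and
# asks for it to be parsed as comma-delimited text with a header row.
MLTABLE_YAML = """\
paths:
  - file: ./diabetes.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
"""

# Placeholder rows, invented purely so the folder has some data in it.
SAMPLE_CSV = "PatientID,Age,Diabetic\n1,34,0\n2,51,1\n"


def write_mltable_folder(folder: Path) -> Path:
    """Create `folder` holding a CSV file and an MLTable file side by side."""
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "diabetes.csv").write_text(SAMPLE_CSV, encoding="ascii")
    # The schema file is named exactly "MLTable", with no extension.
    (folder / "MLTable").write_text(MLTABLE_YAML, encoding="ascii")
    return folder


if __name__ == "__main__":
    target = write_mltable_folder(Path(tempfile.mkdtemp()) / "input-data-automl")
    print(sorted(p.name for p in target.iterdir()))
```

A folder like this is what you would then register as an MLTable data asset in the workspace.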
How to specify the data set as input using Python SDK code
The data must be in this form and must specify a certain column…
You need an Input instance, initialized with an asset type and the path to your data asset:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

training_data_input = Input(
    type=AssetTypes.MLTABLE,
    path="azureml:input-data-automl:1",
)
For ML tasks, the data must be in tabular form and include a target column.
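A quick local sanity check of the "tabular with a target column" requirement might look like the sketch below. It only inspects a CSV header using the standard library; `has_target_column` is a hypothetical helper and the sample rows are invented for illustration.

```python
import csv
import io


def has_target_column(csv_text: str, target_column: str) -> bool:
    """Return True if the header row of `csv_text` contains `target_column`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if there is no header row at all
    return target_column in header


# Placeholder data, invented for illustration only.
sample = "PatientID,Age,Diabetic\n1,34,0\n2,51,1\n"
print(has_target_column(sample, "Diabetic"))  # True
print(has_target_column(sample, "Outcome"))   # False
```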
Explain what this code is doing:
~~~
from azure.ai.ml import automl

classification_job = automl.classification(
    compute="aml-cluster",
    experiment_name="auto-ml-class-dev",
    training_data=my_training_data_input,
    target_column_name="Diabetic",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True,
)
~~~
What my_training_data_input and primary_metric are.
This code uses the automl module from the Python SDK v2 to create a classification job instance. Notable points:
- Uses my_training_data_input as the training data source. It should reference an MLTable data asset from your workspace, since AutoML requires one as input.
- Sets the primary_metric to "accuracy". This is the performance metric used to decide which trained model is the best one.
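To make the role of primary_metric concrete, here is a toy sketch of "pick the model with the best value of the chosen metric". It does not call AutoML at all: `pick_best_trial` is a hypothetical helper, and the trial names and scores are invented for illustration.

```python
from typing import Dict


def pick_best_trial(results: Dict[str, Dict[str, float]], primary_metric: str) -> str:
    """Return the name of the trial with the highest value of `primary_metric`."""
    return max(results, key=lambda name: results[name][primary_metric])


# Invented scores for three hypothetical trials.
trial_results = {
    "trial_a": {"accuracy": 0.81, "AUC_weighted": 0.90},
    "trial_b": {"accuracy": 0.86, "AUC_weighted": 0.88},
    "trial_c": {"accuracy": 0.84, "AUC_weighted": 0.93},
}

print(pick_best_trial(trial_results, "accuracy"))      # trial_b
print(pick_best_trial(trial_results, "AUC_weighted"))  # trial_c
```

The same set of trials yields a different "best" model depending on which metric is chosen, which is why the primary metric should match what matters for the task.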
Get a list of available metrics to train a classification model
Use the ClassificationPrimaryMetrics enum to list them:
from azure.ai.ml.automl import ClassificationPrimaryMetrics

list(ClassificationPrimaryMetrics)
TM TTM MT EET
Four limits you can set once you instantiate an AutoML experiment or job
The four limits you’d set for the job:
* timeout_minutes - int. Maximum number of minutes before the whole AutoML experiment is terminated
* trial_timeout_minutes - int. Maximum number of minutes a single trial can take
* max_trials - int. Maximum number of trials (models) that will be trained
* enable_early_termination - bool. Ends the experiment early if the score isn’t improving in the short term
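How the four limits interact can be sketched with a small simulation. This is not AutoML's actual scheduling logic: the `Limits` class and `simulate` function are hypothetical, `patience` is a made-up knob standing in for "not improving over the short term", and the trial durations and scores are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Limits:
    timeout_minutes: int
    trial_timeout_minutes: int
    max_trials: int
    enable_early_termination: bool


def simulate(trials: List[Tuple[float, float]], limits: Limits, patience: int = 2) -> Tuple[int, str]:
    """Walk through (minutes, score) trials; return (trials attempted, stop reason)."""
    elapsed = 0.0
    best = float("-inf")
    stale = 0       # consecutive scored trials without improvement
    attempted = 0
    for minutes, score in trials:
        if attempted >= limits.max_trials:
            return attempted, "max_trials"
        # A single trial is cut off once it reaches trial_timeout_minutes.
        cut_short = minutes > limits.trial_timeout_minutes
        minutes = min(minutes, limits.trial_timeout_minutes)
        if elapsed + minutes > limits.timeout_minutes:
            return attempted, "timeout_minutes"
        elapsed += minutes
        attempted += 1
        if cut_short:
            continue  # simplification: a trial stopped by its own timeout yields no score
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
        if limits.enable_early_termination and stale >= patience:
            return attempted, "early_termination"
    return attempted, "all_trials_done"


# Invented trials: (duration in minutes, score).
print(simulate([(3, 0.80), (3, 0.85), (3, 0.84), (3, 0.86)], Limits(10, 10, 5, False)))
print(simulate([(1, 0.80), (1, 0.79), (1, 0.78), (1, 0.90)], Limits(10, 10, 5, True)))
```

In the first run the overall timeout stops the experiment before the fourth trial starts; in the second, early termination stops it after two trials in a row fail to beat the best score.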
The method you call when you want to set limits on your AutoML job
Call the job’s set_limits method:
classification_job.set_limits(
    timeout_minutes=10,
    trial_timeout_minutes=10,
    max_trials=5,
    enable_early_termination=True,
)
Code to submit your AutoML Job
# submit the new job
returned_job = ml_client.jobs.create_or_update(classification_job)
Code to get the URL to monitor your job
# get the studio URL so you can monitor your job
aml_url = returned_job.studio_url
print("Monitor job here:", aml_url)
The method you call when you want to set optional training properties on your AutoML job
Set the training properties (optional) using the set_training method:
classification_job.set_training(
    blocked_training_algorithms=["LogisticRegression"],
    enable_onnx_compatible_models=True,
)
The above code blocks LogisticRegression from being used for training models and enables ONNX compatible model creation.
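Putting the cards together, one possible end-to-end flow is sketched below. The wrapper function `submit_classification_job` is hypothetical, `ml_client` is assumed to be an already-authenticated MLClient created elsewhere, and the SDK imports are placed inside the function only so the sketch can be loaded without the SDK installed. All job settings are the values used on the cards above.

```python
def submit_classification_job(ml_client, data_path: str, compute: str = "aml-cluster"):
    """Build, configure, and submit an AutoML classification job.

    `ml_client` is assumed to be an authenticated azure.ai.ml.MLClient.
    `data_path` is assumed to point at an MLTable data asset.
    """
    # Imported lazily so this sketch can be read and loaded without the SDK.
    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    # 1. Wrap the MLTable data asset as a job input.
    training_data = Input(type=AssetTypes.MLTABLE, path=data_path)

    # 2. Create the classification job.
    job = automl.classification(
        compute=compute,
        experiment_name="auto-ml-class-dev",
        training_data=training_data,
        target_column_name="Diabetic",
        primary_metric="accuracy",
        n_cross_validations=5,
        enable_model_explainability=True,
    )

    # 3. Cap the time and number of trials.
    job.set_limits(
        timeout_minutes=10,
        trial_timeout_minutes=10,
        max_trials=5,
        enable_early_termination=True,
    )

    # 4. Optional training properties.
    job.set_training(
        blocked_training_algorithms=["LogisticRegression"],
        enable_onnx_compatible_models=True,
    )

    # 5. Submit and report where to monitor the run.
    returned_job = ml_client.jobs.create_or_update(job)
    print("Monitor job here:", returned_job.studio_url)
    return returned_job
```

A call might look like `submit_classification_job(ml_client, "azureml:input-data-automl:1")`, using the data asset path from the earlier card.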
See set_training