01-Getting_Started_with_Azure_ML Flashcards
Run Configuration
Defines the Python code execution environment for the script
E.g., sets a Conda environment with some default Python packages installed
# Create a new RunConfig object
from azureml.core.runconfig import RunConfiguration
experiment_run_config = RunConfiguration()
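A Conda environment with the packages the script needs can then be attached to the run configuration, along the lines of this sketch (package choices are an example):
# Add Conda/pip package dependencies to the run configuration
from azureml.core.conda_dependencies import CondaDependencies
packages = CondaDependencies.create(conda_packages=['scikit-learn'],
                                    pip_packages=['azureml-defaults'])
experiment_run_config.environment.python.conda_dependencies = packages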
Script Configuration
Identifies the Python script file to be run in the experiment, and the environment in which to run it
# Create a script config
from azureml.core import ScriptRunConfig
src = ScriptRunConfig(source_directory=experiment_folder,
                      script='diabetes_experiment.py',
                      run_config=experiment_run_config)
How to Set Up Model Training
- Connect to Your Workspace
- Create folder for experiment files (data + training script)
- Create a Training Script
- Use an Estimator to Run the Script as an Experiment
- Register the Trained Model
Training/Entry Scripts
# Import libraries
# Get the experiment run context
# Load the training data
# Separate features and labels
# Split the data into training and test sets
# Create and train a model
# Score / predict with the model
# Evaluate and log metrics
# Save the model to the outputs folder
# Complete the run (run.complete())
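A minimal sketch of an entry script following this outline (assuming a local diabetes.csv with a 'Diabetic' label column and a simple logistic regression model):
from azureml.core import Run
import pandas as pd
import os
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Get the experiment run context
run = Run.get_context()

# Load the training data
diabetes = pd.read_csv('diabetes.csv')

# Separate features and labels, then split into training and test sets
X = diabetes.drop('Diabetic', axis=1).values
y = diabetes['Diabetic'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

# Create and train a model
model = LogisticRegression(solver='liblinear').fit(X_train, y_train)

# Score the model and log the accuracy metric
acc = accuracy_score(y_test, model.predict(X_test))
run.log('Accuracy', acc)

# Save the model to the outputs folder so it is uploaded with the run
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

# Complete the run
run.complete()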
Estimator
You can run experiment scripts using a RunConfiguration and a ScriptRunConfig, or you can use an Estimator, which abstracts both of these configurations in a single object to run the training experiment.
An estimator runs a training script
Create an estimator
from azureml.train.estimator import Estimator

estimator = Estimator(source_directory=training_folder,
                      entry_script='diabetes_training.py',
                      compute_target='local',
                      conda_packages=['scikit-learn'])
# Create an experiment
from azureml.core import Experiment
experiment_name = 'diabetes-training'
experiment = Experiment(workspace=ws, name=experiment_name)

# Run the experiment based on the estimator
run = experiment.submit(config=estimator)
run.wait_for_completion(show_output=True)
Create and Run an Experiment
experiment = Experiment(workspace = ws, name = experiment_name)
# Run the experiment
run = experiment.submit(config=estimator)
RunDetails widget
As with any experiment run, you can use the RunDetails widget to view information about the run and get a link to it in Azure Machine Learning studio
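For example, in a notebook (assuming the azureml-widgets package is installed):
# Show run details in the notebook
from azureml.widgets import RunDetails
RunDetails(run).show()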
Retrieve the metrics and outputs from the Run object.
# Get logged metrics
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))
print('\n')
for file in run.get_file_names():
    print(file)
Output
Regularization Rate 0.01
Accuracy 0.774
AUC 0.8483377282451863
azureml-logs/60_control_log.txt
azureml-logs/70_driver_log.txt
logs/azureml/8_azureml.log
outputs/diabetes_model.pkl
Register a Trained Model
Note that the outputs of the experiment include the trained model file (diabetes_model.pkl).
You can register a model in your Azure Machine Learning workspace, making it possible to track model versions and retrieve them later.
# Register the model
run.register_model(model_path='outputs/diabetes_model.pkl',
                   model_name='diabetes_model',
                   tags={'Training context': 'Estimator'},
                   properties={'AUC': run.get_metrics()['AUC'],
                               'Accuracy': run.get_metrics()['Accuracy']})
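To confirm the registration (and to retrieve models later), the Model class can list registered models with their versions, tags, and properties:
# List registered models in the workspace
from azureml.core import Model

for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        print('\t', tag_name, ':', model.tags[tag_name])
    for prop_name in model.properties:
        print('\t', prop_name, ':', model.properties[prop_name])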
Create a Parameterized Training Script
You can increase the flexibility of your training experiment by adding parameters to your entry script, enabling you to repeat the same training experiment with different settings
# Set regularization hyperparameter
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--reg_rate', type=float, dest='reg', default=0.01)
args = parser.parse_args()
reg = args.reg
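The parsed value can then be used when training, for example in this sketch (assuming a scikit-learn logistic regression, where the regularization rate is applied as the inverse of C):
# Use the parsed regularization rate when training the model
# (assumes the X_train / y_train arrays prepared earlier in the script)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(C=1/reg, solver='liblinear').fit(X_train, y_train)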
Use a Framework-Specific Estimator
You used a generic Estimator class to run the training script, but you can also take advantage of framework-specific estimators that include environment definitions for common machine learning frameworks. In this case, you’re using Scikit-Learn, so you can use the SKLearn estimator. This means that you don’t need to specify the scikit-learn package in the configuration.
# Create an estimator
from azureml.train.sklearn import SKLearn

estimator = SKLearn(source_directory=training_folder,
                    entry_script='diabetes_training.py',
                    script_params={'--reg_rate': 0.1},
                    compute_target='local')
Working with Data
Data is the foundation on which machine learning models are built. Managing data centrally in the cloud, and making it accessible to teams of data scientists who are running experiments and training models on multiple workstations and compute targets is an important part of any professional data science solution.
Datastore
In Azure ML, datastores are references to storage locations, such as Azure Storage blob containers. Every workspace has a default datastore - usually the Azure storage blob container that was created with the workspace.
If you need to work with data that is stored in different locations, you can add custom datastores to your workspace and set any of them to be the default.
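For example, a blob container can be registered as a datastore and made the default (a sketch; the account, container, and key values are placeholders):
# Register an Azure blob container as a datastore (placeholder values)
from azureml.core import Datastore

blob_ds = Datastore.register_azure_blob_container(workspace=ws,
                                                  datastore_name='blob_data',
                                                  container_name='data_container',
                                                  account_name='az_store_acct',
                                                  account_key='<storage-account-key>')

# Optionally make it the default datastore for the workspace
ws.set_default_datastore('blob_data')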
You can use local data files to train a model, but when running training workloads automatically on cloud-based compute, it makes more sense to store the data centrally in the cloud and ingest it into the training script wherever it happens to be running.
Upload Data to a Datastore
You can upload files from your local file system to a datastore so that they will be accessible to experiments running in the workspace, regardless of where the experiment script is actually being run.
default_ds.upload_files(files=['./data/diabetes.csv', './data/diabetes2.csv'],  # Upload the diabetes csv files in /data
                        target_path='diabetes-data/',  # Put it in a folder path in the datastore
                        overwrite=True,  # Replace existing files of the same name
                        show_progress=True)
Train a Model from a Datastore
When you uploaded the files in the code cell above, note that the code returned a data reference.
The data reference can be used to download the contents of the folder to the compute context where the data reference is being used
Downloading data works well for small volumes of data that will be processed on local compute. When working with remote compute, you can also configure a data reference to mount the datastore location and read data directly from the data source.
The entry script (via Estimator/experiment) will load the training data from the data reference passed to it as a parameter
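For example, a data reference to the uploaded folder can be created from the datastore (a sketch; as_download is used because this experiment runs on local compute):
# Get a data reference to the 'diabetes-data' folder, configured to download its contents
data_ref = default_ds.path('diabetes-data').as_download(path_on_compute='diabetes_data')
print(data_ref)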
# Set up the parameters
script_params = {
    '--regularization': 0.1,   # regularization rate
    '--data-folder': data_ref  # data reference to download files from datastore
}

# Create an estimator
estimator = SKLearn(source_directory=experiment_folder,
                    entry_script='diabetes_training.py',
                    script_params=script_params,
                    compute_target='local')

# Create an experiment
experiment_name = 'diabetes-training'
experiment = Experiment(workspace=ws, name=experiment_name)

# Run the experiment
run = experiment.submit(config=estimator)
Data reference
A data reference provides a way to pass the path to a folder in a datastore to a script, regardless of where the script is being run, so that the script can access data in the datastore location.
The data reference can be used to download the contents of the folder to the compute context where the data reference is being used
Downloading data works well for small volumes of data that will be processed on local compute. When working with remote compute, you can also configure a data reference to mount the datastore location and read data directly from the data source.
Datasets
While you can read data directly from datastores, Azure Machine Learning provides a further abstraction for data in the form of datasets.
A dataset is a versioned reference to a specific set of data that you may want to use in an experiment.
Datasets can be tabular or file-based.
It’s easy to convert a tabular dataset to a Pandas dataframe, enabling you to work with the data using common Python techniques.
Create a Tabular Dataset
from azureml.core import Dataset
# Get the default datastore
default_ds = ws.get_default_datastore()

# Create a tabular dataset from the path on the datastore
tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))
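To work with the data using common Python techniques, the tabular dataset can be converted to a Pandas dataframe, e.g. to preview the first rows:
# Preview the first 20 rows of the tabular dataset as a Pandas dataframe
tab_data_set.take(20).to_pandas_dataframe()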
Create a File Dataset
In some machine learning scenarios, you might need to work with data that is unstructured, or you may simply want to handle reading the data from files in your own code. To accomplish this, you can use a file dataset, which creates a list of file paths in a virtual mount point that you can use to read the data in the files.
# Create a file dataset from the path on the datastore
file_data_set = Dataset.File.from_files(path=(default_ds, 'diabetes-data/*.csv'))

# Get the files in the dataset
for file_path in file_data_set.to_path():
    print(file_path)
Register Datasets
You can register datasets to make them easily accessible to any experiment being run in the workspace.
You can view and manage datasets on the Datasets page for your workspace in Azure ML Studio or via code.
# Register the tabular dataset
try:
    tab_data_set = tab_data_set.register(workspace=ws,
                                         name='diabetes dataset',
                                         description='diabetes data',
                                         tags={'format': 'CSV'},
                                         create_new_version=True)
except Exception as ex:
    print(ex)

# Register the file dataset
try:
    file_data_set = file_data_set.register(workspace=ws,
                                           name='diabetes file dataset',
                                           description='diabetes files',
                                           tags={'format': 'CSV'},
                                           create_new_version=True)
except Exception as ex:
    print(ex)

print('Datasets registered')
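Registered datasets can then be listed and retrieved by name (and, optionally, by version), e.g.:
# List the registered datasets in the workspace
from azureml.core import Dataset

for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print(dataset.name, 'version', dataset.version)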
Train a Model from a Tabular Dataset
Now that you have datasets, you’re ready to start training models from them. You can pass datasets to scripts as inputs in the estimator being used to run the script.
# Get the training dataset
diabetes_ds = ws.datasets.get("diabetes dataset")

# Create an estimator
estimator = SKLearn(source_directory=experiment_folder,
                    entry_script='diabetes_training.py',
                    script_params=script_params,
                    compute_target='local',
                    # Pass the Dataset object as an input...
                    inputs=[diabetes_ds.as_named_input('diabetes')],
                    pip_packages=['azureml-dataprep[pandas]'])
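Inside the training script, the named input can then be retrieved from the run context and converted to a dataframe (a sketch):
# In the entry script: get the named dataset input and load it as a Pandas dataframe
from azureml.core import Run

run = Run.get_context()
diabetes = run.input_datasets['diabetes'].to_pandas_dataframe()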
Train a Model from a File Dataset
When you’re using a file dataset, the dataset input passed to the script represents a mount point containing file paths. How you read the data from these files depends on the kind of data in the files and what you want to do with it.
You can use the Python glob module to create a list of files in the virtual mount point defined by the dataset, and read them all into Pandas dataframes that are concatenated into a single dataframe.
For large volumes of data, you’d generally use the as_mount method to stream the files directly from the dataset source; but when running on local compute, you need to use the as_download option to download the dataset files to a local folder.
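In the entry script, that might look like this sketch (assuming the download folder path is passed in via a '--data-folder' script parameter parsed into args.data_folder):
# Read all CSV files from the downloaded dataset folder into one dataframe
import os
import glob
import pandas as pd

data_path = args.data_folder  # assumed '--data-folder' script parameter
all_files = glob.glob(os.path.join(data_path, '*.csv'))
diabetes = pd.concat((pd.read_csv(f) for f in all_files), sort=False)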
# Get the training dataset
diabetes_ds = ws.datasets.get("diabetes file dataset")

# Create an estimator
estimator = SKLearn(source_directory=experiment_folder,
                    entry_script='diabetes_training.py',
                    script_params=script_params,
                    compute_target='local',
                    inputs=[diabetes_ds.as_named_input('diabetes').as_download(path_on_compute='diabetes_data')],
                    pip_packages=['azureml-dataprep[pandas]'])
Working with Compute
When you run a script as an Azure Machine Learning experiment, you need to define the execution context for the experiment run. The execution context is made up of:
- The Python environment for the script; the compute running the script needs this environment, with all of the Python packages the script uses installed.
- The compute target on which the script will be run.
This could be the local workstation from which the experiment run is initiated, or a remote compute target such as a training cluster that is provisioned on-demand.
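For example, an on-demand training cluster can be provisioned with the SDK (a sketch; the cluster name and VM size are placeholders):
# Provision an Azure ML compute cluster, or reuse it if it already exists
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "aml-cluster"
try:
    # Use the existing cluster if there is one
    training_cluster = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    # Otherwise create a new one
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS2_V2', max_nodes=2)
    training_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
    training_cluster.wait_for_completion(show_output=True)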
Define an Environment (Run Configuration)
When you run a Python script as an experiment in Azure Machine Learning, a Conda environment is automatically created to define the execution context for the script.
Azure Machine Learning provides a default environment that includes many common packages, including the azureml-defaults package that contains the libraries necessary for working with an experiment run, as well as popular packages like pandas and numpy.
You can also define your own environment and add packages by using conda or pip, to ensure your experiment has access to all the libraries it requires.
Example
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
# Create a Python environment for the experiment
diabetes_env = Environment("diabetes-experiment-env")
# Let Azure ML manage dependencies
diabetes_env.python.user_managed_dependencies = False
# Use a docker container
diabetes_env.docker.enabled = True

# Create a set of package dependencies (conda or pip as required)
diabetes_packages = CondaDependencies.create(conda_packages=['scikit-learn'],
                                             pip_packages=['azureml-defaults', 'azureml-dataprep[pandas]'])

# Add the dependencies to the environment
diabetes_env.python.conda_dependencies = diabetes_packages

print(diabetes_env.name, 'defined.')

# Register the environment
diabetes_env.register(workspace=ws)
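Once registered, the environment can be retrieved by name for later experiments (a short sketch):
# Retrieve a registered environment from the workspace by name
from azureml.core import Environment

registered_env = Environment.get(workspace=ws, name='diabetes-experiment-env')
print(registered_env.name)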
Use in Estimator:
# Create an estimator
estimator = Estimator(source_directory=experiment_folder,
                      inputs=[diabetes_ds.as_named_input('diabetes')],
                      script_params=script_params,
                      compute_target='local',
                      environment_definition=diabetes_env,
                      entry_script='diabetes_training.py')

# Create an experiment
experiment = Experiment(workspace=ws, name='diabetes-training')

# Run the experiment
run = experiment.submit(config=estimator)