Azure ML SDK Flashcards
Register a Datastore
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Register a new datastore
blob_ds = Datastore.register_azure_blob_container(workspace=ws,
                                                  datastore_name='blob_data',
                                                  container_name='data_container',
                                                  account_name='az_store_acct',
                                                  account_key='123456abcde789…')
Get default Datastore
ws.get_default_datastore()
Tabular data from multiple csv files
from azureml.core import Dataset

blob_ds = ws.get_default_datastore()
csv_paths = [(blob_ds, 'data/files/current_data.csv'),
             (blob_ds, 'data/files/archive/*.csv')]
tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
Register Tabular data
tab_ds = tab_ds.register(workspace=ws, name='csv_table')
Retrieve Tabular data
`ws.datasets['csv_table']` or `Dataset.get_by_name(ws, 'img_files')` or `img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version=2)`
Register Dataset as new version
Dataset.File.from_files(path=img_paths).register(workspace=ws, name='img_files', create_new_version=True)
Dataset to pandas
df = tab_ds.to_pandas_dataframe()
azureml.core.environment
Azure Machine Learning environments specify the Python packages, environment variables, and software settings around your training and scoring scripts.
Run experiment with specific environment
from azureml.core import ScriptRunConfig, Experiment
from azureml.core.environment import Environment
exp = Experiment(name="myexp", workspace = ws) # Instantiate environment myenv = Environment(name="myenv")
# Add training script to run config runconfig = ScriptRunConfig(source_directory=".", script="train.py")
# Attach compute target to run config runconfig.run_config.target = "local"
# Attach environment to run config runconfig.run_config.environment = myenv
# Submit run run = exp.submit(runconfig)
PythonScriptStep
Runs a specified Python script
DataTransferStep
Uses Azure Data Factory to copy data between data stores.
DatabricksStep
Runs a notebook, script, or compiled JAR on a databricks cluster
ParallelRunStep
Runs a Python script as a distributed task on multiple compute nodes
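Steps like these are assembled into a Pipeline object and submitted as an experiment. A minimal sketch with two PythonScriptStep steps (the script names, source directory, and 'cpu-cluster' compute target are hypothetical):
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Two hypothetical script steps that run on an existing compute target
step1 = PythonScriptStep(name='prepare data',
                         source_directory='scripts',
                         script_name='prep.py',
                         compute_target='cpu-cluster')
step2 = PythonScriptStep(name='train model',
                         source_directory='scripts',
                         script_name='train.py',
                         compute_target='cpu-cluster')

# Assemble the steps into a pipeline and submit it as an experiment
pipeline = Pipeline(workspace=ws, steps=[step1, step2])
pipeline_run = Experiment(workspace=ws, name='training-pipeline').submit(pipeline)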
Passing data between pipeline steps
- Define a named OutputFileDatasetConfig object that references a location in a datastore. If no explicit datastore is specified, the default datastore is used.
- Pass the OutputFileDatasetConfig object as a script argument in steps that run scripts.
- Include code in those scripts to write to the OutputFileDatasetConfig argument as an output or read it as an input.
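A minimal sketch of that pattern (the step scripts, argument names, and 'cpu-cluster' target are hypothetical; no datastore is specified, so the default datastore is used):
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import PythonScriptStep

# Named intermediate location in the default datastore
prepped_data = OutputFileDatasetConfig('prepped_data')

# Step 1 writes to the location; step 2 reads it back as an input
step1 = PythonScriptStep(name='prep data', source_directory='scripts',
                         script_name='prep.py', compute_target='cpu-cluster',
                         arguments=['--out-folder', prepped_data])
step2 = PythonScriptStep(name='train model', source_directory='scripts',
                         script_name='train.py', compute_target='cpu-cluster',
                         arguments=['--in-folder', prepped_data.as_input()])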
Pass dataset data as a script argument
You can pass a tabular dataset as a script argument. When you take this approach, the argument received by the script is the unique ID of the dataset in your workspace. In the script, you can then get the workspace from the run context and use it to retrieve the dataset by its ID.
script_config = ScriptRunConfig(source_directory='my_dir',
                                script='script.py',
                                arguments=['--ds', tab_ds],
                                environment=env)
Script:
import argparse
from azureml.core import Run, Dataset

parser = argparse.ArgumentParser()
parser.add_argument('--ds', type=str, dest='dataset_id')
args = parser.parse_args()
run = Run.get_context()
ws = run.experiment.workspace
dataset = Dataset.get_by_id(ws, id=args.dataset_id)
data = dataset.to_pandas_dataframe()
Pass dataset to script as a named input
In this approach, you use the as_named_input method of the dataset to specify a name for the dataset. Then in the script, you can retrieve the dataset by name from the run context’s input_datasets collection without needing to retrieve it from the workspace. Note that if you use this approach, you still need to include a script argument for the dataset, even though you don’t actually use it to retrieve the dataset.
script_config = ScriptRunConfig(source_directory='my_dir',
                                script='script.py',
                                arguments=['--ds', tab_ds.as_named_input('my_dataset')],
                                environment=env)
Script:
import argparse
from azureml.core import Run

parser = argparse.ArgumentParser()
parser.add_argument('--ds', type=str, dest='ds_id')
args = parser.parse_args()
run = Run.get_context()
dataset = run.input_datasets['my_dataset']
data = dataset.to_pandas_dataframe()
Two ways to pass either a Tabular or a File dataset to a script
1) Use a script argument for a dataset
For a File dataset, you must specify a mode for the file dataset argument, which can be as_download or as_mount.
2) Use a named input for a dataset
passing File dataset as_download
In most cases, you should use as_download, which copies the files to a temporary location on the compute where the script is being run.
script_config = ScriptRunConfig(source_directory='my_dir',
                                script='script.py',
                                arguments=['--ds', file_ds.as_download()],
                                environment=env)
passing File dataset as_mount
If you are working with a large amount of data for which there may not be enough storage space on the experiment compute, use as_mount to stream the files directly from their source.
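A sketch mirroring the as_download example above, assuming the same file_ds and env objects:
script_config = ScriptRunConfig(source_directory='my_dir',
                                script='script.py',
                                arguments=['--ds', file_ds.as_mount()],
                                environment=env)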
OutputFileDatasetConfig
The OutputFileDatasetConfig object is a special kind of dataset that:
- References a location in a datastore for interim storage of data.
- Creates a data dependency between pipeline steps.
You can view an OutputFileDatasetConfig object as an intermediary store for data that must be passed from a step to a subsequent step.
Forcing all pipeline steps to run
pipeline_run = experiment.submit(train_pipeline, regenerate_outputs=True)
Allow pipeline step to be reused
step1 = PythonScriptStep(…, allow_reuse = True)
continuous distribution types in hyperdrive
normal, uniform, lognormal, loguniform
grid sampling in hyperdrive
Grid sampling can only be employed when all hyperparameters are discrete, and is used to try every possible combination of parameters in the search space.
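A minimal sketch with hypothetical hyperparameter names, all discrete:
from azureml.train.hyperdrive import GridParameterSampling, choice

param_space = {
    '--batch_size': choice(16, 32, 64),
    '--learning_rate': choice(0.001, 0.01, 0.1)
}
param_sampling = GridParameterSampling(param_space)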
random sampling in hyperdrive
Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values as shown in the following code example.
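A minimal sketch mixing a discrete and a continuous hyperparameter (the argument names are hypothetical):
from azureml.train.hyperdrive import RandomParameterSampling, choice, normal

param_space = {
    '--batch_size': choice(16, 32, 64),
    '--learning_rate': normal(10, 3)
}
param_sampling = RandomParameterSampling(param_space)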
bayesian sampling in hyperdrive
Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm, which tries to select parameter combinations that will result in improved performance from the previous selection. Both discrete and continuous variables are possible.
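A minimal sketch with hypothetical argument names (Bayesian sampling supports the choice, uniform, and quniform expressions):
from azureml.train.hyperdrive import BayesianParameterSampling, choice, uniform

param_space = {
    '--batch_size': choice(16, 32, 64),
    '--learning_rate': uniform(0.05, 0.1)
}
param_sampling = BayesianParameterSampling(param_space)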
bandit termination policy in hyperdrive
You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.
azureml.train.hyperdrive.BanditPolicy(slack_amount = 0.2,
evaluation_interval=1,
delay_evaluation=5)
Median stopping policy in hyperdrive
A median stopping policy abandons runs where the target performance metric is worse than the median of the running averages for all runs.
azureml.train.hyperdrive.MedianStoppingPolicy(evaluation_interval=1,
delay_evaluation=5)
Truncation selection policy in hyperdrive
A truncation selection policy cancels the lowest performing X% of runs at each evaluation interval based on the truncation_percentage value you specify for X.
azureml.train.hyperdrive.TruncationSelectionPolicy(truncation_percentage=10, evaluation_interval=1, delay_evaluation=5)
What is needed for running a hyperdrive experiment?
To run a hyperdrive experiment, you need to create a training script just as you would for any other training experiment, except that your script must:
- Include an argument for each hyperparameter you want to vary.
- Log the target performance metric. This enables the hyperdrive run to evaluate the performance of the child runs it initiates, and identify the one that produces the best performing model.
For example, the following script trains a logistic regression model using a --regularization argument to set the regularization rate hyperparameter, and logs the accuracy metric with the name Accuracy.
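A minimal sketch of such a script, assuming a hypothetical data.csv with a label column:
import argparse
import pandas as pd
from azureml.core import Run
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hyperparameter argument that hyperdrive will vary
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01)
args = parser.parse_args()

run = Run.get_context()

# Load data and train with the supplied regularization rate
data = pd.read_csv('data.csv')
X, y = data.drop('label', axis=1).values, data['label'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = LogisticRegression(C=1/args.reg_rate, solver='liblinear').fit(X_train, y_train)

# Log the target performance metric so hyperdrive can compare child runs
run.log('Accuracy', accuracy_score(y_test, model.predict(X_test)))
run.complete()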
list all hyperdrive runs in order of performance
for child_run in hyperdrive_run.get_children_sorted_by_primary_metric():
print(child_run)
retrieve the best performing hyperdrive run
best_run = hyperdrive_run.get_best_run_by_primary_metric()
use the ExplanationClient object to download the explanation
from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient
client = ExplanationClient.from_run_id(workspace=ws,
experiment_name=experiment.experiment_name,
run_id=run.id)
explanation = client.download_model_explanation()
feature_importances = explanation.get_feature_importance_dict()
from interpret.ext.blackbox import MimicExplainer
An explainer that creates a global surrogate model that approximates your trained model and can be used to generate explanations. This explainable model must have the same kind of architecture as your trained model (for example, linear or tree-based).
from interpret.ext.blackbox import TabularExplainer
TabularExplainer - An explainer that acts as a wrapper around various SHAP explainer algorithms, automatically choosing the one that is most appropriate for your model architecture.
from interpret.ext.blackbox import PFIExplainer
PFIExplainer - a Permutation Feature Importance explainer that analyzes feature importance by shuffling feature values and measuring the impact on prediction performance.
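Creating an explainer follows a similar pattern for each type; a minimal sketch with TabularExplainer (the trained model, X_train, feature_names, and class_labels are hypothetical):
from interpret.ext.blackbox import TabularExplainer

tab_explainer = TabularExplainer(model,
                                 X_train,
                                 features=feature_names,
                                 classes=class_labels)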
retrieve global feature importance and get feature importances
explainer.explain_global(X_train).get_feature_importance_dict()
get local feature importance
To retrieve local feature importance from a MimicExplainer or a TabularExplainer, you must call the explain_local() method of your explainer, specifying the subset of cases you want to explain. Then you can use the get_ranked_local_names() and get_ranked_local_values() methods to retrieve dictionaries of the feature names and importance values, ranked by importance.
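A minimal sketch, assuming a tab_explainer created as above and a hypothetical X_test array:
# Explain the first five test cases
local_explanation = tab_explainer.explain_local(X_test[0:5])

# Feature names and importance values, ranked by importance
local_features = local_explanation.get_ranked_local_names()
local_importance = local_explanation.get_ranked_local_values()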
list_model_explanations()
Return a dictionary of explanation metadata such as id, data type, explanation method, model type, and upload time, sorted by upload time.
Scheduling a pipeline for periodic intervals
To schedule a pipeline to run at periodic intervals, you must define a ScheduleRecurrence that determines the run frequency, and use it to create a Schedule.
from azureml.pipeline.core import ScheduleRecurrence, Schedule
daily = ScheduleRecurrence(frequency='Day', interval=1)
Schedule.create(…, recurrence=daily)
Triggering a pipeline run on data changes
To schedule a pipeline to run whenever data changes, you must create a Schedule that monitors a specified path on a datastore.
from azureml.pipeline.core import Schedule
Schedule.create(…, datastore=training_datastore, path_on_datastore='data/training')
scheduling a pipeline
After you have published a pipeline, you can initiate it on demand through its REST endpoint, or you can have the pipeline run automatically based on a periodic schedule or in response to data updates.
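A minimal sketch of publishing from a completed pipeline run and then invoking the REST endpoint (pipeline_run and auth_header are assumed to exist already):
import requests

# Publish the pipeline from a completed run
published_pipeline = pipeline_run.publish_pipeline(name='training_pipeline',
                                                   description='Model training pipeline',
                                                   version='1.0')

# Initiate it on demand through its REST endpoint
response = requests.post(published_pipeline.endpoint,
                         headers=auth_header,
                         json={'ExperimentName': 'run_training_pipeline'})
run_id = response.json()['Id']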
steps needed to publish batch inference pipeline
- Register a model
- Create a scoring script
  - init(): Called when the pipeline is initialized.
  - run(mini_batch): Called for each batch of data to be processed.
- Create a pipeline with a ParallelRunStep
- Publish using run.publish_pipeline
steps needed to publish a real-time inference pipeline
- Register a trained model
- Define an inference configuration
  - A script to load the model and return predictions for submitted data.
  - An environment in which the script will be run.
  Combine these in an azureml.core.model.InferenceConfig.
- Define a deployment configuration
- Deploy model using azureml.core.model.Model.deploy()
defining a deployment config for real-time inference
Now that you have the entry script and environment, you need to configure the compute to which the service will be deployed. If you are deploying to an AKS cluster, you must create the cluster and a compute target for it before deploying:
from azureml.core.compute import ComputeTarget, AksCompute
cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='eastus')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)
With the compute target created, you can now define the deployment configuration, which sets the target-specific compute specification for the containerized deployment:
from azureml.core.webservice import AksWebservice
classifier_deploy_config = AksWebservice.deploy_configuration(cpu_cores = 1,
memory_gb = 1)
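With the deployment configuration in place, a minimal deployment sketch might look like this (the entry script, environment, and registered model name are hypothetical):
from azureml.core.model import InferenceConfig, Model

classifier_inference_config = InferenceConfig(entry_script='score.py',
                                              source_directory='service_files',
                                              environment=service_env)

model = ws.models['classification_model']
service = Model.deploy(workspace=ws,
                       name='classifier-service',
                       models=[model],
                       inference_config=classifier_inference_config,
                       deployment_config=classifier_deploy_config,
                       deployment_target=production_cluster)
service.wait_for_deployment(show_output=True)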
ParallelRunConfig
The ParallelRunConfig class is used to specify configuration for the ParallelRunStep class. The ParallelRunConfig and ParallelRunStep classes together can be used for any kind of processing job that involves large amounts of data and is not time-sensitive, such as training or scoring. The ParallelRunStep works by breaking up a large job into batches that are processed in parallel. The batch size and degree of parallel processing can be controlled with the ParallelRunConfig class. ParallelRunStep can work with either TabularDataset or FileDataset as input.
To work with the ParallelRunStep class, the following pattern is typical:
- Create a ParallelRunConfig object to specify how batch processing is performed, with parameters to control batch size, the number of nodes per compute target, and a reference to your custom Python script.
- Create a ParallelRunStep object that uses the ParallelRunConfig object, defines inputs and outputs for the step, and the list of models to use.
- Use the configured ParallelRunStep object in a Pipeline just as you would with pipeline step types defined in the steps package.
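A minimal sketch of that pattern (the environment, compute target, entry script, and input dataset are hypothetical):
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

output_dir = OutputFileDatasetConfig(name='inferences')

# How each mini-batch is processed
parallel_run_config = ParallelRunConfig(
    source_directory='batch_scripts',
    entry_script='batch_scoring_script.py',
    mini_batch_size='5',
    error_threshold=10,
    output_action='append_row',
    environment=batch_env,
    compute_target=inference_cluster,
    node_count=2)

# The step that uses the config, with its inputs and output
parallelrun_step = ParallelRunStep(
    name='batch-score',
    parallel_run_config=parallel_run_config,
    inputs=[batch_data_set.as_named_input('batch_data')],
    output=output_dir,
    allow_reuse=True)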
parallel_run_step.txt
In ParallelRunConfig, if output_action == 'append_row': all values output by run() method invocations will be aggregated into one unique file named parallel_run_step.txt that is created in the output location.
ParallelRunConfig output_action == 'summary_only'
The user script is expected to store the output itself. An output row is still expected for each successful input item processed. The system uses this output only for the error threshold calculation (ignoring the actual value of the row).
run.get_details_with_logs()
Return the status details of the run with log file contents. (dict)
Log a single numeric value with the same metric name repeatedly used (like from within a for loop)
for i in tqdm(range(-10, 10)):
    run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))
    angle = i / 2.0
Value in portal: Single-variable line chart
Log an array of numeric values
run.log_list(name=’Fibonacci’, value=[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89])
Value in portal: single-variable line chart
Log a row with 2 numerical columns repeatedly
run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))
sines['angle'].append(angle)
sines['sine'].append(np.sin(angle))
Value in portal: Two-variable line chart
Log table with 2 numerical columns
run.log_table(name='Sine Wave', value=sines)
Value in portal: Two-variable line chart
Log image
run.log_image(name='food', path='./breadpudding.jpg', plot=None, description='dessert')
Use this method to log an image file or a matplotlib plot to the run. These images will be visible and comparable in the run record.
azureml.pipeline.core.PipelineRun.wait_for_completion()
Wait for the completion of this pipeline run. Returns the status after the wait.