Run Experiments & Train Models Flashcards
What are the 5 Areas of Algorithms?
Clustering
Anomaly Detection
Regression
Multi-Class Classification
Two-Class Classification
Algorithm Cheat Sheet:
Anomaly Detection
( when to use )
( One-Class SVM & PCA-Based Anomaly Detection )
Anomaly Detection: find unusual data points
One-Class SVM: >100 features, aggressive boundary
PCA-Based Anomaly Detection: fast training
Algorithm Cheat Sheet:
Clustering
( when to use )
Clustering: identify data structure
Algorithm Cheat Sheet:
Multi-Class Classification
( when to use )
Multiclass Logistic Regression
Multiclass Neural Network
Multiclass Decision Forest
Multiclass Decision Jungle
One-vs-All Multiclass
Multi-Class Classification: predicting among 3+ classes
Multiclass Logistic Regression: fast-training linear models
Multiclass Neural Network: accuracy, long training times
Multiclass Decision Forest: accuracy, fast training
Multiclass Decision Jungle: accuracy, small memory footprint
One-vs-All Multiclass: builds a multi-class model from any two-class classifier
Algorithm Cheat Sheet:
Two-Class Classification
( when to use )
Two-Class SVM
Two-Class Locally Deep SVM
Two-Class Averaged Perceptron
Two-Class Logistic Regression
Two-Class Bayes Point Machine
Two-Class Decision Forest
Two-Class Boosted Decision Tree
Two-Class Decision Jungle
Two-Class Neural Network
- Two-Class Classification: predicting between two categories
- Two-Class SVM: >100 features, linear model
- Two-Class Locally Deep SVM: >100 features
- Two-Class Averaged Perceptron: fast training, linear model
- Two-Class Logistic Regression: fast training, linear model
- Two-Class Bayes Point Machine: fast training, linear model
- Two-Class Decision Forest: accuracy, fast training
- Two-Class Boosted Decision Tree: accuracy, fast training, large memory footprint
- Two-Class Decision Jungle: accuracy, small memory footprint
- Two-Class Neural Network: accuracy, long training time
Algorithm Cheat Sheet:
Regression
(when to use)
Ordinal Regression
Poisson Regression
Fast Forest Quantile Regression
Linear Regression
Bayesian Linear Regression
Neural Network Regression
Decision Forest Regression
Boosted Decision Tree Regression
- Regression: quantitative prediction
- Ordinal Regression: data in rank ordered categories
- Poisson Regression: event counts
- Fast Forest Quantile Regression: predicting a distribution
- Linear Regression: fast training, linear model
- Bayesian Linear Regression: linear model, small data set
- Neural Network Regression: accuracy, long training time
- Decision Forest Regression: accuracy, fast training
- Boosted Decision Tree Regression: accuracy, fast training, large memory footprint
What is a Pipeline?
An automated workflow of the machine learning steps,
from data processing to deployment:
data processing > build & train > deploy & monitor
Image Classification:
Set Up Your Development Environment Steps
1. import packages
- will depend on the analysis in question
- will always need azureml.core
2. connect to workspace
- will always need to import Workspace
- will always need the .from_config( ) method
3. create an experiment to track runs
- need to import the Experiment package
4. create a remote compute target
- bring in AmlCompute (creates single- or multi-node compute) and ComputeTarget (host for deployment)
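The four setup steps above can be sketched with SDK v1 code; the experiment and cluster names here are placeholders, and `from_config()` assumes a local config.json downloaded from the workspace:

```python
# 1. import packages -- azureml.core is always needed
from azureml.core import Workspace, Experiment
from azureml.core.compute import AmlCompute, ComputeTarget

# 2. connect to the workspace via config.json
ws = Workspace.from_config()

# 3. create an experiment to track runs
exp = Experiment(workspace=ws, name="my-experiment")

# 4. create a remote compute target (placeholder VM size and name)
config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS2_V2",
                                               max_nodes=2)
cluster = ComputeTarget.create(ws, "my-cluster", config)
cluster.wait_for_completion(show_output=True)
```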
Stratified Random Sampling
division of a population into smaller sub-groups based on a shared characteristic (location, population, income, education, etc.)
by splitting the data this way you get proportional representation of each group in your train and test splits
if 5% of the population's income is > $125K, the train split should have 5% > $125K and the test split should have 5% > $125K
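A minimal illustration of the idea using only the standard library (not the Azure ML API): each group is split separately, so its proportion is preserved in both halves.

```python
# Minimal stratified train/test split; each key(row) group keeps its
# share of the population in both splits.
import random
from collections import defaultdict

def stratified_split(rows, key, test_frac=0.2, seed=42):
    """Return (train, test) with each group's proportion preserved."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# 5% of this toy population earns > $125K; both splits keep that 5%.
people = [{"income": 130_000}] * 5 + [{"income": 60_000}] * 95
train, test = stratified_split(people, key=lambda p: p["income"] > 125_000)
```

With a 20% test fraction this gives 1 of 20 high earners in test and 4 of 80 in train, i.e. 5% in each split.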
SDK: Experiment
function name?
function to run an experiment?
function to run experiment from ScriptRunConfig( )?
log metrics?
Experiment Function
Experiment(workspace, name)
Run an Experiment
new_run = name_of_experiment.start_logging( )
Run Experiment from ScriptRunConfig
Run.get_context( )
Log Metrics
new_run.log('metric name', metric_value)
not log_metric( ), which is the MLflow logging function
End Run
new_run.complete( )
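The cards above fit together as an inline (interactive) run; a sketch assuming SDK v1, with the experiment name and metric value as placeholders:

```python
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
exp = Experiment(workspace=ws, name="my-experiment")

new_run = exp.start_logging()   # start an inline run; returns a Run object
new_run.log("accuracy", 0.91)   # log a metric on the run (placeholder value)
new_run.complete()              # end the run
```

Note that `start_logging()` returns a Run, and logging/completion happen on that Run object, not on the Experiment itself.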
SDK: What class is used to create a script configuration?
what is it used for?
how to submit the class?
script_config = ScriptRunConfig(source_directory, script)
- it provides all the configuration information for your script to run (packages, dependencies, etc.)
- similar to a container used to run model on different machines
- transfers a local file to the Azure ML environment
new_run = experiment_name.submit(config=script_config)
new_run.wait_for_completion( )
blocks the local script until the remote run completes
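Put together, the submit pattern looks roughly like this (SDK v1 sketch; `./src` and `train.py` are placeholder paths):

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()
exp = Experiment(workspace=ws, name="my-experiment")

# Package the local script folder plus its run settings
script_config = ScriptRunConfig(source_directory="./src", script="train.py")

new_run = exp.submit(config=script_config)
new_run.wait_for_completion(show_output=True)  # block until the remote run ends

# Inside train.py, the script retrieves its own run to log against:
#   from azureml.core import Run
#   run = Run.get_context()
#   run.log("accuracy", 0.91)
```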
SDK: .get_context( )
retrieves the current run from inside a script
used within the training script to log metrics when running via the ScriptRunConfig( ) class
SDK: Environment
what’s the function?
add dependencies from conda (2 steps)?
Function
Environment('env_name')
Bring in Packages
conda_deps = CondaDependencies.create( conda_packages = [] )
env_name.python.conda_dependencies = conda_deps
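The two steps in full, as an SDK v1 sketch (the environment name and package list are placeholders):

```python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment(name="my-env")

# Step 1: create the dependency set (placeholder packages)
conda_deps = CondaDependencies.create(conda_packages=["scikit-learn", "pandas"])

# Step 2: attach the dependencies to the environment's Python section
env.python.conda_dependencies = conda_deps
```

The key detail is that the CondaDependencies object, not the environment itself, is what gets assigned to `env.python.conda_dependencies`.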
SDK: Compute Cluster
package used?
how to provision?
how to create?
package used
from azureml.core.compute import AmlCompute
provisioning method
AmlCompute.provisioning_configuration( )
creation method
ComputeTarget.create( workspace, cluster_name, provisioning_config )
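A common SDK v1 pattern is to reuse the cluster if it already exists and provision it otherwise; a sketch with placeholder names and sizes:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()
cluster_name = "my-cluster"

try:
    # Reuse the cluster if it already exists in the workspace
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    # Otherwise provision and create it (placeholder VM size/node counts)
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS2_V2",
                                                   min_nodes=0, max_nodes=2)
    cluster = ComputeTarget.create(ws, cluster_name, config)
    cluster.wait_for_completion(show_output=True)
```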
SDK: Automate Model Training AzureML
configuration function?
assign cluster function?
pipeline data function?
pipeline step function?
build pipeline function?
experiment & run functions?
Build Pipeline Preliminary Steps
- access workspace
- create environment
- create compute
Build Pipeline Additional Steps
- run configuration
- assign compute cluster
- create data transfer folder
- define pipeline steps (data prep & train model)
- build pipeline
- create/access experiment + run the pipeline
Step One: Configuration Function
from azureml.core.runconfig import RunConfiguration
RunConfiguration( )
Step Two: Assign Cluster
config_name.target = name_of_compute_cluster
config_name.environment = env_name
Step Three: Pipeline Data Function
from azureml.pipeline.core import PipelineData
PipelineData( )
Step Four: Define Pipeline Steps Function
from azureml.pipeline.steps import PythonScriptStep
PythonScriptStep( )
used twice: once for the data-prep step and once for the model-training step
Step Five: Build Pipeline Function
Pipeline( )
Step Six: Experiment Function & Submit
Experiment( )
experiment_name.submit( )
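The whole pipeline sequence can be sketched end to end (SDK v1; the cluster, environment, folder, and script names are placeholders, and the environment is assumed to be already registered):

```python
from azureml.core import Workspace, Experiment, Environment
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Steps one & two: run configuration bound to a compute cluster + environment
run_config = RunConfiguration()
run_config.target = "my-cluster"
run_config.environment = Environment.get(ws, name="my-env")

# Step three: data transfer folder for passing output between steps
prepped = PipelineData("prepped", datastore=ws.get_default_datastore())

# Step four: one PythonScriptStep for data prep, one for training
prep_step = PythonScriptStep(name="prep", source_directory="./src",
                             script_name="prep.py",
                             outputs=[prepped], arguments=["--out", prepped],
                             compute_target="my-cluster", runconfig=run_config)
train_step = PythonScriptStep(name="train", source_directory="./src",
                              script_name="train.py",
                              inputs=[prepped], arguments=["--in", prepped],
                              compute_target="my-cluster", runconfig=run_config)

# Steps five & six: build the pipeline, then submit it as an experiment run
pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
new_run = Experiment(ws, "pipeline-experiment").submit(pipeline)
new_run.wait_for_completion(show_output=True)
```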