Run Experiments & Train Models Flashcards

1
Q

What are the 5 Areas of Algorithms?

A

Clustering

Anomaly Detection

Regression

Multi-Class Classification

Two-Class Classification

2
Q

Algorithm Cheat Sheet:

Anomaly Detection

( when to use )

( One-Class SVM & PCA-Based Anomaly Detection )

A

Anomaly Detection: find unusual data points

One-Class SVM: >100 features, aggressive boundary

PCA-Based Anomaly Detection: fast training

3
Q

Algorithm Cheat Sheet:

Clustering

( when to use )

A

Clustering: identify data structure

4
Q

Algorithm Cheat Sheet:

Multi-Class Classification

( when to use )

Multiclass Logistic Regression

Multiclass Neural Network

Multiclass Decision Forest

Multiclass Decision Jungle

One-vs-All Multiclass

A

Multi-Class Classification: predicting among three or more categories

Multiclass Logistic Regression: fast-training linear models

Multiclass Neural Network: accuracy, long training times

Multiclass Decision Forest: accuracy, fast training

Multiclass Decision Jungle: accuracy, small memory footprint

One-vs-All Multiclass: builds a multiclass model out of two-class classifiers

5
Q

Algorithm Cheat Sheet:

Two-Class Classification

( when to use )

Two-Class SVM

Two-Class Locally Deep SVM

Two-Class Averaged Perceptron

Two-Class Logistic Regression

Two-Class Bayes Point Machine

Two-Class Decision Forest

Two-Class Boosted Decision Tree

Two-Class Decision Jungle

Two-Class Neural Network

A
  1. Two-Class Classification: predicting between two categories
  2. Two-Class SVM: >100 features, linear model
  3. Two-Class Locally Deep SVM: >100 features
  4. Two-Class Averaged Perceptron: fast training, linear model
  5. Two-Class Logistic Regression: fast training, linear model
  6. Two-Class Bayes Point Machine: fast training, linear model
  7. Two-Class Decision Forest: accuracy, fast training
  8. Two-Class Boosted Decision Tree: accuracy, fast training, large memory footprint
  9. Two-Class Decision Jungle: accuracy, small memory footprint
  10. Two-Class Neural Network: accuracy, long training time
6
Q

Algorithm Cheat Sheet:

Regression

(when to use)

Ordinal Regression

Poisson Regression

Fast Forest Quantile Regression

Linear Regression

Bayesian Linear Regression

Neural Network Regression

Decision Forest Regression

Boosted Decision Tree Regression

A
  1. Regression: quantitative prediction
  2. Ordinal Regression: data in rank ordered categories
  3. Poisson Regression: event counts
  4. Fast Forest Quantile Regression: predicting a distribution
  5. Linear Regression: fast training, linear model
  6. Bayesian Linear Regression: linear model, small data set
  7. Neural Network Regression: accuracy, long training time
  8. Decision Forest Regression: accuracy, fast training
  9. Boosted Decision Tree Regression: accuracy, fast training, large memory footprint
7
Q

What is a Pipeline?

A

Automated workflow of the machine learning steps

data processing to deployment

data processing > build & train > deploy and monitor

8
Q

Image Classification:

Set Up Your Development Environment Steps

A

1. import packages

  • will depend on the analysis in question
  • will always need azureml.core

2. connect to workspace

  • will always need to import Workspace
  • will always need the .from_config( ) method

3. create an experiment to track runs

  • need to import the Experiment package

4. create a remote compute target

  • bring in AmlCompute (create single or multi-node computes) and ComputeTarget (host for deployment)
9
Q

Stratified Random Sampling

A

division of a population into smaller sub-groups (strata) based on a shared characteristic (location, population, income, education, etc.)

by splitting the data this way, the train and test splits each get a proportional representation of every sub-group

example: if 5% of the population has income > $125K, the train split should have 5% > $125K and the test split should have 5% > $125K
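
A minimal sketch of the idea in plain Python, assuming a made-up population of 1,000 rows where 5% are flagged high income. The helper name stratified_split and the field names are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=0):
    """Split rows into train/test so each stratum keeps its share in both."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)  # group rows by the shared characteristic

    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)  # random sampling within each stratum
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def share(rows):
    """Fraction of rows flagged as high income."""
    return sum(r["high_income"] for r in rows) / len(rows)


# Hypothetical population: 50 of 1,000 people (5%) earn more than $125K.
population = [{"id": i, "high_income": i < 50} for i in range(1000)]
train, test = stratified_split(population, key=lambda r: r["high_income"])
print(len(train), len(test), share(train), share(test))
```

Both splits keep the 5% share. With scikit-learn, the same effect comes from passing the stratify argument to train_test_split.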

10
Q

SDK: Experiment

function name?

function to run an experiment?

function to run experiment from ScriptRunConfig( )?

log metrics?

A

Experiment Function

Experiment(workspace, name)

Run an Experiment

run_name = name_of_experiment.start_logging()

Run Experiment from ScriptRunConfig

run_name = Run.get_context( ) (called inside the submitted script)

Log Metrics

run_name.log('metric name', metric_value)

log() is a method of the run, not of the experiment

not log_metric(), which is the MLflow method for logging a metric

End Run

run_name.complete()
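
The calls on this card can be strung together roughly as follows. This is a sketch against the Azure ML Python SDK v1 (azureml.core), assuming a config.json for the workspace is available locally; the function name run_inline_experiment is hypothetical.

```python
def run_inline_experiment(experiment_name, metrics):
    """Start an interactive run, log each metric, then end the run."""
    # Imported inside the function so the sketch loads even where the
    # Azure ML SDK is not installed.
    from azureml.core import Experiment, Workspace

    ws = Workspace.from_config()  # assumes a local config.json
    experiment = Experiment(workspace=ws, name=experiment_name)

    run = experiment.start_logging()  # returns a Run object
    for name, value in metrics.items():
        run.log(name, value)  # log() is called on the run
    run.complete()  # end the run
    return run
```

Example call (metric name and value are placeholders): run_inline_experiment("my-experiment", {"row_count": 100}).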

11
Q

SDK: What class is used to create a script configuration?

what is it used for?

how to submit the class?

A

script_config = ScriptRunConfig(source_directory, script)

  • it provides all the configuration information for your script to run (packages, dependencies, etc.)
  • similar to a container used to run model on different machines
  • transfers a local file to the Azure ML environment

run_name = experiment_name.submit(config=script_config)

run_name.wait_for_completion( )

blocks the local session until the submitted run finishes
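
A sketch of the submit flow using the Azure ML SDK v1. The function name submit_script is hypothetical, and the optional compute_target and environment arguments are included only to show where they would go.

```python
def submit_script(experiment, source_directory, script,
                  compute_target=None, environment=None):
    """Wrap a script in a ScriptRunConfig, submit it, and wait for it."""
    # Imported inside the function so the sketch loads without the SDK.
    from azureml.core import ScriptRunConfig

    script_config = ScriptRunConfig(
        source_directory=source_directory,  # folder uploaded with the run
        script=script,                      # entry script inside that folder
        compute_target=compute_target,      # None means the default (local) target
        environment=environment,            # packages the script depends on
    )
    run = experiment.submit(config=script_config)
    run.wait_for_completion(show_output=True)  # block until the run ends
    return run
```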

12
Q

SDK: .get_context( )

A

retrieve and access the current run

used to log and run experiments when using the ScriptRunConfig( ) class
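
Inside the submitted script, the pattern might look like this. It is a sketch assuming the Azure ML SDK v1; log_metrics and main are hypothetical helper names, and the metrics would come from whatever the script actually computes.

```python
def log_metrics(run, metrics):
    """Log every name/value pair in metrics to the given run."""
    for name, value in metrics.items():
        run.log(name, value)


def main(metrics):
    # Imported inside the function so the sketch loads without the SDK.
    from azureml.core import Run

    run = Run.get_context()  # the run this script is executing under
    log_metrics(run, metrics)
    run.complete()
```

Keeping the logging in a small helper means it can be exercised with any object that has a log(name, value) method.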

13
Q

SDK: Environment

what’s the function?

add dependencies from conda (2 steps)?

A

Function

Environment()

Bring in Packages

deps_name = CondaDependencies.create( conda_packages = [] )

env_name.python.conda_dependencies = deps_name
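
The two steps can be sketched as below, assuming the Azure ML SDK v1. The function name build_environment is hypothetical; the key point is that the CondaDependencies object, not the environment itself, is assigned to python.conda_dependencies.

```python
def build_environment(name, conda_packages, pip_packages=None):
    """Create an Environment and attach conda (and optional pip) packages."""
    # Imported inside the function so the sketch loads without the SDK.
    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies

    env = Environment(name)
    # Step 1: describe the packages.
    deps = CondaDependencies.create(
        conda_packages=conda_packages,
        pip_packages=pip_packages,
    )
    # Step 2: attach them to the environment.
    env.python.conda_dependencies = deps
    return env
```

Example call (names are placeholders): build_environment("experiment-env", ["scikit-learn", "pandas"]).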

14
Q

SDK: Compute Cluster

package used?

how to provision?

how to create?

A

package used

from azureml.core.compute import AmlCompute, ComputeTarget

provisioning method

AmlCompute.provisioning_configuration( )

creation method

ComputeTarget.create(workspace, name, provisioning_configuration)
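
A common get-or-create pattern for a cluster, sketched against the Azure ML SDK v1. The function name get_or_create_cluster and the default VM size and node count are assumptions for illustration; the creation call here goes through ComputeTarget.create.

```python
def get_or_create_cluster(ws, cluster_name,
                          vm_size="STANDARD_DS11_V2", max_nodes=2):
    """Return the named cluster, provisioning it first if it does not exist."""
    # Imported inside the function so the sketch loads without the SDK.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Reuse the cluster if the workspace already has one by this name.
        cluster = ComputeTarget(workspace=ws, name=cluster_name)
    except ComputeTargetException:
        # Provisioning step: describe the cluster.
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size,      # assumed size, pick one your quota allows
            max_nodes=max_nodes,  # 1 for single-node, more for multi-node
        )
        # Creation step: ask the workspace to build it.
        cluster = ComputeTarget.create(ws, cluster_name, config)
        cluster.wait_for_completion(show_output=True)
    return cluster
```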

15
Q

SDK: Automate Model Training AzureML

configuration function?

assign cluster function?

pipeline data function?

pipeline step function?

build pipeline function?

experiment & run functions?

A

Build Pipeline Preliminary Steps

  1. access workspace
  2. create environment
  3. create compute

Build Pipeline Additional Steps

  1. run configuration
  2. assign compute cluster
  3. create data transfer folder
  4. define pipeline steps (data prep & train model)
  5. build pipeline
  6. create/access experiment + run the pipeline

Step One: Configuration Function

from azureml.core.runconfig import RunConfiguration

RunConfiguration( )

Step Two: Assign Cluster

config_name.target = name_of_compute_cluster

config_name.environment = env_name

Step Three: Pipeline Data Function

from azureml.pipeline.core import PipelineData

PipelineData( )

Step Four: Define Pipeline Steps Function

from azureml.pipeline.steps import PythonScriptStep

PythonScriptStep( )

used twice: once for data prep and once for training the model

Step Five: Build Pipeline Function

from azureml.pipeline.core import Pipeline

Pipeline( )

Step Six: Experiment Function & Submit

Experiment( )

experiment_name.submit( )
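
Put together, the six steps might look like the sketch below (Azure ML SDK v1). The function name, the step names, the script names prep.py and train.py, and the argument flags are all hypothetical; the workspace, cluster, and environment are assumed to exist already.

```python
def build_and_run_pipeline(ws, cluster, env, experiment_name, source_directory):
    """Build a two-step (prep, train) pipeline and submit it as an experiment."""
    # Imported inside the function so the sketch loads without the SDK.
    from azureml.core import Experiment
    from azureml.core.runconfig import RunConfiguration
    from azureml.pipeline.core import Pipeline, PipelineData
    from azureml.pipeline.steps import PythonScriptStep

    # Steps one and two: run configuration, then assign cluster and environment.
    run_config = RunConfiguration()
    run_config.target = cluster
    run_config.environment = env

    # Step three: folder that carries data from the prep step to the train step.
    prepped = PipelineData("prepped_data", datastore=ws.get_default_datastore())

    # Step four: one PythonScriptStep per stage.
    prep_step = PythonScriptStep(
        name="prepare data",
        source_directory=source_directory,
        script_name="prep.py",  # hypothetical script
        arguments=["--out-folder", prepped],
        outputs=[prepped],
        compute_target=cluster,
        runconfig=run_config,
    )
    train_step = PythonScriptStep(
        name="train model",
        source_directory=source_directory,
        script_name="train.py",  # hypothetical script
        arguments=["--in-folder", prepped],
        inputs=[prepped],
        compute_target=cluster,
        runconfig=run_config,
    )

    # Step five: build the pipeline.
    pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])

    # Step six: create or access the experiment and submit the pipeline.
    experiment = Experiment(workspace=ws, name=experiment_name)
    return experiment.submit(pipeline)
```

Because the train step lists the prep step's output as its input, the pipeline runs the two steps in that order.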

16
Q

Command Line Arguments

A