Run Experiments & Train Models Flashcards
What are the 5 Areas of Algorithms?
Clustering
Anomaly Detection
Regression
Multi-Class Classification
Two-Class Classification
Algorithm Cheat Sheet:
Anomaly Detection
( when to use )
( One-Class SVM & PCA-Based Anomaly Detection )
Anomaly Detection: find unusual data points
One-Class SVM: >100 features, aggressive boundary
PCA-Based Anomaly Detection: fast training
Algorithm Cheat Sheet:
Clustering
( when to use )
Clustering: identify data structure
Algorithm Cheat Sheet:
Multi-Class Classification
( when to use )
Multiclass Logistic Regression
Multiclass Neural Network
Multiclass Decision Forest
Multiclass Decision Jungle
One-vs-All Multiclass
Multi-Class Classification: predicting among 3+ classes
Multiclass Logistic Regression: fast-training linear models
Multiclass Neural Network: accuracy, long training times
Multiclass Decision Forest: accuracy, fast training
Multiclass Decision Jungle: accuracy, small memory footprint
One-vs-All Multiclass: builds a multi-class model from any two-class classifier
Algorithm Cheat Sheet:
Two-Class Classification
( when to use )
Two-Class SVM
Two-Class Locally Deep SVM
Two-Class Averaged Perceptron
Two-Class Logistic Regression
Two-Class Bayes Point Machine
Two-Class Decision Forest
Two-Class Boosted Decision Tree
Two-Class Decision Jungle
Two-Class Neural Network
- Two-Class Classification: predicting between two categories
- Two-Class SVM: >100 features, linear model
- Two-Class Locally Deep SVM: >100 features
- Two-Class Averaged Perceptron: fast training, linear model
- Two-Class Logistic Regression: fast training, linear model
- Two-Class Bayes Point Machine: fast training, linear model
- Two-Class Decision Forest: accuracy, fast training
- Two-Class Boosted Decision Tree: accuracy, fast training, large memory footprint
- Two-Class Decision Jungle: accuracy, small memory footprint
- Two-Class Neural Network: accuracy, long training time
Algorithm Cheat Sheet:
Regression
(when to use)
Ordinal Regression
Poisson Regression
Fast Forest Quantile Regression
Linear Regression
Bayesian Linear Regression
Neural Network Regression
Decision Forest Regression
Boosted Decision Tree Regression
- Regression: quantitative prediction
- Ordinal Regression: data in rank ordered categories
- Poisson Regression: event counts
- Fast Forest Quantile Regression: predicting a distribution
- Linear Regression: fast training, linear model
- Bayesian Linear Regression: linear model, small data set
- Neural Network Regression: accuracy, long training time
- Decision Forest Regression: accuracy, fast training
- Boosted Decision Tree Regression: accuracy, fast training, large memory footprint
What is a Pipeline?
An automated workflow of the machine learning steps,
from data processing to deployment:
data processing > build & train > deploy & monitor
Image Classification:
Set Up Your Development Environment Steps
1. import packages
- will depend on the analysis in question
- will always need azureml.core
2. connect to workspace
- will always need to import Workspace
- will always need the .from_config( ) method
3. create an experiment to track runs
- need to import the Experiment package
4. create a remote compute target
- bring in AmlCompute (creates single- or multi-node compute) and ComputeTarget (host for deployment)
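The four setup steps above can be sketched with SDK v1 code; the experiment and cluster names here are placeholders, and `from_config()` assumes a local config.json downloaded from the workspace:

```python
# 1. import packages -- azureml.core is always needed
from azureml.core import Workspace, Experiment
from azureml.core.compute import AmlCompute, ComputeTarget

# 2. connect to the workspace via config.json
ws = Workspace.from_config()

# 3. create an experiment to track runs
exp = Experiment(workspace=ws, name="my-experiment")

# 4. create a remote compute target (placeholder VM size and name)
config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS2_V2",
                                               max_nodes=2)
cluster = ComputeTarget.create(ws, "my-cluster", config)
cluster.wait_for_completion(show_output=True)
```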
Stratified Random Sampling
division of a population into smaller sub-groups based on a shared characteristic (location, population, income, education, etc.)
by splitting the data this way you get proportional representation of each group in your train and test splits
if 5% of the population's income is > $125K, the train split should have 5% > $125K and the test split should have 5% > $125K
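A minimal illustration of the idea using only the standard library (not the Azure ML API): each group is split separately, so its proportion is preserved in both halves.

```python
# Minimal stratified train/test split; each key(row) group keeps its
# share of the population in both splits.
import random
from collections import defaultdict

def stratified_split(rows, key, test_frac=0.2, seed=42):
    """Return (train, test) with each group's proportion preserved."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# 5% of this toy population earns > $125K; both splits keep that 5%.
people = [{"income": 130_000}] * 5 + [{"income": 60_000}] * 95
train, test = stratified_split(people, key=lambda p: p["income"] > 125_000)
```

With a 20% test fraction this gives 1 of 20 high earners in test and 4 of 80 in train, i.e. 5% in each split.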
SDK: Experiment
function name?
function to run an experiment?
function to run experiment from ScriptRunConfig( )?
log metrics?
Experiment Function
Experiment(workspace, name)
Run an Experiment
new_run = name_of_experiment.start_logging( )
Run Experiment from ScriptRunConfig
Run.get_context( )
Log Metrics
new_run.log('metric name', metric_value)
not log_metric( ), which is the MLflow logging function
End Run
new_run.complete( )
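The cards above fit together as an inline (interactive) run; a sketch assuming SDK v1, with the experiment name and metric value as placeholders:

```python
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
exp = Experiment(workspace=ws, name="my-experiment")

new_run = exp.start_logging()   # start an inline run; returns a Run object
new_run.log("accuracy", 0.91)   # log a metric on the run (placeholder value)
new_run.complete()              # end the run
```

Note that `start_logging()` returns a Run, and logging/completion happen on that Run object, not on the Experiment itself.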
SDK: What class is used to create a script configuration?
what is it used for?
how to submit the class?
script_config = ScriptRunConfig(source_directory, script)
- it provides all the configuration information for your script to run (packages, dependencies, etc.)
- similar to a container used to run model on different machines
- transfers a local file to the Azure ML environment
new_run = experiment_name.submit(config=script_config)
new_run.wait_for_completion( )
blocks the local script until the remote run completes
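Put together, the submit pattern looks roughly like this (SDK v1 sketch; `./src` and `train.py` are placeholder paths):

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()
exp = Experiment(workspace=ws, name="my-experiment")

# Package the local script folder plus its run settings
script_config = ScriptRunConfig(source_directory="./src", script="train.py")

new_run = exp.submit(config=script_config)
new_run.wait_for_completion(show_output=True)  # block until the remote run ends

# Inside train.py, the script retrieves its own run to log against:
#   from azureml.core import Run
#   run = Run.get_context()
#   run.log("accuracy", 0.91)
```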
SDK: .get_context( )
retrieves the current run from inside a script
used within the training script to log metrics when running via the ScriptRunConfig( ) class
SDK: Environment
what’s the function?
add dependencies from conda (2 steps)?
Function
Environment('env_name')
Bring in Packages
conda_deps = CondaDependencies.create( conda_packages = [] )
env_name.python.conda_dependencies = conda_deps
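The two steps in full, as an SDK v1 sketch (the environment name and package list are placeholders):

```python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment(name="my-env")

# Step 1: create the dependency set (placeholder packages)
conda_deps = CondaDependencies.create(conda_packages=["scikit-learn", "pandas"])

# Step 2: attach the dependencies to the environment's Python section
env.python.conda_dependencies = conda_deps
```

The key detail is that the CondaDependencies object, not the environment itself, is what gets assigned to `env.python.conda_dependencies`.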
SDK: Compute Cluster
package used?
how to provision?
how to create?
package used
from azureml.core.compute import AmlCompute
provisioning method
AmlCompute.provisioning_configuration( )
creation method
ComputeTarget.create( workspace, cluster_name, provisioning_config )
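A common SDK v1 pattern is to reuse the cluster if it already exists and provision it otherwise; a sketch with placeholder names and sizes:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()
cluster_name = "my-cluster"

try:
    # Reuse the cluster if it already exists in the workspace
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    # Otherwise provision and create it (placeholder VM size/node counts)
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS2_V2",
                                                   min_nodes=0, max_nodes=2)
    cluster = ComputeTarget.create(ws, cluster_name, config)
    cluster.wait_for_completion(show_output=True)
```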
SDK: Automate Model Training AzureML
configuration function?
assign cluster function?
pipeline data function?
pipeline step function?
build pipeline function?
experiment & run functions?
Build Pipeline Preliminary Steps
- access workspace
- create environment
- create compute
Build Pipeline Additional Steps
- run configuration
- assign compute cluster
- create data transfer folder
- define pipeline steps (data prep & train model)
- build pipeline
- create/access experiment + run the pipeline
Step One: Configuration Function
from azureml.core.runconfig import RunConfiguration
RunConfiguration( )
Step Two: Assign Cluster
config_name.target = name_of_compute_cluster
config_name.environment = env_name
Step Three: Pipeline Data Function
from azureml.pipeline.core import PipelineData
PipelineData( )
Step Four: Define Pipeline Steps Function
from azureml.pipeline.steps import PythonScriptStep
PythonScriptStep( )
used twice: once for the data-prep step and once for the model-training step
Step Five: Build Pipeline Function
Pipeline( )
Step Six: Experiment Function & Submit
Experiment( )
experiment_name.submit( )
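The whole pipeline sequence can be sketched end to end (SDK v1; the cluster, environment, folder, and script names are placeholders, and the environment is assumed to be already registered):

```python
from azureml.core import Workspace, Experiment, Environment
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Steps one & two: run configuration bound to a compute cluster + environment
run_config = RunConfiguration()
run_config.target = "my-cluster"
run_config.environment = Environment.get(ws, name="my-env")

# Step three: data transfer folder for passing output between steps
prepped = PipelineData("prepped", datastore=ws.get_default_datastore())

# Step four: one PythonScriptStep for data prep, one for training
prep_step = PythonScriptStep(name="prep", source_directory="./src",
                             script_name="prep.py",
                             outputs=[prepped], arguments=["--out", prepped],
                             compute_target="my-cluster", runconfig=run_config)
train_step = PythonScriptStep(name="train", source_directory="./src",
                              script_name="train.py",
                              inputs=[prepped], arguments=["--in", prepped],
                              compute_target="my-cluster", runconfig=run_config)

# Steps five & six: build the pipeline, then submit it as an experiment run
pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
new_run = Experiment(ws, "pipeline-experiment").submit(pipeline)
new_run.wait_for_completion(show_output=True)
```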