Path4.Mod2.c - Training Models with Scripts - Code to support Experiment Tracking with Jobs using MLflow Flashcards

1
Q

single comp re-re coll

Benefits of tracking Experiments

A
  • All ML experiments organized in a single place (search and filter Experiments)
  • Compare Experiments, analyze results, and debug models with little effort
  • Reproduce or re-run Experiments to validate results
  • Improve collaboration (share results, access Experiment data programmatically)
2
Q

Main benefit when using MLflow for Tracking wrt Azure ML Workspaces

A

Compatibility with Azure ML Workspaces lets you track runs, metrics, params, and artifacts directly from your workspace (in your Python code, in your Jupyter Notebooks, and ultimately in your production scripts).

3
Q

Pip MLW URI

General prerequisites for using MLflow
- Two ways to get the URI
- Set the URI
- The mlflow-skinny package use case

A
  • Install the MLflow SDK and the Azure ML plugin for MLflow (pip install mlflow azureml-mlflow)
  • An Azure Machine Learning Workspace
  • For remote tracking, configure MLflow to point to your Azure ML Workspace’s tracking URI
    - To get the tracking URI:
      - Python SDK: uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
      - CLI: az ml workspace show --query mlflow_tracking_uri
    - To set the tracking URI: mlflow.set_tracking_uri(uri)

Use the mlflow-skinny package when you only need tracking and logging capabilities.
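
A minimal sketch of getting and setting the tracking URI, assuming the azure-ai-ml SDK v2 and placeholder subscription, resource group, and workspace names:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import mlflow

# Placeholder identifiers, substitute your own values
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Get the workspace's tracking URI, then point MLflow at it
uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
mlflow.set_tracking_uri(uri)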

4
Q

se en

Configure Experiment name for Notebooks and for Jobs

A
  • Notebooks: use exp = mlflow.set_experiment(experiment_name)
  • Jobs through the CLI or SDK: set the experiment_name property in the job YAML
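
A quick sketch of both; the experiment name and the job fields below are hypothetical placeholders.

Notebook (Python):

import mlflow

# Select (and create if missing) the experiment before starting any runs
mlflow.set_experiment("my-experiment")

Job (YAML, e.g. job.yml; experiment_name is the relevant property, the rest are placeholders):

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
experiment_name: my-experiment
command: python train.py
code: ./src
environment: azureml:my-env@latest
compute: azureml:my-cluster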
5
Q

sr er

Configure Runs (MLflow terminology for “tracked training jobs”) for Notebooks to start/stop explicitly, and the significance wrt when Tracking starts

A

To start and end explicitly:

mlflow.start_run()
# ... training code
mlflow.end_run()

Use a context manager (analogous to using in C#):

with mlflow.start_run() as run:
    # ... training code

You can also name the run:

with mlflow.start_run(run_name="my_run") as run:
    # ... training code

Wrt Tracking: Tracking doesn’t start until your code tries to log something.
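
For instance (a minimal sketch; the parameter and metric names are hypothetical), the run below only shows up as tracked once the first log call executes:

import mlflow

with mlflow.start_run(run_name="my_run"):
    # The first logged value is what actually starts tracking
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)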

6
Q

For configuring Runs (MLflow terminology for “tracked training jobs”) via Command Job
- Three Training Code tasks
- Three MLOps tasks

A

Training Code:
- Give the Command Job a display_name
- Ensure the training code does not set a run name via mlflow.start_run(run_name=...) (the display_name becomes the run name)
- Add any tracking/logging code using the MLflow SDK

MLOps:
- Put your training code (a .py file with a main entry point) in a src folder
- Ensure your conda.yml installs mlflow and azureml-mlflow
- Submit the job

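A sketch of submitting such a job with the Python SDK v2; every name below (environment, compute, files, workspace identifiers) is a placeholder:

from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Command job; display_name is what shows up as the run's name
job = command(
    code="./src",                                # training code with a main entry point
    command="python train.py",
    environment="azureml:my-mlflow-env@latest",  # its conda.yml must install mlflow + azureml-mlflow
    compute="my-cluster",
    display_name="my-training-run",
    experiment_name="my-experiment",
)
ml_client.jobs.create_or_update(job)             # submit the job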

7
Q

gr, gmh, a.da

The functions to use for accessing or querying metrics through the MLflow SDK:
- For a single run (how we access the data)
- For all values of a given metric (why this is important)
- For logged artifacts like files and models (what params it needs)

A
  • We use mlflow.get_run(), then access its data object:
import mlflow

run = mlflow.get_run(run_id)
metrics = run.data.metrics
params = run.data.params
tags = run.data.tags

print(metrics, params, tags)
  • The above only returns the last value of each metric. To get a metric’s historical values, use MlflowClient.get_metric_history(run_id, metric_name)
  • To get artifacts: mlflow.artifacts.download_artifacts(run_id=run_id, artifact_path=artifact_path)
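
A short sketch of both calls, assuming a run_id you already have; the metric name and artifact path are hypothetical:

import mlflow
from mlflow.tracking import MlflowClient

run_id = "<run-id>"  # placeholder
client = MlflowClient()

# Every recorded value of the metric, not just the latest
for measurement in client.get_metric_history(run_id, "accuracy"):
    print(measurement.step, measurement.value)

# Download a logged artifact (e.g., the model folder) to local disk
local_path = mlflow.artifacts.download_artifacts(run_id=run_id, artifact_path="model")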