Path4.Mod2.c - Training Models with Scripts - Code to support Experiment Tracking with Jobs using MLFlow Flashcards
single comp re-re coll
Benefits of tracking Experiments
- All ML experiments organized in a single place (search and filter Experiments)
- Compare Experiments, analyze results, and debug models with little extra work
- Reproduce or re-run Experiments to validate results
- Improve collaboration (sharing results, access Experiment data programmatically)
Main benefit when using MLFlow for Tracking wrt Azure ML Workspaces
Compatibility with Azure ML Workspaces lets you track runs, metrics, params, and artifacts directly against your workspace (from your Python code, your Jupyter Notebooks, and ultimately your production scripts).
Pip MLW URI
General prerequisites for using MLFlow
- Two ways to get the URI
- Set the URI
- The mlflow-skinny package use case
- Install the MLFlow SDK and the Azure ML plugin for MLFlow (pip install mlflow azureml-mlflow)
- An Azure Machine Learning Workspace
- For remote tracking, configure MLFlow to point to your Azure ML Workspace's tracking URI:
  - To get the tracking URI:
    - Python SDK: uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
    - CLI: az ml workspace show --query mlflow_tracking_uri
  - To set the tracking URI: mlflow.set_tracking_uri(uri)
- Use the mlflow-skinny package when you only need tracking and logging capabilities.
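A minimal end-to-end sketch of the remote-tracking setup, assuming an already-provisioned workspace (the subscription, resource group, and workspace names are placeholders):
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import mlflow

# Placeholders: substitute your own Azure identifiers.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Get the workspace's tracking URI, then point MLFlow at it.
uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
mlflow.set_tracking_uri(uri)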
se en
Configure Experiment name for Notebooks and for Jobs
- Notebooks: use exp = mlflow.set_experiment(name)
- Jobs through the CLI or SDK: in the job YAML, set the experiment_name property
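A minimal sketch of the notebook case; the experiment name is hypothetical. For a job, the YAML would carry the equivalent line experiment_name: mlflow-demo.
import mlflow

# Creates the experiment if it doesn't exist, and makes it the active one
# so subsequent runs are logged under it.
exp = mlflow.set_experiment("mlflow-demo")  # hypothetical experiment name
print(exp.experiment_id, exp.name)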
sr er
Configure Runs (MLFlow terminology for “tracked training jobs”) for Notebooks to start/stop explicitly, and the significance wrt when Tracking starts
To start and end explicitly:
mlflow.start_run()
# ... training code
mlflow.end_run()
Or use a context manager (like C#'s using):
with mlflow.start_run() as run:
    # ... training code
You can also name the run:
with mlflow.start_run(run_name="my_run") as run:
    # ... training code
Wrt Tracking: Tracking doesn’t start until your code tries to log something.
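A minimal sketch of a named run, assuming hypothetical parameter and metric names; note that nothing is recorded until the first log call:
import mlflow

# The run is opened explicitly, but tracking data is only
# recorded once something is logged below.
with mlflow.start_run(run_name="my_run") as run:
    mlflow.log_param("learning_rate", 0.01)  # hypothetical hyperparameter
    mlflow.log_metric("accuracy", 0.93)      # hypothetical result
    print(run.info.run_id)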
For configuring Runs (MLFlow terminology for “tracked training jobs”) via Command Job
- Three Training Code tasks
- Three MLOps tasks
Training Code:
- Give the Command Job a display_name
- Ensure the training code is not calling mlflow.start_run(run_name=...), since the job creates and names the run
- Add any tracking/logging code using the MLFlow SDK
MLOps:
- Put your training code (a .py file with a main entry point) in a src folder
- Ensure your conda.yml installs mlflow and azureml-mlflow
- Submit the job
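A minimal sketch of those MLOps steps using the Azure ML Python SDK v2; the environment name, compute target, and other identifiers are placeholders:
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Command Job: training code lives in ./src, entry point is main.py.
job = command(
    code="./src",
    command="python main.py",
    environment="my-env@latest",   # hypothetical env whose conda.yml installs mlflow + azureml-mlflow
    compute="cpu-cluster",         # hypothetical compute target
    display_name="my-training-run",
    experiment_name="mlflow-demo",
)

ml_client.jobs.create_or_update(job)  # submit the job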
gr, gmh, a.da
The function to use for accessing or querying metrics through the MLFlow SDK:
- For a single run (how we access the data)
- For all values of a given metric (why this is important)
- For logged artifacts like files and models (what params it needs)
- We use mlflow.get_run(), then access its data object:
import mlflow
run = mlflow.get_run(run_id)
metrics = run.data.metrics
params = run.data.params
tags = run.data.tags
print(metrics, params, tags)
- The above only returns the last value of each metric. To get a metric's historical values, use MlflowClient().get_metric_history(run_id, metric_name)
- To get artifacts: mlflow.artifacts.download_artifacts(run_id=run_id, artifact_path=artifact_path)
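A minimal sketch combining all three queries; the run id, metric name, and artifact path are placeholders:
import mlflow
from mlflow.tracking import MlflowClient

run_id = "<RUN_ID>"  # placeholder run id

# Last value of each metric, plus params and tags.
run = mlflow.get_run(run_id)
print(run.data.metrics, run.data.params, run.data.tags)

# Full history of one metric (step/value pairs).
for m in MlflowClient().get_metric_history(run_id, "accuracy"):  # hypothetical metric name
    print(m.step, m.value)

# Download a logged artifact (e.g. a model folder) to local disk.
local_path = mlflow.artifacts.download_artifacts(run_id=run_id, artifact_path="model")
print(local_path)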