Path4.Mod2.a - Training Models with Scripts - Track Model Training with Jobs using MLFlow Flashcards
Two options to track ML Jobs with MLFlow
Option 1: mlflow.autolog()
- Enables autologging for supported libraries with a single call
Option 2: mlflow.log_param() / mlflow.log_metric() / mlflow.log_artifact(), etc.
- Use the individual custom logging functions to track specific parameters, metrics, and artifacts yourself
mlf amlf
Two libraries you’ll need to (pip) install on your Compute for tracking with MLFlow
mlflow
and azureml-mlflow
For example, as a conda environment file:
name: mlflow-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - numpy
      - pandas
      - scikit-learn
      - matplotlib
      - mlflow
      - azureml-mlflow
When and where to enable MLFlow Autologging
At the beginning of your training script, before any training code:
mlflow.autolog()
Sav Sup
Two advantages of using MLFlow Autologging
- Saves you time and effort logging important model information, so you can track your model’s performance over time
- Supported by the majority of common libraries: scikit-learn, TensorFlow, LightGBM, Spark, XGBoost, PyTorch, etc.
P => l _ p, M => l _ m, MA => l _ a
The three things MLFlow allows you to log, along with their corresponding custom logging functions, which fulfill the majority of logging use cases:
* Inputs vs Outputs
* Which keeps track of value history
- Parameters get logged with mlflow.log_param(): a key-value pair for a single parameter; used for inputs
- Metrics get logged with mlflow.log_metric(): a key-value pair for a single numeric metric; used for outputs. MLFlow will remember the value history for each metric for tracking purposes
- Model Artifacts get logged with mlflow.log_artifact(): logs a file; can be used, for example, to save a plot of a logged metric as an image
Ov Me Im Ou+L
Where to view metrics again…
In a completed Job/Experiment’s details:
- Overview tab - overall view, with all logged params under Params
- Metrics tab - select any logged numeric metric to view its values and history
- Images tab - plots and other metric charting
- Outputs+Logs tab - additional artifacts like model files are stored in the Model folder