Path1.Mod1.f - Explore ML Workspace - MLModel Format Flashcards
Difference between Artifacts and Models in MLflow
MLModel Format usage wrt the Model
What’s this code doing?
import mlflow
mlflow.sklearn.log_model(sklearn_estimator, "classifier")
Any file generated and captured from an experiment’s run or job is an Artifact.
Models are a certain type of Artifact; we use the MLModel Format to load them and to communicate the Model's intended use.
Example code for logging a model in MLflow, using a specific Flavor (sklearn)
con/met, man Fl Si
MLflow’s MLModel Format: what it is and where it stores assets
The MLmodel File: what it is and the two sections it uses to describe the model’s usage
MLModel Format
* It’s a contract defining Artifacts and what they represent (like metadata)
* The format stores assets in a folder; one of those assets is named MLmodel
MLmodel File
* It’s the model manifest describing how the model is loaded and used
* Specifies two sections: Flavors and Signatures
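A minimal sketch of reading the MLmodel manifest back through the MLflow SDK to see those two sections (the run id and artifact path below are placeholders):

from mlflow.models import get_model_info

# "runs:/<run_id>/classifier" is a hypothetical model URI; substitute your own.
info = get_model_info("runs:/<run_id>/classifier")
print(info.flavors)    # the Flavors section of the MLmodel file
print(info.signature)  # the Signature section (inputs/outputs schema)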
unicont
Model Flavors:
- what they are
- how they handle serialization for persisting and loading models
A Flavor is the unique contract in MLflow, designed to work across all ML frameworks, that indicates what to expect of a model created with a specific framework (how to persist and load it), i.e. a specific “flavor” of ML framework.
There is no enforcement of a single serialization mechanism that all models must support; that decision is left to each flavor, based on each framework’s best practices.
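A sketch of flavor-specific persistence, using sklearn purely as an example (the directory name is made up); the sklearn flavor pickles the estimator and loads it back as a native scikit-learn object:

import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# The sklearn flavor chooses the serialization (pickle/cloudpickle);
# other flavors (pytorch, tensorflow, ...) use their own mechanisms.
mlflow.sklearn.save_model(model, "local_model_dir")
reloaded = mlflow.sklearn.load_model("local_model_dir")
print(type(reloaded))  # the native scikit-learn estimator type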
Pact meth inf
Model Signatures:
- what they are
- what two subsections they specify
- how MLflow enforces their types
- Signatures (i.e. the API) are the data contract between the model and the server running your models
- Signatures, aka method signatures, specify two subsections: inputs and outputs
- MLflow enforces a Signature during the model inference process if one is available. Signatures are inferred using a best-effort approach; you can still log models manually if the inferred Signature is not desired
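A minimal sketch of logging a model with an explicitly inferred Signature instead of relying on best-effort inference:

import mlflow.sklearn
from mlflow.models import infer_signature
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier().fit(X, y)

# Build the inputs/outputs contract from example data, then attach it.
signature = infer_signature(X, model.predict(X))
mlflow.sklearn.log_model(model, "classifier", signature=signature)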
c-b + FRAME, t-b + nd/dict
The two Signature Types and what objects/types are provided to support them
- Column-based - Signatures that operate on tabular data (data organized in a table), using pandas.DataFrame objects as input
- Tensor-based - Signatures that operate on n-dimensional arrays (aka tensors); MLflow supplies numpy.ndarray as inputs, or a dict[string, numpy.ndarray] for named tensors
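A small sketch showing how the input object drives which Signature type gets inferred (the column names and shapes are made up):

import numpy as np
import pandas as pd
from mlflow.models import infer_signature

# Column-based: a pandas.DataFrame yields a named-column (tabular) schema.
table = pd.DataFrame({"age": [25, 32], "income": [40000.0, 52000.0]})
print(infer_signature(table))

# Tensor-based: a numpy.ndarray yields an n-dimensional tensor schema;
# a dict[string, numpy.ndarray] would describe named tensors.
tensor = np.zeros((2, 28, 28))
print(infer_signature(tensor))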
con log dep
Model Environment:
- where they are defined
- two ways they are consumed
- how they are different from Azure ML Environments
The Model Environment
- defined in the conda.yaml file stored in the model's folder, alongside the MLmodel file
- consumed when auto-detected by MLflow, or manually indicated when calling mlflow.<flavor>.log_model()
- Azure ML Environments apply to Workspaces (for registered Environments) or to Jobs/Deployments (for anonymous Environments); MLflow model Environments are built and used for Model deployment.
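A sketch of indicating the Environment manually at logging time (the pinned packages are illustrative); MLflow writes the dependencies into the environment files inside the model folder instead of auto-detecting them:

import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier().fit(X, y)

# Explicit dependencies override MLflow's auto-detection.
mlflow.sklearn.log_model(model, "classifier", pip_requirements=["scikit-learn", "pandas"])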
NCDE!
Model Prediction (predict()) Functions:
- when they are called
- what they return
- All MLflow Models have a predict function, called when a Model is deployed using the no-code-deployment experience
- What they return depends on the flavor
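A minimal sketch of the predict call that a no-code deployment invokes behind the scenes (the run id and input columns are placeholders):

import mlflow.pyfunc
import pandas as pd

model = mlflow.pyfunc.load_model("runs:/<run_id>/classifier")  # hypothetical URI
batch = pd.DataFrame({"feature_1": [0.5], "feature_2": [1.2]})  # shaped to the Signature
print(model.predict(batch))  # the return type depends on the flavor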
CP/P CP ME CL EH BL V MMD
When to customize Model Prediction (predict()) Functions
- Custom Pre/Postprocessing: For models requiring extra data manipulation steps.
- Complex Pipelines: To encapsulate multi-step data transformations and models.
- Model Ensembling: For managing multiple models used in tandem.
- Custom Logging: To capture additional metrics or features during prediction.
- Error Handling: For custom responses to prediction errors or data issues.
- Business Logic: To apply specific rules or adjustments to predictions.
- Versioning: To manage different versions of a model dynamically.
- Multi-Model Deployment: For routing requests to different models based on input.
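One common way to implement these customizations is a custom pyfunc model. A hedged sketch of the Business Logic case (the class name, threshold, and artifact path are made up):

import mlflow.pyfunc

class ThresholdedModel(mlflow.pyfunc.PythonModel):
    # Business-logic example: only predict the positive class above 0.75.
    def load_context(self, context):
        import joblib
        self.inner = joblib.load(context.artifacts["inner_model"])

    def predict(self, context, model_input):
        proba = self.inner.predict_proba(model_input)[:, 1]
        return (proba > 0.75).astype(int)

mlflow.pyfunc.log_model(
    "classifier",
    python_model=ThresholdedModel(),
    artifacts={"inner_model": "model.joblib"},  # hypothetical pre-trained estimator on disk
)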
R FS R
Models created as MLflow Models can be loaded back into code from these three different locations
- From the run where they were logged
- From the file system they were saved on
- From the Model registry where they are registered
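A sketch of the three corresponding model URIs (the run id, path, and registry name are placeholders):

import mlflow.pyfunc

from_run      = mlflow.pyfunc.load_model("runs:/<run_id>/classifier")  # from the run
from_disk     = mlflow.pyfunc.load_model("/path/to/model/folder")      # from the file system
from_registry = mlflow.pyfunc.load_model("models:/my-classifier/1")    # from the registry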
same inf
Two Workflows available for loading Models back:
- diff between flavor.load_model & pyfunc.load_model
- the workflow that guarantees a predict function will take all Signature types
- Loading back the same object and types that were logged - using the MLflow SDK (mlflow.<flavor>.load_model()) you obtain an instance of the model, with types specific to the training library
- Loading back a model for running inference - using the MLflow SDK to obtain a wrapper that MLflow guarantees will have a predict function, callable with pandas.DataFrame, numpy.ndarray or dict[string, numpy.ndarray] inputs. Use mlflow.pyfunc.load_model() to handle the conversion to the input type the model expects
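A sketch contrasting the two workflows side by side (the run id is a placeholder, and the input assumes an illustrative 4-feature classifier):

import numpy as np
import mlflow.pyfunc
import mlflow.sklearn

X = np.random.rand(3, 4)  # dummy input shaped for the assumed model

# Workflow 1: native object, with training-library types and the full sklearn API.
sk_model = mlflow.sklearn.load_model("runs:/<run_id>/classifier")
sk_model.predict_proba(X)

# Workflow 2: generic wrapper; predict() accepts pandas.DataFrame,
# numpy.ndarray or dict[string, numpy.ndarray] regardless of flavor.
py_model = mlflow.pyfunc.load_model("runs:/<run_id>/classifier")
py_model.predict(X)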
B R-T, Sw TF, PIn, RAI
Four advantages of logging MLflow Models
Advantages:
* Deploy on batch endpoints or real-time endpoints without a scoring script or Environment
* Auto-generated Swagger and Test features post-deployment
* Models can immediately be used as pipeline inputs
* Access to the Responsible AI Dashboard
MLF C Tr
The three types of Models that can be registered in Azure ML
MLflow - Models trained and logged with MLflow
Custom - Model types not supported by Azure ML
Triton - Models for deep learning workloads, such as those built with TensorFlow and PyTorch