Path6.Mod1.a - Deploy and Consume Models - Managed Online Endpoints Flashcards
Real-Time Endpoints -> Inferencing
An HTTPS endpoint to which you send a request; the input from that request is passed to a scoring script that loads a trained model. The model then infers against that data: it uses the new input to predict a label for it.
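For context, a minimal sketch of calling such an endpoint with the Python SDK v2 — the workspace details and `sample-request.json` payload are hypothetical placeholders, not from the module:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Hypothetical workspace details -- replace with your own.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Send a JSON request to the endpoint; the scoring script handles it
# and the model's prediction comes back as the response.
response = ml_client.online_endpoints.invoke(
    endpoint_name="endpoint-example",
    request_file="sample-request.json",  # hypothetical input file
)
print(response)
```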
MOE KOE
Two types of Online Endpoints, when each is preferred
- Managed Online Endpoints: Azure ML manages all underlying infrastructure. Prefer these when testing: you only need to specify the VM size and scale settings; everything else is managed for you by Azure.
- Kubernetes Online Endpoints: users manage the Kubernetes cluster that provides the infrastructure. Preferred for control and stability, though the cluster is likely managed by another team (e.g. DevOps); setup and scaling are handled by the Kubernetes admins.
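For contrast, a sketch of declaring each endpoint type in SDK v2; the attached cluster name "my-aks-cluster" is a hypothetical compute target, not something from the module:

```python
from azure.ai.ml.entities import ManagedOnlineEndpoint, KubernetesOnlineEndpoint

# Managed: Azure ML provisions and manages the underlying infrastructure.
managed_ep = ManagedOnlineEndpoint(
    name="managed-ep-example",
    auth_mode="key",
)

# Kubernetes: runs on a K8s cluster attached to the workspace
# (here a hypothetical compute target named "my-aks-cluster").
k8s_ep = KubernetesOnlineEndpoint(
    name="k8s-ep-example",
    auth_mode="key",
    compute="my-aks-cluster",
)
```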
MA SS Env ComConf
Four things required for Model Deployment
- Model Assets: e.g. the registered Model in the ML Workspace
- Scoring Script: the script that loads the model and runs inference (see the sketch after this list)
- Environment: lists all necessary packages to install on the Endpoint's Compute
- Compute Configuration: the compute size and scale settings, corresponding to request throughput needs
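A minimal sketch of the two pieces you author yourself. Azure ML's scoring-script contract is an `init()` that loads the model and a `run()` that handles each request; the model filename, framework, and conda file below are assumptions:

```python
# score.py -- minimal scoring script sketch.
import json
import os

import joblib  # assumes a scikit-learn style model; swap for your framework

def init():
    # Runs once when the container starts: load the registered model.
    global model
    # AZUREML_MODEL_DIR points at the deployed model assets.
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model.pkl")  # hypothetical filename
    model = joblib.load(model_path)

def run(raw_data):
    # Runs per request: parse the input, predict, return the label(s).
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return predictions.tolist()
```

And the Environment, assuming a base image plus a conda file listing the packages:

```python
from azure.ai.ml.entities import Environment

env = Environment(
    name="deployment-environment",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",  # hypothetical base image
    conda_file="./environment/conda.yaml",  # the necessary packages
)
```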
The easiest way to get a Model deployed to an Online Endpoint: the two things it requires and what it does automatically for you
Deploy an MLflow Model to a Managed Online Endpoint. Both the scoring script and Environment are automatically generated!
Reduce costs with Managed Online Endpoint deployments using this kind of Model…
Multi-model deployments: register a Model folder that contains all the Models you want to deploy, as files or as subdirectories containing them.
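A hedged sketch of the idea: one registered Model asset whose folder holds several model files, with the scoring script picking the right one per request (the folder layout and filenames are hypothetical):

```python
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# One registered asset whose folder contains several models, e.g.
#   ./models/churn.pkl
#   ./models/upsell.pkl
multi_model = Model(
    path="./models",               # hypothetical local folder
    type=AssetTypes.CUSTOM_MODEL,  # custom type: you supply the scoring script
    name="multi-model-example",
)
# In score.py, init() would load each file under AZUREML_MODEL_DIR
# and run() would route each request to the appropriate model.
```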
Blue/Green Deployments, the diff and general process
An approach for endpoints with multiple deployments. Allows you to test new versions of a model on the same endpoint, and switch to new versions without service interruptions.
- Blue Deployments use the currently deployed production Model on the Endpoint.
- Green Deployments use an updated Model trained on new data using the same Endpoint
Production traffic is then split between the two, where, for example, 90% goes to Blue and the other 10% to Green.
- If the new Model sucks, just roll traffic back off Green
- Else redirect all traffic from Blue to the Green Model
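A sketch of that traffic split in SDK v2, assuming an existing endpoint object with two deployments already created under the hypothetical names "blue" and "green":

```python
# 90/10 split between the current (blue) and candidate (green) deployments.
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.begin_create_or_update(endpoint).result()

# Green holds up? Send it everything. Otherwise set it back to {"blue": 100}.
endpoint.traffic = {"blue": 0, "green": 100}
ml_client.begin_create_or_update(endpoint).result()
```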
The ManagedOnlineEndpoint class, five parameters to know and:
- which two are required
- which must be unique to the Azure Region
SDK entity to create online Endpoints
- `name`: must be unique in the Azure Region. Required
- `auth_mode`: use `key` (key-based auth) or `aml_token` (Azure ML token-based auth). Required
The other three actually belong to the paired ManagedOnlineDeployment class, not the Endpoint itself:
- `model`: the Model you're deploying to the Endpoint
- `instance_type`: VM size to use
- `instance_count`: number of instances to use
OofQ RNR
The caution to keep in mind w.r.t. Online Endpoint SKUs: the two errors to look out for and how to remedy them
Standard_DS1_v2 and Standard_F2s_v2 may be too small for larger models and may lead to container termination due to insufficient memory, not enough disk space, or probe failure because the container takes too long to initialize.
If you see OutOfQuota or ResourceNotReady errors, try bigger VM SKUs (though this increases cost…).
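When diagnosing these failures, the container logs are the first stop. A sketch using the SDK v2 get_logs call, with a hypothetical deployment named "blue":

```python
# Pull the trailing container logs from a deployment to see why it failed.
logs = ml_client.online_deployments.get_logs(
    name="blue",                       # hypothetical deployment name
    endpoint_name="endpoint-example",
    lines=50,                          # number of trailing log lines
)
print(logs)
```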
Give example code for:
- Deploying a ManagedOnlineEndpoint with an MLflow Model
- Setting the endpoint to receive 100% of traffic
- Deleting the endpoint by name
Put it all together:
```python
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment, Model
from azure.ai.ml.constants import AssetTypes

# 1. Create the endpoint (name must be unique within the Azure region).
endpoint = ManagedOnlineEndpoint(
    name="endpoint-example",
    description="Online endpoint",
    auth_mode="key",
)
ml_client.begin_create_or_update(endpoint).result()

# 2. Deploy an MLflow model to it -- the scoring script and environment
#    are generated automatically, so neither needs to be supplied here.
model = Model(
    path="./model",
    type=AssetTypes.MLFLOW_MODEL,
    description="my MLflow model",
)
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="endpoint-example",
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

# 3. Route 100% of the endpoint's traffic to the new deployment.
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()

# 4. Delete the endpoint by name when it's no longer needed.
ml_client.online_endpoints.begin_delete(name="endpoint-example")
```