Path6.Mod2.b - Deploy and Consume Models - Batch Endpoint Deployment w/out MLFlow Flashcards
When deploying without MLflow, all aspects are generally the same as deploying with MLflow, except for this
You need to provide a Scoring Script and an Execution Environment yourself.
The three responsibilities of the Scoring Script when MLflow isn't being used
- Loads the model
- Reads new data
- Performs scoring
Two functions the Scoring Script must define and their respective use cases/responsibilities and return values
- `init()`: Called once at the beginning of the job; use it for costly or common prep like loading the model. Doesn't return anything.
- `run()`: Responsible for reading new data. Called once for each mini-batch to perform scoring. Should return a pandas `DataFrame` or an array/list.
Given this code, note:
- `AZUREML_MODEL_DIR` and `global model`
- `mini_batch`, and why its size matters!
- where the predictions are written to
```python
import os
import mlflow
import pandas as pd

# 1. What is this doing?
def init():
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
    # 2. What is this doing?
    model = mlflow.pyfunc.load_model(model_path)

def run(mini_batch):
    print(f"run method start: {__file__}, run({len(mini_batch)} files)")
    resultList = []
    # 3. What's happening here?
    for file_path in mini_batch:
        data = pd.read_csv(file_path)
        pred = model.predict(data)
        # 4. What's happening here?
        df = pd.DataFrame(pred, columns=["predictions"])
        df["file"] = os.path.basename(file_path)
        resultList.extend(df.values)
    return resultList
```
- 1. `global model`: making it `global` makes it available to all script code. `AZUREML_MODEL_DIR`: environment variable used to locate the files associated with your model.
- 2. Load the MLflow model.
- 3 and 4. `mini_batch`: its size is defined in the deployment config. If files are too large to process, you need to split them into smaller ones. Each file is loaded and scored, and the predictions are all written to a single output file via the DataFrame, NOT manually.
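The accumulation pattern from the card above can be sketched with mock predictions (no model or real files needed; the file names and prediction values here are made up):

```python
import os
import pandas as pd

# Mock per-file predictions standing in for model.predict (values are made up)
mock_outputs = {"a.csv": [0, 1], "b.csv": [1]}

resultList = []
for file_path in ["/data/a.csv", "/data/b.csv"]:
    pred = mock_outputs[os.path.basename(file_path)]
    df = pd.DataFrame(pred, columns=["predictions"])
    df["file"] = os.path.basename(file_path)
    resultList.extend(df.values)  # one [prediction, file] row per prediction

# resultList now holds 3 rows, all destined for a single output file
```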
Two ways to create the Execution Environment
- Dockerfile
- Docker image w/ Conda dependencies (basic `conda.yml` file):

```yml
name: basic-env-cpu
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pandas
  - pip
  - pip:
    - azureml-core
    - mlflow
```
Create an instance of the `Environment` class

```python
from azure.ai.ml.entities import Environment

env = Environment(
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
    conda_file="./src/conda-env.yml",
    name="deployment-environment",
    description="Environment created from a Docker image plus Conda environment.",
)
ml_client.environments.create_or_update(env)
```
Create an instance of the `BatchDeployment` class (putting it all together)

```python
from azure.ai.ml.entities import BatchDeployment, BatchRetrySettings
from azure.ai.ml.constants import BatchDeploymentOutputAction

deployment = BatchDeployment(
    name="forecast-mlflow",
    description="A sales forecaster",
    endpoint_name=endpoint.name,
    model=model,  # the global model
    compute="aml-cluster",
    code_path="./code",
    scoring_script="score.py",
    environment=env,  # from earlier
    instance_count=2,
    max_concurrency_per_instance=2,
    mini_batch_size=2,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)
ml_client.batch_deployments.begin_create_or_update(deployment)
```
Using an Automated ML model with a Batch Endpoint
You can't… The Scoring Script generated by Automated ML only works with Online Endpoints; it's not designed for Batch Endpoints.
How Batch Endpoints distribute work, how to handle larger files, and what smaller batch files do for us
Work is distributed in mini-batches of files. An input folder containing 100 files with a mini-batch size of 10 will generate 10 child jobs of 10 files each.
Again, larger files should be split up, or the number of files per mini-batch decreased, to accommodate them.
Smaller files result in higher levels of parallelism.
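The distribution arithmetic can be sanity-checked directly (numbers are hypothetical, mirroring the 100-file example):

```python
import math

n_files = 100
mini_batch_size = 10           # files per mini-batch, set in the deployment config
instance_count = 2
max_concurrency_per_instance = 2

child_jobs = math.ceil(n_files / mini_batch_size)
max_parallel = instance_count * max_concurrency_per_instance

print(child_jobs)    # 10 child jobs of 10 files each
print(max_parallel)  # at most 4 mini-batches scored concurrently
```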
`run()` function return types
- Which to use for what type of return data
- How they are returned w.r.t. `output_action`
- What not to output and why

- For multiple points of information, return a pandas `DataFrame` for its tabular format. Return an array/list if you need a single prediction.
- Setting `output_action` to `SUMMARY_ONLY` will do just that; setting it to `APPEND_ROW` will append each row to the indicated output file.
- Do not output complex data types other than a pandas `DataFrame`. Anything else will be transformed into a hard-to-read string.
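A minimal `run()` sketch returning a pandas DataFrame, the tabular shape described above (the predictions are placeholders; no model is loaded):

```python
import os
import pandas as pd

def run(mini_batch):
    frames = []
    for file_path in mini_batch:
        # Placeholder prediction standing in for model.predict(data)
        df = pd.DataFrame({"predictions": [0.0]})
        df["file"] = os.path.basename(file_path)
        frames.append(df)
    # Tabular output: with APPEND_ROW, each row lands in output_file_name
    return pd.concat(frames, ignore_index=True)
```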