Path6.Mod2.b - Deploy and Consume Models - Batch Endpoint Deployment w/out MLFlow Flashcards

1
Q

When deploying without MLFlow, all aspects are generally the same as if deploying with MLFlow, except for this

A

You need to generate the Scoring Script and an Execution Environment.

2
Q

The three responsibilities of the Scoring Script when MLFlow isn’t being used

A
  • Loads the model
  • Reads new data
  • Performs scoring
3
Q

Two functions the Scoring Script must define and their respective use cases/responsibilities and return values

A
  • init(): Called once at the start of the batch job; use it for costly or common prep such as loading the Model. Returns nothing
  • run(): Called once for each mini-batch; reads the new data and performs scoring. Should return a pandas DataFrame or an array/list
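The contract above can be sketched as a minimal, runnable skeleton (the stand-in lambda model and the scoring logic are illustrative placeholders, not Azure ML APIs):

```python
import os

def init():
    # Called once at the start of the batch job: do costly setup here
    global model
    model_dir = os.environ.get("AZUREML_MODEL_DIR", ".")  # set by Azure ML
    # Stand-in for a real model load, e.g. joblib.load(os.path.join(model_dir, "model.pkl"))
    model = lambda rows: [sum(r) for r in rows]

def run(mini_batch):
    # Called once per mini-batch of file paths: read and score each file
    results = []
    for file_path in mini_batch:
        # Stand-in for reading the file and predicting on its contents
        results.append([os.path.basename(file_path), model([[1, 2]])[0]])
    return results  # a list of rows; a pandas DataFrame also works
```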
4
Q

Given this code, note:

  • AZUREML_MODEL_DIR
  • global model
  • mini_batch and why size matters!
  • where the predictions are written to
import os
import mlflow
import pandas as pd

# 1. What is this doing?
def init():
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # 2. What is this doing?
    model = mlflow.pyfunc.load_model(model_path)

def run(mini_batch):
    print(f"run method start: {__file__}, run({len(mini_batch)} files)")
    resultList = []

    # 3. What's happening here?
    for file_path in mini_batch:
        data = pd.read_csv(file_path)
        pred = model.predict(data)

        # 4. What's happening here?
        df = pd.DataFrame(pred, columns=["predictions"])
        df["file"] = os.path.basename(file_path)
        resultList.extend(df.values)

    return resultList
A
  1. Declaring model as global makes the loaded model available to the rest of the script. AZUREML_MODEL_DIR is an environment variable set by Azure ML that points to the files of the registered Model.
  2. Loads the MLFlow model.
  3. and 4. mini_batch: its size is defined in the deployment config. If individual files are too large to process, you need to split them into smaller ones. Each file is loaded and scored, and all predictions are collected and written to a single output file.
5
Q

NOT manually…

Two ways to create the Execution Environment

A
  • Dockerfile
  • Docker Image w/ Conda Dependencies (basic conda.yml file):
name: basic-env-cpu
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pandas
  - pip
  - pip:
      - azureml-core
      - mlflow
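For the first route, a minimal Dockerfile might look like this (a sketch; the base image and installed packages are assumptions, not the course's exact files):

```dockerfile
# Hypothetical Dockerfile: start from an Azure ML base image and add dependencies
FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
RUN pip install pandas azureml-core mlflow
```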
6
Q

Create an instance of the Environment class

A
from azure.ai.ml.entities import Environment

env = Environment(
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
    conda_file="./src/conda-env.yml",
    name="deployment-environment",
    description="Environment created from a Docker image plus Conda environment.",
)
ml_client.environments.create_or_update(env)
7
Q

Create an instance of the BatchDeployment class (putting it all together)

A
from azure.ai.ml.entities import BatchDeployment, BatchRetrySettings
from azure.ai.ml.constants import BatchDeploymentOutputAction

deployment = BatchDeployment(
    name="forecast-mlflow",
    description="A sales forecaster",
    endpoint_name=endpoint.name,
    model=model,  # the registered model
    compute="aml-cluster",
    code_path="./code",
    scoring_script="score.py",
    environment=env,  # from earlier
    instance_count=2,
    max_concurrency_per_instance=2,
    mini_batch_size=2,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)
ml_client.batch_deployments.begin_create_or_update(deployment)
8
Q

Using an AutomatedML Model with a Batch Endpoint

A

You can’t. The Scoring Script generated by AutomatedML only works with Online Endpoints; it’s not designed for Batch Endpoints.

9
Q

How Batch Endpoints distribute work, how to handle larger files and what smaller batch files do for us

A

Work is distributed in mini-batches of files. So an input folder containing 100 files with a mini-batch size of 10 will generate 10 batch child jobs of 10 files each.

Again, larger files should be split up, or decrease the number of files per mini-batch to accommodate them.

Smaller files result in higher levels of parallelism.
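The arithmetic can be made concrete (a sketch; the helper name is made up): the number of mini-batches is ceil(files / mini_batch_size), and the upper bound on parallel mini-batches is instance_count × max_concurrency_per_instance.

```python
import math

def batch_plan(n_files, mini_batch_size, instance_count, max_concurrency_per_instance):
    # One child job per mini-batch of files
    mini_batches = math.ceil(n_files / mini_batch_size)
    # Upper bound on mini-batches processed at the same time
    max_parallel = instance_count * max_concurrency_per_instance
    return mini_batches, max_parallel

# The example from the card: 100 files in mini-batches of 10,
# on a deployment with instance_count=2, max_concurrency_per_instance=2
print(batch_plan(100, 10, 2, 2))  # → (10, 4)
```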

10
Q

run() function return types
- Which to use for what type of return data
- How they are returned w.r.t. output_action
- What not to output and why

A
  • For multiple pieces of information per prediction, return a pandas DataFrame for its tabular format; return an array/list if each prediction is a single value
  • Setting output_action to SUMMARY_ONLY produces just a summary; setting it to APPEND_ROW appends each row to the indicated output file
  • Do not output complex data types beyond a pandas DataFrame. Anything else will be transformed into a hard-to-read string.
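A quick illustration of the last point (plain Python, not Azure ML code): a non-tabular return value ends up in the output file as its string form, which is hard to parse back.

```python
# A complex return value like this nested dict...
row = {"pred": [0.1, 0.9], "meta": {"file": "a.csv"}}
# ...would land in the output as a single stringified blob:
print(str(row))  # → {'pred': [0.1, 0.9], 'meta': {'file': 'a.csv'}}
```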