Path5.Mod1.a - Run Pipelines - Creating a Component Flashcards
build share scale
Components
- What they are
- Three reasons for using them
- How to make them accessible to other users in the Workspace
- Reusable, self-contained scripts that are easily shared with all users in an ML Workspace. Ideally, you design a Component to perform one step or one specific action of your ML workflow.
- Reasons:
- To build a pipeline (dur!)
- To share reusable code
- When you’re preparing your code for scale
- To make Components accessible to other users, register them to the Workspace.
A pipeline is a workflow of ML tasks, related to training an ML Model, with each step or task being a Component. In other words, a pipeline is a workflow made up of Components.
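A minimal sketch of that idea, assuming two hypothetical components (prep_component, train_component) already loaded with load_component() (shown later in these cards), chained together with the pipeline decorator:

~~~
from azure.ai.ml.dsl import pipeline

# Sketch only: prep_component and train_component are hypothetical names
# for components loaded via load_component(); model_output is an assumed
# output name on the training component.
@pipeline()
def training_pipeline(pipeline_input_data):
    prep_step = prep_component(input_data=pipeline_input_data)
    train_step = train_component(training_data=prep_step.outputs.output_data)
    return {"model": train_step.outputs.model_output}
~~~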
Met Int CCE
Three parts to a Component
- Metadata: ex. the Component name, version, etc.
- Interface: Expected input parameters (ex. dataset, hyperparameters) and expected output (ex. metrics, artifacts)
- Code/Command/Environment: Specifies where your code is and how to run it
Two files required to create a Component, and two ways to create the latter…
- A script containing the workflow you want to execute
- A YAML file to define the Three Parts of your Component
The YAML file can be created manually, or generated by applying the command_component() decorator to a Python function.
Required Libraries for creating a Component
Know this for the exam!!!
azure.identity for credentialing
azure.ai.ml obviously
azure.ai.ml.dsl for the pipeline decorator
~~~
from azure.identity import DefaultAzureCredential
from azure.identity import InteractiveBrowserCredential
from azure.ai.ml import MLClient
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component
~~~
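With those imports in place, a minimal connection sketch (the subscription, resource group, and workspace values are placeholders, not from the original card):

~~~
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# Authenticate and connect to the Workspace; substitute your own values.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
~~~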
Given the following code, determine where The Three Parts of a Component are defined:
~~~
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: Prepare training data
version: 1
type: command
inputs:
  input_data:
    type: uri_file
outputs:
  output_data:
    type: uri_file
code: ./src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
  python prep.py
  --input_data ${{inputs.input_data}}
  --output_data ${{outputs.output_data}}
~~~
Metadata:
- Component name “prep_data”
- Version 1
- Display name “Prepare training data”
- Type “command”
Interface:
- inputs.input_data.type uri_file
- outputs.output_data.type uri_file
Command, Code, Environment
- code (the location) “./src”
- environment, similar to a Docker base image
- command, what to execute when the Component is used:
~~~
>-
python prep.py
--input_data ${{inputs.input_data}}
--output_data ${{outputs.output_data}}
~~~
unusual path, usual path
Code samples for Loading (and its parameter) and Registering a Component
Given the YML file defined earlier, to load it:
~~~
from azure.ai.ml import load_component

parent_dir = "./src"
loaded_comp = load_component(source=parent_dir + "/prep.yml")
~~~
To register the Component:

~~~
prep = ml_client.components.create_or_update(loaded_comp)
~~~
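As an optional sanity check (an assumed follow-up step, not part of the original card), you can fetch the registered Component back from the Workspace:

~~~
# Retrieve the component just registered, by name and version
registered = ml_client.components.get(name="prep_data", version="1")
print(registered.name, registered.version)
~~~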
For referencing web-based data, use this entity. Also the significance of using it
from azure.ai.ml import Input
Why use Input though? Because it creates a reference to the data source location, meaning the data remains in its existing location, and we incur no extra storage cost.
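A minimal sketch of Input pointing at web-based data (the URL is a placeholder; AssetTypes.URI_FILE is the SDK constant for a single file):

~~~
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Reference the file where it lives; nothing is copied into Workspace storage.
web_path = "https://<storage-account>.blob.core.windows.net/<container>/data.csv"
data_input = Input(type=AssetTypes.URI_FILE, path=web_path)
~~~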
What the command_component() decorator does in the example code below, and what the function’s signature translates to in ML Designer:
~~~
import os
from pathlib import Path
from mldesigner import command_component, Input, Output

@command_component(
    name="prep_data",
    version="1",
    display_name="Prep Data",
    description="Convert data to CSV file, and split to training and test data",
    environment=dict(
        conda_file=Path(__file__).parent / "conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    test_data: Output(type="uri_folder"),
):
    # Implementation here
    ...
~~~
Lets you define a Component’s interface, metadata, and code from a Python function. The function is transformed into a single static specification (YAML) that a pipeline can process.
Note in the image below what the Component will look like in ML Designer: the display name, the input/output ports, etc.
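For completeness, a minimal sketch of what the decorated function’s body might contain (the file name, pandas, and the 80/20 split are assumptions, not part of the original card):

~~~
from pathlib import Path
import pandas as pd

def _prepare_data(input_data: str, training_data: str, test_data: str):
    # Hypothetical body: read one CSV from the input folder,
    # split 80/20, and write train/test CSVs to the output folders.
    df = pd.read_csv(Path(input_data) / "data.csv")
    train = df.sample(frac=0.8, random_state=42)
    test = df.drop(train.index)
    train.to_csv(Path(training_data) / "train.csv", index=False)
    test.to_csv(Path(test_data) / "test.csv", index=False)
~~~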