Path5.Mod1.a - Run Pipelines - Creating a Component Flashcards
build share scale
Components
- What they are
- Three reasons for using them
- How to make them accessible to other users in the Workspace
- Reusable, self-contained scripts that are easily shared with all users in an ML Workspace. Ideally, you design a Component to perform one step or one specific action of your ML workflow.
- Reasons:
- To build a pipeline (dur!)
- To share reusable code
- When you’re preparing your code for scale
- To make Components accessible to other users, register them to the Workspace.
A pipeline is a workflow of ML tasks, related to training an ML Model, with each step or task being a Component. In other words, a pipeline is a workflow made up of Components.
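A minimal sketch of that idea, assuming two hypothetical components (prep_component, train_component) already loaded with load_component() (shown later in these cards), chained together with the pipeline decorator:

~~~
from azure.ai.ml.dsl import pipeline

# Sketch only: prep_component and train_component are hypothetical names
# for components loaded via load_component(); model_output is an assumed
# output name on the training component.
@pipeline()
def training_pipeline(pipeline_input_data):
    prep_step = prep_component(input_data=pipeline_input_data)
    train_step = train_component(training_data=prep_step.outputs.output_data)
    return {"model": train_step.outputs.model_output}
~~~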
Met Int CCE
Three parts to a Component
- Metadata: ex. the Component name, version, etc.
- Interface: Expected input parameters (ex. dataset, hyperparameters) and expected output (ex. metrics, artifacts)
- Code/Command/Environment: Specifies where your code is and how to run it
Two files required to create a Component, and two ways to create the latter…
- A script containing the workflow you want to execute
- A YAML file to define the Three Parts of your Component
The YAML file can be created manually, or generated by applying the command_component() decorator to a Python function.
Required Libraries for creating a Component
Know this for the exam!!!
azure.identity for credentialing
azure.ai.ml obviously
azure.ai.ml.dsl for the pipeline decorator
~~~
from azure.identity import DefaultAzureCredential
from azure.identity import InteractiveBrowserCredential
from azure.ai.ml import MLClient
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component
~~~
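With those imports in place, a minimal connection sketch (the subscription, resource group, and workspace values are placeholders, not from the original card):

~~~
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# Authenticate and connect to the Workspace; substitute your own values.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
~~~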
Given the following code, determine where The Three Parts of a Component are defined:
~~~
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: Prepare training data
version: 1
type: command
inputs:
  input_data:
    type: uri_file
outputs:
  output_data:
    type: uri_file
code: ./src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
  python prep.py
  --input_data ${{inputs.input_data}}
  --output_data ${{outputs.output_data}}
~~~
Metadata:
- Component name “prep_data”
- Version 1
- Display name “Prepare training data”
- Type “command”
Interface:
- inputs.input_data.type uri_file
- outputs.output_data.type uri_file
Command, Code, Environment
- code (the location) “./src”
- environment, similar to a Docker base image
- command, what to execute when the Component is used:
~~~
>-
python prep.py
--input_data ${{inputs.input_data}}
--output_data ${{outputs.output_data}}
~~~
unusual path, usual path
Code samples for Loading (and its parameter) and Registering a Component
Given the YML file defined earlier, to load it:
~~~
from azure.ai.ml import load_component

parent_dir = "./src"
loaded_comp = load_component(source=parent_dir + "/prep.yml")
~~~
To register the Component:

~~~
prep = ml_client.components.create_or_update(loaded_comp)
~~~
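As an optional sanity check (an assumed follow-up step, not part of the original card), you can fetch the registered Component back from the Workspace:

~~~
# Retrieve the component just registered, by name and version
registered = ml_client.components.get(name="prep_data", version="1")
print(registered.name, registered.version)
~~~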
For referencing web-based data, use this entity. Also the significance of using it
from azure.ai.ml import Input
Why use Input though? Because it creates a reference to the data source location, meaning the data remains in its existing location, and we incur no extra storage cost.
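A minimal sketch of Input pointing at web-based data (the URL is a placeholder; AssetTypes.URI_FILE is the SDK constant for a single file):

~~~
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Reference the file where it lives; nothing is copied into Workspace storage.
web_path = "https://<storage-account>.blob.core.windows.net/<container>/data.csv"
data_input = Input(type=AssetTypes.URI_FILE, path=web_path)
~~~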
What the command_component() decorator does in the example code below, and what the function’s signature translates to in ML Designer:
~~~
import os
from pathlib import Path
from mldesigner import command_component, Input, Output

@command_component(
    name="prep_data",
    version="1",
    display_name="Prep Data",
    description="Convert data to CSV file, and split to training and test data",
    environment=dict(
        conda_file=Path(__file__).parent / "conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    test_data: Output(type="uri_folder"),
):
    # Implementation here
    ...
~~~
Lets you define a Component’s interface, metadata, and code from a Python function. The function is transformed into a single static specification (YAML) that a pipeline can process.
Note in the image below what the Component will look like in ML Designer: the display name, the input/output ports, etc.
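For completeness, a minimal sketch of what the decorated function’s body might contain (the file name, pandas, and the 80/20 split are assumptions, not part of the original card):

~~~
from pathlib import Path
import pandas as pd

def _prepare_data(input_data: str, training_data: str, test_data: str):
    # Hypothetical body: read one CSV from the input folder,
    # split 80/20, and write train/test CSVs to the output folders.
    df = pd.read_csv(Path(input_data) / "data.csv")
    train = df.sample(frac=0.8, random_state=42)
    test = df.drop(train.index)
    train.to_csv(Path(training_data) / "train.csv", index=False)
    test.to_csv(Path(test_data) / "test.csv", index=False)
~~~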