Path5.Mod1.c - Run Pipelines - Creating and Running a Pipeline Job Flashcards
Pipelines run as … while each Component runs as …
Pipelines run as a pipeline job, while each Component runs as a child job as part of the overall pipeline job.
Child Jobs are the executions of individual Pipeline Components.
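For reference, the child jobs of a submitted pipeline can be listed from the SDK v2 client; a small sketch (assumes ml_client is already connected to the workspace, and the pipeline job name is illustrative):

    # Each executed Component shows up as one child job under the pipeline job
    for child in ml_client.jobs.list(parent_job_name="my-pipeline-job-name"):
        print(child.name, child.status)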
- SDK 1 module where `pipeline` lives
- SDK 2 module where `pipeline` lives
- Module where `Workspace` lives
- Module where `MLClient` lives
- SDK1 - `azureml.pipeline.core`
- SDK2 - `azure.ai.ml.dsl`
- `Workspace` lives in `azureml.core`
- `MLClient` lives in `azure.ai.ml`
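A quick import sketch to anchor those module paths (SDK 1 and SDK 2 are separate packages, so in practice you would rarely import all four together):

    # SDK v1 (azureml-core / azureml-pipeline-core packages)
    from azureml.core import Workspace
    from azureml.pipeline.core import Pipeline

    # SDK v2 (azure-ai-ml package)
    from azure.ai.ml import MLClient
    from azure.ai.ml.dsl import pipeline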
Pipeline YAML files are created in two ways
Manually create the YAML file, or use the `@pipeline()` function to create it
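For the manual route, a hand-authored pipeline YAML can be loaded and submitted with the SDK v2 helpers; a minimal sketch, assuming a local file named pipeline_job.yml and an existing ml_client:

    from azure.ai.ml import load_job

    # Load the hand-written pipeline YAML into a job object
    pipeline_job = load_job(source="pipeline_job.yml")

    # Submit it the same way as a pipeline built with @pipeline()
    ml_client.jobs.create_or_update(pipeline_job, experiment_name="pipeline_job")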
(T/F)
- Components in a Pipeline are always sequential
- Components in a Pipeline can target a specific Compute resource
- To configure a pipeline job, you must pass in your configuration and settings values through the `pipeline()` annotation.
- When you don’t specify the `default_compute` value, your job will go into a pending state until you set one.
- False. Can be Sequential or in Parallel
- True. Allows for different types of processing per task
- False. Once you have a pipeline job instance, you have property access to the instance, like `outputs`, `settings`, etc. (see the sketch below)
- False. If you don’t specify one, it’ll simply use whatever your default compute actually is…
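A short sketch of that property access (assumes my_pipeline_function from the code in the next card, and that the compute and datastore names exist in the workspace):

    from azure.ai.ml import Input
    from azure.ai.ml.constants import AssetTypes

    pipeline_job = my_pipeline_function(
        Input(type=AssetTypes.URI_FILE, path="azureml:data:1")
    )

    # Configure via properties on the job instance, not decorator arguments
    pipeline_job.settings.default_compute = "cpu-cluster"
    pipeline_job.settings.default_datastore = "workspaceblobstore"

    # A single step can still target its own compute
    # (node name matches the variable name inside the pipeline function)
    pipeline_job.jobs["train_model"].compute = "gpu-cluster"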
Explain in detail what this code is doing:
    from azure.ai.ml.dsl import pipeline

    # 1. What's happening here?
    @pipeline()
    def my_pipeline_function(pipeline_job_input):
        prep_data = loaded_component_prep(input_data=pipeline_job_input)
        train_model = loaded_component_train(training_data=prep_data.outputs.output_data)
        return {
            "pipeline_job_transformed_data": prep_data.outputs.output_data,
            "pipeline_job_trained_model": train_model.outputs.model_output,
        }

    ...

    from azure.ai.ml import Input
    from azure.ai.ml.constants import AssetTypes

    # 2. What's happening here?
    pipeline_job = my_pipeline_function(
        Input(type=AssetTypes.URI_FILE, path="azureml:data:1")
    )

    # 3. Why does this work?
    print(pipeline_job)

    # 4. What's happening here?
    pipeline_job_result = ml_client.jobs.create_or_update(
        pipeline_job, experiment_name="pipeline_job"
    )
- The top section of code creates a pipeline job by annotating a function with `@pipeline()`. This one preps the data, trains the model, and returns both the transformed data and the trained model. It uses two loaded Components to accomplish this.
- The bottom section of code calls the pipeline function to create a pipeline job. It uses an instance of `Input` to provide the data.
- Since the function call returns a pipeline job object that renders as formatted YAML, you can print it out to see the results
- Finally, submit the pipeline job
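After submitting, you can also follow the run from code; a small sketch (assumes ml_client is already connected to the workspace):

    # Submit the pipeline job under an experiment name
    pipeline_job_result = ml_client.jobs.create_or_update(
        pipeline_job, experiment_name="pipeline_job"
    )

    # Print the Studio URL and stream logs until the job finishes
    print(pipeline_job_result.studio_url)
    ml_client.jobs.stream(pipeline_job_result.name)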
Describe each parameter of the `pipeline()` annotation and any default values:
    pipeline(
        func=None,
        *,
        name: str | None = None,
        version: str | None = None,
        display_name: str | None = None,
        description: str | None = None,
        experiment_name: str | None = None,
        tags: Dict[str, str] | None = None,
        **kwargs,
    )
- func: the function to be annotated
- name: name of the pipeline component; defaults to the function name
- version: defaults to 1
- display_name: defaults to the function name
- description: free-text description of the pipeline
- experiment_name: name of the experiment the job is created under; defaults to the current directory name
- tags: dictionary of tags attached to the pipeline
- kwargs: dictionary of additional config params
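A small usage sketch of those parameters, reusing the loaded components from the earlier card (the names and tag values are illustrative):

    from azure.ai.ml.dsl import pipeline

    @pipeline(
        name="taxi_prep_and_train",        # defaults to the function name if omitted
        display_name="Taxi prep + train",
        description="Prepares the data and trains a model",
        experiment_name="taxi-experiments",
        tags={"team": "data-science"},
    )
    def taxi_pipeline(pipeline_job_input):
        prep_data = loaded_component_prep(input_data=pipeline_job_input)
        train_model = loaded_component_train(training_data=prep_data.outputs.output_data)
        return {"pipeline_job_trained_model": train_model.outputs.model_output}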
Where to view things:
- Both the pipeline job and its child jobs
- Config issues with the pipeline itself
- Config issues with a Component
- Submit the job to the workspace; you can view the submitted job and its child jobs under Job overview (open the job details to see a designer view of the jobs, then click the Job overview button in the top right; see pic)
- Outputs and logs of the pipeline job
- Outputs and logs of the individual Child Jobs of the failed Component
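Outputs and logs can also be pulled down from code rather than browsed in the Studio; a minimal sketch, assuming the job name from the earlier submission:

    # Download all outputs and logs of the pipeline job to a local folder
    ml_client.jobs.download(
        name=pipeline_job_result.name,
        download_path="./pipeline_artifacts",
        all=True,
    )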