Path5.Mod1.b - Run Pipelines - Creating an Execute Python Script Component Flashcards

1
Q

Steps to implement the Execute Python Script component in ML Designer
- The first input port
- The entry point method name, signature, and return value to implement (know this for the exam!)

A
  1. Connect Input Datasets: Connect the output of an applicable component to the appropriate input of the Execute Python Script component. Typically this is the first (top-left) input, Dataset1.
  2. Write your Python Code: Open the component's Python editor and add your code. You must implement azureml_main, and it must return a pandas DataFrame packaged in a Python sequence such as a tuple or list:
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Do something with the two input data frames
    my_dataframe_result = dataframe1
    # The trailing comma wraps the result in a one-element tuple
    return my_dataframe_result,
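One way to sanity-check an entry point before pasting it into the designer is to call it locally with a small DataFrame. The following is only a sketch: the column name `x` and the doubling transformation are hypothetical stand-ins for real logic.

```python
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Hypothetical transformation: add a column that doubles "x".
    result = dataframe1.copy()
    result["x_doubled"] = result["x"] * 2
    # Return a one-element tuple; its first item feeds the first output port.
    return result,

# Local smoke test with a tiny hand-made frame.
sample = pd.DataFrame({"x": [1, 2, 3]})
outputs = azureml_main(sample)
print(outputs[0])
```

Copying the input first keeps the local test frame unchanged, which makes repeated runs easier to reason about.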
2
Q

d1 d2 sb rd1 rd2

The Execute Python Script Component Ports

A
  • Dataset1
  • Dataset2
  • Script Bundle (zip)
  • Result Dataset1
  • Result Dataset2
3
Q

The Execute Python Script Dataset1 and Dataset2 ports:
- Which parameters they correspond to
- Which are optional
- Behavior when datasets are connected

A
  • dataframe1 and dataframe2, respectively.
  • Both are optional.
  • Both are automatically converted to a pandas DataFrame.
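Because both ports are optional, either parameter can arrive as None. A minimal sketch of guarding against that follows, assuming (purely for illustration) that the desired behavior is to stack whichever inputs are connected:

```python
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Keep only the inputs that were actually connected.
    frames = [df for df in (dataframe1, dataframe2) if df is not None]
    if not frames:
        # Nothing connected: return an empty frame (an illustrative choice).
        return pd.DataFrame(),
    # Stack the available inputs row-wise with a fresh index.
    combined = pd.concat(frames, ignore_index=True)
    return combined,
```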
4
Q

The Execute Python Script component's Script Bundle port
- How to provide input to the port
- The error you avoid by zipping code files larger than 16 KB

A

Create a file dataset from a zip file and connect it to the port. The zip file can contain additional Python packages, or your code files when your Python code is larger than 16 KB.

When code files are larger than 16 KB, you need to zip them to avoid errors like "CommandLine exceeds the limit of 16597 characters".
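Building the zip file itself can be done with Python's standard library. This is a sketch only: the helper name `build_script_bundle` and the folder layout are hypothetical, and registering the resulting zip as a file dataset in the workspace is a separate step not shown here.

```python
import zipfile
from pathlib import Path

def build_script_bundle(source_dir, zip_path):
    """Zip every .py file under source_dir, preserving relative paths."""
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for py_file in sorted(source.rglob("*.py")):
            # Store paths relative to source_dir so imports resolve after unzip.
            bundle.write(py_file, arcname=py_file.relative_to(source).as_posix())
    return zip_path
```

For example, `build_script_bundle("my_code", "bundle.zip")` would place `my_code/my_script.py` at the root of the archive as `my_script.py`.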

5
Q

The Execute Python Script Result Dataset1 and Result Dataset2 ports
- What they both return
- What to do with additional output datasets

A
  • Both return a pandas DataFrame.
  • Any additional DataFrames need to be written to Azure storage.
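Filling both result ports amounts to returning two DataFrames from azureml_main. A minimal sketch, assuming the first input is connected and has a hypothetical numeric column named `value`:

```python
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Assumption: dataframe1 is connected and has a numeric "value" column.
    cleaned = dataframe1.dropna(subset=["value"])
    # Summary statistics as a regular table (statistic names in "index").
    summary = cleaned[["value"]].describe().reset_index()
    # First element goes to Result Dataset1, second to Result Dataset2.
    return cleaned, summary
```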
6
Q

How to consume the Execute Python Script Component’s Script Bundle input

A

The zip file connected to the port is unzipped to the folder ./Script Bundle, which is then added to sys.path, allowing you to import modules directly from that location:

import pandas as pd
from my_script import my_func

def azureml_main(dataframe1 = None, dataframe2 = None):
    # Execution logic goes here
    print(f'Input pandas.DataFrame #1: {dataframe1}')

    # Test the custom defined Python function
    dataframe1 = my_func(dataframe1)

    # Test to read custom uploaded files by relative path
    with open('./Script Bundle/my_sample.txt', 'r') as text_file:
        sample = text_file.read()

    return dataframe1, pd.DataFrame(columns=["Sample"], data=[[sample]])
7
Q

The Execute Python Script component doesn't support packages that depend on extra native libraries installed with commands like "apt-get", such as Java or PyODBC. This is because components run in a simple environment with only Python preinstalled and without admin permissions. (T/F)

A

TRUE. The key points are Python only and non-admin privileges.

8
Q

It is recommended to connect to external data sources such as a SQL database or MongoDB from the Execute Python Script component when datasets live externally or in third-party systems. (T/F)

A

FALSE. Connecting to external storage from this component is NOT recommended. Use the Import Data or Export Data components instead.
