Path5.Mod1.b - Run Pipelines - Creating an Execute Python Script Component Flashcards
Steps to implement the Execute Python Script Component in ML Designer
- The first input point
- The entry-point method name, signature, and return value to implement (know this for the exam!)
- Connect Input Datasets: Connect the output of an applicable component to the appropriate input port of the Execute Python Script component, typically the first (top-left) input, Dataset1.
- Write your Python Code: Open the component's Python editor and add your code. You must implement azureml_main, and it must return a pandas DataFrame:
    import pandas as pd

    def azureml_main(dataframe1 = None, dataframe2 = None):
        # Do something with the two input data frames
        my_dataframe_result = dataframe1
        # Returned as a sequence of DataFrames (note the trailing comma)
        return my_dataframe_result,
d1 d2 sb rd1 rd2
The Execute Python Script Component Ports
- Dataset1
- Dataset2
- Script Bundle (zip)
- Result Dataset1
- Result Dataset2
The Execute Python Script Dataset1 and Dataset2 ports:
- Which parameters do they correspond to
- Which are optional
- Behavior when Datasets are inputted
- dataframe1 and dataframe2, respectively
- Both are optional
- Both are automatically converted to a pandas DataFrame
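A minimal local sketch of how the two optional inputs might be handled, assuming pandas is installed. Outside the designer nothing calls azureml_main for you, so the direct calls at the bottom only simulate connected and unconnected ports. The column name x and the empty-DataFrame fallback are illustrative choices, not part of the component. The result is returned as a one-element tuple, on the assumption that the component accepts a sequence of DataFrames.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Either port may be left unconnected, in which case its argument is None.
    frames = [df for df in (dataframe1, dataframe2) if df is not None]
    if not frames:
        # Hypothetical fallback: an empty DataFrame when nothing is connected.
        return pd.DataFrame(),
    # Stack whatever inputs were provided into a single result.
    combined = pd.concat(frames, ignore_index=True)
    return combined,


# Simulate the entry point being called with one port, both ports, and none.
only_first = azureml_main(pd.DataFrame({"x": [1, 2]}))[0]
both = azureml_main(pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3]}))[0]
nothing = azureml_main()[0]
```

Checking for None before touching either argument keeps the script usable whether one, both, or neither input port is wired up.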
The Execute Python Script Script Bundle port
- How to provide input to the port
- The error you’re avoiding when compressing code files larger than 16KB
- Create a file dataset from a zip file that contains new Python packages, or your code files when your Python code is larger than 16 KB, and connect it to the port
- Zipping code files larger than 16 KB avoids errors like "CommandLine exceeds the limit of 16597 characters"
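One way to build such a zip locally is Python's standard zipfile module. This is a minimal sketch, assuming a hypothetical module named my_script.py; the file name, its contents, and the temporary directory are illustrative, and registering the zip as a file dataset is not shown.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical workspace holding code that is too large to paste inline.
work_dir = Path(tempfile.mkdtemp())
module_path = work_dir / "my_script.py"
module_path.write_text("def my_func(df):\n    return df\n")

# Store the module at the root of the archive so it can be imported by name
# once the archive is unpacked.
bundle_path = work_dir / "script_bundle.zip"
with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
    bundle.write(module_path, arcname=module_path.name)

# List what went into the archive as a quick sanity check.
with zipfile.ZipFile(bundle_path) as bundle:
    bundled_names = bundle.namelist()
print(bundled_names)
```

Passing arcname drops the local directory prefix, so the module sits at the top level of the zip rather than under a temporary path.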
The Execute Python Script Result Dataset1 and Result Dataset2 ports
- What they both return
- What to do with additional output datasets
- Both return a pandas DataFrame
- Any additional DataFrames need to be written to Azure storage
How to consume the Execute Python Script Component’s Script Bundle input
The zip file connected to the port is unzipped to the folder ./Script Bundle, which is then added to sys.path, allowing you to import modules directly from that location:
    import pandas as pd
    from my_script import my_func

    def azureml_main(dataframe1 = None, dataframe2 = None):
        # Execution logic goes here
        print(f'Input pandas.DataFrame #1: {dataframe1}')

        # Test the custom defined Python function
        dataframe1 = my_func(dataframe1)

        # Test to read custom uploaded files by relative path
        with open('./Script Bundle/my_sample.txt', 'r') as text_file:
            sample = text_file.read()

        return dataframe1, pd.DataFrame(columns=["Sample"], data=[[sample]])
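The unzip-then-import behavior described on this card can be imitated locally with the standard library. This is a rough sketch, assuming a hypothetical bundle that holds my_script.py and my_sample.txt. The real component performs the unzip and the sys.path change itself, so none of this setup belongs inside azureml_main.

```python
import io
import sys
import tempfile
import zipfile
from pathlib import Path

# Build a small in-memory zip standing in for the connected file dataset.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("my_script.py", "def my_func(value):\n    return value * 2\n")
    bundle.writestr("my_sample.txt", "hello from the bundle")

# Imitate the component: unzip into a folder named "Script Bundle" ...
bundle_dir = Path(tempfile.mkdtemp()) / "Script Bundle"
with zipfile.ZipFile(buffer) as bundle:
    bundle.extractall(bundle_dir)

# ... and put that folder on sys.path so its modules import by name.
sys.path.insert(0, str(bundle_dir))
from my_script import my_func

doubled = my_func(21)
sample = (bundle_dir / "my_sample.txt").read_text()
```

Non-code files in the bundle are not importable, which is why the sample text file is read by path instead.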
The Execute Python Script Component doesn’t support packages that depend on extra native libraries installed with commands like “apt-get”, such as Java, PyODBC, etc. This is because components are executed in a simple environment with only Python pre-installed and with non-admin permissions (T/F)
TRUE. The key points are that only Python is pre-installed and the environment runs with non-admin privileges.
It is recommended to connect to external data sources like a SQL database or MongoDB from the Execute Python Script Component when datasets exist externally or in third-party systems (T/F)
FALSE. It is NOT recommended to connect to any external storage. Use the Import Data or Export Data Components for that.