Path2.Mod1.c - Make Data Available - Creating Data Assets Flashcards
L BS DLG2 Ds (mnemonic: Local, Blob Storage, Data Lake Gen 2, Datastore)
Creating a Data Asset: URI File supported paths
- Local: ./<path to file>
- Blob Storage: wasbs://<account>.blob.core.windows.net/<container>/<folder>/<file>
- Data Lake Gen 2 storage: abfss://<file_system>@<account>.dfs.core.windows.net/<folder>/<file>
- Datastore: azureml://datastores/<name>/paths/<folder>/<file>
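For illustration, a minimal sketch of one of these paths plugged into the SDK code shown in the cards below; the datastore, folder, file, and asset names are hypothetical, and ml_client is assumed to be an authenticated MLClient:

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Hypothetical datastore (blob_training_data), folder, and file names, using the
# azureml://datastores/... format from the list above.
# ml_client is assumed to be an authenticated MLClient, as in the other cards.
my_data = Data(
    path='azureml://datastores/blob_training_data/paths/data-asset-path/diabetes.csv',
    type=AssetTypes.URI_FILE,
    description="Data asset pointing to a file on a datastore",
    name="diabetes-datastore-path",
    version="1"
)

ml_client.data.create_or_update(my_data)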
Behavior when creating a Local Data Asset
A copy of the local data is uploaded to the default datastore (workspaceblobstore), under a LocalUpload folder, so the asset stays available even when the local device is unavailable.
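A rough sketch of how to check where that copy ends up, assuming an authenticated ml_client and a hypothetical local file ./data/diabetes.csv; the entity returned by create_or_update should expose the uploaded cloud path:

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Hypothetical local file; ml_client is assumed to be an authenticated MLClient.
local_data = Data(
    path='./data/diabetes.csv',
    type=AssetTypes.URI_FILE,
    description="Data asset created from a local file",
    name="diabetes-local",
    version="1"
)

registered = ml_client.data.create_or_update(local_data)

# After the upload, the returned entity's path should point at the copy on
# workspaceblobstore (under a LocalUpload folder) rather than the local device.
print(registered.path)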
The context for using an MLTable Data Asset
When the schema of your data is complex or frequently changes.
For MLTable Data Assets, you specify the schema definition for reading the data. So instead of changing how to read the data for each script that uses it, you only change the schema stored in the Data Asset itself.
(T/F):
- Certain Azure ML features like Automated ML require an MLTable Data Asset to understand how to read its data
- MLTable Schemas are stored in an Azure Blob, then pulled in by your job via parameter input
- True
- False. You store the MLTable file in the same folder as the data you’re reading.
Describe what this code is doing:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_data = Data(
    path='<supported-path>',
    type=AssetTypes.URI_FILE,
    description="<description>",
    name="<name>",
    version="<version>"
)

ml_client.data.create_or_update(my_data)
Creates a URI_FILE Data Asset (the type parameter). <supported-path> is a placeholder for one of the supported paths (for example, a local device path).
Describe three things that this code is doing and give an alternative for when the input is in JSON:
import argparse
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

df = pd.read_csv(args.input_data)
print(df.head(10))
- Uses argparse to create an input parameter called --input_data
- When submitting your job, set --input_data to your URI_FILE data asset (the job side of this wiring is sketched just after this card)
- Assuming a .csv file, it is then read into memory via pd.read_csv
- If your data is in JSON, use pd.read_json() instead
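The cards only show the script side of this wiring; a minimal sketch of the job side, assuming the script is saved as read_data.py under ./src and using the same placeholder style as the other cards for the asset, environment, and compute names:

from azure.ai.ml import command, Input
from azure.ai.ml.constants import AssetTypes

# ml_client is assumed to be an authenticated MLClient; the placeholders
# follow the convention used elsewhere in these cards.
job = command(
    code="./src",  # folder containing read_data.py
    command="python read_data.py --input_data ${{inputs.input_data}}",
    inputs={
        "input_data": Input(type=AssetTypes.URI_FILE, path="azureml:<name>:<version>")
    },
    environment="<environment-name>",
    compute="<compute-name>",
    display_name="read-uri-file-data",
    experiment_name="read-uri-file-data"
)

returned_job = ml_client.jobs.create_or_update(job)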
Describe what this code is doing:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_data = Data(
    path='<supported-path>',
    type=AssetTypes.URI_FOLDER,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)
- Creates a URI_FOLDER Data Asset (the type parameter)
- <supported-path> is a placeholder for one of the supported paths, here pointing to a folder rather than a single file (for example, a local folder on your device)
Describe what this code is doing:
import argparse
import glob
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

data_path = args.input_data
all_files = glob.glob(data_path + "/*.csv")
df = pd.concat((pd.read_csv(f) for f in all_files), sort=False)
- Uses argparse to create an input parameter --input_data
- When submitting your job, set --input_data to your URI_FOLDER data asset
- Uses glob to collect all the .csv files under that folder path into a list
- A generator expression reads each file with pd.read_csv, and pd.concat combines them into a single Pandas DataFrame
Describe what this code is doing:
type: mltable

paths:
  - pattern: ./*.txt

transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
The MLTable file (YAML stored alongside the data) that defines how to read it: every .txt file in the current folder is read as a comma-delimited file with ASCII encoding, and all files share the same header.
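The same definition can also be authored from Python with the mltable package instead of hand-writing the YAML; a rough sketch, where the exact parameter names (paths, delimiter, encoding, header) are assumptions about the package's from_delimited_files helper:

import mltable

# Mirror the YAML above: every .txt file in the current folder, read as
# comma-delimited ASCII where all files share the same header.
tbl = mltable.from_delimited_files(
    paths=[{"pattern": "./*.txt"}],
    delimiter=",",
    encoding="ascii",
    header="all_files_same_headers"
)

# Writes the MLTable file into the given folder, next to the data it describes.
tbl.save("./")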
Describe what this code is doing:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_data = Data(
    path='<path-including-mltable-file>',
    type=AssetTypes.MLTABLE,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)
- Creates an MLTABLE Data Asset (the type parameter)
- <path-including-mltable-file> is a placeholder for the path to the folder that contains the MLTable file (for example, a local folder)
Describe what this code is doing:
import argparse
import mltable
import pandas

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

tbl = mltable.load(args.input_data)
df = tbl.to_pandas_dataframe()
print(df.head(10))
- Uses argparse to create an input parameter called --input_data
- When submitting your job, set --input_data to your MLTable data asset (see the input sketch after this card)
- Loads the data through mltable.load, then converts it to a Pandas DataFrame (a common conversion approach) and prints the first 10 rows
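Wiring this script into a job looks like the URI_FILE example sketched earlier; only the input type changes. A minimal sketch of just the input definition, with the registered asset name left as a placeholder:

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Passed as inputs={"input_data": ...} in the command(...) call from the
# earlier sketch; the script's --input_data argument then receives the
# location that mltable.load reads from.
job_input = Input(type=AssetTypes.MLTABLE, path="azureml:<name>:<version>")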