Path2.Mod1.c - Make Data Available - Creating Data Assets Flashcards

1
Q

Hint: L, BS, DLG2, Ds

Creating a Data Asset: URI File supported paths

A
  • Local: ./<path to file>
  • Blob Storage: wasbs://<account>.blob.core.windows.net/<container>/<folder>/<file>
  • Data Lake Gen 2 storage: abfss://<file_system>@<account>.dfs.core.windows.net/<folder>/<file>
  • Datastore: azureml://datastores/<name>/paths/<folder>/<file>
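The four path styles above can be told apart by their URI scheme. The following is a minimal sketch using only the Python standard library; the helper name classify_data_path and the labels it returns are hypothetical and are not part of the Azure ML SDK.

```python
from urllib.parse import urlparse


def classify_data_path(path: str) -> str:
    """Return a label for the storage kind implied by the path's scheme.

    Hypothetical helper for illustration; not an Azure ML SDK function.
    """
    scheme = urlparse(path).scheme.lower()
    # No scheme (./data/x.csv) or a one-letter scheme (a Windows drive
    # such as C:) is treated as a path on the local device.
    if len(scheme) <= 1:
        return "local"
    if scheme == "wasbs":
        return "blob_storage"
    if scheme == "abfss":
        return "data_lake_gen2"
    if scheme == "azureml":
        return "datastore"
    return "unknown"
```

For example, classify_data_path("azureml://datastores/mystore/paths/raw/file.csv") returns "datastore", where mystore is a made-up datastore name.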
2
Q

Behavior when creating a Local Data Asset

A

A copy of the local file is uploaded to the workspace's default datastore, workspaceblobstore, in the LocalUpload folder. The Data Asset therefore stays available even when the local device is not.

3
Q

When should you use an MLTable Data Asset?

A

When the schema of your data is complex or frequently changes.

For MLTable Data Assets, you specify the schema definition for reading the data. So instead of changing how to read the data for each script that uses it, you only change the schema stored in the Data Asset itself.

4
Q

(T/F):
- Certain Azure ML features like Automated ML require an MLTable Data Asset to understand how to read its data
- MLTable Schemas are stored in an Azure Blob, then pulled in by your job via parameter input

A
  • True
  • False. You store the MLTable file in the same folder as the data you’re reading.
5
Q

Describe what this code is doing:

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_data = Data(
    path='<supported-path>',
    type=AssetTypes.URI_FILE,
    description="<description>",
    name="<name>",
    version="<version>"
)

ml_client.data.create_or_update(my_data)
A

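Creates (or updates) a URI_FILE Data Asset in the workspace; the type parameter sets the asset type. <supported-path> is a placeholder for any supported path to a single file: local, Blob Storage, Data Lake Gen 2 storage, or a datastore.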

6
Q

Describe three things that this code is doing and give an alternative for when the input is in JSON:

import argparse
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

df = pd.read_csv(args.input_data)
print(df.head(10))
A
  • Uses argparse to define an input parameter called --input_data
  • When you submit the job, you set --input_data to your URI_FILE Data Asset
  • Assuming a .csv file, the data is then read into memory via pd.read_csv
  • If your data is in JSON, use pd.read_json() instead
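The JSON alternative can be sketched as a variant of the script above that picks the pandas reader from the file extension. This is an illustrative sketch only: the helpers parse_input, reader_name, and load are hypothetical names, and the extension check assumes that the path ends in .json or .csv.

```python
import argparse
from pathlib import Path


def parse_input(argv=None):
    """Parse --input_data; pass a list to test locally without a job."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_data", type=str)
    return parser.parse_args(argv)


def reader_name(path: str) -> str:
    """Return the name of the pandas reader to use for this file suffix."""
    # Assumption: anything that is not .json is treated as CSV.
    return "read_json" if Path(path).suffix.lower() == ".json" else "read_csv"


def load(path: str):
    """Read the file into a DataFrame with the matching pandas reader."""
    import pandas as pd  # imported lazily so the helpers above need no pandas

    return getattr(pd, reader_name(path))(path)


if __name__ == "__main__":
    args = parse_input()
    print(load(args.input_data).head(10))
```

Passing an explicit list, as in parse_input(["--input_data", "sample.json"]), makes it possible to try the argument handling on a local machine before submitting a job.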
7
Q

Describe what this code is doing:

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_data = Data(
    path='<supported-path>',
    type=AssetTypes.URI_FOLDER,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)
A
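  • Creates (or updates) a URI_FOLDER Data Asset in the workspace (the type parameter sets the asset type)
  • <supported-path> is a placeholder for any supported path to a folder: local, Blob Storage, Data Lake Gen 2 storage, or a datastore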
8
Q

Describe what this code is doing:

import argparse
import glob
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

data_path = args.input_data
all_files = glob.glob(data_path + "/*.csv")
df = pd.concat((pd.read_csv(f) for f in all_files), sort=False)
A
  • Uses argparse to define an input parameter called --input_data
  • When you submit the job, you set --input_data to your URI_FOLDER Data Asset
  • glob.glob collects the paths of all .csv files in that folder into a list
  • Each file in the list is read with pd.read_csv, and pd.concat combines the results into a single Pandas DataFrame
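The glob-then-combine pattern above can be tried without any Azure resources. The sketch below swaps pandas for the standard library csv module to stay dependency-free, sorts the matches so the read order is deterministic, and fails early on an empty folder (pd.concat raises an error when given nothing to concatenate). The helper names are hypothetical.

```python
import csv
import glob
import os
import tempfile


def list_csv_files(folder):
    """Return the .csv files in folder, sorted for a stable read order."""
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not files:
        raise FileNotFoundError(f"no .csv files found in {folder!r}")
    return files


def read_rows(files):
    """Read every file and return all rows as a single list of dicts."""
    rows = []
    for path in files:
        with open(path, newline="") as fh:
            rows.extend(csv.DictReader(fh))
    return rows


def demo():
    """Build a throwaway folder with two CSV files and combine them."""
    samples = (("a.csv", "id,value\n1,10\n2,20\n"), ("b.csv", "id,value\n3,30\n"))
    with tempfile.TemporaryDirectory() as folder:
        for name, body in samples:
            with open(os.path.join(folder, name), "w", newline="") as fh:
                fh.write(body)
        return read_rows(list_csv_files(folder))
```

The same guard carries over to the pandas version: check that the list from glob.glob is not empty before passing it to pd.concat.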
9
Q

Describe what this code is doing:

type: mltable

paths:
  - pattern: ./*.txt
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
A

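The YAML contents of an MLTable file, which defines how the data is read. It selects every .txt file in the current folder and reads each one as a comma-delimited, ASCII-encoded file. header: all_files_same_headers indicates that every file has the same header row.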
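Because the MLTable file sits in the same folder as the data, it can be written next to the data files with a few lines of Python. This is a rough sketch under stated assumptions: write_mltable and matched_files are hypothetical helpers, and glob is used only to approximate which files the ./*.txt pattern would select; it is not how the mltable package resolves paths.

```python
import glob
import os
import tempfile

# Same definition as on the card; the file must be named "MLTable".
MLTABLE_TEXT = """\
type: mltable

paths:
  - pattern: ./*.txt
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
"""


def write_mltable(folder):
    """Write the MLTable definition into the data folder."""
    path = os.path.join(folder, "MLTable")
    with open(path, "w", encoding="ascii") as fh:
        fh.write(MLTABLE_TEXT)
    return path


def matched_files(folder):
    """Approximate the ./*.txt pattern with glob (illustration only)."""
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(folder, "*.txt")))


def demo():
    """Create sample files, add the MLTable file, and list the matches."""
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.txt", "b.txt", "notes.md"):
            with open(os.path.join(folder, name), "w", encoding="ascii") as fh:
                fh.write("id,value\n1,10\n")
        mltable_path = write_mltable(folder)
        return matched_files(folder), os.path.exists(mltable_path)
```

In this sketch notes.md is left out of the matches, which mirrors how the pattern restricts the table to .txt files.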

10
Q

Describe what this code is doing:

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_data = Data(
    path= '<path-including-mltable-file>',
    type=AssetTypes.MLTABLE,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)
A
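  • Creates (or updates) an MLTABLE Data Asset in the workspace (the type parameter sets the asset type)
  • <path-including-mltable-file> is a placeholder for a supported path, local or cloud, that includes the MLTable file (typically the folder that contains it)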
11
Q

Describe what this code is doing:

import argparse
import mltable
import pandas

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

tbl = mltable.load(args.input_data)
df = tbl.to_pandas_dataframe()

print(df.head(10))
A
  • Uses argparse to define an input parameter called --input_data
  • When you submit the job, you set --input_data to your MLTable Data Asset
  • Loads the data through mltable.load then converts it to a Pandas DataFrame (a common conversion approach).