Path1.Mod1.e - Explore ML Workspace - Azure ML Resources and Assets Flashcards
W CR DSt
The Three ML Resources
The Workspace, Compute Resources, Datastores
CI CC IC AC (you know this from AI-900)
The Four Compute Resources.
Despite the cost-intensiveness, you should always let Data Scientists manage/edit/create Compute Resources (T/F)?
- Compute Instances - managed VMs, typically used as a data scientist's development workstation
- Compute Clusters - on-demand clusters of CPU/GPU nodes that scale with the workload
- Inference Clusters - allow you to create AKS clusters (or attach an existing one)
- Attached Compute - attach other compute resources, like Azure Databricks or Synapse Spark pools
FALSE - Best practice is to only allow Admins to manage, Data Scientists to use (read-only) available ones.
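The admin-managed compute above is usually defined declaratively. A sketch of an Azure ML CLI v2 compute cluster spec (all values - name, VM size, instance counts - are illustrative assumptions):

```yaml
# Illustrative amlcompute spec; an admin would create it with:
#   az ml compute create --file compute.yml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cpu-cluster
type: amlcompute
size: STANDARD_DS3_v2
min_instances: 0          # scale to zero when idle to control cost
max_instances: 4
idle_time_before_scale_down: 120
tier: dedicated
```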
The two Datastores auto-created for the Azure Storage account that is provisioned with the workspace, and the kind of data each stores.
The third kind of datastore commonly connected to by Data Science projects.
- workspacefilestore - connects to the File Share; stores Jupyter notebooks and Python scripts
- workspaceblobstore - connects to Blob Storage; stores metrics and output when tracking model training (default)
Azure Data Lake Storage (Gen2)
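Connecting an ADLS Gen2 datastore is also a declarative step. A sketch of an Azure ML CLI v2 datastore spec (account and filesystem names are illustrative assumptions):

```yaml
# Illustrative ADLS Gen2 datastore spec; created with:
#   az ml datastore create --file datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_datastore
type: azure_data_lake_gen2
account_name: mystorageaccount   # hypothetical storage account
filesystem: datalake-container   # hypothetical ADLS Gen2 filesystem
```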
M E D C
The Four ML Assets
Models, Environments, Data, Components
n v
Must specify these two things when creating a Model in your Workspace.
Must specify these two things when creating an Environment.
Must specify these three things when creating Data Assets.
Must specify these four things when creating Components.
Name and Version
For Data Assets, the path to the file or folder (asset) as well as the above.
For Components, the code and the Environment needed to run it, as well as the above.
An Environment:
- its purpose
- how it’s stored
- how to use it for script execution
- To prepare a Compute Target for running your script: specify software packages, environment variables, and software settings
- Stored as a Docker image in the same Azure Container Registry (ACR) as the workspace
- When you want to run a script, specify the Environment to be used by the Compute Target; Azure ML installs all requirements on the Compute before executing the script.
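An Environment can be defined in a short YAML spec. A sketch using the Azure ML CLI v2 schema (name, base image, and conda file path are illustrative assumptions):

```yaml
# Illustrative environment spec; registered with:
#   az ml environment create --file environment.yml
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: sklearn-train-env
version: "1"
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04  # assumed base image
conda_file: conda.yml   # packages to install on top of the base image
```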
Data Assets:
- their purpose
- how to use them
- how they differ from Datastores.
- They are references to a specific file or folder, and carry the auth/access info needed to reach it
- They can be used to access data without having to authenticate on each access
- Datastores contain connection info for Azure data storage services and provide management functionality for the raw data; Data Assets provide easy access to (samples of) that data during ML tasks.
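The three required fields (name, version, path) map directly onto the Azure ML CLI v2 data asset spec. A sketch with illustrative names and path:

```yaml
# Illustrative data asset spec; registered with:
#   az ml data create --file data.yml
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: diabetes-data
version: "1"
type: uri_file            # or uri_folder / mltable
# Hypothetical path into the default workspaceblobstore datastore:
path: azureml://datastores/workspaceblobstore/paths/data/diabetes.csv
```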
“Good Design means…”
The purpose of Components and how to use them.
Code reuse! Stored code snippets you use when creating Pipelines, each representing a step in the Pipeline.
Basically the Components you drag and drop in the Designer… and remember, you can write your own.
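Writing your own Component shows all four required fields (name, version, code, environment) in one place. A sketch of a custom command component in the CLI v2 schema (all names, paths, and the referenced environment are illustrative assumptions):

```yaml
# Illustrative command component spec; registered with:
#   az ml component create --file component.yml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
version: "1"
type: command
code: ./src                               # folder containing prep.py (hypothetical)
environment: azureml:sklearn-train-env:1  # environment needed to run the code
inputs:
  raw_data:
    type: uri_file
outputs:
  clean_data:
    type: uri_folder
command: >-
  python prep.py --input ${{inputs.raw_data}} --output ${{outputs.clean_data}}
```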