Manage Azure Resources for ML Flashcards
Azure ML Architecture Overview
(5 Areas)
Workspace
Everything is managed from the workspace. It is the central repository for training runs, logs, metrics, outputs, etc.
Managed Resources
- Compute Instance*: cloud-based workstation for ML.
- Compute Clusters*: cluster of compute nodes for production-grade ML.
Linked Services
- Datastores*: house the data used for experiments.
- Compute Targets*: machines where you develop, train, and run experiments; can be local machines, virtual machines, or compute instances.
Dependencies
- Storage Account*: default storage for the workspace; stores information about its resources.
- Container Registry*: registers Docker images used for training and for deployment to production.
- Key Vault*: stores keys and other security information.
- Application Insights*: logs information about running models (e.g., requests, failures, exceptions, load, performance, logs)
Assets
- Everything else not listed above
- environment
- experiments
- pipelines
- datasets
- models
- endpoints
Steps to Setting Up a Workspace
(3 high level steps)
(what are the mandatory fields)
(what are the dependencies/resources you have to attach to the workspace)
- Create Button
- Search Machine Learning
- Create Button
Mandatory Fields
- Subscription
- Resource Group
- Workspace Name
- Region
- Storage Account (resource)
- Key Vault (resource)
- Application Insights (resource)
- Container Registry (resource)
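The workspace can also be created from the CLI. A minimal sketch, assuming the Azure ML CLI extension is installed; the names are placeholders, and the dependent resources are created automatically when not supplied:
az ml workspace create -w <workspace-name> -g <resource-group>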
Create Compute Steps
3 general steps
common elements needed for compute instances and compute clusters
(inference and attached compute are covered later; an SDK sketch for provisioning a cluster follows these lists)
- Launch Studio
- Left Pane: Manage > Compute
- Panel: select the compute type
Compute Instance
- name
- location (same as region)
- VM type: CPU/GPU
- VM Size: cost and processing speed
Compute Cluster
- location (same as region)
- priority: dedicated/low priority
- VM type: CPU/GPU
- VM size: cost and processing speed
Inference
- location (same as region)
- VM size: cost and processing speed
- name
- purpose: production/dev-test
- number of nodes
- network configuration
Attached
- select type:
- Databricks
- Data Lake Analytics
- HDInsight
- Synapse Spark pool (preview)
- Kubernetes pool (preview)
- VM
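Compute can also be provisioned from the SDK rather than the studio. A minimal sketch for a compute cluster using SDK v1 (azureml-core), assuming a workspace object ws; the cluster name and VM size are placeholders:
from azureml.core.compute import ComputeTarget, AmlCompute

cluster_config = AmlCompute.provisioning_configuration(
    vm_size='STANDARD_DS3_V2',   # VM size: cost and processing speed
    vm_priority='dedicated',     # priority: dedicated/low priority
    min_nodes=0,                 # scale down to zero when idle
    max_nodes=4
)
cluster = ComputeTarget.create(ws, 'my-cluster', cluster_config)
cluster.wait_for_completion(show_output=True)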
Manage Data in Workspace:
Define Datastore
a connection to the file system where the data is stored;
it makes the data available in the workspace
Default Datastore
(definition & where is it found)
the default datastore is the location where the workspace stores all of its metadata
can be found in the default storage account
Move Datasets to Datastore
(single vs. multiple files; what is essential for multiple files; an SDK upload sketch follows the steps)
Single File Upload
- container link
- upload button
Multiple Files - Directory
- container
- storage explorer
- create folder
- upload
- ML Studio
- datasets
- create datasets
- select datastore
- directory path
- preview data
- schema
- confirm
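The multi-file upload can also be scripted. A minimal sketch using SDK v1, assuming az_store is a registered blob datastore (see the SDK cards below); the file paths and target folder are placeholders:
az_store.upload_files(
    files=['./data/file1.csv', './data/file2.csv'],  # local files to upload
    target_path='my_folder/',                        # directory created inside the container
    overwrite=True
)
# an entire local directory can be uploaded with az_store.upload(src_dir='./data', target_path='my_folder/')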
Manage Compute for Experiments
( 3 main computes )
( 2 sub-computes )
Compute Instance
- used for development; training & inferencing
- single VM acting as workstation
- notebook only
Compute Cluster
- used for development; training pipelines
- group of VMs
- notebook, AutoML, Designer
Compute Target
- used for deployment; instances & clusters become targets
Remote/Attached Compute
- test or batch services
- local machines, instances, clusters, VMs
- train/test deployment
Inference Cluster
- real-time analytics
- AKS or Kubernetes
- auto-scaling, highly scalable (see the SDK sketch below)
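A minimal SDK v1 sketch for provisioning an AKS inference cluster, assuming a workspace object ws; the name, VM size, and node count are placeholders:
from azureml.core.compute import ComputeTarget, AksCompute

aks_config = AksCompute.provisioning_configuration(
    vm_size='STANDARD_DS3_V2',
    agent_count=3,                                        # number of nodes
    cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST    # or FAST_PROD for production
)
aks_target = ComputeTarget.create(ws, 'my-aks', aks_config)
aks_target.wait_for_completion(show_output=True)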
When to Use the Different Types of Computes?
Compute Instance: notebooks
Compute Clusters: AutoML, notebooks, Designer
Compute Target (Remote/Attached Compute): test/batch deployment
Compute Target (Inference Cluster): real-time deployment
Security & Access Controls:
Default Roles
( 4 )
- Reader: read-only
- Contributor: view, create, edit, delete within workspace.
- Owner: full access (compute, instances, linked resources, assets, dependencies).
- Custom: user-defined roles, but a custom role cannot:
- create, delete, or update compute
- add, delete, or alter roles
- delete the workspace
Security & Access Controls:
Create w/ .json File
( 6 elements )
{
    "Name": "Data Scientist Custom",
    "IsCustom": true,
    "Description": "role description",
    "Actions": ["*"],
    "NotActions": ["list of actions the role cannot perform"],
    "AssignableScopes": ["scope of the role assignment: workspace, resource group, or resource"]
}
Key Vault Credentials Management
definition & commands
(create, add secret, retrieve secret)
key vault definition: stores keys, passwords, certificates, credentials
create key vault
az keyvault create --name <vault-name> --resource-group <resource-group> --location <location>
add secret
az keyvault secret set --vault-name <vault-name> --name "ExamplePassword" --value "pass_value"
retrieve secret
az keyvault secret show --vault-name <vault-name> --name "ExamplePassword"
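The workspace's default Key Vault can also be reached from the SDK. A minimal sketch, assuming a workspace object ws; the secret name and value are placeholders:
keyvault = ws.get_default_keyvault()
keyvault.set_secret(name='ExamplePassword', value='pass_value')   # add a secret
password = keyvault.get_secret(name='ExamplePassword')            # retrieve a secret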
Azure Command for Creating Custom Roles
(deploy, assign, update, list)
deploy
az role definition create --role-definition <role-file.json>
assign
az ml workspace share -w <workspace-name> -g <resource-group> --role <role-name> --user <user-email> (for the workspace)
update
az role definition update --role-definition <role-file.json>
list
az role definition list --subscription <subscription-id>
SDK: Create Workspace
what is the function name?
what are the minimum input variables for the function?
what is a critical step?
function to create a workspace
Workspace.create()
minimum required fields
What are the minimum required fields to make a workspace?
- workspace name: name=''
- resource group: resource_group=''
- subscription: subscription_id=''
- region: location=''
Critical Step
ws.write_config(path="./config")
Example Below
create workspace
from azureml.core import Workspace

ws = Workspace.create(
    name='',
    subscription_id='',
    resource_group='',
    create_resource_group=True,   # True creates the resource group if it does not exist
    location=''
)
SDK: Create Datastore
3 important datastore methods?
what step must you always do, what’s the method name?
Datastore Methods
- .get(): get a datastore by name
- .get_default(): get the default datastore
- .register_azure_blob_container() and the other register_azure_* methods: register a datastore of the given storage type
Important Step
Access the workspace from the config.json
ws = Workspace.from_config(path="./config")
Example
from azureml.core import Datastore

az_store = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="azure_sdk_blob01",
    account_name="azuremlstb01",
    container_name="azuremlstb01blob",
    account_key="account_key"   # storage account access key; a SAS token can also be used
)
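Once registered, the datastore can be fetched with the get methods from the card above. A quick usage sketch, assuming ws and the names above:
az_store = Datastore.get(ws, 'azure_sdk_blob01')   # get a datastore by name
default_store = ws.get_default_datastore()         # get the workspace default datastore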
SDK: Create & Register Dataset
methods to get datasets?
method to register dataset?
what data structure must the path to files be in?
methods to get datasets
Dataset.Tabular.from_delimited_files()
Dataset.File.from_files()
other from_* methods can also be used to pull data from SQL queries, parquet files, etc.
register a dataset
.register(), called on the dataset instance
Create the path to the csv file
# the path must be a list of (datastore, path) tuples
csv_path = [(az_store, "path_to_file")]
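Putting the card together: a minimal sketch that builds a tabular dataset from csv_path and registers it, assuming ws and az_store from the cards above; the dataset name is a placeholder:
from azureml.core import Dataset

csv_ds = Dataset.Tabular.from_delimited_files(path=csv_path)
csv_ds = csv_ds.register(
    workspace=ws,
    name='my_dataset',
    create_new_version=True   # version the dataset instead of failing if the name already exists
)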