Manage Azure Resources for ML Flashcards

1
Q

Azure ML Architecture Overview

(5 Areas)

A

Workspace

Everything is managed from the workspace. Central repository for training runs, logs, metrics, outputs, etc.

Managed Resources

  • Compute Instance*: cloud-based workstation for ML.
  • Compute Clusters*: cluster of compute nodes for production-grade ML.

Linked Services

  • Datastores*: house the data for experiments.
  • Compute Targets*: machines where you develop, train, and run experiments. Can be local, virtual, or compute instance machines.

Dependencies

  • Storage Account*: default storage for the workspace; stores information about its resources.
  • Container Registry*: Docker images used for training and for deployment to production.
  • Key Vault*: key and security storage.
  • Application Insights*: logs information for models being run (e.g., requests, failures, exceptions, load, performance, logs)

Assets

  • Everything else not listed above
  1. environment
  2. experiments
  3. pipelines
  4. datasets
  5. models
  6. endpoints
2
Q

Steps to Setting Up a Workspace

(3 high level steps)

(what are the mandatory fields)

(what are the dependencies/resources you have to attach to the workspace)

A
  1. Create Button
  2. Search Machine Learning
  3. Create Button

Mandatory Fields

  1. Subscription
  2. Resource Group
  3. Workspace Name
  4. Region
  5. Storage Account (resource)
  6. Key Vault (resource)
  7. Application Insights (resource)
  8. Container Registry (resource)
3
Q

Create Compute Steps

3 general steps

common elements needed for compute instances and compute clusters

(come back to inference and attached later)

A
  1. Launch Studio
  2. Left Pane: Manage > Compute
  3. In the panel, select the compute type

Compute Instance

  • name
  • location (same as region)
  • VM type: CPU/GPU
  • VM Size: cost and processing speed

Compute Cluster

  • location (same as region)
  • priority: dedicated/low priority
  • VM type: CPU/GPU
  • VM size: cost and processing speed

Inference

  • location (same as region)
  • VM size: cost and processing speed
  • name
  • purpose: production/dev-test
  • number of nodes
  • network configuration

Attached

  • select type:
    • databricks
    • data lake analytics
    • HDInsights
    • Synapse Spark pool (preview)
    • Kubernetes pool (preview)
    • VM
4
Q

Manage Data in Workspace:

Define Datastore

A

a connection to the file system where the data lives

allows the data to be available in the workspace

5
Q

Default Datastore

(definition & where is it found)

A

the default datastore is the location where the workspace stores all of its metadata

can be found in the default storage account

6
Q

Move Datasets to Datastore

(single v. multiple files

what is essential for multiple files)

A

Single File Upload

  1. container link
  2. upload button

Multiple Files - Directory

  1. container
  2. storage explorer
  3. create folder
  4. upload
  5. ML Studio
    1. datasets
    2. create datasets
    3. select datastore
    4. directory path
      • preview data
      • schema
      • confirm
7
Q

Manage Compute for Experiments

( 3 main computes )

( 2 sub-computes )

A

Compute Instance

  • used for development; training & inferencing
  • single VM acting as workstation
  • notebook only

Compute Cluster

  • used for development; training pipelines
  • group of VMs
  • notebook, AutoML, Designer

Compute Target

  • used for deployment; instances & clusters become targets
  • Remote/Attached Compute
    • test or batch services
    • local machines, instances, clusters, VMs
    • train/test deployment
  • Inference Cluster
    • real-time analytics
    • AKS or Kubernetes
    • auto-scaling, highly scalable
8
Q

When to Use the Different Types of Computes?

A

Compute Instance: notebooks

Compute Clusters: AutoML, notebooks, Designer

Compute Target Remote/Attached Compute: test/batch deployment

Compute Target Inference: real-time deployment

9
Q

Security & Access Controls:

Default Roles

( 4 )

A
  1. Reader: read-only
  2. Contributor: view, create, edit, delete within the workspace.
  3. Owner: full access; compute, instances, linked resources, assets, dependencies.
  4. Custom: user-defined roles, but they cannot:
    • create, delete, update compute
    • add, delete, alter roles
    • delete workspace

10
Q

Security & Access Controls:

Create w/ .json File

( 6 elements )

A

{
  "Name": "Data Scientist Custom",
  "IsCustom": true,
  "Description": "role description",
  "Actions": ["*"],
  "NotActions": ["list of actions the role cannot perform"],
  "AssignableScopes": ["scope of role assignment: workspace, resource group, or resource"]
}
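As a sketch, the six elements above can be assembled in Python and round-tripped through the standard json module to confirm the file is valid before deploying it; the NotActions and AssignableScopes values here are illustrative placeholders, not a real role.

```python
import json

# Custom role definition mirroring the six elements on the card.
# All values are placeholders; edit them before deploying with
# `az role definition create --role-definition role.json`.
role = {
    "Name": "Data Scientist Custom",
    "IsCustom": True,
    "Description": "role description",
    "Actions": ["*"],
    "NotActions": ["Microsoft.MachineLearningServices/workspaces/delete"],
    "AssignableScopes": ["/subscriptions/<subscription-id>"],
}

# Serialize and re-parse to confirm the structure round-trips as JSON.
text = json.dumps(role, indent=2)
parsed = json.loads(text)
assert parsed["IsCustom"] is True
print(sorted(parsed.keys()))
```

Note that Python's True serializes to JSON true, matching the "IsCustom": true element.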

11
Q

Key Vault Credentials Management

definition & commands

(create, add secret, retrieve secret)

A

key vault definition: stores keys, passwords, certificates, credentials

create key vault

az keyvault create --name --resource-group --location

add secret

az keyvault secret set --vault-name --name "ExamplePassword" --value "pass_value"

retrieve secret

az keyvault secret show --vault-name --name

12
Q

Azure Command for Creating Custom Roles

(deploy, assign, update, list)

A

deploy

az role definition create --role-definition

assign

az ml workspace share -w (for workspace)

update

az role definition update --role-definition

list

az role definition list --subscription

13
Q

SDK: Create Workspace

what is the function name?

what are the minimum input variables for the function?

what is a critical step?

A

function to create a workspace

Workspace.create( )

minimum required fields

  1. workspace name: name=''
  2. resource group: resource_group=''
  3. subscription: subscription_id=''
  4. region: location=''

Critical Step

ws.write_config(path="./config")

Example Below

create workspace

ws = Workspace.create(
    name='',
    subscription_id='',
    resource_group='',
    create_resource_group=True,  # True if it does not exist
    location=''
)

14
Q

SDK: Create Datastore

3 important datastore methods?

what step must you always do, what’s the method name?

A

Datastore Methods

  • .get(): get data store by name
  • .get_default(): get default datastore
  • .register_azure_<storage_type>(): registers a datastore of the given storage type (e.g., .register_azure_blob_container())

Important Step

Access the workspace from the config.json

ws = Workspace.from_config(path="./config")

Example

az_store = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="azure_sdk_blob01",
    account_name="azuremlstb01",
    container_name="azuremlstb01blob",
    account_key="account_key"  # key used to access the storage account
)

A SAS token can be used instead of the account key.

15
Q

SDK: Create & Register Dataset

methods to get datasets?

method to register dataset?

what data structure must the path to files be in?

A

methods to get datasets

Dataset.Tabular.from_delimited_files()

Dataset.File.from_files()

from_… variants can also be used to pull data from SQL queries, Parquet files, etc.

register a dataset

Dataset.register()

Create the path to the CSV file

# must be a list of (datastore, path) tuples

csv_path = [(az_store, "path_to_file")]
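The required structure can be sketched without a live workspace; the stand-in class below is hypothetical and only mimics a named datastore (a real az_store would be an azureml.core.Datastore obtained from Datastore.get() or registration):

```python
# Hypothetical stand-in for an azureml.core.Datastore object, used only
# to illustrate the path structure expected by the Dataset factory methods.
class DatastoreStub:
    def __init__(self, name):
        self.name = name

az_store = DatastoreStub("azure_sdk_blob01")

# The path argument must be a LIST of (datastore, path) TUPLES,
# even when there is only a single file.
csv_path = [(az_store, "path_to_file")]

assert isinstance(csv_path, list)
assert isinstance(csv_path[0], tuple)
print(csv_path[0][0].name)
```

Passing a bare tuple (or a plain string) instead of a list of tuples is a common source of errors with Dataset.Tabular.from_delimited_files().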

16
Q

SDK: Upload

methods

A

method for uploading files

.upload_files()

uploading folder or directory

.upload()

Default Datastore:

from azureml.core import Workspace
ws = Workspace.from_config()
datastore = ws.get_default_datastore()
datastore.upload(src_dir='./data',
                 target_path='datasets/cifar10',
                 overwrite=True)

17
Q

Programming Tools: What are they used for?

(SDK, Designer, CLI)

A

SDK: Python

Designer: low-code /no-code

CLI: automation