Set up an Azure Machine Learning Workspace Flashcards
What is an AZ ML workspace?
The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. The workspace keeps a history of all training runs, including logs, metrics, output, and a snapshot of your scripts. You use this information to determine which training run produces the best model.
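A minimal sketch of how logged metrics can be used to pick the best training run. The run records, the run IDs, and the "accuracy" metric below are made-up placeholders, not output from a real workspace; in practice the values would come from the workspace's run history.

```python
# Hypothetical run records standing in for a workspace's run history.
runs = [
    {"run_id": "run_1", "metrics": {"accuracy": 0.81}},
    {"run_id": "run_2", "metrics": {"accuracy": 0.88}},
    {"run_id": "run_3", "metrics": {"accuracy": 0.85}},
]


def best_run(runs, metric):
    """Return the run record with the highest value for `metric`."""
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run logged the metric {metric!r}")
    return max(scored, key=lambda r: r["metrics"][metric])


best = best_run(runs, "accuracy")
print(best["run_id"])
```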
What are the ways to create a workspace?
- Use the Azure portal for a point-and-click interface to walk you through each step.
- Use the Azure Machine Learning SDK for Python to create a workspace on the fly from Python scripts or Jupyter notebooks.
- Use an Azure Resource Manager template or the Azure Machine Learning CLI when you need to automate or customize the creation with corporate security standards.
- If you work in Visual Studio Code, use the VS Code extension.
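For the SDK option, creation might look roughly like the sketch below. It assumes the azureml-core package (SDK v1) is installed and that you are signed in to Azure; the function name create_workspace and every argument value you pass to it are placeholders of your own choosing.

```python
def create_workspace(name, subscription_id, resource_group, location):
    """Create a workspace and save its connection details to a local config file."""
    # Imported inside the function so this file still loads where the SDK is absent.
    from azureml.core import Workspace

    ws = Workspace.create(
        name=name,
        subscription_id=subscription_id,
        resource_group=resource_group,
        create_resource_group=True,  # create the resource group if it does not exist
        location=location,
    )
    # Save the workspace details so later scripts can call Workspace.from_config().
    ws.write_config()
    return ws
```

Calling it would be something like create_workspace("myworkspace", "<subscription-id>", "myresourcegroup", "eastus2"), where all four values are hypothetical.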
How to create a workspace?
To create a workspace, you need an Azure subscription. If you don’t have one, create a free account before you begin.
Sign in to the Azure portal by using the credentials for your Azure subscription.
In the upper-left corner of the Azure portal, select "+ Create a resource".
Use the search bar to find Machine Learning.
Select Machine Learning.
In the Machine Learning pane, select Create to begin.
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace
What are the ways to create a workspace?
- AZ portal
- AZ CLI
- AZ REST
- AZ Resource Manager template
How to register a blob container?
Use Datastore.register_azure_blob_container():
import os
from azureml.core import Datastore
# ws is an existing Workspace object
blob_datastore_name = 'azblobsdk'  # Name to give the datastore in the workspace
container_name = os.getenv("BLOB_CONTAINER", "")  # Name of the Azure blob container
account_name = os.getenv("BLOB_ACCOUNTNAME", "")  # Storage account name
account_key = os.getenv("BLOB_ACCOUNT_KEY", "")  # Storage account access key
blob_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                         datastore_name=blob_datastore_name,
                                                         container_name=container_name,
                                                         account_name=account_name,
                                                         account_key=account_key)
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data#create-and-register-datastores
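Because os.getenv(..., "") silently falls back to an empty string, a missing variable only surfaces once registration fails. One possible guard, sketched here with the same three variable names as the card above, is to validate them first. The helper read_blob_settings is hypothetical and not part of the SDK.

```python
import os

# Environment variable names taken from the registration snippet above.
REQUIRED_VARS = ("BLOB_CONTAINER", "BLOB_ACCOUNTNAME", "BLOB_ACCOUNT_KEY")


def read_blob_settings(env=os.environ):
    """Return the blob settings as keyword arguments, or raise if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "container_name": env["BLOB_CONTAINER"],
        "account_name": env["BLOB_ACCOUNTNAME"],
        "account_key": env["BLOB_ACCOUNT_KEY"],
    }
```

The returned dictionary can then be unpacked into the registration call, for example Datastore.register_azure_blob_container(workspace=ws, datastore_name='azblobsdk', **read_blob_settings()).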
How to register an Azure file share?
file_datastore = Datastore.register_azure_file_share(workspace=ws,
                                                     datastore_name=file_datastore_name,
                                                     file_share_name=file_share_name,
                                                     account_name=account_name,
                                                     account_key=account_key)
How to get a specific datastore registered in the current workspace?
# Get a named datastore from the current workspace
datastore = Datastore.get(ws, datastore_name='your datastore name')
How to get a list of all datastores with a given workspace?
# List all datastores registered in the current workspace
datastores = ws.datastores
for name, datastore in datastores.items():
    print(name, datastore.datastore_type)
How to get the default datastore?
datastore = ws.get_default_datastore()
How to change the default datastore?
ws.set_default_datastore(new_default_datastore)
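The lookup, get-default, and set-default calls from the last few cards can be combined. The sketch below assumes ws is a Workspace object (or anything exposing the same datastores, set_default_datastore, and get_default_datastore members); the helper name switch_default_datastore is made up for illustration.

```python
def switch_default_datastore(ws, name):
    """Make the datastore called `name` the workspace default and return the new default."""
    # ws.datastores maps datastore names to datastore objects.
    if name not in ws.datastores:
        raise KeyError(f"datastore {name!r} is not registered in this workspace")
    ws.set_default_datastore(name)
    return ws.get_default_datastore()
```

Checking the name first means a typo is reported before the default is changed.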
Which methods allow you to access datastores during scoring?
Batch prediction
What is the recommended dataset type for machine learning workflows?
We recommend FileDatasets for your machine learning workflows, since the source files can be in any format, which enables a wider range of machine learning scenarios, including deep learning.
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
What is the FileDataset type?
A FileDataset references single or multiple files in your datastores or public URLs. If your data is already cleansed, and ready to use in training experiments, you can download or mount the files to your compute as a FileDataset object.
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
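A rough sketch of creating a FileDataset and downloading its files, assuming the azureml-core package (SDK v1) and an already registered datastore object. The helper name download_files and any path pattern you pass are placeholders.

```python
def download_files(datastore, pattern, target_path):
    """Create a FileDataset from files matching `pattern` and download them."""
    # Imported inside the function so this file still loads where the SDK is absent.
    from azureml.core import Dataset

    # Each path is a (datastore, relative path or glob pattern) tuple.
    file_ds = Dataset.File.from_files(path=[(datastore, pattern)])
    # download() copies the files to target_path and returns the downloaded file paths.
    return file_ds.download(target_path=target_path, overwrite=True)
```

If copying the files is too costly, the dataset's mount() method is the alternative mentioned in the card above.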
What is the TabularDataset type?
A TabularDataset represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas or Spark DataFrame so you can work with familiar data preparation and training libraries without having to leave your notebook. You can create a TabularDataset object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
How to create a TabularDataset?
from azureml.core import Workspace, Datastore, Dataset
datastore_name = 'your datastore name'
# get existing workspace
workspace = Workspace.from_config()
# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)
# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
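Once a TabularDataset such as weather_ds exists, it can be materialized into pandas and registered for reuse. The sketch below assumes the dataset and workspace objects come from the SDK (v1); the helper name preview_and_register and the registration name you pass are hypothetical.

```python
def preview_and_register(dataset, workspace, name, rows=5):
    """Print the first rows of a TabularDataset, then register it under `name`."""
    # Materialize the tabular data as a pandas DataFrame.
    df = dataset.to_pandas_dataframe()
    print(df.head(rows))
    # Register the dataset; create_new_version=True adds a new version if the name exists.
    return dataset.register(workspace=workspace, name=name, create_new_version=True)
```

A call might look like preview_and_register(weather_ds, workspace, "weather_ds").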