Datastores Flashcards
What are the two built-in datastores in an Azure Machine Learning workspace?
An Azure Storage blob container and an Azure Storage file share.
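A minimal sketch (Azure ML SDK v1, azureml-core) of retrieving the built-in datastores; it assumes a workspace config file is available locally.

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# The built-in datastores are created with the workspace:
# 'workspaceblobstore' (blob container, the default) and 'workspacefilestore' (file share).
default_ds = ws.get_default_datastore()
print(default_ds.name, default_ds.datastore_type)

# Retrieve a datastore by name
blob_store = Datastore.get(ws, 'workspaceblobstore')

# List every datastore registered in the workspace
for name, ds in ws.datastores.items():
    print(name, ds.datastore_type)
```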
What are the two types of dataset you can create?
Tabular: The data is read from the dataset as a table. You should use this type of dataset when your data is consistently structured and you want to work with it in common tabular data structures, such as Pandas dataframes.
File: The dataset presents a list of file paths that can be read as though from the file system. Use this type of dataset when your data is unstructured, or when you need to process the data at the file level. Both types are shown in the sketch below.
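A minimal sketch (SDK v1) of creating both dataset types from a datastore; the paths 'data/diabetes.csv' and 'data/*.csv' are placeholder examples, not paths from these cards.

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Tabular dataset: parsed into rows and columns, convertible to a Pandas dataframe
tab_ds = Dataset.Tabular.from_delimited_files(path=(datastore, 'data/diabetes.csv'))
df = tab_ds.to_pandas_dataframe()

# File dataset: simply a list of file paths to process directly
file_ds = Dataset.File.from_files(path=(datastore, 'data/*.csv'))
for file_path in file_ds.to_path():
    print(file_path)
```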
What can be done to enable historical tracking of datasets when used in experiments?
Datasets can be versioned. You can create a new version by registering it with the same name as a previously registered dataset and setting the create_new_version property to True.
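A minimal sketch (SDK v1) of registering a new version and then retrieving an earlier one; the dataset name 'diabetes dataset' is a placeholder.

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()

# Registering under an existing name with create_new_version=True creates version 2, 3, ...
file_ds = Dataset.File.from_files(path=(ws.get_default_datastore(), 'data/*.csv'))
file_ds = file_ds.register(workspace=ws,
                           name='diabetes dataset',
                           description='revised data files',
                           create_new_version=True)

# Retrieve a specific version later (defaults to the latest if omitted)
ds_v1 = Dataset.get_by_name(ws, name='diabetes dataset', version=1)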
You’ve uploaded some data files to a folder in a blob container, and registered the blob container as a datastore in your Azure Machine Learning workspace. You want to run a script as an experiment that loads the data files and trains a model. What should you do?
Create a data reference for the datastore location and pass it to the script as a parameter.
To access a path in a datastore from an experiment script, you must create a data reference and pass it to the script as a parameter. The script can then read data from the data reference parameter as if it were a local file path.
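A minimal sketch (SDK v1) of passing a data reference to a training script through an estimator; the folder 'data_files', the script name, and the compute target are placeholders.

```python
from azureml.core import Workspace
from azureml.train.estimator import Estimator

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Data reference: the datastore path is downloaded to the compute target at run time
data_ref = datastore.path('data_files').as_download(path_on_compute='training_data')

estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training.py',
                      compute_target='local',
                      script_params={'--data-folder': data_ref})

# In training.py the reference arrives as an ordinary path string:
#   parser.add_argument('--data-folder', type=str)
#   args = parser.parse_args()
#   files = os.listdir(args.data_folder)
```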
You’ve registered a dataset in your workspace. You want to use the dataset in an experiment script that is run using an estimator. What should you do?
Pass the dataset as a named input to the estimator.
To access a dataset in an experiment script, pass the dataset as a named input to the estimator.
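A minimal sketch (SDK v1) of passing a registered tabular dataset as a named input to an estimator; the dataset name, script, and compute target are placeholders.

```python
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.estimator import Estimator

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name='diabetes dataset')

estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training.py',
                      compute_target='local',
                      inputs=[dataset.as_named_input('training_data')],
                      pip_packages=['azureml-dataprep[pandas]'])

run = Experiment(workspace=ws, name='training-experiment').submit(estimator)

# In training.py, retrieve the dataset by its input name:
#   from azureml.core import Run
#   run = Run.get_context()
#   data = run.input_datasets['training_data'].to_pandas_dataframe()
```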