Set up an Azure Machine Learning Workspace Flashcards
What is an Azure ML workspace?
The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. The workspace keeps a history of all training runs, including logs, metrics, output, and a snapshot of your scripts. You use this information to determine which training run produces the best model.
What are the ways to create a workspace?
- Use the Azure portal for a point-and-click interface to walk you through each step.
- Use the Azure Machine Learning SDK for Python to create a workspace on the fly from Python scripts or Jupyter notebooks (see the sketch after this list).
- Use an Azure Resource Manager template or the Azure Machine Learning CLI when you need to automate or customize the creation with corporate security standards.
- If you work in Visual Studio Code, use the VS Code extension.
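For the SDK route mentioned above, a minimal sketch of creating a workspace from Python; the workspace name, subscription ID, resource group, and region are placeholders you would replace with your own:

from azureml.core import Workspace

# a minimal sketch: create a workspace with the SDK (placeholder values)
ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2')

# write the config file so later scripts can reconnect with Workspace.from_config()
ws.write_config(path='.azureml')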
How to create a workspace?
Create a workspace
To create a workspace, you need an Azure subscription. If you don’t have an Azure subscription, create a free account before you begin.
- Sign in to the Azure portal by using the credentials for your Azure subscription.
- In the upper-left corner of the Azure portal, select + Create a resource.
- Use the search bar to find Machine Learning.
- Select Machine Learning.
- In the Machine Learning pane, select Create to begin.
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace
What are the ways to create a workspace?
- Azure portal
- Azure CLI
- Azure REST API
- Azure Resource Manager template
How to register a blob container?
Use register_azure_blob_container():

import os
from azureml.core import Datastore

blob_datastore_name = 'azblobsdk'  # Name of the datastore to register to the workspace
container_name = os.getenv("BLOB_CONTAINER", "")  # Name of the Azure blob container
account_name = os.getenv("BLOB_ACCOUNTNAME", "")  # Storage account name
account_key = os.getenv("BLOB_ACCOUNT_KEY", "")  # Storage account access key

blob_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                         datastore_name=blob_datastore_name,
                                                         container_name=container_name,
                                                         account_name=account_name,
                                                         account_key=account_key)
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data#create-and-register-datastores
How to register an Azure file share?
# register an Azure file share; the variable setup mirrors the blob container example above
file_datastore = Datastore.register_azure_file_share(workspace=ws,
                                                     datastore_name=file_datastore_name,
                                                     file_share_name=file_share_name,
                                                     account_name=account_name,
                                                     account_key=account_key)
How to get a specific datastore registered in the current workspace?
# Get a named datastore from the current workspace
datastore = Datastore.get(ws, datastore_name='your datastore name')
How to get a list of all datastores registered in a given workspace?
# List all datastores registered in the current workspace
datastores = ws.datastores
for name, datastore in datastores.items():
    print(name, datastore.datastore_type)
How to get the default datastore?
datastore = ws.get_default_datastore()
How to change the default datastore?
ws.set_default_datastore(new_default_datastore)
Which methods allow you to access datastores during scoring?
Batch prediction.
What is the recommended dataset type for machine learning workflows?
We recommend FileDatasets for your machine learning workflows, since the source files can be in any format, which enables a wider range of machine learning scenarios, including deep learning.
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
What is the FileDataset type?
A FileDataset references single or multiple files in your datastores or public URLs. If your data is already cleansed, and ready to use in training experiments, you can download or mount the files to your compute as a FileDataset object.
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
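A minimal sketch of creating a FileDataset; the datastore variable and the 'animals/**/*.jpg' path pattern are assumptions for illustration:

from azureml.core import Dataset

# a minimal sketch: reference files in a registered datastore as a FileDataset
# (the path pattern below is hypothetical)
datastore_paths = [(datastore, 'animals/**/*.jpg')]
animal_ds = Dataset.File.from_files(path=datastore_paths)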
What is the TabularDataset type?
A TabularDataset represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas or Spark DataFrame so you can work with familiar data preparation and training libraries without having to leave your notebook. You can create a TabularDataset object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
How to create a TabularDataSet?
from azureml.core import Workspace, Datastore, Dataset

datastore_name = 'your datastore name'

# get existing workspace
workspace = Workspace.from_config()

# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)

# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
How to infer a column type?
from azureml.data.dataset_factory import DataType

titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path,
                                                  set_column_types={'Survived': DataType.to_bool()})
The parameter infer_column_types is only applicable for datasets created from delimited files.
How to create a dataset from a pandas dataframe?
To create a TabularDataset from an in-memory pandas dataframe, write the data to a local file, such as a CSV, and create your dataset from that file.
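A minimal sketch of that approach; the dataframe contents, the local file name, and the 'titanic/' target path are hypothetical, and datastore is assumed to be a registered blob datastore:

import pandas as pd
from azureml.core import Dataset

# hypothetical in-memory dataframe
df = pd.DataFrame({'Survived': [0, 1, 1], 'Age': [22, 38, 26]})

# write it to a local CSV file first
local_path = 'titanic_local.csv'
df.to_csv(local_path, index=False)

# upload the file to a datastore, then create a TabularDataset from it
datastore.upload_files(files=[local_path], target_path='titanic/', overwrite=True)
titanic_ds = Dataset.Tabular.from_delimited_files(path=[(datastore, 'titanic/titanic_local.csv')])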
How to share datasets with others across experiments in a workspace?
Use the register() method.
titanic_ds = titanic_ds.register(workspace=workspace,
                                 name='titanic_ds',
                                 description='titanic training data')
How to set a new version for a certain dataset?
# create a new version of titanic_ds by registering under the same name with create_new_version=True
titanic_ds = titanic_ds.register(workspace=workspace,
                                 name='titanic_ds',
                                 description='new titanic training data',
                                 create_new_version=True)
What is an Azure ML compute instance?
An Azure Machine Learning compute instance is a managed cloud-based workstation for data scientists.
Compute instances make it easy to get started with Azure Machine Learning development as well as provide management and enterprise readiness capabilities for IT administrators.
Use a compute instance as your fully configured and managed development environment in the cloud for machine learning. They can also be used as a compute target for training and inferencing for development and testing purposes.
For production grade model training, use an Azure Machine Learning compute cluster with multi-node scaling capabilities. For production grade model deployment, use Azure Kubernetes Service cluster.
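A minimal sketch of provisioning such a multi-node compute cluster with the SDK; the cluster name 'cpu-cluster' and the VM size are assumptions:

from azureml.core.compute import AmlCompute, ComputeTarget

# a minimal sketch: provision a multi-node training cluster (placeholder name and size)
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                       min_nodes=0,
                                                       max_nodes=4)
cpu_cluster = ComputeTarget.create(ws, 'cpu-cluster', compute_config)
cpu_cluster.wait_for_completion(show_output=True)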
What tools and environments are already preinstalled on the compute instance?
- Drivers: CUDA, cuDNN, NVIDIA, Blob FUSE
- Intel MPI library
- Azure CLI
- Azure Machine Learning samples
- Docker
- Nginx
- NCCL 2.0
- Protobuf
- RStudio Server Open Source Edition (preview)
- R kernel
- Azure Machine Learning SDK for R
- Anaconda Python
- Jupyter and extensions
- JupyterLab and extensions
- Azure Machine Learning SDK for Python from PyPI (includes most of the azureml extra packages; to see the full list, open a terminal window on your compute instance and run conda list -n azureml_py36 azureml*)
- Other PyPI packages: jupytext, tensorboard, nbconvert, notebook, Pillow
- Conda packages: cython, numpy, ipykernel, scikit-learn, matplotlib, tqdm, joblib, nodejs, nb_conda_kernels
- Deep learning packages: PyTorch, TensorFlow, Keras, Horovod, MLFlow, pandas-ml, scrapbook
- ONNX packages: keras2onnx, onnx, onnxconverter-common, skl2onnx, onnxmltools
What does a compute target have?
- Has a job queue.
- Runs jobs securely in a virtual network environment, without requiring enterprises to open up the SSH port. The job executes in a containerized environment and packages your model dependencies in a Docker container.
- Can run multiple small jobs in parallel (preview). Two jobs per core can run in parallel while the rest of the jobs are queued.
- Supports single-node multi-GPU distributed training jobs.
What’s a script run configuration?
A ScriptRunConfig is used to configure the information necessary for submitting a training run as part of an experiment.
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#compute-targets-for-training
What does a ScriptRunConfig need?
- source_directory: The source directory that contains your training script
- script: The training script to run
- compute_target: The compute target to run on
- environment: The environment to use when running the script
What is the code pattern to submit a training run?
Is it the same for all types of compute targets?
- Create an experiment to run
- Create an environment where the script will run
- Create a ScriptRunConfig, which specifies the compute target and environment
- Submit the run
- Wait for the run to complete
Yes, it is the same for all.
What is the code pattern to submit a training run?
Write some Python for all these steps:
# create an experiment
from azureml.core import Experiment

experiment_name = 'my_experiment'
experiment = Experiment(workspace=ws, name=experiment_name)

# select a compute target
# if no compute target is specified in the ScriptRunConfig, it defaults to local
compute_target = 'local'

# create an environment: either get a curated environment...
from azureml.core import Workspace, Environment

ws = Workspace.from_config()
myenv = Environment.get(workspace=ws, name="AzureML-Minimal")

# ...or define a user-managed environment
myenv = Environment("user-managed-env")
myenv.python.user_managed_dependencies = True
# You can choose a specific Python environment by pointing to a Python path
# myenv.python.interpreter_path = '/home/johndoe/miniconda3/envs/myenv/bin/python'

# create the script run configuration
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory=project_folder,
                      script='train.py',
                      compute_target=my_compute_target,
                      environment=myenv)

# set the compute target (my_compute_target is a compute target you have provisioned)
# skip this if you are running on your local computer
src.run_config.target = my_compute_target

# submit the experiment
run = experiment.submit(config=src)
run.wait_for_completion(show_output=True)
What is an environment for?
Azure Machine Learning environments are an encapsulation of the environment where your machine learning training happens. They specify the Python packages, Docker image, environment variables, and software settings around your training and scoring scripts. They also specify runtimes (Python, Spark, or Docker).
You can either define your own environment, or use an Azure ML curated environment. Curated environments are predefined environments that are available in your workspace by default. These environments are backed by cached Docker images which reduce the run preparation cost. See Azure Machine Learning Curated Environments for the full list of available curated environments.
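As a minimal sketch of defining your own environment from package specifications (the environment name and package list here are just examples):

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# a minimal sketch: define a custom environment from conda/pip packages (example names)
myenv = Environment(name="my-training-env")
myenv.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['scikit-learn'],
    pip_packages=['azureml-defaults'])

# register the environment so it can be reused and versioned in the workspace
myenv.register(workspace=ws)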
What prerequisites do you need to install Docker for Windows?
- BIOS-enabled virtualization
- Windows 10 64-bit Professional
Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.
Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. For a big data pipeline, the data (raw or structured) is ingested into Azure through Azure Data Factory in batches, or streamed near real-time using Kafka, Event Hub, or IoT Hub.
Apache Spark
Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools.
Azure Data Factory
It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores.
Azure Container Instances
This is good for development or testing, not for production workloads!
Use Azure Container Instances for data processing where source data is ingested, processed, and placed in a durable store such as Azure Blob storage. By processing the data with ACI rather than statically-provisioned virtual machines, you can achieve significant cost savings through per-second billing.
Microsoft Machine Learning for Apache Spark (MMLSpark)
MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly scalable predictive and analytical models for large image and text datasets.
MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.
Salient features:
- Easily ingest images from HDFS into a Spark DataFrame (example 301)
- Pre-process image data using transforms from OpenCV (example 302)
- Featurize images using pre-trained deep neural nets using CNTK (example 301)
- Train DNN-based image classification models on N-Series GPU VMs on Azure
- Featurize free-form text data using convenient APIs on top of primitives in SparkML via a single transformer (example 201)
- Train classification and regression models easily via implicit featurization of data (example 101)
- Compute a rich set of evaluation metrics including per-instance metrics (example 102)
Azure HDInsight
Azure HDInsight is a cloud distribution of Hadoop components. Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more.
Azure Data Lake Analytics
Azure Data Lake Analytics is an on-demand analytics job service that simplifies big data. Easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET. With no infrastructure to manage, you can process data on demand, scale instantly, and only pay per job.
What are the ways to move data to and from Azure Blob storage?
- Azure Storage Explorer
- AzCopy
- Python (Azure Storage SDK for Python; see the sketch after this list)
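A minimal sketch of the Python route using the azure-storage-blob package; the connection string, container name, and file names are placeholders:

from azure.storage.blob import BlobServiceClient

# a minimal sketch: upload and download a blob (placeholder connection string and names)
service = BlobServiceClient.from_connection_string('<storage-connection-string>')
blob = service.get_blob_client(container='mycontainer', blob='data/input.csv')

# upload a local file to blob storage
with open('input.csv', 'rb') as f:
    blob.upload_blob(f, overwrite=True)

# download it back to a local copy
with open('input_copy.csv', 'wb') as f:
    f.write(blob.download_blob().readall())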
TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.