Renewal1 - Design an ML Training Solution Flashcards
Use this service when one of the customizable prebuilt models suits your requirements, to save time and effort
Azure AI Services
Use either of these services if you want to keep all data-related (data engineering and data science) projects within the same service.
Azure Synapse Analytics or Azure Databricks
Use either of these services if you need distributed compute for working with large datasets (datasets are large when you experience capacity constraints with standard compute).
Azure Synapse Analytics or Azure Databricks (you’ll need to work with PySpark to use the distributed compute)
Use either of these services when you want full control over model training and management
Azure Machine Learning or Azure Databricks (the one you like…)
Use this service when Python is your preferred programming language, OR when you want an intuitive user interface to manage your machine learning lifecycle
Azure Machine Learning
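To make the Azure Machine Learning card concrete, here is a minimal sketch of submitting a training job with the Python SDK v2 (azure-ai-ml). The workspace identifiers, the cpu-cluster compute name, the curated environment string, and the train.py script are all placeholder assumptions, not details from the card:

```python
# Minimal sketch: submit a training job with the Azure ML Python SDK v2.
# All names below (workspace details, cluster, script) are placeholders.
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# A command job: run a training script on a named compute cluster
job = command(
    code="./src",                # folder containing train.py (assumed layout)
    command="python train.py",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # example curated env
    compute="cpu-cluster",       # hypothetical cluster name
    display_name="train-demo",
)

ml_client.jobs.create_or_update(job)  # submit; track it in the studio UI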
Use this Compute option with smaller datasets and for cost efficiency
CPUs
Use this Compute option with large tabular data, or unstructured data (images, text), where processing may take a lot of time
GPUs (e.g., Nvidia)
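A quick way to see the CPU/GPU choice in code: a PyTorch sketch (PyTorch is an assumption here; the cards don't name a framework) that uses a GPU when one is available and falls back to CPU otherwise:

```python
# Sketch: pick a GPU when available, otherwise fall back to CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")

# Moving the model and data to the chosen device is the same call either way
model = torch.nn.Linear(10, 1).to(device)
batch = torch.randn(32, 10, device=device)
output = model(batch)
```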
Use this Compute option when using Azure Synapse Analytics or Azure Databricks and you want to distribute your workloads
Spark
Use this VM type when you want a balanced CPU-to-memory ratio (Ideal for testing and development with smaller datasets)
General Purpose
Use this VM type when you want a high memory-to-CPU ratio (Great for in-memory analytics, which is ideal when you have larger datasets or when you’re working in notebooks)
Memory Optimized
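As a sketch of how the two VM types show up in practice, here is provisioning two Azure ML compute clusters with the SDK v2. The specific sizes (Standard_DS3_v2 for general purpose, Standard_E8s_v3 for memory optimized) and cluster names are illustrative assumptions, and ml_client is the client from the earlier sketch:

```python
# Sketch: provision clusters for the two VM families (sizes are examples).
from azure.ai.ml.entities import AmlCompute

# General purpose: balanced CPU-to-memory ratio, good for dev/test
dev_cluster = AmlCompute(
    name="dev-cluster",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=2,
)

# Memory optimized: high memory-to-CPU ratio for in-memory analytics
analytics_cluster = AmlCompute(
    name="analytics-cluster",
    size="Standard_E8s_v3",
    min_instances=0,
    max_instances=2,
)

ml_client.compute.begin_create_or_update(dev_cluster).result()
ml_client.compute.begin_create_or_update(analytics_cluster).result()
```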
Spark requires Spark-friendly languages to make optimal use of its distributed workload capabilities (Scala, SQL, SparkR, or PySpark). If you just use Python, what happens?
Spark distributes work across its worker nodes and coordinates them (and communicates with you) through its driver node. Plain Python code runs only on the driver node, so you lose the distributed processing.
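A short PySpark sketch of the distinction (the storage path and column names are placeholders): DataFrame operations fan out to the workers, while anything you collect() is processed by plain Python on the driver alone:

```python
# Sketch: distributed DataFrame work vs. driver-only Python.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distributed-demo").getOrCreate()

# Distributed: the read and the aggregation run across worker nodes
df = spark.read.parquet("/data/sales")          # placeholder path
totals = df.groupBy("region").sum("amount")     # placeholder columns

# Not distributed: collect() pulls results to the driver, and this plain
# Python loop runs on the driver node only
for row in totals.collect():
    print(row["region"], row["sum(amount)"])
```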
Monitoring your training time and compute utilization lets you decide on these things (a logging sketch follows the list)
- Compute utilization tells you whether to scale up or down
- Long training times indicate a need for GPUs over CPUs, or for going distributed
- Going distributed may force a rewrite into Spark-friendly languages
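A minimal sketch of capturing those two signals per run, assuming MLflow for metric logging (which Azure ML supports) and psutil for utilization; train_model() is a hypothetical stand-in for your training code:

```python
# Sketch: log training time and utilization so scaling decisions are data-driven.
import time
import mlflow
import psutil

with mlflow.start_run():
    start = time.perf_counter()
    train_model()  # hypothetical training function
    mlflow.log_metric("training_seconds", time.perf_counter() - start)
    mlflow.log_metric("cpu_utilization_pct", psutil.cpu_percent(interval=1))
    mlflow.log_metric("memory_utilization_pct", psutil.virtual_memory().percent)
```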