Renewal1 - Design an ML Training Solution Flashcards
Use this service when one of the customizable prebuilt models suits your requirements, to save time and effort
Azure AI Services
Use either of these services if you want to keep all data-related (data engineering and data science) projects within the same service.
Azure Synapse Analytics or Azure Databricks
Use either of these services if you need distributed compute for working with large datasets (datasets are large when you experience capacity constraints with standard compute).
Azure Synapse Analytics or Azure Databricks (you’ll need to work with PySpark to use the distributed compute)
Use either of these services when you want full control over model training and management
Azure Machine Learning or Azure Databricks (the one you like…)
Use this service when Python is your preferred programming language, OR when you want an intuitive user interface to manage your machine learning lifecycle
Azure Machine Learning
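To make the Azure Machine Learning card concrete, here is a minimal sketch of submitting a training job with the Python SDK v2 (azure-ai-ml). The workspace identifiers, the cpu-cluster compute name, the curated environment string, and the train.py script are all placeholder assumptions, not details from the card:

```python
# Minimal sketch: submit a training job with the Azure ML Python SDK v2.
# All names below (workspace details, cluster, script) are placeholders.
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# A command job: run a training script on a named compute cluster
job = command(
    code="./src",                # folder containing train.py (assumed layout)
    command="python train.py",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # example curated env
    compute="cpu-cluster",       # hypothetical cluster name
    display_name="train-demo",
)

ml_client.jobs.create_or_update(job)  # submit; track it in the studio UI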
Use this Compute option with smaller datasets and for cost efficiency
CPUs
Use this Compute option with large tabular data, or unstructured data (images, text), where processing may take a lot of time
GPUs (e.g., Nvidia)
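A quick way to see the CPU/GPU choice in code: a PyTorch sketch (PyTorch is an assumption here; the cards don't name a framework) that uses a GPU when one is available and falls back to CPU otherwise:

```python
# Sketch: pick a GPU when available, otherwise fall back to CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")

# Moving the model and data to the chosen device is the same call either way
model = torch.nn.Linear(10, 1).to(device)
batch = torch.randn(32, 10, device=device)
output = model(batch)
```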
Use this Compute option when using Azure Synapse Analytics or Azure Databricks and you want to distribute your workloads
Spark
Use this VM type when you want a balanced CPU-to-memory ratio (Ideal for testing and development with smaller datasets)
General Purpose
Use this VM type when you want a high memory-to-CPU ratio (Great for in-memory analytics, which is ideal when you have larger datasets or when you’re working in notebooks)
Memory Optimized
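As a sketch of how the two VM types show up in practice, here is provisioning two Azure ML compute clusters with the SDK v2. The specific sizes (Standard_DS3_v2 for general purpose, Standard_E8s_v3 for memory optimized) and cluster names are illustrative assumptions, and ml_client is the client from the earlier sketch:

```python
# Sketch: provision clusters for the two VM families (sizes are examples).
from azure.ai.ml.entities import AmlCompute

# General purpose: balanced CPU-to-memory ratio, good for dev/test
dev_cluster = AmlCompute(
    name="dev-cluster",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=2,
)

# Memory optimized: high memory-to-CPU ratio for in-memory analytics
analytics_cluster = AmlCompute(
    name="analytics-cluster",
    size="Standard_E8s_v3",
    min_instances=0,
    max_instances=2,
)

ml_client.compute.begin_create_or_update(dev_cluster).result()
ml_client.compute.begin_create_or_update(analytics_cluster).result()
```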
Spark requires Spark-friendly languages to make optimal use of its distributed workload capabilities (Scala, SQL, SparkR, or PySpark). If you just use Python, what happens?
Spark distributes work across its worker nodes and coordinates them (and communicates with you) through its driver node. Plain Python code runs only on the driver node, so you lose the distributed processing.
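A short PySpark sketch of the distinction (the storage path and column names are placeholders): DataFrame operations fan out to the workers, while anything you collect() is processed by plain Python on the driver alone:

```python
# Sketch: distributed DataFrame work vs. driver-only Python.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distributed-demo").getOrCreate()

# Distributed: the read and the aggregation run across worker nodes
df = spark.read.parquet("/data/sales")          # placeholder path
totals = df.groupBy("region").sum("amount")     # placeholder columns

# Not distributed: collect() pulls results to the driver, and this plain
# Python loop runs on the driver node only
for row in totals.collect():
    print(row["region"], row["sum(amount)"])
```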
Monitoring your training time and compute utilization lets you decide on these things (a logging sketch follows the list)
- Compute utilization tells you whether to scale up or down
- Long training times indicate a need for GPUs over CPUs, or for going distributed
- Going distributed may force a rewrite into Spark-friendly languages
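A minimal sketch of capturing those two signals per run, assuming MLflow for metric logging (which Azure ML supports) and psutil for utilization; train_model() is a hypothetical stand-in for your training code:

```python
# Sketch: log training time and utilization so scaling decisions are data-driven.
import time
import mlflow
import psutil

with mlflow.start_run():
    start = time.perf_counter()
    train_model()  # hypothetical training function
    mlflow.log_metric("training_seconds", time.perf_counter() - start)
    mlflow.log_metric("cpu_utilization_pct", psutil.cpu_percent(interval=1))
    mlflow.log_metric("memory_utilization_pct", psutil.virtual_memory().percent)
```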