4. Choosing the Right ML Infrastructure Flashcards

Question

What is a use case for the Speech‐to‐Text service, which converts recorded or streaming audio into text?

Answer 1

Creating subtitles for recorded and streaming video.

Answer 2

Structured data, images/video, natural language, and Recommendations AI/Retail AI

Answer 3

BigQuery ML and Vertex AI Tables

Answer 4

Deploy the model on an endpoint and serve predictions through REST API.
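A minimal sketch of what serving through the REST API looks like from the client side, assuming the model is already deployed to a Vertex AI endpoint. The project, region, endpoint ID, and instance payload are placeholders, and a real call would also need an OAuth bearer token in the Authorization header.

```python
import json

def build_predict_request(project, region, endpoint_id, instances):
    """Return the URL and JSON body for a Vertex AI online prediction call."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/endpoints/{endpoint_id}:predict"
    )
    # The predict method expects a JSON object with an "instances" list.
    body = json.dumps({"instances": instances})
    return url, body

# Placeholder values (assumptions for illustration only).
url, body = build_predict_request(
    "my-project", "us-central1", "1234567890", [{"feature_a": 1.0}]
)
print(url)
print(body)
```

The URL and body would then be sent as an HTTP POST with any HTTP client.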

Answer 5

AUC ROC, AUC PR, Logloss, Precision at Recall, Recall at Precision
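Two of these objectives are easy to compute by hand. The sketch below is a plain-Python illustration of log loss and precision at a minimum recall, not the implementation AutoML uses; the function names and sample data are made up for the example, and at least one positive label is assumed.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def precision_at_recall(y_true, scores, min_recall):
    """Best precision over all thresholds whose recall is at least min_recall."""
    positives = sum(y_true)
    best = 0.0
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p)
        fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p)
        recall = tp / positives
        precision = tp / (tp + fp)
        if recall >= min_recall:
            best = max(best, precision)
    return best

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(log_loss(labels, scores))
print(precision_at_recall(labels, scores, min_recall=1.0))
```

Recall at precision is the mirror image: keep the thresholds whose precision meets the target and report the best recall among them.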

Answer 6

RMSE, RMSLE, MAE

Answer 7

RMSE, RMSLE, MAPE, Quantile loss
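For reference, the regression and forecasting metrics named in Answers 6 and 7 can be written out in a few lines. This is a plain-Python sketch using the standard textbook definitions; the function names and sample values are illustrative, and RMSLE is assumed here to use log(1 + x).

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmsle(y_true, y_pred):
    """Root mean squared log error; inputs must be greater than -1."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error; true values must be nonzero."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def quantile_loss(y_true, y_pred, q):
    """Pinball loss for quantile q in (0, 1)."""
    return sum(
        max(q * (t - p), (q - 1) * (t - p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)

actual = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
print(rmse(actual, forecast), rmsle(actual, forecast), mae(actual, forecast))
print(mape(actual, forecast), quantile_loss(actual, forecast, q=0.9))
```

RMSLE penalizes relative rather than absolute error, and quantile loss penalizes over- and under-prediction asymmetrically depending on q.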

Answer 8

Because the hardware used for different AutoML jobs is different.

Answer 9

AutoML Seq2seq+ (takes in a sequence and produces another sequence)
Temporal Fusion Transformer (a deep neural network model that also uses the attention mechanism)

Answer 10

Image classification (single-label): Predict the one correct label from a list of labels
Image classification (multi-label): Predict all the correct labels
Image object detection: Predict the locations of all objects
Image segmentation: Predict per‐pixel areas of an image with a label
Video classification: Get label predictions for entire videos, shots, and frames
Video action recognition: Identify the action moments in a video
Video object tracking: Get labels, tracks, and time stamps for objects

Answer 11

Less memory and low latency

Answer 12

Text classification: Predict the one correct label
Multi-label classification: Predict all the correct labels
Entity extraction: Identify entities within your text items
Translation: Convert text from source language to target language

Answer 13

Upload the product catalog (product, photos, and other metadata).
Define “user events” (what the customer clicks, views, and buys).
Recommendations AI uses this data to create models for giving recommendations.

Answer 14

Trained on reference images of products in your catalog, which can then be searched using an image.

Answer 15

Others you may like (product page, customer behavior and product relevance, click-through rate)
Frequently bought together (checkout page, shopping cart items, revenue per order)
Recommended for you (home page, user viewing history, click-through rate)
Similar items (product page, product catalog, click-through rate)

Answer 16

A Document AI processor is an interface that performs general processing, specialized processing (procurement, identity, lending, and contract documents), and custom processing (you provide your own labeled set of documents).
Document AI Warehouse is a platform to store, search, organize, govern, and analyze documents along with their structured metadata.

Answer 17

Detect document quality
Deskew
Extract text and layout information
Identify and extract key/value pairs
Extract and normalize entities
Split and classify documents
Review documents (human in the loop)
Store, search, and organize documents (Document AI Warehouse)

Answer 18

Dialogflow is a conversational AI offering from Google Cloud that provides chatbots and voicebots.

Answer 19

Agent Assist can support agents by identifying intent, providing ready‐to‐send responses and answers from a centralized knowledge base, and transcribing calls in real time.

Answer 20

Use natural language processing to identify call drivers (the reasons customers call)
Measure sentiment to help leadership understand the call center operations

Answer 21

Support multichannel communications between customers and agents.

Answer 22

A GPU loads a block of memory and applies an operation to it using thousands of ALUs in parallel, which makes the computation faster.

Answer 23

A2 and N1 Machine series

Answer 24

NVIDIA_TESLA_A100

Answer 25

Type of GPU: machineSpec.acceleratorType field in WorkerPoolSpec
The number of GPUs: machineSpec.acceleratorCount field in VM (worker pool)
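As a rough sketch of where these two fields sit, the snippet below builds one worker pool entry in the camelCase JSON shape used by the Vertex AI custom job REST resource. The helper name, machine type, replica count, and container image URI are placeholder assumptions, not values from the cards.

```python
import json

def make_worker_pool_spec(machine_type, accelerator_type, accelerator_count, image_uri):
    """Build one worker pool entry with a GPU type and GPU count."""
    return {
        "machineSpec": {
            "machineType": machine_type,
            "acceleratorType": accelerator_type,    # type of GPU
            "acceleratorCount": accelerator_count,  # number of GPUs per VM
        },
        "replicaCount": 1,
        "containerSpec": {"imageUri": image_uri},
    }

# Hypothetical values for illustration only.
spec = make_worker_pool_spec(
    "n1-standard-8",
    "NVIDIA_TESLA_T4",
    2,
    "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
)
print(json.dumps(spec, indent=2))
```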

Answer 26

Not all types of GPUs are available in all regions.
Only certain GPU counts are valid: for example, you can use two or four NVIDIA_TESLA_T4 GPUs on a VM, but not three.
The machine type must have sufficient virtual CPUs and memory for the GPU configuration attached to it.
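The GPU-count restriction can be checked before submitting a job. The sketch below is a hypothetical helper with a hard-coded, illustrative table containing only the T4 counts mentioned above; the real limits also depend on machine type and region, so the table is an assumption to verify against the current documentation.

```python
# Illustrative table only (an assumption for this sketch), not a complete list.
ALLOWED_GPU_COUNTS = {
    "NVIDIA_TESLA_T4": {1, 2, 4},
}

def is_valid_gpu_count(accelerator_type, count):
    """Return True if the GPU count is in the allowed set for this GPU type."""
    allowed = ALLOWED_GPU_COUNTS.get(accelerator_type)
    if allowed is None:
        raise KeyError(f"No entry for accelerator type {accelerator_type!r}")
    return count in allowed

print(is_valid_gpu_count("NVIDIA_TESLA_T4", 2))
print(is_valid_gpu_count("NVIDIA_TESLA_T4", 3))
```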

Answer 27

Each TPU has multiple matrix multiply units (MXUs). Each MXU has 128 × 128 multiply/accumulators, so it can perform about 16,000 (128 × 128 = 16,384) multiply‐accumulate operations in each cycle using the bfloat16 number format.
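The arithmetic behind the MXU figure, plus a rough picture of what bfloat16 keeps from a 32-bit float, can be sketched as follows. The helper truncates to the top 16 bits for simplicity, which is an assumption of this example; real conversions typically round instead.

```python
import struct

MXU_ROWS = 128
MXU_COLS = 128
macs_per_cycle = MXU_ROWS * MXU_COLS  # one multiply-accumulate per cell

def truncate_to_bfloat16(x):
    """Keep the sign, 8 exponent bits, and top 7 mantissa bits of a float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= 0xFFFF0000  # zero the low 16 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(macs_per_cycle)
print(truncate_to_bfloat16(3.14159))
```

bfloat16 keeps the full float32 exponent range and gives up mantissa precision, which is why high-precision workloads are a poor fit for TPUs.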

Answer 28

A single TPU device
A TPU Pod (a group of TPU devices connected by high‐speed interconnects)
A TPU slice (a subdivision of a TPU Pod)
A TPU VM

Answer 29

Rapid prototyping that needs flexibility
Models that train fast
Small models that work with small batch sizes
Models with custom TensorFlow operations written in C++
Models limited by available I/O or the networking bandwidth of the host

Answer 30

Models for which source code does not exist or is too tedious to change
Models with a significant number of custom TensorFlow operations, so they need to run at least partially on a CPU
Models with TensorFlow ops that are not available on TPUs
Medium‐to‐large models with medium‐sized batches

Answer 31

Models that have a majority of matrix computations
Models that have no custom TensorFlow operations
Models that train for weeks or months
Large and very large models with very large effective batch sizes

Answer 32

Not well suited for TPUs:
Programs that require frequent branching (conditionals) or are dominated by element‐wise algebra
Sparse data (data that has a lot of zeros)
Workloads that require high‐precision arithmetic
Deep neural networks that contain custom TensorFlow operations written in C++, especially if the custom operations are in the main training loop

Answer 33

The main bottleneck when using TPUs is the data transfer between the Cloud TPU and host memory.

Answer 34

Online (real-time) and batch (results within a reasonable time).

Answer 35

Scaling behavior and machine type

Answer 36

CPU, GPU and memory

Answer 37

GPUs can be used only with a TensorFlow SavedModel or a custom container designed to take advantage of GPUs; they cannot be used with scikit‐learn or XGBoost models.
GPUs are not available in some regions.
You can use only one type of GPU for a DeployedModel resource or BatchPredictionJob.
The number of GPUs you can add is limited by the machine type.

Answer 38

Deploy the container as a Docker container directly to a Compute Engine instance.
Benchmark the instance by sending prediction requests until it reaches 90+ percent CPU utilization.
Determine the queries per second (QPS) cost per hour of the different machine types.
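The last step is simple arithmetic once each machine type has a benchmarked QPS ceiling. A minimal sketch, assuming made-up machine names, QPS figures, and hourly prices purely for illustration:

```python
def cost_per_million_predictions(max_qps, hourly_price_usd):
    """Cost of one million predictions at the benchmarked QPS ceiling."""
    predictions_per_hour = max_qps * 3600
    return hourly_price_usd / predictions_per_hour * 1_000_000

# Hypothetical benchmark results: name -> (max QPS, price per hour in USD).
benchmarks = {
    "machine-a": (100, 0.36),
    "machine-b": (250, 0.72),
}

costs = {
    name: cost_per_million_predictions(qps, price)
    for name, (qps, price) in benchmarks.items()
}
cheapest = min(costs, key=costs.get)
print(costs)
print(cheapest)
```

A machine with a higher hourly price can still be the cheaper choice if it sustains proportionally more queries per second.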

Answer 39

The Google‐designed Edge TPU coprocessor accelerates ML inference on edge devices, which usually have limited bandwidth and may operate offline. A single Edge TPU can perform 4 trillion operations per second (4 TOPS) on just 2 watts of power (2 TOPS per watt). It is sold under the Coral brand (coral.ai).

Answer 40

You can train your ML model on Google Cloud (AutoML or a custom model) and deploy the model into your Android or iOS app. Prediction happens on the device, which gives low response times and enables offline prediction.

(64 cards)