5. Architecting ML Solutions Flashcards by KK Cheng

What are the ML pipeline steps?

Data collection
Data transformation
Model training
Model tuning
Model deployment
Model monitoring

What GCP services are available for the different steps in an ML workflow?

Data collection: Google Cloud Storage, Pub/Sub (streaming data), BigQuery
Data transformation: Dataflow
Model training: Vertex AI Training (custom models) and Vertex AI AutoML
Model tuning and experiment tracking: Vertex AI hyperparameter tuning and Vertex AI Experiments
Model deploying and monitoring: Vertex AI Prediction and Vertex AI Model Monitoring
Orchestration and CI/CD: Vertex AI Pipelines
Explanations and responsible AI: Vertex Explainable AI, model cards

Hints: Curious Turtle Takes Trip, Explores Ocean Depths

What are the three layers of Google Cloud ML services and solutions?

Top: Software as a Service (SaaS) solutions such as Document AI, Contact Center AI, and Enterprise Translation Hub; no code required.
Middle: Vertex AI, including pretrained APIs (sight, language, conversation, and structured data), AutoML, and Workbench; serverless and scalable.
Bottom: Compute instances and containers (GKE) with a choice of TPUs, GPUs, and storage; you manage the infrastructure for scalability and reliability.

When to use BigQuery ML?

Your dataset is tabular.
You want to build and run models using SQL.
The model type you need is available in BigQuery ML.
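To illustrate the "use SQL" point, the sketch below only builds the SQL text that trains and then scores a BigQuery ML model. The dataset, table, and column names are hypothetical, and the strings are not escaped, so it assumes trusted inputs. The generated statements could be run in the BigQuery console or passed to the BigQuery Python client's Client.query method.

```python
def build_create_model_sql(model_name: str, model_type: str,
                           label_col: str, source_table: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement that trains on a table."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_col}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def build_predict_sql(model_name: str, input_table: str) -> str:
    """Build a query that scores new rows with ML.PREDICT."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model_name}`,\n"
        f"  (SELECT * FROM `{input_table}`))"
    )


# Hypothetical names, for illustration only.
train_sql = build_create_model_sql(
    "mydataset.churn_model", "logistic_reg", "churned", "mydataset.customers")
predict_sql = build_predict_sql("mydataset.churn_model", "mydataset.new_customers")
print(train_sql)
print(predict_sql)
```

Both training and prediction stay inside BigQuery, which is why this option fits tabular data that already lives there.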

What data types does AutoML support, and what kinds of tasks can it perform?

Text, video, images, and tabular data
Tasks include classification, object detection, sentiment analysis, and translation

When to use custom-trained models?

When your use case doesn’t fit BigQuery ML or AutoML, or when your models are already built on other platforms.

What data stores are available in GCP?

Google Cloud Storage
BigQuery
Vertex AI managed datasets (for managing training and annotation sets)
Vertex AI Feature Store
NoSQL data stores

Hints: Cats Bounce Very Frequently Nearby

What can Google Cloud Storage store?

Images, video, audio, and other unstructured data

Where should you store tabular data in BigQuery for better speed: a view or a table?

A table. A logical view reruns its underlying query every time it is read, so reading from a table is faster.

What interfaces can you use to work with BigQuery?

Google Cloud console
BigQuery command-line tool (bq)
BigQuery REST API
Vertex AI Workbench Jupyter notebooks, using the BigQuery magic (%%bigquery) or the BigQuery Python client

What are the four types of data supported by Vertex AI managed datasets?

Image, video, tabular (CSV, BigQuery tables), and text

What Google Cloud tools can be used to read BigQuery data?

TensorFlow or Keras: the tf.data.Dataset reader for BigQuery and tfio.bigquery.BigQueryClient()
TFX: the BigQuery client
Dataflow: the BigQuery I/O connector
Any other framework: the BigQuery Python client library

What are the advantages of using managed datasets?

Manage datasets in a central location (structured and unstructured data).
Easily track lineage to models for governance and iterative development.
Compare model performance by training AutoML and custom models using the same datasets.
Generate data statistics and visualizations.
Automatically split data into training, test, and validation sets.

Hints: Cats In Parks Sleep Soundly

In what situations might you not want to use managed datasets?

You want more control over splitting your data in your training code.
Lineage between your data and model isn’t critical to your application.
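When you split data in your own training code instead of relying on a managed dataset, one common technique is a deterministic hash-based split, sketched below. The 80/10/10 fractions and the customer IDs are illustrative assumptions, not values from the source.

```python
import hashlib


def assign_split(example_id: str, train_frac: float = 0.8, val_frac: float = 0.1) -> str:
    """Deterministically assign an example to 'train', 'validation', or 'test'.

    The same ID always lands in the same split, so existing rows keep their
    split across reruns and when new rows are appended.
    """
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    if bucket < train_frac:
        return "train"
    if bucket < train_frac + val_frac:
        return "validation"
    return "test"


# Hypothetical example IDs.
ids = [f"customer-{i}" for i in range(1000)]
splits = [assign_split(i) for i in ids]
print({name: splits.count(name) for name in ("train", "validation", "test")})
```

Hashing a stable ID (rather than calling a random number generator) keeps the split reproducible without storing a seed or a split column.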

What is Vertex AI Feature Store?

Vertex AI Feature Store is a fully managed centralized repository for organizing, storing, and serving ML features.
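The toy class below is only a conceptual sketch of what a feature store does: it stores timestamped feature values per entity and serves them by name, including a point-in-time lookup of the kind used to avoid leaking future values into training data. It is not the Vertex AI Feature Store API; the class, method, entity, and feature names are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """In-memory illustration of a feature store (not the Vertex AI API)."""

    def __init__(self):
        # (entity_id, feature_name) -> list of (timestamp, value), kept sorted.
        self._data = defaultdict(list)

    def ingest(self, entity_id, feature_name, timestamp, value):
        series = self._data[(entity_id, feature_name)]
        series.append((timestamp, value))
        series.sort(key=lambda item: item[0])

    def read(self, entity_id, feature_names, as_of=None):
        """Return each feature's latest value, or its value at or before `as_of`."""
        result = {}
        for name in feature_names:
            series = self._data.get((entity_id, name), [])
            if as_of is None:
                result[name] = series[-1][1] if series else None
            else:
                timestamps = [ts for ts, _ in series]
                idx = bisect_right(timestamps, as_of)
                result[name] = series[idx - 1][1] if idx else None
        return result


store = ToyFeatureStore()
store.ingest("user-42", "avg_order_value", timestamp=1, value=30.0)
store.ingest("user-42", "avg_order_value", timestamp=5, value=45.0)
print(store.read("user-42", ["avg_order_value"]))           # {'avg_order_value': 45.0}
print(store.read("user-42", ["avg_order_value"], as_of=3))  # {'avg_order_value': 30.0}
```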

What Google service can you use when you have unlabeled and unstructured data?

You can use the Vertex AI data labeling service to label the data in Google Cloud Storage or Vertex AI–managed datasets.

What are the three NoSQL data store options?

Memorystore: submillisecond latency; limited amounts of fast-changing data
Datastore: millisecond latency; slowly changing data; automatic scaling
Bigtable: millisecond latency; dynamically changing data; large amounts of data
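The rule of thumb on this card can be encoded as a small decision helper. This is a simplification under stated assumptions: the three boolean inputs are hypothetical, and the card does not cover a large data volume that also needs submillisecond latency, so the sketch simply falls back to Bigtable in that case.

```python
def choose_nosql_store(needs_submillisecond: bool, large_volume: bool,
                       fast_changing: bool) -> str:
    """Pick a NoSQL store using the card's rule of thumb (a simplification)."""
    # Bigtable: large amounts of dynamically changing data, millisecond latency.
    if large_volume:
        return "Bigtable"
    # Memorystore: limited, fast-changing data with submillisecond latency.
    if needs_submillisecond or fast_changing:
        return "Memorystore"
    # Datastore: slowly changing data, millisecond latency, automatic scaling.
    return "Datastore"


print(choose_nosql_store(needs_submillisecond=False, large_volume=True,
                         fast_changing=True))  # Bigtable
```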

Why should you avoid storing data in block storage such as a Network File System (NFS) or a virtual machine (VM) hard disk?

Performance is harder to manage and tune than with Google Cloud Storage or BigQuery.

Why should you avoid reading data directly from databases such as Cloud SQL?

You should store data in BigQuery, Google Cloud Storage, or a NoSQL data store for performance.

What is Kubeflow?

Kubeflow is an open source Kubernetes framework for developing and running portable ML workloads.
Kubeflow Pipelines lets you compose, orchestrate, and automate ML systems.
You can choose to deploy your Kubernetes workloads locally, on-premises, or to a cloud environment.
Kubeflow Pipelines is flexible.

What is Vertex AI Pipelines?

A managed service that automates, monitors, and governs your ML systems by orchestrating your ML workflow in a serverless manner.
Store your workflow’s artifacts using Vertex ML Metadata. You can analyze the lineage of your workflow’s artifacts (training data, hyperparameters, and code).
Vertex AI Pipelines can run pipelines built using the Kubeflow Pipelines SDK or TensorFlow Extended.

Does Vertex AI Pipelines support TensorFlow and other frameworks?

For TensorFlow, use TensorFlow Extended to define your pipeline and the operations, then execute it on Vertex AI’s serverless pipelines system.
For all other frameworks, use Kubeflow Pipelines with Vertex AI Pipelines. Use Vertex AI to launch and interact with the platform.

Does Vertex AI Pipelines support Kubeflow experiments?

Yes, Vertex AI Pipelines supports experiments.

What are the two outcomes of using Vertex AI Pipelines?

You can use pipelines regardless of the ML environment you choose.
You need a small number of nodes with modest CPU and RAM since most work will happen within a managed service.

What do Kubeflow Pipelines components support?

TensorFlow Extended-related services on Google Cloud, Dataproc for Spark ML jobs, AutoML, and other compute workloads.

Hints: Xylophones Dance At Opera

In what situations is the TensorFlow Extended (TFX) SDK recommended?

You already use TensorFlow.
You use structured and textual data.
You work with a lot of data.

What is the recommended way to build your pipeline if you use TensorFlow in an ML workflow that processes terabytes of structured data or text data?

Use TFX. TFX creates a directed acyclic graph (DAG) of your ML pipeline. It uses Apache Beam under the hood to manage and implement pipelines, so they can be executed on distributed processing backends such as Apache Spark, Google Cloud Dataflow, and Apache Flink.
Orchestrators such as Kubeflow make it easy to configure, operate, monitor, and maintain ML pipelines; you can use Kubeflow Pipelines to schedule and orchestrate your TFX pipeline. For other use cases, we recommend that you build your pipeline using the Kubeflow Pipelines SDK.
You could consider other orchestrators such as Cloud Composer, but Vertex AI Pipelines is a better choice because it includes built-in support for common ML operations and tracks ML-specific metadata and lineage. Lineage is especially important for validating that your pipelines are operating correctly in production.
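Because a pipeline is a DAG, an orchestrator can run each step only after all of its upstream steps finish. The sketch below shows that ordering with Python's standard graphlib module (Python 3.9+). It is not TFX or Kubeflow code: the step names borrow TFX's standard component names, and the dependency wiring is a simplified assumption for illustration.

```python
from graphlib import TopologicalSorter

# Simplified pipeline: each step maps to the set of upstream steps it depends on.
# Step names borrow TFX component names; the wiring is illustrative only.
PIPELINE = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "Evaluator"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return an order in which every step runs after its dependencies.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


print(execution_order(PIPELINE))
```

A real orchestrator adds what this sketch omits, such as running independent steps in parallel, retries, and recording each step's artifacts as metadata for lineage.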

What are the two ways to make online predictions?

Synchronous:
1. Use Vertex AI online prediction to deploy your model as a real-time HTTPS endpoint.
2. Use App Engine or GKE as an ML gateway that performs preprocessing before forwarding requests from client applications.
Asynchronous:
1. Push: the client is notified when the prediction is available.
2. Poll: the client periodically polls for prediction availability.
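The asynchronous poll pattern can be sketched as a loop that checks for a result until it appears or a timeout expires. The fetch callable is a hypothetical stand-in for wherever your system writes finished predictions, and the sleep function is injectable so the demo runs instantly; neither is a Vertex AI API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_for_prediction(fetch: Callable[[], Optional[T]],
                        interval_s: float = 1.0,
                        timeout_s: float = 30.0,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fetch` until it returns a result (not None) or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction was not available before the timeout")
        sleep(interval_s)


# Usage with a hypothetical fetcher that reports "not ready" twice.
responses = iter([None, None, {"label": "cat", "score": 0.93}])
prediction = poll_for_prediction(lambda: next(responses), sleep=lambda s: None)
print(prediction)
```

With the push approach, the client would instead receive a notification (for example, a Pub/Sub message) and skip the polling loop entirely.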

What are the two ways to minimize latency?

Minimize latency at the model level (e.g., use fewer layers in a DNN).
Minimize latency at the serving level (e.g., use a low-latency data store, precompute predictions, and cache them).
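Serving-level latency reduction can be sketched by combining a table of precomputed predictions with an in-process cache for everything else. The model function, the averaging it does, and the precomputed inputs are hypothetical placeholders; in production the precomputed table would typically live in a low-latency data store rather than a Python dict.

```python
from functools import lru_cache


def slow_model_predict(features: tuple) -> float:
    """Hypothetical stand-in for an expensive model call."""
    return sum(features) / len(features)


@lru_cache(maxsize=10_000)
def cached_predict(features: tuple) -> float:
    # Features must be hashable (a tuple) to act as the cache key.
    return slow_model_predict(features)


# Predictions precomputed ahead of time (e.g., by a batch job) for known inputs.
PRECOMPUTED = {key: slow_model_predict(key) for key in [(1.0, 2.0), (3.0, 5.0)]}


def serve(features: tuple) -> float:
    """Return a precomputed prediction if one exists, else a cached model call."""
    if features in PRECOMPUTED:
        return PRECOMPUTED[features]
    return cached_predict(features)


print(serve((1.0, 2.0)))  # served from the precomputed table
print(serve((2.0, 4.0)))  # computed once, then cached
print(serve((2.0, 4.0)))  # cache hit
```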
