5. Architecting ML Solutions Flashcards
What are the ML pipeline steps?
Data collection
Data transformation
Model training
Model tuning
Model deployment
Model monitoring
What GCP services are available for the different steps in an ML workflow?
Data collection: Google Cloud Storage, Pub/Sub (streaming data), BigQuery
Data transform: Dataflow
Model training: Vertex AI Training (custom models) and Vertex AI AutoML
Model tuning and experiment tracking: Vertex AI hyperparameter tuning and Vertex AI Experiments
Model deployment and monitoring: Vertex AI Prediction and Vertex AI Model Monitoring
Orchestration and CI/CD: Vertex AI Pipelines
Explanations and responsible AI: Vertex Explainable AI, model cards
Hints: Curious Turtle Takes Trip, Explores Ocean Depths
What are the three layers of Google Cloud ML services and solutions?
Top: Software as a Service (Document AI, Contact Center AI, and Enterprise Translation Hub); no code required.
Middle: Vertex AI pretrained APIs (sight, language, conversation, and structured data), which are serverless and scalable, plus AutoML and Workbench.
Bottom: Compute instances and containers (GKE) with a choice of TPUs, GPUs, and storage. You manage the infrastructure yourself for scalability and reliability.
When to use BigQuery ML?
Your dataset is tabular
You want to build and run models using SQL
The model type you need is available in BigQuery ML
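To make the "use SQL" point concrete, here is a minimal sketch of how a BigQuery ML training statement might be assembled in Python. The dataset, table, model, and column names are hypothetical placeholders, and `logistic_reg` is just one of the model types BigQuery ML offers. The function only builds the SQL text; submitting it would need a BigQuery client and credentials, which are not shown.

```python
def build_create_model_sql(model_name, source_table, label_col,
                           model_type="logistic_reg"):
    """Return a BigQuery ML CREATE MODEL statement as a string.

    All identifiers passed in are hypothetical placeholders.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label_col}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


sql = build_create_model_sql(
    model_name="my_dataset.churn_model",   # hypothetical model name
    source_table="my_dataset.customers",   # hypothetical training table
    label_col="churned",                   # hypothetical label column
)
print(sql)
```

The whole training job is expressed as one SQL statement over a table, which is why BigQuery ML fits tabular data that already lives in BigQuery.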
What data types does AutoML support, and what kinds of jobs can it do?
Text, video, image, and tabular datasets
Supports classification, object detection, sentiment analysis, and translation
When to use custom-trained models?
When your use case doesn’t fit BigQuery ML or AutoML, or when your models already exist on another platform.
What data stores are available in GCP?
Google Cloud Storage
BigQuery
Vertex AI’s datasets to manage training and annotation sets
Vertex AI Feature Store
NoSQL data store
Hints: Cats Bounce Very Frequently Nearby
What can Google Cloud Storage store?
Image, video, audio, and unstructured data
For faster queries on tabular data in BigQuery, should you store it in a view or a table?
Table
What tools can you use to access BigQuery functionality?
Google Cloud console
BigQuery command-line tool (bq)
BigQuery REST API
Vertex AI Jupyter notebooks, using the BigQuery magic or the BigQuery Python client
What are the four types of data supported by Vertex AI managed datasets?
Image, video, tabular (CSV, BigQuery tables), and text
What Google Cloud tools can be used to read BigQuery data?
tf.data.Dataset reader for BigQuery and tfio.bigquery.BigQueryClient() for TensorFlow or Keras
BigQuery client for TFX
BigQuery I/O connector for Dataflow
BigQuery Python Client library for reading from any other framework
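As a rough sketch of the last option, the helper below reads query results into plain Python dicts that any framework can consume. It assumes a client object shaped like `google.cloud.bigquery.Client`, where `client.query(sql)` returns a job and `job.result()` yields rows exposing `.items()`. The `FakeClient` and `FakeJob` classes and the table name are hypothetical stand-ins so the example runs offline without credentials.

```python
def fetch_rows(client, sql):
    """Run a query and return the result rows as a list of dicts."""
    job = client.query(sql)
    return [dict(row.items()) for row in job.result()]


class FakeJob:
    """Hypothetical stand-in for a BigQuery query job."""

    def __init__(self, rows):
        self._rows = rows

    def result(self):
        return iter(self._rows)


class FakeClient:
    """Hypothetical stand-in for google.cloud.bigquery.Client."""

    def query(self, sql):
        # Canned rows; a real client would execute the SQL in BigQuery.
        return FakeJob([{"species": "cat", "n": 3}, {"species": "dog", "n": 5}])


rows = fetch_rows(
    FakeClient(),
    "SELECT species, COUNT(*) AS n FROM `my_dataset.pets` GROUP BY species",
)
print(rows)
```

With the real library installed and credentials configured, passing `bigquery.Client()` (from `google.cloud import bigquery`) in place of `FakeClient()` should work the same way.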
What are the advantages of using managed datasets?
Manage datasets in a central location (structured and unstructured data)
Easy to track lineage to models for governance and iterative development.
Compare model performance by training AutoML and custom models using the same datasets.
Generate data statistics and visualizations.
Automatically split data into training, test, and validation sets.
Hints: Cats In Parks Sleep Soundly
In what situations would you not want to use managed datasets?
You want more control over splitting your data in your training code
Lineage between your data and model isn’t critical to your application.
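One way to control the split in your own training code is a deterministic hash-based assignment, sketched below. The 80/10/10 proportions, the `assign_split` helper, and the `row-N` keys are illustrative assumptions, not something managed datasets or Vertex AI prescribe.

```python
import hashlib


def assign_split(key, train=0.8, validation=0.1):
    """Deterministically map a row key to 'train', 'validation', or 'test'."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as a uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < train:
        return "train"
    if bucket < train + validation:
        return "validation"
    return "test"


# Hypothetical row identifiers; in practice use a stable unique key per row.
row_ids = [f"row-{i}" for i in range(1000)]
splits = {}
for row_id in row_ids:
    splits.setdefault(assign_split(row_id), []).append(row_id)

print({name: len(ids) for name, ids in splits.items()})
```

Because the assignment depends only on the key, a given row lands in the same split on every run, even as new rows are added.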
What is Vertex AI Feature Store?
Vertex AI Feature Store is a fully managed centralized repository for organizing, storing, and serving ML features.