Data Preprocessing Flashcards
What are other ways to evaluate an AutoML model?
Precision, recall, and the confusion matrix. Use the precision-recall curve to choose a score threshold.
What is the difference between Colab Enterprise and Vertex AI Workbench?
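Example: a minimal sketch of how moving the score threshold trades precision against recall. The labels and scores are made-up illustration data, and the helper names `confusion_counts` and `precision_recall` are hypothetical, not part of any AutoML API.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, fn, tn), treating score >= threshold as a positive prediction."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(labels, scores, threshold):
    """Return (precision, recall) at the given threshold."""
    tp, fp, fn, _ = confusion_counts(labels, scores, threshold)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up ground-truth labels (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

# Each threshold is one point on the precision-recall curve.
for threshold in (0.25, 0.50, 0.75):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.50 to 0.75 removes the false positive but misses one more true positive, which is the trade-off the precision-recall curve makes visible.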
Colab Enterprise: A collaborative, managed notebook environment with security and compliance capabilities of Google Cloud. Choose this if your project’s priorities are collaboration and avoiding infrastructure management.
Vertex AI Workbench: A Jupyter notebook-based environment provided through VM instances supporting the entire data science workflow. Choose this if your project’s priorities are control and customizability.
What platforms or features does Vertex AI Workbench support?
Importing conda environments, accessing data in Cloud Storage or BigQuery, automated notebook runs, idle shutdown, custom containers, third-party credentials, instance monitoring, and full control over the infrastructure.
What is Memorystore?
Fully managed Redis and Memcached for sub-millisecond data access.
What is Firestore?
Highly scalable document database service for mobile, web, and server development that offers rich, fast queries and up to 99.999% availability. Stores data as documents.
What is Bigtable?
Highly performant, fully managed NoSQL database service for large analytical and operational workloads. Stores data as key-value pairs in wide-column tables, offers an HBase-compatible API, and integrates with Hadoop and Spark.
What is Cloud SQL?
Fully managed MySQL, PostgreSQL, and SQL Server. Simplifies migrations to Cloud SQL from MySQL, PostgreSQL, and Oracle databases with Database Migration Service.
What is Spanner?
Cloud-native relational database with unlimited scale, global consistency, and up to 99.999% availability. Combines structured (relational) storage with the horizontal scalability usually associated with non-relational databases. Use cases include gaming, retail, global financial ledgers, and supply chain or inventory management.
What is BigQuery?
Serverless, highly scalable, and cost-effective multicloud data warehouse designed for business agility, offering up to 99.99% availability.
Why Cloud SQL over BigQuery?
Cloud SQL is a relational database for low-latency transactional operations (many small reads and writes), while BigQuery is an analytics data warehouse for querying large datasets and generating reports (read-heavy scans and aggregations).
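Example: a rough sketch of the two workload shapes. Python's built-in `sqlite3` is used only as a local stand-in, so this shows the access patterns and not the Cloud SQL or BigQuery APIs. The `orders` table and both function names are hypothetical.

```python
import sqlite3

# Local in-memory database standing in for a real service.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)


def place_order(conn, customer, amount):
    """Transactional shape (Cloud SQL style): one small write, committed atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )
    return cursor.lastrowid


def revenue_by_customer(conn):
    """Analytical shape (BigQuery style): scan the whole table and aggregate."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)


# Many small writes arrive one at a time...
for customer, amount in [("alice", 20.0), ("bob", 5.0), ("alice", 10.0)]:
    place_order(conn, customer, amount)

# ...while a report reads everything at once.
report = revenue_by_customer(conn)
print(report)
```

The first function touches a single row and needs to return quickly. The second reads every row and only needs to run occasionally, which is the kind of query a data warehouse is built for.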
What is Datastream?
Captures and replicates data from MySQL, PostgreSQL, AlloyDB, SQL Server, and Oracle databases into Google Cloud services. Serverless, so there are no instances to manage.
In Data Preprocessing, when should you use BigQuery?
When handling tabular or structured data.
In Data Preprocessing, when should you use Dataflow?
When handling unstructured data.
In Data Preprocessing, when should you use TensorFlow Extended?
When you want to use the TensorFlow ecosystem.
What would you use Cloud Storage for?
For storing images, videos, audio, and other unstructured data as objects in buckets.