Chapter 8: Storage, Databases, and Data Analytics Flashcards

Question 1

Q

You’re an architect for Memegen, a global meme generating company, designing a real-time analytics platform that is intended to stream millions of events per second. Their platform needs to dynamically scale up and down, process incoming data on the fly, and process data that arrives late because of slow mobile networks. They also want to run SQL queries to access 10TB of historical data. They’d like to stick to managed services only. What services would you leverage?

Cloud SQL, Cloud Pub/Sub, Kubernetes
Cloud Functions, Cloud Dataproc, Cloud Bigtable
Cloud Dataflow, Cloud Storage, Cloud Pub/Sub, BigQuery
Cloud Dataproc, BigQuery, Google Compute Engine

Answer

A

* Cloud Dataflow, Cloud Storage, Cloud Pub/Sub, BigQuery

The requirements are dynamic scaling, processing streaming data on the fly, handling data that arrives late, and running SQL queries over a large volume of historical data, all with managed services. Pub/Sub ingests the events, Dataflow autoscales and processes both the stream and the late-arriving data, Cloud Storage holds the data, and BigQuery provides SQL over the historical data. This is the only option that meets all of these requirements.
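
The late-data requirement can be pictured with a small, library-free sketch. This is not the Apache Beam or Dataflow API; the window size, the allowed-lateness value, and the function names are hypothetical and chosen only to illustrate the idea. Events are grouped by the time they happened (event time), and an event that arrives after its window has ended is still counted as long as it arrives within a grace period.

```python
from collections import defaultdict

WINDOW_SECONDS = 60       # hypothetical fixed-window size
ALLOWED_LATENESS = 120    # hypothetical grace period after a window ends


def window_start(event_time: int) -> int:
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def count_events(events):
    """Count events per event-time window.

    events is an iterable of (event_time, arrival_time) pairs in seconds.
    An event is dropped only if it arrives after its window's end plus
    the allowed lateness.
    """
    counts = defaultdict(int)
    dropped = 0
    for event_time, arrival_time in events:
        start = window_start(event_time)
        deadline = start + WINDOW_SECONDS + ALLOWED_LATENESS
        if arrival_time <= deadline:
            counts[start] += 1
        else:
            dropped += 1
    return dict(counts), dropped


# The second event happened at t=30 but arrived at t=150 (slow network);
# it is late, yet still within the grace period for window [0, 60).
events = [(5, 6), (30, 150), (59, 400), (61, 62)]
counts, dropped = count_events(events)
print(counts, dropped)  # {0: 2, 60: 1} 1
```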

Question 2

Q

You need a solution to analyze your data stream and optimize your operations. Your data stream involves both batch and stream processing. Your team wants to leverage a serverless solution. What should you use?

Cloud Dataflow
Cloud Dataproc
Kubernetes with BigQuery
Compute Engine with BigQuery

Answer

A

* Cloud Dataflow

Dataflow is a serverless solution that can be leveraged for both batch and stream processing. Dataproc is not fully serverless.

Question 3

Q

Your team is running many Apache Spark and Hadoop jobs in their on-premises environment and would like to migrate to the cloud with the least amount of change to their tooling. What should they use?

Cloud Dataflow
Compute Engine with a Dataflow Connector
Kubernetes Engine with a Dataflow Connector
Cloud Dataproc

Answer

A

* Cloud Dataproc

Dataproc is a managed service designed for Spark and Hadoop workloads, so existing jobs can move over with minimal changes to tooling.

Question 4

Q

You need to develop a solution that will process data from one of your organization’s APIs in strict chronological order with no repeated data. How would you build this solution?

Cloud Dataflow
Cloud Pub/Sub to a Cloud SQL backend
Cloud Pub/Sub to a Stackdriver backend
Cloud Pub/Sub

Answer

A

* Cloud Pub/Sub to a Cloud SQL backend

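Pub/Sub can deliver messages in order when ordering keys are used, but its default delivery is at-least-once, so the same message can arrive more than once. To process the data in strict chronological order with no repeats, the content needs to be stored in an ACID-based system such as Cloud SQL, which can enforce uniqueness and return rows in order.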
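
As a rough illustration of why a transactional backend helps, the sketch below uses Python's built-in sqlite3 module as a local stand-in for Cloud SQL. The table name, column names, and message IDs are hypothetical. A primary key on the message ID discards redelivered messages, and an ORDER BY on the event timestamp returns rows chronologically even if they were stored out of order. `INSERT OR IGNORE` is SQLite syntax; MySQL and PostgreSQL spell the same idea differently.

```python
import sqlite3

# In-memory SQLite database standing in for a Cloud SQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        message_id TEXT PRIMARY KEY,   -- rejects duplicate deliveries
        event_time INTEGER NOT NULL,   -- used for chronological reads
        payload    TEXT NOT NULL
    )
    """
)


def store(message_id: str, event_time: int, payload: str) -> None:
    """Insert a message once; a redelivery with the same ID is ignored."""
    with conn:  # each insert runs in its own transaction
        conn.execute(
            "INSERT OR IGNORE INTO events VALUES (?, ?, ?)",
            (message_id, event_time, payload),
        )


def read_in_order():
    """Return payloads sorted by event time."""
    rows = conn.execute(
        "SELECT payload FROM events ORDER BY event_time, message_id"
    )
    return [row[0] for row in rows]


store("m2", 20, "second")
store("m1", 10, "first")    # arrives out of order
store("m2", 20, "second")   # duplicate delivery, ignored
store("m3", 30, "third")
ordered = read_in_order()
print(ordered)  # ['first', 'second', 'third']
```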

Question 5

Q

Memegen just got breached, and the Security Operations team is kicking off their incident response process. They’re investigating a production VM and want to copy the VM as evidence in a secure location so they can conduct their forensics before taking an action. What should they do?

Create a snapshot of the root disk, create a restricted GCS bucket that is accessible only by the forensics team, and create an image file in GCS from the snapshot.
Shut down the VM, create a snapshot, create an image file in GCS, and restrict the GCS bucket.
Use the gcloud copy tool to copy the file directory onto an attached Cloud Filestore network file system.
Create a clone of the VM, migrate user traffic onto the new VM, and use the old VM for forensics.

Answer

A

* Create a snapshot of the root disk, create a restricted GCS bucket that is accessible only by the forensics team, and create an image file in GCS from the snapshot.

This is the only valid solution here. They’re looking to investigate a production VM, so taking the server down is not a recommended action at this point. They also want to conduct forensics in a secure location to ensure the evidence is not tampered with.

Question 6

Q

You’re planning on migrating 5 petabytes of data to your project. This data requires 24/7 availability, and your data analyst team is familiar with SQL. What tool should you use to surface this data to your analyst team for analytical purposes?

Cloud Datastore
Cloud SQL
Cloud Spanner
BigQuery

Answer

A

* BigQuery

There are a few indicators here as to why BigQuery is the right answer: large-scale migration, requirement to use SQL, and an analytical use case.

Question 7

Q

You’re consulting for an IoT company that has hundreds of thousands of IoT sensors that capture readings every two seconds. You’d like to optimize the performance of this database, so you’re looking to identify a more accurate, time-series database solution. What would you use?

Cloud Bigtable
Cloud Storage
BigQuery
Cloud Filestore

Answer

A

* Cloud Bigtable

The dead giveaway here is leveraging a time-series database for IoT sensors. This is where Bigtable shines.

Question 8

Q

You have a customer who wants to store data for at least ten years that will be accessed infrequently, at most once a year. The customer wants to optimize their cost. What solution should they use?

Google Cloud Storage
Google Cloud Storage with a Nearline storage class
Google Cloud Storage with a Coldline storage class
Google Cloud Storage with an Archive storage class

Answer

A

* Google Cloud Storage with an Archive storage class

The Archive storage class is sufficient and the most cost-effective choice here because the data is accessed infrequently, at most once a year, and is kept for at least ten years, well beyond Archive's 365-day minimum storage duration.
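
The reasoning can be sketched as a small rule-of-thumb helper. The function name and the access-frequency thresholds are assumptions made for illustration (roughly monthly, quarterly, and yearly access); the minimum storage durations of 30, 90, and 365 days are the documented values for Nearline, Coldline, and Archive. Real pricing decisions should also weigh retrieval and operation costs.

```python
def suggest_storage_class(accesses_per_year: float, retention_days: int) -> str:
    """Suggest a Cloud Storage class from access frequency and retention.

    Each entry is (class name, minimum storage duration in days,
    assumed maximum accesses per year). The access thresholds are
    illustrative rules of thumb, not official limits.
    """
    candidates = [
        ("ARCHIVE", 365, 1),
        ("COLDLINE", 90, 4),
        ("NEARLINE", 30, 12),
    ]
    for name, min_days, max_accesses in candidates:
        if retention_days >= min_days and accesses_per_year <= max_accesses:
            return name
    return "STANDARD"


# The flashcard scenario: kept ten years, read at most once a year.
print(suggest_storage_class(accesses_per_year=1, retention_days=3650))  # ARCHIVE
```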

Question 9

Q

BankyBank wants to build an online transactional processing tool that requires a relational database with petabyte-scale data. What tool would you use?

BigQuery
Cloud SQL
Cloud Spanner
Cloud Bigtable

Answer

A

* Cloud Spanner

Cloud Spanner is the OLTP solution that is relational and offers petabyte scalability. Cloud SQL is not designed for petabyte-scale data.

Question 10

Q

Memegen wants to introduce a shopping functionality for their users to connect all of their user purchasing history and activities to their user profiles. They need massive scalability with high performance, atomic transactions, and a highly available document database. What should they use?

Cloud Spanner
BigQuery
Cloud Bigtable
Cloud Firestore

Answer

A

* Cloud Firestore

Cloud Firestore, the successor to Cloud Datastore, is a highly available document database with atomic transactions, which makes it a great solution for profile storage and purchasing history.

Question 11

Q

When you’re taking the exam, knowing which Google Cloud storage technologies are related to file, object, and block storage may help you get to a clearer answer.

Answer

A

Be careful, though, and don’t assume a Google-managed service is always the answer. Read through each question very carefully for the requirements.

Question 12

Q

Persistent Disk

If you need to modify the size of your persistent disk, it’s as easy as increasing the size in the Cloud Console. If you need to resize your mounted file system, you can use the standard resize2fs command in Linux to do online resizing.

Answer

A

PDs are not actually physically attached to the servers that host your VMs, but they are virtually attached. You can only resize up, but not down!
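
Outside the Cloud Console, the same resize can be done with `gcloud compute disks resize`. The helper below is a minimal sketch that only builds the command line and refuses a smaller size, mirroring the grow-only rule; the function name, disk name, and zone are hypothetical, and the command is not executed here.

```python
from typing import List


def resize_disk_command(disk: str, zone: str, current_gb: int, new_gb: int) -> List[str]:
    """Build a `gcloud compute disks resize` command for a larger size.

    Raises ValueError if the requested size is not an increase, because
    persistent disks can be grown but not shrunk.
    """
    if new_gb <= current_gb:
        raise ValueError("persistent disks can only be resized up")
    return [
        "gcloud", "compute", "disks", "resize", disk,
        f"--size={new_gb}GB",
        f"--zone={zone}",
    ]


# Hypothetical disk and zone names, for illustration only.
cmd = resize_disk_command("data-disk", "us-central1-a", current_gb=100, new_gb=200)
print(" ".join(cmd))
# After the disk grows, the mounted file system is resized inside the VM,
# for example with resize2fs as noted on this card.
```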

Question 13

Q

Persistent Disk

The command to modify the auto-delete behavior of persistent disks attached to VMs is gcloud compute instances set-disk-auto-delete.

Answer

A

Auto-delete is on by default, so you will need to turn this setting off if you don’t want your PD to be deleted when the instance it is attached to is deleted.
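
A minimal sketch of how that command is put together, assuming hypothetical instance, disk, and zone names. The helper only builds the argument list; `--no-auto-delete` keeps the disk when the instance is deleted, and `--auto-delete` restores the delete-with-instance behavior.

```python
from typing import List


def auto_delete_command(instance: str, disk: str, zone: str, keep_disk: bool) -> List[str]:
    """Build a `gcloud compute instances set-disk-auto-delete` command.

    keep_disk=True turns auto-delete off so the disk survives the
    deletion of the instance it is attached to.
    """
    flag = "--no-auto-delete" if keep_disk else "--auto-delete"
    return [
        "gcloud", "compute", "instances", "set-disk-auto-delete", instance,
        f"--disk={disk}",
        f"--zone={zone}",
        flag,
    ]


# Hypothetical names, for illustration only.
cmd = auto_delete_command("web-1", "web-1-data", "us-central1-a", keep_disk=True)
print(" ".join(cmd))
```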

Question 14

Q

Local SSD

Local SSDs disappear when you stop an instance, whereas all three types of persistent disks persist when you stop an instance—hence the name, persistent disk.

Answer

A

Each Local SSD is only 375 GB, but you can attach up to 24 Local SSDs per instance. Because of their benefits and limitations, Local SSDs are a great fit for temporary storage such as caches, processing space, or low-value data.
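
The capacity arithmetic behind those two numbers, as a quick sketch. The function name is hypothetical, and the number of Local SSDs you can actually attach also depends on the machine type.

```python
LOCAL_SSD_GB = 375     # size of one Local SSD, from this card
MAX_LOCAL_SSDS = 24    # maximum per instance, from this card


def local_ssd_capacity_gb(count: int) -> int:
    """Return the total Local SSD capacity in GB for `count` devices."""
    if not 1 <= count <= MAX_LOCAL_SSDS:
        raise ValueError(f"count must be between 1 and {MAX_LOCAL_SSDS}")
    return count * LOCAL_SSD_GB


print(local_ssd_capacity_gb(MAX_LOCAL_SSDS))  # 9000 GB, i.e. 9 TB per instance
```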

(14 cards)