Databases & Analytics Flashcards
What are the four types of Cloud Storage class and what are their detailed cost structures?
Standard
Nearline
Coldline
Archive
What is strong transactional consistency and which databases on GCP offer strong transactional consistency?
Strong transactional consistency means when changes are made or updated to a DB, the changes take place across all replicas / shards immediately.
Cloud Spanner and Cloud Firestore are two examples of database services with strong transactional consistency.
It is a requirement of your application that even if the database is distributed across multiple nodes, writes to the database need to be replicated to all nodes before any reads to the data are allowed. What is the name of this concept?
Strong transactional consistency.
You need a fully managed data warehouse for analytics datasets with millions of rows, but your analytics team must be able to query it using standard SQL statements. Which GCP product should you choose?
BigQuery
Which Cloud Storage class would you choose for backups that need to be kept for at least 90 days and will only be accessed in a disaster recovery scenario?
Coldline storage
Your company is streaming real-time sensor data into Bigtable. You will be required to run searches of the data based on the “SensorID” of the sensor and a time window. Bearing in mind that Bigtable sorts its rows lexicographically, what would be a sensible row key design?
a) SenorID#TimeStamp
b) TimeStamp#SensorID
c) TimeStamp
d) SensorID
a) SenorID#TimeStamp
You need to store collections of JSON documents in a managed database service. Which GCP product should you choose?
Cloud Firestore
You need to store millions of rows of wide-column NoSQL time-series data in a managed database service. Which GCP product should you choose?
Cloud Bigtable
Your team has developed a mobile web application where global users vote on popular topics. For each topic, you expect a very high volume of votes during each individual 30-minute voting window. You need to capture and count all votes within 24 hours and then store the votes for future analysis and reporting. What should you do?
A. Save the votes to Memorystore, and use Cloud Functions to insert the data into BigQuery. Display the results in Google Data Studio or Looker.
B. Publish the votes to Pub/Sub, and use a Datafow pipeline to insert the data into BigQuery. Display the results in Google Data Studio or Looker.
C. Publish the votes to Pub/Sub, and use Cloud Functions to insert the data into Cloud Storage. Display the results in Google Data Studio or Looker.
D. Use Firebase to authenticate the mobile users, and publish the data directly to the database. Export the data to a CSV file, and import it into Sheets for reporting.
B. Correct. Pub/Sub supports the ingestion of millions of records per second and guarantees the delivery of the messages. BigQuery should be used for analysis.
Your company has a successful multi-player game that has become popular in the US. Now, it wants to expand to other regions. It is launching a new feature that allows users to trade points. This feature will work for users across the globe. Your company’s current MySQL backend is reaching the limit of the Compute Engine instance that hosts the game. Your company wants to migrate to a different database that will provide global consistency and high availability across the regions. Which database should they choose?
Cloud Spanner
You are creating a new web application for a global audience. You need to choose a database service specifically for storing user sessions. Your users may connect from any location in the world, and transactions should be strongly consistent. As this is a new application, you would like to keep costs down where possible. Which database should you choose?
Cloud Firestore
Your company plans to expand their analytics use cases. One of the new use cases requires your data analysts to analyze events using SQL on a near real–time basis. You expect rapid growth and want to use managed services as much as possible. What should you do?
Create a Pub/Sub topic and a subscription. Stream your events from the source into the Pub/Sub topic. Leverage Dataflow to ingest these events into BigQuery for SQL analysis.
You have two tables in Cloud SQL with identical columns that you need to quickly combine into a single table, removing duplicate rows from the result set. What should you do?
A. Use the JOIN operator in SQL to combine the tables.
B. Use nested WITH statements to combine the tables.
C. Use the UNION operator in SQL to combine the tables.
D. Query the tables from a Linux shell, combine the results into a single CSV, and re-import the rows into the database. Use the UNION ALL operator in SQL to combine the tables.
C is correct because the UNION operator combines result sets while removing duplicates.
You are building a storage layer for an analytics Hadoop cluster in a region for your company. This cluster will run multiple jobs on a nightly basis, and you need to access the data frequently. You want to use Cloud Storage for this purpose. What is the most cost effective option?
A. Regional Coldline storage
B. Regional Nearline storage
C. Regional Standard storage
D. Multi-regional Standard storage
C. Regional Standard storage
You have a data warehouse built on BigQuery that contains a table with array fields. To analyze the data for a specific use case using Standard SQL, you need to read all elements from the array and write them with all other non-array fields in a table. You don’t want to lose any records if they don’t match records in the array fields. What should you do?
A. Perform SELECT * FROM tablename.
B. Perform UNNEST and JOIN with the table to get these results.
C. Perform UNNEST and INNER JOIN with the table to get these results.
D. Perform UNNEST and CROSS JOIN with the table to get these results.
D is correct because it does not lose records when the join is performed.