Bigtable Flashcards
When would you use Cloud SQL vs Datastore vs Bigtable?
- Cloud SQL for a relational database with tables, views, indices, stored procedures and joins. Reads and writes data. MySQL limitations apply.
Size, structure, analysis and interface determine which of the two NoSQL options to use:
- Datastore for a few GB of data, even growing to 1 TB. Structured storage representing objects, document database.
- Bigtable for huge sets of data (starting at 1 TB).
- Bigtable for analytics at scale while database is still taking requests. Run MapReduce on production data without copying it. Integration with other Big Data tools, like Hadoop.
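The rules of thumb above can be sketched as a small decision helper. This is only an illustration of the flashcard, not an official sizing guide: the function name `suggest_store`, its parameters, and the choice to send exactly 1 TB to Bigtable are assumptions made for the example.

```python
def suggest_store(relational, size_tb, analytics_at_scale=False):
    """Suggest a storage product from the rules of thumb on this card.

    relational         -- True if the data needs tables, views, joins, etc.
    size_tb            -- expected data size in terabytes
    analytics_at_scale -- True if analytics must run on live production data
    """
    if relational:
        return "Cloud SQL"
    # Assumption: the 1 TB boundary itself is treated as "huge".
    if size_tb >= 1 or analytics_at_scale:
        return "Bigtable"
    return "Datastore"


print(suggest_store(relational=True, size_tb=0.1))
print(suggest_store(relational=False, size_tb=0.05))
print(suggest_store(relational=False, size_tb=5))
```

Real decisions also depend on the interface and query patterns an application needs, which this sketch ignores.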
What is Bigtable?
Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, allowing you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key. Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It supports high read and write throughput at low latency, and it is an ideal data source for MapReduce operations.
- It integrates with the existing Apache ecosystem of open-source Big Data software.
What are some advantages of using Bigtable over self-managed HBase?
1) Scalability: scales in direct proportion to the number of machines in your cluster.
2) Simple administration: updates and restarts transparently with high data durability. No managing of masters, regions, clusters or nodes.
3) Cluster resizing without downtime: takes minutes, dynamically balances performance.
What is Bigtable good for?
Cloud Bigtable is ideal for applications that need very high throughput and scalability for non-structured key/value data, where each value is typically no larger than 10 MB. Cloud Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications.
Types of data it’s good for:
- Marketing data: purchase histories, customer preferences
- Financial data: transaction histories, stock prices, exchange rates
- IoT data: usage reports from meters and home appliances
- Time-series data: CPU and memory usage over time
What is the Bigtable storage model?
Cloud Bigtable stores data in massively scalable tables, each of which is a sorted key/value map. The table is composed of rows, each of which typically describes a single entity, and columns, which contain individual values for each row. Each row is indexed by a single row key, and columns that are related to one another are typically grouped together into a column family. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the column family.
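To make the storage model concrete, here is a toy in-memory sketch in Python. It is not the Cloud Bigtable client API. The class name `SketchTable` and its methods are hypothetical and only mimic the ideas on this card: rows sorted by row key, cells addressed by column family plus column qualifier, and sparse storage where missing cells take no space.

```python
from bisect import bisect_left, insort


class SketchTable:
    """Toy model of a sorted key/value table (hypothetical, for study only)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {(family, qualifier): value}

    def put(self, row_key, family, qualifier, value):
        if row_key not in self._rows:
            insort(self._keys, row_key)  # keep the key list sorted
            self._rows[row_key] = {}
        # A column is identified by (column family, column qualifier).
        self._rows[row_key][(family, qualifier)] = value

    def get(self, row_key, family, qualifier):
        # Sparse table: a cell that was never written is simply absent.
        return self._rows.get(row_key, {}).get((family, qualifier))

    def scan_prefix(self, prefix):
        # Rows that share a key prefix are adjacent because keys are sorted.
        start = bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


table = SketchTable()
table.put("com.example.shop", "stats", "visits", 42)
table.put("org.sample.www", "meta", "owner", "alice")
table.put("com.example.blog", "stats", "visits", 7)

print(table.get("com.example.shop", "stats", "visits"))
print([key for key, _ in table.scan_prefix("com.example.")])
```

Because lookups and scans go through the sorted row key only, the choice of row key determines which rows sit next to each other.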
What are some good vs bad choices for the Row Key in Bigtable?
Good:
- reversed domain names like com.company.product
- string identifiers and identifier combos with a timestamp
Avoid:
- domain names in their usual (non-reversed) order, like product.company.com
- sequential numeric IDs, like userId. Using a ‘reversed’ user ID spreads the load more evenly, since using the numeric IDs in order pushes more traffic to the most recently added, or most active, users
- static, repeatedly updated identifiers, i.e. rows that update very frequently (every second, etc.)
- hashed values; use human-readable values instead
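A minimal sketch of how such row keys might be built, in Python. The helper names, the `#` separator, and the zero-padding width are assumptions chosen for the example and do not come from any Bigtable library.

```python
def reverse_domain(domain):
    """Turn 'product.company.com' into 'com.company.product'."""
    return ".".join(reversed(domain.split(".")))


def reversed_user_id(user_id, width=10):
    """Zero-pad a numeric ID and reverse its digits.

    Consecutive IDs then differ in their first character, so new users
    are spread across the key space instead of piling up at the end.
    """
    return str(user_id).zfill(width)[::-1]


def row_key(identifier, timestamp):
    """Combine a string identifier with a timestamp ('#' is an assumed separator)."""
    return f"{identifier}#{timestamp}"


print(reverse_domain("product.company.com"))         # com.company.product
print(reversed_user_id(1001, width=6))               # 100100
print(reversed_user_id(1002, width=6))               # 200100
print(row_key("com.company.product", 1700000000))    # com.company.product#1700000000
```

The identifier comes first in the combined key, so all rows for one entity stay contiguous and can be read with a single prefix scan.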