Scaling NoSQL Databases Flashcards
What is HBase based on?
Google Big Table
In one sentence, what is HBase?
HBase is a column-oriented non-relational database management system that runs on top of Hadoop Distributed File System (HDFS).
How does HBase scale?
HBase system is designed to scale linearly
What query technology does HBase work well with?
HBase works well with Hive, a query engine for batch processing of big data, to enable fault-tolerant big data applications. This is because it comprises of a set of standard tables with rows and columns, much like a traditional database.
What is the structure of an HBase key, value object?
- HBase keys are byte arrays.
- The key is composed of a row key, column family, column qualifier, timestamp, and a delete marker.
1. The row key is the primary identifier and is used to uniquely identify a row in an HBase table.
2. The column family is a way to group related columns together.
3. The column qualifier is the specific column within the column family.
4. The timestamp is associated with each version of a cell to support versioning.
5. The delete marker is used to mark a cell for deletion. - HBase values are also byte arrays.
What is/how does an HBase Cluster work?
In an HBase cluster, data is horizontally partitioned into regions based on row keys, and each region is managed by a separate region server.
The cluster consists of multiple region servers, each responsible for serving a subset of the data.
The Apache ZooKeeper coordinates and manages maintaining metadata, handling failover and monitors health.
The master manages the overall cluster, assigns regions to region servers, handles schema changes and coordinates administrative tasks.
What is Cassandra based on?
Google Big Table
What is Cassandra?
High performance column based Database
How does Cassandra reach performance?
Performance is reached through manual definition of tables and how to store the data during creation
How does Cassandra redundancy differ from HBase?
HBase replicates blocks using Hadoop HDFS, while Cassandra takes care of the replication factor itself using the Gossip protocol
How does Cassandra handle redundancy?
Cassandra achieves redundancy and replication by distributing data across nodes, storing multiple copies (replicas) on different nodes using consistent hashing. ACK sent when replication is done (based on replication factor set)
How does MongoDB handle scaling?
Sharding enables horizontal scaling, dividing data across multiple servers.
The WiredTiger storage engine efficiently manages data storage and retrieval, while the MongoDB query language facilitates flexible data querying.
Indexes enhance query performance.
Replica sets ensure data availability and fault tolerance through data replication across multiple nodes.
What is good and bad with MongoDB replica sets?
Great for total read redundancy
Potential issues with large amounts of writes due to propagation (replication)
In short, what is MongoDB?
Document based NoSQL Database
Data format: JSON
MongoDB’s architecture consists of databases, collections, and documents. Data is organized into flexible, JSON-like documents within collections, and collections are grouped into databases.
What is Redis?
Super fast in-memory key value-based NoSQL database