Scaling NoSQL Databases Flashcards

Question 1

Q

What is HBase based on?

Answer

A

Google Bigtable

Question 2

Q

In one sentence, what is HBase?

Answer

A

HBase is a column-oriented, non-relational database management system that runs on top of the Hadoop Distributed File System (HDFS).

Question 3

Q

How does HBase scale?

Answer

A

HBase is designed to scale linearly: tables are split into regions, and capacity grows by adding region servers to the cluster.

Question 4

Q

What query technology does HBase work well with?

Answer

A

HBase works well with Hive, a query engine for batch processing of big data, to enable fault-tolerant big data applications. This is because HBase consists of a set of standard tables with rows and columns, much like a traditional database.

Question 5

Q

What is the structure of an HBase key-value object?

Answer

A

HBase keys are byte arrays.
The key is composed of a row key, column family, column qualifier, timestamp, and a delete marker.
1. The row key is the primary identifier and is used to uniquely identify a row in an HBase table.
2. The column family is a way to group related columns together.
3. The column qualifier is the specific column within the column family.
4. The timestamp is associated with each version of a cell to support versioning.
5. The delete marker is used to mark a cell for deletion.
HBase values are also byte arrays.
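
The key layout above can be sketched in Python. This is a minimal illustration, not HBase's actual classes: the `CellKey` type, its field names, and the sample data are hypothetical. It assumes keys sort by row, family, and qualifier as raw bytes, with newer timestamps first, so the latest version of a cell is found first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellKey:
    """Hypothetical stand-in for the parts of an HBase cell key."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    is_delete: bool = False  # plays the role of the delete marker

    def sort_key(self):
        # Row, family and qualifier compare as raw bytes (ascending);
        # negating the timestamp places the newest version first.
        return (self.row, self.family, self.qualifier, -self.timestamp)


# Values are byte arrays too.
cells = {
    CellKey(b"user1", b"info", b"email", 100): b"old@example.com",
    CellKey(b"user1", b"info", b"email", 200): b"new@example.com",
    CellKey(b"user0", b"info", b"name", 150): b"Ada",
}

ordered = sorted(cells, key=CellKey.sort_key)

# The first matching key in sorted order is the newest version of the cell.
latest_email = next(
    k for k in ordered if k.row == b"user1" and k.qualifier == b"email"
)
```

Because versions of the same cell sit next to each other in sorted order, reading the current value only needs the first match.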

Question 6

Q

What is an HBase cluster, and how does it work?

Answer

A

In an HBase cluster, data is horizontally partitioned into regions based on row keys, and each region is served by exactly one region server at a time.
The cluster consists of multiple region servers, each responsible for serving a subset of the data.
Apache ZooKeeper coordinates the cluster: it maintains metadata, handles failover, and monitors server health.
The master manages the overall cluster, assigns regions to region servers, handles schema changes and coordinates administrative tasks.
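
Partitioning by row key can be sketched as a lookup over sorted region start keys. This is a minimal sketch, not HBase's client code: the start keys, server names, and function names are hypothetical, and it assumes each region covers the row keys from its start key up to the next region's start key.

```python
import bisect

# Hypothetical layout: four regions, identified by their start row keys.
# The empty key marks the first region, which begins at the lowest row key.
region_start_keys = [b"", b"g", b"n", b"t"]

# Hypothetical assignment made by the master; here "rs1" serves two regions.
region_servers = ["rs1", "rs2", "rs1", "rs3"]


def locate_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the key.
    return bisect.bisect_right(region_start_keys, row_key) - 1


def server_for(row_key: bytes) -> str:
    """Return the region server currently serving row_key."""
    return region_servers[locate_region(row_key)]
```

For example, `server_for(b"alice")` resolves to the first region, while `server_for(b"zed")` resolves to the last one.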

Question 7

Q

What is Cassandra based on?

Answer

A

Google Bigtable (for the data model), combined with the distributed design of Amazon Dynamo

Question 8

Q

What is Cassandra?

Answer

A

A high-performance, column-based NoSQL database

Question 9

Q

How does Cassandra achieve high performance?

Answer

A

Performance comes from defining tables explicitly, including how their data is stored, at creation time.

Question 10

Q

How does Cassandra redundancy differ from HBase?

Answer

A

HBase relies on Hadoop HDFS to replicate its data blocks, while Cassandra handles replication itself according to the configured replication factor. Cassandra's Gossip protocol is how its nodes exchange cluster state.

Question 11

Q

How does Cassandra handle redundancy?

Answer

A

Cassandra achieves redundancy by distributing data across nodes with consistent hashing and storing multiple copies (replicas) on different nodes. The replication factor sets how many copies exist. A write is acknowledged (ACK) once enough replicas have confirmed it, which depends on the configured consistency level.
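
Replica placement with consistent hashing can be sketched as a token ring. This is a simplified illustration under stated assumptions, not Cassandra's implementation: the `Ring` class and node names are hypothetical, MD5 stands in for the real partitioner, each node owns a single token, and replicas are simply the next nodes clockwise on the ring.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical token ring with one token per node."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.rf = replication_factor
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """Return the nodes that hold copies of key."""
        # The first node at or after the key's token owns the key;
        # wrap around to the start of the ring if none is found.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        # The remaining copies go to the following nodes, clockwise.
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.rf)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
```

With a replication factor of 3, `owners` always lists three distinct nodes, and the same key always maps to the same nodes as long as the ring does not change.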

Question 12

Q

How does MongoDB handle scaling?

Answer

A

Sharding enables horizontal scaling, dividing data across multiple servers.
The WiredTiger storage engine efficiently manages data storage and retrieval, while the MongoDB query language facilitates flexible data querying.
Indexes enhance query performance.
Replica sets ensure data availability and fault tolerance through data replication across multiple nodes.

Question 13

Q

What are the strengths and weaknesses of MongoDB replica sets?

Answer

A

Good: every member holds a full copy of the data, which gives complete read redundancy.
Bad: heavy write loads can cause problems, because every write must be propagated (replicated) to the other members.

Question 14

Q

In short, what is MongoDB?

Answer

A

Document-based NoSQL database
Data format: JSON-like documents (stored as BSON)

MongoDB’s architecture consists of databases, collections, and documents. Data is organized into flexible, JSON-like documents within collections, and collections are grouped into databases.
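
The database, collection, and document hierarchy can be mimicked with plain Python types. This is only a sketch of the data model, not the MongoDB driver API: `shop_db`, its contents, and the `find` helper are hypothetical.

```python
# Hypothetical database: a dict of collections, each a list of documents.
shop_db = {
    "customers": [
        # Documents in one collection do not need identical fields.
        {"_id": 1, "name": "Ada", "emails": ["ada@example.com"]},
        {"_id": 2, "name": "Linus", "address": {"city": "Helsinki"}},
    ],
    "orders": [
        {"_id": 100, "customer_id": 1, "items": [{"sku": "A-1", "qty": 2}]},
    ],
}


def find(collection: str, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in shop_db[collection]
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


ada = find("customers", name="Ada")
ada_orders = find("orders", customer_id=1)
```

Note how the two customer documents carry different fields, including a nested document and an array, which is what "flexible" means here.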

Question 15

Q

What is Redis?

Answer

A

A very fast, in-memory, key-value NoSQL database

Question 16

Q

What are 4 key features of Redis?

Answer

A

Transactions
Pub/Sub
Keys with a limited time-to-live
Automatic failover
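
Keys with a limited time-to-live can be illustrated with a small in-memory store. This is a sketch of the idea only, not how Redis implements expiry: the `TTLStore` class and its methods are hypothetical, and expired keys are removed lazily when they are read.

```python
import time


class TTLStore:
    """Hypothetical in-memory key-value store with optional per-key TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}      # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        """Store value; if ttl (seconds) is given, the key expires after it."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        """Return the value, or None if the key is missing or has expired."""
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy removal on access
            return None
        return value
```

A typical use is session data: `store.set("session:1", "abc", ttl=1800)` keeps the entry for 30 minutes, after which `store.get("session:1")` returns `None`.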

Question 17

Q

What is a typical use-case for Redis?

Answer

A

Often used to synchronize states between Kubernetes Pods as it is fast, resilient, and configurable.

Question 18

Q

Is a Redis database scalable?

Answer

A

Yes. It can be used in many ways, including:
- A simple, standalone DB
- An HA DB that has one or more replicas
- A clustered DB that consists of several partitioned DBs
- An HA clustered DB that has replicas of the partitions
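
How a clustered setup decides which partition holds a key can be sketched as follows. Redis Cluster maps each key to one of 16384 hash slots using CRC16 of the key modulo 16384. The sketch below reproduces that slot calculation with Python's `binascii.crc_hqx`, but it ignores hash tags (the `{...}` syntax), and the even split of slots across three shards in `shard_for` is a hypothetical assignment.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot for key (hash tags are not handled here)."""
    # crc_hqx with an initial value of 0 computes the CRC16-CCITT (XMODEM)
    # checksum, which is the CRC16 variant Redis Cluster uses.
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


def shard_for(key: bytes, shard_count: int = 3) -> int:
    """Hypothetical assignment: split the slot range evenly across shards."""
    return key_slot(key) * shard_count // HASH_SLOTS
```

Adding replicas to each shard gives the HA clustered variant: the slot calculation stays the same, and each slot range is simply served by a primary plus its replicas.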

Question 19

Q

What file formats are stored in the HDFS when using HBase?

Answer

A

HBase always stores its files in HDFS in the HFile format, so you don’t need to worry about whether to use Parquet or Avro.

(19 cards)