6. Partitioning Flashcards

Question 1

Q

Partitioning

Answer

A

Normally, partitions are defined in such a way that each piece of data (each record, row, or document) belongs to exactly one partition.

Question 2

Q

Purpose of partitioning

Answer

A

The main reason for partition data is scalability. Different partitions can be placed on different nodes in a shard-nothing cluster.

Question 3

Q

Partitioning and replication

Answer

A

A node may store more than one partition.

Combining replication and partitioning: each node acts as leader for some partitions and follower for other partitions.

Question 4

Q

Partitioning approaches - 1. Key range partitioning

Answer

A

Keys are sorted. A partition owns all the keys from some minimum up to some maximum.
Pros: Sorting enable efficient range queries
Cons: Risk of hot spots if application often accesses keys that are close together in the sorted order

In this approach, partitions are typically rebalanced dynamically by splitting the range into two subranges when a partition gets too big.

Question 5

Q

Partitioning approaches - 2. Hash partitioning

Answer

A

A hash function is applied to each key, and a partition owns a range of hashes.

Pros: Distribute load more evenly
Cons: Make range queries inefficient

When partitioning by hash, it is common to create a fixed number of partitions in advance, to assign several partitions to each node, and to move entire partitions from one node to another when nodes are added or removed. Dynamic partitioning can slo be used.

Question 6

Q

Partitioning and secondary indexes - 1. Document-partitioned indexes

Answer

A

Secondary indexes are stored in the same partition as the primary key and value.

Pros: Only a single partition needs to be updated on write

Cons: A read of secondary index requires a scatter/gather across all partitions.

Question 7

Q

Partitioning and secondary indexes - 2. Term-partitioned indexes

Answer

A

Secondary indexes are partitioned separately, using the indexes values. An entry in the secondary index may include records from all partitions of the primary key.

Pros: A read can be served from a single partition.
Cons: When a document is written, several partitions of the secondary index need to be updated.

Question 8

Q

Rebalancing partitions

Answer

A

How not to do it: hash mod N
Fixed number of partitions
- The number of partitions does not change, nor does the assignment of keys to partitions. The only thing that changes is the assignment of partitions to nodes.
- Create many more partitions than there are nodes
Dynamic partitioning
Partitioning proportionally to nodes

Question 9

Q

Request routing

Answer

A

Allow clients to contact any node (round-robin load balancer)
Send request to a routing tier first (partition aware load balancer)
Require clients to be aware of partitioning

Key problem: How does the component making the routing decision and learn the assignments of partition change?

Zookeeper. Each node register itself in zookeeper.

6. Partitioning Flashcards

(9 cards)