6. Partitioning Flashcards

1
Q

Partitioning

A

Normally, partitions are defined in such a way that each piece of data (each record, row, or document) belongs to exactly one partition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Purpose of partitioning

A

The main reason for partition data is scalability. Different partitions can be placed on different nodes in a shard-nothing cluster.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Partitioning and replication

A

A node may store more than one partition.

Combining replication and partitioning: each node acts as leader for some partitions and follower for other partitions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Partitioning approaches - 1. Key range partitioning

A

Keys are sorted. A partition owns all the keys from some minimum up to some maximum.
Pros: Sorting enable efficient range queries
Cons: Risk of hot spots if application often accesses keys that are close together in the sorted order

In this approach, partitions are typically rebalanced dynamically by splitting the range into two subranges when a partition gets too big.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Partitioning approaches - 2. Hash partitioning

A

A hash function is applied to each key, and a partition owns a range of hashes.

Pros: Distribute load more evenly
Cons: Make range queries inefficient

When partitioning by hash, it is common to create a fixed number of partitions in advance, to assign several partitions to each node, and to move entire partitions from one node to another when nodes are added or removed. Dynamic partitioning can slo be used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Partitioning and secondary indexes - 1. Document-partitioned indexes

A

Secondary indexes are stored in the same partition as the primary key and value.

Pros: Only a single partition needs to be updated on write

Cons: A read of secondary index requires a scatter/gather across all partitions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Partitioning and secondary indexes - 2. Term-partitioned indexes

A

Secondary indexes are partitioned separately, using the indexes values. An entry in the secondary index may include records from all partitions of the primary key.

Pros: A read can be served from a single partition.
Cons: When a document is written, several partitions of the secondary index need to be updated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Rebalancing partitions

A
  1. How not to do it: hash mod N
  2. Fixed number of partitions
    - The number of partitions does not change, nor does the assignment of keys to partitions. The only thing that changes is the assignment of partitions to nodes.
    - Create many more partitions than there are nodes
  3. Dynamic partitioning
  4. Partitioning proportionally to nodes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Request routing

A
  1. Allow clients to contact any node (round-robin load balancer)
  2. Send request to a routing tier first (partition aware load balancer)
  3. Require clients to be aware of partitioning

Key problem: How does the component making the routing decision and learn the assignments of partition change?

  • Zookeeper. Each node register itself in zookeeper.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly