Dynamo: Amazon’s Highly Available Key-value Store Flashcards
Problem: Partitioning
Technique: ___
Consistent hashing
Technique: Consistent hashing
Advantage: ___
Incremental scalability
Problem: High availability for writes
Technique: ___
Vector clocks with reconciliation during reads
Technique: Vector clocks with reconciliation during reads
Advantage: ___
Version size is decoupled from update rate
Problem: Handling temporary failures
Technique: ___
Sloppy Quorum and hinted handoff
Technique: Sloppy Quorum and hinted handoff
Advantage: ___
Provides high availability and durability guarantees when some replicas are unavailable
Problem: Recovering from permanent failures
Technique: ___
Anti-entropy using Merkle trees
Technique: Anti-entropy using Merkle trees
Advantage: ___
Synchronizes divergent replicas in the background
Problem: Membership failure and detection
Technique: ___
Gossip-based membership protocol and failure detection
Technique: Gossip-based membership protocol and failure detection
Advantage: ___
Preserves symmetry and avoids using a central registry for node membership and availability information
Incremental scalability
scale out one node at a time with minimal impact on system & operators
Symmetry
each node has the same responsibilities as its peers
Decentralization
an extension of symmetry, favouring peer-to-peer techniques over centralized control
Design consideration: heterogeneity
work distribution must be proportional to the capabilities of individual servers
Random position assignment in consistent hashing leads to ___
non-uniform data and load distribution
Consistent hashing ignores ___
heterogeneity in the performance of individual nodes
Dynamo addresses problems with consistent hashing by ___
assigning each physical node to multiple points in the ring using virtual nodes
When a (virtual) node becomes unavailable:
the load handled by this node is evenly dispersed across the remaining available nodes
When a new / newly available (virtual) node joins:
the new node accepts a roughly equivalent amount of load from each of the other available nodes
Using virtual nodes takes advantage of heterogeneity by:
assigning a proportional amount of virtual nodes to the physical node based on its capacity
Dynamo’s view of data versions
Dynamo treats each modification as a new and immutable version of the data
syntactic reconciliation
system determines the authoritative version of the data (i.e. timestamps)
semantic reconciliation
client collapses multiple branched versions of the data into one version
vector clock
list of (node, counter) paris associated with every version of every object
hinted handoff
data intended for node A is instead stored on node B, with a “hint” that its intended recipient is A, and can be stored on A once it recovers
sloppy quorum
all read and write operations are performed on the first N healthy nodes from the preference list