Distributed Systems Flashcards
CAP Theorem
A distributed system can provide only 2 of the following 3 guarantees:
- Consistency
- Availability
- Partition tolerance
What does consistency in the CAP theorem mean?
Every read receives the most recent write or an error
This is different from consistency in ACID. ACID consistency means every transaction brings the database from one valid state to another.
What does availability mean?
Every request receives a non-error response, though not necessarily one containing the most recent write
What is partition tolerance?
The ability of a distributed database to continue processing data even if a network partition causes communication errors between subsystems.
5 Nines Uptime
About 5 minutes downtime per year
3 Nines Uptime
< 9 hours of downtime per year
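The downtime figures above follow from simple arithmetic; a quick sketch, assuming a 365-day year:

```python
# Maximum downtime per year implied by a given number of "nines" of uptime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes(nines: int) -> float:
    """Return maximum downtime in minutes/year for `nines` nines of uptime."""
    availability = 1 - 10 ** -nines   # e.g. 5 nines -> 0.99999
    return MINUTES_PER_YEAR * (1 - availability)

print(f"3 nines: {downtime_minutes(3):.1f} min/year")  # 525.6 min, ~8.8 hours
print(f"5 nines: {downtime_minutes(5):.2f} min/year")  # ~5.26 min
```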
System Design
- Requirements clarification
- System interface definition
- Back of the envelope estimation
- Defining the data model
- High level design
- Component design
- Identifying and resolving bottlenecks
Redundancy
- Duplication of data or services with the goal of increased reliability
- Removes single points of failure and provides backups
- Can also enable a shared-nothing architecture where each node operates independently of the others. This makes it easier to add new servers, and the system is more resilient to the failure of any single component (e.g. a database).
Replication
Process of synchronising state between nodes
Active replication - each message goes to each node
Passive replication - leader/follower relationship (eventual consistency). All writes go to the leader but reads are allowed from the followers. If the leader goes down, a follower gets promoted.
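The passive (leader/follower) flow can be sketched minimally; the class and method names here are illustrative, not from any real library:

```python
# Minimal sketch of passive replication: writes go to the leader, which
# propagates them to followers; reads can be served by any follower.

class Node:
    def __init__(self, name: str):
        self.name = name
        self.data: dict = {}

class Leader(Node):
    def __init__(self, name: str, followers: list):
        super().__init__(name)
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        # In a real system this replication is asynchronous, so followers
        # may lag the leader -- which is why reads are eventually consistent.
        for follower in self.followers:
            follower.data[key] = value

followers = [Node("f1"), Node("f2")]
leader = Leader("leader", followers)
leader.write("user:1", "alice")

# Once the write has propagated, any follower can serve the read.
print(followers[0].data["user:1"])  # alice
```

If the leader fails, one of the followers is promoted to become the new leader.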
What’s wrong with key % n?
- Not horizontally scalable
- Every time you add a server, the data will need to be repartitioned - all keys will need to be remapped - which almost always involves downtime
- Not load balanced since it assumes uniformly distributed data/behavior
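A quick illustration of the remapping problem, assuming 1,000 integer keys and growth from 4 to 5 servers:

```python
# With key % n assignment, changing n moves almost every key: only keys
# where key % 4 == key % 5 (4 out of every 20 keys) stay put.

keys = range(1000)

def assignments(n: int) -> dict:
    return {k: k % n for k in keys}

before, after = assignments(4), assignments(5)
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved}/1000 keys remapped")  # 800/1000 keys remapped
```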
What is Consistent Hashing?
- Minimizes reorganization when nodes are added or removed
- When the hash table is resized (a node is added or removed from the system), on average only k/n keys need to be remapped, where k is the total number of keys and n is the number of nodes.
- When a node is removed from the system, the objects on that host are redistributed among the remaining hosts.
- When a node is added, it takes objects from a few nodes without touching all nodes
How does Consistent Hashing Work?
Like a typical hash function, consistent hashing maps a key to an integer. Suppose the output of the hash function is in the range [0, 256). Imagine the integers in this range placed on a ring so that the values wrap around.
Here’s how consistent hashing works:
Given a list of cache servers, hash them to integers in the range.
To map a key to a server:
Hash the key to a single integer.
Move clockwise on the ring from that point; the first cache encountered owns the key.
To add a new server, say D, keys that were originally residing at C will be split. Some of them will be shifted to D, while other keys will not be touched.
To remove a cache, or when a cache fails (say A), all keys that were originally mapped to A fall to the next cache B. Only those keys need to move to B; other keys are unaffected.
For load balancing, as noted earlier, real data is essentially randomly distributed and thus may not be uniform, which can leave the keys on the caches unbalanced.
To handle this issue, we add “virtual replicas” for caches. Instead of mapping each cache to a single point on the ring, we map it to multiple points on the ring, i.e. replicas. This way, each cache is associated with multiple portions of the ring.
If the hash function “mixes well,” as the number of replicas increases, the keys will be more balanced.
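The whole scheme can be sketched in a few lines; this is a minimal illustration, assuming MD5 as the hash function and 100 virtual replicas per node (both arbitrary choices):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent-hash ring with virtual replicas for each node."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas
        self.ring: list = []  # sorted list of (hash point, node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each node is placed at `replicas` points on the ring.
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}:{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def get_node(self, key: str) -> str:
        # Move clockwise: first ring point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self.ring, (self._hash(key),))
        if idx == len(self.ring):
            idx = 0
        return self.ring[idx][1]

ring = ConsistentHashRing()
for node in ("A", "B", "C"):
    ring.add_node(node)

before = {k: ring.get_node(k) for k in map(str, range(1000))}
ring.add_node("D")
after = {k: ring.get_node(k) for k in before}
moved = sum(1 for k in before if before[k] != after[k])
print(f"{moved}/1000 keys moved")  # only the keys claimed by D move
```

Note that every key that moves lands on the new node D, and the fraction that moves is roughly k/n, unlike the `key % n` scheme where nearly everything is remapped.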