Distributed Databases Flashcards
What are the three key topics of DD?
- Replication: Keep a copy of the same data on several different nodes
- Partitioning: Split the database into smaller subsets and distribute the partitions to different nodes
- Transactions: Units of work that group several reads and writes to be performed together in the database (NOT EXAM MAT!!)
Why is Replication important?
- Data scalability: increase read throughput by allowing several machines to serve read-only requests
- Geo-scalability: to have the data close to clients
- Fault tolerance: to allow the system to work even if parts are down
What are the roles of Replication nodes?
- Leader: nodes that accept write queries from clients
- Follower: nodes that provide read-only access to data
What are the usual paradigms for implementing replication?
- Single-leader: a single leader accepts writes, which are distributed to followers
- Multi-leader: multiple leaders accept writes, keep themselves in sync, and update followers
- Leaderless: all nodes are peers in the replication network
What is the main idea of Write Ahead Logs (WAL)?
The leader appends every change to its write-ahead log and ships that log to the followers. The followers then apply the WAL entries in order to end up with the same data as the leader.
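A minimal sketch of the idea, assuming a toy storage "page" made of raw bytes. The names WalEntry, Node and Leader, and the (offset, data) entry layout, are made up for illustration and do not match the format of any real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalEntry:
    lsn: int        # log sequence number: position of the entry in the log
    offset: int     # byte offset inside the (hypothetical) storage page
    data: bytes     # raw bytes to write at that offset


class Node:
    def __init__(self, page_size: int = 16) -> None:
        self.page = bytearray(page_size)  # stand-in for on-disk storage
        self.applied_lsn = 0

    def apply(self, entry: WalEntry) -> None:
        # Replaying an entry copies the same bytes to the same place.
        self.page[entry.offset:entry.offset + len(entry.data)] = entry.data
        self.applied_lsn = entry.lsn


class Leader(Node):
    def __init__(self, page_size: int = 16) -> None:
        super().__init__(page_size)
        self.wal: list[WalEntry] = []

    def write(self, offset: int, data: bytes) -> WalEntry:
        entry = WalEntry(lsn=len(self.wal) + 1, offset=offset, data=data)
        self.wal.append(entry)   # log first ...
        self.apply(entry)        # ... then change the storage
        return entry


leader = Leader()
follower = Node()
leader.write(0, b"abc")
leader.write(2, b"XYZ")

# Ship the log: the follower replays every entry it has not applied yet.
for entry in leader.wal[follower.applied_lsn:]:
    follower.apply(entry)
```

Because the entries describe bytes and offsets, the follower only ends up correct if it stores data in exactly the same layout as the leader.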
Who uses WAL?
PostgreSQL and Oracle
What is logical based replication?
It generates a stream of logical updates for each update to the WAL
What are some examples of Logical updates?
- For new records, the insertion value
- For deleted records, the ID
- For updated records, the ID and updated value
Used by MongoDB and MySQL
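A rough sketch of what such a stream could look like and how a follower might apply it. The dictionary layout ("op", "id", "value") is an assumption made for this example, not the actual log format of MongoDB or MySQL.

```python
from typing import Any

# Each logical entry is a plain description of a row-level change.
# The dictionary layout here is made up for illustration.
log = [
    {"op": "insert", "id": 1, "value": {"name": "Ada"}},
    {"op": "insert", "id": 2, "value": {"name": "Bob"}},
    {"op": "update", "id": 1, "value": {"name": "Ada L."}},
    {"op": "delete", "id": 2},
]


def apply_entry(table: dict[int, Any], entry: dict[str, Any]) -> None:
    """Apply one logical change to an in-memory table keyed by record ID."""
    if entry["op"] in ("insert", "update"):
        table[entry["id"]] = entry["value"]
    elif entry["op"] == "delete":
        table.pop(entry["id"], None)
    else:
        raise ValueError(f"unknown operation: {entry['op']}")


follower_table: dict[int, Any] = {}
for entry in log:
    apply_entry(follower_table, entry)
```

The entries mention only record IDs and values, so they say nothing about how either node lays the data out on disk.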
What is the main problem with WAL?
It is bound to the implementation of the data structure. If the storage format changes on the leader, the followers can no longer apply the log and replication stops working
How does replication work when using Logical based replication?
- Take a snapshot from leader
- Ship it to replica
- Get the ID of the position in the leader’s replication log at the time the snapshot was created
- Initialize replication on the replica starting from that log ID
- Retrieve and apply the replication log until it catches up
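The steps above can be sketched as follows. This is a toy in-memory model under stated assumptions: the Leader class, the (operation, ID, value) tuple format, and the use of the log length as the snapshot ID are all invented for the example.

```python
import copy
from typing import Optional

LogEntry = tuple[str, int, Optional[str]]  # (operation, record ID, value)


def apply_entry(table: dict[int, str], entry: LogEntry) -> None:
    op, record_id, value = entry
    if op == "delete":
        table.pop(record_id, None)
    else:  # "insert" or "update"
        table[record_id] = value


class Leader:
    def __init__(self) -> None:
        self.table: dict[int, str] = {}
        self.log: list[LogEntry] = []

    def write(self, op: str, record_id: int, value: Optional[str] = None) -> None:
        entry = (op, record_id, value)
        self.log.append(entry)
        apply_entry(self.table, entry)

    def snapshot(self) -> tuple[dict[int, str], int]:
        # The log position is the "ID" that ties the snapshot to the log.
        return copy.deepcopy(self.table), len(self.log)


leader = Leader()
leader.write("insert", 1, "a")
leader.write("insert", 2, "b")

# Take a snapshot, ship it to the replica, and remember the log position.
follower_table, position = leader.snapshot()

# The leader keeps accepting writes while the follower is being set up.
leader.write("update", 1, "a2")
leader.write("delete", 2)

# Start from the snapshot position and replay the log until caught up.
for entry in leader.log[position:]:
    apply_entry(follower_table, entry)
    position += 1
```

Recording the log position together with the snapshot is what lets the replica skip entries already contained in the snapshot and replay only the newer ones.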
What is the main point of Synchronous replication?
The writes need to be confirmed by a configurable number of followers before the leader reports success.
What is the main point of Asynchronous replication?
The leader reports success as soon as the write is confirmed on its own disk; followers apply the changes later, on their own schedule.
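A small sketch contrasting the two modes, assuming in-memory nodes. The Follower and Leader classes and the required_acks parameter are hypothetical names for this example. As a simplification, a rejected synchronous write is not rolled back on followers that already applied it.

```python
class Follower:
    def __init__(self, online: bool = True) -> None:
        self.online = online
        self.data: dict[str, str] = {}

    def replicate(self, key: str, value: str) -> bool:
        """Return True if the write was applied (acknowledged)."""
        if not self.online:
            return False
        self.data[key] = value
        return True


class Leader:
    def __init__(self, followers: list[Follower], required_acks: int) -> None:
        self.followers = followers
        self.required_acks = required_acks  # 0 behaves like asynchronous mode
        self.data: dict[str, str] = {}
        self.pending: list[tuple[str, str]] = []

    def write(self, key: str, value: str) -> bool:
        if self.required_acks == 0:
            # Asynchronous: confirm locally, replicate later.
            self.data[key] = value
            self.pending.append((key, value))
            return True
        # Synchronous: count acknowledgements before reporting success.
        acks = sum(f.replicate(key, value) for f in self.followers)
        if acks < self.required_acks:
            return False
        self.data[key] = value
        return True

    def replicate_pending(self) -> None:
        # Deliver writes that were confirmed before followers saw them.
        for key, value in self.pending:
            for follower in self.followers:
                follower.replicate(key, value)
        self.pending.clear()


sync_leader = Leader([Follower(), Follower(online=False)], required_acks=2)
sync_ok = sync_leader.write("k", "v")    # only one follower can acknowledge

async_follower = Follower()
async_leader = Leader([async_follower], required_acks=0)
async_ok = async_leader.write("k", "v")  # confirmed before any follower has it
stale = "k" not in async_follower.data
async_leader.replicate_pending()
```

In this sketch the synchronous write is rejected because only one of the two required followers responds, while the asynchronous write succeeds immediately and leaves the follower briefly out of date.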
Name some characteristics of Synchronous replication.
- A follower is guaranteed to have an up-to-date copy of the leader's data
- If the leader fails, the data is still available on the followers
- If not enough followers respond, the operation cannot be processed
- All writes are blocked by the leader until enough follower writes are confirmed
- Impractical in real life
Name some characteristics of Asynchronous replication.
- It has higher availability because writes do not wait for followers to confirm
- A follower is never guaranteed to have an up-to-date copy
- Writes are not guaranteed to be durable in case of leader failure
Describe Synchronous replication as it relates to consistency and availability.
Synchronous replication favors consistency over availability: the leader blocks further writes until the current one has been confirmed by enough followers and reported as a success.