4A_Replication & Partitioning Flashcards
What are the reasons for replication?
Performance:
- Scale in size; more capacity
- Scale geographically; Closer to client = lower latency
Redundancy:
- Have other copies when a component fails
Redundancy vs Availability?
Availability also means the ability to make progress: not only reading data but also performing writes
Definition of CAP?
Consistency, Availability and Partition Tolerance
What are the different consistency models?
Linearisability
Sequential consistency
Causal Consistency
Eventual Consistency
What is linearisability?
All clients see writes take effect atomically, in a single order that matches real time
What is sequential consistency?
All clients see writes in the same order, which respects each client's program order but not necessarily real time
What is causal consistency?
Clients see causally related writes in same order, unrelated writes possibly in different order.
What is eventual consistency?
Eventually all replicas converge.
Clients may see differences meanwhile.
Arguments against linearisability?
Global lock on replicas
Not scalable
Arguments against sequential consistency?
Need total order on writes
Hard, but some capacity is possible. Use mutual exclusion (mutex) solutions
Arguments against causal consistency?
Causal relations via vector clocks.
Hard, need per-process info
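Sketch: the vector-clock idea above, as a minimal Python illustration. The helper names (`tick`, `merge`, `happened_before`, `concurrent`) and the process names are hypothetical, chosen only for this card. Each process keeps one counter per process, which is the "per-process info" that makes this costly.

```python
from typing import Dict

# A vector clock maps a process name to the number of events seen from it.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, process: str) -> VectorClock:
    """Return a copy of the clock with the process's own counter incremented."""
    updated = dict(clock)
    updated[process] = updated.get(process, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, process: str) -> VectorClock:
    """On receive: take the element-wise maximum, then count the receive event."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, process)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a causally precedes b: a <= b everywhere and a < b somewhere."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_less


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither clock dominates the other (causally unrelated)."""
    keys = set(a) | set(b)
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    return a_ahead and b_ahead


# Process A writes, B sees that write and then writes, C writes independently.
w1 = tick({}, "A")
w2 = merge({}, w1, "B")
w3 = tick({}, "C")
```

Here `w1` causally precedes `w2`, so every client must see them in that order, while `w1` and `w3` are concurrent and may be seen in either order.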
Arguments against consistency in general?
Does not scale
Slow and expensive
Replication Protocols?
Primary / Backup
Active Replication
Quorum Based
What is primary / backup replication protocol?
Primary is the implicit sequencer.
Variations:
- Send write operations to backups
- Send result of write operation to backups
- Send invalidation to backups
What is active replication protocol?
All replicas perform the write operation
Need explicit sequencer / total ordering
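Sketch: active replication with an explicit sequencer, under simplifying assumptions (single process, no failures, no real network). The `Sequencer` and `Replica` classes are hypothetical names for illustration. Each replica applies writes strictly in sequence-number order, buffering any write that arrives early.

```python
class Sequencer:
    """Hands out consecutive sequence numbers, giving writes a total order."""

    def __init__(self):
        self.next_seq = 0

    def assign(self, op):
        seq = self.next_seq
        self.next_seq += 1
        return seq, op


class Replica:
    """Performs every write itself, but only in sequence-number order."""

    def __init__(self):
        self.state = {}
        self.expected = 0   # next sequence number this replica may apply
        self.pending = {}   # writes that arrived out of order

    def deliver(self, seq, op):
        self.pending[seq] = op
        # Apply as many buffered writes as are now in order.
        while self.expected in self.pending:
            key, value = self.pending.pop(self.expected)
            self.state[key] = value
            self.expected += 1


sequencer = Sequencer()
m0 = sequencer.assign(("x", 1))
m1 = sequencer.assign(("x", 2))

r1, r2 = Replica(), Replica()
r1.deliver(*m0)
r1.deliver(*m1)
# r2 receives the same writes in the opposite order.
r2.deliver(*m1)
r2.deliver(*m0)
```

Both replicas end in the same state even though the messages arrived in different orders, which is what the total ordering is for.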
What is caching?
Temporary replication: a copy of the data is kept for a limited time
What are scaling techniques?
- Bigger machines
- Virtualisation
- Asynchronous communication
- Replication and caching
- Partitioning
What is partitioning?
Split work/responsibilities over a set of servers
Common method is hierarchical
Second common method of partitioning?
Formulaic. Data is divided into parts according to some formula. The parts are assigned to servers via another formula that uses the server ID, and a meta server remembers which parts are where
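Sketch: formulaic partitioning in Python. The two formulas (CRC32 modulo the number of parts, then part number modulo the number of servers), the constants, and the server names are all assumptions made for illustration. `PART_TABLE` stands in for what the meta server would remember.

```python
import zlib

NUM_PARTS = 8                                   # assumed number of parts
SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical server IDs


def part_of(key: str) -> int:
    """First formula: map a data key to a part number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTS


def server_of(part: int) -> str:
    """Second formula: map a part number to a server by its index."""
    return SERVERS[part % len(SERVERS)]


# The meta server's bookkeeping: which part lives on which server.
PART_TABLE = {part: server_of(part) for part in range(NUM_PARTS)}


def locate(key: str) -> str:
    """Find the server holding a key by asking the meta server's table."""
    return PART_TABLE[part_of(key)]
```

Keeping more parts than servers means a part can later be moved by updating one table entry, without changing the first formula.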
What falls under service partitioning?
Split different things:
Obvious: Web, DB, DNS
Others: signup, login, search, user accounts, inventory
What falls under data partitioning?
Splitting similar things:
Split by customer ID, last name, or location. This is often done hierarchically, and the splits are often equal in size (hashing)
What is Hadoop?
Partition Data over many disks
Process data in parallel
What is MapReduce data flow?
First you split your data. The map step is a function that is run in parallel on each split. The results are then merged, a reduce function is applied, and the final results are returned
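Sketch: the data flow above as a word count in plain Python. This is not Hadoop's API; the function names are hypothetical, and a thread pool stands in for the cluster that would run the map step in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(text, n_chunks):
    """Split: divide the input lines into n_chunks roughly equal pieces."""
    lines = text.splitlines()
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(lines):
    """Map: emit a (word, 1) pair for every word in this chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def merge_pairs(mapped):
    """Merge: group all emitted values by key across the map outputs."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(text, n_chunks=2):
    chunks = split_input(text, n_chunks)
    with ThreadPoolExecutor() as pool:   # map step runs on each chunk in parallel
        mapped = list(pool.map(map_chunk, chunks))
    groups = merge_pairs(mapped)
    return dict(reduce_counts(k, v) for k, v in groups.items())


counts = word_count("a b a\nb c\na")
```

The map function only ever looks at its own chunk, which is why the chunks can be processed independently on different machines.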
Why is consistent hashing used?
It is a method to divide data over servers such that adding or removing a server moves only a small fraction of the data
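Sketch: a minimal consistent-hash ring in Python. The `HashRing` class, the use of MD5, and the choice of 100 virtual nodes per server are assumptions for illustration, not a reference implementation. Each key belongs to the first server point found clockwise from the key's hash.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Place a string on the ring (MD5 is used here only for spreading keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server gets several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the server owning the first point clockwise from the key."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


keys = [f"key-{i}" for i in range(1000)]
before = HashRing(["a", "b", "c"])
after = HashRing(["a", "b", "c", "d"])
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
```

When server `d` joins, the only keys that change owner are those that now land on `d`; keys never move between the existing servers. With plain `hash(key) % number_of_servers`, by contrast, most keys would be reassigned.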