CAP THEOREM Flashcards
What is CAP Theorem?
CAP theorem states that a distributed system cannot provide all three of the following desirable properties: consistency, availability, partition tolerance.
Any distributed system needs to pick two out of the three properties. The three options are CA, CP, and AP. However, CA is not really a coherent option, as a system that is not partition-tolerant will be forced to give up either Consistency or Availability in the case of a network partition. Therefore, the theorem can really be stated as: In the presence of a network partition, a distributed system must choose either Consistency or Availability.
What is consistency?
All nodes see the same data at the same time. This means users can read or write from/to any node in the system and will receive the same data. It is equivalent to having a single up-to-date copy of the data.
What is availability?
Availability means every request received by a non-failing node in the system must result in a response.
What is partition tolerance?
A partition is a communication break (or a network failure) between any two nodes in the system, i.e., both nodes are up but cannot communicate with each other
A partition-tolerant system continues to operate even if there are partitions in the system.
Which aspect of CAP did ACID databases choose?
Consistency. Software like mySQL, Oracle, and more refuse to respond until it can check with its peers.
In CAP, what did BASE databases choose?
BASE databases (Basically available, soft-state, eventually consistent), like Cassandra and Mongo, chose availability.
What is PACELC theorem?
P = Partition: What happens when there’s a network partition?
A = Availability
C = Consistency
E = Else (when there is no partition)
L = Latency
C = Consistency
In other words, in the presence of a network partition, a system can choose either availabilty or consistency. In the absence of a partition (all nodes can communicate with one another), a system can choose latency or consistency.
Give an example of PACELC in action.
Dynamo and Cassandra are PA/EL systems: They choose availability over consistency when a partition occurs; otherwise, they choose lower latency.
BigTable and HBase are PC/EC systems: They will always choose consistency, giving up availability and lower latency.
Give an example with MognoDB and PACELC.
MongoDB can be considered PA/EC
MongoDB works in a primary/secondaries configuration. In the default configuration, all writes and reads are performed on the primary. As all replication is done asynchronously (from primary to secondaries), when there is a network partition in which primary is lost or becomes isolated on the minority side, there is a chance of losing data that is unreplicated to secondaries, hence there is a loss of consistency during partitions. Therefore it can be concluded that in the case of a network partition, MongoDB chooses availability, but otherwise guarantees consistency. Alternately, when MongoDB is configured to write on majority replicas and read from the primary, it could be categorized as PC/EC.
What is the difference between a network partition and a database partition?
A network partition is a failure scenario in a distributed system where different parts of the network can’t communicate with each other due to a disruption.
Database partitioning is a design strategy to split a large database into smaller, more manageable chunks called partitions or shards.