General Flashcards
CAP Theorem - overview
Consistency
Availability
Partition tolerance
Must have partition tolerance because networks fail and distributed systems rely on networks, so have to choose between C and A.
Not so cut and dry - continuous scale of each, different tradeoff can be made at different subsystems.
X = node () = available ^ = data state "a" * = data state "b" CA: (X)^ --- (X)^ AP: (X)^ --- X* CP: (X)^ --- X* ===> X^ --- X^
Consistency
All servers in the system have the same data, so any request will receive the same data, regardless of which server answers their request.
Availability
System will always respond to a request, even if a node goes down.
Valid request can be outdated data.
Partition Tolerance
System can detect partitions, enter an explicit partition mode that can limit some operations, and initiate a recovery process to restore consistency and compensate for mistakes made during a partition.
CAP Theorem - CA combo
Would rarely have this occur because distributed systems require P.
Data is consistent between nodes, but if a partition between nodes occurs, the data will be out of sync and won’t resync once the partition is resolved.
CAP Theorem - AP
When network partition, let nodes operate freely.
CAP Theorem - CP
When network partition, shut down system or disallow reads and writes.
ACID properties
Traditional approach of databases.
Atomic - transaction either succeeds or fails.
Consistent - transaction preserves all database rules such as unique primary keys, field types, and field lengths. Unlike CAP, refers to consistency within a single node.
Isolation - transaction cannot read data from another incomplete transaction, even if the transactions are executing in parallel.
- If a transaction needs to read data written by another transaction, it will have to wait until the other is finished.
Durability - once a transaction is complete, it is guaranteed that all of the changes have been recorded to a durable medium (such as a hard disk), and the fact that the transaction has been completed is likewise recorded.
How is MongoDB not ACID-compliant?
At the document level, it is!
However, multi-collection (table) transactions are not atomic.
Partitioning
Splitting data across multiple instances.
Strategies
- range - map ranges to instances (0-10000 –> n1, …). Problem is that you need a table storing the mapping. That table needs to be managed, and a table is needed for each object type.
- hash - hash object and mod it.
- consistent hash - same as hash, but when machines are added/removed, only a small amount of data needs to be moved.
Implementation
- client-side
- proxy-assisted
- query routing - send query to random instance, and the instance forwards the request to the right node
Consistent hashing
Minimizes the issue with hashing which is that when the number of servers changes significantly, data needs to be redistributed between all of the servers (buckets).
Instead, the number of partitions of the data is determined up front (the maximum result set of a hash function). Then, a new node claims certain partitions and asks the current owners to hand off the data to them.
Types of databases
Relational - SQL
Document - MongoDB
Key-Value - Redis
Graph - Neo4J
Search (storing and searching text) - Elasticsearch