System Design Terms Flashcards
ACID
Atomicity: Atomicity ensures that a transaction is treated as a single unit of work. Either all of the operations within the transaction succeed and are committed, or if any operation fails, the entire transaction is rolled back and the database is left unchanged. This property helps maintain data integrity by preventing partial updates that could leave the database in an inconsistent state.
Consistency: Consistency ensures that a transaction moves the database from one valid state to another. In other words, every transaction must preserve all integrity constraints, such as foreign key and uniqueness constraints, and a transaction that would violate one is rejected. Together with atomicity, isolation, and durability, this keeps the database consistent even in the event of system failures or concurrent transactions.
Isolation: Isolation ensures that concurrently executing transactions do not interfere with one another. At the strongest level (serializable), the result is equivalent to running the transactions one after another. Weaker isolation levels, which many databases use by default, permit some anomalies in exchange for performance. Isolation is what prevents issues such as dirty reads, non-repeatable reads, and phantom reads.
Durability: Durability ensures that once a transaction is committed, its effects are permanently stored in the database and will not be lost, even in the event of system failures. This is typically achieved by writing transaction changes to non-volatile storage, such as disk, so that they can be recovered in case of a crash.
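Atomicity (and the constraint-checking side of consistency) can be seen in a small sketch using Python's built-in sqlite3 module. The accounts table, its CHECK constraint, and the transfer helper are hypothetical, chosen only for illustration. The transfer credits the recipient first, so if the debit then violates the constraint, the rollback must also undo the credit.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database; the CHECK constraint makes overdrafts fail.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises,
        # so both UPDATEs take effect together or not at all.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling transfer(conn, "bob", "alice", 80) fails the CHECK constraint and leaves both balances unchanged, while transfer(conn, "alice", "bob", 30) commits both updates.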
BASE
Basically Available: The system stays available for reads and writes, even when some nodes fail or some responses return stale data. BASE relaxes the consistency guarantee provided by ACID in favor of availability and partition tolerance. In distributed systems or NoSQL databases, maintaining strict consistency (as in ACID) across all nodes can be challenging and may impact availability. BASE acknowledges that in some cases, it’s acceptable for data to be temporarily inconsistent or for different users to see different versions of the data.
Soft state: Soft state means that the state of the system can change over time, even without input. In BASE systems, data might be eventually consistent rather than immediately consistent. This means that updates to the database may take some time to propagate across all nodes in a distributed system.
Eventually consistent: Eventually consistent systems guarantee that if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. This contrasts with strongly consistent systems, in which every read reflects the most recent committed write.
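A toy model makes eventual consistency concrete. This is a sketch under stated assumptions, not how any particular database works: each replica accepts writes locally, conflicts are resolved by last-write-wins using timestamps assumed to be unique, and a hypothetical anti_entropy function stands in for background replication. Until it runs, replicas disagree; once it runs with no new writes, they converge.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Replica:
    # key -> (timestamp, value). Timestamps are assumed unique; the newest write wins.
    data: Dict[str, Tuple[int, Any]] = field(default_factory=dict)

    def write(self, key: str, value: Any, ts: int) -> None:
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[Any]:
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas: List[Replica]) -> None:
    # Background sync: copy every entry to every replica. Because the newest
    # timestamp always wins, the replicas converge regardless of merge order.
    for src in replicas:
        for key, (ts, value) in list(src.data.items()):
            for dst in replicas:
                dst.write(key, value, ts)
```

Before syncing, a read from a replica that missed a write returns stale data (or nothing); after syncing, every replica returns the newest value.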
Examples of ACID DBs
Traditional relational database management systems (RDBMS) such as Oracle, MySQL (with the InnoDB storage engine), PostgreSQL, and SQL Server typically adhere to ACID principles.
Examples of BASE DBs
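NoSQL databases like Cassandra, MongoDB, and Couchbase often follow BASE principles, although the line is blurring: MongoDB, for example, has supported multi-document ACID transactions since version 4.0.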
CockroachDB
Pros:
ACID
Atomicity: CockroachDB ensures that transactions are atomic, meaning they are treated as a single unit of work. Either all operations within a transaction succeed and are committed, or if any operation fails, the entire transaction is rolled back, maintaining data integrity.
Consistency: It maintains consistency by enforcing integrity constraints and ensuring that transactions preserve the validity of the database schema. CockroachDB guarantees that transactions leave the database in a consistent state, adhering to the defined constraints.
Isolation: CockroachDB isolates concurrent transactions to prevent interference and maintain data integrity. By default it runs transactions at SERIALIZABLE isolation, the strongest standard level, which rules out dirty reads, non-repeatable reads, and phantom reads.
Durability: CockroachDB persists transaction changes to disk and replicates each write to a majority of the replicas of its range (via Raft) before acknowledging the commit. Once a transaction is committed, its effects survive system failures, including the loss of a minority of replicas.
BASE (Partial)
Basically Available: CockroachDB emphasizes high availability by replicating data across multiple nodes in a cluster (three replicas per range by default). A range stays available for reads and writes as long as a majority of its replicas can communicate, so the cluster tolerates node failures. During a network partition, however, a range whose majority is unreachable becomes unavailable rather than serving inconsistent data: CockroachDB chooses consistency over availability (CP in CAP terms).
Soft state and Eventually consistent: Unlike a typical BASE system, CockroachDB does not rely on eventual consistency for ordinary reads and writes. Each range is replicated with the Raft consensus protocol, and transactions that span multiple ranges remain serializable across the whole cluster. The closest it comes to soft state is opt-in stale reads, such as follower reads with AS OF SYSTEM TIME, which accept slightly stale data in exchange for lower latency.
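The availability rule for a majority-quorum protocol such as Raft reduces to simple arithmetic. The sketch below uses hypothetical helper names and models only the quorum condition, not CockroachDB's internals (leases, rebalancing, and so on).

```python
def quorum(replicas: int) -> int:
    # Smallest majority of a replica group (e.g. 2 of 3, 3 of 5).
    return replicas // 2 + 1


def range_available(replicas: int, reachable: int) -> bool:
    # A majority-replicated range can elect a leader and commit writes
    # only while a majority of its replicas can communicate.
    return reachable >= quorum(replicas)


def max_failures_tolerated(replicas: int) -> int:
    # How many replicas can be lost while a majority still remains.
    return replicas - quorum(replicas)
```

One consequence of the majority rule: an even replica count adds storage without adding fault tolerance, since 4 replicas tolerate only 1 failure, the same as 3.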
Cons:
While CockroachDB offers many benefits such as scalability, fault tolerance, and strong consistency, there are also some potential drawbacks or considerations to be aware of:
Complexity: Setting up and managing a distributed database system like CockroachDB can be more complex compared to traditional single-node databases. It requires expertise in distributed systems, network configurations, and cluster management.
Performance Overhead: Due to its distributed nature and strong consistency guarantees, CockroachDB may introduce some performance overhead compared to single-node databases, especially for workloads with heavy contention on the same rows (which can force transaction retries) or transactions that span multiple nodes.
Storage Overhead: Distributed databases often require redundant copies of data to ensure fault tolerance and data durability. This can result in higher storage requirements compared to non-distributed databases.
Learning Curve: Developers and administrators may need to invest time in learning CockroachDB’s architecture, SQL dialect, and operational best practices, especially if they are transitioning from traditional SQL databases.
Cost: CockroachDB has offered a free edition, but enterprise features and support come at a cost, and its licensing terms have changed over time. Organizations should consider the total cost of ownership, including hardware, maintenance, and support, when evaluating CockroachDB for production use.
Consistency vs. Latency Trade-offs: CockroachDB provides strong consistency guarantees, but achieving strong consistency in a distributed system can increase latency. Every write must be acknowledged by a majority of its range's replicas, so clusters spread across regions pay cross-region round trips on writes.
Data Modeling Considerations: Distributed databases like CockroachDB may have different data modeling considerations compared to single-node databases. Developers need to carefully design schemas and queries to optimize performance and leverage the distributed architecture effectively.
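The latency trade-off can be approximated with a deliberately simplified model that ignores disk writes, leaseholder placement, and the details of CockroachDB's commit protocol: the leader counts as one acknowledgment and then waits for enough follower acknowledgments to reach a majority. The function name and the round-trip times in the example are hypothetical.

```python
from typing import Sequence


def quorum(replicas: int) -> int:
    return replicas // 2 + 1


def commit_latency_ms(follower_rtts_ms: Sequence[float]) -> float:
    # Simplified model: the leader acks itself, then needs quorum - 1 follower acks.
    # Commit latency is the RTT of the slowest follower in the fastest majority.
    replicas = len(follower_rtts_ms) + 1
    needed = quorum(replicas) - 1
    if needed == 0:
        return 0.0  # single replica: no network round trip required
    return sorted(follower_rtts_ms)[needed - 1]
```

With three replicas and followers 2 ms and 80 ms away, a write commits in about 2 ms. With five replicas whose followers sit at 1, 70, 75, and 150 ms, the second-fastest follower decides, so commits take about 70 ms.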
A reasonable estimate for average document file size
100 KB
1 trillion bytes
1 TB
1 billion bytes
1 GB
1 million bytes
1 MB
1000 bytes
1 KB
1 quadrillion bytes (one thousand trillion)
1 PB
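These decimal units combine naturally with the average document size for back-of-the-envelope storage estimates. The sketch assumes decimal (power-of-10) units as listed above, and the 10 million document count in the example is a hypothetical figure.

```python
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15


def human_bytes(n: float) -> str:
    # Express a byte count in the largest decimal unit that fits.
    for unit, size in (("PB", PB), ("TB", TB), ("GB", GB), ("MB", MB), ("KB", KB)):
        if n >= size:
            return f"{n / size:g} {unit}"
    return f"{n:g} B"


# Hypothetical: 10 million documents at ~100 KB each.
total = 10_000_000 * 100 * KB
print(human_bytes(total))  # 1 TB
```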
seconds in a month
2,629,746 seconds (average month of about 30.44 days)
~ 2.6 million seconds
minutes in a month
43,829 minutes
~ 44,000 minutes
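Both figures follow from the average Gregorian year of 365.2425 days, and the usual use is converting a monthly volume into an average rate. The 1 billion requests per month in the example is a hypothetical load.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 365.2425 / 12  # average Gregorian month, about 30.44 days
SECONDS_PER_MONTH = DAYS_PER_MONTH * SECONDS_PER_DAY  # about 2,629,746
MINUTES_PER_MONTH = SECONDS_PER_MONTH / 60  # about 43,829


def average_rps(requests_per_month: float) -> float:
    # Average (not peak) request rate; peak traffic is usually several times higher.
    return requests_per_month / SECONDS_PER_MONTH


# Hypothetical load: 1 billion requests per month is roughly 380 requests per second.
print(round(average_rps(1_000_000_000)))  # 380
```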
Polling solutions (e.g., for syncing distributed clients)
One solution is for the client to poll the server periodically. However, changes are reflected locally only after a delay, because polling happens on an interval. Polling also wastes bandwidth, since the server returns empty responses most of the time, and it keeps the server busy. Pulling information this way does not scale.
A better solution is HTTP long polling. With long polling, the client requests information from the server with the expectation that the server may not respond immediately. If the server has no new data for the client when the poll arrives, then instead of sending an empty response, it holds the request open and waits until new data becomes available. As soon as it has new information, the server sends an HTTP(S) response to the client, completing the open request. Upon receiving the response, the client can immediately issue another request for future updates.
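The hold-then-respond behavior of long polling can be sketched in process with threading.Condition, without a real HTTP stack. LongPollServer, poll, publish, and the version counter are hypothetical names: poll stands in for the HTTP handler, blocking until there is a version newer than the one the client has seen, or until a timeout (where a real server would send an empty response).

```python
import threading
from typing import Any, List, Optional, Tuple


class LongPollServer:
    """In-process stand-in for an HTTP endpoint that supports long polling."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0
        self._data: Any = None

    def publish(self, data: Any) -> None:
        # A change arrives: bump the version and wake every held request.
        with self._cond:
            self._version += 1
            self._data = data
            self._cond.notify_all()

    def poll(self, since_version: int, timeout: float) -> Optional[Tuple[int, Any]]:
        # Hold the "request" open until something newer than the client's version
        # exists, or until the timeout expires (None models an empty response).
        with self._cond:
            ready = self._cond.wait_for(lambda: self._version > since_version, timeout)
            return (self._version, self._data) if ready else None


def client_loop(server: LongPollServer, rounds: int, timeout: float = 1.0) -> List[Any]:
    # Re-issue the poll immediately after every response, empty or not.
    version, seen = 0, []
    for _ in range(rounds):
        result = server.poll(version, timeout)
        if result is not None:
            version, data = result
            seen.append(data)
    return seen
```

This sketch delivers only the latest state, which suits "sync to the newest version" use cases; a client that needs every intermediate change would need the server to keep a change log.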
How should clients handle slow servers?
Clients should use exponential backoff when the server is busy or not responding: if a request is slow or fails, the client waits before retrying, and the wait grows exponentially with each consecutive failure (for example 1 s, 2 s, 4 s), usually up to a maximum delay.
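A minimal retry wrapper shows the idea. It assumes the request signals a busy server by raising TimeoutError, and the parameter names and defaults (base, cap, max_attempts) are illustrative choices. It also adds "full jitter", a common refinement where each wait is drawn uniformly between zero and the exponential delay so that many clients do not retry in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0, jitter: bool = True) -> float:
    # Delay doubles with each attempt (base * 2**attempt), capped at `cap` seconds.
    delay = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, delay] to spread retries out.
    return random.uniform(0, delay) if jitter else delay


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 10.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_attempts):
        try:
            return request()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(backoff_delay(attempt, base, cap, jitter))
    raise ValueError("max_attempts must be at least 1")
```

Injecting sleep keeps the wrapper testable; without jitter, a request that fails twice is retried after 0.1 s and then 0.2 s.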