System Design - General Flashcards
Latency versus Throughput
Latency is the time needed to perform some action.
Throughput is the number of actions per unit time.
Generally, you want to maximize throughput while keeping latency within an acceptable bound.
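A minimal sketch of how the two quantities can be measured, assuming a single-threaded caller. The `measure` helper and the sleep-based action are illustrative stand-ins, not part of any library.

```python
import time


def measure(action, n):
    """Run `action` n times in sequence.

    Returns (average latency in seconds, throughput in actions per second).
    """
    latencies = []
    start = time.perf_counter()
    for _ in range(n):
        t0 = time.perf_counter()
        action()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return sum(latencies) / n, n / elapsed


if __name__ == "__main__":
    # Hypothetical action: pretend each call takes about 1 ms.
    avg, tput = measure(lambda: time.sleep(0.001), 50)
    print(f"avg latency: {avg * 1000:.2f} ms, throughput: {tput:.0f} actions/s")
```

With sequential execution, throughput is capped at roughly 1 / latency. Running actions concurrently or in batches can raise throughput, often at the cost of higher latency per action.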
Performance Requirements
- Scale
- Latency
- Consistency
- Uptime
- Reliability (data loss scenarios)
- Fail open / Fail closed
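Fail open and fail closed describe what a component does when something it depends on fails. A minimal sketch, assuming a hypothetical `is_allowed` permission check that can raise when its backend is down:

```python
def check_access(is_allowed, user, fail_open):
    """Decide access for `user`.

    `is_allowed` is a hypothetical callable that queries a permission
    service and may raise if that service is unreachable.
    """
    try:
        return bool(is_allowed(user))
    except Exception:
        # The dependency failed, so the policy decides:
        # fail open   -> allow the request (favors availability)
        # fail closed -> deny the request (favors safety)
        return fail_open


def broken_service(user):
    # Stand-in for a permission service that is down.
    raise ConnectionError("permission service unreachable")


print(check_access(broken_service, "alice", fail_open=True))   # True
print(check_access(broken_service, "alice", fail_open=False))  # False
```

Which policy is right depends on the cost of a wrong answer: a rate limiter might fail open, while an authorization check would usually fail closed.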
Approach to System Design Interviews
- Clarify requirements
  - Functional Requirements
  - Performance Requirements
- APIs
- Back of the envelope estimations
- Data Model
- High level design
- Individual component design
- Monitoring & Observability & Alerting
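A back-of-the-envelope estimate is usually a few multiplications. The sketch below uses made-up inputs (10 million daily active users, 10 requests per user per day, 1 KB stored per request, peak traffic at 2x average); substitute the numbers gathered while clarifying requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Hypothetical inputs, chosen only for illustration.
daily_active_users = 10_000_000
requests_per_user_per_day = 10
bytes_per_request = 1_000  # 1 KB (decimal)
peak_factor = 2            # assumed peak-to-average ratio

requests_per_day = daily_active_users * requests_per_user_per_day
avg_qps = requests_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * peak_factor
storage_per_day_gb = requests_per_day * bytes_per_request / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1_000

print(f"average QPS:  {avg_qps:,.0f}")
print(f"peak QPS:     {peak_qps:,.0f}")
print(f"storage/day:  {storage_per_day_gb:,.0f} GB")
print(f"storage/year: {storage_per_year_tb:,.1f} TB")
```

With these inputs the estimate comes to roughly 1,200 requests per second on average and about 100 GB of new data per day. Only the order of magnitude matters at this stage.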
CPU vs Core
A core is an independent processing unit inside a CPU. In the past, each CPU had a single core. Now a single CPU usually has multiple cores, so it can execute several instruction streams in parallel.
CAP Theorem
Consistency, Availability, Partition Tolerance
A distributed system can guarantee at most two of the three at the same time.
Strong Consistency
Every read receives the most recent write or an error
Data is the same across the cluster so you can read and write to/from any node and get the same data
Availability
Ability to access the cluster even if a node in the cluster goes down
Every request receives a response but the response may not reflect the most recent data
Partition Tolerance
A partition means two nodes in the cluster are unable to communicate with one another
Partition tolerance means the system continues to operate even if there is a communication failure (a partition) between nodes due to network failures
Since networks aren’t reliable, you need to support partition tolerance. That means the tradeoff becomes between consistency and availability.
Consistency and partition tolerance
CP system
Waiting for a response from a partitioned node might result in a timeout or an error. Good choice if your business requires atomic reads and writes.
Consistency and Availability
CA system
Not possible unless you’re ok with losing data once the network failure (the partition) is resolved
Since network failures are inevitable, the real tradeoff is between consistency and availability
Availability and partition tolerance
AP system
The response returns the most readily available version of the data on any node (which might not reflect the most recent write). Good choice if you need eventual consistency or when the system needs to continue working despite errors.
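The CP versus AP choice shows up in what a node does when it is cut off from its peers. A minimal sketch, assuming a hypothetical `Node` with a `partitioned` flag and a `mode` argument invented for illustration; real systems detect partitions through timeouts and quorum checks rather than a flag.

```python
class Unavailable(Exception):
    """Raised when a CP node refuses to answer during a partition."""


class Node:
    def __init__(self, value):
        self.value = value
        self.partitioned = False  # True when cut off from its peers


def read(node, mode):
    """Read from `node` under a "CP" or "AP" policy."""
    if mode not in ("CP", "AP"):
        raise ValueError(f"unknown mode: {mode}")
    if node.partitioned and mode == "CP":
        # The node cannot confirm it has the latest write, so it errors.
        raise Unavailable("cannot confirm latest write during partition")
    # AP (or no partition): answer with whatever this node has.
    return node.value


a, b = Node("v1"), Node("v1")
b.partitioned = True  # b can no longer hear from a
a.value = "v2"        # a new write lands on a only

print(read(b, "AP"))  # prints v1 (available but stale)
try:
    read(b, "CP")
except Unavailable as err:
    print("CP read failed:", err)
```

The AP read stays available but returns the old value. The CP read gives up availability instead of risking a stale answer.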
Weak consistency
After a write, reads may or may not see it. The system takes a best-effort approach.
Examples: VoIP, video chat, realtime multiplayer games. If you lose reception for a few minutes, when you reconnect you don’t hear what you missed.
Eventual consistency
After a write, reads will eventually see it. Data is replicated asynchronously. Works well in highly available systems.
Examples: DNS, Email
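A minimal sketch of asynchronous replication, assuming a single primary and a single replica. The `Primary` and `Replica` classes and the explicit `replicate()` step are hypothetical; in a real system the replication would run in the background.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = deque()  # writes not yet shipped to the replica

    def write(self, key, value):
        # The write is acknowledged before the replica has seen it.
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stand-in for the background replication process.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica.data[key] = value


replica = Replica()
primary = Primary(replica)

primary.write("user:1", "alice")
print(replica.read("user:1"))  # None: the replica has not caught up yet

primary.replicate()
print(replica.read("user:1"))  # alice: the replica has converged
```

The window between `write` and `replicate` is where a read from the replica can return stale data.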
Strong consistency
After a write, reads will see it. Data is replicated synchronously.
Examples: transaction-based systems
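A minimal sketch of synchronous replication, assuming the write is acknowledged only once every replica has applied it. The `Replica` class, the `reachable` flag, and `sync_write` are hypothetical; real systems typically rely on a commit or consensus protocol instead of a simple loop.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot acknowledge the write."""


class Replica:
    def __init__(self):
        self.data = {}
        self.reachable = True


def sync_write(replicas, key, value):
    """Apply the write to every replica before acknowledging it."""
    # Refuse up front if any replica cannot acknowledge, so that no
    # replica ends up ahead of the others.
    if not all(r.reachable for r in replicas):
        raise ReplicaDown("a replica is unreachable; write rejected")
    for r in replicas:
        r.data[key] = value
    return True  # acknowledged only after all replicas hold the value


replicas = [Replica(), Replica(), Replica()]
sync_write(replicas, "balance", 100)
print([r.data["balance"] for r in replicas])  # [100, 100, 100]

replicas[2].reachable = False
try:
    sync_write(replicas, "balance", 50)
except ReplicaDown as err:
    print("write rejected:", err)
```

Every read sees the latest acknowledged write, but a single unreachable replica is enough to block writes.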
High Availability Patterns
Fail-over and replication
These are complementary, not mutually exclusive
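A minimal sketch of how the two patterns work together, assuming an active-passive pair. The `Server` and `Cluster` classes and the `healthy` flag are hypothetical stand-ins for real health checks. Replication keeps the standby's data current, and fail-over promotes it when the primary goes down.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.healthy = True


class Cluster:
    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def _failover_if_needed(self):
        if self.primary.healthy:
            return
        if not self.standby.healthy:
            raise RuntimeError("no healthy server available")
        # Fail-over: promote the standby to primary.
        self.primary, self.standby = self.standby, self.primary

    def write(self, key, value):
        self._failover_if_needed()
        self.primary.data[key] = value
        if self.standby.healthy:
            # Replication: keep the standby's copy up to date.
            self.standby.data[key] = value

    def read(self, key):
        self._failover_if_needed()
        return self.primary.data.get(key)


cluster = Cluster(Server("a"), Server("b"))
cluster.write("k", "v")

cluster.primary.healthy = False  # simulate the primary going down
print(cluster.read("k"))         # v, served by the promoted standby
print(cluster.primary.name)      # b
```

Without replication the promoted standby would have nothing to serve. Without fail-over the replicated copy would sit unused while the primary is down.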