System Design - General Flashcards
Latency versus Throughput
Latency is the time needed to perform some action.
Throughput is the number of actions per unit time.
Generally, you want to maximize throughput for acceptable latency.
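The relationship between the two can be sketched with Little's Law (concurrency = throughput × latency). A minimal illustration, with made-up numbers:

```python
# Little's Law: concurrency = throughput * latency, so at a fixed
# concurrency level, sustainable throughput = concurrency / latency.

def max_throughput(concurrency: int, latency_s: float) -> float:
    """Requests/second sustainable at a given concurrency and per-request latency."""
    return concurrency / latency_s

# 100 concurrent workers, 50 ms per request -> 2000 requests/second
print(max_throughput(100, 0.050))
```

This is why lowering latency (or raising concurrency) raises achievable throughput.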
Performance Requirements
- Scale
- Latency
- Consistency
- Uptime
- Reliability (data loss scenarios)
- Fail open / Fail closed
Approach to System Design Interviews
- Clarify requirements
- Functional Requirements
- Performance Requirements
- APIs
- Back of the envelope estimations
- Data Model
- High level design
- Individual component design
- Monitoring & Observability & Alerting
CPU vs Core
A core is a processing unit within a CPU. In the past, each CPU had a single core; a modern CPU typically contains multiple cores.
CAP Theorem
Consistency, Availability, Partition Tolerance
You can only have two
Strong Consistency
Every read receives the most recent write or an error
Data is the same across the cluster so you can read and write to/from any node and get the same data
Availability
Ability to access the cluster even if a node in the cluster goes down
Every request receives a response but the response may not reflect the most recent data
Partition Tolerance
A partition means two nodes in the cluster are unable to communicate with one another
Partition tolerance means the system continues to operate even if there is a communication failure (a partition) between nodes due to network failures
Since networks aren’t reliable, you need to support partition tolerance. That means the tradeoff becomes between consistency and availability.
Consistency and partition tolerance
CP system
Waiting for a response from a partitioned node might result in a timeout or an error. Good choice if your business requires atomic reads and writes.
Consistency and Availability
CA system
Not a realistic option for a distributed system: forfeiting partition tolerance means accepting data loss when a partition inevitably occurs. Since network failures are inevitable, the real tradeoff is between consistency and availability.
Availability and partition tolerance
The response returns the most readily available version of the data on any node (which might not reflect the most recent write). Good choice if you need eventual consistency or when the system needs to continue working despite errors.
Weak consistency
After a write, reads may or may not see it. A best effort approach is taken.
Examples: VoIP, video chat, realtime multiplayer games. If you lose reception for a few minutes, when you reconnect you don’t hear what you missed.
Eventual consistency
After a write, reads will eventually see it. Data is replicated async. Works well in highly available systems.
Examples: DNS, Email
Strong consistency
After a write, reads will see it. Data is replicated synchronously.
Examples: transaction based systems
High Availability Patterns
Fail-over and replication
These are complementary, not mutually exclusive
Active-passive failover
Also referred to as master-slave
If the active server stops sending heartbeats, the passive server takes over.
Only the active server handles traffic
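A minimal sketch of the heartbeat check described above (interval, threshold, and class names are illustrative, not from the source):

```python
import time

HEARTBEAT_INTERVAL_S = 1.0
MISSED_BEATS_THRESHOLD = 3  # promote after ~3 missed heartbeats

class PassiveServer:
    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.active = False  # only the active server handles traffic

    def on_heartbeat(self):
        # Called whenever a heartbeat arrives from the active server
        self.last_heartbeat = time.monotonic()

    def check(self, now=None):
        # Promote to active if heartbeats have stopped for too long
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > HEARTBEAT_INTERVAL_S * MISSED_BEATS_THRESHOLD:
            self.active = True
        return self.active
```

Real failover systems also need fencing to prevent split brain (both servers believing they are active).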
Active-active failover
Both servers are managing traffic with the load spread between them.
Also called master-master failover
99.9% uptime (three nines)
Just under 9 hours downtime per year (8h 46min)
99.99% uptime (four nines)
Just under 1 hour per year (~53 min)
99.999% uptime (five nines)
About 5 minutes per year
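The downtime budgets above follow directly from the uptime percentage:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(uptime_pct: float) -> float:
    """Allowed downtime per year for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100)

for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
# three nines -> ~525.6 min (~8h 46m), four nines -> ~52.6 min, five nines -> ~5.3 min
```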
Spanner
GCP high availability global SQL database
Technically a CP system, but so highly available (five nines; fewer than one failure in 10^5 reads or writes) that in practice it behaves like a CA system.
In the event of a partition, Spanner chooses consistency over availability (CP).
Vertical partitioning
- Some columns are moved to new tables. Each table contains the same number of rows but fewer columns
- Can be used to isolate rarely used columns
Horizontal partitioning
- Also known as sharding
- it divides a table into multiple smaller tables. Each table is a separate data store, and it contains the same number of columns, but fewer rows
Horizontal Sharding Pros & Cons
Pros
- Facilitate horizontal scaling. Sharding facilitates the possibility of adding more machines to spread out the load.
- Reduces response time. By sharding one table into multiple tables, queries go over fewer rows, and results are returned much more quickly.
Cons
- The order by operation is more complicated. Usually, we need to fetch data from different shards and sort the data in the application’s code.
- Uneven distribution. Some shards may contain far more data or traffic than others; these are called hotspots.
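Both points can be sketched briefly (shard count and data below are made up for the example): hashing the shard key spreads rows evenly to reduce hotspots, and an ORDER BY across shards requires merging locally-sorted results in application code.

```python
import heapq
from hashlib import sha256

NUM_SHARDS = 4

def shard_for(user_id: str) -> int:
    # Hash the shard key so rows spread evenly across shards
    return int(sha256(user_id.encode()).hexdigest(), 16) % NUM_SHARDS

def merged_order_by(per_shard_results):
    # Each shard returns results sorted locally; merge them in the application
    return list(heapq.merge(*per_shard_results))

print(merged_order_by([[1, 4], [2, 3], [5]]))  # [1, 2, 3, 4, 5]
```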