System Design Flashcards
7 steps of systems design problems
- Requirements clarification
- Back-of-the-envelope estimation
- System interface definition
- Defining data model
- High-level design
- Detailed design
- Identifying and resolving bottlenecks
Step 1: Requirements Clarification
- Determine exact scope
- Define end goals of system
- Clarify which parts of system to focus on (e.g. back-end vs. front-end)
Step 2: Back-of-the-envelope estimation
- Estimate and quantify scale of system
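A back-of-the-envelope estimate is just quick arithmetic; a minimal sketch, assuming hypothetical traffic numbers for an illustrative service (all figures below are made up for the example):

```python
# All inputs are hypothetical assumptions for a large social service.
daily_active_users = 200_000_000       # assumed
actions_per_user_per_day = 5           # assumed write actions per user
avg_action_size_bytes = 300            # assumed payload size

writes_per_day = daily_active_users * actions_per_user_per_day
writes_per_second = writes_per_day / 86_400         # seconds per day
storage_per_day_bytes = writes_per_day * avg_action_size_bytes
storage_per_year_tb = storage_per_day_bytes * 365 / 1e12

print(f"{writes_per_second:,.0f} writes/sec")       # ~11,574 writes/sec
print(f"{storage_per_year_tb:.1f} TB/year")         # ~109.5 TB/year
```

The point is not precision but orders of magnitude: writes/sec drives server and load-balancer sizing, storage/year drives database and partitioning choices.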
Step 3: System interface definition
- Define what APIs are expected from the system
Step 4: Define data model
- How will data flow between components?
- How will different entities interact with each other?
- How will we partition and manage data? Specific choices:
- Which database system should we use? NoSQL vs SQL?
- What kind of block storage should we use to store files? (e.g. multimedia)
Step 5: High-level design
- Draw a block diagram representing core components to solve the actual problem from end-to-end.
- Possibly describe the system verbally or type out in some kind of list format
Step 6: Detailed design
- Dig deeper into 2-3 major components, guided by interviewer feedback.
- Consider tradeoffs between different approaches.
Step 7: Identifying and resolving bottlenecks
- Identify any single points of failure and discuss mitigation
- Discuss redundancy and backup plans for data and services
- Discuss performance monitoring
key characteristics of distributed systems
- Scalability
- Reliability
- Availability
- Efficiency
- Serviceability or Manageability
Scalability
capability of a system, process, or network to grow and manage increased demand
Horizontal vs vertical scaling
- Horizontal is easier to scale dynamically by adding machines
- Vertical scaling has an upper limit and may involve downtime
- Horizontal scaling examples: Cassandra, MongoDB
- Vertical scaling examples: MySQL
Reliability
- Probability that a system will fail in a given period
- A distributed system is reliable if it keeps delivering services even when one or several components fail
- Reliability is availability measured over time
Availability
- Time a system remains operational over a specific period
- Accounts for maintainability and repair
Efficiency
- Latency / response time to requests (correlates to number of messages)
- throughput / bandwidth (correlates to size of messages)
Serviceability / manageability
- Ease to operate and maintain
- Simplicity and speed of repair (as time to repair increases, availability and reliability decrease)
- Considerations: ease of diagnostics, ease of updates
- Automated fault detection
Load balancer
- component to spread traffic across a cluster of servers
- improves responsiveness and availability
- crucial to horizontal scaling
- Tracks the health of back-end servers and routes requests only to healthy ones
Load balancing placements
- Between client and web server
- between web servers and internal layer (app servers)
- between internal layer and database
Load balancer: least connection method
- directs traffic to server with fewest connections
- good for large numbers of persistent connections
Load balancer: least response time method
- directs traffic to the server with the lowest response time
Load balancer: least bandwidth method
- selects the server that is currently serving the least amount of traffic
Load balancer: round robin method
- cycles through available servers and sends each request to the next server
- good for servers of equal specs and few persistent connections
Load balancer: weighted round robin method
- round robin but with weights on different servers based upon processing capacity
Load balancer: IP hash method
- client IP address is hashed and servers are each assigned blocks of hashes
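The dispatch methods above can be sketched in a few lines; a minimal illustration, where the server names and connection counts are hypothetical:

```python
from itertools import cycle
from hashlib import md5

servers = ["app1", "app2", "app3"]   # hypothetical server pool

# Round robin: send each request to the next server in order.
_rr = cycle(servers)
def round_robin() -> str:
    return next(_rr)

# Least connection: pick the server with the fewest active connections.
connections = {s: 0 for s in servers}
def least_connection() -> str:
    server = min(connections, key=connections.get)
    connections[server] += 1     # a real LB would decrement on disconnect
    return server

# IP hash: hash the client IP so the same client always lands on the same server.
def ip_hash(client_ip: str) -> str:
    digest = int(md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```

IP hash gives session stickiness without shared state; round robin and least connection spread load more evenly but may move a client between servers.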
Load balancer redundancy
- can be a single point of failure
- can add more LBs to form a cluster of active/passive instances
- clustered LBs monitor each other and passive takes over if active fails
Caching
- locality of reference principle: recently requested data is likely to be requested again
- often implemented near front end to reduce downstream traffic
Application server cache
- cache placed directly on a request-layer node; each request checks the cache before fetching from disk
- can have multiple layers, e.g. node memory -> node disk (still faster than network)
Content Delivery Network (CDN)
- cache for sites serving large amounts of static media
Cache invalidation
- maintenance to keep cache consistent with database when data changes
Write-through cache
- Data is written into the cache and DB simultaneously
- Maximizes consistency, minimizes risk of loss
- Higher latency as a result of double write operations
Write-around cache
- Data is written directly to permanent storage, bypassing the cache
- avoids flooding the cache with writes
- increases chance of cache misses, which must then be read from back end with higher latency
Write-back cache
- Data is written to cache alone
- Write to permanent storage is done in intervals or chunks
- Lowest latency, highest throughput for write-intensive apps
- Risk of data loss on crash, since the cache holds the only copy until it is flushed
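The write-through vs. write-back trade-off can be shown with a toy sketch; a `dict` stands in for the backing store, and the class and method names are hypothetical:

```python
class WriteThroughCache:
    """Writes go to cache and backing store together: consistent, slower writes."""
    def __init__(self, store: dict):
        self.cache, self.store = {}, store

    def write(self, key, value):
        self.cache[key] = value
        self.store[key] = value      # both writes complete before returning


class WriteBackCache:
    """Writes go to the cache only; the store is updated later in a batch."""
    def __init__(self, store: dict):
        self.cache, self.store, self.dirty = {}, store, set()

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)          # store is stale until flush()

    def flush(self):
        """Periodic flush: a crash before this point loses the dirty entries."""
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

Write-around simply skips the cache on write and populates it on the next read miss.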
Cache eviction policies (6)
- FIFO - evicts oldest block first
- LIFO - evicts newest block first
- LRU - evicts least recently used items first
- MRU - evicts most recently used items first
- LFU - evicts least frequently used items first
- RR - random replacement
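LRU is the most commonly implemented of these policies; a minimal sketch using Python's `OrderedDict` (the class name and capacity are illustrative, not a standard API):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used item once capacity is exceeded."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.items:
            return None              # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used
```

MRU is the same structure with `popitem(last=True)`; LFU additionally needs a frequency counter per key.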
Importance of estimation
- important later for scaling, partitioning, load balancing, and caching
Examples of system parameters to estimate
- Examples of things to quantify: number of actions, amount of storage, expected network bandwidth usage
Components to include in high level design
- Clients, load balancing, application servers, databases, file storage
Examples of detailed design topics
- How will we partition data types between multiple databases?
- How should we optimize data storage further (e.g. recency)?
- How much and at what layer should we implement caching?
- Which components need load balancing?
Horizontal scaling
Adding more servers to your resource pool
Vertical scaling
Adding more power/capacity to an existing server
Cache misses
When requested data is not found in the cache
Cache invalidation: 3 main schemes
- write-through
- write-around
- write-back