Systems Design Flashcards
What are the five key characteristics of distributed systems?
- Scalability
- Reliability
- Availability
- Efficiency
- Serviceability or Maintainability
What is scalability?
The ability of a system to elastically handle more demand without loss of data or performance.
What is horizontal vs vertical scaling?
Horizontal scaling: Adding more nodes to parallelize operations.
Vertical scaling: Adding more resources to existing nodes.
What database systems are best for scaling vertically?
Most SQL solutions like MySQL, PostgreSQL, etc., because SQL transactions do not lend themselves well to distribution.
What database systems are best for scaling horizontally?
Most NoSQL solutions like Cassandra and MongoDB because they lend themselves well to sharding and tend to avoid the integrity constraints of SQL.
What is reliability?
The probability that a system will not fail during a given time period.
When is a system considered reliable?
A system is reliable if it keeps delivering its services even when one or more of its components fails.
What is availability?
The time a system remains operational to perform its required functions, expressed as a percentage.
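As a rough illustration of the percentage above, availability can be computed from observed uptime and downtime. This is a minimal sketch; the function name `availability_percent` and the sample figures are assumptions made for illustration.

```python
def availability_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as a percentage of total observed time."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 100.0 * uptime_hours / total


# A non-leap year has 8760 hours; 8.76 of them down is roughly "three nines".
three_nines = availability_percent(8760 - 8.76, 8.76)
```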
How do availability and reliability relate as metrics?
Reliability is availability over time, taking into account all possible real-world conditions.
Is a reliable system available? What about the converse?
If a system is reliable, it is available, but the converse is not necessarily true. A system can be available without being reliable by minimizing downtime, having spare parts and nodes available, etc.
What is efficiency?
A measure of system performance using one of two metrics: latency (the delay before a response arrives) or throughput (the amount of work completed per unit of time).
What is serviceability or maintainability?
The speed with which a system can be repaired or maintained. The longer a failed system takes to repair, the longer its downtime and the lower its availability.
What are six of the most common load balancing algorithms?
- Least Connection Method
- Least Response Time Method
- Least Bandwidth Method
- Round Robin Method
- Weighted Round Robin Method
- IP Hash
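Four of the algorithms listed above can be sketched in a few lines each. This is a simplified illustration, not a production load balancer: the server names, the weights, and the function names are all hypothetical, and health checks are omitted.

```python
import hashlib
import itertools

servers = ["app-1", "app-2", "app-3"]  # hypothetical backend names

# Round robin: hand out servers in a fixed repeating order.
_rr = itertools.cycle(servers)

def round_robin() -> str:
    return next(_rr)

# Weighted round robin: a server with weight w appears w times per cycle.
weights = {"app-1": 3, "app-2": 1, "app-3": 1}  # assumed weights
_wrr = itertools.cycle([s for s in servers for _ in range(weights[s])])

def weighted_round_robin() -> str:
    return next(_wrr)

# Least connection: pick the server with the fewest active connections.
def least_connection(active: dict) -> str:
    return min(active, key=active.get)

# IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Least response time and least bandwidth follow the same shape as `least_connection`, with a different metric passed in.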
How do you handle load balancer failures?
Add a redundant load balancer that takes over when the other one fails. Use multiple A records so that browsers auto-resolve to the first working LB.
What are three schemes for cache coherence and cache invalidation?
- Write-through caching
- Write-around caching
- Write-back caching
What is write-through caching?
Writes go to the database and cache at the same time.
What are the advantages and drawbacks of write-through caching?
Cache coherence is guaranteed and no data is lost if the cache fails, but write latency is higher because every write must complete in two places.
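A minimal sketch of write-through caching, assuming a plain dictionary stands in for the database. The class name `WriteThroughCache` is hypothetical.

```python
class WriteThroughCache:
    def __init__(self, database: dict):
        self.database = database  # stand-in for the real backing store
        self.cache = {}

    def write(self, key, value):
        # Both writes finish before the client is acknowledged.
        self.database[key] = value
        self.cache[key] = value

    def read(self, key):
        if key not in self.cache:  # miss: fill the cache from the database
            self.cache[key] = self.database[key]
        return self.cache[key]
```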
What is write-around caching?
Writes go first to the database, then the cache is invalidated and the next request leads to a fetch.
What are the advantages and drawbacks of write-around caching?
Reduces write latency and the risk of the cache being flooded with writes, but leads to more cache misses, which slow down reads of recently written data.
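A minimal sketch of write-around caching under the same assumption that a dictionary stands in for the database. The class name and the `misses` counter are hypothetical and exist only to make the extra cache miss visible.

```python
class WriteAroundCache:
    def __init__(self, database: dict):
        self.database = database  # stand-in for the real backing store
        self.cache = {}
        self.misses = 0

    def write(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)  # invalidate any stale cached copy

    def read(self, key):
        if key not in self.cache:  # first read after a write always misses
            self.misses += 1
            self.cache[key] = self.database[key]
        return self.cache[key]
```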
What is write-back caching?
Writes go first to the cache, are immediately confirmed with the client, then later (perhaps in batches) to the database.
What are the advantages and drawbacks of write-back caching?
Very low perceived write latency and high throughput, but unflushed writes are lost if the cache fails, and the cache can become incoherent with the database, especially if the database ultimately rejects the write.
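A minimal sketch of write-back caching, again with a dictionary standing in for the database. The class name, the `dirty` set, and the explicit `flush` method are assumptions; a real system would typically flush on a timer or when entries are evicted.

```python
class WriteBackCache:
    def __init__(self, database: dict):
        self.database = database  # stand-in for the real backing store
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet persisted

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)  # acknowledged now, persisted later

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.database[key]
        return self.cache[key]

    def flush(self):
        # Persist all pending writes in one batch.
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()
```

Until `flush` runs, the database still holds the old value, which is the window in which a cache failure loses data.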
What are the six most common cache eviction policies?
- First In First Out (FIFO)
- Last In First Out (LIFO)
- Least Recently Used (LRU)
- Most Recently Used (MRU)
- Least Frequently Used (LFU)
- Random Replacement (RR)
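As one example of the policies above, LRU can be sketched with the standard library's `collections.OrderedDict`. The class name `LRUCache` and its method names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
```

Changing `popitem(last=False)` to `popitem(last=True)` before inserting would approximate MRU instead.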
What are the three most popular data partitioning schemes?
- Horizontal partitioning
- Vertical partitioning
- Directory-based partitioning
What is directory-based partitioning?
Combining horizontal and vertical partitioning by using a lookup server that can be dynamically changed to point a DB key/tuple to a new database instance.
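The lookup server can be pictured as a mutable mapping from key to database instance. The sketch below is an in-memory stand-in under that assumption; the class name `ShardDirectory` and the shard names are hypothetical, and moving the underlying data is left out.

```python
class ShardDirectory:
    def __init__(self):
        self.locations = {}  # key -> name of the database instance holding it

    def assign(self, key, shard: str):
        self.locations[key] = shard

    def lookup(self, key) -> str:
        return self.locations[key]

    def move(self, key, new_shard: str):
        # Only the pointer changes here; callers keep using lookup().
        self.locations[key] = new_shard
```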
What are the four most common data partitioning criteria?
- Key or hash-based partitioning
- List partitioning
- Round-robin partitioning
- Composite partitioning
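The first three criteria can each be sketched as a function from a record to a partition number. The function names and the region table are hypothetical; composite partitioning would chain two of these functions.

```python
import hashlib


def hash_partition(key: str, num_partitions: int) -> int:
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# List partitioning: each partition owns an explicit list of values.
REGION_TO_PARTITION = {"Iceland": 0, "Norway": 0, "Brazil": 1}  # assumed table


def list_partition(region: str) -> int:
    return REGION_TO_PARTITION[region]


def round_robin_partition(row_index: int, num_partitions: int) -> int:
    # The i-th row goes to partition i mod n.
    return row_index % num_partitions
```

Note that with `hash_partition`, changing `num_partitions` remaps most keys, which is one source of the rebalancing problem.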
What are the three most common problems with data partitioning?
- Joins and denormalization
- Referential integrity
- Rebalancing
What are four use cases for proxies in handling requests?
- Filtering requests
- Logging requests
- Transforming requests (adding/removing headers, encrypting/decrypting, compressing)
- Serving the request from its cache (ideally shared across users)
What is a reverse proxy?
A proxy that forwards a client's request to one or more servers that serve that request. The resources appear to the client as if they originated from the proxy itself.
What are the four most common types of NoSQL databases?
- Key-value stores
- Document databases
- Wide-column databases
- Graph databases
What is a document database?
A NoSQL database where data is stored in documents which are grouped together in collections (like tables). Unlike rows in SQL, documents can have entirely different schemas.
What are two commonly used document databases?
MongoDB, CouchDB
What is a wide-column database?
A NoSQL database where data is stored in dynamically generated columns and where each row can have any set of columns. Like a spreadsheet.
What are wide-column and columnar databases best for?
Analyzing large datasets, since each column can be analyzed independently and they fit better into pages, caches, etc.