Systems Design Foundations Flashcards
What are the 6 steps for solving a Systems Design problem?
HelloInterview
Start with what is interesting about this problem when possible.
Requirements
NFRs
Entities
API
Step Through design
Deep Dives/NFRs
IK
1. Define Requirements
2. Define Services
3. Define Architecture
4. Define Microservice
____4a. Define Service Anatomy
____4b. Define Scaling
____4c. Define Distributed System
What are the 6 concepts that should be considered for scaling in a Systems Design problem?
- Storage: Size of data is so big that it cannot fit on a single machine.
- Throughput: If the number of requests per second is too huge, need to scale for throughput.
- Latency: If the response time is too high because of bulky processing, need to parallelize the computation.
- Availability: Availability / Reliability in the face of faults
- Geolocation: Minimize network latency by using multiple servers at different locations.
- Hotspot: Disproportionately high load on a section of the data.
What are the types of systems?
- Online
- Batch
- Stream
What data structures and concepts can you use to help solve algorithm problems?
- Decrease and Conquer
____* BST (sorted Tree)
____* Two Pointer - Divide and Conquer
____* Merge Sort - Transformation and Conquer
____* Balanced Tree
____* Map
____* Presorted list
How many requests per second is 1 billion requests a month?
385 clicks per second. Just round this to 400 to make the math easier.
How many requests per second is 1 billion requests a year?
30 clicks per second.
What are the powers of 2^n from 0 to 15?
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
2^7 = 128
2^8 = 256
2^9 = 512
2^10 = 1,024
2^11 = 2,048
2^12 = 4,096
2^13 = 8,192
2^14 = 16,384
2^15 = 32,768
2^16 = 65,536
2^17 = 131,072
2^18 = 262,144
2^19 = 524,288
2^20 = 1,048,576
What are the powers of 2^n for 10, 20, 30, 40, 50?
- 2^10 = 1024
- 2^20 is approx 1 million
____* So we need 20 bits to represent 1 million - 2^30 is approx 1 billion
____* So we need 30 bits to represent 1 billion - 2^40 is approx 1 trillion
____* So we need 40 bits to represent 1 trillion - 2^50 is approx 1 quadrillion
____* So we need 50 bits to represent 1 quadrillion
What are the stats of a standard commodity machine?
Storage RAM
* 128 GB RAM
Storage Disk Space
* 1 to 2 TB of disk space
* Design for 3 years of data
Throughput
* 8 to 12 cores
____* 8 threads per core
____* 30% utilization
____* throughput = 30,000 / x where x is the time to finish the average request.
____* Num Servers = (req / sec * avg req time) / 30,000
Latency
* gigabit connection is 1 billion bits per second or ~ 125 million bytes per second (1 billion / 8)
When comparing throughput vs storage, when can you argue that you don’t need a cache?
If the storage layer needs more servers to handle the storage requirements than the app or storage layer needs for throughput. i.e. If you have enough servers to handle throughput, you might not need a cache. This isn’t the only reason why you would need a cache though.
If throughput is governing scalability, you can potentially use the cache to reduce the cost of the storage layer and the overall system.
O(?) of what constant is approximately 100 ms.
100 ms: O(10,000,000)
1 ms: O(100,000)
What are the popular distributed file storage tools?
HDFS, AWS S3, DropBox
Which databases allow for high writes?
Cassandra, BigTable, Dynamo were all created based on the BigTable white paper from Google.
What architecture do high-write databases (e.g., Cassandra, BigTable, Dynamo) employ?
Log-Structured Merge (LSM) trees.
* Key idea is that it is an append only datastore which means reads are more complicated because it needs to merge across multiple SSTables for read.
* First writes to a Write Ahead Log (WAL) to ensure durability of data in case of crash.
* Then rights to mutable memtable (sorted tree)
* When tree is full it is flushed into an Sorted String Table (SSTable) which is immutable and stored in sequence on disk.
What is the difference between Hash-Based partitioning and Range Based Partitioning?
Hash-Based Partitioning
Definition: Each key and all its associated data is fully stored on one instance.
Use Case: This method is commonly used in distributed databases like Cassandra or MongoDB.
Pros: Data locality, Consistency, Scalability
Cons: Hotspots, Consistent Hashing is needed for rebalancing.
Range-Based Partitioning
Definition: The data for a single key is split across multiple instances, where each instance holds part of the data for that key. Done based on a value in the data such as a time range or another column.
Use Case: This approach is used in distributed systems that need to handle very large objects or where partitioning based on certain ranges of data is essential.
Pros: Can handle very large data sets and parallelize access to a single key.
Cons: Expensive loss of throughput, complicated merging of a single doucment.
What is the rule of thumb on cache size based on Zipf’s Law?
For a 90% cache hit rate, you need 10% of the most frequently used data in the cache.
Define SLI, SLO, and SLA
Service Level Indicator (SLI): is a specific metric that helps companies measure some aspect of the level of services to their customers. e.g. Request Latency, Error Rate, System Throughput, Availability, Yield, Durability
Service Level Objective (SLO): Target value/range for an SLI
Service Level Agreement (SLA): Explicit contract with users on what the SLOs are and what the consequences are for missing.
Different Caching Strategies?
Pass-Through/Write-Through: App code sends to cache, the cache owns writing to disk.
Cache-Aside: App code handles writing to data store/disk and to cache.
Write-back: App code writes to cache, cache immediately responds back to app code and asych writes to data store/disk.