System Design - Caching Flashcards
Approach to System Design Interviews
- Requirements clarification
- System interface definition
- Back-of-the-envelope estimation
- Defining the data model
- High-level design
- Component design
- Identifying and resolving bottlenecks
Application Server Cache - Pros and Cons
Pros - Placing the cache directly on a request layer node enables local storage of response data. The cache can live in memory (fastest) or on the node's local disk, which is still faster than going to network storage.
Cons - If the request layer is expanded to multiple nodes, each node can still keep its own local cache, but if the load balancer distributes requests randomly, the same request will land on different nodes, increasing cache misses.
How does a distributed cache work?
Each node owns part of the cached data. Typically the cache is divided up using a consistent hashing function, so a request node can compute which cache node holds a given key and ask that node directly before falling back to the origin.
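A minimal sketch of the idea in Python, assuming MD5 as the hash function and made-up node names; virtual nodes are a common refinement that spreads keys more evenly:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Maps keys to cache nodes so that adding or removing a node
    only remaps a small fraction of the keys."""

    def __init__(self, nodes, vnodes=100):
        points = []
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the spread
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns it.
        idx = bisect_right(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
ring.node_for("user:42")  # every request node computes the same owner
```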
Distributed Cache - Pros and Cons
Pros - Easy to increase cache space by adding additional nodes.
Cons - If a node disappears, you lose that part of the cache. Replication (storing multiple copies of the cache on different nodes) mitigates this but adds complexity. The impact is softened because a request that misses can still retrieve the data from the origin.
Global Cache - Pros and Cons
All the nodes use a single cache space.
Pros - Effective for architectures that use specialized hardware or have a fixed data set that needs to be cached.
Cons - Easy to overwhelm the cache as the number of clients and requests increases.
Global Cache - Common Architectures
Two common setups:
1. Requests talk to the cache, and the cache itself is responsible for retrieving missing data from the database. This is the most common architecture.
2. Requests talk to the cache first; if the data is not there, the request node queries the database directly. This makes more sense when the cache holds very large files, where a low cache hit percentage would otherwise make the cache itself a bottleneck as it fetched every miss. It also makes sense when the data in the cache is static and shouldn't be evicted.
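A rough sketch of the second architecture's read path; `cache` and `db` are hypothetical clients with `get`/`set` and `query` methods:

```python
def read(key, cache, db):
    value = cache.get(key)
    if value is None:          # cache miss
        value = db.query(key)  # the request node, not the cache,
                               # is responsible for hitting the origin
        cache.set(key, value)  # optionally populate for later readers
    return value
```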
Strategies for Cache Invalidation
Write-through cache
Data is written to the cache and the corresponding database at the same time. Minimizes the risk of data loss but adds latency because every write operation is done twice before returning success to the client.
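A minimal write-through sketch with hypothetical `cache` and `db` clients; note that both writes finish before the client is acknowledged:

```python
def write_through(key, value, cache, db):
    db.write(key, value)   # persist first so the cache never holds
    cache.set(key, value)  # data the durable store doesn't have
    return "OK"            # success only after both writes complete
```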
Write-around cache
Data is written directly to the permanent store, bypassing the cache; any stale copy in the cache is invalidated. The advantage is that it avoids filling the cache with writes that may never be read. The disadvantage is that the next read of recently written data results in a cache miss and must go to the backing store.
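A write-around sketch under the same assumed clients; the cache is only told to drop its stale copy:

```python
def write_around(key, value, cache, db):
    db.write(key, value)  # write goes straight to permanent storage
    cache.delete(key)     # invalidate any stale copy; next read misses
    return "OK"
```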
Write-back cache
Data is written to the cache alone and completion is immediately confirmed to the client. The write to permanent storage happens at specified intervals or under certain conditions. Advantages are low latency and high throughput for write-intensive applications; the trade-off is the risk of data loss if the cache crashes before its contents are persisted.
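A simplified write-back sketch, assuming the host application calls `flush()` on a timer or under memory pressure; `db` is again a hypothetical durable store:

```python
class WriteBackCache:
    """Write-back (write-behind) sketch: writes hit only the cache and
    are acknowledged immediately; dirty entries are flushed later.
    Unflushed entries are lost if the process crashes."""

    def __init__(self, db):
        self._db = db        # hypothetical durable store with .write()
        self._data = {}
        self._dirty = set()  # keys written but not yet persisted

    def set(self, key, value):
        self._data[key] = value
        self._dirty.add(key)  # defer the database write
        return "OK"           # client sees success right away

    def get(self, key):
        return self._data.get(key)

    def flush(self):
        # Called periodically by the host application.
        for key in list(self._dirty):
            self._db.write(key, self._data[key])
            self._dirty.discard(key)
```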
Cache Eviction Policies
- FIFO (First In First Out)
- LIFO (Last In First Out)
- LRU (Least Recently Used)
- MRU (Most Recently Used)
- LFU (Least Frequently Used)
- Random Replacement
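As one concrete example, LRU can be sketched in a few lines of Python using `OrderedDict` (minimal and not thread-safe):

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = OrderedDict()  # key order tracks recency of use

    def get(self, key):
        if key not in self._items:
            return None               # miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict least recently used
```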