System Design Week 2 - Distributed Cache Flashcards
Problem 1) Distributed Cache
Question 1) List the functional requirements
- put(key, value)
- get(key)
- Support an LRU eviction policy (or make the policy configurable)
- Support persistence.
- Ignore security
Problem 1) Distributed Cache
Question 2) List the non-functional requirements
- Scalable
- Available
- High Performance
Problem 1) Distributed Cache
Question 3) List the microservices covering all functional requirements
- Cache client - Not really a service, but a library
- Cache Server - A collection of server components that manage the cache.
Problem 1) Distributed Cache
Question 4) Logical Diagram & flow of data
Diagram: https://drive.google.com/file/d/1H_oF4bNQDzG57k0cl2MPgun4gLNbv_U-/view?usp=sharing
Problem 1) Distributed Cache
Question 5) Schema Design
Concrete schema is not required here.
Data persistence:
Main memory is volatile, so the data must also be stored on secondary storage (disk).
● Two common techniques
- Append-only file/log
■ Data is flushed to disk on every write, or
■ Data is flushed to disk at a fixed time interval (e.g., 1 sec)
- Snapshot
■ A snapshot of the entire dataset is created at a configurable interval
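The append-only technique above can be sketched as follows. This is a minimal illustration, not a production design: the class and method names (AppendOnlyLog, append_put, replay) and the JSON-lines record format are assumptions for the example, and the "every 1 sec" flush policy is only indicated by a comment.

```python
import json, os

class AppendOnlyLog:
    """Toy append-only log: every put is appended as one JSON line."""

    def __init__(self, path, fsync_policy="always"):
        self.fsync_policy = fsync_policy
        self.f = open(path, "a", encoding="utf-8")

    def append_put(self, key, value):
        self.f.write(json.dumps({"op": "put", "k": key, "v": value}) + "\n")
        self.f.flush()
        if self.fsync_policy == "always":
            # Durable on every write; with an "everysec" policy a background
            # timer would call fsync roughly once per second instead.
            os.fsync(self.f.fileno())

    @staticmethod
    def replay(path):
        """Rebuild the in-memory state after a restart by replaying the log."""
        state = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec["op"] == "put":
                    state[rec["k"]] = rec["v"]
        return state
```

A snapshot, by contrast, would serialize the whole `state` dict to disk at a configurable interval; replay then starts from the latest snapshot and applies only the log records written after it.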
Problem 1) Distributed Cache
Question 6) API Design
- put(key, value)
- get(key)
Problem 1) Distributed Cache
Question 7) Business logic for the problem statement
What is LRU?
→ A combination of a doubly linked list (DLL) and a hash table, or a LinkedHashMap in Java
● DLL nodes hold the key; the hash table maps each key to its node
● On access
- Move the key to the head of the DLL
● To evict
- Remove the key from the tail
- Delete from the Hash Table
● Update existing value
- Update the Hash Table
- Move the key to the head of the DLL
● Adding a new key
- Add to Hash Table
- Add the key to the head of the DLL
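The access/evict/update/add steps above can be sketched as a hash table mapping keys to DLL nodes, with the most recently used key at the head. Names here (LRUCache, _Node) are illustrative, not from the source.

```python
class _Node:
    """Doubly-linked-list node holding a key/value pair."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                        # key -> _Node
        self.head = _Node()                  # sentinel: most recently used end
        self.tail = _Node()                  # sentinel: least recently used end
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        self._unlink(node)                   # on access: move to the head
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node is not None:                 # update: new value, move to head
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:   # evict: remove from the tail
            lru = self.tail.prev
            self._unlink(lru)
            del self.map[lru.key]            # and delete from the hash table
        node = _Node(key, value)             # add: hash table + head of DLL
        self.map[key] = node
        self._push_front(node)
```

Both `get` and `put` are O(1): the hash table gives direct access to the node, and the sentinel head/tail make unlink and push-front constant-time pointer updates.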
Problem 1) Distributed Cache
Question 8) Design Consideration for the problem statement
https://drive.google.com/file/d/1fN2-Lui7LEgOg89PfgGhHYbNsS1hpwQs/view?usp=sharing
and
https://drive.google.com/file/d/1BJTqYpEEweBbf4gl_2KuE3LBHalZhlHd/view?usp=sharing
The same checklist of scaling questions can be posed in any interview:
○ Need to scale for storage (storage and cache tiers) - Yes
○ Need to scale for throughput (CPU/IO) - Yes
○ Need to scale for API parallelization - No
○ Need to remove hotspots - Maybe
○ Availability and Geo-distribution - Yes
Problem 2) How to Scale
Question 1) Replication
- Copy the data to multiple read replicas
- All writes are handled by the primary; reads can be served by any replica
- Strong consistency or weak/eventual consistency?
■ When the primary sends the write acknowledgement back to the user depends
on this choice.
■ Strong consistency adds the complexity of two-/three-phase commit or Paxos.
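The ack-timing trade-off can be shown with a toy in-process primary/replica pair. This is a sketch under heavy assumptions (no network, no failures, names Primary/Replica invented for the example): in sync mode the write is applied to every replica before acking, in async mode the primary acks immediately and replicates later.

```python
class Replica:
    """In-process stand-in for a read replica."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

class Primary:
    def __init__(self, replicas, sync=True):
        self.data = {}
        self.replicas = replicas
        self.sync = sync
        self.pending = []                      # writes not yet replicated (async mode)

    def put(self, key, value):
        self.data[key] = value
        if self.sync:
            for r in self.replicas:            # strong path: replicate, then ack
                r.apply(key, value)
        else:
            self.pending.append((key, value))  # eventual path: ack now, replicate later
        return "ack"

    def flush(self):
        """Background replication step for the async (eventual) mode."""
        for key, value in self.pending:
            for r in self.replicas:
                r.apply(key, value)
        self.pending.clear()
```

In the async mode a replica can serve stale reads until `flush` runs; that staleness window is exactly what "eventual consistency" means here, and closing it (acking only after all replicas apply) is what 2PC/3PC/Paxos-style protocols coordinate under failures.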
Problem 2) How to Scale
Question 2) Sharding
Distribute data across multiple cache servers
- Consistent hashing or equivalent techniques to distribute keys to hosts.
- How do the clients know which server to get the data from?
■ Central router
● Client connects to central router
● Router knows which node has which keys, routes the request
■ Decentralized
● Client knows all servers (but not the mapping of keys)
● Client picks a server randomly or round robin etc
● Every server knows which keys map to which node and routes the
requests appropriately.
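The consistent-hashing step mentioned above can be sketched as a hash ring with virtual nodes. This is an illustrative implementation, not the source's: the names (HashRing, get_node), the use of MD5, and the virtual-node count are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each server owns many points ("virtual
    nodes") on the ring; a key is served by the first point clockwise."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []                     # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):       # place vnodes points for this server
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, ""))   # first point clockwise of h
        if idx == len(self.ring):                 # wrap around the ring
            idx = 0
        return self.ring[idx][1]
```

With this scheme, adding or removing one server only remaps the keys between its points and their clockwise neighbors, instead of reshuffling every key as naive `hash(key) % N` would; a central router and a decentralized design can both use the same ring to answer "which server owns this key".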