System Design Week 2 - Distributed Cache Flashcards

1
Q

Problem 1) Distributed Cache

Question 1) List down the functional requirements

A
  • put(key, value)
  • get(key)
  • Support LRU Eviction policy (or make it configurable)
  • Support persistence.
  • Ignore security
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Problem 1) Distributed Cache

Question 2) List down the non functional requirements

A
  • Scalable
  • Available
  • High Performance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Problem 1) Distributed Cache

Question 3) List down the microservices covering all functional requirements

A
  • Cache client - Not really a service, but a library
  • Cache Server - A collection of server components that manage the cache.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Problem 1) Distributed Cache

Question 4) Logical Diagram & flow of data

A

Diagram: https://drive.google.com/file/d/1H_oF4bNQDzG57k0cl2MPgun4gLNbv_U-/view?usp=sharing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Problem 1) Distributed Cache

Question 5) Schema Design

A

Concrete schema is not required here.

Data persistence :

Main memory is volatile, so we need to store the data on the
secondary memory.

● Two common techniques
- Append only file/logs
■ Data is written to disk on every write
■ Data is written to disk at a fixed time interval (1 sec)
- Snapshot
■ Snapshot of entire data is created at configurable time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Problem 1) Distributed Cache

Question 6) API Design

A
  • put(key, value)
  • get(key)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Problem 1) Distributed Cache

Question 7) Business logic for the problem statement

A

What is LRU ?
→ A combination of Doubly Linked List and a Hash Table, or a LinkedHashMap
● DLL nodes will have the key
● On access
- Move the key to the head of the DLL
● To evict
- Remove the key from the tail
- Delete from the Hash Table
● Update existing value
- Update the Hash Table
- Move the key to the head of the DLL
● Adding a new key
- Add to Hash Table
- Add the key to the head of the DLL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Problem 1) Distributed Cache

Question 8) Design Consideration for the problem statement

A

https://drive.google.com/file/d/1fN2-Lui7LEgOg89PfgGhHYbNsS1hpwQs/view?usp=sharing

and

https://drive.google.com/file/d/1BJTqYpEEweBbf4gl_2KuE3LBHalZhlHd/view?usp=sharing

A deterministic set of reasons can be posed across all interviews
○ Need to scale for storage (storage and cache tiers) - Yes
○ Need to scale for throughput (CPU/IO) - Yes
○ Need to scale for API parallelization - No
○ Need to remove hotspots - Maybe
○ Availability and Geo-distribution - Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Problem 2) How to Scale

Question 1) Replication

A
  • Copy the data on multiple read replicas
  • All writes could be handled by primary, reads could be handled by any replicas
  • Strong consistency or Weak/Eventual consistency?

■ Once write is done by primary, when do you send ack back to the user would
change based on this.

■ Additional complexity of 2 or 3 phase commit or paxos for strong
consistency.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Problem 2) How to Scale

Question 2) Sharding

A

Distribute data across multiple cache servers

  • Consistent hashing or equivalent techniques to distribute keys to hosts.
  • How do the clients know which server to get the data from?
    ■ Central router
    ● Client connects to central router
    ● Router knows which node has which keys, routes the request
    ■ Decentralized
    ● Client knows all servers (but not the mapping of keys)
    ● Client picks a server randomly or round robin etc
    ● Every server knows which keys maps to which node, routes the
    requests appropriately.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly