System Design - Top K Songs Spotify Flashcards

1
Q

What are the functional requirements?

A
  1. Clients can query for the top K songs.
  2. Time periods can be 1 hour, 1 day, 1 month, or all-time.

Also consider the mock-interview variant where the time period is arbitrary, e.g. from minute N to minute N+M.

2
Q

Non-functional requirements?

A
  1. At most a one-minute delay between a song play and it being reflected in the stats.
  2. Results should be precise (exact counts, not approximations).
  3. Should handle massive write traffic (millions of song plays per second).
  4. Support a massive number of users.
  5. Return results within tens of milliseconds.
  6. System should be economical; it shouldn't require 10K servers to solve this problem.
3
Q

Scale estimates

A

70 B views per day / (~100K seconds per day) ≈ 700K transactions per second

Videos:
~3.6 B videos (round up to 4 B for estimation)

Storage for the (video_id, view_count) table =
4 B videos × (8 bytes for ID + 8 bytes for view_count) = 64 GB
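The back-of-envelope math above can be checked directly (the ~100K seconds/day shortcut slightly understates the exact per-second rate):

```python
# Scale-estimate check, using the card's assumptions:
# ~70B views/day, ~4B distinct videos, 8-byte IDs and 8-byte counters.

SECONDS_PER_DAY = 86_400          # approximated as ~100K for mental math

views_per_day = 70e9
tps = views_per_day / SECONDS_PER_DAY
print(f"{tps:,.0f} views/sec")    # ~810K exact; ~700K with the 100K shortcut

videos = 4e9
bytes_per_entry = 8 + 8           # video_id + view_count
storage_gb = videos * bytes_per_entry / 1e9
print(f"{storage_gb:.0f} GB")     # 64 GB
```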

4
Q

A good step-by-step approach to the system design. Explain this to the interviewer.

A

Generate a basic (but not scalable) solution to the all-time top K problem.

Solve the primary issues of our basic solution.

Add a solution for the time period inputs.

Deep dive remaining bottlenecks until we run out of time.

5
Q

Core Entities

A

Video
View
Time Window

6
Q

API / Interface

A

Just need an API to retrieve the top K videos:

GET /videos/top?window=WINDOW&k=K

Response: {
  videos: [
    { video_id: 1, views: 100 },
    { video_id: 234, views: 99 }, …
  ]
}
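
A minimal handler sketch for this endpoint; `fetch_top_k` is a hypothetical stand-in for whatever reads the precomputed results (the names here are illustrative, not from the card):

```python
# Hypothetical handler for GET /videos/top?window=WINDOW&k=K.

VALID_WINDOWS = {"hour", "day", "month", "all"}

def fetch_top_k(window: str, k: int):
    # placeholder returning canned data; the real service reads a heap/cache
    return [(1, 100), (234, 99)][:k]

def handle_top_videos(window: str, k: int = 10) -> dict:
    if window not in VALID_WINDOWS:
        raise ValueError(f"unknown window: {window}")
    top = fetch_top_k(window, k)  # [(video_id, views), ...] sorted descending
    return {"videos": [{"video_id": vid, "views": views} for vid, views in top]}
```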

7
Q

What is a simple basic solution? Don’t worry about bottlenecks and scale yet.

A
  1. A hash table (the Counts table) maps video_id → view count.
  2. A min-heap holds the top 1000 (video_id, count) entries.
  3. When a video is viewed, a Kafka consumer increments its counter in the table.
  4. If the video is already in the heap, update its count there; otherwise compare its count against the heap's minimum, and if greater, pop the min and insert the new entry.
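
A small sketch of this counts-table-plus-min-heap idea (K shrunk for illustration; stale heap entries are handled lazily, which is one common way to allow in-place count updates):

```python
import heapq
from collections import defaultdict

K = 3  # 1000 in the design; small here for illustration

counts = defaultdict(int)   # the Counts table: video_id -> views
heap = []                   # min-heap of (count, video_id); may hold stale entries
members = set()             # video_ids currently in the top-K

def _compact():
    # drop heap entries that are stale (count changed) or belong to evicted videos
    while heap and (heap[0][1] not in members or heap[0][0] != counts[heap[0][1]]):
        heapq.heappop(heap)

def record_view(video_id):
    counts[video_id] += 1
    c = counts[video_id]
    if video_id in members:
        heapq.heappush(heap, (c, video_id))  # lazy update; old entry goes stale
        return
    if len(members) < K:
        members.add(video_id)
        heapq.heappush(heap, (c, video_id))
        return
    _compact()
    if c > heap[0][0]:                       # beats the current minimum?
        _, evicted = heapq.heappop(heap)
        members.discard(evicted)
        members.add(video_id)
        heapq.heappush(heap, (c, video_id))

def top_k():
    _compact()
    live = {vid: cnt for cnt, vid in heap if vid in members and cnt == counts[vid]}
    return sorted(live.items(), key=lambda kv: -kv[1])
```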
8
Q

How to scale the simple solution

A

Run multiple replicas of the simple solution and serve reads from read replicas. We can also snapshot the in-memory state periodically.
Problems remain with scaling the writes. Also, after a failure, a replica needs to catch up quickly (by replaying from Kafka since its last snapshot).

9
Q

How can we scale the writes?

A

Create a number of shards, P. Each shard is its own cluster (with a leader and replicas). Each shard is assigned a range of video IDs (its keyspace); this can be done using consistent hashing.

There will be a microservice, Top K, that queries each shard for its top 1000, then merges the results.

Use ZooKeeper to monitor the shards.
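
The merge step in the Top K service can be sketched as below; `query_shard` is a hypothetical stand-in for the RPC to a shard. Since each video lives on exactly one shard, there is no double counting, and the global top K is just the K largest entries across all shard results:

```python
import heapq

def query_shard(shard_id: int):
    # stand-in for an RPC to shard `shard_id`; returns [(video_id, views), ...]
    canned = {0: [(1, 500), (7, 300)], 1: [(9, 450), (2, 100)]}
    return canned[shard_id]

def global_top_k(num_shards: int, k: int):
    merged = []
    for s in range(num_shards):
        merged.extend(query_shard(s))     # fan out to every shard
    # take the k largest by view count across all shards' local top-K lists
    return heapq.nlargest(k, merged, key=lambda x: x[1])
```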

10
Q

How to handle time windows?

A

In each shard, keep 4 heaps (one per time window: hour, day, month, all-time).

Example for the 1-hour window: run a second consumer whose job is to decrement views that have aged out of the window (i.e., are now older than 1 hour).

That consumer uses the following logic:

Since the stream has timestamps, simply pause reading when the next event's timestamp > NOW() - window, and resume once that is no longer true. Store the offsets in your checkpoints.

Note this means the stream needs a retention period of at least one month (the longest non-all-time window).

Also, the heaps should be oversized (hold more than K entries), since decrements can drop items out of the top K and something below must take their place.
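
A minimal sketch of this decrementing consumer, assuming each stream event is a `(timestamp, video_id)` pair and modeling the stream as an in-memory deque (the names are illustrative, not a real Kafka API). The consumer trails real time by exactly the window: an event is consumed, and its view decremented, only once it is older than the window:

```python
import time
from collections import defaultdict, deque

WINDOW = 3600  # seconds in the 1-hour window

hour_counts = defaultdict(int)   # counts backing the 1-hour heap

def run_decrementer(stream: deque, now=time.time):
    while stream:
        ts, video_id = stream[0]
        if ts > now() - WINDOW:
            break  # newest events haven't aged out yet; pause and retry later
        stream.popleft()             # in Kafka terms: advance the checkpointed offset
        hour_counts[video_id] -= 1   # view no longer counts toward the last hour
```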

11
Q

How to handle lots of reads for the Top K microservice?

A

Cache the results in Redis; most requests will hit the cache. Every minute, query the shards, merge their results, and overwrite the cached value.
