System Design - Top K Songs Spotify Flashcards
What are the functional requirements?
- Client can query for the top K songs
- Time periods can be 1 hour, 1 day, 1 month, all-time
In a mock interview, also consider supporting an arbitrary time period (minute N to minute N+M)
Non functional requirements?
- At most a one-minute delay between a song play and it reflecting in the stats
- Results should be precise
- Should be able to handle massive traffic (millions of song plays per second)
- Support a massive number of users
- Return results within tens of milliseconds
- System should be economical. It shouldn’t require 10K servers to solve this problem
Scale estimates
70 B views per day / (100K seconds per day) = 700K transactions per second
Videos:
~3.6 B videos (round up to 4 B)
Storage for (video_id, view_count) pairs =
4 B videos x (8 bytes for ID + 8 bytes for view_count) = 64 GB
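The arithmetic above can be checked with a quick back-of-envelope script (a sketch; the inputs are the rounded figures from the estimates):

```python
# Throughput: 70 B views/day over ~100K seconds/day (86,400 rounded up).
views_per_day = 70e9
seconds_per_day = 100_000
tps = views_per_day / seconds_per_day       # 700,000 views/sec

# Storage: ~4 B videos, 8-byte ID + 8-byte counter each.
videos = 4e9
bytes_per_entry = 8 + 8
storage_gb = videos * bytes_per_entry / 1e9  # 64 GB

print(f"{tps:,.0f} views/sec, {storage_gb:.0f} GB")
```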
A good approach to solving system design step by step. Explain this to the interviewer:
Generate a basic (but not scalable solution) to the all-time top K problem.
Solve the primary issues of our basic solution.
Add a solution for the time period inputs.
Deep dive remaining bottlenecks until we run out of time.
Core Entities
Video
View
Time Window
API / Interface
Just need an API to retrieve top K views
GET /videos/top?window=WINDOW&topk=K
Response: {
  "videos": [
    { "video_id": 1, "views": 100 },
    { "video_id": 234, "views": 99 },
    ...
  ]
}
What is a simple basic solution? Don’t worry about bottlenecks and scale yet.
- A hash table (the Counts table) maps video_id → count.
- Also keep a min-heap that holds the top 1000 video counts.
- When a video is viewed, a Kafka consumer increments its counter in the table.
- Then compare the video’s count against the smallest item in the heap. If the video is already in the heap, update its count; otherwise, if its count is greater than the min, pop the min and insert the new value.
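A minimal single-process sketch of this counts-table-plus-heap approach. Since Python’s `heapq` has no decrease-key, stale heap entries are handled with lazy deletion; the class name `TopKCounter` is illustrative, not from the original:

```python
import heapq
from collections import defaultdict

class TopKCounter:
    """All-time top-K: exact counts plus a bounded min-heap of the current leaders."""
    def __init__(self, k):
        self.k = k
        self.counts = defaultdict(int)  # video_id -> total views
        self.heap = []                  # min-heap of (count, video_id); may hold stale entries
        self.members = {}               # video_id -> latest count pushed for it

    def _drop_stale(self):
        # Entries whose count no longer matches the latest push are lazy-update leftovers.
        while self.heap and self.members.get(self.heap[0][1]) != self.heap[0][0]:
            heapq.heappop(self.heap)

    def record_view(self, video_id):
        self.counts[video_id] += 1
        c = self.counts[video_id]
        if video_id in self.members or len(self.members) < self.k:
            # Already a leader (lazy update) or heap not yet full: just push.
            heapq.heappush(self.heap, (c, video_id))
            self.members[video_id] = c
            return
        self._drop_stale()
        if self.heap and c > self.heap[0][0]:
            # Evict the current minimum and admit the new leader.
            _, evicted = heapq.heappop(self.heap)
            del self.members[evicted]
            heapq.heappush(self.heap, (c, video_id))
            self.members[video_id] = c

    def top_k(self):
        return sorted(self.members.items(), key=lambda kv: -kv[1])
```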
How to scale the simple solution
Have multiple replicas of the simple solution as read replicas. We can also take periodic snapshots of the in-memory state.
There are still problems scaling the writes. Also, after a failure, a restarted node needs to catch up quickly (by restoring a snapshot and replaying from Kafka).
How can we scale the writes?
Create a number of shards, P. Each shard will be its own cluster (with a leader and replicas). Each shard is assigned a range of video IDs (its keyspace); this can be done using consistent hashing.
There will be a microservice, Top K, that queries each shard for its top 1000 and merges the results.
Use ZooKeeper to monitor the health and membership of the shards.
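The merge step of the Top K service can be sketched as below. Because each video ID lives on exactly one shard, the per-shard lists never double-count a video, so a simple n-largest over the concatenated lists is correct:

```python
import heapq

def merge_top_k(shard_results, k):
    """Merge per-shard top-k lists into a global top-k.

    shard_results: list of per-shard lists of (video_id, views).
    Each video appears on exactly one shard, so no deduplication is needed.
    """
    all_entries = [entry for shard in shard_results for entry in shard]
    return heapq.nlargest(k, all_entries, key=lambda entry: entry[1])
```

Example: `merge_top_k([[("a", 100), ("b", 50)], [("c", 70)]], 2)` returns `[("a", 100), ("c", 70)]`.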
How to handle time windows?
In each shard, we’ll have 4 heaps (one for each time window: hour, day, month, all-time).
Example for the 1-hour window: have another consumer whose job is to decrement views that are now older than 1 hour.
That consumer will have the following logic:
Lag the stream by the window length. Since the stream has timestamps, simply pause reading while the latest timestamp > NOW() - window (the event is still inside the window), and resume consuming once that’s no longer true. Store the offsets in your checkpoints.
Note this means we need Kafka retention of at least one month (the largest non-all-time window).
Also, we should size the heaps larger than K, since decrements can knock items out of the top and we need spare candidates to replace them.
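The lagging-consumer logic above could be sketched like this. It is a simplification: `consumer` and `counter` are hypothetical stand-ins for a Kafka consumer (yielding timestamped events, supporting offset commits) and the shard’s count store:

```python
import time

WINDOW = 3600  # 1-hour window, in seconds

def run_decrement_consumer(consumer, counter):
    """Lag the play stream by WINDOW: an event is consumed (and its view
    decremented) only once it has fallen out of the window."""
    for ts, video_id in consumer:          # consumer yields (timestamp, video_id)
        # Pause while this event is still inside the window.
        while ts > time.time() - WINDOW:
            time.sleep(1)
        counter.decrement(video_id)
        consumer.commit()                   # checkpoint the offset after applying
```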
How to handle lots of reads for the Top K microservice?
Cache the results in Redis. Most of the time, the service will hit the cache. Every minute, query the shards, merge their results, and write the fresh result back to the cache.
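A cache-aside sketch of this read path, using an in-process dict with a TTL in place of Redis; `query_shards` and `merge` are stand-ins for the shard RPCs and the merge step:

```python
import time

CACHE_TTL = 60  # refresh a cached result at most once per minute

_cache = {}  # (window, k) -> (expires_at, results)

def get_top_k(window, k, query_shards, merge):
    """Serve from cache; on expiry, scatter-gather the shards and re-cache."""
    now = time.time()
    entry = _cache.get((window, k))
    if entry is None or now >= entry[0]:
        results = merge(query_shards(window, k), k)
        _cache[(window, k)] = (now + CACHE_TTL, results)
        return results
    return entry[1]
```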