Design Twitter Flashcards

Question 1

Q

Design Twitter 1

Answer

A

CLARIFY the requirements and features before designing anything

Question 2

Q

Design Twitter 2: Core Features

Answer

A

1) Tweeting
2) Timelines: user timeline (your own tweets) and home timeline (tweets from people you follow)
3) Following

Question 3

Q

Naive Solution

Answer

A

Two tables: a User table and a Tweets table
Tweets table contains ALL tweets globally, with columns ID, CONTENT, USER (a reference to the user ID)
User table contains ID, NAME
Problem = the Tweets table gets huge, and every time a user loads their home timeline we have to run a huge SELECT over the Tweets table to collect the tweets of everyone they follow
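
The naive design above can be sketched with an in-memory SQLite database. This is only an illustration: the `follows` table and the `home_timeline` helper are assumptions (the card does not say how follow relationships are stored), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tweets (id INTEGER PRIMARY KEY, content TEXT,
                         user_id INTEGER REFERENCES users(id));
    -- Hypothetical table: the card does not describe how follows are stored.
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)",
                 [(1, "hello from alice", 1),
                  (2, "carol here", 3),
                  (3, "alice again", 1)])
conn.execute("INSERT INTO follows VALUES (2, 1)")  # bob follows alice


def home_timeline(conn, user_id, limit=20):
    """Scan the global tweets table for tweets by people user_id follows."""
    return conn.execute(
        """
        SELECT t.id, t.content
        FROM tweets t
        JOIN follows f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.id DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()


bob_timeline = home_timeline(conn, 2)
```

Every home timeline load runs this join against the one global tweets table, which is the cost the card points out.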

Question 4

Q

Twitter READ features

Answer

A

Twitter is READ heavy. Care more about availability and fast reads than about consistency (it’s OK if one user sees someone’s tweet a little later than another user does)

Question 5

Q

Optimized Solution

Answer

A

A user’s tweet (a “PUT” API call) needs to land in a data center with capacity to serve that user and get distributed from there. So: tweet -> load balancer -> Redis cluster

Question 6

Q

Optimized Solution after cluster

Answer

A

Redis machines would need a huge amount of RAM to store every timeline. So store timelines only for users who have been active recently
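
One way to picture "only recently active users" is a cache that drops timelines once their owner has been idle too long. This is a minimal sketch under stated assumptions: the class name, the `window` parameter, and the use of plain numbers as timestamps are all hypothetical and do not come from the card.

```python
class ActiveTimelineCache:
    """Keeps home timelines only for users seen within `window` time units."""

    def __init__(self, window):
        self.window = window    # hypothetical inactivity cutoff
        self.last_seen = {}     # user_id -> timestamp of last activity
        self.timelines = {}     # user_id -> cached home timeline

    def touch(self, user_id, now):
        """Record activity and make sure the user has a cached timeline."""
        self.last_seen[user_id] = now
        self.timelines.setdefault(user_id, [])

    def evict_inactive(self, now):
        """Drop timelines of users idle for longer than the window."""
        stale = [u for u, seen in self.last_seen.items()
                 if now - seen > self.window]
        for user_id in stale:
            del self.last_seen[user_id]
            self.timelines.pop(user_id, None)
        return stale


cache = ActiveTimelineCache(window=100)
cache.touch("bob", now=0)
cache.touch("amy", now=90)
evicted = cache.evict_inactive(now=150)  # bob idle for 150, amy for 60
```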

Question 7

Q

What’s in the Redis cluster?

Answer

A

Redis lists. Every user has one list per Redis machine that stores their timeline. The list holds the tweets in their home timeline, each entry being a Tweet ID and a Sender ID
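
The per-user list can be sketched with a plain Python stand-in for a Redis list. This is not Redis client code: `TimelineStore`, its methods, and the length cap are hypothetical names chosen for illustration, and the cap is an assumption the card does not mention.

```python
from collections import defaultdict, deque


class TimelineStore:
    """In-memory stand-in for one Redis machine holding home timelines."""

    def __init__(self, max_len=800):  # hypothetical cap on entries per list
        self.max_len = max_len
        self.lists = defaultdict(lambda: deque(maxlen=self.max_len))

    def push(self, user_id, tweet_id, sender_id):
        """Put the newest entry at the head; the oldest falls off when full."""
        self.lists[user_id].appendleft((tweet_id, sender_id))

    def read(self, user_id, count=20):
        """Return up to `count` newest (tweet_id, sender_id) entries."""
        return list(self.lists.get(user_id, ()))[:count]


store = TimelineStore(max_len=2)
store.push("bob", 1, "alice")
store.push("bob", 2, "carol")
store.push("bob", 3, "alice")   # entry for tweet 1 is dropped
bob_entries = store.read("bob")
```

Storing only the IDs keeps each entry small; the tweet text itself would be fetched separately.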

Question 8

Q

Performance issue with current design thus far?

Answer

A

What if a celebrity with millions of followers tweets? Then ONE tweet updates millions of Redis lists. That is a huge computational load and the lists take a long time to update, so some followers see the tweet well before others do. People could react to a tweet that you have not seen yet.

Question 9

Q

Solution to celeb issue?

Answer

A

A user’s home timeline is pre-computed as normal, but without celebrities’ tweets. A celebrity’s tweets are merged in at read time, when Bob loads or refreshes his timeline.
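
A minimal sketch of this hybrid approach follows. The `Feed` class, the follower-count threshold for deciding who counts as a celebrity, and the use of increasing tweet IDs as a stand-in for time order are all assumptions made for illustration.

```python
from collections import defaultdict


class Feed:
    def __init__(self, celeb_threshold):
        self.celeb_threshold = celeb_threshold  # hypothetical cutoff
        self.followers = defaultdict(set)       # author -> followers
        self.following = defaultdict(set)       # user -> authors followed
        self.precomputed = defaultdict(list)    # user -> [(tweet_id, sender)]
        self.celeb_tweets = defaultdict(list)   # celeb -> [(tweet_id, sender)]
        self.next_id = 1

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celeb(self, user):
        return len(self.followers[user]) >= self.celeb_threshold

    def tweet(self, author):
        tweet_id = self.next_id
        self.next_id += 1
        if self.is_celeb(author):
            # No fanout: keep the tweet only in the celebrity's own list.
            self.celeb_tweets[author].append((tweet_id, author))
        else:
            # Normal fanout on write to every follower's timeline.
            for follower in self.followers[author]:
                self.precomputed[follower].append((tweet_id, author))
        return tweet_id

    def home_timeline(self, user):
        """Merge celebrity tweets into the pre-computed list at read time."""
        merged = list(self.precomputed[user])
        for author in self.following[user]:
            merged.extend(self.celeb_tweets.get(author, []))
        return sorted(merged, reverse=True)  # newest (highest ID) first


feed = Feed(celeb_threshold=2)
feed.follow("bob", "star")
feed.follow("amy", "star")   # "star" now has 2 followers: a celebrity
feed.follow("bob", "carl")
feed.tweet("carl")           # tweet 1, fanned out to bob
feed.tweet("star")           # tweet 2, not fanned out
```

The write for a celebrity touches one list instead of millions; the cost moves to a small merge on each read.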

Question 10

Q

Following feature

Answer

A

PUT -> LB -> look up the sender’s followers (the followers table holds the ID of each follower’s Redis list) -> push the tweet to those Redis lists
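
The fanout step in this flow can be sketched as follows. The dictionaries stand in for the followers table and the Redis lists, and the function names are hypothetical.

```python
from collections import defaultdict

followers = defaultdict(set)    # author_id -> ids of that author's followers
timelines = defaultdict(list)   # follower_id -> newest-first (tweet_id, sender_id)


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)


def post_tweet(tweet_id, author_id):
    """Fan the tweet out to every follower's list; return how many were updated."""
    targets = followers[author_id]
    for follower_id in targets:
        timelines[follower_id].insert(0, (tweet_id, author_id))
    return len(targets)


follow("bob", "alice")
follow("carol", "alice")
updated = post_tweet(101, "alice")
```

The work per tweet grows with the sender's follower count, which is why accounts with very large followings need separate handling.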

Question 11

Q

Current approach tradeoffs

Answer

A

Space is not a huge problem because tweets are limited to 140 characters. Redis keeps three replicas of each home timeline, but that is not a huge deal either.

Question 12

Q

When Bob accesses his timeline

Answer

A

1) Starts in the browser
2) Browser hits the LB with a GET request
3) LB goes to the Redis cluster
4) Only one Redis machine has to respond, even though three hold the timeline
5) The fastest Redis machine to respond populates Bob’s timeline
Which Redis machine to query? There are thousands, but only 3 hold Bob’s timeline. Solution = hash map lookup from Bob’s ID to the IP addresses / IDs of the Redis machines holding Bob’s home timeline
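
The lookup and replica choice described above can be sketched like this. The IP addresses and latency figures are made-up placeholders, and picking by a known latency table is a simplification of "whichever machine answers first".

```python
# Hypothetical lookup table: user ID -> the three machines holding that timeline.
timeline_hosts = {
    "bob": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
}


def replicas_for(user_id):
    """Hash map lookup: which Redis machines hold this user's home timeline?"""
    return timeline_hosts[user_id]


def fastest_replica(user_id, latency_ms):
    """Pick the replica with the lowest latency (stand-in for first responder)."""
    return min(replicas_for(user_id), key=lambda host: latency_ms[host])


# Made-up latencies, only to exercise the selection.
latencies = {"10.0.0.1": 12.0, "10.0.0.2": 4.5, "10.0.0.3": 9.1}
chosen = fastest_replica("bob", latencies)
```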

Question 13

Q

Follow up topics: search

Answer

A

Once Alice’s tweet reaches the LB, it triggers the fanout AND stores the tweet in a search index, so a later search for the tweet is quick
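
The card does not say what kind of index is used; one common choice is an inverted index from words to tweet IDs, sketched below. The tokenizer, the function names, and the sample tweets are assumptions for illustration.

```python
import re
from collections import defaultdict

index = defaultdict(set)   # word -> ids of tweets containing it
tweets = {}                # tweet_id -> text


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def store_tweet(tweet_id, text):
    """Store the tweet and index each distinct word at write time."""
    tweets[tweet_id] = text
    for word in set(tokenize(text)):
        index[word].add(tweet_id)


def search(query):
    """Return ids of tweets containing every query word, newest id first."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, reverse=True)


store_tweet(1, "Designing Twitter at scale")
store_tweet(2, "Redis lists for the home timeline")
store_tweet(3, "Twitter home timeline fanout")
```

Because the indexing work happens when the tweet is written, a search is a few set lookups rather than a scan over all tweets.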

(13 cards)