Design Twitter Flashcards
Design Twitter 1
CLARIFY requirements/features
Design Twitter 2: Core Features
1) Tweeting
2) Timelines: user timeline (your own tweets) and home timeline (tweets from people you follow)
3) Following
Naive Solution
User table & Tweets table
Tweets table contains ALL tweets globally, with columns ID, CONTENT, USER (a reference to the user ID)
User table contains ID, NAME
Problem = Tweets table gets huge, and every time we load a home timeline, we have to run a huge SELECT statement on the Tweets table to find the tweets of everyone the user follows
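To make the cost of the naive design concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `follows` table, the column names, and the sample rows are assumptions for illustration; the notes only name the User and Tweets tables.

```python
import sqlite3

def build_db():
    """Create an in-memory database with the naive schema and a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tweets (id INTEGER PRIMARY KEY, content TEXT,
                             user_id INTEGER REFERENCES users(id));
        -- Assumed table (not in the notes): who follows whom.
        CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)",
                     [(1, "hello", 1), (2, "lunch", 3), (3, "again", 1)])
    conn.execute("INSERT INTO follows VALUES (?, ?)", (2, 1))  # bob follows alice
    return conn

def home_timeline(conn, user_id):
    """Naive read path: join the global tweets table on every timeline load."""
    rows = conn.execute("""
        SELECT t.id, t.content
        FROM tweets t
        JOIN follows f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.id DESC
    """, (user_id,))
    return rows.fetchall()
```

Every call to `home_timeline` scans or joins against the global tweets table, which is the work the optimized design moves to write time.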
Twitter READ features
Twitter is READ heavy. Care more about availability and fast reads than strict consistency (it’s OK if one user sees someone’s tweet later than someone else does)
Optimized Solution
A user’s tweet arrives as a “PUT” API call, which needs to be routed to a data center with capacity to serve the user. So Tweet -> load balancer -> Redis cluster
Optimized Solution after cluster
Redis machines would need a huge amount of RAM to store all tweets. Solution = store timelines only for recently active users
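One way to picture "only recently active users" is a cache that drops timelines after a period of inactivity. This is a minimal in-memory sketch, not real Redis; the class name, the idle threshold, and passing `now` explicitly are all assumptions made for illustration.

```python
class ActiveTimelineCache:
    """Keeps timelines in memory only for users seen recently (hypothetical helper)."""

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds
        self.timelines = {}  # user_id -> list of tweet IDs
        self.last_seen = {}  # user_id -> timestamp of last activity

    def touch(self, user_id, now):
        """Record activity and make sure the user has a timeline in memory."""
        self.last_seen[user_id] = now
        self.timelines.setdefault(user_id, [])

    def evict_inactive(self, now):
        """Drop timelines of users idle longer than the threshold; return their IDs."""
        stale = [user for user, seen in self.last_seen.items()
                 if now - seen > self.max_idle_seconds]
        for user in stale:
            del self.last_seen[user]
            del self.timelines[user]
        return stale
```

An evicted user's timeline would have to be rebuilt from the tweet store the next time they log in, which trades a slower first load for much less RAM.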
What’s in the Redis cluster?
Redis lists. Every user has one list on each Redis machine that stores their timeline. The list represents the user’s home timeline, and each entry holds a Tweet ID and a Sender ID
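The per-user list can be sketched with an in-memory stand-in for one Redis machine. In real Redis the same effect is commonly achieved with LPUSH followed by LTRIM; here a `deque` with `maxlen` plays that role. The class name and the length cap are assumptions, not from the notes.

```python
from collections import deque

MAX_TIMELINE_LEN = 800  # assumed cap on entries per timeline

class TimelineStore:
    """In-memory stand-in for one Redis machine holding home timelines."""

    def __init__(self, max_len=MAX_TIMELINE_LEN):
        self.max_len = max_len
        self.lists = {}  # user_id -> deque of (tweet_id, sender_id), newest first

    def push(self, user_id, tweet_id, sender_id):
        """Add an entry at the head; the oldest entry is dropped when the list is full."""
        timeline = self.lists.setdefault(user_id, deque(maxlen=self.max_len))
        timeline.appendleft((tweet_id, sender_id))

    def read(self, user_id, count=20):
        """Return up to `count` newest entries for the user."""
        return list(self.lists.get(user_id, ()))[:count]
```

Storing only IDs keeps each entry small; the tweet text would be looked up separately when the timeline is rendered.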
Performance issue with current design thus far?
What if a celebrity with millions of followers tweets? Then millions of Redis lists have to be updated for ONE tweet. That is a huge computational load and the lists take a long time to update, so some followers see the tweet well before others. People could react to a tweet that you have not seen yet.
Solution to celeb issue?
A user’s home timeline is pre-computed as normal, but without celebrities’ tweets. Celebrity tweets are merged in at run time, when Bob loads or “refreshes” his timeline.
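The read-time merge can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes tweet IDs increase over time, so sorting by ID gives newest first.

```python
def home_timeline(precomputed, followed_celebs, celeb_tweets, count=20):
    """Merge the pre-computed timeline with celebrity tweets at read time.

    precomputed:     list of (tweet_id, sender_id), celebrity tweets excluded
    followed_celebs: IDs of the celebrities this user follows
    celeb_tweets:    dict of celeb_id -> list of that celebrity's tweet IDs
    """
    merged = list(precomputed)
    for celeb_id in followed_celebs:
        for tweet_id in celeb_tweets.get(celeb_id, []):
            merged.append((tweet_id, celeb_id))
    # Assumption: larger tweet ID means a newer tweet.
    merged.sort(key=lambda entry: entry[0], reverse=True)
    return merged[:count]
```

The celebrity's tweet is written once instead of millions of times, and each follower pays a small merge cost on read.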
Following feature
PUT -> LB -> look up the sender’s followers (followers table has REDIS list IDs) -> push the tweet to those REDIS lists
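The write path above (fanout on write) might look like this in miniature. Plain dictionaries stand in for the followers table and the Redis lists, and the follower ID doubles as the list ID; all names are assumptions for illustration.

```python
from collections import defaultdict

followers = defaultdict(set)   # sender_id -> IDs of users who follow the sender
timelines = defaultdict(list)  # follower_id -> list of (tweet_id, sender_id), newest first

def follow(follower_id, followee_id):
    """Record that follower_id follows followee_id."""
    followers[followee_id].add(follower_id)

def post_tweet(tweet_id, sender_id):
    """Push one tweet onto every follower's home timeline; return how many lists changed."""
    for follower_id in followers[sender_id]:
        timelines[follower_id].insert(0, (tweet_id, sender_id))
    return len(followers[sender_id])
```

The return value makes the cost visible: one tweet costs one list update per follower.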
Current approach tradeoffs
Space is not a huge problem because tweets are limited to 140 characters. REDIS replicates each home timeline x3, but that is not a huge deal.
When Bob accesses his timeline
1) starts in browser
2) browser hits LB with GET request
3) LB goes to REDIS cluster
4) Only one REDIS machine has to respond, even though three hold the timeline.
5) The fastest REDIS machine to respond populates Bob’s timeline
Which REDIS machine to query? There are thousands, but only 3 hold Bob’s timeline. Solution = HashMap lookup from Bob’s ID to the IP addresses / IDs of the Redis machines with Bob’s home timeline
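The lookup in the last step can be sketched as a dictionary from user ID to the machines that hold that user's timeline. The class name, the hash-based placement policy, and the sample addresses in the usage note are assumptions; the notes only say that a HashMap lookup exists.

```python
import hashlib

class TimelineDirectory:
    """Maps a user ID to the Redis machines holding that user's home timeline."""

    def __init__(self, machines, replicas=3):
        if replicas > len(machines):
            raise ValueError("need at least as many machines as replicas")
        self.machines = list(machines)
        self.replicas = replicas
        self.lookup = {}  # user_id -> list of machine addresses

    def assign(self, user_id):
        """Choose machines for a new user (assumed policy: hash the ID, take neighbours)."""
        if user_id not in self.lookup:
            digest = hashlib.sha256(str(user_id).encode()).digest()
            start = int.from_bytes(digest[:8], "big") % len(self.machines)
            self.lookup[user_id] = [
                self.machines[(start + i) % len(self.machines)]
                for i in range(self.replicas)
            ]
        return self.lookup[user_id]

    def machines_for(self, user_id):
        """The HashMap lookup from the notes: user ID -> machines with the timeline."""
        return self.lookup[user_id]
```

For example, with hypothetical addresses `["10.0.0.1", ..., "10.0.0.5"]`, `assign("bob")` picks three of them once, and every later `machines_for("bob")` returns the same three, so the load balancer can query them and use whichever answers first.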
Follow up topics: search
Once Alice’s tweet reaches the LB, it triggers the fanout AND the tweet is stored in a search index, so when a user later searches for a tweet, the lookup is quick
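A minimal sketch of indexing at write time, assuming a simple inverted index keyed by lowercase words. The class name and the whitespace tokenization are assumptions; a production search system would be considerably more involved.

```python
from collections import defaultdict

class TweetSearchIndex:
    """Inverted index built as tweets arrive: word -> IDs of tweets containing it."""

    def __init__(self):
        self.tweets = {}               # tweet_id -> content
        self.index = defaultdict(set)  # word -> set of tweet IDs

    def add(self, tweet_id, content):
        """Store the tweet and index each of its words (called on the write path)."""
        self.tweets[tweet_id] = content
        for word in content.lower().split():
            self.index[word].add(tweet_id)

    def search(self, query):
        """Return IDs of tweets containing every query word, highest ID first."""
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(*(self.index.get(w, set()) for w in words))
        return sorted(matches, reverse=True)
```

Because the index is updated when the tweet is written, a search is a handful of set lookups rather than a scan over all tweets.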