Design Twitter Flashcards

1
Q

Design Twitter 1

A

CLARIFY requirements/features

2
Q

Design Twitter 2: Core Features

A

1) Tweeting
2) Timelines: user timeline (your own tweets) and home timeline (tweets from people you follow)
3) Following

3
Q

Naive Solution

A

Two tables: User and Tweets.
Tweets table contains ALL tweets globally, with columns ID, CONTENT, USER (foreign key to the user's ID).
User table contains ID, NAME.
Problem = the Tweets table gets huge, and every time a user loads their home timeline we have to run a huge SELECT over the Tweets table to collect the tweets of everyone they follow.
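
To make the cost of the naive design concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `tweets` tables mirror the card; the `follows` table, the sample rows, and the `home_timeline` helper are assumptions added for illustration, since the card does not say how follow relationships are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tweets (id INTEGER PRIMARY KEY, content TEXT,
                     user_id INTEGER REFERENCES users(id));
-- Hypothetical table: the card does not describe how follows are stored.
CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)",
                 [(1, "hello", 1), (2, "hi", 3), (3, "again", 1)])
conn.execute("INSERT INTO follows VALUES (2, 1)")  # bob follows alice


def home_timeline(conn, user_id, limit=20):
    """Build the home timeline at read time by joining against ALL tweets."""
    return conn.execute(
        """
        SELECT t.id, t.content, t.user_id
        FROM tweets AS t
        JOIN follows AS f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.id DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
```

Every timeline load repeats this join against the global tweets table, which is the scaling problem the card describes.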

4
Q

Twitter READ features

A

Twitter is READ heavy. Care more about availability (fast reads) than consistency: eventual consistency is fine (it’s OK if one user sees someone’s tweet later than someone else does).

5
Q

Optimized Solution

A

The user’s tweet (a “PUT” API call) needs to be routed to a data center with capacity to serve that user. Flow: Tweet -> load balancer -> Redis cluster

6
Q

Optimized Solution after cluster

A

Keeping everything in Redis would require a huge amount of RAM. So store home timelines only for recently active users.
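
One way to picture "only recently active users" is a cache that drops timelines after a period of inactivity. This is a rough in-memory sketch, not Redis code; the `TimelineCache` class, its method names, and the idle window are all hypothetical.

```python
class TimelineCache:
    """Keeps home timelines only for users seen within max_idle_seconds."""

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds
        self.timelines = {}    # user_id -> list of tweet ids, newest first
        self.last_active = {}  # user_id -> timestamp of the user's last visit

    def touch(self, user_id, now):
        """Record a visit and make sure the user has a cached timeline."""
        self.last_active[user_id] = now
        self.timelines.setdefault(user_id, [])

    def evict_inactive(self, now):
        """Drop timelines of users idle for longer than the window."""
        stale = [user for user, seen in self.last_active.items()
                 if now - seen > self.max_idle_seconds]
        for user in stale:
            del self.last_active[user]
            del self.timelines[user]
        return stale
```

An evicted user's timeline would have to be rebuilt from the persistent store on their next visit; that rebuild path is outside this sketch.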

7
Q

What’s in the Redis cluster?

A

Redis lists. Every user has one list on each Redis machine that stores their timeline. The list holds the tweets of that user’s home timeline; each entry stores Tweet ID and Sender ID.
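
The shape of one such list can be sketched without a Redis server by using a bounded deque. In real Redis the comparable commands would be LPUSH (add to the head), LTRIM (cap the length), and LRANGE (read a slice). The cap of 800 entries and the helper names are assumptions, not values from the card.

```python
from collections import deque

MAX_ENTRIES = 800  # assumed cap on timeline length, not from the card


def new_timeline(max_entries=MAX_ENTRIES):
    """One home timeline: a bounded list of (tweet_id, sender_id) entries."""
    return deque(maxlen=max_entries)


def push_tweet(timeline, tweet_id, sender_id):
    """Add the newest entry at the head; the oldest falls off when full."""
    timeline.appendleft((tweet_id, sender_id))


def read_timeline(timeline, count):
    """Return the newest `count` entries, newest first."""
    return list(timeline)[:count]
```

Storing only IDs keeps each entry small; the tweet text itself would be fetched separately by Tweet ID.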

8
Q

Performance issue with current design thus far?

A

What if a celeb with millions of followers tweets? Then millions of Redis lists are updated by ONE tweet. That is a huge computational load and the lists take a long time to update, so some followers see the tweet well before others. People could react to a tweet that you have not seen yet.

9
Q

Solution to celeb issue?

A

The user’s home timeline is pre-computed as normal, but without celebs’ tweets. A celeb’s tweet is merged in at read time (“run-time”), when Bob loads or “refreshes” his timeline.
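
A minimal sketch of that read-time merge, assuming every list holds (tweet_id, sender_id) entries sorted newest first and that a larger tweet ID means a newer tweet. The function and parameter names are hypothetical.

```python
import heapq
from itertools import islice


def home_timeline(precomputed, celeb_tweets_by_author, followed_celebs, count):
    """Merge the pre-computed timeline with celeb tweets when the user reads.

    Assumes each input list is sorted newest first and that larger tweet IDs
    are newer (an assumption made for this sketch).
    """
    sources = [precomputed]
    for celeb in followed_celebs:
        sources.append(celeb_tweets_by_author.get(celeb, []))
    merged = heapq.merge(*sources, key=lambda entry: entry[0], reverse=True)
    return list(islice(merged, count))
```

The celeb's tweet is written once instead of being fanned out to millions of lists; the cost moves to a small merge on each read by a follower.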

10
Q

Following feature

A

PUT -> LB -> check for followers (followers table has REDIS list IDs) -> REDIS
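
The write path above is a fan-out on write: one tweet is copied into the home timeline of every follower. The sketch below uses plain dictionaries in place of the followers table and the Redis lists; all names are hypothetical, and it maps an author directly to follower IDs rather than to Redis list IDs as the card describes.

```python
from collections import defaultdict

followers = defaultdict(set)   # author_id -> ids of users following the author
timelines = defaultdict(list)  # user_id -> [(tweet_id, sender_id)], newest first


def follow(follower_id, author_id):
    """Record that follower_id follows author_id."""
    followers[author_id].add(follower_id)


def post_tweet(tweet_id, author_id):
    """Fan out: push the tweet onto every follower's home timeline."""
    for follower_id in followers[author_id]:
        timelines[follower_id].insert(0, (tweet_id, author_id))
    return len(followers[author_id])  # number of timelines updated
```

The return value makes the cost visible: the work per tweet grows with the author's follower count.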

11
Q

Current approach tradeoffs

A

Space is not a huge problem because tweets are limited to 140 characters. Redis replicates each home timeline x3, but that is not a huge deal.

12
Q

When Bob accesses his timeline

A

1) Starts in the browser
2) Browser hits the LB with a GET request
3) LB goes to the Redis cluster
4) Only one Redis machine has to respond, even though three hold the timeline
5) The fastest Redis machine populates Bob’s timeline
Which Redis machine to query? There are thousands, but only 3 hold Bob’s timeline. Solution = HashMap lookup from Bob’s ID to the IP addresses / IDs of the Redis machines holding his home timeline
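
The lookup step can be sketched as a dictionary from user ID to replica machines. The machine names and the `fetch` callback are hypothetical. Note one simplification: the sketch tries the replicas in order and returns the first one that answers, rather than racing all three and taking the fastest as the card describes.

```python
# Hypothetical mapping: user id -> ids of the Redis machines holding the timeline.
replica_map = {"bob": ["redis-17", "redis-203", "redis-940"]}


def read_timeline(user_id, replica_map, fetch):
    """Return (machine, timeline) from the first replica that responds.

    `fetch(machine, user_id)` is a caller-supplied function (an assumption of
    this sketch) that returns the timeline or raises ConnectionError.
    """
    for machine in replica_map.get(user_id, []):
        try:
            return machine, fetch(machine, user_id)
        except ConnectionError:
            continue  # this replica is down; try the next one
    raise LookupError(f"no replica could serve the timeline for {user_id!r}")
```

Because the map narrows thousands of machines down to three, the load balancer never has to scan the cluster.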

13
Q

Follow up topics: search

A

Once Alice’s tweet reaches the LB, it triggers the fanout AND the tweet is stored in a search index. So when a user later searches for a tweet, the lookup is quick.
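
The card does not say what kind of index is used. One common choice is an inverted index from words to tweet IDs, sketched below under that assumption; the function names and the whitespace tokenization are illustrative only.

```python
from collections import defaultdict

tweets = {}               # tweet_id -> content
index = defaultdict(set)  # lowercase word -> ids of tweets containing it


def store_and_index(tweet_id, content):
    """Store the tweet and index each of its words at write time."""
    tweets[tweet_id] = content
    for word in content.lower().split():
        index[word].add(tweet_id)


def search(query):
    """Return ids of tweets containing every query word, newest id first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches, reverse=True)
```

Doing the indexing work at write time is what keeps the search itself to a few set lookups.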
