System Design Week 1 - Twitter Flashcards
What are Functional Requirements?
- User account management support.
- Users should be able to post tweets of up to 140 characters, with media and hashtags.
- Users can follow and unfollow other users.
- Users should be able to view a feed of tweets.
- Users should be able to like and comment on tweets.
- Trending hashtags should be available for all geolocations.
- Analytics run as a background process.
- search support
What are the Non-Functional Requirements?
- Number of users: 400 million
- Tweets per second: 6,000
- Feed views per second: 0.3 million (300,000)
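These figures can be turned into rough capacity estimates. The sketch below is back-of-envelope arithmetic only. The 1 byte per character figure is an assumption (it ignores multi-byte encodings, metadata, and media), and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the requirements above.
tweets_per_sec = 6_000
feed_views_per_sec = 300_000
max_tweet_chars = 140

# Assumption: 1 byte per character, text only (no metadata or media).
bytes_per_char = 1

tweets_per_day = tweets_per_sec * SECONDS_PER_DAY
text_bytes_per_day = tweets_per_day * max_tweet_chars * bytes_per_char
read_write_ratio = feed_views_per_sec / tweets_per_sec

print(f"tweets/day:       {tweets_per_day:,}")
print(f"text GB/day:      {text_bytes_per_day / 1e9:.1f}")
print(f"read/write ratio: {read_write_ratio:.0f}:1")
```

Under these assumptions the system takes about 518 million tweets per day, roughly 72.6 GB of raw text per day, and feed reads outnumber tweet writes 50 to 1, which is why the read path (feed generation, caching) dominates the design.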
What are the Microservices?
- *Tweet ingestion service
- *social graph
- *feed generator
- *feed dashboard
- *endorsement service
- Search
- Trending hashtag
- Analytics
- Account management service
Create a Logical Diagram for Twitter.
https://drive.google.com/file/d/1zwfcMZPQKGm2Dzso8gjr-2-GMIf5Q45x/view?usp=sharing
What is the Schema?
https://drive.google.com/file/d/1QijaEz0hNuTDn-4ursycbQwciwJSSVk0/view?usp=sharing
What are the APIs?
- insertTweetText(userId, content)
- insertTweetMedia(userId, bytestream, offset, length)
- likeTweet(tweetId)
- dislikeTweet(tweetId)
- replyTweet(tweetId, content)
- search(text)
- follow(userId)
- unfollow(userId)
- retweet(tweetId)
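As a rough illustration of how a few of these calls fit together, here is a minimal in-memory sketch. It is not the real service: the class name `InMemoryTwitter`, the snake_case method names, and the explicit `user_id` / `follower_id` arguments are assumptions made for the example (in the list above, the caller's identity is presumably taken from the session).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Tweet:
    tweet_id: int
    user_id: str
    content: str
    liked_by: set = field(default_factory=set)


class InMemoryTwitter:
    """Toy stand-in for the tweet, endorsement, and social graph services."""

    MAX_CHARS = 140  # limit from the functional requirements

    def __init__(self):
        self._ids = count(1)   # simple increasing tweet ids
        self.tweets = {}       # tweet_id -> Tweet
        self.following = {}    # follower_id -> set of followee ids

    def insert_tweet_text(self, user_id, content):
        if len(content) > self.MAX_CHARS:
            raise ValueError("tweet exceeds 140 characters")
        tweet = Tweet(next(self._ids), user_id, content)
        self.tweets[tweet.tweet_id] = tweet
        return tweet.tweet_id

    def like_tweet(self, user_id, tweet_id):
        # A set makes repeated likes by the same user idempotent.
        self.tweets[tweet_id].liked_by.add(user_id)

    def follow(self, follower_id, followee_id):
        self.following.setdefault(follower_id, set()).add(followee_id)

    def unfollow(self, follower_id, followee_id):
        self.following.get(follower_id, set()).discard(followee_id)
```

For example, `svc = InMemoryTwitter()` followed by `tid = svc.insert_tweet_text("alice", "hello")` and `svc.like_tweet("bob", tid)` records one like on the new tweet.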
What is the Business Logic?
When end users post Tweets on Twitter, the load balancers forward these requests to the server handling the Tweet service. The server identifies the attachments (image, video) in the Tweet and stores them in the Blobstore. The Tweet text, user information, and all metadata are stored in separate databases. This data is kept in Bigtable (Google Cloud Bigtable), which is fully managed, scales easily, and keeps rows sorted by key. For reads, assume the user sends a home timeline request using the /viewHome_timeline API. The request is handled in a similar way, and the top-k trends are attached to the response to the timeline request.
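The write path described above can be sketched as follows. This is a toy model, not the production flow: the two dicts stand in for the Blobstore and the sorted key-value store, and the function name `post_tweet` and the `user_id#tweet_id` row key layout are assumptions chosen for illustration.

```python
import uuid

blob_store = {}   # stand-in for the Blobstore: blob_id -> raw bytes
tweet_table = {}  # stand-in for the key-value store: row_key -> row


def post_tweet(user_id, text, attachments=()):
    """Store attachments as blobs, then store text and metadata as a row."""
    tweet_id = uuid.uuid4().hex

    # Media goes to the blob store; the row only keeps references to it.
    blob_ids = []
    for data in attachments:
        blob_id = uuid.uuid4().hex
        blob_store[blob_id] = data
        blob_ids.append(blob_id)

    # Assumed row key: prefixing with user_id keeps one user's tweets
    # adjacent in a store that sorts rows by key.
    row_key = f"{user_id}#{tweet_id}"
    tweet_table[row_key] = {"text": text, "media": blob_ids}
    return row_key
```

Keeping only blob references in the row keeps rows small, so timeline reads do not have to pull media bytes.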
What are the Microservices Design Considerations?
CAP theorem
- AP system. Must explain why AP and not CP.
Scaling
Must discuss which of the following reasons apply:
- Scale for storage
- Scale for throughput
- Scale for API parallelization
- Removing hotspots
- Availability and geo-distribution
Sharding
- Explain why sharding is (or is not) required here.
- State whether vertical or horizontal sharding is required.
- What will be the partition key?
- State whether a fixed number of shards or dynamic shard servers are required.
- Consistent hashing must be mentioned with a dynamic number of shards.
Here:
- for text → horizontal sharding
- for media → horizontal + vertical sharding
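To make the consistent hashing point concrete, here is a minimal hash ring with virtual nodes. It is a sketch under stated assumptions: the class name `ConsistentHashRing`, the choice of MD5 as the hash, 100 virtual nodes per shard, and the use of a user ID as the partition key are all illustrative choices, not something the notes above prescribe.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only to spread keys evenly, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per shard, smooths the load
        self._ring = []       # sorted list of (hash, shard) points

    def add_shard(self, shard):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{shard}#{i}"), shard))

    def remove_shard(self, shard):
        self._ring = [(h, s) for h, s in self._ring if s != shard]

    def get_shard(self, key):
        if not self._ring:
            raise LookupError("no shards on the ring")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

With a ring of `shard-a`, `shard-b`, and `shard-c`, calling `ring.add_shard("shard-d")` only moves the keys that now fall on `shard-d`; every other key stays on its current shard. That is the property that makes a dynamic number of shards practical, compared with `hash(key) % num_shards`, where most keys move whenever the shard count changes.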
Replication
- Required. Must explain the reason, e.g. for availability as well as throughput.
Caching
- Must explain clearly whether caching is required.
- If caching is required, state which caching mechanism to use.
- State the cache eviction policy.
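As one possible answer to the eviction policy question, the sketch below shows a least-recently-used (LRU) cache. LRU is an assumption here: it is a common choice for feed and tweet caches, but the notes above do not fix a policy. The class name `LRUCache` is illustrative.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None  # cache miss: caller falls back to the database
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```

With `capacity=2`, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `b` is the entry that has gone longest without being used.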
API Parallelization
- Must explain clearly that API parallelization is required only when APIs are bulky.
Here:
- for text → no
- for media → maybe yes
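For the media case, parallelization could mean splitting an upload into chunks and sending them concurrently, which is what the `offset` and `length` parameters of insertTweetMedia suggest. The sketch below is hypothetical: `insert_tweet_media` is a local stand-in that writes into a dict rather than a real API call, and the chunk size, worker count, and helper names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # tiny for the demo; real chunks would be far larger


def insert_tweet_media(store, user_id, bytestream, offset, length):
    # Stand-in for the real call: record the chunk under its offset.
    store[(user_id, offset)] = bytestream[:length]


def upload_media(store, user_id, data, chunk_size=CHUNK_SIZE, workers=4):
    """Upload all chunks of `data` concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(
                insert_tweet_media, store, user_id,
                data[off:off + chunk_size], off,
                min(chunk_size, len(data) - off),
            )
            for off in range(0, len(data), chunk_size)
        ]
        for f in futures:
            f.result()  # re-raise any upload error


def reassemble(store, user_id):
    """Join a user's chunks back together in offset order."""
    parts = sorted(
        (off, chunk) for (uid, off), chunk in store.items() if uid == user_id
    )
    return b"".join(chunk for _, chunk in parts)
```

Because each chunk carries its own offset, chunks can arrive in any order and still be reassembled correctly. A 140-character text tweet fits in a single small request, so it gains nothing from this.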
Geo-distribution
- Geo-distribution of data is not required here. Must call out whether it is required or not, and why.
Load Balancing
- Explain the need for load balancing for each service.
Purging/Cleanup
- State whether cleanup of data is required or not.