System Design Flashcards

1
Q

Design Distributed Message Queue

A
  • What’s the format and average size of the messages? -> Text only, small to medium size (max in the KB range)
  • Can messages be consumed repeatedly? Yes
  • Are messages consumed in the same order they were produced? Yes
  • Does data need to be persisted, and what is the retention? Yes, with a 2-week retention
  • What’s the data delivery mechanism: at most once, at least once, or exactly once? At a minimum we want to support at least once

Non-functional requirements
- High throughput or low latency, configurable based on use case
- Scalable, system should be distributed in nature
- Persistent and durable

Apache Kafka uses topics and partitions for efficient data processing. Topics are data categories to which records are published; consumers subscribe to these topics. For scalability and performance, topics are divided into partitions, allowing parallel data processing and fault tolerance. Partitions enable multiple consumers to read concurrently and are replicated across nodes for resilience against failures

Each partition is an ordered, immutable sequence of records, where order is maintained only within the partition, not across the entire topic. This partitioning mechanism enables Kafka to handle a high volume of data efficiently, as multiple producers can write to different partitions simultaneously, and multiple consumers can read from different partitions in parallel

At-least-once delivery (Kafka default)
The producer sends a message synchronously or asynchronously with a response callback, setting acks=1 or acks=all, to make sure messages are delivered to the broker. If delivery fails or times out, the producer keeps retrying.

Consumer fetches the message and commits the offset only after the data is successfully processed. If the consumer fails to process the message, it will re-consume the message so there won’t be data loss. On the other hand, if a consumer processes the message but fails to commit the offset to the broker, the message will be re-consumed when the consumer restarts, resulting in duplicates.
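
A minimal sketch of this at-least-once flow, assuming the kafka-python client; the broker address, topic name, and process() handler are illustrative, not from the card.

    from kafka import KafkaProducer, KafkaConsumer  # kafka-python client (assumed)

    # Producer: wait for broker acknowledgement (acks="all") and retry on failure.
    producer = KafkaProducer(bootstrap_servers="localhost:9092", acks="all", retries=5)
    producer.send("events", b"hello")
    producer.flush()  # block until the broker has acknowledged the message

    # Consumer: disable auto-commit and commit the offset only after processing.
    consumer = KafkaConsumer("events",
                             bootstrap_servers="localhost:9092",
                             group_id="workers",
                             enable_auto_commit=False)
    for msg in consumer:
        process(msg.value)   # hypothetical handler; if it fails, the message is re-consumed
        consumer.commit()    # commit only after successful processing -> at-least-once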

2
Q

Design Unique ID Generator

A
  • What are the characteristics of the unique IDs? Must be unique and sortable
  • Do IDs increment by 1? IDs increase over time but do not necessarily increment by exactly 1
  • Do IDs only contain numerical values? Yes
  • What is the ID length requirement? IDs should fit into 64-bit
  • The system should be able to generate 10k IDs per second

ID will be made up of (see the sketch after this list):
- Sign bit: 1 bit, reserved for future use cases
- Timestamp: 41 bits, milliseconds since the epoch (or a custom epoch). 2^41 = 2,199,023,255,552 ms ≈ 69 years (2^41 / 1000 / 3600 / 24 / 365)
- Data center ID: 5 bits, 2 ^ 5 = 32 data centers
- Machine ID: 5 bits, 2 ^ 5 = 32 machines / data center
- Sequence number: 12 bits. For each machine/process, the sequence number is incremented by 1. The number is reset every millisecond. 2 ^ 12 = 4096 possible values every millisecond
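
A minimal Python sketch of this Snowflake-style layout, assuming the 1/41/5/5/12 bit split above; the custom epoch value is illustrative.

    import threading
    import time

    EPOCH_MS = 1704067200000  # custom epoch (2024-01-01 UTC), illustrative

    class SnowflakeGenerator:
        def __init__(self, datacenter_id, machine_id):
            assert 0 <= datacenter_id < 32 and 0 <= machine_id < 32
            self.datacenter_id = datacenter_id
            self.machine_id = machine_id
            self.sequence = 0
            self.last_ms = -1
            self.lock = threading.Lock()

        def next_id(self):
            with self.lock:
                now = int(time.time() * 1000)
                if now == self.last_ms:
                    self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit sequence
                    if self.sequence == 0:            # 4096 IDs exhausted this millisecond
                        while now <= self.last_ms:    # wait for the next millisecond
                            now = int(time.time() * 1000)
                else:
                    self.sequence = 0
                self.last_ms = now
                # 41-bit timestamp | 5-bit data center | 5-bit machine | 12-bit sequence
                return ((now - EPOCH_MS) << 22) | (self.datacenter_id << 17) \
                       | (self.machine_id << 12) | self.sequence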

Future discussions
- Clock synchronization. We’re assuming each ID generation server has the same clock, which is not necessarily true across multiple machines. Consider Network Time Protocol (NTP) as a solution

3
Q

Design Search Auto Complete

A
  • How many autocomplete suggestions should be returned? 5
  • How does the system know which results to return? Determined by popularity, based on historical query frequency counts
  • Case insensitive? Yes
  • Can I assume English only? Yes

Non-Functional Requirements
- Fast response time
- Relevant and sorted suggestions
- High availability

High Level Design
- Data gathering service: aggregates queries and asynchronously processes historical frequency counts
- Real-time query service: given a prefix, return the top k results (see the trie sketch below)
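
A minimal sketch of the prefix-lookup side, assuming a trie whose nodes cache their top-k completions; the class and method names are illustrative, and the trie would be rebuilt periodically from the aggregated frequency data.

    import heapq

    class TrieNode:
        def __init__(self):
            self.children = {}
            self.top = []   # small cached list of (frequency, query) pairs

    class AutocompleteTrie:
        def __init__(self, k=5):
            self.root = TrieNode()
            self.k = k

        def insert(self, query, freq):
            # Walk the lowercase query, caching it in the top-k list of every prefix node.
            node = self.root
            for ch in query.lower():
                node = node.children.setdefault(ch, TrieNode())
                node.top = heapq.nlargest(self.k, node.top + [(freq, query)])

        def suggest(self, prefix):
            # Walk to the prefix node and return its cached completions, most popular first.
            node = self.root
            for ch in prefix.lower():
                if ch not in node.children:
                    return []
                node = node.children[ch]
            return [q for _, q in node.top]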

4
Q

Design URL Shortener

A
  • How long is the shortened URL? As short as possible. (let’s assume 7 chars)
  • What characters are allowed in the shortened URL? -> a-zA-Z0-9

Two primary components:

(1) GET /api/v1/<short_url>
Return 301 with the long URL in the Location header

301 = permanent redirect; the browser caches the mapping and subsequent requests skip our server
302 = temporary redirect; every request still hits our server, which is better for analytics

(2) POST /api/v1/url

longURL -> hash function -> hash value (short url)

The hash value can contain 62 different characters (a-z, A-Z, 0-9), so how long does it need to be?

62 ^ 1 = 62 possible values
62 ^ 2 = 3,844
62 ^ 7 = 3.5 trillion

We can use a well-known hash function (e.g. MD5, SHA-1), but we need to take the first 7 characters and perform a DB lookup to make sure there is no collision

2nd approach: use base conversion to convert the same number between different number representation systems. This requires a unique ID generator, but an auto-incrementing DB column can serve that role

base62(11157) -> 2TX
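
A small sketch of the base-62 conversion, assuming the digit alphabet 0-9a-zA-Z (which is what maps 11157 to "2TX").

    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def base62_encode(n):
        if n == 0:
            return ALPHABET[0]
        digits = []
        while n > 0:
            n, rem = divmod(n, 62)
            digits.append(ALPHABET[rem])
        return "".join(reversed(digits))   # most significant digit first

    print(base62_encode(11157))  # -> "2TX"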

5
Q

Design a Rate Limiter

A

Rate Limiter: limits/throttles the number of client requests allowed to be sent over a time period

  • Client side or server side rate limiting? Server side
  • Does the rate limiter support different kinds of throttling rules? Yes
  • Does the rate limiter need to operate in a distributed fashion? Yes

Non functional requirements
- Low latency
- Distributed

HTTP 429: Too Many Requests
Use response headers to communicate rate-limit attributes (e.g. remaining, limit, retry-after)

Types of rate limiting rules
- Number of requests / second

Algorithms for rate limiting:
- Token bucket: simple, well understood, used by Stripe and Amazon. A bucket with a pre-defined capacity is refilled with tokens periodically (say once per second or once per minute, whatever the period is). Each request consumes one token; if there aren’t enough tokens we drop the request (see the sketch after this list)
- How many buckets do we need? Examples include one bucket per IP address, a global bucket across all requests, or one bucket per endpoint (e.g. POST vs GET)
- Leaking bucket: requests join a FIFO queue and are processed at a fixed rate; requests are dropped when the queue is full
- Fixed window counter: divide time into fixed-size windows and count requests per window
- Sliding window: variants (log or counter) that smooth out the burst problem at fixed-window boundaries
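
A minimal single-process sketch of a token bucket that refills continuously in proportion to elapsed time (a common variant of the periodic refill described above); capacity and refill rate are illustrative.

    import time

    class TokenBucket:
        def __init__(self, capacity, refill_rate):
            self.capacity = capacity            # max tokens the bucket can hold
            self.refill_rate = refill_rate      # tokens added per second
            self.tokens = float(capacity)
            self.last_refill = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # Refill in proportion to elapsed time, never exceeding capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last_refill) * self.refill_rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1   # each request consumes one token
                return True
            return False           # out of tokens: drop/throttle the request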

Counters should be stored in a distributed Redis cluster. Why Redis?
- Fast, in memory
- Supports time-based expiration
- Includes atomic INCR and EXPIRE

Why not a DB? Disk-based access is too slow for this volume of random reads and writes
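
For example, a fixed-window counter on top of Redis could look roughly like this, using redis-py; the key format, limit, and window size are illustrative.

    import time
    import redis   # redis-py client (assumed)

    r = redis.Redis()

    def allow_request(client_id, limit=100, window=60):
        # One counter per client per time window; INCR is atomic in Redis.
        key = f"rl:{client_id}:{int(time.time()) // window}"
        count = r.incr(key)
        if count == 1:
            r.expire(key, window)   # first hit in this window sets the TTL
        return count <= limit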

6
Q

Design a news feed

A

Understand scope of problem

  • Important features are a user publishing a post and seeing their friends’ posts on the feed
  • How is the feed sorted? To keep it simple let’s use reverse chronological order
  • Can feed contain images, videos, or just text? It can contain all

Two primary workflows
1) Publishing to the feed
2) Building the news feed for a user and reading it

APIs
1) POST /api/v1/me/feed {content, auth}
2) GET /api/v1/me/feed

  • User sends a request to view her news feed, GET /v1/api/me/feed
  • LB -> WebServers -> NewsFeed service -> Fetch news feed from cache
  • News feed cache stores news feed IDs needed to render the news feed

  • Fanout service: The process of delivering a post to friends
  • Two types of fanout models:

1) Fanout on write. News feed is pre-computed at write time
Pros: Fetching news feed is fast
Cons: For inactive users, pre-computing the feed wastes resources. There is also a hotkey problem in which servers are heavily taxed when a popular user posts

2) Fanout on read. Feed is generated at read time
Pros: Limits compute on servers, doesn’t waste resources for inactive users
Cons: Fetching the news feed is slow because it’s not pre-computed

A solution? Take a hybrid approach: pre-compute the feed for most users, but for celebrities (users with many followers) let their posts be fetched on demand at read time

1) Fetch friend_ids (can use a graph DB)
2) Send the friends list and the new post_id to Kafka
3) Workers fetch data from Kafka and append (user_id, post_id) rows to the feeds table. This feeds table serves as the news feed cache, and we only store post_id so that we’re not storing expensive content (e.g. images). A rough sketch of this path follows
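
A rough sketch of this fanout-on-write path, assuming kafka-python and redis-py; get_friend_ids() is a hypothetical graph-DB lookup, and the topic and key names are illustrative.

    import json
    from kafka import KafkaConsumer, KafkaProducer
    import redis

    r = redis.Redis()
    producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode())

    def publish_post(user_id, post_id):
        friend_ids = get_friend_ids(user_id)   # hypothetical graph-DB call
        producer.send("fanout", {"post_id": post_id, "friend_ids": friend_ids})

    def fanout_worker():
        consumer = KafkaConsumer("fanout", group_id="fanout-workers",
                                 value_deserializer=json.loads)
        for msg in consumer:
            post_id = msg.value["post_id"]
            for fid in msg.value["friend_ids"]:
                r.lpush(f"feed:{fid}", post_id)   # news feed cache stores only post IDs
                r.ltrim(f"feed:{fid}", 0, 999)    # cap each cached feed's length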

7
Q

Design a KV store

A

There is no perfect design; every design trades off among:
- Read, write, and memory usage
- Consistency vs availability

We’ll design a system that can:
- Store big data
- Key/value pair is small: less than 10kb
- High availability: Responds quickly, even during failures
- High scalability: Can scale to support larger data sets

Single server:
- Store everything in memory and use a hash table
- Optimizations that can be added:
1) Data compression
2) Only hold frequently used data, spill rest to disk

Distributed KV store

Data Partition
- Infeasible to fit entire data set on single server
- Partition data across multiple servers evenly while minimizing data movement when nodes are added/removed. Use consistent hashing: servers and keys are placed on the same hash ring using the same hash function (sketched below)
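
A bare-bones sketch of the hash ring; virtual nodes are omitted for brevity, and the class and function names are illustrative.

    import bisect
    import hashlib

    def _hash(value):
        # Place servers and keys on the same ring with the same hash function.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, servers):
            self._ring = sorted((_hash(s), s) for s in servers)
            self._points = [point for point, _ in self._ring]

        def lookup(self, key):
            # A key is owned by the first server clockwise from its position on the ring.
            idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
            return self._ring[idx][1]

    ring = HashRing(["server-0", "server-1", "server-2"])
    print(ring.lookup("user:42"))   # stable mapping as long as that server stays in the ring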

Data Replication
- Replicate data asynchronously over N servers where N is configurable
- Since data is replicated across multiple nodes, it must be kept in sync across replicas. With a write quorum of size W, a write operation is considered successful only after it is acknowledged by W replicas (similarly, a read quorum R; choosing W + R > N guarantees the read and write sets overlap)

Failure detection
- Each node maintains a membership list
- Each node periodically sends heartbeats to random nodes, which in turn propagate to others
- If a node’s heartbeat has not been updated within a predefined period, the node is considered offline

8
Q

Design Google Maps

A

What features to focus on?
- Location updates, navigation ETA, and map rendering

How large is the road data and do we have access to it?
- Yes. Several TBs

Consider traffic conditions for accurate time estimates

Consider different travel modes such as driving, walking, bus

Non-functional requirements
  • Accuracy: users should not be given wrong turns
  • Smooth map rendering
  • Data and battery usage, use as little as possible
  • General availability and scalability requirements

Map Projection:
- The process of translating the points from a 3D globe to a 2D plane
- Many different ways with strengths and limitations. All of them distort the actual geometry in some way

Geohashing
- An encoding system that encodes a geographic area into a short string of letters and digits
- Recursively divides the grid into subgrid squares
- The 0th division is a single square representing the entire earth
- The 1st division cuts it into 4 squares:
01 11
00 10
- The 2nd division cuts each square into 4 again -> 16 squares
- Geohashing can be used for map tiling (toy encoding sketch below)
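
A toy sketch of this recursive subdivision, appending one longitude bit and one latitude bit per level to match the 00/01/10/11 labels above; real geohashing additionally base32-encodes the interleaved bits.

    def quad_encode(lat, lng, levels):
        lat_lo, lat_hi = -90.0, 90.0
        lng_lo, lng_hi = -180.0, 180.0
        bits = []
        for _ in range(levels):
            # Longitude bit: 1 if the point is in the right half of the current square.
            lng_mid = (lng_lo + lng_hi) / 2
            if lng >= lng_mid:
                bits.append("1")
                lng_lo = lng_mid
            else:
                bits.append("0")
                lng_hi = lng_mid
            # Latitude bit: 1 if the point is in the top half of the current square.
            lat_mid = (lat_lo + lat_hi) / 2
            if lat >= lat_mid:
                bits.append("1")
                lat_lo = lat_mid
            else:
                bits.append("0")
                lat_hi = lat_mid
        return "".join(bits)

    print(quad_encode(37.77, -122.42, 2))   # a San Francisco-ish point, 2 divisions deep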

Map rendering
- Foundational concept is tiling
- World is broken up into smaller tiles and client only downloads relevant tiles for the area, then stitches them together
- Distinct sets of tiles exist at different zoom levels
- Client only needs tiles for the zoom level of the map viewport
- Prevents consuming too much bandwidth
- For example, a single tile could be 256x256 pixel image

Road data processing for navigation algorithms
- Most routing algorithms use A* pathfinding (a shortest-path search guided by a heuristic)
- They operate on a graph data structure
- Intersections are nodes, roads are edges
- Routing algorithms can use a similar concept as tiles: the road graph is broken into segments that are loaded on demand (see the shortest-path sketch after this list)
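
A minimal shortest-path sketch over such an intersection graph (plain Dijkstra; A* would additionally order the queue by distance-so-far plus a heuristic such as straight-line distance to the destination). The adjacency-list format is an assumption.

    import heapq

    # Graph format assumed: {intersection: [(neighbor, travel_time_seconds), ...]}
    def shortest_time(graph, source, dest):
        dist = {source: 0.0}
        pq = [(0.0, source)]
        while pq:
            d, node = heapq.heappop(pq)
            if node == dest:
                return d
            if d > dist.get(node, float("inf")):
                continue   # stale queue entry
            for neighbor, cost in graph.get(node, []):
                nd = d + cost
                if nd < dist.get(neighbor, float("inf")):
                    dist[neighbor] = nd
                    heapq.heappush(pq, (nd, neighbor))
        return float("inf")   # destination unreachable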

Back of the envelope
- What are the storage requirements for the entire collection of map tile images?

Zoom 0: 1 tile
Zoom 1: 4 tiles
Zoom 2: 16 tiles
….
Zoom 21: ~4.4 trillion tiles (4^21)

After roughly 20 zoom levels, you hit practicality limits. Adding more doesn’t really help the user

Assume each tile is a 256x256-pixel compressed PNG of roughly 100 KB

Total size is therefore roughly 4.4 trillion x 100 KB ≈ 440 PB. Keep in mind ~90% of the earth is uninhabited, which brings the estimate down to roughly 50 PB
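
Spelling out the arithmetic as a quick check (the ~100 KB tile size is the assumption above):

    tiles_at_zoom_21 = 4 ** 21          # ≈ 4.4 trillion tiles
    bytes_per_tile = 100_000            # ~100 KB compressed PNG
    total_pb = tiles_at_zoom_21 * bytes_per_tile / 1e15
    print(round(total_pb))              # ≈ 440 PB; ~90% uninhabited -> roughly 50 PB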

3 big features
1. Location service
2. Navigation service
3. Map Rendering

Location service
  • Responsible for recording a user’s location updates
  • Record location update every second but batch them to the server every 15 seconds to significantly reduce update traffic by all clients
    POST /api/v1/locations JSON encoded array of lat, lon, timestamp tuples
  • Millions of writes a day. Use KV store
  • Go with availability over consistency. Location data becomes stale quickly
  • user_id is the partition key, then sort by timestamp
    user_id, timestamp, lat, long, user_mode, nav_mode
  • All data with same partition key are stored together

Navigation service
  • Responsible for finding a reasonably fast route from point A to point B
    GET /api/v1/nav?origin=123+main+st,SF&dest=DisneyLand

Map rendering service
  • The client fetches map tiles based on zoom level and map viewport
  • We pre-generate these tiles and can serve them via a CDN
  • They are static
  • Each tile is represented by its geohash
  • Optimization: Send vector information (paths and polygons) instead of images over the network. The client draws the paths from the vector info
  • Vectorized images provide a smoother zoom experience and don’t get stretched

Geocoding database
- DB which stores places and their corresponding lat/lng pairs

9
Q

Design Google Drive

A
  • Important Features?
    Upload/download (any file type), file sync, notifications
  • Do files need to be encrypted? Yes
  • File size limit? Yes, 10GB

Non-functional requirements
- Reliable. Data loss is unacceptable
- Fast sync speeds
- Minimizing bandwidth usage
- Highly available

  1. Upload a file (two types supported)
    Simple: Use when file size is small
    Resumable: When file size is large and high chance of network interruption
  2. Download a file
    GET https://api.example.com/files/download {"path": "/a/b/c.txt"}

  • One option is to use S3
  • Another option is to use file block storage and shard by user_id
  • Also need a metadata DB (Postgres) for handling user data, login info, etc

Sync conflicts
  • Happen when two users modify the same file at the same time
  • We can resolve this by saying the first version that gets processed wins; the later version will receive a conflict error

Block servers
  • These are responsible for uploading the blocks to cloud storage
  • A file can be split into blocks, each with their own unique hash value, stored in our metadata DB
  • Each block is treated as an independent object and stored on S3
  • To reconstruct a file, blocks are joined in a particular order
  • Splitting into blocks is optimal because sending the whole file (particularly a large one) on each update consumes a lot of bandwidth; instead we use delta sync
  • Delta Sync: when a file is modified, only the modified blocks are synced instead of the whole file
  • We should also apply compression to significantly reduce the size
  • In this system: the client uploads to block servers, the block servers split the file into blocks, then compress and encrypt each block before storing it (sketched below)
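
A rough sketch of the block-splitting side of delta sync, assuming a fixed block size and SHA-256 block hashes; the helper names and the known_hashes set are illustrative.

    import hashlib

    BLOCK_SIZE = 4 * 1024 * 1024   # 4 MB blocks (illustrative)

    def split_into_blocks(path):
        # Yield (hash, bytes) for each fixed-size block of the file.
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                yield hashlib.sha256(block).hexdigest(), block

    def blocks_to_upload(path, known_hashes):
        # Delta sync: only blocks whose hash the server hasn't already seen are uploaded.
        return [(h, b) for h, b in split_into_blocks(path) if h not in known_hashes]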

Notifications
  • Use long polling because communication does not need to be bi-directional; the server only notifies clients when a file changes elsewhere

Storage optimizations
  • De-duplicate data blocks
  • Limit the number of versions kept per file
  • Move infrequently accessed data to cold storage
10
Q

Design YouTube

A

What are the important features?
- Video upload
- Watch a video
- Do we need to support international users? Yes
- Encryption required? Yes
- Max file size for video upload? 1GB
- Can we leverage some existing cloud services? Yes

High level design
- Videos are stored in a CDN (e.g. Amazon CloudFront, Cloudflare). When you press play, the video is streamed from the CDN
- API Servers. Everything else. Feed recommendation, generating video upload URL, updating meta DB, user signup, etc

Transcoding:
- Video transcoding: translate the video into other formats to provide the best possible viewing experience for a given device and bandwidth
- Two components
1) Container: holds the video content, and the container format is reflected in the file extension, e.g. .avi, .mov, .mp4
2) Codecs: compression and decompression algorithms that reduce video size while preserving quality, e.g. H.264, VP9, HEVC

File Upload Flow
- Videos are uploaded to S3
- Transcoding servers fetch the video and transcode it into the best possible formats for different devices and bandwidths
- Once transcoding complete, send to CDN
- Completion handler updates the metadata DB

Video Streaming Flow
- We’re streaming the video from the CDN, not downloading to local device
- We will use a standard streaming protocol (eg MPEG-DASH, Apple HLS, etc)
- We will serve the video from the edge which is closest to the user

Safety Optimization: Pre-signed upload URLs (sketched below)
1. The client makes an HTTP POST /upload request to the API servers and receives a pre-signed URL
2. The client uploads the video directly to S3 using the pre-signed URL
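
A sketch of step 1, assuming S3 via boto3; the bucket name, key format, and expiry are illustrative.

    import boto3   # AWS SDK (assumed, since raw uploads go to S3)

    s3 = boto3.client("s3")

    def create_upload_url(video_id):
        # The API server returns this URL; the client then PUTs the raw video straight to S3.
        return s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": "raw-videos", "Key": f"{video_id}.mp4"},
            ExpiresIn=3600,   # valid for one hour
        )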
