Systems Design Flashcards
What are the steps in request flow and traffic source?
- User accesses the website through its domain name via DNS (Domain Name System).
- An IP (Internet Protocol) address is returned to the browser or mobile app.
- Once the IP address is obtained, HTTP (Hypertext Transfer Protocol) requests are sent directly to your web server.
- The web server returns HTML pages or JSON response for rendering.
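The flow above can be sketched with in-memory stand-ins for DNS and the web server (the domain, IP, and paths here are hypothetical, and the "network hop" is just a function call):

```python
# Hypothetical DNS table: domain -> IP address
DNS_TABLE = {"example.com": "93.184.216.34"}

def resolve(domain):
    # Steps 1-2: DNS lookup returns an IP address to the client
    return DNS_TABLE[domain]

def web_server(path):
    # Step 4: the web server returns HTML or JSON for rendering
    if path == "/api/user":
        return {"type": "json", "body": {"id": 1, "name": "alice"}}
    return {"type": "html", "body": "<h1>Home</h1>"}

def http_get(domain, path):
    ip = resolve(domain)        # DNS resolution happens first
    # Step 3: the HTTP request goes directly to the resolved IP
    response = web_server(path) # stands in for the network hop
    return ip, response

ip, resp = http_get("example.com", "/api/user")
```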
When should you consider a non-relational Database?
- Your application requires super-low latency
- Your data is unstructured, or you do not have any relational data
- You only need to serialize and deserialize data (JSON, XML, YAML, etc.)
- You need to store a massive amount of data
Vertical vs. Horizontal Scaling
Vertical Scaling (“scale-up”): the process of adding more power (CPU, RAM, etc.) to your servers.
Horizontal Scaling (“scale out”): allows you to scale by adding more servers into your pool of resources.
Vertical Pros:
- simplicity
- used when traffic is low
Vertical Cons:
- has a hard limit. It is impossible to add unlimited CPU and memory to a single server.
- Does not have failover and redundancy. If one server goes down, the website/app goes down with it completely.
Horizontal Pros:
- Much better for large-scale apps, since capacity is not bounded by a single machine and one server failing does not take the whole site down
Load Balancer
Evenly distributes incoming traffic among web servers that are defined in a load-balancing set.
Private IP
An IP Address that is only reachable between servers in the same network. It is unreachable over the internet.
Load balancers communicate with web servers through private IPs
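A minimal round-robin distribution sketch, assuming a fixed pool of private IPs (the addresses are hypothetical; real load balancers also perform health checks and remove dead servers from rotation):

```python
from itertools import cycle

class LoadBalancer:
    def __init__(self, private_ips):
        # Rotate through the load-balancing set indefinitely
        self._pool = cycle(private_ips)

    def route(self, request):
        server = next(self._pool)  # pick the next server in turn
        return server, request

lb = LoadBalancer(["10.0.0.1", "10.0.0.2"])
```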
Cache
A temporary storage area that stores the result of expensive responses or frequently accessed data in memory so that subsequent requests are served more quickly.
How it works:
- Server receives request from browser, and checks if the cache has the available response.
- If it does, it sends data back to the client. If not, it queries the DB, stores the response in the cache, and sends it back to the client.
This strategy is called a “read-through cache”
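The read-through flow can be sketched with a dict standing in for the database and another for the cache (keys and values are illustrative):

```python
# "Database" and cache stand-ins for the read-through sketch
DB = {"user:1": {"name": "alice"}}
cache = {}

def get(key):
    if key in cache:          # cache hit: serve from memory
        return cache[key]
    value = DB.get(key)       # cache miss: query the database
    if value is not None:
        cache[key] = value    # store the response for future requests
    return value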
Considerations:
When to use a cache:
- When data is read frequently, but modified infrequently
- Cache server is not ideal for persisting data
Expiration Policy:
- Good practice to have one
- once cached data is expired, it is removed from the cache
- Don’t make it too short, as this causes the system to reload data from the DB too frequently
- Not too long because data becomes stale (out of date)
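A small expiration-policy sketch: each entry stores its value alongside a deadline, and expired entries are dropped on read. The 60-second TTL is an illustrative choice, not a recommendation:

```python
import time

TTL_SECONDS = 60.0  # illustrative default TTL
cache = {}

def put(key, value, ttl=TTL_SECONDS):
    # Store the value with an absolute expiry deadline
    cache[key] = (value, time.monotonic() + ttl)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, deadline = entry
    if time.monotonic() > deadline:  # expired: remove and report a miss
        del cache[key]
        return None
    return value
```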
Consistency:
- Keep the data store and the cache in sync
- Inconsistency arises because data-modifying operations on the data store and the cache are not performed in a single transaction
- When scaling across multiple regions, maintaining consistency b/w the data store and the cache is challenging
Mitigating Failures:
- A single cache server represents a potential single point of failure.
- Multiple cache servers across different data centers are recommended
- Also consider overprovisioning the required memory by a certain percentage to provide a buffer as memory usage increases
Eviction policy:
- once the cache is full, any requests to add items to the cache might cause existing items to be removed.
- Least-recently-used (LRU) is the most popular cache eviction policy.
- Others include Least Frequently Used (LFU) or First in First Out (FIFO)
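A common LRU sketch uses an ordered dict: reads move a key to the "most recently used" end, and when the cache is over capacity the least-recently-used key is evicted:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order = recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the LRU entry
```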
Content Delivery Network (CDN)
A network of geographically dispersed servers used to deliver static content.
CDN servers cache static content like images, videos, CSS, JavaScript files, etc.
It is essentially a cache, but for static content.
How it Works:
- User visits a website
- a CDN server closest to the user will deliver static content
- The further a user is (geographically) from the CDN servers, the slower the website will load
- if the CDN server does not have the content (e.g. image), the CDN server requests it from the origin which can be a web server or online storage like Amazon S3
- The origin returns the image to the CDN server and includes an HTTP header called “Time-to-live” (TTL) which tells the CDN when to expire that content
- Finally, if another user requests that same content, the CDN now has it cached and can return it to User B
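The steps above can be sketched as an edge-server pick plus an origin fallback. The cities, distances, and content are hypothetical, and real CDNs select the edge via DNS or anycast rather than a distance table:

```python
# Origin store (e.g. a web server or S3) and per-edge caches
ORIGIN = {"/logo.png": b"<png bytes>"}
edges = {"us-east": {}, "eu-west": {}}
distance = {("NYC", "us-east"): 10, ("NYC", "eu-west"): 90,
            ("Paris", "us-east"): 80, ("Paris", "eu-west"): 5}

def serve(user_city, path):
    # Route the user to the geographically closest edge server
    edge = min(edges, key=lambda e: distance[(user_city, e)])
    cache = edges[edge]
    if path not in cache:            # miss: fetch from the origin
        cache[path] = ORIGIN[path]   # origin response carries a TTL header
    return edge, cache[path]
```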
Considerations:
- Cost
- Appropriate Cache Expiry
- CDN Fallback
- Invalidating files by:
  - Invalidating the CDN object using APIs provided by CDN vendors
  - Object versioning
Stateless Web Tier
Downsides of Stateful Arch
- Each user’s requests must be routed to the same server so it can keep track of state (authentication, for example)
- Adds overhead to make “sticky” sessions
- Handling server failures is difficult
Upsides of Stateless
- HTTP requests from users can be sent to ANY web servers which fetch state data from a shared data store (redis, NoSQL etc.)
- Simpler, more robust, and scalable
- Allows autoscaling of web servers based on traffic load
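A stateless-tier sketch: session state lives in a shared store (a dict standing in for Redis or a NoSQL table), so any web server instance can handle any request:

```python
import uuid

# Shared session store, reachable by every web server
session_store = {}

def log_in(user):
    sid = str(uuid.uuid4())
    session_store[sid] = {"user": user}  # state lives outside the server
    return sid

def handle_request(server_id, sid):
    # ANY server can fetch the state; no sticky sessions needed
    session = session_store.get(sid)
    if session is None:
        return server_id, "please log in"
    return server_id, f"hello {session['user']}"
```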
Data Centers
geoDNS routing
- geoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of a user
- users are geo-routed to the closest data center (e.g., US-East or US-West)
Technical challenges to achieve multi-data center setup:
- Traffic redirection – GeoDNS can be used to direct traffic to nearest data center depending on where a user is located
- Data synchronization – Users from different regions could use different local DBs or Caches. In failover cases, traffic might be routed to a data center where data is unavailable. A common strategy is to replicate data across multiple data centers.
- Test and deployment – it is important to test your website/app at different locations. Automated deployment tools are vital to keep services consistent through all the data centers
Messaging queue
A durable component, stored in memory, that supports asynchronous communication. It serves as a buffer and distributes asynchronous requests.
Basic Arch of a Message Queue
- Input services, called producers/publishers, create messages, and publish them to a message queue
- Other services or servers called “consumers/subscribers”, connect to the queue and perform actions defined by the messages
This decoupling makes the message queue a preferred architecture for building a scalable and reliable application.
Use case:
- Your app supports photo customization (cropping, sharpening, etc.)
- Those customization tasks take time to complete
- web servers publish photo processing jobs to the message queue
- Photo processing workers pick up jobs from the queue and asynchronously perform photo customization tasks
- the producer and the consumer can be scaled independently
- when the size of the queue becomes large, more workers are added to reduce the processing time.
- however, if the queue is empty most of the time, the number of workers can be reduced
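The photo use case can be sketched with the stdlib queue: web servers publish jobs, and a pool of worker threads consumes them asynchronously (the job IDs and "sharpen" step are illustrative; a real system would use a broker like RabbitMQ or SQS):

```python
import queue
import threading

jobs = queue.Queue()   # the message queue (buffer between tiers)
results = []

def producer(photo_ids):
    for pid in photo_ids:
        jobs.put(pid)                      # publish a processing job

def worker():
    while True:
        pid = jobs.get()
        if pid is None:                    # sentinel: shut the worker down
            break
        results.append(f"sharpened:{pid}") # perform the customization task
        jobs.task_done()

# Consumers scale independently of producers: just change the count
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
producer([1, 2, 3])
jobs.join()            # wait until all published jobs are processed
for _ in workers:
    jobs.put(None)     # stop the workers
for w in workers:
    w.join()
```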
Logging, metrics, automation
Logging:
Monitoring error logs helps to identify errors and problems in the system. You can monitor at the per-server level, or use tools to aggregate logs into a centralized service for easy search and viewing.
Metrics:
Help us gain business insights and understand the health status of the system. Here are some helpful metrics:
- Host level metrics: CPU, Memory, disk I/O, etc.
- Aggregated level metrics: e.g. performance of entire DB tier, cache tier, etc.
- Key business metrics: daily active users, retention, revenue, etc.
Automation:
- Build, test, deploy process:
Continuous integration is a good practice in which each code check-in is verified through automation, allowing teams to detect problems early. Automated build, test, and deploy tooling improves developer productivity as the system grows.
Database Scaling
Vertical:
- adding more power (CPU, RAM, DISK etc)
- Limitations of hardware
- SPOF risks
- overall cost is high. powerful servers are more expensive
Horizontal (“sharding”)
- Separates large DBs into smaller, more easily managed parts called “shards”.
- Each shard shares the same schema, but the actual data on each shard is unique to the shard
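A hash-based sharding sketch: each user is routed to a shard by hashing the user ID. Simple modulo hashing is shown for clarity; note that changing the shard count remaps most keys, which consistent hashing mitigates:

```python
import hashlib

NUM_SHARDS = 4
# Each shard has the same schema; the data on each is unique
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id):
    # Hash the sharding key (user_id) to pick a shard deterministically
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put_user(user_id, row):
    shards[shard_for(user_id)][user_id] = row

def get_user(user_id):
    return shards[shard_for(user_id)].get(user_id)
```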
Scale from Zero to Millions of Users (High level)
- Keep web tier stateless (add noSQL/cache to track state)
- Build redundancy at every tier (load balancer, more servers, duplicate DBs, multi-DCs)
- Cache data as much as you can
- Support multiple data centers
- Host static assets in CDN
- Scale your data tier by sharding
- Split tiers into individual services (message queue?)
- Monitor your system and use automation tools
Back-of-envelope: Powers of Two
2^10 ≈ 1 Thousand == 1 Kilobyte (KB)
2^20 ≈ 1 Million == 1 Megabyte (MB)
2^30 ≈ 1 Billion == 1 Gigabyte (GB)
2^40 ≈ 1 Trillion == 1 Terabyte (TB)
2^50 ≈ 1 Quadrillion == 1 Petabyte (PB)
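A quick arithmetic check of these approximations, plus one sample estimate (the 1-billion-records figure is an invented example):

```python
# Exact values behind the powers-of-two approximations
KB, MB, GB, TB, PB = (2 ** n for n in (10, 20, 30, 40, 50))

# Example estimate: storage for 1 billion records of 64 bytes each
records_bytes = 10 ** 9 * 64
estimate_gb = records_bytes / GB  # roughly 60 GB
```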
Back-of-envelope: Latency numbers every programmer should know
Operation name                            Time
L1 cache reference                        0.5 ns
Branch mispredict                         5 ns
L2 cache reference                        7 ns
Mutex lock/unlock                         100 ns
Main memory reference                     100 ns
Compress 1 KB with Zippy                  10,000 ns (10 µs)
Send 2 KB over 1 Gbps network             20,000 ns (20 µs)
Read 1 MB sequentially from memory        250,000 ns (250 µs)
Round trip within same datacenter         500,000 ns (500 µs)
Disk seek                                 10,000,000 ns (10 ms)
Read 1 MB sequentially from the network   10,000,000 ns (10 ms)
Read 1 MB sequentially from disk          30,000,000 ns (30 ms)
Send packet CA -> Netherlands -> CA       150,000,000 ns (150 ms)
Conclusions:
* memory is fast; disk is slow
* avoid disk seeks if possible
* simple compression algorithms are fast
* compress data before sending it over the internet if possible
* Data centers are usually in different regions, and it takes time to send data between them