Systems Design Flashcards
What are the steps in request flow and traffic source?
- User accesses the website through its domain name via DNS (Domain Name System).
- An IP (Internet Protocol) address is returned to the browser or mobile app.
- Once the IP address is obtained, HTTP (Hypertext Transfer Protocol) requests are sent directly to your web server.
- The web server returns HTML pages or JSON response for rendering.
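The flow above can be sketched with in-memory stand-ins for DNS and the web server (the domain, IP, and paths here are hypothetical, and the "network hop" is just a function call):

```python
# Hypothetical DNS table: domain -> IP address
DNS_TABLE = {"example.com": "93.184.216.34"}

def resolve(domain):
    # Steps 1-2: DNS lookup returns an IP address to the client
    return DNS_TABLE[domain]

def web_server(path):
    # Step 4: the web server returns HTML or JSON for rendering
    if path == "/api/user":
        return {"type": "json", "body": {"id": 1, "name": "alice"}}
    return {"type": "html", "body": "<h1>Home</h1>"}

def http_get(domain, path):
    ip = resolve(domain)        # DNS resolution happens first
    # Step 3: the HTTP request goes directly to the resolved IP
    response = web_server(path) # stands in for the network hop
    return ip, response

ip, resp = http_get("example.com", "/api/user")
```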
When should you consider a non-relational Database?
- Your application requires super-low latency
- Your data is unstructured, or you do not have any relational data
- You only need to serialize and deserialize data (JSON, XML, YAML, etc.)
- You need to store a massive amount of data
Vertical vs. Horizontal Scaling
Vertical Scaling (“scale-up”): the process of adding more power (CPU, RAM, etc.) to your servers.
Horizontal Scaling (“scale out”): allows you to scale by adding more servers into your pool of resources.
Vertical Pros:
- simplicity
- used when traffic is low
Vertical Cons:
- has a hard limit. It is impossible to add unlimited CPU and memory to a single server.
- Does not have failover and redundancy. If one server goes down, the website/app goes down with it completely.
Horizontal Pros:
- Much better for large-scale apps, since capacity is not bounded by a single machine and one server failing does not take the whole site down
Load Balancer
Evenly distributes incoming traffic among web servers that are defined in a load-balancing set.
Private IP
An IP Address that is only reachable between servers in the same network. It is unreachable over the internet.
Load balancers communicate with web servers through private IPs
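A minimal round-robin distribution sketch, assuming a fixed pool of private IPs (the addresses are hypothetical; real load balancers also perform health checks and remove dead servers from rotation):

```python
from itertools import cycle

class LoadBalancer:
    def __init__(self, private_ips):
        # Rotate through the load-balancing set indefinitely
        self._pool = cycle(private_ips)

    def route(self, request):
        server = next(self._pool)  # pick the next server in turn
        return server, request

lb = LoadBalancer(["10.0.0.1", "10.0.0.2"])
```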
Cache
A temporary storage area that stores the result of expensive responses or frequently accessed data in memory so that subsequent requests are served more quickly.
How it works:
- Server receives request from browser, and checks if the cache has the available response.
- If it does, it sends data back to the client. If not, it queries the DB, stores the response in the cache, and sends it back to the client.
This strategy is called a “read-through cache”
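The read-through flow can be sketched with a dict standing in for the database and another for the cache (keys and values are illustrative):

```python
# "Database" and cache stand-ins for the read-through sketch
DB = {"user:1": {"name": "alice"}}
cache = {}

def get(key):
    if key in cache:          # cache hit: serve from memory
        return cache[key]
    value = DB.get(key)       # cache miss: query the database
    if value is not None:
        cache[key] = value    # store the response for future requests
    return value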
Considerations:
When to use a cache:
- When data is read frequently, but modified infrequently
- Cache server is not ideal for persisting data
Expiration Policy:
- Good practice to have one
- once cached data is expired, it is removed from the cache
- Don’t make it too short, as this causes the system to reload data from the DB too frequently
- Not too long because data becomes stale (out of date)
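A small expiration-policy sketch: each entry stores its value alongside a deadline, and expired entries are dropped on read. The 60-second TTL is an illustrative choice, not a recommendation:

```python
import time

TTL_SECONDS = 60.0  # illustrative default TTL
cache = {}

def put(key, value, ttl=TTL_SECONDS):
    # Store the value with an absolute expiry deadline
    cache[key] = (value, time.monotonic() + ttl)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, deadline = entry
    if time.monotonic() > deadline:  # expired: remove and report a miss
        del cache[key]
        return None
    return value
```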
Consistency:
- Keep the data store and the cache in sync
- Inconsistency arises because data-modifying operations on the data store and the cache are not performed in a single transaction
- When scaling across multiple regions, maintaining consistency b/w the data store and the cache is challenging
Mitigating Failures:
- A single cache server represents a potential single point of failure.
- Multiple cache servers across different data centers are recommended
- Also consider overprovisioning the required memory by a certain percentage to provide a buffer as memory usage increases
Eviction policy:
- once the cache is full, any requests to add items to the cache might cause existing items to be removed.
- Least-recently-used (LRU) is the most popular cache eviction policy.
- Others include Least Frequently Used (LFU) or First in First Out (FIFO)
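A common LRU sketch uses an ordered dict: reads move a key to the "most recently used" end, and when the cache is over capacity the least-recently-used key is evicted:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order = recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the LRU entry
```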
Content Delivery Network (CDN)
A network of geographically dispersed servers used to deliver static content.
CDN servers cache static content like images, videos, CSS, JavaScript files, etc.
It is essentially a cache, but for static content.
How it Works:
- User visits a website
- a CDN server closest to the user will deliver static content
- The further a user is (geographically) from the CDN servers, the slower the website will load
- if the CDN server does not have the content (e.g. image), the CDN server requests it from the origin which can be a web server or online storage like Amazon S3
- The origin returns the image to the CDN server and includes an HTTP header called “Time-to-live” (TTL) which tells the CDN when to expire that content
- Finally, if another user requests that same content, the CDN now has it cached and can return it to User B
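The steps above can be sketched as an edge-server pick plus an origin fallback. The cities, distances, and content are hypothetical, and real CDNs select the edge via DNS or anycast rather than a distance table:

```python
# Origin store (e.g. a web server or S3) and per-edge caches
ORIGIN = {"/logo.png": b"<png bytes>"}
edges = {"us-east": {}, "eu-west": {}}
distance = {("NYC", "us-east"): 10, ("NYC", "eu-west"): 90,
            ("Paris", "us-east"): 80, ("Paris", "eu-west"): 5}

def serve(user_city, path):
    # Route the user to the geographically closest edge server
    edge = min(edges, key=lambda e: distance[(user_city, e)])
    cache = edges[edge]
    if path not in cache:            # miss: fetch from the origin
        cache[path] = ORIGIN[path]   # origin response carries a TTL header
    return edge, cache[path]
```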
Considerations:
- Cost
- Appropriate Cache Expiry
- CDN Fallback
- Invalidating files by:
  - Invalidating the CDN object using APIs provided by CDN vendors
  - Object versioning
Stateless Web Tier
Downsides of Stateful Arch
- Each user’s requests must be routed to the same server so it can keep track of state (authentication, for example)
- Adds overhead to make “sticky” sessions
- Handling server failures is difficult
Upsides of Stateless
- HTTP requests from users can be sent to ANY web servers which fetch state data from a shared data store (redis, NoSQL etc.)
- Simpler, more robust, and scalable
- Allows autoscaling of web servers based on traffic load
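A stateless-tier sketch: session state lives in a shared store (a dict standing in for Redis or a NoSQL table), so any web server instance can handle any request:

```python
import uuid

# Shared session store, reachable by every web server
session_store = {}

def log_in(user):
    sid = str(uuid.uuid4())
    session_store[sid] = {"user": user}  # state lives outside the server
    return sid

def handle_request(server_id, sid):
    # ANY server can fetch the state; no sticky sessions needed
    session = session_store.get(sid)
    if session is None:
        return server_id, "please log in"
    return server_id, f"hello {session['user']}"
```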
Data Centers
geoDNS routing
- geoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of a user
- users are geo-routed to the closest data center (e.g., US-East or US-West)
Technical challenges to achieve multi-data center setup:
- Traffic redirection – GeoDNS can be used to direct traffic to nearest data center depending on where a user is located
- Data synchronization – Users from different regions could use different local DBs or Caches. In failover cases, traffic might be routed to a data center where data is unavailable. A common strategy is to replicate data across multiple data centers.
- Test and deployment – it is important to test your website/app at different locations. Automated deployment tools are vital to keep services consistent through all the data centers
Messaging queue
A durable component, stored in memory, that supports asynchronous communication. It serves as a buffer and distributes asynchronous requests.
Basic Arch of a Message Queue
- Input services, called producers/publishers, create messages, and publish them to a message queue
- Other services or servers called “consumers/subscribers”, connect to the queue and perform actions defined by the messages
This decoupling makes the message queue a preferred architecture for building a scalable and reliable application.
Use case:
- Your app supports photo customization (cropping, sharpening, etc.)
- Those customization tasks take time to complete
- web servers publish photo processing jobs to the message queue
- Photo processing workers pick up jobs from the queue and asynchronously perform photo customization tasks
- the producer and the consumer can be scaled independently
- when the size of the queue becomes large, more workers are added to reduce the processing time.
- however, if the queue is empty most of the time, the number of workers can be reduced
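The photo use case can be sketched with the stdlib queue: web servers publish jobs, and a pool of worker threads consumes them asynchronously (the job IDs and "sharpen" step are illustrative; a real system would use a broker like RabbitMQ or SQS):

```python
import queue
import threading

jobs = queue.Queue()   # the message queue (buffer between tiers)
results = []

def producer(photo_ids):
    for pid in photo_ids:
        jobs.put(pid)                      # publish a processing job

def worker():
    while True:
        pid = jobs.get()
        if pid is None:                    # sentinel: shut the worker down
            break
        results.append(f"sharpened:{pid}") # perform the customization task
        jobs.task_done()

# Consumers scale independently of producers: just change the count
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
producer([1, 2, 3])
jobs.join()            # wait until all published jobs are processed
for _ in workers:
    jobs.put(None)     # stop the workers
for w in workers:
    w.join()
```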
Logging, metrics, automation
Logging:
Monitoring error logs helps to identify errors and problems in the system. You can monitor at the per-server level, or use tools to aggregate logs into a centralized service for easy search and viewing.
Metrics:
Help us gain business insights and understand the health status of the system. Here are some helpful metrics:
- Host level metrics: CPU, Memory, disk I/O, etc.
- Aggregated level metrics: e.g. performance of entire DB tier, cache tier, etc.
- Key business metrics: daily active users, retention, revenue, etc.
Automation:
- Build, test, deploy process:
Continuous integration is a good practice in which each code check-in is verified through automation, allowing teams to detect problems early. Automated build, test, and deploy tooling improves developer productivity as the system grows.
Database Scaling
Vertical:
- adding more power (CPU, RAM, DISK etc)
- Limitations of hardware
- SPOF risks
- overall cost is high. powerful servers are more expensive
Horizontal (“sharding”)
- Separates large DBs into smaller, more easily managed parts called “shards”.
- Each shard shares the same schema, but the actual data on each shard is unique to the shard
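A hash-based sharding sketch: each user is routed to a shard by hashing the user ID. Simple modulo hashing is shown for clarity; note that changing the shard count remaps most keys, which consistent hashing mitigates:

```python
import hashlib

NUM_SHARDS = 4
# Each shard has the same schema; the data on each is unique
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id):
    # Hash the sharding key (user_id) to pick a shard deterministically
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put_user(user_id, row):
    shards[shard_for(user_id)][user_id] = row

def get_user(user_id):
    return shards[shard_for(user_id)].get(user_id)
```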
Scale from Zero to Millions of Users (High level)
- Keep web tier stateless (add noSQL/cache to track state)
- Build redundancy at every tier (load balancer, more servers, duplicate DBs, multi-DCs)
- Cache data as much as you can
- Support multiple data centers
- Host static assets in CDN
- Scale your data tier by sharding
- Split tiers into individual services (message queue?)
- Monitor your system and use automation tools
Back-of-envelope: Powers of Two
2^10 ≈ 1 Thousand == 1 Kilobyte (KB)
2^20 ≈ 1 Million == 1 Megabyte (MB)
2^30 ≈ 1 Billion == 1 Gigabyte (GB)
2^40 ≈ 1 Trillion == 1 Terabyte (TB)
2^50 ≈ 1 Quadrillion == 1 Petabyte (PB)
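A quick arithmetic check of these approximations, plus one sample estimate (the 1-billion-records figure is an invented example):

```python
# Exact values behind the powers-of-two approximations
KB, MB, GB, TB, PB = (2 ** n for n in (10, 20, 30, 40, 50))

# Example estimate: storage for 1 billion records of 64 bytes each
records_bytes = 10 ** 9 * 64
estimate_gb = records_bytes / GB  # roughly 60 GB
```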
Back-of-envelope: Latency numbers every programmer should know
Operation name                            Time
L1 cache reference                        0.5 ns
Branch mispredict                         5 ns
L2 cache reference                        7 ns
Mutex lock/unlock                         100 ns
Main memory reference                     100 ns
Compress 1 KB with Zippy                  10,000 ns (10 µs)
Send 2 KB over 1 Gbps network             20,000 ns (20 µs)
Read 1 MB sequentially from memory        250,000 ns (250 µs)
Round trip within same datacenter         500,000 ns (500 µs)
Disk seek                                 10,000,000 ns (10 ms)
Read 1 MB sequentially from the network   10,000,000 ns (10 ms)
Read 1 MB sequentially from disk          30,000,000 ns (30 ms)
Send packet CA -> Netherlands -> CA       150,000,000 ns (150 ms)
Conclusions:
* memory is fast; disk is slow
* avoid disk seeks if possible
* simple compression algorithms are fast
* compress data before sending it over the internet if possible
* Data centers are usually in different regions, and it takes time to send data between them