System Design Flashcards
When should you generally choose a NoSQL database?
- app requires super low latency
- data is unstructured
- need to store a massive amount of data (NoSQL databases are generally easier to scale horizontally)
When should you use a relational database?
What are the advantages of database replication?
- better performance (writes go to the master and reads are spread across the slaves, so more queries can run in parallel)
- reliability (if one database is destroyed, the data is still preserved on the replicas)
- high availability (if one database goes offline, data can still be served from the replicas)
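A minimal sketch of the read/write split and failover described above. It assumes one master and several replicas. `Node` and `ReplicatedDB` are hypothetical stand-ins for real database connections. Replication is simulated by copying each write to the replicas immediately, whereas real systems usually replicate asynchronously.

```python
import itertools


class Node:
    """Hypothetical stand-in for a database server holding a key-value table."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True  # set to False to simulate an outage


class ReplicatedDB:
    """Routes writes to the master and spreads reads across replicas round-robin."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas
        self._next_replica = itertools.cycle(replicas)

    def write(self, key, value):
        # All writes go to the master...
        self.master.data[key] = value
        # ...and are copied to every replica (simulated replication).
        for replica in self.replicas:
            replica.data[key] = value

    def read(self, key):
        # Try each replica at most once, round-robin, skipping any that are down.
        for _ in range(len(self.replicas)):
            replica = next(self._next_replica)
            if replica.up:
                return replica.data.get(key)
        # No replica available: fall back to the master.
        return self.master.data.get(key)


master = Node("master")
replicas = [Node("replica-1"), Node("replica-2")]
db = ReplicatedDB(master, replicas)
db.write("user:1", "Alice")
replicas[0].up = False      # simulate a replica failure
print(db.read("user:1"))    # Alice (served by replica-2)
```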
What are some ways we can reduce the load on the servers and the response time of a request?
- implement a cache layer for frequently requested data
- implement a CDN for static content
When should we consider using a cache?
- when there are way more reads than writes
- when we want to reduce the response time of a request
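A minimal sketch of the common cache-aside pattern. The cache is checked first, and the database is queried only on a miss. The dict-based cache and database, the `CacheAside` class, and the sleep that stands in for query latency are all hypothetical. Invalidating the cached copy on update is one common way to keep the two in sync, not the only one.

```python
import time


class CacheAside:
    """Reads go through the cache; the (hypothetical) database is hit only on a miss."""

    def __init__(self, database):
        self.database = database  # hypothetical backing store (a dict here)
        self.cache = {}
        self.db_calls = 0

    def _query_db(self, key):
        self.db_calls += 1
        time.sleep(0.01)  # stands in for database latency
        return self.database.get(key)

    def get(self, key):
        if key in self.cache:
            return self.cache[key]   # cache hit: no database round trip
        value = self._query_db(key)  # cache miss: go to the database
        self.cache[key] = value      # populate the cache for later reads
        return value

    def update(self, key, value):
        # Write to the database, then drop the cached copy so the next
        # read fetches fresh data.
        self.database[key] = value
        self.cache.pop(key, None)


store = CacheAside({"user:1": "Alice"})
store.get("user:1")
store.get("user:1")
print(store.db_calls)  # 1: the second read was served from the cache
```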
What are some considerations to keep in mind when implementing a cache?
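- is it appropriate? (a cache only pays off when data is read frequently but modified infrequently)
- expiration policy (expire cached data so it doesn't go stale, but not so quickly that it has to be reloaded from the database too often)
- consistency (keeping the data store and cache in sync)
- mitigating failures (a single cache server is a single point of failure, so use multiple cache servers across different data centres; overprovision the required memory by a certain percentage)
- eviction policy (i.e., what to remove when the cache is full). Least recently used (LRU) is the most common; alternatives include least frequently used (LFU) and first in, first out (FIFO)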
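A minimal sketch combining two of the considerations above: a per-entry TTL as the expiration policy and LRU as the eviction policy. It is built on `collections.OrderedDict`, which keeps keys in recency order. The `LRUCache` class and its injectable `clock` parameter are hypothetical; the clock is injected only so expiry can be demonstrated without waiting.

```python
import time
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache with LRU eviction and a per-entry TTL (in seconds)."""

    def __init__(self, capacity, ttl, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self._items = OrderedDict()  # key -> (value, expires_at); order = recency

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            # Expiration policy: stale entries are dropped when accessed.
            del self._items[key]
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = (value, self.clock() + self.ttl)
        if len(self._items) > self.capacity:
            # Eviction policy: remove the least recently used entry.
            self._items.popitem(last=False)


now = [0.0]
cache = LRUCache(capacity=2, ttl=60, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes the most recently used entry
cache.put("c", 3)       # cache is over capacity, so "b" is evicted
print(cache.get("b"))   # None
now[0] = 61
print(cache.get("a"))   # None: the entry has expired
```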
What is a CDN?
- content delivery network
- a network of geographically dispersed servers, usually run by a third party, that cache static files (e.g., HTML/CSS/JavaScript files, images, videos)
What are some considerations to think about when implementing a CDN?
- cost: CDNs are run by third-party providers that charge for data transfer, so caching infrequently used data provides little benefit and still costs money
- TTL (time to live/expiry time): too long and content may be stale; too short and content is repeatedly reloaded from the origin server
- CDN failure: clients should be able to detect a CDN outage and request resources from the origin instead
- invalidating files: when a file changes, you have to invalidate the copy in the CDN. This can be done either through an API provided by the CDN provider or by versioning the file and referencing the version in a query string (e.g., image.png?v=2), so clients request the new version
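A minimal sketch of the versioning approach. The version is derived from a hash of the file's contents and appended as a query string, so a changed file gets a new URL. This assumes the CDN includes the query string in its cache key. The host `cdn.example.com`, the `v` parameter name, and the 8-character hash length are hypothetical choices.

```python
import hashlib


def versioned_url(path, content, cdn_host="https://cdn.example.com"):
    """Return a CDN URL whose query string changes whenever the file content changes."""
    # A short content hash acts as the version: identical content -> identical URL.
    version = hashlib.sha256(content).hexdigest()[:8]
    return f"{cdn_host}/{path}?v={version}"


old = versioned_url("image.png", b"old bytes")
new = versioned_url("image.png", b"new bytes")
print(old != new)  # True
```

Because the HTML references the new URL after a deploy, clients never request the stale cached copy. Unchanged files keep their URLs and stay cached.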
How do you keep your web tier stateless?
- move user session data out of the web servers into shared persistent storage (e.g., a database); NoSQL is a good choice because it's easier to scale
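A minimal sketch of why moving sessions into shared storage makes the web tier stateless. Two web servers share one session store, so either can handle any request. `SessionStore` is a hypothetical dict-backed stand-in for a shared database or NoSQL store, and `WebServer` is likewise illustrative.

```python
import uuid


class SessionStore:
    """Hypothetical shared session store (in production: a database or NoSQL store)."""

    def __init__(self):
        self._sessions = {}

    def create(self, user):
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"user": user}
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


class WebServer:
    """A stateless server: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        return self.store.create(user)

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a, server_b = WebServer("a", store), WebServer("b", store)
sid = server_a.login("alice")
print(server_b.whoami(sid))  # alice: server B serves a session created on server A
```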
Why is stateless architecture important?
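- handling failures (e.g., if a user's session lives on one server, what happens when that server fails?)
- adding and removing servers (no server holds data that the others depend on)
- load balancing (any server can handle any request, so sticky sessions aren't needed)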
What is a geo-DNS?
- a DNS service that resolves a domain name to the IP address of the data centre closest to the user's location (only relevant when there are multiple data centres in different parts of the world)
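A toy sketch of the decision a geo-DNS service makes. Given the user's approximate coordinates, it returns the IP of the nearest healthy data centre by great-circle (haversine) distance. The data-centre names, coordinates, and documentation-range IP addresses are hypothetical. Real geo-DNS services infer location from the resolver or client IP rather than receiving coordinates.

```python
import math

# Hypothetical data centres: name -> (latitude, longitude, IP address).
DATA_CENTRES = {
    "us-east": (39.0, -77.5, "192.0.2.10"),
    "eu-west": (53.3, -6.3, "198.51.100.10"),
    "ap-southeast": (1.35, 103.8, "203.0.113.10"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def resolve(user_lat, user_lon, healthy=None):
    """Return the IP of the closest data centre, skipping any not in `healthy`."""
    candidates = {
        name: dc for name, dc in DATA_CENTRES.items()
        if healthy is None or name in healthy
    }
    nearest = min(
        candidates,
        key=lambda n: haversine_km(user_lat, user_lon, candidates[n][0], candidates[n][1]),
    )
    return candidates[nearest][2]


print(resolve(48.9, 2.4))  # user near Paris -> eu-west: 198.51.100.10
print(resolve(48.9, 2.4, healthy={"us-east", "ap-southeast"}))  # eu-west down -> 192.0.2.10
```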
What technical challenges are involved with a multiple data centre setup?
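- traffic redirection (directing each user to the right data centre, e.g., with geo-DNS)
- data synchronization (users in different regions may use different local databases or caches, so data is generally replicated across data centres)
- test and deployment (the application should be tested in different locations to make sure it behaves the same in every data centre; automated deployment tools are crucial)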
What is a message queue and what’s its purpose?
- a message queue is a component that stores messages and supports asynchronous communication between producers and consumers
- producers send tasks to the message queue, and consumers pick up tasks from the queue and perform them
- example: suppose your app supports photo processing. Processing takes time and resources, so you don't want to do it on the same servers your clients are connected to and tie up their resources. Instead, the web servers (producers) publish a processing job (e.g., blur this photo) to the message queue, and the consumers (worker servers responsible for the blurring) pick up jobs whenever they are available, perform the task, and return the processed image
- the purpose is to decouple the web servers from the processing tasks so the servers are not overworked, and so the producers and consumers can be scaled independently (e.g., add more consumers when the queue grows)
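A minimal single-process sketch of the photo-processing example. Python's thread-safe `queue.Queue` stands in for a real message queue running on separate machines. The `blur` function is hypothetical and only tags the photo ID. A `None` message is used as a sentinel to tell each consumer to stop.

```python
import queue
import threading


def blur(photo_id):
    """Hypothetical processing step; a real consumer would blur the image here."""
    return f"{photo_id}-blurred"


def consumer(tasks, results):
    # Each worker pulls tasks whenever it is free, independently of the producer.
    while True:
        photo_id = tasks.get()
        if photo_id is None:  # sentinel: no more work for this worker
            break
        results.put(blur(photo_id))


def run(photo_ids, num_workers=3):
    tasks, results = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=consumer, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for photo_id in photo_ids:
        tasks.put(photo_id)  # producer (web server): enqueue the job and move on
    for _ in workers:
        tasks.put(None)      # one stop sentinel per worker
    for w in workers:
        w.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)


print(run(["p1", "p2", "p3"]))  # ['p1-blurred', 'p2-blurred', 'p3-blurred']
```

Changing `num_workers` scales the consumers without touching the producer code, which mirrors the independent scaling described above.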
What are the 4 different categories of NoSQL databases?
- key-value
- wide column
- document
- graph
What is a key-value database?
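- stores data as key-value pairs; you look up a value by its unique key
- many key-value stores keep all their data in memory rather than on disk (some, like Redis, can optionally persist it to disk)
- pros: very fast
- cons: limited querying; data is generally retrieved only by key, so complex queries (e.g., filtering by value or joins) are not possible
- often used as a cache
- examples: Redis, Memcached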