System Design Flashcards
what are the two types of requirements you want to collect at the start of a system design interview?
functional and non-functional requirements
what are functional requirements?
The specific features that bring the user to the service. These functions are what directly deliver the core service
what are non-functional requirements?
System-wide qualities that affect how the system operates overall, rather than any one specific function of the service. Examples include scalability, reliability, usability, security, and performance
why is it important that application servers are stateless?
so that any server can handle any request, which makes load balancing and failover straightforward and improves resilience
is mongodb a sql or nosql db?
nosql
in mongodb, what happens when a primary shard node goes down?
the remaining secondary nodes in that shard's replica set will automatically elect a new primary
in mongodb, how does the mongos service know which shard a data request belongs to?
it looks up the shard for that key in the config servers, and caches that routing metadata
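A minimal sketch of the routing idea, assuming range-based sharding. The chunk table, shard names, and `route` function are hypothetical stand-ins for the metadata a router would fetch from the config service, not MongoDB's actual API.

```python
import bisect

# Hypothetical routing table: sorted lower bounds of shard-key ranges,
# and the shard that owns each range (same index in both lists).
CHUNK_LOWER_BOUNDS = ["", "h", "p"]
CHUNK_SHARDS = ["shardA", "shardB", "shardC"]


def route(shard_key):
    """Return the shard whose key range contains shard_key."""
    # Find the last range whose lower bound is <= shard_key.
    index = bisect.bisect_right(CHUNK_LOWER_BOUNDS, shard_key) - 1
    return CHUNK_SHARDS[index]
```

The router only needs this small table, not the data itself, which is why it can be cached and refreshed when ranges move.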
how does cassandra trade off consistency to get more availability?
because any node can accept a read or write, you are more available, but it might take time for the data to propagate to the other nodes, so you only get eventual consistency
what is the hotspot problem in nosql databases?
aka the celebrity problem: one shard gets a lot more traffic than the others due to usage patterns. many modern systems can reshard based on traffic patterns to avoid this
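A small sketch of how a hotspot shows up, assuming simple hash-based sharding. The `shard_for` function, the key names, and the traffic mix are made up for illustration.

```python
import hashlib
from collections import Counter


def shard_for(key, shard_count):
    """Map a key to a shard by hashing it (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Made-up traffic: one "celebrity" key receives 90 of 100 requests.
requests = ["celebrity"] * 90 + [f"user{i}" for i in range(10)]

# Count requests per shard; the celebrity's shard takes at least 90.
load = Counter(shard_for(key, 4) for key in requests)
```

Every request for the same key lands on the same shard, so hashing spreads keys evenly but not traffic.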
how can you design your data schema to make it easy to scale horizontally using nosql?
think in terms of simple key-value lookups. maybe break complex joins into a couple of simple key-value lookups. that’s much easier to shard
what does it mean to say that data is normalized?
data is stored in logical tables, one per entity, that refer to each other via foreign keys. data is not duplicated all over the place. for example, a dinner reservation has a customer_id, which points to the row with that id in the customer table
what are the advantages of having normalised data?
there is less (or no) data duplication, saving space, and you can update data in a single place and have that reflected everywhere
why might you choose to denormalise your data?
it’s more efficient and performant. you get all the data you need with a single lookup, instead of having to do multiple lookups to join all the data. the cost you pay is data duplication
what are the downsides of denormalised data?
it costs extra space due to data duplication, and updating data requires you to update it in multiple locations, which costs time and means the copies are only eventually consistent
can you have normalised data in a nosql database?
yes, it just means that you would need to do multiple simple lookups. for example, looking up a reservation, and then using the customer_id in that reservation to look up the customer data
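A minimal sketch of the two-lookup pattern, using plain dicts as hypothetical stand-ins for two key-value collections. The ids, field names, and sample values are invented for illustration.

```python
# Hypothetical key-value collections, keyed by id.
reservations = {
    "r1": {"customer_id": "c7", "time": "19:30", "party_size": 4},
}
customers = {
    "c7": {"name": "Ada", "phone": "555-0100"},
}


def get_reservation_with_customer(reservation_id):
    """Resolve a reservation and its customer with two simple key lookups."""
    reservation = reservations[reservation_id]         # lookup 1
    customer = customers[reservation["customer_id"]]   # lookup 2
    return {**reservation, "customer": customer}
```

A denormalised version would store the customer fields inside each reservation and need only the first lookup, at the cost of duplicating them.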
would you generally start with normalised or denormalised data? when might you switch?
normalised, because it’s simpler, saves space, and is more consistent. when performance becomes a bottleneck, you might denormalise so that you need fewer lookups and speed things up.
if you design an LRU cache, what two data structures might you use under the hood?
a hashmap for finding a kv-pair in constant time, as well as a doubly-linked-list so you can move an item to the front of the list whenever it is read. the least recently used item is then always last in the list, so you can always evict from the end.
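A minimal sketch of that design, assuming a fixed capacity and a `get` that returns `None` on a miss. The class and method names are illustrative, not from any particular library.

```python
class _Node:
    """Doubly linked list node holding one cache entry."""

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.map = {}            # key -> node, for O(1) lookup
        self.head = _Node()      # sentinel on the most recently used side
        self.tail = _Node()      # sentinel on the least recently used side
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        # A read makes the item the most recently used.
        self._unlink(node)
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:
            # Evict the last real node: the least recently used entry.
            lru = self.tail.prev
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)
```

The sentinel head and tail nodes are a design choice that avoids special-casing an empty list; both `get` and `put` stay O(1).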
what are three common cache eviction strategies?
LRU, LFU, FIFO
what is LRU?
least recently used - a cache eviction strategy
what is LFU?
least frequently used - a cache eviction strategy
what are some popular caching solutions?
memcached, redis, elasticache, ncache, ehcache
what is memcached?
a simple in-memory key-value store. nothing fancy like redis has.
what does CDN stand for?
Content Delivery Network
dns based geo-routing of traffic to the correct region
how can you prevent cascading failures?
overprovisioning - ensure that all the other systems can handle the load of one system when it fails and traffic gets redistributed
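A back-of-the-envelope check for this, assuming load is spread evenly across identical nodes. The function name and the numbers in the usage note are hypothetical.

```python
def survives_one_failure(total_load, node_count, node_capacity):
    """True if the remaining nodes can absorb the load when one node fails."""
    if node_count < 2:
        return False  # no other node to take over
    load_per_survivor = total_load / (node_count - 1)
    return load_per_survivor <= node_capacity
```

For example, 900 requests/s over 4 nodes that each handle 300 requests/s survives one failure (3 x 300 = 900), while 3 nodes that each handle 400 requests/s do not (each survivor would need 450).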
how many minutes of downtime per year is five nines of availability?
about five (5.256)
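The arithmetic behind that number, as a short sketch assuming a 365-day year. The function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(nines):
    """Allowed downtime per year for an availability of N nines."""
    unavailability = 10 ** -nines  # five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability
```

Five nines is 99.999% availability, so the allowed downtime is 525,600 x 0.00001 = 5.256 minutes per year.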
what is an SLA?
a Service Level Agreement. What your users can expect from you. For example, five nines of uptime, or p95 sub-second latency
what is HDFS?
hadoop distributed file system - an open source self managed distributed file system for big data storage
Distributed “NoSQL” databases with a master node that distributes transactions fall on which side of the CAP triangle?
Single-master designs favor consistency and partition tolerance. Although in principle availability is what’s given up, in practice modern NoSQL databases have highly redundant master nodes that can quickly replace themselves in the event of failure.
In HDFS, the server responsible for coordinating requests is called the:
In the Hadoop Distributed File System, the name node coordinates how files are broken into blocks, and where those blocks are stored. In high availability settings, multiple name nodes may be present for failover.
advantage of a linked list over an array?
it can grow dynamically, one node at a time. an array is stored contiguously in memory, so growing it means resizing: allocating a bigger block and copying the items over.
what’s the order of complexity of accessing an item in a linked list?
O(n)
what kinds of data structures can you implement using a linked list?
stacks and queues
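A minimal sketch of both on top of a singly linked list. The class names are illustrative, and raising `IndexError` on an empty structure is an assumption about the desired behaviour.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class Stack:
    """LIFO: push and pop both happen at the head, so each is O(1)."""

    def __init__(self):
        self.head = None

    def push(self, value):
        self.head = Node(value, self.head)

    def pop(self):
        if self.head is None:
            raise IndexError("pop from empty stack")
        node = self.head
        self.head = node.next
        return node.value


class Queue:
    """FIFO: enqueue at the tail, dequeue at the head, each O(1)."""

    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = node       # first item is both head and tail
        else:
            self.tail.next = node
        self.tail = node

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None       # queue is empty again
        return node.value
```

The queue keeps a tail pointer so that appending does not require walking the whole list.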
what’s the difference between a singly and doubly linked list?
in a doubly linked list you have pointers going in both directions, and you keep track of both head and tail, so you can move in either direction