System Design Concepts Flashcards
What are the steps involved in a system design interview?
- Functional Requirements (APIs, use cases)
- Non-Functional Requirements (qualities of the system, such as availability, latency, and scalability)
- Basic Architecture
- Data Modeling/Data Diagram
- Scaling/Different Pieces
- Follow Up
What does it mean for a system to be scalable?
A system is scalable if adding resources increases its performance in proportion to the resources added.
What’s the difference between latency and throughput?
Latency is the time to perform some action or to produce some result, throughput is the number of such actions or results per unit time.
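A small Python sketch can make the distinction concrete. It assumes request timings arrive as hypothetical (start, end) pairs in seconds; the function name `summarize` is illustrative only.

```python
from statistics import mean
from typing import List, Tuple


def summarize(requests: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (mean latency in seconds, throughput in requests per second).

    `requests` holds (start, end) timestamps in seconds. Assumes at least one
    request and an observation window longer than zero.
    """
    # Latency: how long each individual request took.
    latencies = [end - start for start, end in requests]
    # Throughput: how many requests completed per unit of wall-clock time.
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    return mean(latencies), len(requests) / window


# Two workers, each serving two 0.5 s requests back to back.
print(summarize([(0.0, 0.5), (0.0, 0.5), (0.5, 1.0), (0.5, 1.0)]))
```

Here four requests finish in one second, so the mean latency is 0.5 s while the throughput is 4 requests/s: adding workers raises throughput without making any single request faster.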
What do you want to optimize with regard to latency and throughput?
Aim for maximal throughput with acceptable latency.
Explain the CAP theorem
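- C = Consistency: every read receives the most recent write or an error
- A = Availability: every request receives a response, without a guarantee that it contains the most recent version of the data
- P = Partition Tolerance: the system continues to operate despite network partitions
A distributed system can guarantee at most two of the three at the same time. Because networks are unreliable, partition tolerance is effectively required, so the practical trade-off during a partition is between consistency and availability.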
Explain CP
Consistency and Partition Tolerance
Waiting for a response from the partitioned node might result in a timeout error. CP is a good fit if your business needs atomic reads and writes.
Explain AP
Availability and Partition Tolerance
Responses return the most readily available version of the data on any node, which may not be the latest. AP is a good fit if the business can tolerate eventual consistency.
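A toy model of one partitioned replica can illustrate the two choices. Everything here is hypothetical: `Replica`, its `mode` flag, and `Unavailable` are illustrative names, and a real system detects a partition through timeouts rather than a boolean field.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class Unavailable(Exception):
    """Stands in for the timeout a CP node returns instead of a possibly stale answer."""


@dataclass
class Replica:
    mode: str  # hypothetical knob: "CP" or "AP"
    data: Dict[str, str] = field(default_factory=dict)
    partitioned: bool = False  # True while this node is cut off from its peers

    def read(self, key: str) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: the node cannot confirm its copy is current, so it gives up availability.
            raise Unavailable(key)
        # AP (or no partition): answer with the local copy, which may be stale.
        return self.data[key]


def write(replicas: List[Replica], key: str, value: str) -> None:
    """Apply a write to every replica the writer can still reach."""
    for replica in replicas:
        if not replica.partitioned:
            replica.data[key] = value


for mode in ("CP", "AP"):
    reachable = Replica(mode, {"x": "old"})
    isolated = Replica(mode, {"x": "old"}, partitioned=True)
    write([reachable, isolated], "x", "new")
    try:
        print(mode, isolated.read("x"))
    except Unavailable:
        print(mode, "read failed: node is partitioned")
```

The CP replica refuses to answer, while the AP replica answers with "old" until the partition heals and the write propagates.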
Explain weak consistency and its applications
After a write, reads may or may not see it; a best-effort approach is taken. This is seen in systems such as memcached and in real-time use cases such as video chat and multiplayer games.
Explain eventual consistency
After a write, reads will eventually see it (typically within milliseconds), because data is replicated asynchronously. This is seen in systems such as email and DNS, and it works well in highly available systems.
Explain strong consistency
After a write, reads will see it. Data is replicated synchronously. This approach is seen in RDBMS (relational databases), and in systems which need transactions.
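Strong and eventual consistency differ mainly in when a write reaches the replicas. The sketch below is a hypothetical in-memory model (`ReplicatedStore` is not a real library): `sync=True` completes a write only after the replica has it, while `sync=False` queues it for a later `replicate()` step.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ReplicatedStore:
    """Toy primary with one replica: sync=True ~ strong, sync=False ~ eventual consistency."""

    def __init__(self, sync: bool) -> None:
        self.sync = sync
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self._pending: Deque[Tuple[str, str]] = deque()  # writes not yet shipped

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        if self.sync:
            # Synchronous replication: the write completes only once the replica has it.
            self.replica[key] = value
        else:
            # Asynchronous replication: acknowledge now, ship to the replica later.
            self._pending.append((key, value))

    def replicate(self) -> None:
        """Background step in async mode: apply queued writes in order."""
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value

    def read_from_replica(self, key: str) -> Optional[str]:
        return self.replica.get(key)


strong, eventual = ReplicatedStore(sync=True), ReplicatedStore(sync=False)
for store in (strong, eventual):
    store.write("profile:1", "v2")
print(strong.read_from_replica("profile:1"), eventual.read_from_replica("profile:1"))
eventual.replicate()
print(eventual.read_from_replica("profile:1"))
```

The synchronous store pays for its guarantee with write latency; the asynchronous one answers writes quickly but serves stale reads until replication catches up.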
What are two database patterns to support high availability?
Failover and replication
What is failover, and what are its two types?
Active-passive and active-active
- Active-passive failover: heartbeats are sent between the active server and the passive standby. If the heartbeats stop, the passive server takes over as the active server. (Also called master-slave failover.)
- Active-active failover: both servers manage traffic, spreading the load between them. DNS needs to know the public IPs of both servers (or, for internal-facing systems, the application logic does).
Disadvantages of failover and replication:
- Data can be lost if the active system fails before newly written data is replicated to the passive one.
- Writes are replayed to the read replicas; under heavy write load, replicas can get bogged down replaying writes and serve fewer reads.
- The more read replicas (slaves/followers) there are, the more data must be replicated, which increases replication lag.
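The heartbeat mechanism behind active-passive failover can be sketched as a timeout check on the standby. This is a minimal illustration under assumed names (`PassiveStandby`, `timeout_seconds`); production setups typically rely on dedicated tooling, and the timeout value is a tuning choice rather than a standard.

```python
import time
from typing import Callable


class PassiveStandby:
    """Toy active-passive monitor running on the standby server."""

    def __init__(self, timeout_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout_seconds
        self.clock = clock  # injectable so the logic can be exercised without sleeping
        self.last_heartbeat = clock()
        self.is_active = False

    def on_heartbeat(self) -> None:
        """Called whenever a heartbeat arrives from the active server."""
        self.last_heartbeat = self.clock()

    def check(self) -> bool:
        """Called periodically; returns True once this node has taken over."""
        if not self.is_active and self.clock() - self.last_heartbeat > self.timeout:
            # A real standby would now take over the active server's IP address
            # (or update DNS) and start serving traffic.
            self.is_active = True
        return self.is_active


now = [0.0]  # fake clock for the example
standby = PassiveStandby(timeout_seconds=3.0, clock=lambda: now[0])
now[0] = 2.0
standby.on_heartbeat()
now[0] = 4.0
print(standby.check())  # last heartbeat 2 s ago: still passive
now[0] = 5.5
print(standby.check())  # 3.5 s without a heartbeat: takes over
```

Choosing the timeout is the key trade-off: too short and a brief network hiccup triggers an unnecessary failover; too long and the outage lasts longer before the standby steps in.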
What is master-slave replication? (or, leader/follower replication)
The master serves reads and writes and replicates writes to one or more slaves, which serve only reads. If the master goes offline, the system can continue in read-only mode until a slave is promoted to master.
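Applications (or a proxy in front of the databases) route queries according to this split. The router below is a hypothetical sketch: hostnames such as `db-master` are placeholders, reads are spread round robin, and promotion is triggered by hand rather than by automated failure detection.

```python
from itertools import cycle
from typing import List


class MasterSlaveRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = cycle(self.slaves)
        self.read_only = False  # True while the master is offline

    def route(self, is_write: bool) -> str:
        if is_write:
            if self.read_only:
                raise RuntimeError("master offline: read-only until a slave is promoted")
            return self.master
        if not self.slaves:
            return self.master  # no replicas left: the master also serves reads
        return next(self._next_slave)

    def master_failed(self) -> None:
        self.read_only = True

    def promote(self, slave: str) -> None:
        """Make `slave` the new master; the remaining slaves keep serving reads."""
        self.slaves = [s for s in self.slaves if s != slave]
        self._next_slave = cycle(self.slaves)
        self.master = slave
        self.read_only = False


router = MasterSlaveRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.route(is_write=True))
print([router.route(is_write=False) for _ in range(3)])
router.master_failed()
router.promote("db-replica-2")
print(router.route(is_write=True))
```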
What is master-master replication (or, leader/leader replication)?
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
Disadvantages:
- You'll need a load balancer, or you'll need to change the application logic to decide where to write
- The system is either loosely consistent (violating ACID) or has increased write latency due to synchronization
- Conflict resolution comes more into play as more write nodes are added
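One common, if lossy, conflict-resolution strategy for master-master setups is last-writer-wins (LWW): each write carries a timestamp, and the newest one survives. The sketch assumes each master stores `(value, timestamp, node_id)` per key; this layout and `lww_merge` are illustrative, and real systems may use other strategies.

```python
from typing import Dict, Tuple

# (value, timestamp, node_id): the metadata each master keeps per key (assumed layout).
Versioned = Tuple[str, float, str]


def lww_merge(ours: Dict[str, Versioned], theirs: Dict[str, Versioned]) -> Dict[str, Versioned]:
    """Merge two masters' state, keeping the newest write for each key.

    Ties on timestamp are broken by node_id, so merging in either order gives
    the same result and both masters converge.
    """
    merged = dict(ours)
    for key, candidate in theirs.items():
        current = merged.get(key)
        if current is None or (candidate[1], candidate[2]) > (current[1], current[2]):
            merged[key] = candidate
    return merged


master_a = {"cart:7": ("2 items", 100.0, "a"), "cart:9": ("empty", 50.0, "a")}
master_b = {"cart:7": ("3 items", 105.0, "b"), "cart:8": ("1 item", 90.0, "b")}
print(lww_merge(master_a, master_b))
```

LWW silently discards the losing write and trusts the masters' clocks, which is part of why conflict resolution gets harder as more write nodes are added.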
Explain the letters in the ACID acronym
- A = Atomicity: each transaction is all or nothing
- C = Consistency: any transaction brings the database from one valid state to another
- I = Isolation: executing transactions concurrently gives the same results as executing them sequentially
- D = Durability: once a transaction has been committed, it remains committed, even after a crash or power loss
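Atomicity is easy to observe with Python's built-in `sqlite3` module: using a connection as a context manager commits the transaction on success and rolls it back if a statement raises. The `accounts` table and its `CHECK` constraint are made up for the example; the transfer credits first and debits second so that the rollback visibly undoes a statement that had already succeeded.

```python
import sqlite3
from typing import Dict


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: balances may never go negative.
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction: both updates commit or neither does."""
    try:
        with conn:  # commit on success, roll back if any statement raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


db = make_db()
print(transfer(db, "alice", "bob", 30), balances(db))
print(transfer(db, "alice", "bob", 500), balances(db))  # the credit to bob is rolled back
```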
What are the advantages and disadvantages of sharding?
Advantages:
- less read and write traffic per database
- less replication
- more cache hits
- smaller index size
- one shard going down leaves the other shards operational (though you'll want replication to avoid data loss)
Disadvantages:
- application logic must be shard-aware, which can result in complex SQL queries
- data distribution can become lopsided depending on the shard key, concentrating load on some shards
- joining data across shards is complex
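Routing by a hashed shard key is one common way to decide which shard owns a row. The sketch assumes a fixed number of shards and uses SHA-256 because Python's built-in `hash()` for strings is randomized per process; `shard_for` is a hypothetical helper.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index in [0, num_shards) using a stable hash."""
    # hashlib gives the same digest in every process, unlike the salted built-in hash().
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


for user_id in ("user:1", "user:2", "user:3"):
    print(user_id, "->", "shard", shard_for(user_id, num_shards=4))
```

A stable hash spreads keys evenly, but it cannot fix lopsided load from a few very active keys, and with plain modulo routing, changing `num_shards` moves most keys to a different shard.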
What is DNS, and what are its advantages and disadvantages?
Domain Name System
Translates a domain name to an IP address
Managed DNS services (e.g., Cloudflare, Route 53) can route traffic through various methods, for example to:
- prevent traffic from hitting servers under maintenance
- run A/B tests
- route users based on their location
Disadvantages:
- lookups add a slight delay, which caching mitigates
- DNS servers are usually managed by governments, ISPs, and large companies
- DNS services have been targets of DDoS attacks
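Weighted routing, one way managed DNS services steer traffic (for example, draining a server for maintenance by giving it weight 0, or sending a small share of users to a test variant), can be approximated as below. This is a deterministic simplification under assumed names; real services may instead choose records probabilistically according to the weights. The IPs come from the 203.0.113.0/24 documentation range.

```python
from itertools import cycle, islice
from typing import Dict, Iterator


def weighted_round_robin(records: Dict[str, int]) -> Iterator[str]:
    """Cycle through IPs, repeating each one `weight` times per round.

    A weight of 0 takes a server out of rotation (e.g. during maintenance).
    """
    expanded = [ip for ip, weight in records.items() for _ in range(weight)]
    if not expanded:
        raise ValueError("at least one record needs a positive weight")
    return cycle(expanded)


records = {"203.0.113.10": 3, "203.0.113.11": 1, "203.0.113.12": 0}  # .12 under maintenance
print(list(islice(weighted_round_robin(records), 8)))
```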
What is a CDN, and what are its advantages and disadvantages?
Content Delivery Network
A globally distributed network of proxy servers that serve content from locations close to the user, typically static files such as HTML, CSS, and JS (DNS resolution tells clients which server to contact)
Advantages:
- users receive content from data centers close to them
- your servers do not have to serve requests that the CDN fills
Disadvantages:
- CDN costs can be significant, depending on traffic
- Content might be stale if it is updated before its TTL expires
- URLs for static content must be changed to point to the CDN
What is the difference between a push CDN and a pull CDN?
Push CDNs receive new content whenever changes occur on the server. You take full responsibility for providing content, uploading it directly to the CDN, and rewriting URLs to point to the CDN. You can configure when content expires and when it’s updated.
Push CDNs work well for sites with little traffic or with content that isn’t updated often, since content is placed on the CDN once instead of being re-pulled at regular intervals.
Pull CDNs grab content from your server when the first user requests it. You leave the content on your server and rewrite URLs to point to the CDN. The first request is therefore slower, until the content is cached on the CDN.
A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled again before they have changed.
Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly, with only recently requested content remaining on the CDN.
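The pull-CDN behavior (fetch from the origin on the first request, then serve from the edge until the TTL expires) fits in a small class. `PullEdgeCache` and its injected `fetch_from_origin` and `clock` are assumptions for illustration, not any CDN provider's API; the example also reproduces the stale-content disadvantage of TTL-based caching.

```python
import time
from typing import Callable, Dict, Tuple


class PullEdgeCache:
    """Toy pull-CDN edge node: pull from the origin on a miss, serve cached copies until the TTL expires."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)
        self.origin_fetches = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # hit: served from the edge without touching the origin
        body = self._fetch(path)  # first request or expired entry: pull from the origin
        self.origin_fetches += 1
        self._entries[path] = (body, now + self._ttl)
        return body


now = [0.0]  # fake clock for the example
origin = {"/app.js": b"v1"}
edge = PullEdgeCache(lambda path: origin[path], ttl_seconds=60, clock=lambda: now[0])
print(edge.get("/app.js"))  # miss: pulled from the origin
origin["/app.js"] = b"v2"   # content changes on the origin
now[0] = 30
print(edge.get("/app.js"))  # hit: still the stale v1
now[0] = 61
print(edge.get("/app.js"))  # TTL expired: v2 is pulled
```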
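What is a reverse proxy?
A reverse proxy is a web server that centralizes internal services and provides a unified interface to the public. It forwards each client request to a server that can fulfill it, then returns that server's response to the client.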
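At the heart of a reverse proxy is a routing decision: which internal service should handle this public request? The sketch below shows only that step, using longest-prefix matching on the URL path; the route table and internal addresses are hypothetical, and a real reverse proxy would also forward the request, manage headers, and relay the response back to the client.

```python
from typing import Dict, Optional

# Hypothetical internal services behind one public hostname.
ROUTES: Dict[str, str] = {
    "/api/": "http://10.0.0.11:8080",
    "/api/users/": "http://10.0.0.12:8080",
    "/static/": "http://10.0.0.13:8080",
}


def pick_backend(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream for the longest matching path prefix, or None (a 404)."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None
    return routes[max(matches, key=len)]


def upstream_url(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Build the internal URL the proxy would forward this request to."""
    backend = pick_backend(path, routes)
    return None if backend is None else backend + path


for public_path in ("/api/users/42", "/api/orders", "/static/app.css", "/admin"):
    print(public_path, "->", upstream_url(public_path))
```

Longest-prefix matching lets a more specific route (here `/api/users/`) override a broader one (`/api/`), so clients see one interface while requests land on different internal services.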