Neetcode System Design Concepts Flashcards
How can we vertically scale a single server?
Give it more RAM or CPU (memory or raw computing power)
Why is vertically scaling a server limited?
A single machine has a hardware ceiling: there is only so much RAM and CPU you can add, so it eventually can’t keep up with compute demands
Why is vertically scaling a single server risky?
It creates a single point of failure if anything happens to that server. All our eggs are in one basket
What is the advantage of horizontal scaling vs. vertical scaling?
Horizontal scaling lets us run replicas of a server, which eliminates the single point of failure and lets us get a lot of computing done with average servers. We don’t need very powerful machines to make this effective
True or False: Horizontal scaling can scale infinitely in theory
True
Does a horizontally scaled system have better availability than a vertically scaled one? Why or why not?
Yes - a horizontally scaled system allows us to eliminate the single point of failure issue and allows us to route traffic to different servers depending on the health of a server
What is a disadvantage to horizontally scaling servers?
It’s much more complicated than vertically scaling. You have to make sure one server doesn’t get overloaded or many servers don’t sit idle. You also have to use load balancers to balance out the load, etc.
What is another name for a load balancer?
Reverse proxy (a load balancer is one kind of reverse proxy)
How does a load balancer work from a high-level?
It accepts incoming requests and forwards each one to an appropriate server
What are two methods for redirecting traffic in a load balancer?
Round robin & hashing
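A minimal sketch of both methods, assuming a made-up pool of three servers. The names `SERVERS`, `round_robin`, and `pick_by_hash` are illustrative and do not come from any load balancer product.

```python
import hashlib
from itertools import cycle

# Hypothetical backend pool, for illustration only.
SERVERS = ["server-a", "server-b", "server-c"]

# Round robin: hand out servers in a fixed, repeating order.
_rotation = cycle(SERVERS)


def round_robin() -> str:
    """Return the next server in the rotation."""
    return next(_rotation)


def pick_by_hash(client_key: str) -> str:
    """Hashing: the same client key always maps to the same server."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SERVERS)
    return SERVERS[index]
```

Round robin spreads requests evenly without looking at them, while hashing keeps a given client (for example, by IP address) on the same server.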
Let’s say we have a global infrastructure, how can we use load balancers to our advantage from a geographic standpoint?
We can use load balancers to route traffic to the nearest location or region
If we have a global system, how can we deliver static content like images, HTML, CSS?
We can use content delivery networks (CDN). This allows us to configure how we deliver static content across our network
What is a content delivery network?
It’s a network of servers that are located all around the world. These servers can deliver static content like images, videos, HTML/CSS/JS
How do CDNs take files from our server and put them onto the CDN?
They take files from our origin server and copy them onto the CDN servers
True or False: CDNs can copy data on either a push or a pull basis
True - this can be done push or pull
What is a general definition of caching?
Creating copies of data so that it can be refetched faster in the future
What is the advantage of caching on a machine?
Things like network requests can be expensive time-wise (take a long time) so we can cache this data to disk to reduce that time burden
Is reading disk expensive?
Yes, compared with memory. We can copy data into memory (RAM), but even memory access is slow relative to the CPU, so frequently used data is also kept in the CPU’s own caches (L1, L2, L3), which the hardware manages
What is an IP address?
Every computer is assigned an IP address, which is a unique identifier for any machine. You can think of this as the telephone number for any machine
What does the Internet Protocol Suite include?
IP, TCP, and UDP
What is the purpose of TCP?
Sending data over a network requires a set of rules. TCP defines and enforces those rules so that data arrives reliably and in order
True or False: TCP can fail because it doesn’t resend a request that fails
False - TCP automatically retransmits any data that is lost in transit
When you type in a URL, how does the computer know which IP address it belongs to?
DNS (Domain Name System) solves this. You create an “A” record that points a domain name to the IP address of the server
Does the operating system have to make a request for the IP address of the server using DNS every time?
No - the result is usually cached by the operating system so that it can be reused later without making the request every time
What is HTTP in relation to TCP?
It’s built on top of TCP - TCP is too low-level to be convenient for application requests, so HTTP was built to handle them
What is a general definition of HTTP?
HTTP provides an application-layer protocol which follows the client-server model instead of just using packets like TCP
What is the general structure of an HTTP request from the client?
Client will initialize the request with a request header (where it’s going, who it’s from) and a request body (payload or the actual content)
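To make the header/body split concrete, here is a rough sketch that assembles the raw bytes of an HTTP/1.1 request by hand. `build_request` is a hypothetical helper, not a library call, and real clients send more headers than this.

```python
def build_request(method: str, path: str, host: str, body: str = "") -> bytes:
    """Assemble a minimal HTTP/1.1 request: header lines, blank line, body."""
    payload = body.encode("utf-8")
    header_lines = [
        f"{method} {path} HTTP/1.1",         # request line: what we want
        f"Host: {host}",                     # where the request is going
        f"Content-Length: {len(payload)}",   # size of the body in bytes
    ]
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + payload
```

The blank line (`\r\n\r\n`) is what separates the header section from the payload.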
What is a REST API?
It’s a set of conventions for building HTTP APIs that makes requests stateless and consistent
What do the 200, 400, and 500 status codes mean in a REST API?
200 = successful, 400 = unsuccessful request (bad request from the client), 500 = internal server error
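Python's standard library lists these codes in `http.HTTPStatus`. The sketch below groups a code by its hundreds range. `describe` is an illustrative helper name.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the standard phrase for a status code plus its broad class."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase}: {kind}"
```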
What advantages does GraphQL have over a REST API?
You can make a request for specific fields so that you don’t overfetch data or make duplicate requests
What is gRPC?
gRPC is a remote procedure call (RPC) framework released by Google in 2016. It gets a performance boost from using protocol buffers
What is a protocol buffer in a gRPC system?
A binary serialization format used in place of JSON. Data is sent as compact serialized binary, which is much faster than JSON. The downside is that it’s not as human-readable as JSON
True or False: WebSockets are built on top of TCP
True
What is the main advantage of WebSockets?
Changes propagate from device to device almost instantly. Once the server receives a message, it can immediately push it to the next device over the open connection
If we tried to replicate WebSockets using HTTP, how could we do it?
We’d have to use polling, which works but is sub-optimal because the client has to keep sending requests to check for changes
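A rough sketch of one polling round, using an in-memory `Inbox` class as a hypothetical stand-in for the server. In a real system each `poll` call would be an HTTP request repeated on a timer.

```python
class Inbox:
    """Hypothetical in-memory stand-in for a server that holds messages."""

    def __init__(self):
        self._messages = []

    def post(self, text):
        self._messages.append(text)

    def since(self, offset):
        """Return every message the client has not seen yet."""
        return self._messages[offset:]


def poll(inbox, offset):
    """One polling round: ask for anything new, return it with the new offset."""
    new = inbox.since(offset)
    return new, offset + len(new)
```

Most polling rounds come back empty, which is the wasted work a persistent WebSocket connection avoids.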
What is SQL mostly used for on a high-level?
Storing data
What are the most common SQL tools?
MySQL, Postgres
If we can just store data in a text file on disk, why do we need to use a database?
Databases can more efficiently store data using data structures like B-trees
Databases also have fast retrieval of data using SQL queries
SQL queries allow you to access data that are stored as rows in tables
True or False: One issue with SQL is that it is not ACID compliant
False - relational (SQL) databases are generally ACID compliant
What does the acronym ACID stand for?
Atomicity, Consistency, Isolation, Durability
What does atomicity mean in the ACID acronym?
Every transaction is all or nothing
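A sketch of all-or-nothing behavior using Python's built-in `sqlite3` module. The `accounts` table, its `CHECK` constraint, and the `transfer` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(amount):
    """Move money from alice to bob; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
                (amount,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If the debit would push alice below zero, the credit to bob that already ran is rolled back with it, so no half-finished transfer is ever stored.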
What does consistency mean in the ACID acronym?
Foreign keys and other constraints will always be enforced
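A sketch of a foreign key being enforced, again with `sqlite3`. The `users` and `orders` tables and the `add_order` helper are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id))"
)
conn.execute("INSERT INTO users (id) VALUES (1)")


def add_order(user_id):
    """Insert an order; the database rejects it if the user does not exist."""
    try:
        conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```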
What does isolation mean in the ACID acronym?
Different concurrent transactions won’t interfere with each other (think a queue)
What does durability mean in the ACID acronym?
Data is stored on disk, so if a machine is restarted data will still be there
What is the main advantage of using NoSQL databases?
Consistency (using foreign keys) makes databases harder to scale, so NoSQL databases remove this relation constraint
What are 3 different types of NoSQL databases?
Key-value stores (DynamoDB), document stores (MongoDB), graph databases (Neo4j)
If we’re separating the database using sharding, how do we decide which portion of the data to put on which machine?
We can use a shard key
What is the definition of sharding in relation to databases?
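Sharding is breaking up a database across multiple machines so it can scale horizontally. This becomes practical once we no longer have to enforce foreign key constraints
What is range-based sharding?
Range-based sharding assigns rows to shards by ranges of the shard key, for example using the id of a person in a table so that each range of ids lives on one machine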
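One way to picture this with a person id as the shard key: each shard owns a contiguous range of ids. The boundaries and shard names below are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical layout: each shard owns ids starting at its lower bound.
LOWER_BOUNDS = [0, 1000, 2000]
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(person_id: int) -> str:
    """Find the shard whose id range contains person_id."""
    if person_id < LOWER_BOUNDS[0]:
        raise ValueError("id below the first range")
    # bisect_right counts how many lower bounds are <= person_id.
    return SHARDS[bisect_right(LOWER_BOUNDS, person_id) - 1]
```

Here ids 0 to 999 go to `shard-0`, 1000 to 1999 to `shard-1`, and everything from 2000 up to `shard-2`.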
What is the advantage to replication over sharding?
Replication is a simpler approach than sharding
What is leader-follower replication?
If we want to scale our DB reads, we can make read only copies of our DB - this is leader-follower replication
What is the process for leader-follower replication?
Every write gets sent to the leader, who then sends to the followers
Every read could go to a leader or a follower
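A toy in-memory sketch of that flow. The `Replica` and `Cluster` classes are hypothetical, and copying to followers synchronously inside `write` is a simplifying assumption, since real systems often replicate asynchronously.

```python
import random


class Replica:
    """One copy of the data, held in a plain dict."""

    def __init__(self):
        self.data = {}


class Cluster:
    def __init__(self, follower_count=2):
        self.leader = Replica()
        self.followers = [Replica() for _ in range(follower_count)]

    def write(self, key, value):
        """Every write goes to the leader, which then updates the followers."""
        self.leader.data[key] = value
        for follower in self.followers:
            follower.data[key] = value  # synchronous copy (an assumption)

    def read(self, key):
        """A read can be served by the leader or any follower."""
        node = random.choice([self.leader, *self.followers])
        return node.data.get(key)
```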
True or False: Leader-leader replication is possible
True - every replica can handle both reads and writes, but this can lead to inconsistent data
What does the acronym CAP mean in CAP theorem?
Consistency, Availability, Partition tolerance (network partitions)
What is the advantage of CAP theorem? What problem does it solve?
It can be complex to keep replicas in sync, so CAP theorem was created to weigh trade-offs with replicated design
What is the technical definition of CAP theorem?
Given a network partition in a database, you have to choose between data consistency and data availability
How are message queues similar to databases?
They are similar to databases in that they have durable storage, can be replicated for redundancy, or sharded for scalability
If we’re overwhelmed with more data than we can process, how can message queues help us solve this problem?
Message queues are perfect for this because we can handle these requests one at a time in a consistent manner. Data can be persisted (held in queue) before it can be processed
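A minimal in-process sketch of that buffering, using `collections.deque` as a stand-in for a real message queue. The `publish` and `consume_one` names are illustrative, and unlike a real queue this one is not durable.

```python
from collections import deque

# In-memory stand-in for a message queue (not durable, illustration only).
queue = deque()


def publish(message):
    """Producers add work as fast as it arrives."""
    queue.append(message)


def consume_one(handler):
    """The consumer takes one message at a time, oldest first."""
    if not queue:
        return False
    handler(queue.popleft())
    return True
```

Producers can publish faster than the consumer works. The backlog simply waits in the queue until it is handled in order.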
True or False: Different parts of our app can be decoupled by using message queues
True