Systems Design Flashcards
Be able to talk about trade-offs for your design
difference between strong vs eventual consistency, when would you use either
https://hackernoon.com/eventual-vs-strong-consistency-in-distributed-databases-282fdad37cf7
strong consistency means that when you make a write request to a db, all nodes of that db are updated before any read request can be processed.
You would want this when information must be accurate every time you read it; ticket reservations are a good use case (you cannot sell the same seat twice).
Trade-offs: this increases write latency, because every write blocks until all nodes have applied it.
eventual consistency is when the data will eventually be identical across all nodes. If I make a write request to one node, the other nodes can still be read from and the data may be inconsistent.
Banks (checks, chargebacks, refunds), online store inventory, and social media all use eventual consistency to improve performance. It works because replicas converge quickly in practice, so most reads still see fresh data.
You want this if you need very low latency and the data does not always need to be up to the moment: likes on an Instagram post, follower counts, certain banking transactions.
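A minimal sketch of the difference (the Replica class and the propagation step are made up for illustration): a write lands on one replica immediately, and reads from the other replicas are stale until propagation runs. A strongly consistent store would instead block reads until every replica had applied the write.

```python
# Toy eventually consistent replica set: writes land on one node and
# propagate later, so other nodes can serve stale reads in the meantime.
class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # writes not yet copied to all replicas

    def write(self, key, value):
        # strong consistency would update every replica here, blocking
        # reads until all of them acknowledge; instead we update one
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # background anti-entropy step: copy pending writes everywhere
        for key, value in self.pending:
            for r in self.replicas:
                r.data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("likes", 10)
stale = store.read("likes", 1)   # replica 1 hasn't seen the write yet
store.propagate()
fresh = store.read("likes", 1)   # now all replicas agree
print(stale, fresh)  # None 10
```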
Load balancing, what is it? How would you achieve load balancing for a URL hosted by many machines with different IPs?
Load balancing is the act of distributing communication or work across multiple different pieces of hardware
For a DNS/URL: if a user requests the name google.com, DNS can return the load balancer's address, and the load balancer decides which machine to route the traffic to. This lets us use only 1 public IPv4 address, which is hard to come by, and keeps our machines private from online attackers.
How do we decide how to route the traffic once the LB gets it? Many factors. One option is round-robin: just go in a circle so every server gets equal traffic. A better way is to route traffic to the server with the least current load based on a metric. The right metric differs by system: serving large files leans on RAM and IO, while computing large numbers needs CPU time.
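Both routing strategies can be sketched in a few lines (the server IPs and the load metric are hypothetical):

```python
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical private IPs

# Round-robin: walk the list in a circle so every server gets equal traffic.
rr = itertools.cycle(servers)
def route_round_robin():
    return next(rr)

# Least-load: route to the server with the lowest current metric. The
# metric here is a fake "active requests" counter; a real LB might use
# connection counts, CPU, or RAM depending on the workload.
load = {s: 0 for s in servers}
def route_least_load():
    server = min(load, key=load.get)
    load[server] += 1
    return server

first_four = [route_round_robin() for _ in range(4)]
print(first_four)   # wraps around: the first server appears twice
least = [route_least_load() for _ in range(2)]
print(least)        # picks the idle servers first
```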
cpu vs io bound operations, what is the difference, examples of each
https://www.brainscape.com/flashcards/leetcode-heaps-10231570/packs/18112311
cpu bound: a thread of execution is cpu bound if its performance is correlated with the cpu
This means a cpu bound process will increase or decrease performance based on the cpu strength
IO bound: a thread of execution is IO bound if its performance is correlated to the subsystem, peripheral systems, or something that the cpu does not control
This includes reading/writing to a hard drive or waiting for responses from the network. If you write to an SSD vs a spinning disk, the SSD will be faster, and the CPU has little to no effect on that performance. If you call an external API and it is slow, there is nothing you can do to speed up the external server's response time other than caching and other mitigations that happen locally.
How do you determine if a system is IO bound vs CPU bound?
Monitor the task manager: a CPU-bound program will show high CPU time and low idle time, whereas an IO-bound program will show high idle time and almost no CPU time.
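A rough way to test the same idea in code: compare the CPU time a task consumed to the wall-clock time it took. The 0.5 cutoff is an arbitrary illustration, not a real profiler's rule:

```python
import time

def classify(task):
    """Run task once and compare CPU time used to wall time elapsed."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    # mostly computing -> ratio near 1; mostly waiting -> ratio near 0
    return "cpu-bound" if cpu_used / wall_used > 0.5 else "io-bound"

math_task = classify(lambda: sum(i * i for i in range(2_000_000)))
wait_task = classify(lambda: time.sleep(0.2))  # stands in for disk/network
print(math_task, wait_task)  # cpu-bound io-bound
```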
If you were designing a scheduler, how would you give different priority to different programs if you knew each program coming in was more CPU or IO bound?
https://www.geeksforgeeks.org/cpu-scheduling-in-operating-systems/
OSes have these goals in mind:
Max CPU utilization [Keep CPU as busy as possible]
Fair allocation of CPU.
Max throughput [Number of processes that complete their execution per time unit]
Min turnaround time [Time taken by a process to finish execution]
Min waiting time [Time a process waits in ready queue]
Min response time [Time when a process produces first response]
Question that helps answer this question: Why is it important for the scheduler to distinguish I/O-bound programs from CPU-bound programs?
Answer: I/O-bound programs perform only a small amount of computation before performing IO, so they typically do not use up their entire CPU quantum. CPU-bound programs, on the other hand, use their entire quantum without performing any blocking IO operations. Consequently, one could make better use of the computer's resources by giving higher priority to I/O-bound programs and allowing them to execute ahead of the CPU-bound programs.
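That priority rule can be sketched as a toy model (not a real OS scheduler): each process reports how much of its last time slice it used; using the whole slice demotes it, yielding early promotes it, so IO-bound work runs first next round.

```python
QUANTUM = 10  # arbitrary time-slice length for the toy model

def next_round(procs):
    """procs: list of (priority, name, slice_used) from the last round.
    Lower priority number runs first. Returns the next run order plus
    the updated priorities."""
    updated = []
    for prio, name, used in procs:
        if used >= QUANTUM:
            prio += 1                 # used the whole slice: CPU-bound, demote
        else:
            prio = max(0, prio - 1)   # blocked on IO early: boost
        updated.append((prio, name, used))
    return [name for _, name, _ in sorted(updated)], updated

# everyone starts equal; the editor blocks on IO after 2 units
procs = [(5, "compiler", 10), (5, "editor", 2), (5, "backup", 10)]
order, procs = next_round(procs)
print(order)  # IO-bound editor is scheduled ahead of the CPU hogs
```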
If you had a CPU that could handle 16 threads and you knew all programs run on it would be 100% CPU bound, nothing is IO bound, how many execution threads would you let your system run?
16 threads, one per hardware thread. With purely CPU-bound work there is no IO waiting to overlap, so any threads beyond what the CPU can execute simultaneously just add context-switching overhead and decrease performance.
How does a scheduler determine if a thread is more IO or CPU bound?
https://unix.stackexchange.com/questions/254864/how-does-the-linux-scheduler-determine-if-a-process-is-i-o-bound-or-cpu-bound
A scheduler knows this by looking at whether the thread released its CPU time slice early or used its entire slice. If a process uses the entire CPU time slice, it is likely CPU bound; if it releases its time slice early, it is probably waiting on an IO operation, hence IO bound (the latter case).
You are given 100 big objects that are unrelated to each other. Each requires you to perform a lot of complicated math, then store it on your computer in different locations. You have a dual-core processor that can handle 8 threads. How do you design the program?
I would have 1 thread per core (2 threads) perform the CPU-bound math task, and the remaining 6 threads would be responsible for storing the results on disk in the different locations. If the math tasks are not 100% CPU bound, a more even split would make sense.
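One way to realize a design like this with Python's concurrent.futures (the math and storage functions are stand-ins; note that in CPython, truly CPU-bound threads contend on the GIL, so a ProcessPoolExecutor would usually replace the math pool in practice):

```python
from concurrent.futures import ThreadPoolExecutor
import os
import tempfile

def heavy_math(n):
    # stand-in for the complicated, CPU-bound math
    return sum(i * i for i in range(n))

def store(result, directory):
    # stand-in for the IO-bound write to a location on disk
    path = os.path.join(directory, f"{result}.txt")
    with open(path, "w") as f:
        f.write(str(result))
    return path

objects = [10_000 + i for i in range(100)]  # the 100 unrelated objects
with tempfile.TemporaryDirectory() as d:
    # a small pool for the CPU-bound stage, a larger one for the IO stage
    with ThreadPoolExecutor(max_workers=2) as math_pool, \
         ThreadPoolExecutor(max_workers=6) as io_pool:
        results = math_pool.map(heavy_math, objects)
        paths = list(io_pool.map(lambda r: store(r, d), results))
    print(len(paths))  # 100
```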
How would you design an efficient database?
IDFK!!!
When you type a URL into the browser, how is that converted to an IP? If your DNS routes to multiple servers and is horizontally scaled, what problems could that cause? What services could you use to provide load balancing? What do we call the act of preserving a user session across multiple servers when we scale? Talk about maintaining redundancy while horizontally scaling.
A DNS server converts a website name into an IP. When you look up the URL, you are typically given the specific servers that you are going to use, and those answers are cached so you do not have to repeatedly resolve the server IPs.
This can cause a problem: if your session data for a website is stored locally on one server, then when DNS hands you a different server you no longer have your session data and you have to log in again. At worst, if you were shopping across 3 different servers, each server holds a different part of your cart with no way to combine them when you check out.
A solution would be either to route a specific user to the same server each time, or to abstract user data away from the individual servers onto shared external storage so session data is consistent across all servers.
Introducing a single shared drive just moves the problem: we started with 1 server, added redundancy to reduce issues, and now we are back to a single point of failure in storage. To solve this we would use RAID, which (in mirroring configurations like RAID 1) duplicates storage across multiple hard drives.
Routing a user back to the same server each time so their session data persists (and they do not have to log in again and such) is called sticky sessions.
You can use software load balancers such as AWS Elastic Load Balancer, HAProxy, and other open-source options, or you can purchase dedicated hardware from companies that perform the task.
How would you add redundancy to a database?
You would add database replication. In a master/slave setup, anything written to the master is copied to all the slaves, so they are all identical copies.
This is good for a read-heavy site because the master or any of the slaves can be queried for data, which spreads that load out heavily. Writes still have to go to the master node.
To follow this up, you can have a master-master-slave setup with multiple masters for extra redundancy; when 1 master gets a write, it is copied to the other master as well.
Talk about caching, different mechanisms to achieve it, different use cases
In regards to sticky sessions, a good way to ensure the user goes back to the same server is to use cookies: encode a string, and the DNS/load balancer decodes it and sends the request to the designated server. This obfuscates the server's private IP. If the server's IP changes, only the DNS/load balancer's decoding has to be updated.
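A toy version of that cookie scheme (the server IPs are made up): the load balancer keeps a private mapping from an opaque cookie value to a backend, so the client never sees the backend's IP.

```python
import secrets

class StickyLB:
    def __init__(self, servers):
        self.servers = servers
        self.cookie_to_server = {}  # opaque cookie -> private backend IP
        self._next = 0

    def handle(self, cookie=None):
        """Return (server_ip, cookie). Reuse the cookie's server if known."""
        if cookie in self.cookie_to_server:
            return self.cookie_to_server[cookie], cookie
        # new session: pick a server round-robin and mint an opaque cookie
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        cookie = secrets.token_hex(8)  # reveals nothing about the backend
        self.cookie_to_server[cookie] = server
        return server, cookie

lb = StickyLB(["10.0.0.1", "10.0.0.2"])
ip1, cookie = lb.handle()        # first request: new session
ip2, _ = lb.handle(cookie)       # same cookie: same backend
print(ip1 == ip2)  # True
```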
Craigslist serves its posts as static HTML files instead of hosting them dynamically like many other sites. They do this because it is very cheap to read bytes from disk, so the site loads very efficiently. The problem: if they have 100k posts and want to change anything about the page, they have to change it in all 100k of those files vs changing 1 template for a dynamic site. This can be considered server-side caching because the post is rendered once and saved, then everyone else reads that exact same cached HTML file.
Redis vs Memcached
https://stackoverflow.com/questions/10558465/memcached-vs-redis
Both are in-memory data stores that work as caches and can speed up databases or anything expensive to repeatedly generate.
Redis is more popular and more powerful because it offers more features than Memcached: Memcached only supports string values, while Redis supports many data types and has cluster support and persistence across restarts.
Memcached ("memory cache") is a great, simple solution. You can think of it as a tier above a MySQL database: hot data lives in RAM as key-value pairs, served by an always-running server process. On a lookup, if the key is not in the cache, the application fetches it from the database and adds it. Once the cache is full, it evicts by LRU: the least recently used entry is removed to make room for the new one.
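A sketch of cache-aside with LRU eviction, roughly the behavior described above (the "database" is a stand-in dict):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key, load_from_db):
        if key in self.data:
            self.data.move_to_end(key)      # mark as most recently used
            return self.data[key]
        value = load_from_db(key)           # cache miss: hit the database
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
        return value

db = {"user:1": "alice", "user:2": "bob", "user:3": "carol"}
cache = LRUCache(capacity=2)
cache.get("user:1", db.get)
cache.get("user:2", db.get)
cache.get("user:1", db.get)   # refreshes user:1's recency
cache.get("user:3", db.get)   # evicts user:2, the least recently used
print(list(cache.data))  # ['user:1', 'user:3']
```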
what is database partitioning?
This is when you partition a database so certain data lives on different servers. FB did this early on when they put MIT and Harvard users on different servers because their 1 server was not large enough. You can also partition users based on last name or some other data point.
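A minimal hash-partitioning sketch (the shard names are hypothetical): the hash of a key decides which server owns it. Compared to the school- or last-name-based splits above, this spreads users more evenly but makes range queries harder.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical shard servers

def shard_for(user_id):
    # hash the key and map it onto one of the shards; the same key
    # always lands on the same shard
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

home = shard_for("alice")
print(home, shard_for("alice") == home)  # deterministic routing
```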
what does high availability mean in the context of a resource? (database, load balancer, server, ect)
https://docs.oracle.com/cd/B19306_01/server.102/b14210/overview.htm
HA, or high availability, means the resource stays usable despite failures: nodes in the cluster constantly check each other's status, and if one of them fails, it is quickly detected and remedied.
Four characteristics of a HA system are:
- Reliability: they are robust and do not easily fail
- Recoverability: When a failure occurs, what do you do? You should know the errors that can occur ahead of time and plan for those errors accordingly so you can act fast. If a critical database table is deleted or access is lost, how do you plan to recover the table and give availability back to your customers?
- Timely error detection: If a component fails, the time it takes to resolve the failure includes detection + resolution time. Reducing the detection time is key in achieving a highly available system because the sooner you detect it, the sooner you solve it
- Continuous operations: Users are always able to access your systems and there is little to no downtime to perform maintenance or updates. Actions such as moving database tables to different locations (physical or otherwise), updating tables, adding cpu power or other hardware, should all be transparent to the user
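The "constantly checking status" idea can be sketched as a toy failure detector (the threshold and node names are made up; real clusters use heartbeats over the network, gossip protocols, or a coordinator service):

```python
class HealthChecker:
    """Mark a node failed after `threshold` consecutive missed heartbeats,
    removing it from rotation so traffic only goes to healthy nodes."""
    def __init__(self, nodes, threshold=3):
        self.misses = {n: 0 for n in nodes}
        self.threshold = threshold

    def heartbeat(self, node, ok):
        if ok:
            self.misses[node] = 0      # any success resets the counter
        else:
            self.misses[node] += 1

    def available(self):
        return [n for n, m in self.misses.items() if m < self.threshold]

hc = HealthChecker(["web-1", "web-2"])
for _ in range(3):
    hc.heartbeat("web-2", ok=False)    # web-2 stops responding
print(hc.available())  # ['web-1'] -- web-2 detected as failed
```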
List types of traffic and the respective ports that coincide with them
HTTP (hypertext transfer protocol, carried over TCP), port 80
HTTPS (HTTP over TLS/SSL, secure sockets layer), port 443
SSH (secure shell), port 22
When should an operation be asynchronous vs synchronous?
idfk
Clarifying questions to ask when you begin a systems design interview
Who is going to use this? (students, professionals, children)
How are they going to use this? (app, web, on the go, etc)
How many users do we expect to handle at any given time
What is the goal of the system
What are the inputs/outputs of the system
How much data will we have to handle per user
How many requests per second do we need to handle
What is the expected read:write ratio
Name some hashing algorithms, explain their uses and such
One main goal of a hash function is verifying file transfers: if I send a file from A to B, the receiver can hash the received file and compare it against the published hash of the original to make sure there has been no tampering.
(Public/private keys belong to asymmetric encryption, a separate tool: the public key encrypts, the private key decrypts. For digital signatures it is reversed: the private key signs a hash of the file, and the public key verifies the signature, confirming the file's integrity and origin.)
Hashing algos need to be quick enough for practical use, but it must be computationally infeasible to work backwards through the algo and construct an input for a chosen hash value; if you can hit a target hash value, you can manufacture collisions.
Hash collisions are when you have 2 different files evaluating to the same hash value. This is bad for security because now you cannot be sure of the integrity of the file.
Hash algorithms are supposed to protect against tampering via the "avalanche effect": when you change even 1 bit of the original file, the hashed value is completely different.
MD5 is one such algorithm: it takes in data and hashes it to a 128-bit value that ideally is unique to the input and cannot be reversed. MD5 today is broken; collisions can be generated cheaply, so it cannot be trusted for integrity.
SHA-1 is also considered broken (practical collisions were demonstrated in 2017). SHA-2 and SHA-3 are the current hashing functions that are safe to use.
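The integrity check and the avalanche effect are easy to see with Python's hashlib (the messages are made up):

```python
import hashlib

original = b"transfer $100 to alice"
tampered = b"transfer $900 to alice"  # a single character changed

# The receiver recomputes the digest and compares it to the one the
# sender published; any change to the file changes the digest entirely.
sent_digest = hashlib.sha256(original).hexdigest()
intact = hashlib.sha256(original).hexdigest() == sent_digest
detected = hashlib.sha256(tampered).hexdigest() != sent_digest
print(intact, detected, len(sent_digest))  # True True 64 (hex chars = 256 bits)
```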
ACID-compliant databases
Atomic: a transaction in the database is all or nothing; if part of the write fails, the entire action fails, so you never get partially applied writes.
Consistent: every transaction takes the database from one valid state to another. Queries that follow the database's rules (constraints, triggers) are allowed to complete; ones that violate them are rejected, so 2 people querying unchanged data will get the same result.
Isolated: the ability to process multiple transactions at once as long as they do not affect one another
Durable: when your technology fails, such as a power outage or software crash, the data stays intact. We do not lose data when failures happen
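Atomicity is easy to demonstrate with SQLite: two UPDATEs form one money transfer, and a simulated crash between them rolls back the first (the table and failure are contrived for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # opens a transaction; rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        # simulate a crash before the matching credit to bob lands
        raise RuntimeError("power failure mid-transaction")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0} -- the partial debit was undone
```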