Systems Flashcards
What are some common bottlenecks when scaling up a web service?
Scaling the Database
CPU Bound Application
Architecture Bottlenecks
IO Bound Application
What is a REST API?
REST, or REpresentational State Transfer, is an architectural style for providing standards between computer systems on the web, making it easier for systems to communicate with each other. REST-compliant systems, often called RESTful systems, are characterized by how they are stateless and separate the concerns of client and server.
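A minimal sketch of the statelessness idea, assuming a hypothetical in-memory `NOTES` resource and a made-up `handle_request` function (neither comes from a real framework). Each request carries everything needed to serve it, and the server keeps no per-client session between calls.

```python
import json

# Hypothetical resource store; a real service would use a database.
NOTES = {1: {"id": 1, "text": "hello"}}

def handle_request(method, path, body=None):
    """Serve one request using only the request itself (no session state)."""
    parts = path.strip("/").split("/")
    if parts[0] != "notes":
        return 404, json.dumps({"error": "not found"})
    if method == "GET" and len(parts) == 2:
        # Assumes a numeric id in the path; a real handler would validate it.
        note = NOTES.get(int(parts[1]))
        if note is None:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(note)
    if method == "POST" and len(parts) == 1:
        new_id = max(NOTES, default=0) + 1
        NOTES[new_id] = {"id": new_id, "text": json.loads(body)["text"]}
        return 201, json.dumps(NOTES[new_id])
    return 405, json.dumps({"error": "method not allowed"})

print(handle_request("GET", "/notes/1"))
```

The client only needs to know the resource path and the HTTP method, which is the separation of client and server concerns described above.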
Trade-offs to consider regarding storage
Storage is about holding information. Any app, system, or service that you program will need to store and retrieve data, and those are the two fundamental purposes of storage. When choosing a storage approach, weigh:
- the shape (structure) of your data
- availability: what level of downtime is acceptable for your storage
- scalability: how fast you need to read and write data, and whether those reads and writes happen concurrently (simultaneously) or sequentially
- consistency: if you protect against downtime using distributed storage, how consistent is the data across your stores?
Define latency
Latency is the time it takes for an action to complete or produce a result, for example the time for data to move from one place in the system to another. You can think of it as lag: the time taken to complete a single operation.
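As a rough illustration, latency can be measured by timing a single operation. The helper name `measure_latency` is hypothetical, and the summed range is just a stand-in for real work such as a network call.

```python
import time

def measure_latency(operation):
    """Return (result, elapsed seconds) for a single call to `operation`."""
    start = time.perf_counter()
    result = operation()
    elapsed = time.perf_counter() - start
    return result, elapsed

result, seconds = measure_latency(lambda: sum(range(100_000)))
print(f"result={result}, latency={seconds * 1000:.3f} ms")
```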
Define throughput
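Throughput is the amount of work a machine or system completes per unit of time; its maximum is the capacity of the system. The term is often used in factories to describe how much work an assembly line can do in an hour, a day, or some other unit of time. For a web service it is typically measured in requests per second.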
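A small sketch of how throughput might be estimated: count how many operations finish inside a fixed time window. The function name and the 0.2 second default window are assumptions chosen for illustration.

```python
import time

def measure_throughput(operation, duration_seconds=0.2):
    """Count how many calls to `operation` complete within the time window."""
    completed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_seconds:
        operation()
        completed += 1
    elapsed = time.perf_counter() - start
    return completed / elapsed  # operations per second

ops_per_second = measure_throughput(lambda: sum(range(1_000)))
print(f"about {ops_per_second:.0f} operations per second")
```

Unlike latency, which describes one operation, this number describes how many operations the system gets through over time.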
What are SLAs?
Service Level Agreements/Assurances
In order to make online services competitive and meet the market’s expectations, online service providers typically offer Service Level Agreements/Assurances. These are a set of guaranteed service level metrics. 99.999% uptime is one such metric and is often offered as part of premium subscriptions.
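To make an uptime figure concrete, it helps to convert it into allowed downtime. The sketch below assumes a 365-day year; the function name is illustrative. Under that assumption, 99.999% uptime leaves roughly 5.26 minutes of downtime per year.

```python
def allowed_downtime_minutes(uptime_percent, period_days=365):
    """Maximum downtime (in minutes) that still meets the uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100

for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows {minutes:.2f} minutes of downtime per year")
```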
How to design a high availability system?
When designing a high availability (HA) system, you need to reduce or eliminate “single points of failure”. A single point of failure is an element in the system whose failure, on its own, causes a loss of availability.
You eliminate single points of failure by designing ‘redundancy’ into the system. Redundancy means providing one or more alternatives (i.e. backups) for each element that is critical for high availability.
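One way to picture redundancy is a caller that falls back to a backup when the primary fails. This is only a sketch: `call_with_failover`, `primary`, and `backup` are hypothetical names, and real systems usually put this logic in a load balancer or client library.

```python
def call_with_failover(replicas, request):
    """Try each redundant replica in order; return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            errors.append(exc)  # this replica is down, try the next one
    raise ConnectionError(f"all {len(replicas)} replicas failed: {errors}")

def primary(request):
    # Simulates an element that has failed.
    raise ConnectionError("primary is down")

def backup(request):
    return f"backup handled {request}"

print(call_with_failover([primary, backup], "GET /status"))
```

With only `primary` in the list, the call fails outright, which is exactly the single point of failure that the backup removes.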
What are relational databases?
A relational database is one that has strictly enforced relationships between the things stored in it. These relationships are typically made possible by requiring the database to represent each such thing (called an “entity”) as a structured table, with zero or more rows (“records”, “entries”) and one or more columns (“attributes”, “fields”).
By forcing such a structure on an entity, we can ensure that each item/entry/record has the right data to go with it. It makes for better consistency and the ability to make tight relationships between the entities.
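A brief sketch of an enforced relationship, using SQLite through Python's standard `sqlite3` module. The `authors` and `books` tables are made up for illustration. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE books ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES authors(id))"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO books (title, author_id) VALUES ('Notes', 1)")

try:
    # No author with id 42 exists, so the database rejects this row.
    conn.execute("INSERT INTO books (title, author_id) VALUES ('Orphan', 42)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan book rejected:", rejected)
```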
What are ACID transactions?
ACID is a set of properties that describe the transactions a good relational database will support. ACID = “Atomic, Consistent, Isolated, Durable”. A transaction is a unit of interaction with a database, typically made up of one or more read or write operations.
What does the A in ACID stand for?
Atomicity requires that when a single transaction comprises more than one operation, the database must guarantee that if one operation fails, the entire transaction (all operations) also fails. It’s “all or nothing”. That way, if the transaction succeeds, you know on completion that all the sub-operations completed successfully, and if an operation fails, you know that all the operations that went with it were undone as well.
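The “all or nothing” behaviour can be sketched with SQLite via Python's `sqlite3` module. The `accounts` table, the `transfer` helper, and the balance check are assumptions made for this example. Using the connection as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

print(transfer(conn, "alice", "bob", 500), balances(conn))  # overdraft: nothing changes
print(transfer(conn, "alice", "bob", 30), balances(conn))
```

In the failing transfer the credit to `bob` has already been applied when the debit violates the check, and the rollback undoes that credit too.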
What does the C in ACID stand for?
Consistency requires that each transaction in a database is valid according to the database’s defined rules, and that when the database changes state (some information has changed), the change is valid and does not corrupt the data. Each transaction moves the database from one valid state to another valid state. Note that this is different from the “consistency” discussed for distributed storage, where the idea is that every “read” operation receives the results of the most recent “write” operation.
What does the I in ACID stand for?
Isolation means that you can “concurrently” (at the same time) run multiple transactions on a database, but the database will end up in a state that looks as though each transaction had been run serially (in a sequence, like a queue of operations). I personally think “Isolation” is not a very descriptive term for the concept, but I guess ACCD is less easy to say than ACID.
What does the D in ACID stand for?
Durability is the promise that once a transaction has been committed, its data will remain stored, even after a crash or power loss. It will be “persistent”: stored on disk and not only in “memory”.
What are non relational databases?
In contrast to a relational database, a non-relational database has a less rigid, or, put another way, a more flexible structure to its data. The data is often stored as “key-value” pairs, though document, wide-column, and graph models are also common.
NoSQL database properties are sometimes referred to as BASE:
- Basically Available: the system guarantees availability
- Soft State: the state of the system may change over time, even without input
- Eventual Consistency: the system will become consistent over a (typically short) period of time, provided no further inputs are received
What is replication?
Replication means to duplicate (make copies of, replicate) your database.
We considered earlier the benefits of redundancy for maintaining high availability. Replication provides that redundancy for the database: if one copy goes down, another can take over. But it also raises the question of how to synchronize data across the replicas, since they’re meant to have the same data. Replication of write and update operations can happen synchronously (at the same time as the changes to the main database) or asynchronously (some time afterwards).
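The difference between synchronous and asynchronous replication can be sketched with a toy in-memory store. `ReplicatedStore` and its methods are hypothetical names, and the `pending` queue stands in for whatever background process a real database uses to ship changes to replicas.

```python
from collections import deque

class ReplicatedStore:
    """Toy primary with one replica; names are illustrative only."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write_sync(self, key, value):
        # Synchronous: the write is not acknowledged until the replica has it too.
        self.replicate_pending()  # apply older queued writes first to keep ordering
        self.primary[key] = value
        self.replica[key] = value

    def write_async(self, key, value):
        # Asynchronous: acknowledge after the primary write; replicate later.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate_pending(self):
        # Stand-in for a background replication process draining the queue.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

store = ReplicatedStore()
store.write_async("user:1", "Ada")
print("replica before catch-up:", store.replica)
store.replicate_pending()
print("replica after catch-up:", store.replica)
```

Between the asynchronous write and the catch-up, a read from the replica would return stale data, which is the synchronization question raised above.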
What is sharding?
Sharding breaks your huge database into smaller databases. You can work out how you want to shard your data depending on its structure. It could be as simple as saving every 5 million rows in a different shard, or you could choose another strategy that best fits your data, needs, and locations served.
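Two possible shard-selection functions, as a sketch. `range_shard` follows the 5-million-row example in the text; `hash_shard` is one commonly used alternative that is not taken from the text. Both function names and the choice of SHA-256 are assumptions.

```python
import hashlib

ROWS_PER_SHARD = 5_000_000  # the 5-million-row example from the text

def range_shard(row_id):
    """Range-based: rows 0..4,999,999 go to shard 0, the next 5 million to shard 1, and so on."""
    return row_id // ROWS_PER_SHARD

def hash_shard(key, shard_count):
    """Hash-based: spread string keys across a fixed number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

print(range_shard(4_999_999), range_shard(5_000_000), range_shard(12_345_678))
print(hash_shard("user-42", shard_count=4))
```

Range-based sharding keeps neighbouring rows together but can concentrate new writes on the latest shard; hash-based sharding spreads load more evenly at the cost of scattering related rows.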
What is polling?
Polling is simply having your client check on a server by sending it a network request and asking for updated data. These requests are typically made at regular intervals like 5 seconds, 15 seconds, 1 minute or any other interval required by your use case.
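A minimal polling loop, assuming a hypothetical `fetch` callable that stands in for the network request. The `poll` name, the change check, and the very short interval are illustrative choices so the example finishes quickly.

```python
import time

def poll(fetch, interval_seconds, max_polls):
    """Call `fetch` at a fixed interval and collect each response that changed."""
    updates = []
    last_seen = None
    for attempt in range(max_polls):
        current = fetch()  # in practice this would be a network request
        if current != last_seen:
            updates.append(current)
            last_seen = current
        if attempt < max_polls - 1:
            time.sleep(interval_seconds)
    return updates

# Hypothetical server state that changes between polls.
responses = iter(["v1", "v1", "v2", "v2", "v3"])
print(poll(lambda: next(responses), interval_seconds=0.01, max_polls=5))
```

Note that the client makes a request on every interval even when nothing has changed, which is the main cost of polling.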
What is pubsub messaging?
The key concept is that publishers ‘publish’ messages and subscribers subscribe to them. To give greater granularity, messages can belong to a certain “topic”, which is like a category. These topics are like dedicated “channels” or pipes, where each pipe exclusively handles messages belonging to a specific topic. Subscribers choose which topics they want to subscribe to and get notified of messages in those topics. The advantage of this system is that the publisher and the subscriber can be completely decoupled, i.e. they don’t need to know about each other. The publisher announces, and the subscriber listens for announcements on the topics it is interested in.
A server is often the publisher of messages, and there are usually several topics (channels) that get published to. A consumer interested in a specific topic subscribes to that topic. There is no direct communication between the server (publisher) and the subscriber (which could be another server). The only interaction is between publisher and topic, and between topic and subscriber.
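The topic-based decoupling can be sketched with a toy in-memory broker. The `Broker` class, its methods, and the topic names are all hypothetical and do not correspond to a real messaging API.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory broker; names are illustrative only."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher only knows the topic, never the subscribers.
        for callback in list(self._subscribers.get(topic, [])):
            callback(message)

broker = Broker()
received = []
broker.subscribe("orders", lambda msg: received.append(("billing", msg)))
broker.subscribe("orders", lambda msg: received.append(("shipping", msg)))
broker.subscribe("alerts", lambda msg: received.append(("ops", msg)))

broker.publish("orders", "order #1 created")
print(received)
```

Only the two `orders` subscribers are notified; the `alerts` subscriber never sees the message, and the publisher has no reference to any of them.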
Steps for system design interview
Step 1: Outline use cases, constraints, and assumptions
Step 2: Create a high level design
Step 3: Design core components
Step 4: Scale the design