Systems Flashcards

1
Q

What are some common bottlenecks when scaling up a web service?

A

Scaling the Database

CPU Bound Application

Architecture Bottlenecks

IO Bound Application

2
Q

What is a REST API?

A

REST, or REpresentational State Transfer, is an architectural style for providing standards between computer systems on the web, making it easier for systems to communicate with each other. REST-compliant systems, often called RESTful systems, are characterized by how they are stateless and separate the concerns of client and server.

3
Q

Trade-offs to consider regarding storage

A

Storage is about holding information. Any app, system, or service that you program will need to store and retrieve data, and those are the two fundamental purposes of storage. The trade-offs to consider include:

  • the shape (structure) of your data
  • availability (what level of downtime is OK for your storage)
  • scalability (how fast you need to read and write data, and whether those reads and writes happen concurrently or sequentially)
  • consistency (if you protect against downtime using distributed storage, how consistent is the data across your stores?)
4
Q

Define latency

A

Latency is simply the measure of a duration. What duration? The duration for an action to complete something or produce a result. For example: for data to move from one place in the system to another. You may think of it as a lag, or just simply the time taken to complete an operation.

5
Q

Define throughput

A

Throughput is the amount of work a machine or system completes per unit of time; its upper limit is the system’s maximum capacity. The term is often used in factories to describe how much work an assembly line can do in an hour, a day, or some other unit of time.

6
Q

What are SLAs?

A

Service Level Agreements/Assurances

In order to make online services competitive and meet the market’s expectations, online service providers typically offer Service Level Agreements/Assurances. These are a set of guaranteed service level metrics. 99.999% uptime is one such metric and is often offered as part of premium subscriptions.
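
To make the 99.999% figure concrete, here is a small sketch that converts an uptime percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration.

```python
# Yearly downtime budget for a given uptime target.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: ignores leap years


def allowed_downtime_seconds(uptime_percent: float) -> float:
    """Return the downtime allowed per year, in seconds, for an uptime target."""
    return SECONDS_PER_YEAR * (1 - uptime_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_seconds(target) / 60
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.999% uptime leaves a little over five minutes of downtime per year.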

7
Q

How to design a high availability system?

A

When designing a high availability (HA) system, you need to reduce or eliminate “single points of failure”. A single point of failure is an element whose failure is, on its own, enough to cause a loss of availability.

You eliminate single points of failure by designing ‘redundancy’ into the system. Redundancy is basically making 1 or more alternatives (i.e. backups) to the element that is critical for high availability.

8
Q

What are relational databases?

A

A relational database is one that has strictly enforced relationships between things stored in the database. These relationships are typically made possible by requiring the database to represent each such thing (called the “entity”) as a structured table with zero or more rows (“records”, “entries”) and one or more columns (“attributes”, “fields”).

By forcing such a structure on an entity, we can ensure that each item/entry/record has the right data to go with it. It makes for better consistency and the ability to make tight relationships between the entities.

9
Q

What are ACID transactions?

A

ACID is a set of properties that describe the transactions a good relational database will support. ACID = “Atomicity, Consistency, Isolation, Durability”. A transaction is an interaction with a database, typically a group of one or more read or write operations.

10
Q

What does the A in ACID stand for?

A

Atomicity requires that when a single transaction comprises more than one operation, the database must guarantee that if one operation fails, the entire transaction (all operations) fails. It’s “all or nothing”. That way, if the transaction succeeds, you know that all the sub-operations completed successfully, and if an operation fails, you know that all the operations that went with it failed too.
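
The “all or nothing” behavior can be sketched with Python’s built-in sqlite3 module. The accounts table, the names, and the transfer function are hypothetical, chosen only to illustrate a transaction whose second operation fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The debit would overdraw alice, so the CHECK constraint rejects it
# and the earlier credit to bob is rolled back as well.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit to bob runs first and succeeds on its own, yet it does not survive, because the failed debit causes the whole transaction to roll back.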

11
Q

What does the C in ACID stand for?

A

Consistency requires that each transaction in a database is valid according to the database’s defined rules, and when the database changes state (some information has changed), such change is valid and does not corrupt the data. Each transaction moves the database from one valid state to another valid state. This is not quite the same as consistency in the distributed-systems sense, where every “read” operation receives the results of the most recent “write” operation.

12
Q

What does the I in ACID stand for?

A

Isolation means that you can “concurrently” (at the same time) run multiple transactions on a database, but the database will end up with a state that looks as though each transaction had been run serially (in a sequence, like a queue of operations). I personally think “Isolation” is not a very descriptive term for the concept, but I guess ACCD is less easy to say than ACID.

13
Q

What does the D in ACID stand for?

A

Durability is the promise that once a transaction is committed, its data will remain stored in the database. It will be “persistent”: stored on disk and not only in “memory”, so it survives a crash or restart.

14
Q

What are non relational databases?

A

In contrast to a relational database, a non-relational database has a less rigid, or, put another way, a more flexible structure to its data. The data is often stored as “key-value” pairs.

NoSQL database properties are sometimes referred to as BASE:

Basically Available means that the system guarantees availability

Soft State means that the state of the system may change over time, even without input

Eventual Consistency means that the system will become consistent over a (very short) period of time, provided it receives no new input during that period.

15
Q

What is replication?

A

Replication means to duplicate (make copies of, replicate) your database.

Redundancy in a system helps maintain high availability, and replication provides that redundancy for the database if one copy goes down. But it also raises the question of how to synchronize data across the replicas, since they’re meant to have the same data. Replication of write and update operations can happen synchronously (at the same time as the changes to the main database) or asynchronously (some time after them).

16
Q

What is sharding?

A

Sharding data breaks your huge database into smaller databases. You can work out how you want to shard your data depending on its structure. It could be as simple as every 5 million rows are saved in a different shard, or go for other strategies that best fit your data, needs and locations served.
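
As a minimal sketch of two routing strategies, the code below maps a row to a shard either by fixed-size ranges (the “every 5 million rows” idea) or by hashing a key. The shard count and function names are assumptions for illustration, not a recommendation.

```python
import hashlib

ROWS_PER_SHARD = 5_000_000  # range size taken from the example above
NUM_SHARDS = 4              # assumption: a fixed number of hash shards


def shard_by_range(row_id: int) -> int:
    """Rows 0..4,999,999 go to shard 0, the next 5 million to shard 1, etc."""
    return row_id // ROWS_PER_SHARD


def shard_by_hash(key: str) -> int:
    """Spread keys across shards using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


print(shard_by_range(4_999_999), shard_by_range(5_000_000))
print(shard_by_hash("user-42"))
```

Range sharding keeps neighboring rows together, while hash sharding tends to spread load more evenly; which one fits depends on your data and access patterns.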

17
Q

What is polling?

A

Polling is simply having your client check on a server by sending it a network request and asking for updated data. These requests are typically made at regular intervals like 5 seconds, 15 seconds, 1 minute or any other interval required by your use case.
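
A polling loop can be sketched as follows. The fetch callable stands in for a real network request and is purely hypothetical; the short interval is only there to keep the example quick to run.

```python
import time
from typing import Callable, Optional


def poll(
    fetch: Callable[[], Optional[str]],
    interval_seconds: float,
    max_attempts: int,
) -> Optional[str]:
    """Call fetch() at a fixed interval until it returns data or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)  # wait before asking again
    return None


# Stand-in for a server that only has new data on the third request.
responses = iter([None, None, "new data"])
result = poll(lambda: next(responses), interval_seconds=0.01, max_attempts=5)
print(result)
```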

18
Q

What is pubsub messaging?

A

The key concept is that publishers ‘publish’ messages and subscribers ‘subscribe’ to messages. To give greater granularity, messages can belong to a certain “topic”, which is like a category. These topics are like dedicated “channels” or pipes, where each pipe exclusively handles messages belonging to a specific topic. Subscribers choose which topic they want to subscribe to and get notified of messages in that topic. The advantage of this system is that the publisher and the subscriber can be completely de-coupled - i.e. they don’t need to know about each other. The publisher announces, and the subscriber listens for announcements on the topics that it is on the lookout for.

A server is often the publisher of messages and there are usually several topics (channels) that get published to. A consumer interested in a specific topic subscribes to that topic. There is no direct communication between the server (publisher) and the subscriber (which could be another server). The only interaction is between publisher and topic, and topic and subscriber.
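
The decoupling described above can be illustrated with a tiny in-memory broker. This is only a sketch: the Broker class and topic names are hypothetical, and a real pub/sub system would add persistence, delivery guarantees, and networking.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Routes messages from publishers to the subscribers of each topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver a message to every subscriber of the topic; return the count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
received = []
broker.subscribe("orders", received.append)

broker.publish("orders", "order 1 created")      # delivered to one subscriber
broker.publish("payments", "payment 7 settled")  # no subscribers, so it is dropped
print(received)
```

The publisher only knows the topic name and the subscriber only knows the topic name; neither holds a reference to the other.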

19
Q

Steps for system design interview

A

Step 1: Outline use cases, constraints, and assumptions
Step 2: Create a high level design
Step 3: Design core components
Step 4: Scale the design

20
Q

What questions would you ask to outline the use case?

A

Who is going to use it?
How are they going to use it?
How many users are there?
What does the system do?
What are the inputs and outputs of the system?
How much data do we expect to handle?
How many requests per second do we expect?
What is the expected read to write ratio?

21
Q

What should you consider when scaling an application?

A

Load balancer
Horizontal scaling
Caching
Database sharding

22
Q

SQL vs NoSQL - reasons for SQL

A

Structured data
Strict schema
Relational data
Need for complex joins
Transactions
Clear patterns for scaling
More established: developers, community, code, tools, etc
Lookups by index are very fast

23
Q

SQL vs NoSQL - reasons for NoSQL

A

Semi-structured data
Dynamic or flexible schema
Non-relational data
No need for complex joins
Store many TB (or PB) of data
Very data intensive workload
Very high throughput for IOPS

24
Q

What are message queues?

A

Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:

An application publishes a job to the queue, then notifies the user of job status
A worker picks up the job from the queue, processes it, then signals the job is complete

The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.
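
The publish-and-work-in-the-background workflow can be sketched with Python’s standard library, using an in-process queue and a worker thread as stand-ins for a real message broker and a separate worker service. The job strings and the None shutdown signal are assumptions for illustration.

```python
import queue
import threading

jobs = queue.Queue()
completed = []


def worker() -> None:
    """Pick jobs off the queue and process them until told to stop."""
    while True:
        job = jobs.get()
        if job is None:  # assumption: None is our shutdown signal
            jobs.task_done()
            break
        completed.append(job.upper())  # stand-in for slow work
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The application publishes jobs and can return to the user immediately.
jobs.put("resize image 1")
jobs.put("resize image 2")
jobs.put(None)

jobs.join()    # wait here only so the example can show the results
thread.join()
print(completed)
```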

Redis is useful as a simple message broker but messages can be lost.

RabbitMQ is popular but requires you to adapt to the ‘AMQP’ protocol and manage your own nodes.

Amazon SQS is hosted but can have high latency and has the possibility of messages being delivered twice.

25
Q

Performance vs scalability

A

A service is scalable if it results in increased performance in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.

Another way to look at performance vs scalability:

If you have a performance problem, your system is slow for a single user.
If you have a scalability problem, your system is fast for a single user but slow under heavy load.
26
Q

Latency vs throughput

A

Latency is the time to perform some action or to produce some result.

Throughput is the number of such actions or results per unit of time.

Generally, you should aim for maximal throughput with acceptable latency.
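
A rough way to see how the two relate: if each request takes a fixed time and workers run fully in parallel (a simplifying assumption that real systems rarely meet), throughput grows with the number of workers while latency stays the same. The function name is hypothetical.

```python
def throughput_per_second(latency_seconds: float, concurrent_workers: int) -> float:
    """Requests completed per second, assuming perfectly parallel workers."""
    return concurrent_workers / latency_seconds


# Each request takes 100 ms regardless of how many workers there are.
print(throughput_per_second(0.1, concurrent_workers=1))
print(throughput_per_second(0.1, concurrent_workers=5))
```

Adding workers raises throughput without making any single request faster, which is why the two measures need to be tracked separately.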

27
Q

Availability vs consistency

A

In a distributed computer system, you can only support two of the following guarantees:

Consistency - Every read receives the most recent write or an error
Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures

Networks aren’t reliable, so you’ll need to support partition tolerance. You’ll need to make a software tradeoff between consistency and availability.

CP - consistency and partition tolerance

Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.

AP - availability and partition tolerance

Responses return the most readily available version of the data available on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.

AP is a good choice if the business needs allow for eventual consistency or when the system needs to continue working despite external errors.

28
Q

Benefits of a reverse proxy

A

Increased security - Hide information about backend servers, blacklist IPs, limit number of connections per client

Increased scalability and flexibility - Clients only see the reverse proxy’s IP, allowing you to scale servers or change their configuration

SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations. Removes the need to install X.509 certificates on each server

Compression - Compress server responses

Caching - Return the response for cached requests

Static content - Serve static content directly

29
Q

What are ACID transactions?

A

A type of database transaction that has four important properties:

Atomicity: The operations that constitute the transaction will either all succeed or all fail. There is no in-between state.
Consistency: The transaction cannot bring the database to an invalid state. After the transaction is committed or rolled back, the rules for each record will still apply, and all future transactions will see the effect of the transaction. Also named Strong Consistency.
Isolation: The execution of multiple transactions concurrently will have the same effect as if they had been executed sequentially.
Durability: Any committed transaction is written to non-volatile storage. It will not be undone by a crash, power loss, or network partition.
30
Q

What is Asymmetric Encryption?

A

Also known as public-key encryption, asymmetric encryption relies on two keys—a public key and a private key—to encrypt and decrypt data. The keys are generated using cryptographic algorithms and are mathematically connected such that data encrypted with the public key can only be decrypted with the private key.

While the private key must be kept secure to maintain the fidelity of this encryption paradigm, the public key can be openly shared.

Asymmetric-key algorithms tend to be slower than their symmetric counterparts.

31
Q

What is Blob Storage?

A

Widely used kind of storage, in small and large scale systems. They don’t really count as databases per se, partially because they only allow the user to store and retrieve data based on the name of the blob. This is sort of like a key-value store but usually blob stores have different guarantees. They might be slower than KV stores but values can be megabytes large (or sometimes gigabytes large). Usually people use this to store things like large binaries, database snapshots, or images and other static assets that a website might have.

Blob storage is rather complicated to run on premise, so most teams rely on the infrastructure of large cloud providers like Google and Amazon. Usually, in the context of System Design interviews, you can assume that you will be able to use GCS or S3. These are blob storage services hosted by Google and Amazon respectively, and their cost depends on how much storage you use and how often you store and retrieve blobs from that storage.

32
Q

What is a cache?

A

A piece of hardware or software that stores data, typically meant to retrieve that data faster than otherwise.

Caches are often used to store responses to network requests as well as results of computationally expensive operations.

Note that data in a cache can become stale if the main source of truth for that data (i.e., the main database behind the cache) gets updated and the cache doesn’t.

33
Q

What is a Cache Eviction Policy?

A

The policy by which values get evicted or removed from a cache. Popular cache eviction policies include LRU (least-recently used), FIFO (first in first out), and LFU (least-frequently used).
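
As an illustration of one of these policies, here is a minimal LRU cache sketch built on collections.OrderedDict. The class name and its capacity of 2 are arbitrary choices for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least-recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # over capacity, so "b" is evicted
print(cache.get("a"), cache.get("b"), cache.get("c"))
```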

34
Q

Describe the client server model

A

The paradigm by which modern systems are designed, which consists of clients requesting data or service from servers and servers providing data or service to clients.

35
Q

What is Consistent Hashing?

A

A type of hashing that minimizes the number of keys that need to be remapped when a hash table gets resized. It’s often used by load balancers to distribute traffic to servers; it minimizes the number of requests that get forwarded to different servers when new servers are added or when existing servers are brought down.
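
A minimal hash ring sketch shows the remapping property. The HashRing class, the server names, and the choice of 100 virtual points per server are assumptions for illustration; production implementations differ in detail.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, replicas: int = 100) -> None:
        self._replicas = replicas  # virtual points per server
        self._ring = []            # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring position after the key."""
        index = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["server-a", "server-b", "server-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}

ring.add("server-d")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to server-d")
```

Only the keys that now belong to the new server change owner; with a plain `hash(key) % number_of_servers` scheme, most keys would typically move.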

36
Q

What is a Content Delivery Network?

A

A CDN is a third-party service that acts like a cache for your servers. Sometimes, web applications can be slow for users in a particular region if your servers are located only in another region. A CDN has servers all around the world, meaning that the latency to a CDN’s servers will almost always be far better than the latency to your servers. A CDN’s servers are often referred to as PoPs (Points of Presence). Two of the most popular CDNs are Cloudflare and Google Cloud CDN.

37
Q

What is a database index?

A

A special auxiliary data structure that allows your database to perform certain queries much faster. Indexes can typically only exist to reference structured data, like data stored in relational databases. In practice, you create an index on one or multiple columns in your database to greatly speed up read queries that you run very often, with the downside of slightly longer writes to your database, since writes have to also take place in the relevant index.
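
The effect of an index can be observed with Python’s built-in sqlite3 module by comparing the query plan before and after creating one. The users table and the index name idx_users_email are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(conn) -> str:
    """Return SQLite's description of how it would run the lookup by email."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM users WHERE email = ?",
        ("user500@example.com",),
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = query_plan(conn)  # without an index: a scan of the table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn)   # with the index: a search using idx_users_email

print(plan_before)
print(plan_after)
```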

38
Q

What is a database lock?

A

In a relational database that provides ACID transactions, updating rows inside a table will cause a lock to be held on that table or on the rows you are updating. If a second transaction tries to update the same rows, it will block before the update until the first transaction releases that lock. This is one of the core mechanisms behind the Isolation of ACID transactions.

39
Q

What is a Distributed File System?

A

A Distributed File System is an abstraction over a (usually large) cluster of machines that allows them to act like one large file system. The two most popular implementations of a DFS are the Google File System (GFS) and the Hadoop Distributed File System (HDFS).

Typically, DFSs take care of the classic availability and replication guarantees that can be tricky to obtain in a distributed-system setting. The overarching idea is that files are split into chunks of a certain size (4MB or 64MB, for instance), and those chunks are sharded across a large cluster of machines. A central control plane is in charge of deciding where each chunk resides, routing reads to the right nodes, and handling communication between machines.

Different DFS implementations have slightly different APIs and semantics, but they achieve the same common goal: extremely large-scale persistent storage.

40
Q

What is an Idempotent Operation?

A

An operation that has the same ultimate outcome regardless of how many times it’s performed. If an operation can be performed multiple times without changing its overall effect, it’s idempotent. Operations performed through a Pub/Sub messaging system typically have to be idempotent, since Pub/Sub systems tend to allow the same messages to be consumed multiple times.

For example, increasing an integer value in a database is not an idempotent operation, since repeating this operation will not have the same effect as if it had been performed only once. Conversely, setting a value to “COMPLETE” is an idempotent operation, since repeating this operation will always yield the same result: the value will be “COMPLETE”.
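
The two examples above can be sketched in a few lines. The last function goes one step beyond the text and shows a common way to make a non-idempotent operation safe to repeat, by remembering which message IDs were already applied; all names here are hypothetical.

```python
record = {"counter": 0, "status": "PENDING"}


def increment(rec: dict) -> None:
    """Not idempotent: every repeat changes the result."""
    rec["counter"] += 1


def mark_complete(rec: dict) -> None:
    """Idempotent: repeating it leaves the same final state."""
    rec["status"] = "COMPLETE"


def increment_once(rec: dict, message_id: str, seen_ids: set) -> None:
    """Idempotent wrapper: skips message IDs that were already applied."""
    if message_id in seen_ids:
        return
    seen_ids.add(message_id)
    increment(rec)


# Simulate the same message being delivered twice.
for _ in range(2):
    mark_complete(record)
    increment(record)

deduped = {"counter": 0}
seen = set()
for _ in range(2):
    increment_once(deduped, "msg-1", seen)

print(record, deduped)
```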

41
Q

What is MapReduce?

A

A popular framework for processing very large datasets in a distributed setting efficiently, quickly, and in a fault-tolerant manner. A MapReduce job is comprised of 3 main steps:

the Map step, which runs a map function on the various chunks of the dataset and transforms these chunks into intermediate key-value pairs.
the Shuffle step, which reorganizes the intermediate key-value pairs such that pairs of the same key are routed to the same machine in the final step.
the Reduce step, which runs a reduce function on the newly shuffled key-value pairs and transforms them into more meaningful data.

The canonical example of a MapReduce use case is counting the number of occurrences of words in a large text file.

When dealing with a MapReduce library, engineers and/or systems administrators only need to worry about the map and reduce functions, as well as their inputs and outputs. All other concerns, including the parallelization of tasks and the fault-tolerance of the MapReduce job, are abstracted away and taken care of by the MapReduce implementation.
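
The word-count example can be sketched on a single machine to show the shape of the three steps. The function names and the sample chunks are made up for illustration; a real MapReduce implementation would run the map and reduce functions across many machines.

```python
from collections import defaultdict


def map_step(chunk: str):
    """Turn one chunk of text into intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_step(pairs):
    """Group the intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key: str, values):
    """Combine all values for one key into a final count."""
    return key, sum(values)


chunks = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for chunk in chunks for pair in map_step(chunk)]
counts = dict(reduce_step(key, values) for key, values in shuffle_step(intermediate).items())
print(counts)
```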

42
Q

Describe the Publish/Subscribe Pattern

A

Often shortened as Pub/Sub, the Publish/Subscribe pattern is a popular messaging model that consists of publishers and subscribers. Publishers publish messages to special topics (sometimes called channels) without caring about or even knowing who will read those messages, and subscribers subscribe to topics and read messages coming through those topics.

Pub/Sub systems often come with very powerful guarantees like at-least-once delivery, persistent storage, ordering of messages, and replayability of messages.

43
Q

What is Redis?

A

An in-memory key-value store. Does offer some persistent storage options but is typically used as a really fast, best-effort caching solution. Redis is also often used to implement rate limiting.

44
Q

What is database replication?

A

The act of duplicating the data from one database server to others. This is sometimes used to increase the redundancy of your system and tolerate regional failures for instance. Other times you can use replication to move data closer to your clients, thus decreasing the latency of accessing specific data.

45
Q

What is database sharding?

A

Sometimes called data partitioning, sharding is the act of splitting a database into two or more pieces called shards and is typically done to increase the throughput of your database. Popular sharding strategies include:

Sharding based on a client's region
Sharding based on the type of data being stored (e.g: user data gets stored in one shard, payments data gets stored in another shard)
Sharding based on the hash of a column (only for structured data)
46
Q

What is a stateless server?

A

A server is usually called “stateless” if it does not require state to be persisted to disk in order to run successfully. Many server processes still hold some state in memory (caching layers, for instance), but being stateless typically means that we can run the server process the same way on any machine, and move it around whenever we want. This contrasts with stateful processes.

47
Q

Describe the TLS handshake

A

The process through which a client and a server communicating over HTTPS exchange encryption-related information and establish a secure communication. The typical steps in a TLS handshake are roughly as follows:

The client sends a client hello—a string of random bytes—to the server.
The server responds with a server hello—another string of random bytes—as well as its SSL certificate, which contains its public key.
The client verifies that the certificate was issued by a certificate authority and sends a premaster secret—yet another string of random bytes, this time encrypted with the server's public key—to the server.
The client and the server use the client hello, the server hello, and the premaster secret to then generate the same symmetric-encryption session keys, to be used to encrypt and decrypt all data communicated during the remainder of the connection.
48
Q

What is Apache Kafka?

A

A distributed messaging system created by LinkedIn. Very useful when using the streaming paradigm as opposed to polling.