System Design and Scalability Flashcards

1
Q

Handling the System Design and Scalability Questions

A

Communicate

Go broad first

Use the whiteboard

Acknowledge interviewer concerns

Be careful about assumptions

State your assumptions explicitly

Estimate when necessary

Drive

2
Q

Design: Step-by-Step

A

Step 1: Scope the Problem

Step 2: Make Reasonable Assumptions

Step 3: Draw the Major Components

Step 4: Identify the Key Issues

Step 5: Redesign for the Key Issues

3
Q

Algorithms that Scale: Step-by-Step

A

Step 1: Ask Questions

Step 2: Make Believe

Step 3: Get Real

Step 4: Solve Problems

4
Q

Horizontal Scaling

A

Increase the number of nodes. For example, you might add additional servers, thus decreasing the load on any one server.

5
Q

Vertical Scaling

A

Increase the resources of a specific node. For example, you might add additional memory to a server to improve its ability to handle load changes. Typically easier than horizontal scaling, but it’s limited: you can only add so much memory or disk space.

6
Q

Load Balancer

A

Distributes a system’s load evenly so that one server doesn’t crash and take down the whole system. To use one, you have to build out a network of cloned servers that all have essentially the same code and access to the same data.
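
As a rough illustration of the card above, here is a minimal round-robin sketch. Round-robin is only one possible distribution policy, chosen here as an assumption, and the class name and server names (`RoundRobinBalancer`, `app-1`, ...) are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so requests spread evenly."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        # cycle() repeats the server list forever, in order.
        self._rotation = cycle(list(servers))

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Hypothetical cloned servers: same code, same access to the data.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Because every clone can serve any request, the balancer needs no knowledge of what a request contains.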

7
Q

Database Denormalization

A

Adding redundant information to a database to speed up reads. Joins in a relational (SQL) database can get very slow as the system grows, so you would generally try to avoid them. Denormalization is one way to do that.
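
A small sketch of the trade-off, using Python's built-in `sqlite3` module. The schema (`users`, `posts`, `posts_denorm`) is a made-up example, not something the card specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Normalized: posts reference the author by id only.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    -- Denormalized: the author's name is copied into every post row.
    CREATE TABLE posts_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                               user_name TEXT, body TEXT);
    INSERT INTO users VALUES (1, 'ada');
    INSERT INTO posts VALUES (10, 1, 'hello');
    INSERT INTO posts_denorm VALUES (10, 1, 'ada', 'hello');
""")

# Normalized read: needs a join to get the author's name.
joined = conn.execute(
    "SELECT u.name, p.body FROM posts p JOIN users u ON u.id = p.user_id"
).fetchall()

# Denormalized read: a single-table lookup, no join.
flat = conn.execute("SELECT user_name, body FROM posts_denorm").fetchall()
print(joined, flat)
```

The price of the faster read is that renaming a user now means updating every copied `user_name`, not just one row.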

8
Q

NoSQL

A

A database that typically does not support joins and might structure data differently from a relational database. It is designed to scale better.

9
Q

Database Partitioning (Sharding)

A

Split the data across multiple machines while ensuring you have a way of figuring out which data is on which machine.

10
Q

Vertical Partitioning

A

Partitioning by feature: the tables for each feature live in their own partition. One drawback is that if one of these tables gets very large, you may need to repartition that database (possibly using a different partitioning scheme).

11
Q

Key-Based (or Hash-Based) Partitioning

A

Uses part of the data (for example, an ID) to partition it. A very simple way to do this is to allocate N servers and put the data on server mod(key, N). One issue with this is that the number of servers you have is effectively fixed. Adding additional servers means reallocating all the data, which is a very expensive task.
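
The reallocation cost can be seen in a few lines. This is a sketch under assumed numbers (integer keys 0 to 999, growing from 4 to 5 servers); `server_for` is a hypothetical helper name.

```python
def server_for(key: int, n_servers: int) -> int:
    """Pick a server index with the simple mod(key, N) rule."""
    return key % n_servers


keys = range(1000)

# Count the keys whose server changes when a fifth server is added.
moved = sum(1 for k in keys if server_for(k, 4) != server_for(k, 5))
print(f"{moved} of {len(keys)} keys must move")
```

With these numbers, 800 of the 1000 keys land on a different server after the change, which is why adding capacity under this scheme is so expensive.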

12
Q

Directory-Based Partitioning

A

Maintain a lookup table for where the data can be found. This makes it relatively easy to add additional servers, but it comes with two major drawbacks. First, the lookup table can be a single point of failure. Second, constantly accessing this table impacts performance.
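
A minimal in-memory sketch of the idea; the `ShardDirectory` class, the keys, and the server names are all hypothetical. A real directory would need replication to avoid being the single point of failure mentioned above.

```python
class ShardDirectory:
    """Lookup table mapping each key to the server that holds it."""

    def __init__(self):
        self._location = {}

    def assign(self, key, server):
        """Record (or change) which server holds the key."""
        self._location[key] = server

    def lookup(self, key):
        """Every read pays this extra hop before reaching the data."""
        return self._location[key]


directory = ShardDirectory()
directory.assign("user:1", "db-a")
directory.assign("user:2", "db-b")

# Adding a server only touches the entries we choose to move.
directory.assign("user:2", "db-c")
print(directory.lookup("user:1"), directory.lookup("user:2"))
```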

13
Q

Caching

A

A cache is a simple key-value store that typically sits between your application layer and your data store. The cache is tried first, before data is looked up in the data store. You may cache a query and its results directly or, alternatively, cache the specific object.
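
The "try the cache first" flow might look like the sketch below. The `CachedStore` class is hypothetical, a plain dict stands in for the real data store, and eviction and invalidation are deliberately left out.

```python
class CachedStore:
    """Checks an in-memory cache before falling back to the data store."""

    def __init__(self, data_store):
        self._store = data_store   # stand-in for a database
        self._cache = {}           # key -> cached object
        self.store_reads = 0       # counts trips to the data store

    def get(self, key):
        if key in self._cache:     # cache hit: no data store access
            return self._cache[key]
        self.store_reads += 1      # cache miss: read and remember
        value = self._store[key]
        self._cache[key] = value
        return value


store = CachedStore({"user:1": {"name": "ada"}})
first = store.get("user:1")    # miss, goes to the data store
second = store.get("user:1")   # hit, served from the cache
print(first, second, store.store_reads)
```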

14
Q

Asynchronous Processing & Queues

A

Slow operations should be done asynchronously so the user is not stuck waiting. Pre-processing example: a queue of jobs that update part of the website. The website may be slightly out of date, but we won’t force the user to wait. If the user must wait, we notify the user when the work is done and let the process run asynchronously with respect to the website/app.
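
One way to sketch this with the standard library is a `queue.Queue` drained by a background thread. The job names and the `handle_request` function are hypothetical; a production system would more likely use a separate message broker and worker processes.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    """Drains the queue in the background, one job at a time."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel: stop the worker
            jobs.task_done()
            break
        results.append(f"processed {job}")
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_request(job):
    """Enqueue the slow work and answer the user immediately."""
    jobs.put(job)
    return "accepted"


responses = [handle_request(j) for j in ("thumbnail-1", "thumbnail-2")]
jobs.put(None)
jobs.join()                        # wait here only so the demo can print
print(responses, results)
```

The request handler returns before the work is done; only the demo's final `join()` waits for the queue to empty.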

15
Q

Bandwidth

A

The maximum amount of data that can be transferred in a unit of time (bits/second). In the conveyor-belt analogy, it is the maximum number of items that can roll off the belt per second. Increases with a wider or faster belt.

16
Q

Throughput

A

The actual amount of data that is transferred in a unit of time (bits/second). In the conveyor-belt analogy, it is the actual number of items that roll off the belt per second. Increases with a wider or faster belt.

17
Q

Latency

A

How long it takes data to go from one end to the other. In the conveyor-belt analogy, it is the time an item takes to travel the length of the belt. Decreases with a shorter or faster belt.
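
To see how latency and throughput combine, a common back-of-the-envelope model (an assumption here, not something the cards state) is: total time = latency + size / throughput. The function name and the numbers are illustrative only.

```python
def transfer_time_seconds(size_bits: float, throughput_bps: float,
                          latency_s: float) -> float:
    """Simplified model: wait for the first bit, then stream the rest."""
    return latency_s + size_bits / throughput_bps


# Hypothetical link: 1 Mbit/s of actual throughput, 50 ms latency.
small = transfer_time_seconds(1_000, 1_000_000, 0.05)       # latency dominates
large = transfer_time_seconds(8_000_000, 1_000_000, 0.05)   # throughput dominates
print(small, large)
```

For small payloads, a wider belt barely helps because latency dominates; for large payloads, the reverse holds.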

18
Q

MapReduce

A

Typically used to process large amounts of data. Requires a Map step and a Reduce step. Map takes in data and emits <key, value> pairs. Reduce takes a key and a set of associated values and reduces them, emitting a new key and value. Because each step works on independent pieces of data, the work can be processed in parallel.
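
A single-process word count is the usual way to sketch the two steps. The function names are hypothetical, and the explicit `shuffle` step (grouping values by key) is something a real MapReduce framework does for you across machines.

```python
from collections import defaultdict


def map_step(document):
    """Map: take in data and emit (key, value) pairs."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all values that share a key, as a framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: collapse a key's values into one new (key, value)."""
    return key, sum(values)


documents = ["the cat", "the dog"]
pairs = [pair for doc in documents for pair in map_step(doc)]
counts = dict(reduce_step(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Each `map_step` call depends only on its own document and each `reduce_step` call only on its own key, which is what lets a framework run them in parallel.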