System Design and Scalability Flashcards
Handling the System Design and Scalability Questions
Communicate
Go broad first
Use the whiteboard
Acknowledge interviewer concerns
Be careful about assumptions
State your assumptions explicitly
Estimate when necessary
Drive
Design: Step-by-Step
Step 1: Scope the Problem
Step 2: Make Reasonable Assumptions
Step 3: Draw the Major Components
Step 4: Identify the Key Issues
Step 5: Redesign for the Key Issues
Algorithms that Scale: Step-By-Step
Step 1: Ask Questions
Step 2: Make Believe
Step 3: Get Real
Step 4: Solve Problems
Horizontal Scaling
Increase the number of nodes. For example, you might add additional servers, thus decreasing the load on any one server.
Vertical Scaling
Increase the resources of a specific node. For example, you might add additional memory to a server to improve its ability to handle load changes. Typically easier than horizontal scaling, but it’s limited. You can only add so much memory or disk space.
Load Balancer
Distributes a system’s load evenly so that one server doesn’t crash and take down the whole system. You have to build out a network of cloned servers that all have essentially the same code and access to the same data.
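A minimal round-robin sketch of the idea above. The server names are hypothetical; a real load balancer also tracks health checks and connection counts.

```python
import itertools

# Round-robin load balancing: cycle incoming requests across a pool of
# cloned servers so no single server absorbs all the traffic.
servers = ["app-1", "app-2", "app-3"]  # hypothetical cloned servers
pool = itertools.cycle(servers)

def route(request):
    # Each request goes to the next server in the rotation.
    return next(pool)

assert [route(r) for r in range(4)] == ["app-1", "app-2", "app-3", "app-1"]
```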
Database Denormalization
Adding redundant information into a database to speed up reads. Joins in a relational database such as SQL can get very slow as the system grows bigger. For this reason, you would generally avoid them. Denormalization is one part of this.
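A toy illustration of the trade-off, using in-memory dicts to stand in for tables (the field names are made up for the example):

```python
# Normalized: a comment stores only the author's id, so reading the
# author's name requires a join-style second lookup.
users = {1: {"name": "Ada"}}
comments_norm = [{"author_id": 1, "text": "hi"}]
name = users[comments_norm[0]["author_id"]]["name"]

# Denormalized: the name is copied into the comment row. Reads skip the
# join, at the cost of keeping the redundant copy in sync on updates.
comments_denorm = [{"author_id": 1, "author_name": "Ada", "text": "hi"}]
assert comments_denorm[0]["author_name"] == name
```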
NoSQL
A database that does not support joins and might structure data in a different way. It is designed to scale better.
Database Partitioning (Sharding)
Split the data across multiple machines while ensuring you have a way of figuring out which data is on which machine.
Vertical Partitioning
Partitioning by feature. One drawback of this is that if one of these tables gets very large, you may need to repartition that database (possibly using a different partitioning scheme).
Key-Based (or Hash-Based) Partitioning
Uses part of the data (for example, an ID) to partition it. A very simple way to do this is to allocate n servers and put the data on mod(key, n). One issue with this is that the number of servers you have is effectively fixed. Adding additional servers means reallocating all the data - a very expensive task.
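A two-line sketch of mod-based partitioning, showing why adding a server is expensive: the same key maps to a different server once n changes.

```python
# Key-based (hash) partitioning: the server holding a record is fixed
# entirely by key mod n.
def server_for(key: int, n_servers: int) -> int:
    return key % n_servers

# With 4 servers, key 10 lives on server 2.
assert server_for(10, 4) == 2
# Grow to 5 servers and the same key now maps to server 0,
# so the data must be migrated.
assert server_for(10, 5) == 0
```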
Directory-Based Partitioning
Maintain a lookup table for where the data can be found. This makes it relatively easy to add additional servers, but it comes with two major drawbacks. First, the lookup table can be a single point of failure. Second, constantly accessing this table impacts performance.
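A sketch of the lookup table, with hypothetical key and server names. New servers can take new keys without mass migration, but every read consults the directory first (the single point of failure noted above).

```python
# Directory-based partitioning: an explicit table maps each key to the
# server that holds it.
directory: dict[str, str] = {}

def assign(key: str, server: str) -> None:
    # Placing (or moving) a key is just a directory update.
    directory[key] = server

def server_for(key: str) -> str:
    # Every lookup hits the directory before touching any data server.
    return directory[key]

assign("user:42", "db-3")
assert server_for("user:42") == "db-3"
```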
Caching
A simple key-value store that typically sits between your application layer and your data store. The cache is tried first before data is looked up in the data store. You may cache a query and its results directly. Or, alternatively, you can cache the specific object.
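The try-cache-first flow above, sketched with a plain dict; `slow_lookup` is a stand-in for an expensive database query.

```python
cache: dict[str, str] = {}

def slow_lookup(key: str) -> str:
    # Placeholder for an expensive data-store query.
    return key.upper()

def get(key: str) -> str:
    # Try the cache first...
    if key in cache:
        return cache[key]
    # ...fall back to the data store, then populate the cache for next time.
    value = slow_lookup(key)
    cache[key] = value
    return value
```

A real cache would also bound its size and evict stale entries (e.g. LRU with a TTL).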
Asynchronous Processing & Queues
Pre-processing example: a queue of jobs to be done that update a website. The queue may be slightly out of date, but we won’t force the user to wait. If the user must wait, we notify them and let the process run asynchronously with respect to the website/app.
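A minimal in-process sketch of the pattern using the standard library: the caller enqueues jobs and returns immediately, while a worker thread processes them in the background. (A production system would use a durable queue service rather than an in-memory one.)

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
results = []

def worker() -> None:
    # Drain jobs until a None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(job * 2)  # stand-in for real work
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

for i in range(3):
    jobs.put(i)   # the producer returns immediately; work happens later

jobs.join()       # wait for all queued jobs to finish
jobs.put(None)    # stop the worker
t.join()
```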
Bandwidth
The maximum amount of data that can be transferred in a unit of time (bits/second). In the conveyor-belt analogy, it is the maximum number of items that roll off the belt per second; you increase it with a wider or faster belt.
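A back-of-the-envelope example of the kind of estimate this definition supports (link speed and payload size are made up for the arithmetic):

```python
# How long to transfer 1 GB over a 100 Mbit/s link, ignoring overhead?
size_bits = 1 * 8 * 10**9   # 1 GB in bits (decimal units)
bandwidth = 100 * 10**6     # 100 Mbit/s
seconds = size_bits / bandwidth
assert seconds == 80.0      # 8e9 / 1e8 = 80 seconds
```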