System Design - General Flashcards

1
Q

Latency versus Throughput

A

Latency is the time needed to perform some action.
Throughput is the number of actions per unit time.
Generally, you want to maximize throughput for acceptable latency.
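
As a rough illustration, both numbers can be taken from the same run. This is a minimal sketch, assuming a hypothetical `handle_request` function and strictly sequential processing; neither comes from the card.

```python
import time

def handle_request():
    """Hypothetical stand-in for real work (an assumption for this sketch)."""
    time.sleep(0.001)

def measure(n_requests):
    """Return (average latency in seconds, throughput in requests per second)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        handle_request()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return sum(latencies) / len(latencies), n_requests / elapsed

avg_latency, throughput = measure(20)
print(f"latency: {avg_latency * 1000:.2f} ms, throughput: {throughput:.0f} req/s")
```

With sequential processing, throughput is at most 1 / latency. Handling requests concurrently can raise throughput without lowering the latency of any single request.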

2
Q

Performance Requirements

A
  • Scale
  • Latency
  • Consistency
  • Uptime
  • Reliability (data loss scenarios)
  • Fail open / Fail closed
3
Q

Approach to System Design Interviews

A
  1. Clarify requirements
  2. Functional Requirements
  3. Performance Requirements
  4. APIs
  5. Back of the envelope estimations
  6. Data Model
  7. High level design
  8. Individual component design
  9. Monitoring & Observability & Alerting
4
Q

CPU vs Core

A

A core is an independent processing unit inside a CPU. In the past, each CPU had a single core. Now a single CPU can contain multiple cores, each able to execute instructions in parallel.

5
Q

CAP Theorem

A

Consistency, Availability, Partition Tolerance

A distributed system can guarantee at most two of the three at the same time.

6
Q

Strong Consistency

A

Every read receives the most recent write or an error.

Data is the same across the cluster, so you can read from or write to any node and get the same data.

7
Q

Availability

A

Ability to access the cluster even if a node in the cluster goes down

Every request receives a response but the response may not reflect the most recent data

8
Q

Partition Tolerance

A

A partition means two nodes in the cluster are unable to communicate with one another

Partition tolerance means the system continues to operate even if there is a communication failure (a partition) between nodes due to network failures

Since networks aren’t reliable, you need to support partition tolerance. That means the tradeoff becomes between consistency and availability.

9
Q

Consistency and partition tolerance

A

CP system

Waiting for a response from a partitioned node might result in a timeout or an error. Good choice if your business requires atomic reads and writes.

10
Q

Consistency and Availability

A

CA system

Not possible unless you’re ok with losing data once the network failure (the partition) is resolved

Since network failures are inevitable, the real tradeoff is between consistency and availability

11
Q

Availability and partition tolerance

A

AP system

The response returns the most readily available version of the data on any node, which might not reflect the most recent write. Good choice if you need eventual consistency or when the system needs to continue working despite errors.
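
The difference between the CP and AP choices can be sketched with a single replica that is cut off from its peers. This is a deliberately simplified toy, not how any real database is implemented; the `Replica` class, its `mode` flag, and the `partitioned` flag are all hypothetical names.

```python
class Replica:
    """Toy replica that behaves as CP or AP while partitioned (illustrative only)."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = "v1"         # last value this replica has seen
        self.partitioned = False  # True when cut off from its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the most recent write, so refuse to answer.
            raise TimeoutError("cannot reach peers")
        # AP, or a healthy network: answer with the local, possibly stale, copy.
        return self.value
```

During a partition, the AP replica keeps answering with whatever it has, while the CP replica fails the request rather than risk returning stale data.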

12
Q

Weak consistency

A

After a write, reads may or may not see it. A best effort approach is taken.

Examples: VoIP, video chat, realtime multiplayer games. If you lose reception for a few minutes, when you reconnect you don’t hear what you missed.

13
Q

Eventual consistency

A

After a write, reads will eventually see it. Data is replicated asynchronously. Works well in highly available systems.

Examples: DNS, Email

14
Q

Strong consistency

A

After a write, reads will see it. Data is replicated synchronously.

Examples: transaction-based systems

15
Q

High Availability Patterns

A

Fail-over and replication

These are complementary, not mutually exclusive

16
Q

Active-passive failover

A

Also referred to as master-slave failover

If the active server stops sending heartbeats, the passive server takes over.

Only the active server handles traffic
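
The heartbeat check can be sketched as a timeout on the passive side. This is a minimal sketch under stated assumptions: the `PassiveServer` class and the `timeout_s` parameter are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
class PassiveServer:
    """Standby that promotes itself when the active server goes quiet."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # hypothetical heartbeat timeout, in seconds
        self.last_heartbeat = 0.0   # time of the last heartbeat received
        self.is_active = False      # the standby handles no traffic yet

    def on_heartbeat(self, now):
        """Record a heartbeat from the active server."""
        self.last_heartbeat = now

    def check(self, now):
        """Take over if no heartbeat arrived within the timeout."""
        if not self.is_active and now - self.last_heartbeat > self.timeout_s:
            self.is_active = True
        return self.is_active

standby = PassiveServer(timeout_s=3.0)
standby.on_heartbeat(now=10.0)
print(standby.check(now=12.0))  # heartbeat is 2 s old, within the timeout
print(standby.check(now=13.5))  # heartbeat is 3.5 s old, so the standby takes over
```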

17
Q

Active-active failover

A

Both servers are managing traffic with the load spread between them.

Also called master-master failover

18
Q

99.9% uptime (three nines)

A

Just under 9 hours of downtime per year (about 8 h 46 min)

19
Q

99.999% uptime (five nines)

A

Just over 5 minutes of downtime per year (about 5 min 15 s)

20
Q

99.99% uptime (four nines)

A

Just under 1 hour of downtime per year (about 52.6 min)
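
The downtime figures for each number of nines follow from one line of arithmetic. A small sketch, assuming a 365-day year (the function name is made up for illustration):

```python
def downtime_minutes_per_year(uptime_percent):
    """Allowed downtime in minutes per year for a given uptime percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes, ignoring leap years
    return minutes_per_year * (1 - uptime_percent / 100)

for uptime in (99.9, 99.99, 99.999):
    print(f"{uptime}% -> {downtime_minutes_per_year(uptime):.1f} min/year")
```

Each extra nine divides the allowed downtime by ten: roughly 525.6, 52.6, and 5.3 minutes per year.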

21
Q

Spanner

A

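Google Cloud's highly available, globally distributed SQL database

It does not get around the CAP theorem, but it is highly available in practice (five nines): roughly one failure in 10^5 reads or writes.

In the event of a partition, Spanner chooses consistency over availability (CP).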

22
Q

Vertical partitioning

A
  • Some columns are moved to new tables. Each table contains the same number of rows but fewer columns
  • Can be used to isolate rarely used columns
23
Q

Horizontal partitioning

A
  • Also known as sharding
  • Divides a table into multiple smaller tables. Each table is a separate data store and contains the same number of columns but fewer rows
24
Q

Horizontal Sharding Pros & Cons

A

Pros
- Facilitates horizontal scaling. Sharding makes it possible to add more machines to spread out the load.
- Reduces response time. By sharding one table into multiple tables, queries scan fewer rows, and results are returned more quickly.

Cons
- The ORDER BY operation is more complicated. Usually, we need to fetch data from different shards and sort it in the application's code.
- Uneven distribution. Some shards may hold more data or receive more traffic than others (such a shard is called a hotspot).

25
Q

Sharding routing algorithms

A
  • Range-based sharding. This algorithm uses ordered columns, such as integers, longs, timestamps, to separate the rows.
  • Hash-based sharding. This algorithm applies a hash function to one column or several columns to decide which row goes to which table.
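
Both routing schemes can be sketched in a few lines. This is a minimal sketch, not a production router: the `RANGE_BOUNDS` values, the choice of a user id as the shard key, and the function names are assumptions made for illustration.

```python
import bisect
import hashlib

# Hypothetical exclusive upper bounds of each range shard, keyed by user id.
RANGE_BOUNDS = [1_000, 2_000, 3_000]

def range_shard(user_id):
    """Range-based: ids below 1000 go to shard 0, below 2000 to shard 1, and so on."""
    shard = bisect.bisect_right(RANGE_BOUNDS, user_id)
    if shard == len(RANGE_BOUNDS):
        raise ValueError(f"no shard covers id {user_id}")
    return shard

def hash_shard(key, n_shards):
    """Hash-based: a stable hash of the key, modulo the number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

print(range_shard(1_500))          # falls in the second range, shard 1
print(hash_shard("user-1500", 4))  # some shard in 0..3, stable across runs
```

Range-based routing keeps neighbouring keys together, which helps range scans but can create hotspots. Hash-based routing spreads keys more evenly but scatters ranges across shards.
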
26
Q

p99 latency

A

99% of requests should be faster than this number
Only 1% of requests are expected to be slower than this number
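
One common way to compute this number from recorded latencies is the nearest-rank method. The sketch below assumes that method and a made-up list of samples; monitoring tools may interpolate or use approximate histograms instead.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds: 99 fast requests and one slow outlier.
latencies_ms = [10] * 99 + [500]
print(percentile(latencies_ms, 99))   # 10: the outlier sits in the slowest 1%
print(percentile(latencies_ms, 100))  # 500: the maximum
```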

27
Q

WebSocket

A
  • Most common solution for sending asynchronous updates from server to client
  • The connection is initiated by the client. After that, it is bidirectional and persistent.
28
Q

CDN

A

Content Delivery Network
Content Distribution Network
Serves content from locations geographically closer to the user to reduce load times

29
Q

BLOB

A

Binary large object
A collection of binary data stored as a single entity in a database

30
Q

Video encoding / transcoding

A

The process of converting a video from one format to other formats (MPEG, HLS, etc.) to provide the best possible video streams for different devices and bandwidth capabilities.

31
Q

codecs

A

These are compression and decompression algorithms that aim to reduce the video size while preserving the video quality.

32
Q

Bloom Filter

A
  • space-efficient probabilistic data structure
  • used to test whether an element is a member of a set.
  • False positive matches are possible, but false negatives are not – in other words, a query returns either “possibly in set” or “definitely not in set”.
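
The "possibly in set" versus "definitely not in set" behaviour can be seen in a small sketch. The sizes (`n_bits`, `n_hashes`) and the salted SHA-256 hashing are arbitrary choices for illustration, not tuned values.

```python
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive n_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True: an added item is never reported missing
```

Items can be added but not removed, because clearing a bit could also erase evidence of other items that share it.
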
33
Q

Consistent hashing

A

Consistent hashing is a special kind of hashing such that when a hash table is resized, only k/n keys need to be remapped on average, where k is the number of keys and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.
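
The difference can be sketched with a hash ring. This is a minimal sketch under stated assumptions: the `HashRing` class, the node names, the use of 100 virtual nodes per node, and SHA-256 as the hash are all illustrative choices.

```python
import bisect
import hashlib

def _hash(value):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"key-{i}" for i in range(10_000)]
before = HashRing(["a", "b", "c", "d"])
after = HashRing(["a", "b", "c", "d", "e"])

# Keys whose owner changes when node "e" joins the ring.
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
# Keys whose slot changes under plain modulo hashing when going from 4 to 5 slots.
mod_moved = sum(_hash(k) % 4 != _hash(k) % 5 for k in keys)

print(f"ring:   {moved / len(keys):.0%} of keys moved")
print(f"modulo: {mod_moved / len(keys):.0%} of keys moved")
```

On the ring, the only keys that move are the ones taken over by the new node, which should be around a fifth of them here. With modulo hashing, most keys land in a different slot.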