System Design - General Flashcards
Latency versus Throughput
Latency is the time needed to perform some action.
Throughput is the number of actions per unit time.
Generally, you want to maximize throughput for acceptable latency.
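The relationship between the two can be sketched with Little's Law (concurrency = throughput × latency). A minimal illustration, with made-up numbers:

```python
# Little's Law: concurrency = throughput * latency, so at a fixed
# concurrency level, sustainable throughput = concurrency / latency.

def max_throughput(concurrency: int, latency_s: float) -> float:
    """Requests/second sustainable at a given concurrency and per-request latency."""
    return concurrency / latency_s

# 100 concurrent workers, 50 ms per request -> 2000 requests/second
print(max_throughput(100, 0.050))
```

This is why lowering latency (or raising concurrency) raises achievable throughput.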
Performance Requirements
- Scale
- Latency
- Consistency
- Uptime
- Reliability (data loss scenarios)
- Fail open / Fail closed
Approach to System Design Interviews
- Clarify requirements
- Functional Requirements
- Performance Requirements
- APIs
- Back of the envelope estimations
- Data Model
- High level design
- Individual component design
- Monitoring & Observability & Alerting
CPU vs Core
A core is a processing unit within a CPU. In the past, each CPU had a single core; a modern CPU typically contains multiple cores.
CAP Theorem
Consistency, Availability, Partition Tolerance
You can only have two
Strong Consistency
Every read receives the most recent write or an error
Data is the same across the cluster so you can read and write to/from any node and get the same data
Availability
Ability to access the cluster even if a node in the cluster goes down
Every request receives a response but the response may not reflect the most recent data
Partition Tolerance
A partition means two nodes in the cluster are unable to communicate with one another
Partition tolerance means the system continues to operate even if there is a communication failure (a partition) between nodes due to network failures
Since networks aren’t reliable, you need to support partition tolerance. That means the tradeoff becomes between consistency and availability.
Consistency and partition tolerance
CP system
Waiting for a response from a partitioned node might result in a timeout or an error. Good choice if your business requires atomic reads and writes.
Consistency and Availability
CA system
Not a realistic option for a distributed system: forfeiting partition tolerance means accepting data loss when a partition inevitably occurs. Since network failures are inevitable, the real tradeoff is between consistency and availability.
Availability and partition tolerance
The response returns the most readily available version of the data on any node (which might not reflect the most recent write). Good choice if you need eventual consistency or when the system needs to continue working despite errors.
Weak consistency
After a write, reads may or may not see it. A best effort approach is taken.
Examples: VoIP, video chat, realtime multiplayer games. If you lose reception for a few minutes, when you reconnect you don’t hear what you missed.
Eventual consistency
After a write, reads will eventually see it. Data is replicated async. Works well in highly available systems.
Examples: DNS, Email
Strong consistency
After a write, reads will see it. Data is replicated synchronously.
Examples: transaction based systems
High Availability Patterns
Fail-over and replication
These are complementary, not mutually exclusive
Active-passive failover
Also referred to as master-slave
If the active server stops sending heartbeats, the passive server takes over.
Only the active server handles traffic
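A minimal sketch of the heartbeat check described above (interval, threshold, and class names are illustrative, not from the source):

```python
import time

HEARTBEAT_INTERVAL_S = 1.0
MISSED_BEATS_THRESHOLD = 3  # promote after ~3 missed heartbeats

class PassiveServer:
    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.active = False  # only the active server handles traffic

    def on_heartbeat(self):
        # Called whenever a heartbeat arrives from the active server
        self.last_heartbeat = time.monotonic()

    def check(self, now=None):
        # Promote to active if heartbeats have stopped for too long
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > HEARTBEAT_INTERVAL_S * MISSED_BEATS_THRESHOLD:
            self.active = True
        return self.active
```

Real failover systems also need fencing to prevent split brain (both servers believing they are active).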
Active-active failover
Both servers are managing traffic with the load spread between them.
Also called master-master failover
99.9% uptime (three nines)
Just under 9 hours downtime per year (8h 46min)
99.99% uptime (four nines)
Just under 1 hour per year (~53 min)
99.999% uptime (five nines)
About 5 minutes per year
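The downtime budgets above follow directly from the uptime percentage:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(uptime_pct: float) -> float:
    """Allowed downtime per year for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100)

for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
# three nines -> ~525.6 min (~8h 46m), four nines -> ~52.6 min, five nines -> ~5.3 min
```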
Spanner
GCP high availability global SQL database
Technically a CP system, but so highly available (five nines; fewer than one failure in 10^5 reads or writes) that in practice it behaves like a CA system.
In the event of a partition, Spanner chooses consistency over availability (CP).
Vertical partitioning
- Some columns are moved to new tables. Each table contains the same number of rows but fewer columns
- Can be used to isolate rarely used columns
Horizontal partitioning
- Also known as sharding
- it divides a table into multiple smaller tables. Each table is a separate data store, and it contains the same number of columns, but fewer rows
Horizontal Sharding Pros & Cons
Pros
- Facilitate horizontal scaling. Sharding facilitates the possibility of adding more machines to spread out the load.
- Reduces response time. By sharding one table into multiple tables, queries go over fewer rows, and results are returned much more quickly.
Cons
- The order by operation is more complicated. Usually, we need to fetch data from different shards and sort the data in the application’s code.
- Uneven distribution. Some shards may contain far more data or traffic than others; these are called hotspots.
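Both points can be sketched briefly (shard count and data below are made up for the example): hashing the shard key spreads rows evenly to reduce hotspots, and an ORDER BY across shards requires merging locally-sorted results in application code.

```python
import heapq
from hashlib import sha256

NUM_SHARDS = 4

def shard_for(user_id: str) -> int:
    # Hash the shard key so rows spread evenly across shards
    return int(sha256(user_id.encode()).hexdigest(), 16) % NUM_SHARDS

def merged_order_by(per_shard_results):
    # Each shard returns results sorted locally; merge them in the application
    return list(heapq.merge(*per_shard_results))

print(merged_order_by([[1, 4], [2, 3], [5]]))  # [1, 2, 3, 4, 5]
```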