Systems Design 1 Flashcards by Zachary Birenbaum

Interview Steps

Clarify Requirements
Back of Envelope Estimation
Define Data Model
High Level Design Drawing
Identify and Resolve what remains

CAP Theorem

Consistency
Availability
Partition Tolerance

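A distributed system can guarantee only 2 of these 3 properties at the same time. Since network partitions cannot be ruled out in practice, the real choice during a partition is between Consistency and Availability.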

CP System

Consistency and Partition Tolerance

Data stays consistent across all nodes. When a network partition occurs, the system prevents data desync by making the affected nodes unavailable rather than letting them serve stale or conflicting data.

Sacrifices Availability: the system might not respond during network issues, so that the data it does return stays accurate.
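
A minimal sketch of the CP trade-off, assuming a majority quorum is how the node decides whether it may accept a write. The quorum rule, the `CPNode` class, and the `reachable_peers` field are illustrative assumptions, not something the cards specify.

```python
class CPNode:
    """Toy replica that prefers consistency over availability (hypothetical)."""

    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        # Number of other nodes this node can currently contact.
        self.reachable_peers = cluster_size - 1
        self.data = {}

    def has_quorum(self):
        # This node plus its reachable peers must form a strict majority.
        return (self.reachable_peers + 1) > self.cluster_size // 2

    def write(self, key, value):
        # Refuse the write (become unavailable) instead of risking divergence.
        if not self.has_quorum():
            raise RuntimeError("unavailable: cannot reach a majority of nodes")
        self.data[key] = value
        return "ok"


node = CPNode(cluster_size=3)
healthy_result = node.write("balance", 100)

node.reachable_peers = 0  # simulate a partition that isolates this node
try:
    node.write("balance", 50)
    partition_result = "ok"
except RuntimeError:
    partition_result = "rejected"
```

During the simulated partition the write is rejected and the stored balance is left unchanged, which is the availability cost described above.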

Banking systems use CP databases because ensuring accurate account balances is more critical than being always available.

Newer systems tend to focus more on availability than consistency

Consistency

Data is the same across the cluster, so you can read from or write to any node and get the same data.

Partition Tolerance

The database continues to work even if there is a network failure or a part of the system is unreachable

Partition

A section of a database that contains its own data and indexes. Partitioning splits a large database into smaller parts.

Why Partition?

Scaling - easier to scale since the data is broken into smaller, more manageable parts

Performance - Queries can run faster since there is less data to scan

Availability - If one partition fails, only a fraction of the data becomes unavailable

What databases can use partitioning

SQL databases (MySQL and PostgreSQL)
NoSQL databases (MongoDB and Cassandra)
S3 and Redis

Vertical Partitioning

Multiple Tables - split data across tables with different columns which share a key such as EmployeeID
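
As a rough illustration of vertical partitioning, the sketch below splits employee data into two tables that hold different columns and share `EmployeeID`. The table names, columns, and sample row are hypothetical, and Python's built-in `sqlite3` is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Frequently read columns live in a narrow table.
conn.execute("CREATE TABLE employee_core (EmployeeID INTEGER PRIMARY KEY, name TEXT)")
# Rarely read, bulky columns live in a second table with the same key.
conn.execute("CREATE TABLE employee_profile (EmployeeID INTEGER PRIMARY KEY, bio TEXT)")

conn.execute("INSERT INTO employee_core VALUES (1, 'Ada')")
conn.execute("INSERT INTO employee_profile VALUES (1, 'Long biography text')")

# A common query touches only the narrow table.
name = conn.execute(
    "SELECT name FROM employee_core WHERE EmployeeID = ?", (1,)
).fetchone()[0]

# Joining on the shared key reassembles the full record when needed.
full_row = conn.execute(
    "SELECT c.EmployeeID, c.name, p.bio "
    "FROM employee_core c JOIN employee_profile p ON p.EmployeeID = c.EmployeeID "
    "WHERE c.EmployeeID = ?",
    (1,),
).fetchone()
```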

Horizontal Partitioning

Data is separated by a key such as a region identifier. Each data store shares the same columns and data structure, but can be split across multiple servers. Each partition can also be backed up and restored independently
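
A small sketch of the idea, assuming a `region` field is the partition key. In-memory lists stand in for separate data stores, and the order records are made up for illustration.

```python
from collections import defaultdict

# Each partition holds rows with identical columns; the region key picks the partition.
partitions = defaultdict(list)


def insert_order(order):
    # Route the row to the partition that owns its region.
    partitions[order["region"]].append(order)


def orders_for_region(region):
    # Only the matching partition is scanned, not the whole data set.
    return partitions.get(region, [])


insert_order({"order_id": 1, "region": "eu", "total": 20})
insert_order({"order_id": 2, "region": "us", "total": 35})
insert_order({"order_id": 3, "region": "eu", "total": 15})

eu_orders = orders_for_region("eu")
```

Because every partition is a separate container, each one could be copied or restored on its own, which mirrors the independent backup point above.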

When to partition

Historical Data - Archive data older than a certain time as read only

Table is greater than 2 GB in size (a common rule of thumb, not a hard limit)

When the contents need to be spread across different types of storage devices

AP System

Availability and Partition Tolerance

Ensures every request (read or write) gets a response even if some parts of the system are down

Sacrifices consistency, so when data is updated on one node it may take a short amount of time before queries to other nodes reflect the change
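
The delay described above can be sketched with two replicas and a replication step that runs later. The `Replica` and `APCluster` classes and the explicit `replicate()` call are hypothetical stand-ins for whatever background replication a real AP store uses.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        # Always answers, even if the value is stale or missing.
        return self.data.get(key)


class APCluster:
    def __init__(self):
        self.a = Replica()
        self.b = Replica()
        self.pending = []  # updates accepted by replica a, not yet copied to b

    def write_to_a(self, key, value):
        # The write is acknowledged immediately, before replica b sees it.
        self.a.data[key] = value
        self.pending.append((key, value))
        return "ok"

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        for key, value in self.pending:
            self.b.data[key] = value
        self.pending.clear()


cluster = APCluster()
cluster.write_to_a("status", "shipped")
stale_read = cluster.b.read("status")   # replica b has not caught up yet
cluster.replicate()
fresh_read = cluster.b.read("status")   # now both replicas agree
```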

CA Databases

Data is consistent between all nodes, as long as all nodes are online, and you can read/write from any node and be sure that the data is the same. But if a partition ever develops between nodes, the data will be out of sync (and won’t re-sync once the partition is resolved).

These pretty much don’t exist as distributed databases, because network partitions can always happen (never give this as an answer in an interview)

When to use Relational Database

Data is structured and you need to handle complex relationships

When to use Non-Relational Databases

Data is unstructured or semi-structured

What is Sharding

Horizontal scaling - every shard has the same schema, but the data is split across separate nodes.
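
One possible way to decide which node owns a record is to hash its key, sketched below. Hash-based routing, the shard count of 4, and the `shard_for`, `put`, and `get` helpers are assumptions for illustration; range or region-based routing are equally valid choices.

```python
import hashlib

NUM_SHARDS = 4
# Each dict stands in for a separate node with the same schema.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key):
    # A stable hash keeps a key on the same shard across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    # Only the owning shard is consulted.
    return shards[shard_for(key)].get(key)


for user_id in ["user-1", "user-2", "user-3", "user-4", "user-5"]:
    put(user_id, {"id": user_id})
```

A plain modulo scheme like this moves most keys when `NUM_SHARDS` changes, which is one reason real systems often use other routing schemes.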

Write Behind

Syncs data asynchronously

Data between the cache (Redis or Memcached) and the DB (PostgreSQL) may be temporarily out of sync
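
A minimal write-behind sketch, assuming plain dicts stand in for both the cache and the database and an explicit `flush()` call stands in for the background sync. The `WriteBehindCache` class and the sample key are hypothetical.

```python
from collections import deque


class WriteBehindCache:
    def __init__(self, db):
        self.cache = {}
        self.db = db          # plain dict standing in for the database
        self.dirty = deque()  # keys written to the cache but not yet flushed

    def write(self, key, value):
        # Returns right away; the database write is deferred.
        self.cache[key] = value
        self.dirty.append(key)

    def flush(self):
        # Stand-in for the asynchronous sync to the database.
        while self.dirty:
            key = self.dirty.popleft()
            self.db[key] = self.cache[key]


db = {}
cache = WriteBehindCache(db)
cache.write("event:1", "clicked")
in_db_before_flush = "event:1" in db  # cache and DB are temporarily out of sync
cache.flush()
in_db_after_flush = "event:1" in db
```

If the process crashed before `flush()` ran, the queued write would be lost, which is the risk that comes with the lower write latency.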

Write Through

Syncs data synchronously

Data in the cache and the DB is always in sync. When an update is made to the cache, it is immediately written to the DB as well.
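
A minimal write-through sketch, again assuming plain dicts stand in for the cache and the database. The `WriteThroughCache` class is hypothetical, and writing to the database before the cache is one possible ordering, chosen here so the cache never holds a value the database lacks.

```python
class WriteThroughCache:
    def __init__(self, db):
        self.cache = {}
        self.db = db  # plain dict standing in for the database

    def write(self, key, value):
        # Both stores are updated before the write returns.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to the database and remember the result.
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value


db = {}
cache = WriteThroughCache(db)
cache.write("balance:42", 100)
db_value = db["balance:42"]
cached_value = cache.read("balance:42")
```

The caller waits for both writes, so each update is slower than with write-behind, but a read from either store returns the same value.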

When to use write behind

When you have a write-heavy workload (e.g. many cache updates). The user does not have to wait for changes to be made to the DB. (This is likely relevant for WandB.)

When to use write through

Use write-through when data consistency is critical, e.g. banking.

Read Heavy Workload Examples

Content delivery platforms (blogs and streaming sites)

Search engines or dashboards with analytics

Write-Heavy Workload Examples

Event logging systems
IoT platforms or real-time monitoring systems

How to Improve Read Latency
