Systems Design 1 Flashcards

1
Q

Interview Steps

A

Clarify Requirements
Back of Envelope Estimation
Define Data Model
High Level Design Drawing
Identify and Resolve what remains
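
The estimation step can be sketched as simple arithmetic. Every input below is a made-up assumption chosen for illustration, not a figure from any real system.

```python
# Back-of-envelope estimate. All inputs are hypothetical assumptions.
DAILY_ACTIVE_USERS = 10_000_000
WRITES_PER_USER_PER_DAY = 2
BYTES_PER_WRITE = 1_000
SECONDS_PER_DAY = 86_400

writes_per_day = DAILY_ACTIVE_USERS * WRITES_PER_USER_PER_DAY
avg_write_qps = writes_per_day / SECONDS_PER_DAY
storage_per_day_gb = writes_per_day * BYTES_PER_WRITE / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1_000

print(f"average write QPS: {avg_write_qps:.0f}")
print(f"storage per day:   {storage_per_day_gb:.0f} GB")
print(f"storage per year:  {storage_per_year_tb:.1f} TB")
```

Rounded orders of magnitude are usually enough here; the goal is to see whether the design needs one machine or many.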

2
Q

CAP Theorem

A

Consistency
Availability
Partition Tolerance

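A distributed system can guarantee at most 2 of these 3 properties at the same time. Because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability.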

3
Q

CP System

A

Consistency and Partition Tolerance

Data stays consistent across all nodes. When a network partition occurs, the system keeps the data from diverging by refusing requests on nodes that cannot confirm they hold the latest data.

Sacrifices availability: the system might not respond during network issues in order to preserve data accuracy. When a partition occurs, it may make a node unavailable to keep data consistent across nodes.

Banking systems use CP databases because ensuring accurate account balances is more critical than being always available.

Many newer large-scale systems tend to favor availability over strict consistency.
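
A minimal sketch of the CP trade-off, assuming a toy in-memory register that requires a majority of replicas for every read and write. The names `CPRegister` and `Unavailable` are hypothetical, and real databases use far more involved protocols.

```python
class Unavailable(Exception):
    """Raised when a majority of replicas cannot be reached."""


class CPRegister:
    def __init__(self, replica_count=3):
        # Each replica holds a (version, value) pair.
        self.replicas = [(0, None)] * replica_count
        self.reachable = [True] * replica_count

    def _live_quorum(self):
        live = [i for i, up in enumerate(self.reachable) if up]
        if len(live) < len(self.replicas) // 2 + 1:
            raise Unavailable("majority unreachable; refusing request")
        return live

    def write(self, value):
        live = self._live_quorum()
        version = max(self.replicas[i][0] for i in live) + 1
        for i in live:
            self.replicas[i] = (version, value)

    def read(self):
        live = self._live_quorum()
        # Any two majorities overlap, so the highest version is the latest write.
        return max((self.replicas[i] for i in live), key=lambda r: r[0])[1]


reg = CPRegister(replica_count=3)
reg.write("balance=100")
reg.reachable[2] = False  # one replica cut off: a majority is still reachable
reg.write("balance=80")
latest = reg.read()

reg.reachable[1] = False  # two of three cut off: no majority
try:
    reg.write("balance=50")
    write_refused = False
except Unavailable:
    write_refused = True
```

The register stays correct by giving up availability: once a majority is unreachable, it raises instead of accepting a write it could not replicate safely.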

4
Q

Consistency

A

Data is the same across the cluster, so you can read from or write to any node and get the same data.

5
Q

Partition Tolerance

A

The database continues to work even if there is a network failure or a part of the system is unreachable

6
Q

Partition

A

A section of a database that contains its own data and indexes. Splits a large database into smaller parts

7
Q

Why Partition?

A

Scaling - easier to scale since the data is broken into smaller, more manageable parts

Performance - Queries can run faster since there is less data to scan

Availability - If one partition fails, only a fraction of the data becomes unavailable

8
Q

What databases can use partitioning

A

SQL databases (MySQL and PostgreSQL)
NoSQL databases (MongoDB and Cassandra)
S3 and Redis

9
Q

Vertical Partitioning

A

Multiple Tables - split data across tables with different columns which share a key such as EmployeeID
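
A small sketch of vertical partitioning using Python's built-in `sqlite3` module. The table and column names (`employee_core`, `employee_profile`, `bio`, `photo`) are hypothetical; the idea is that frequently read columns live in one table and bulky, rarely read columns in another, joined by the shared key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_core (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE employee_profile (
        employee_id INTEGER PRIMARY KEY REFERENCES employee_core(employee_id),
        bio         TEXT,
        photo       BLOB
    );
    """
)
conn.execute("INSERT INTO employee_core VALUES (1, 'Ada')")
conn.execute("INSERT INTO employee_profile VALUES (1, 'Long biography text', NULL)")

# Common lookups touch only the narrow table.
name = conn.execute(
    "SELECT name FROM employee_core WHERE employee_id = 1"
).fetchone()[0]

# The full record is reassembled by joining on the shared key.
full_row = conn.execute(
    """
    SELECT c.name, p.bio
    FROM employee_core AS c
    JOIN employee_profile AS p ON c.employee_id = p.employee_id
    WHERE c.employee_id = 1
    """
).fetchone()
```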

10
Q

Horizontal Partitioning

A

Data is separated by a key such as a region identifier. Each data store shares the same columns and data structure, but can be split across multiple servers. Each partition can also be backed up and restored independently
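
A minimal sketch of horizontal partitioning by a region key, with plain dictionaries standing in for separate servers. The class `RegionPartitionedStore` and its methods are hypothetical names used only for illustration.

```python
class RegionPartitionedStore:
    def __init__(self, regions):
        # One independent store per region; every store holds rows of the same shape.
        self.partitions = {region: {} for region in regions}

    def put(self, region, order_id, row):
        # Raises KeyError for a region that has no partition.
        self.partitions[region][order_id] = row

    def get(self, region, order_id):
        return self.partitions[region].get(order_id)

    def backup(self, region):
        # Each partition can be copied without touching the others.
        return dict(self.partitions[region])


store = RegionPartitionedStore(["eu", "us"])
store.put("eu", 1, {"item": "book", "qty": 2})
store.put("us", 2, {"item": "lamp", "qty": 1})
eu_backup = store.backup("eu")
```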

11
Q

When to partition

A

Historical Data - Archive data older than a certain time as read only

Table is larger than about 2 GB (a common rule of thumb for when to consider partitioning)

When a table's contents need to be distributed across different types of storage devices

12
Q

AP System

A

Availability and Partition Tolerance

Ensures every request (read or write) gets a response even if some parts of the system are down

Sacrifices consistency, so when data is updated on one node it may take a short amount of time before queries to other nodes reflect the change
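
A toy sketch of the AP behavior described above: each replica answers every request from its local state and reconciles later. It assumes last-write-wins by timestamp, which is only one of several possible reconciliation strategies; `APReplica` is a hypothetical name.

```python
class APReplica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Always accepted locally, even if other replicas are unreachable.
        self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        # Background sync: keep whichever entry has the newer timestamp.
        for key, entry in other.data.items():
            if key not in self.data or entry[0] > self.data[key][0]:
                self.data[key] = entry


a, b = APReplica(), APReplica()
a.write("cart", ["book"], timestamp=1)
stale_read = b.read("cart")   # b has not seen the write yet
b.merge_from(a)
fresh_read = b.read("cart")   # after sync, both replicas agree
```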

13
Q

CA Databases

A

Data is consistent between all nodes, as long as all nodes are online, and you can read from or write to any node and be sure that the data is the same. If a partition ever develops between nodes, however, the data will be out of sync (and won’t re-sync once the partition is resolved).

These pretty much don’t exist as distributed systems, since network partitions cannot be avoided (never give this as an answer in an interview)

14
Q

When to use Relational Database

A

Data is structured and you need to handle complex relationships

15
Q

When to use Non-Relational Databases

A

Data is unstructured or semi-structured

16
Q

What is Sharding

A

Horizontal scaling through horizontal partitioning - every node uses the same schema, but the data is split across nodes.
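
One common way to decide which node holds a row is to hash its key. The sketch below assumes simple modulo hashing with a hypothetical helper `shard_for`; production systems often prefer consistent hashing so that changing the shard count moves less data.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Dictionaries stand in for separate database nodes with the same schema.
shards = [{} for _ in range(4)]


def put(key, row):
    shards[shard_for(key, len(shards))][key] = row


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


put("user:42", {"name": "Ada"})
found = get("user:42")
```

`hashlib` is used instead of the built-in `hash()` because string hashes from `hash()` are randomized per process by default, which would route the same key to different shards after a restart.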

17
Q

Write Behind

A

Syncs data asynchronously

Data between the cache (Redis or Memcached) and the DB (PostgreSQL) may be temporarily out of sync
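
A minimal write-behind sketch, with dictionaries standing in for both the cache and the DB. `WriteBehindCache` is a hypothetical name, and `flush()` is called by hand here; a real system would typically flush from a timer or background worker.

```python
class WriteBehindCache:
    def __init__(self, db):
        self.db = db        # dict standing in for the database
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet to the DB

    def put(self, key, value):
        # Returns as soon as the cache is updated; the DB write is deferred.
        self.cache[key] = value
        self.dirty.add(key)

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.db.get(key)

    def flush(self):
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()


db = {}
cache = WriteBehindCache(db)
cache.put("run:1", "loss=0.3")
in_db_before_flush = "run:1" in db  # cache and DB are temporarily out of sync
cache.flush()
in_db_after_flush = "run:1" in db
```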

18
Q

Write Through

A

Syncs data synchronously

Data in cache and DB are always in sync. When an update is performed to the cache it is immediately updated in the DB
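
A minimal write-through sketch, again with dictionaries standing in for the cache and the DB. `WriteThroughCache` is a hypothetical name. This sketch writes the DB first so that a failed DB write leaves the cache unchanged; that ordering is an assumption of this example.

```python
class WriteThroughCache:
    def __init__(self, db):
        self.db = db      # dict standing in for the database
        self.cache = {}

    def put(self, key, value):
        # The caller waits for both writes, so the two stores never diverge.
        self.db[key] = value
        self.cache[key] = value

    def get(self, key):
        if key not in self.cache and key in self.db:
            self.cache[key] = self.db[key]  # fill the cache on a miss
        return self.cache.get(key)


wt_db = {}
wt_cache = WriteThroughCache(wt_db)
wt_cache.put("account:7", 100)
```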

19
Q

When to use write behind

A

When you have a write-heavy workload (e.g. many cache updates). The user does not have to wait for changes to be made to the DB. (This is likely relevant for WandB)

20
Q

When to use write through

A

Use write-through when data consistency is critical. E.g. banking

21
Q

Read Heavy Workload Examples

A

Content delivery platforms (blogs and streaming sites)

Search engines or dashboards with analytics

22
Q

Write-Heavy Workload Examples

A

Event logging systems
IoT platforms or real-time monitoring systems

23
Q

How to Improve Read Latency

A

Use caching layers such as Redis or Memcached to minimize latency

Optimize query patterns and DB indexes
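
The effect of an index on a read query can be checked with SQLite's `EXPLAIN QUERY PLAN`, shown here through Python's built-in `sqlite3` module. The table `events` and index `idx_events_user` are hypothetical names, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)

QUERY = "SELECT payload FROM events WHERE user_id = ?"


def plan(conn):
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The human-readable description is the last column of each row.
    return " ".join(row[-1] for row in rows)


plan_before = plan(conn)  # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(conn)   # the plan now names the index

print(plan_before)
print(plan_after)
```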

24
Q

How to improve Write Latency

A

Use batch writes or asynchronous writes to handle high loads

Avoid heavy constraints or triggers that can slow down writes
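
Batching can be sketched as buffering rows in memory and writing them in one transaction. The example uses Python's built-in `sqlite3` module; the table `readings`, the helpers `record` and `flush`, and the batch size of 100 are all assumptions for illustration.

```python
import sqlite3

BATCH_SIZE = 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id INTEGER, value REAL)")
buffer = []


def flush(conn, buffer):
    if not buffer:
        return
    with conn:  # one transaction (one commit) for the whole batch
        conn.executemany(
            "INSERT INTO readings (device_id, value) VALUES (?, ?)", buffer
        )
    buffer.clear()


def record(conn, buffer, device_id, value):
    buffer.append((device_id, value))
    if len(buffer) >= BATCH_SIZE:
        flush(conn, buffer)


for i in range(250):
    record(conn, buffer, i % 5, float(i))
flush(conn, buffer)  # write the final partial batch

row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

The trade-off is durability: rows still sitting in the buffer are lost if the process crashes before the next flush.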

25
Q

What usually takes precedence for Read Heavy Systems

A

Consistency is often critical for tasks such as analytics and financial data.

Use relational databases or strongly consistent NoSQL options

26
Q

Write-Heavy Systems

A

Availability often takes precedence, especially when event logging or monitoring. Use eventually consistent databases like Cassandra or DynamoDB

27
Q

Availability

A

Ability to access the cluster even if a node in the cluster goes down.

28
Q

Partition Tolerance

A

The cluster continues to function even if there is a “partition” (communication break) between two nodes (both nodes are up, but can’t communicate).