System Design Flashcards

1
Q

7 steps of systems design problems

A
  1. Requirements clarification
  2. Back-of-the-envelope estimation
  3. System interface definition
  4. Defining data model
  5. High-level design
  6. Detailed design
  7. Identifying and resolving bottlenecks
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Step 1: Requirements Clarification

A
  • Determine exact scope
  • Define end goals of system
  • Clarify which parts of system to focus on (e.g. back-end vs. front-end)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Step 2: Back-of-the-envelope estimation

A
  • Estimate and quantify scale of system
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Step 3: System interface definition

A

Define what APIs are expected from the system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Step 4: Define data model

A
  • How will data flow between components?
  • How will different entities interact with each other?
  • How will we partition and manage data? Specific choices:
    • Which database system should we use? NoSQL vs SQL?
    • What kind of block storage should we use to store files? (e.g. multimedia)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Step 5: High-level design

A
  • Draw a block diagram representing core components to solve the actual problem from end-to-end.
  • Possibly describe the system verbally or type out in some kind of list format
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Step 6: Detailed design

A
  • Dig deeper in to 2-3 major components, guided by interviewer feedback.
  • Consider tradeoffs between different approaches.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Step 7: Identifying and resolving bottlenecks

A
  • Identify any single points of failure and discuss mitigation
  • Discuss redundancy and backup plans for data and services
  • Discuss performance monitoring
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

key characteristics of distributed systems

A
  • Scalability
  • Reliability
  • Availability
  • Efficiency
  • Serviceability or Manageability
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Scalability

A

capability of a system, process, or network to grow and manage increased demand

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Horizontal vs vertical scaling

A
  • Horizontal is easier to scale dynamically by adding machines
  • Vertical scaling is upper limited and may involve downtime
  • Horizontal scaling examples: Cassandra, MongoDB
  • Vertical scaling examples: MySQL
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Reliability

A
  • Probably a system will fail in a given period
  • Distributed system: keeps delivering services with one or several component failures
  • Availability over time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Availability

A
  • Time a system remains operational over a specific period

* Accounts for maintainability and repair

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Efficiency

A
  • Latency / response time to requests (correlates to number of messages)
  • throughput / bandwidth (correlates to size of messages)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Serviceability / manageability

A
  • Ease to operate and maintain
  • Simplicity and speed of repair (as it increases, availability/reliability decrease)
  • Considerations: ease of diagnostics, ease of updates
  • Automated fault detection
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Load balancer

A
  • component to spread traffic across a cluster of servers
  • improves responsiveness and availability
  • crucial to horizontal scaling
  • Ensure health of chosen server
  • Select a healthy server
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Load balancing placements

A
  • Between client and web server
  • between web servers and internal layer (app servers)
  • between internal layer and database
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Load balancer: least connection method

A
  • directs traffic to server with fewest connections

* good for large number of persistent connections

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Load balancer: least response time method

A
  • directs traffic to the server with the lowest response time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Load balancer: least bandwidth method

A
  • selects the server that is currently serving the least amount of traffic
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Load balancer: round robin method

A
  • cycles through available servers and sends each request to the next server
  • good for servers of equal specs and few persistent requests
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Load balancer: weighted round robin method

A
  • round robin but with weights on different servers based upon processing capacity
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Load balancer: IP hash method

A
  • client IP address is hashed and servers are each assigned blocks of hashes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Load balancer redundancy

A
  • can be a single point of failure
  • can add more LBs to form a cluster of active/passive instances
  • clustered LBs monitor each other and passive takes over if active fails
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Caching

A
  • locality of reference principle: recently requested data is likely to be requested again
  • often implemented near front end to reduce downstream traffic
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Application server cache

A
  • cache placed directly on request layer, check if each request is in the cache before fetching from disk
  • can have multiple layers, e.g. node memory -> node disk (still faster than network)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Content Delivery Network (CDN)

A
  • cache for sites serving large amounts of static media
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

Cache invalidation

A
  • maintenance to keep cache consistent with database when data changes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Write-through cache

A
  • Data is written into the cache and DB simultaneously
  • Maximizes consistency, minimizes risk of loss
  • Higher latency as a result of double write operations
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

Write-around cache

A
  • Data is written directly to permanent storage, bypassing the cache
  • avoids flooding the cache with writes
  • increases chance of cache misses, which must then be read from back end with higher latency
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Write-back cache

A
  • Data is written to cache alone
  • Write to permanent storage is done in intervals or chunks
  • Lowest latency, highest throughput for write-intensive apps
  • Risk of data loss due to crash since there is no cache backup
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

Cache eviction policies (6)

A
  • FIFO - evicts oldest block first
  • LIFO - evicts newest block first
  • LRU - evicts least recently used items first
  • MRU - evicts most recently used items first
  • LFU - evicts least frequently used items first
  • RR - random replacement
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Importance of estimation

A
  • important later for scaling, partitioning, load balancing, and caching
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

Examples of system parameters to estimate

A
  • Examples of things to quantify: number of actions, amount of storage, expected network bandwidth usage
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

Components to include in high level design

A

*Clients, load balancing, application servers, databases, file storage

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Examples of detailed design topics

A
  • How will we partition data types between multiple databases?
  • How should we optimize data storage further (e.g. recency)?
  • How much and at what layer should we implement caching?
  • Which components need load balancing?
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

Horizontal scaling

A

Horizontal add more servers in to your resource pool

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

Vertical scaling

A

Adding more power/capacity to an existing server

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

Cache misses

A

when request data is not found in the cache

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

Cache invalidation: 3 main schemas

A
  • write-through
  • write-around
  • write-back
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

CDN for smaller systems

A
  • for smaller systems, we can design for future transition to a CDN with a separate subdomain for static media
42
Q

Data partitioning

A

technique to break up a big database in to many smaller parts across multiple machines

43
Q

Benefits of data partitioning

A

Improves the manageability, performance, availability, and load balancing of an application

44
Q

Justification for data partiioning

A

After a certain scale point, it is cheaper and easier to scale horizontally by adding machines than it is to grow vertically

45
Q

Popular data partitioning methods/schemes

A
  • Horizontal partitioning
  • Vertical partitioning
  • Directory-based partitioning
46
Q

Horizontal partitioning (range-based partitioning) (data sharding)

A

Putting different rows in to different tables based upon range of a certain value

47
Q

Main risk with horizontal partitioning

A

If the value chosen for partitioning isn’t evenly distributed, then the scheme will lead to unbalanced servers

48
Q

Vertical partitioning

A
  • divide data to store tables related to a specific feature in their own server
  • different types of data in different servers
49
Q

Main risk with vertical partitioning

A

If the app grows, we may need to further/horizontally partition a feature-specific database

50
Q

Directory Based Partitioning

A
  • loosely coupled approach
  • create a lookup service which knows your current partitioning scheme
  • separates partitioning from the DB access code
  • functionality: query the directory server which holds the mapping between key and DB server
51
Q

Directory based partitioning benefits

A
  • because it is loosely coupled, we can add servers to the DB pool or change the partitioning scheme without impacting the application
52
Q

Key or hash-based partitioning

A
  • apply a hash to some attributes of the entity we are storing, yielding a partition number
  • problem: effectively fixes the total number of DB servers - workaround is to use consistent hashing
53
Q

List partitioning

A
  • each partition is assigned a list of values

* check each record against the list and store it in the relevant partition

54
Q

Round-robin partitioning

A
  • ensures uniform data distribution by rotating data assignment between partitions
55
Q

Composite partitioning

A
  • combination of any partitioning schemes to devise a new scheme
  • e.g. first applying list partitioning then hash based
56
Q

Consistent hashing

A
  • hash-based approach that handles adding/removing servers
  • hash the objects and the servers randomly to a unit circle
  • hash(o) mod(360)
  • assign each object to the next server in the circle in clockwise order
57
Q

Hash-based partitioning algorithm

A
  • in a system with n servers, place object o in server with id hash(o) mod n
58
Q

Consistent hashing benefits

A

If a server fails: only objects mapped to the failed server need to be reassigned to the next server clockwise

If a server is added: only objects mapped to the new server need to be moved

In either case, most objects maintain their prior assignments

59
Q

Partitioning issues: joins and denormalization

A

Joins are often not feasible across partitions

Common workaround is to denormalize the DB by adding redundant copies across multiple databases

Denormalization downside is increased risk of data inconsistently

60
Q

Partitioning issues: Referential integrity

A

Most RDBMS do not support foreign keys across DBs on different servers

Apps that require referential integrity across partitioned DBs often have to enforce it in application code

61
Q

Partitioning issues: rebalancing

A

Rebalancing is difficult without incurring downtime since we have to move resources across partitions

Directory-based partitioning can make rebalancing easier at the cost of increased system complexity and a new single point of failure on the lookup service

62
Q

Database indexing

A

Create an index on particular DB tables to make it faster to search.

Index can be created across one or more columns and includes a pointer to the full row

63
Q

Indexing use cases

A
  • Finding a small payload in a large dataset
64
Q

Indexing performance impacts

A
  • Can speed up retrieval
  • Increased write time
  • Increased storage requirements
65
Q

Proxy server

A
  • intermediate server between the requests from clients and the servers that handle those requests
66
Q

Proxy server use cases

A
  • Filter, log, and transform requests

* Can serve requests from its cache to reduce downstream load

67
Q

Open proxy

A

proxy server accessible to any internet user

68
Q

Reverse proxy

A

Proxy within an internal network, not visible to the client. Can include load balancing, caching, security to protect internal servers from direct access

69
Q

Forward proxy

A

Proxy managed by a client to handle requests to an external server

70
Q

Redundancy

A

Duplication of critical components or functions in a system to increase the reliability or improve performance

71
Q

Replication

A

Sharing information to ensure consistency between redundant resources.

Often used in DBMS where updates are written to a primary server which then passes it to the replica servers

72
Q

Relational database (SQL)

A
  • Structured with predefined schemas
  • Each row contains information about an entity
  • Each column contains a particular point of data
73
Q

Non-relational database (NoSQL)

A
  • Unstructured
  • easily distributed
  • dynamic schema
74
Q

Common types of NoSQL DBs

A
  • Key-value stores
  • Document DBs
  • wide-column databases
  • graph databases
75
Q

Key-value store and examples

A
  • Stores an array of key-value pairs

* Examples: Redis

76
Q

Document databases

A
  • Data is stored in documents which are grouped in to collections
  • Each document can have a unique structure
  • Example: MongoDB
77
Q

Wide-column database

A
  • Instead of tables, uses column families which are containers for rows
  • Don’t need to know all the columns up front
  • Each row can have different numbers of columns
  • Best for analyzing largedatasets
  • Examples: Cassandra, HBase
78
Q

Graph database

A
  • Store data graph relations
  • Data saved in the form of nodes and their properties, and lines/connections between nodes
  • Examples: Neo4J
79
Q

SQL vs NoSQL: Storage

A

SQL: data stored in tables where each row represents an entity
NoSQL: variety of storage models

80
Q

SQL vs NoSQL: Schema

A

SQL: each record conforms to a fixed/predefined schema. Higher referential integrity.
NoSQL: schemas are dynamic and changeable

81
Q

SQL vs NoSQL: querying

A

SQL: uses SQL to manipulate and precisely retrieve subsets of data
NoSQL: queries are focused on retrieving collections of full records

82
Q

SQL vs NoSQL: scalability

A

SQL: vertically scalable, difficult to scale horizontally without duplication or indexing
NoSQL: highly horizontally scalable since referential integrity is less of a priority

83
Q

SQL vs NoSQL: ACID compliance

A

ACID: Atomicity, Consistency, Isolation, Durability

SQL: usually ACID compliant therefore more reliable
NoSQL: sacrifices ACID compliance for performance and scalability

84
Q

CAP Theorem

A

States that it is impossible for a distributed system to simultaneously provide more than 2/3 of the following:
* consistency, availability, and partition tolerance

85
Q

Consistency

A

Every read retrieves the most recent write

86
Q

Availability

A

Every request receives a response

87
Q

Partition tolerance

A

The system continues to work despite message loss or partial failure.
Data is sufficiently replicated across nodes and networks to handle intermittent/partial outages

88
Q

standard HTTP web request sequence of events

A
  1. Client opens a connection and requests data from server
  2. The server calculates the response
  3. The server sends back the response
89
Q

AJAX (asynchronous Javascript) polling

A

Client repeatedly polls a server for data

90
Q

Long-polling (Hanging GET)

A

Client requests information from the server, server holds the request open and waits until data is available

91
Q

WebSockets

A

Persistent connection between a client and server over a single TCP connection

  • connection established via a WebSocket handshake
  • allows for real-time data transfer
92
Q

Server side events (SSEs)

A
  • Clients establish a persistent, long term connection to the server
  • Server uses this connection to send data to a client
93
Q

storage capacity model

A

margin of extra storage above anticipated needs

94
Q

REST API

A

REpresentational State Transfer

95
Q

4 possible API actions (CRUD) and associated methods

A
  • Create (POST)
  • Read (GET)
  • Update (PUT/PATCH)
  • Delete (DELETE)
96
Q

HTTP Request headers

A

optional set of key value properties shared from client to server

97
Q

JSON document

A

JavaScript Object Notation. Document in a key value format frequently used for data transfer via REST APIs

98
Q

Ways to authenticate

A
  • basic authentication (username/password)

* secret token (e.g. oAuth)

99
Q

80-20 rule

A

20% of requests generate 80% of traffic

100
Q

Key generation service (KGS)

A
  • Service to generate random keys in advance of requests

* Removes risk of duplicates/key collisions

101
Q

Purging/DB cleanup considerations

A
  • storage is cheap, low cost of storing things for a long time
  • searching for expired data can be costly
  • active cleanup logic should consider app traffic
  • passive cleanup logic could recognize and delete expired data on request