System Design Flashcards
7 steps of systems design problems
- Requirements clarification
- Back-of-the-envelope estimation
- System interface definition
- Defining data model
- High-level design
- Detailed design
- Identifying and resolving bottlenecks
Step 1: Requirements Clarification
- Determine exact scope
- Define end goals of system
- Clarify which parts of system to focus on (e.g. back-end vs. front-end)
Step 2: Back-of-the-envelope estimation
- Estimate and quantify scale of system
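A hedged, worked example of this step (every number below is a made-up assumption, purely to illustrate the arithmetic):

```python
# Hypothetical estimate for a read-heavy service; all inputs are assumptions.
daily_active_users = 100_000_000     # assumed DAU
reads_per_user_per_day = 10          # assumed read actions per user
writes_per_read = 1 / 100            # assumed 1 write per 100 reads
avg_object_size_bytes = 1_000        # assumed ~1 KB per stored object
seconds_per_day = 24 * 60 * 60       # 86,400

read_qps = daily_active_users * reads_per_user_per_day / seconds_per_day
write_qps = read_qps * writes_per_read
new_storage_per_day = write_qps * seconds_per_day * avg_object_size_bytes

print(f"~{read_qps:,.0f} read QPS, ~{write_qps:,.0f} write QPS")
print(f"~{new_storage_per_day / 1e9:.0f} GB of new data per day")
```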
Step 3: System interface definition
Define what APIs are expected from the system
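As a hedged illustration only, for a hypothetical URL-shortening service the expected APIs might be sketched like this (the function names, parameters, and return types are assumptions, not a prescribed interface):

```python
from typing import Optional

# Illustrative API sketch for a hypothetical URL-shortening service.
def create_url(api_key: str, original_url: str,
               custom_alias: Optional[str] = None,
               expire_date: Optional[str] = None) -> str:
    """Returns the shortened URL, or raises an error on failure."""
    ...

def delete_url(api_key: str, short_url: str) -> bool:
    """Deletes a previously created short URL; returns True on success."""
    ...
```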
Step 4: Define data model
- How will data flow between components?
- How will different entities interact with each other?
- How will we partition and manage data? Specific choices:
- Which database system should we use? NoSQL vs SQL?
- What kind of blob/object storage should we use to store files? (e.g. multimedia)
Step 5: High-level design
- Draw a block diagram representing core components to solve the actual problem from end-to-end.
- Alternatively, describe the system verbally or outline it in list form
Step 6: Detailed design
- Dig deeper into 2-3 major components, guided by interviewer feedback.
- Consider tradeoffs between different approaches.
Step 7: Identifying and resolving bottlenecks
- Identify any single points of failure and discuss mitigation
- Discuss redundancy and backup plans for data and services
- Discuss performance monitoring
key characteristics of distributed systems
- Scalability
- Reliability
- Availability
- Efficiency
- Serviceability or Manageability
Scalability
capability of a system, process, or network to grow and manage increased demand
Horizontal vs vertical scaling
- Horizontal is easier to scale dynamically by adding machines
- Vertical scaling is upper limited and may involve downtime
- Horizontal scaling examples: Cassandra, MongoDB
- Vertical scaling examples: MySQL
Reliability
- Probability that a system will fail in a given period
- Distributed system: keeps delivering services with one or several component failures
- Availability over time
Availability
- Percentage of time a system remains operational over a specific period
* Accounts for maintainability and repair
Efficiency
- Latency / response time to requests (correlates to number of messages)
- throughput / bandwidth (correlates to size of messages)
Serviceability / manageability
- Ease to operate and maintain
- Simplicity and speed of repair (as time to repair grows, availability and reliability shrink)
- Considerations: ease of diagnostics, ease of updates
- Automated fault detection
Load balancer
- component to spread traffic across a cluster of servers
- improves responsiveness and availability
- crucial to horizontal scaling
- Performs health checks on backend servers
- Routes each request only to a healthy server
Load balancing placements
- Between client and web server
- between web servers and internal layer (app servers)
- between internal layer and database
Load balancer: least connection method
- directs traffic to server with fewest connections
* good for large number of persistent connections
Load balancer: least response time method
- directs traffic to the server with the lowest response time
Load balancer: least bandwidth method
- selects the server that is currently serving the least amount of traffic
Load balancer: round robin method
- cycles through available servers and sends each request to the next server
- good for servers of equal specs and few persistent requests
Load balancer: weighted round robin method
- round robin but with weights on different servers based upon processing capacity
Load balancer: IP hash method
- client IP address is hashed and servers are each assigned blocks of hashes
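A minimal sketch of two of the load balancing methods above, round robin and IP hash (the server names and client IP are made up):

```python
import hashlib
from itertools import cycle

servers = ["app1.internal", "app2.internal", "app3.internal"]  # hypothetical pool

# Round robin: cycle through the pool, one request per server in turn.
rr = cycle(servers)
def round_robin() -> str:
    return next(rr)

# IP hash: the same client IP always maps to the same server
# (as long as the pool does not change).
def ip_hash(client_ip: str) -> str:
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

print(round_robin())             # app1.internal
print(ip_hash("203.0.113.7"))    # deterministic choice for this client
```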
Load balancer redundancy
- can be a single point of failure
- can add more LBs to form a cluster of active/passive instances
- clustered LBs monitor each other and passive takes over if active fails
Caching
- locality of reference principle: recently requested data is likely to be requested again
- often implemented near front end to reduce downstream traffic
Application server cache
- cache placed directly on request layer, check if each request is in the cache before fetching from disk
- can have multiple layers, e.g. node memory -> node disk (still faster than network)
Content Delivery Network (CDN)
- cache for sites serving large amounts of static media
Cache invalidation
- maintenance to keep cache consistent with database when data changes
Write-through cache
- Data is written into the cache and DB simultaneously
- Maximizes consistency, minimizes risk of loss
- Higher latency as a result of double write operations
Write-around cache
- Data is written directly to permanent storage, bypassing the cache
- avoids flooding the cache with writes
- increases chance of cache misses, which must then be read from back end with higher latency
Write-back cache
- Data is written to cache alone
- Write to permanent storage is done in intervals or chunks
- Lowest latency, highest throughput for write-intensive apps
- Risk of data loss if the cache crashes, since recent writes exist only in the cache
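A toy sketch contrasting the three write policies above, with plain dicts standing in for the cache and the database (an illustrative assumption, not a production pattern):

```python
# Dicts stand in for a cache and a database.
cache: dict = {}
db: dict = {}
dirty_keys: set = set()  # keys written to the cache but not yet persisted

def write_through(key, value):
    cache[key] = value   # write both places before acknowledging:
    db[key] = value      # consistent, but pays for two writes

def write_around(key, value):
    db[key] = value      # skip the cache; the next read of key will miss

def write_back(key, value):
    cache[key] = value   # fastest path: cache only
    dirty_keys.add(key)  # must be flushed later; lost if the cache crashes

def flush():
    for key in list(dirty_keys):
        db[key] = cache[key]
    dirty_keys.clear()
```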
Cache eviction policies (6)
- FIFO - evicts oldest block first
- LIFO - evicts newest block first
- LRU - evicts least recently used items first
- MRU - evicts most recently used items first
- LFU - evicts least frequently used items first
- RR - random replacement
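As a concrete example of one policy above, a minimal LRU cache can be sketched with an OrderedDict (the capacity and keys are arbitrary):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                      # cache miss
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used
```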
Importance of estimation
- Estimates inform later decisions about scaling, partitioning, load balancing, and caching
Examples of system parameters to estimate
- Number of actions (e.g. requests per second), amount of storage required, expected network bandwidth usage
Components to include in high level design
* Clients, load balancers, application servers, databases, file storage
Examples of detailed design topics
- How will we partition data types between multiple databases?
- How should we optimize data storage further (e.g. recency)?
- How much and at what layer should we implement caching?
- Which components need load balancing?
Horizontal scaling
Adding more servers to your resource pool
Vertical scaling
Adding more power/capacity to an existing server
Cache misses
when request data is not found in the cache
Cache invalidation: 3 main schemas
- write-through
- write-around
- write-back
CDN for smaller systems
- for smaller systems, we can design for future transition to a CDN with a separate subdomain for static media
Data partitioning
technique to break up a big database into many smaller parts across multiple machines
Benefits of data partitioning
Improves the manageability, performance, availability, and load balancing of an application
Justification for data partitioning
After a certain scale point, it is cheaper and easier to scale horizontally by adding machines than it is to grow vertically
Popular data partitioning methods/schemes
- Horizontal partitioning
- Vertical partitioning
- Directory-based partitioning
Horizontal partitioning (range-based partitioning) (data sharding)
Putting different rows into different tables based upon the range of a certain value
Main risk with horizontal partitioning
If the value chosen for partitioning isn’t evenly distributed, then the scheme will lead to unbalanced servers
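A minimal sketch of range-based sharding (the shard names and boundaries are made-up assumptions); note how a skewed key distribution would overload a single shard:

```python
# Toy range-based (horizontal) partitioning on the first letter of a name.
SHARD_RANGES = [
    ("A", "H", "db-shard-0"),
    ("I", "P", "db-shard-1"),
    ("Q", "Z", "db-shard-2"),
]

def shard_for(name: str) -> str:
    first = name[0].upper()
    for low, high, shard in SHARD_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers {name!r}")

print(shard_for("alice"))  # db-shard-0; mostly A-H names would overload this shard
```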
Vertical partitioning
- divide data to store tables related to a specific feature in their own server
- different types of data in different servers
Main risk with vertical partitioning
If the app grows, we may need to further/horizontally partition a feature-specific database
Directory Based Partitioning
- loosely coupled approach
- create a lookup service which knows your current partitioning scheme
- separates partitioning from the DB access code
- functionality: query the directory server which holds the mapping between key and DB server
Directory based partitioning benefits
- because it is loosely coupled, we can add servers to the DB pool or change the partitioning scheme without impacting the application
Key or hash-based partitioning
- apply a hash to some attributes of the entity we are storing, yielding a partition number
- problem: effectively fixes the total number of DB servers - workaround is to use consistent hashing
List partitioning
- each partition is assigned a list of values
* check each record against the list and store it in the relevant partition
Round-robin partitioning
- ensures uniform data distribution by rotating data assignment between partitions
Composite partitioning
- combination of any partitioning schemes to devise a new scheme
- e.g. first applying list partitioning then hash based
Consistent hashing
- hash-based approach that handles adding/removing servers
- hash the objects and the servers randomly to a unit circle
- e.g. position on the circle = hash(o) mod 360 (in degrees)
- assign each object to the next server in the circle in clockwise order
Hash-based partitioning algorithm
- in a system with n servers, place object o in server with id hash(o) mod n
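A minimal sketch of this scheme (the server count is an assumed constant), with the resizing problem noted in a comment:

```python
import hashlib

NUM_SERVERS = 4  # assumed fixed pool size

def server_for(key: str) -> int:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    # Changing NUM_SERVERS remaps almost every key, which is why
    # consistent hashing (described above) is the usual workaround.
    return digest % NUM_SERVERS

print(server_for("user:42"))
```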
Consistent hashing benefits
If a server fails: only objects mapped to the failed server need to be reassigned to the next server clockwise
If a server is added: only objects mapped to the new server need to be moved
In either case, most objects maintain their prior assignments
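A compact ring sketch using bisect (server names are hypothetical; real implementations usually also place multiple virtual nodes per server to balance load):

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring; one point per server for brevity."""

    def __init__(self, servers):
        self.points = sorted((ring_hash(s), s) for s in servers)
        self._hashes = [h for h, _ in self.points]

    def server_for(self, key: str) -> str:
        # Walk clockwise: first server point at or after the key's hash.
        idx = bisect.bisect_right(self._hashes, ring_hash(key))
        return self.points[idx % len(self.points)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.server_for("user:42"))  # only nearby keys move if a server is added/removed
```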
Partitioning issues: joins and denormalization
Joins are often not feasible across partitions
Common workaround is to denormalize the DB by adding redundant copies across multiple databases
The downside of denormalization is an increased risk of data inconsistency
Partitioning issues: Referential integrity
Most RDBMS do not support foreign keys across DBs on different servers
Apps that require referential integrity across partitioned DBs often have to enforce it in application code
Partitioning issues: rebalancing
Rebalancing is difficult without incurring downtime since we have to move resources across partitions
Directory-based partitioning can make rebalancing easier at the cost of increased system complexity and a new single point of failure on the lookup service
Database indexing
Create an index on particular DB tables to make it faster to search.
Index can be created across one or more columns and includes a pointer to the full row
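A minimal SQLite illustration (the table and column names are made up) of creating an index on a frequently searched column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

# Index on the column we search by most often; speeds up reads on email,
# at the cost of extra storage and slower writes (the index must be maintained).
conn.execute("CREATE INDEX idx_users_email ON users (email)")

conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", ("a@example.com", "Alice"))
row = conn.execute("SELECT name FROM users WHERE email = ?", ("a@example.com",)).fetchone()
print(row)  # ('Alice',)
```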
Indexing use cases
- Finding a small payload in a large dataset
Indexing performance impacts
- Can speed up retrieval
- Increased write time
- Increased storage requirements
Proxy server
- intermediate server between the requests from clients and the servers that handle those requests
Proxy server use cases
- Filter, log, and transform requests
* Can serve requests from its cache to reduce downstream load
Open proxy
proxy server accessible to any internet user
Reverse proxy
Proxy within an internal network, not visible to the client. Can include load balancing, caching, security to protect internal servers from direct access
Forward proxy
Proxy managed by a client to handle requests to an external server
Redundancy
Duplication of critical components or functions in a system to increase the reliability or improve performance
Replication
Sharing information to ensure consistency between redundant resources.
Often used in DBMSs, where updates are written to a primary server, which then passes them on to the replica servers
Relational database (SQL)
- Structured with predefined schemas
- Each row contains information about an entity
- Each column contains a particular point of data
Non-relational database (NoSQL)
- Unstructured
- easily distributed
- dynamic schema
Common types of NoSQL DBs
- Key-value stores
- Document DBs
- wide-column databases
- graph databases
Key-value store and examples
- Stores an array of key-value pairs
* Examples: Redis
Document databases
- Data is stored in documents which are grouped into collections
- Each document can have a unique structure
- Example: MongoDB
Wide-column database
- Instead of tables, uses column families which are containers for rows
- Don’t need to know all the columns up front
- Each row can have different numbers of columns
- Best for analyzing large datasets
- Examples: Cassandra, HBase
Graph database
- Stores data whose relationships are best represented as a graph
- Data saved in the form of nodes and their properties, and lines/connections between nodes
- Examples: Neo4J
SQL vs NoSQL: Storage
SQL: data stored in tables where each row represents an entity
NoSQL: variety of storage models
SQL vs NoSQL: Schema
SQL: each record conforms to a fixed/predefined schema. Higher referential integrity.
NoSQL: schemas are dynamic and changeable
SQL vs NoSQL: querying
SQL: uses SQL to manipulate and precisely retrieve subsets of data
NoSQL: queries are focused on retrieving collections of full records
SQL vs NoSQL: scalability
SQL: vertically scalable; horizontal scaling is difficult and typically requires sharding or replication
NoSQL: highly horizontally scalable since referential integrity is less of a priority
SQL vs NoSQL: ACID compliance
ACID: Atomicity, Consistency, Isolation, Durability
SQL: usually ACID compliant therefore more reliable
NoSQL: sacrifices ACID compliance for performance and scalability
CAP Theorem
States that it is impossible for a distributed system to simultaneously provide more than two of the following three guarantees:
* consistency, availability, and partition tolerance
Consistency
Every read retrieves the most recent write
Availability
Every request receives a response
Partition tolerance
The system continues to work despite message loss or partial failure.
Data is sufficiently replicated across nodes and networks to handle intermittent/partial outages
standard HTTP web request sequence of events
- Client opens a connection and requests data from server
- The server calculates the response
- The server sends back the response
AJAX (Asynchronous JavaScript and XML) polling
Client repeatedly polls a server for data
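A client-side polling loop might look like the sketch below (the endpoint URL and interval are assumptions; uses the third-party requests library):

```python
import time
import requests  # third-party HTTP client

POLL_URL = "https://example.com/api/updates"  # hypothetical endpoint
POLL_INTERVAL_SECONDS = 5                     # assumed polling interval

def handle_updates(updates):
    print("received:", updates)               # application-defined handler

def poll_forever():
    while True:
        response = requests.get(POLL_URL, timeout=10)
        if response.ok and response.json():
            handle_updates(response.json())
        time.sleep(POLL_INTERVAL_SECONDS)      # most polls return nothing new

# poll_forever()  # would run indefinitely
```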
Long-polling (Hanging GET)
Client requests information from the server, server holds the request open and waits until data is available
WebSockets
- Full-duplex, persistent communication channel between client and server over a single TCP connection
- connection established via a WebSocket handshake
- allows for real-time data transfer
Server-sent events (SSEs)
- Client establishes a persistent, long-term connection to the server
- Server uses this connection to send data to a client
storage capacity model
margin of extra storage above anticipated needs
REST API
REpresentational State Transfer
4 possible API actions (CRUD) and associated methods
- Create (POST)
- Read (GET)
- Update (PUT/PATCH)
- Delete (DELETE)
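A hedged sketch of how these map onto HTTP calls for a hypothetical /users resource (the URL and payloads are assumptions; uses the third-party requests library):

```python
import requests

BASE = "https://api.example.com/users"  # hypothetical REST resource

created = requests.post(BASE, json={"name": "Alice"})             # Create
fetched = requests.get(f"{BASE}/42")                              # Read
updated = requests.put(f"{BASE}/42", json={"name": "Alice B."})   # Update (full replace)
patched = requests.patch(f"{BASE}/42", json={"name": "Al"})       # Update (partial)
deleted = requests.delete(f"{BASE}/42")                           # Delete
```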
HTTP Request headers
optional set of key-value properties sent from the client to the server
JSON document
JavaScript Object Notation. A document in a key-value format frequently used for data transfer via REST APIs
Ways to authenticate
- basic authentication (username/password)
* secret token (e.g. OAuth)
80-20 rule
20% of requests generate 80% of traffic
Key generation service (KGS)
- Service to generate random keys in advance of requests
* Removes risk of duplicates/key collisions
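A toy sketch of the idea (the key length and alphabet are assumptions): keys are generated and de-duplicated ahead of time, so request handlers never have to check for collisions:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # assumed base-62 keys
KEY_LENGTH = 6                                   # assumed key length

class KeyGenerationService:
    """Pre-generates unique random keys for later use."""

    def __init__(self, batch_size: int = 1000):
        self.available = []      # keys ready to hand out
        self.issued = set()      # keys already given out
        self.batch_size = batch_size

    def _refill(self):
        while len(self.available) < self.batch_size:
            key = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))
            if key not in self.issued and key not in self.available:
                self.available.append(key)

    def next_key(self) -> str:
        if not self.available:
            self._refill()
        key = self.available.pop()
        self.issued.add(key)
        return key

kgs = KeyGenerationService()
print(kgs.next_key())
```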
Purging/DB cleanup considerations
- storage is cheap, low cost of storing things for a long time
- searching for expired data can be costly
- active cleanup logic should consider app traffic
- passive cleanup logic could recognize and delete expired data on request
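A minimal sketch of the passive approach, where expired entries are deleted only when they are next requested (the record shape and TTL are assumptions):

```python
import time

TTL_SECONDS = 30 * 24 * 3600          # assumed 30-day expiry
store = {}                            # key -> (value, created_at)

def put(key, value):
    store[key] = (value, time.time())

def get(key):
    entry = store.get(key)
    if entry is None:
        return None
    value, created_at = entry
    if time.time() - created_at > TTL_SECONDS:
        del store[key]                # passive cleanup: delete on access
        return None
    return value
```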