System Design Concepts Flashcards
What is Horizontal Scaling?
Scaling by adding more servers to your pool of resources
What is the advantage of Horizontal Scaling?
Horizontal scaling makes it easier to scale dramatically: you add more commodity servers instead of hitting the hardware limits of a single machine
What is Vertical Scaling?
Scaling by adding more power (CPU, RAM, Storage) to existing server
What is a Load Balancer?
Helps spread traffic across a cluster of servers in order to reduce individual server load and prevent any one server from becoming a single point of failure
LBs are intended to improve overall application availability and responsiveness
What type of scaling does Load Balancing introduce?
Horizontal Scaling
Where can you add Load Balancers?
Lots of places! Between the user and the web servers, between web servers and the internal platform layer (ex: application servers, cache servers), and between the internal platform layer and the database
What are the benefits of using a Load Balancer?
- Users experience faster, uninterrupted service (not waiting on a single server)
- Less downtime and higher throughput (even full server failure won’t impact end user)
- Fewer failed or stressed components since no single device is performing a lot of work
How does the Load Balancer choose a backend server?
- Health Checks: LBs should only forward traffic to healthy hosts
- Predetermined Algorithm
What is Least Connected Method, and when is it useful?
A LB method that directs traffic to the server with the fewest active connections
This is useful when there is a large number of persistent client connections that are unevenly distributed between the servers
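A minimal sketch of least-connections selection in Python (the server names and connection counts are hypothetical; a real LB tracks live connection counts itself):

```python
def least_connections(servers):
    """Pick the server with the fewest active connections.

    `servers` maps server name -> current active connection count.
    """
    return min(servers, key=servers.get)

# Hypothetical pool: web-2 currently has the fewest active connections.
active = {"web-1": 12, "web-2": 3, "web-3": 7}
print(least_connections(active))  # -> web-2
```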
What is Least Response Time Method?
A LB method that directs traffic to the server with the fewest active connections and the lowest average response time
What is Least Bandwidth Method?
A LB method that selects the server that is currently serving the least amount of traffic, measured in megabits per second (Mbps)
What is Round Robin Method and when is it useful?
A LB method that cycles through a list of servers and sends each new request to the next server. When it reaches the end it starts from the beginning
This is most useful when the servers are of equal specification and there are not many persistent connections
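The cycling behavior can be sketched with Python's `itertools.cycle` (server names are illustrative):

```python
import itertools

servers = ["web-1", "web-2", "web-3"]  # hypothetical pool
rr = itertools.cycle(servers)          # wraps back to the start automatically

# Each new request goes to the next server in the list.
assignments = [next(rr) for _ in range(5)]
print(assignments)  # -> ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']
```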
What is Weighted Round Robin Method and when is it useful?
A LB method in which each server is assigned a weight that indicates its processing capacity. Servers with higher weights receive new connections before those with lower weights (and get more connections overall)
This is useful in the case where you have servers with different processing capabilities, and there are not many persistent connections
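One simple way to sketch weighted round robin is to repeat each server in the rotation once per unit of weight (the weights here are hypothetical):

```python
import itertools

# Hypothetical weights: web-1 has twice the capacity of web-2.
weights = {"web-1": 2, "web-2": 1}

# Expand each server into the rotation once per unit of weight.
schedule = [server for server, w in weights.items() for _ in range(w)]
wrr = itertools.cycle(schedule)

sequence = [next(wrr) for _ in range(6)]
print(sequence)  # -> ['web-1', 'web-1', 'web-2', 'web-1', 'web-1', 'web-2']
```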
What is IP Hash Method?
A LB method that uses the hash of the IP address of the client to redirect the request to a server
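A minimal sketch of the idea (MD5 as the hash function and the server list are arbitrary choices, not part of any standard):

```python
import hashlib

servers = ["web-1", "web-2", "web-3"]  # hypothetical pool

def pick_server(client_ip):
    # Hash the client IP and map it onto the server list; the same IP
    # lands on the same server as long as the pool size is unchanged.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The mapping is deterministic ("sticky") per client IP.
assert pick_server("203.0.113.7") == pick_server("203.0.113.7")
```

This stickiness is useful for session affinity, but note the same caveat as hash-based partitioning: changing the number of servers remaps most clients.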
What are Redundant Load Balancers?
The Load Balancer itself can be a single point of failure. To overcome this, a second load balancer can be connected to the first to form a cluster
Each LB monitors the health of the other, and since both are capable of serving traffic and failure detection, the second LB can take over if the main LB fails
What is Cache and what principle does it take advantage of?
Short term memory.
It has a limited amount of space, but is typically faster than the original data source.
Cache takes advantage of the locality of reference principle: recently requested data is likely to be requested again
What is Application Server Cache?
This involves placing a cache directly on a request layer node, to enable the local storage of response data
The cache on a request layer node can be located both in memory (very fast) and on the node’s local disk (still faster than going to network storage)
How do you handle cache with multiple server nodes?
It is possible to have each node host its own cache
However, if your LB randomly distributes requests across the nodes, the same request will go to different nodes, thus increasing cache misses
Two choices for overcoming:
- Global caches
- Distributed caches
What is CDN?
Content Distribution Network
CDNs are a kind of cache that comes into play for sites serving large amounts of static media
Typical set up:
- Request asks CDN for piece of static media (file, image)
- CDN serves if available
- If not available, CDN queries BE server for the file, caches and serves
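The set up above is essentially cache-aside logic, which can be sketched like this (`origin_fetch` is a hypothetical stand-in for querying the backend server):

```python
cdn_cache = {}

def serve(path, origin_fetch):
    if path in cdn_cache:
        return cdn_cache[path]      # CDN serves if available
    content = origin_fetch(path)    # otherwise query the backend server...
    cdn_cache[path] = content       # ...cache the file...
    return content                  # ...and serve it
```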
What is Cache Invalidation and why is it important?
Cache must be kept coherent with the source of truth (the database). If the data is modified in the database, it should be invalidated in the cache (removed or updated) otherwise this can cause inconsistent application behavior
What is Write Through Cache and what are its pros and cons?
Data is written to the cache and the corresponding database at the same time
Pros:
- Allows for fast retrieval and complete data consistency between cache and storage
- Minimizes risk of data loss
Cons:
- Higher latency for write operations (since done twice)
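A minimal write-through sketch, with plain dicts standing in for the cache and the database:

```python
class WriteThroughCache:
    def __init__(self, store):
        self.store = store  # stands in for the database
        self.cache = {}

    def write(self, key, value):
        # Both writes complete before the operation returns: full
        # consistency, but the write pays for two stores (higher latency).
        self.cache[key] = value
        self.store[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]   # cache hit
        value = self.store[key]      # miss: fall back to storage
        self.cache[key] = value
        return value
```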
What is Write Around Cache and what are its pros and cons?
Data is written directly to permanent storage, bypassing the cache
Pros:
- Can reduce the cache being flooded with write operations that will not be re-read
Cons:
- Read requests for recently written data will create “cache misses” and must be read from storage – causing higher latency
What is Write Back Cache and what are its pros and cons?
Data is written to cache alone and completion is immediately confirmed to the client. The write to permanent storage is done after specified intervals or under certain conditions
Pros:
- Low latency and high throughput for write intensive apps
Cons:
- Comes with risk of data loss because the only copy of written data is in cache
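A minimal write-back sketch (again with dicts as stand-ins); note that anything still in `dirty` is lost if the cache fails before `flush()` runs:

```python
class WriteBackCache:
    def __init__(self, store):
        self.store = store  # stands in for permanent storage
        self.cache = {}
        self.dirty = set()  # written to cache but not yet to storage

    def write(self, key, value):
        # Completion is confirmed as soon as the cache is updated.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Run at intervals or under certain conditions (e.g. cache pressure).
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```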
What is FIFO Cache Eviction?
First In, First Out
The cache evicts the first block accessed first without any regard to how often or how many times it was accessed before
What is LIFO Cache Eviction?
Last In, First Out
The cache evicts the block added most recently first, without any regard to how often or how many times it was accessed before
What is LRU Cache Eviction?
Least Recently Used
The cache discards the least recently used items first
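A compact LRU sketch using Python's `OrderedDict`, where the least recently used key sits at the front and is evicted first:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # cache miss
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
```

With capacity 2, putting a and b, reading a, then putting c evicts b, since a was used more recently.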
What is MRU Cache Eviction?
Most Recently Used
The cache discards the most recently used items first
What is LFU Cache Eviction?
Least Frequently Used
Counts how often an item is needed. Those that are used least often are discarded first
What is Random Replacement Cache Eviction?
Randomly selects a candidate item and discards it to make space when necessary
What is Data Partitioning?
The process of splitting up a database or table across multiple machines to improve the manageability, performance, availability and load balancing of an application
The claim is that past a certain scale point, it is cheaper and more feasible to scale horizontally by adding more machines than to grow vertically by adding beefier servers
What is Horizontal Partitioning and what are its downsides?
Also called data sharding
Where you put different rows of the same table onto different machines
The key problem with this approach is that if the value whose range is used for partitioning isn’t chosen carefully, then the partitioning scheme will lead to unbalanced servers (ex: messages for IBM sharded by org ID)
What is Vertical Partitioning and what are its downsides?
Divide our data to store tables related to a specific feature on their own server – for example, storing user profile info on one DB, friends on another, and photos on a third
This is straightforward to implement and has a low impact on the application
The main problem is that if the application experiences additional growth, then it may be necessary to further partition a feature specific DB across various servers
What is Dictionary Based Partitioning?
A loosely coupled approach – create a lookup service which knows your current partitioning scheme and abstracts it away from the database access code
To find out where a particular data entry resides, we query the directory server that holds the mapping from tuple key to DB server
Means we can perform tasks like adding server or changing partitioning scheme without impacting the application
What is Key or Hash Based Partitioning?
Apply a hash function to a key attribute of the entity we are storing (ex: with 100 servers, id % 100 yields the server number)
Ensures a uniform distribution of data among servers
The fundamental problem is that it effectively fixes the total number of database servers, since adding new servers means changing the hash function and remapping existing data – this can be solved by using consistent hashing
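The id % 100 example from this card, sketched directly (the fleet size of 100 servers is the card's hypothetical):

```python
NUM_SERVERS = 100  # hypothetical fleet size from the example above

def server_for(entity_id):
    # Uniformly spreads ids across servers, but changing NUM_SERVERS
    # remaps almost every id -- the problem consistent hashing addresses.
    return entity_id % NUM_SERVERS

print(server_for(12345))  # -> 45
```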
What is List Partitioning?
Each partition is assigned a list of values, so whenever we want to insert a new record, we will see which partition contains our key and store it there
What is Round Robin Partitioning?
Very simple strategy that ensures uniform data distribution
With “n” partitions, the “i”-th tuple is assigned to partition (i % n)
What is Composite Partitioning?
Combining other partitioning schemes to devise a new scheme (ex: first applying a list partitioning scheme and then a hash based partition)
Consistent hashing could be considered a composite of hash + list partitioning where the hash reduces the key space to a size that can be listed
What are the common problems of data partitioning?
- Joins and Denormalization: not feasible to perform joins that span partitions (work around is to denormalize - but comes with risk of data inconsistency)
- Referential Integrity: normally cannot enforce data integrity constraints (such as foreign keys) across partitions. Must be enforced in the application code
- Rebalancing: many reasons we need to change our partitioning scheme which means moving data. Dictionary based partitioning helps but adds complexity to system and creates a single point of failure
What is a Proxy Server?
A proxy server is a piece of software or hardware that acts as an intermediary for clients seeking resources from our servers
Proxies are used to filter requests, log requests or sometimes transform requests (adding/removing headers, encrypting/decrypting, compressing)
What is an Open Proxy?
A proxy server that is accessible by any internet user
Types:
- Anonymous proxy: reveals its identity as a server but does not disclose the client’s IP address
- Transparent proxy: identifies itself as a proxy and, via HTTP headers, makes the client’s original IP address visible
What is a Reverse Proxy?
Retrieves resources on behalf of a client from one or more servers
These resources are then returned to the client, appearing as if they originated from the proxy server itself
What is Redundancy?
The duplication of critical components or functions of a system with the intention of increasing the reliability of a system, usually in the form of a back-up or fail safe
Plays a key role in removing the single points of failure in a system and provides backups if needed in a crisis
What is Replication?
Replication means sharing information to ensure consistency between redundant resources, to improve reliability, fault-tolerance, or accessibility
Replication is widely used in many DB management systems, generally with a primary/copy relationship, where the primary receives all the updates, which then ripple through to the copies; each copy outputs a message indicating a successful update
What is a SQL Database and When Should You Use It?
Relational database that stores data in rows and columns
Each row contains all the information about one entity, and each column represents a separate data point about it
You should use a SQL DB if you:
- Need to ensure ACID compliance – reducing anomalies and protecting the integrity of your database by prescribing exactly how transactions interact with the database
- Your data is structured and unchanging - if your business is not experiencing massive growth that would require more servers and if you’re only working with data that is consistent
What is a NoSQL Database and When Should You Use It?
Non-relational databases are unstructured, distributed and have a dynamic schema, like file folders
You should use a NoSQL DB if you:
- Are storing large volumes of data that has little to no structure
- Want to make the most of cloud computing and storage
- Are rapidly developing. NoSQL doesn’t need to be prepped ahead of time, and makes it easy if you are making quick iterations which require frequent updates to the data structure
NoSQL vs SQL: Storage
SQL: stores data in tables where each row represents an entity and each column represents a data point about that entry
NoSQL: has different storage models. Main ones are key-value, document, graph and columnar
NoSQL vs SQL: Querying
SQL: each record conforms to a fixed schema, meaning columns must be decided before data entry and each row must have data for each column. The schema can be altered, but that means modifying the whole database and going offline
NoSQL: schemas are dynamic. Columns can be added on the fly and each “row” equivalent doesn’t have to contain data for each “column”
NoSQL vs SQL: Scalability
SQL: In most situations SQL databases are vertically scalable in terms of hardware – which can get expensive. You can also horizontally scale, but that is also expensive both in terms of cost and complexity
NoSQL: Horizontally scalable, meaning we can add more servers easily. A lot of NoSQL technologies also distribute data across servers automatically
NoSQL vs SQL: Reliability
SQL: the vast majority of SQL databases are ACID compliant, so when it comes to data reliability and safe guarantee of performing transactions, SQL databases are a better bet
NoSQL: most NoSQL solutions sacrifice ACID compliance for performance and scalability
What is CAP Theorem?
States it is impossible for a distributed software system to simultaneously provide more than 2 of the following:
- Consistency: all nodes see the same data at the same time (achieved by updating several nodes before reads)
- Availability: every request gets a response indicating success or failure (achieved by replicating data across different servers)
- Partition Tolerance: system continues to work despite message loss or partial failure (achieved by data being sufficiently replicated across combos of nodes and networks)
What is Consistent Hashing?
A distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table, by assigning them a partition on an abstract circle, or hash ring
Allows adding or removing hosts while remapping only a small fraction of keys, instead of migrating the entire partitioning scheme
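A minimal hash-ring sketch (no virtual nodes, and MD5 is an arbitrary hash choice; server names are illustrative):

```python
import bisect
import hashlib

def _point(key):
    # Map any string onto a position on the abstract circle.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        # Each server owns the arc of the circle ending at its hash position.
        self.ring = sorted((_point(s), s) for s in servers)

    def lookup(self, key):
        # First server clockwise from the key's position; wrap around.
        i = bisect.bisect(self.ring, (_point(key), ""))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
server = ring.lookup("user:42")  # the same key always maps to the same server
```

Adding a host only reassigns the keys on one arc of the circle, rather than remapping the whole keyspace the way id % n does.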
What is the regular HTTP Request Sequence?
- Client opens a connection and requests data from server
- Server calculates the response
- Server sends the response back to the client
What is Ajax Polling HTTP Request Sequence?
- The client repeatedly polls (or requests) a server for data
- The client makes request and waits for server to respond – if no data is available, empty response is returned
- This repeats at regular intervals
The downside to this sequence:
- Client has to keep asking the server for new data
- As a result, a lot of responses are empty, creating HTTP overhead
What is HTTP Long Polling?
A variation of the typical polling technique that allows the server to push information to a client whenever the data is available
Same as normal polling, but expectation that the server may not respond immediately (“hanging GET”)
Once a response is delivered, the client immediately sends a new request, so the server always has a pending request available to answer
What are websockets?
Provide a persistent connection between a client and a server that both parties can use to start sending data at any time
Websocket protocol enables communication between a client and a server with lower overheads, facilitating real time data transfer from and to the server
What are Server Sent Events?
Under SSEs the client establishes a persistent and long term connection with the server
The server uses this connection to send data to a client – but if the client wants to send data to the server, it would require the use of another protocol/technique to do so
Best when we need real-time traffic from the server to the client
What is Latency?
Latency is commonly understood as the time taken by the “round trip” of a network request
Latency is the inverse of speed: you want higher speeds and lower latency
What is Throughput?
This can be understood as the maximum capacity of the machine or system
A system is only as fast as its slowest bottleneck
You can increase throughput by buying more hardware (horizontal scaling) or increasing the capacity and performance of your existing hardware (vertical scaling)
Think about ways to scale the throughput of a given system, including by splitting up the load and distributing it across other resources