System Design Flashcards by Jon Hayden

What questions would you ask before starting your design?

Are we focusing on the backend only or are we developing the front-end too?
What are we storing (images, videos, text)?
Do we need to search?
What scale is expected from the system?
How much storage?
What network bandwidth is needed?
What are the expected APIs, with example inputs and outputs?
What kind of Database will be used?
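
The storage and bandwidth questions above usually come down to simple arithmetic. The sketch below shows one way to do it; every input number (uploads per day, object size, read/write ratio, retention) is an assumed placeholder, not a figure from this deck.

```python
# Back-of-the-envelope capacity estimate.
# All input values are assumptions chosen only to illustrate the arithmetic.
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(daily_uploads, avg_object_bytes, read_write_ratio, retention_days):
    """Return rough storage and bandwidth needs for an upload-heavy service."""
    daily_bytes = daily_uploads * avg_object_bytes
    write_bytes_per_sec = daily_bytes / SECONDS_PER_DAY
    return {
        "storage_bytes": daily_bytes * retention_days,
        "write_bytes_per_sec": write_bytes_per_sec,
        "read_bytes_per_sec": write_bytes_per_sec * read_write_ratio,
    }


result = estimate(
    daily_uploads=1_000_000,     # assumed: one million new objects per day
    avg_object_bytes=200_000,    # assumed: 200 KB per object
    read_write_ratio=10,         # assumed: ten reads for every write
    retention_days=365,          # assumed: keep one year of data
)
print(f"storage: {result['storage_bytes'] / 1e12:.0f} TB")
print(f"ingress: {result['write_bytes_per_sec'] / 1e6:.2f} MB/s")
print(f"egress:  {result['read_bytes_per_sec'] / 1e6:.2f} MB/s")
```

Numbers like these are only a starting point for the conversation; the useful part is agreeing on the inputs before designing anything.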

What components would you use in a block diagram?

Client
Load Balancer or Reverse Proxies
Application Server(s)
Database
File Storage

What are some bottlenecks to consider when designing your architecture?

Are there single points of failure, and how can they be mitigated?
Is there enough Data Replication?
Are there enough copies of services?
How to handle performance monitoring of services? Alerts?

What are the key characteristics of distributed systems?

Scalability
Reliability
Availability
Efficiency
Serviceability or manageability

What are the benefits of load balancing?

Faster uninterrupted service
Less downtime and higher throughput
Easier to handle incoming requests
Fewer failed or stressed components
Predictive analytics that can flag traffic bottlenecks before they happen

How does the load balancer choose the backend server?

First, the load balancer checks that a candidate server is actually responding appropriately to requests (a health check).
Then it uses a pre-configured algorithm to select one server from the set of healthy servers.
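
The two steps can be sketched as follows. This is a minimal illustration, not any particular load balancer's implementation: the `Server` and `LoadBalancer` classes are hypothetical, the `healthy` flag stands in for real periodic health checks, and round robin is used only as an example of a pre-configured algorithm.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Server:
    name: str
    healthy: bool = True  # assumed to be updated by periodic health checks


class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self._counter = count()  # drives the round-robin rotation

    def choose(self):
        # Step 1: keep only servers that currently pass their health check.
        healthy = [s for s in self.servers if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy backend servers")
        # Step 2: apply the configured algorithm (round robin here).
        return healthy[next(self._counter) % len(healthy)]


lb = LoadBalancer([Server("a"), Server("b", healthy=False), Server("c")])
picks = [lb.choose().name for _ in range(4)]
print(picks)  # server "b" never appears because it is unhealthy
```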

What are load balancing methods?

Least Connection Method
Least Response Time Method
Least Bandwidth Method
Round Robin Method
Weighted Round Robin Method
IP Hash: a hash of the IP address of the client is calculated to redirect the request to a server
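
A few of these methods are simple enough to sketch in a handful of lines. The function names and sample inputs below are hypothetical, and the choice of SHA-256 for the IP hash is an assumption; real load balancers use their own hash functions and bookkeeping.

```python
import hashlib


def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def weighted_round_robin(weights):
    """Return one full cycle in which each server appears as often as its weight."""
    return [name for name, weight in weights.items() for _ in range(weight)]


def ip_hash(client_ip, servers):
    """Map a client IP to a server so the same client keeps hitting the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


print(least_connections({"a": 3, "b": 1, "c": 2}))
print(weighted_round_robin({"a": 2, "b": 1}))
print(ip_hash("10.0.0.7", ["a", "b", "c"]))
```

The IP hash method trades even distribution for stickiness: a given client always lands on the same server as long as the server list does not change.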

List the types of caches in system architecture

Application server cache: Placing a cache directly on a request layer node enables the local storage of response data.
Content Delivery Network (CDN): a kind of cache that comes into play for sites serving large amounts of static media.

What are the cache invalidation schemes?

If the data is modified in the database, it should be invalidated in the cache; if not, this can cause inconsistent application behavior.

Write-through cache: data is written into the cache and the corresponding database at the same time
Write-around cache: data is written directly to permanent storage, bypassing the cache
Write-back cache: data is written to the cache alone, and completion is immediately confirmed to the client. The write to permanent storage happens later.
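
The three schemes differ only in where a write lands first. Here is a minimal sketch using two dictionaries as stand-ins for the cache and the database; the `Store` class and its method names are hypothetical, and the explicit `flush` call is an assumed simplification of whatever policy decides when dirty entries are persisted.

```python
class Store:
    def __init__(self):
        self.cache = {}     # stand-in for the cache
        self.db = {}        # stand-in for permanent storage
        self.dirty = set()  # keys written to the cache but not yet persisted

    def write_through(self, key, value):
        # Write to cache and database together.
        self.cache[key] = value
        self.db[key] = value

    def write_around(self, key, value):
        # Write to the database only; drop any stale cached copy.
        self.db[key] = value
        self.cache.pop(key, None)

    def write_back(self, key, value):
        # Write to the cache only and confirm immediately.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Persist deferred write-back entries.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()


store = Store()
store.write_through("a", 1)
store.write_around("a", 2)
store.write_back("b", 3)
print(store.cache, store.db)  # "b" is not in the database until flush()
store.flush()
print(store.db)
```

The write-back path is the fastest for the client, but anything still in `dirty` would be lost if the cache failed before the flush.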

What is database partitioning?

RDBMS (SQL)
- Partitioning is the database process where very large tables are divided into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan.

Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns.
Horizontal Partitioning (sharding) stores rows of a table in multiple database clusters.
Sharding spreads data across multiple machines, which allows for cluster computing (distributed computing). Sharding is needed if a data set is too large to be stored in a single DB. Most importantly, sharding allows a DB to scale in line with its data growth. It also reduces table size (index size, more specifically), which improves search performance.
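
As a rough illustration of horizontal partitioning, the sketch below routes each row to a shard by hashing its key. The shard count, the `shard_for` helper, and the use of in-memory dictionaries as shards are all assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count

# Each dict stands in for a separate database server.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Hash the partition key to decide which shard owns the row."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def put(user_id, row):
    shards[shard_for(user_id)][user_id] = row


def get(user_id):
    # Only one shard is consulted, so each lookup scans less data.
    return shards[shard_for(user_id)].get(user_id)


put("user-42", {"name": "Ada"})
print(shard_for("user-42"), get("user-42"))
```

With plain modulo placement, changing `NUM_SHARDS` remaps most keys, which is the problem that consistent hashing (covered later in this deck) is meant to reduce.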

What are some common problems of DB data partitioning?

Joins and Denormalization: Performing joins on a database which is running on one server is straightforward, but once a database is partitioned and spread across multiple machines it is often not feasible to perform joins that span database partitions.
Referential integrity: enforcing data integrity constraints such as foreign keys in a partitioned database can be extremely difficult.
Rebalancing: the partitioning scheme may have to change when
a) the data distribution is not uniform, or
b) there is a lot of load on one partition.

What is the goal of creating an index on a particular table in a database?

To make it faster to search through the table and find the row or rows that we want. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
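
The two access patterns named above (random lookups and ordered access) can be mimicked in plain Python. This is only an analogy, assuming a tiny in-memory table; real databases typically use B-tree or hash structures on disk, and the sample rows are made up.

```python
import bisect

# A tiny "table"; the rows are made-up sample data.
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Linus", "age": 28},
    {"id": 3, "name": "Grace", "age": 45},
    {"id": 4, "name": "Alan", "age": 41},
]

# Index for random lookups: column value -> row positions.
name_index = {}
for pos, row in enumerate(rows):
    name_index.setdefault(row["name"], []).append(pos)

# Ordered index: (column value, row position) pairs kept sorted.
age_index = sorted((row["age"], pos) for pos, row in enumerate(rows))


def find_by_name(name):
    """Look up rows by name without scanning the whole table."""
    return [rows[pos] for pos in name_index.get(name, [])]


def find_age_between(low, high):
    """Return rows with low <= age <= high, in age order, via binary search."""
    start = bisect.bisect_left(age_index, (low, -1))
    end = bisect.bisect_right(age_index, (high, len(rows)))
    return [rows[pos] for _, pos in age_index[start:end]]


print(find_by_name("Grace"))
print([r["name"] for r in find_age_between(30, 42)])
```

The trade-off is the same as in a database: the indexes cost extra memory and must be updated on every write.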

What is a forward proxy?

A proxy server is a piece of software or hardware that sits between the client and the back-end server and acts as an intermediary for requests from clients seeking resources from other servers. Clients connect to the proxy server to request a service such as a web page, file, or connection.
- It can be used to bypass firewall restrictions.

What are the proxy types?

Open Proxy: accessible by any Internet user.
Reverse Proxy: retrieves resources on behalf of a client from one or more servers.

What are the two famous open proxy types?

Anonymous Proxy: This proxy reveals its identity as a proxy server but does not disclose the client's original IP address. Although this proxy server can be discovered easily, it can be beneficial for some users because it hides their IP address.
Transparent Proxy: This proxy server also identifies itself, and the client's original IP address can still be seen through HTTP headers. The main benefit of using this sort of server is its ability to cache websites.

What is redundancy in a system design?

Duplication of critical components or functions of a system with the intention of increasing the reliability of the system. Removes single points of failure.

What is replication in a database?

Replication means sharing information to ensure consistency between redundant resources, such as software or hardware components, in order to improve reliability, fault tolerance, or accessibility. It is commonly used in a DBMS with a master-slave (primary-replica) relationship.

Data replication is the process of storing data on more than one site or node, which improves the availability of data. Data is copied from a database on one server to another server so that all users share the same data. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.
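
A minimal sketch of the master-slave arrangement follows. The `Primary` and `Replica` classes are hypothetical, and the sketch assumes synchronous replication (every write is copied before it returns); many real systems replicate asynchronously, which allows replicas to lag.

```python
class Replica:
    """A read-only copy that applies changes sent by the primary."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Accepts writes and forwards each one to every replica."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = list(replicas)

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.apply(key, value)  # synchronous copy (an assumption)


replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("balance:42", 100)
print([r.read("balance:42") for r in replicas])  # reads can go to any copy
```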

What is a SQL database, and what are some well-known examples?

Relational databases that store data in rows and columns (MySQL, Oracle, MS SQL Server, PostgreSQL).

What are types of NoSQL?

Key-Value Stores: Data is stored in an array of key-value pairs.
Document Databases: In these databases, data is stored in documents (instead of rows and columns in a table) and these documents are grouped together in collections
Wide-Column Databases: Instead of ‘tables,’ in columnar databases we have column families, which are containers for rows.
Graph Databases: These databases are used to store data whose relations are best represented in a graph.

What are the differences between SQL and NoSQL?

Storage: SQL stores data in tables, with each row representing an entity and each column representing a data point. NoSQL databases use different storage models (key-value, document, wide-column, graph).
Schema: In SQL, each record conforms to a fixed schema: the columns must be decided before data entry, and every row has the same set of columns. In NoSQL, schemas are dynamic.
Querying: SQL databases use Structured Query Language. NoSQL queries are typically focused on a collection of documents, and the syntax varies from database to database.
Scalability: SQL databases are usually scaled vertically by increasing horsepower (CPU, memory, etc.), which can get expensive. NoSQL databases are horizontally scalable, making it easy to add more servers to handle traffic, and they can be hosted on cheap commodity hardware.
Reliability or ACID (Atomicity, Consistency, Isolation, Durability): SQL databases generally offer the strongest guarantees. Many NoSQL databases sacrifice ACID compliance for performance and scalability.
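
The schema difference is easy to see side by side. The sketch below uses Python's built-in `sqlite3` module for the SQL half and a plain list of dictionaries as a stand-in for a document store; the table, column names, and sample data are made up for illustration.

```python
import sqlite3

# SQL side: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))


def insert_with_unknown_column():
    """Try to insert a column the schema does not define."""
    try:
        conn.execute(
            "INSERT INTO users (id, name, nickname) VALUES (?, ?, ?)",
            (2, "Linus", "lt"),
        )
        return True
    except sqlite3.OperationalError:
        # The schema rejects the unknown "nickname" column.
        return False


# NoSQL side (stand-in): each document can carry its own fields.
documents = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Linus", "nickname": "lt"},
]

print(insert_with_unknown_column())
print(conn.execute("SELECT name FROM users").fetchall())
print([sorted(doc) for doc in documents])
```

In the relational case the new field would need an explicit schema change first; in the document case it simply appears on the records that have it.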

What are reasons to choose SQL?

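The need for ACID (Atomicity, Consistency, Isolation, Durability) compliance, for example in finance and e-commerce.
Data is structured and unchanging.
The business is growing slowly, so there is little need to scale out across many servers.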

What are reasons to choose NoSQL?

Storing large amounts of data with little or no structure.
Making good use of cloud-based storage, since NoSQL can run on affordable commodity hardware on-site or in the cloud.
Scaling easily across multiple data centers.
Rapid development, since the schema doesn't need to be prepared ahead of time.

What is the CAP theorem?

The CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two of the following three guarantees:

Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.
Availability: Every request gets a response indicating success or failure. Availability is achieved by replicating the data across different servers.
Partition tolerance: The system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn't result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
A design can provide at most two of the three properties. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.
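
The trade-off during a partition can be shown with a toy two-node cluster. This is a deliberately simplified sketch: the `Cluster` class, its `mode` flag, and the `partitioned` switch are hypothetical devices for the example, not how any real database is configured.

```python
class Node:
    def __init__(self):
        self.data = {}


class Cluster:
    """Two nodes that replicate every write unless the network is partitioned."""

    def __init__(self, mode):
        self.mode = mode          # "CP" favors consistency, "AP" favors availability
        self.a, self.b = Node(), Node()
        self.partitioned = False  # set to True to simulate a network partition

    def write(self, node, key, value):
        other = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write rather than let the nodes disagree.
                raise RuntimeError("unavailable: cannot reach peer")
            # AP: accept the write locally; the nodes now diverge.
            node.data[key] = value
            return
        node.data[key] = value
        other.data[key] = value


ap = Cluster("AP")
ap.write(ap.a, "k", 1)
ap.partitioned = True
ap.write(ap.a, "k", 2)
print(ap.a.data["k"], ap.b.data["k"])  # the two nodes disagree during the partition
```

A CP cluster in the same situation raises an error instead of answering, which is the availability it gives up to stay consistent.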

What is Consistent Hashing?

Strategy that allows us to distribute data across a cluster in such a way that will minimize reorganization when nodes are added or removed.

How does Consistent Hashing work?

- Hash each server to an integer in a fixed range, and treat that range as a ring.
- To find the server for a key, hash the key onto the same ring, then:
  1. Move clockwise on the ring until you reach the first cache.
  2. That cache is the one that contains the key. For example, key1 maps to cache A and key2 maps to cache C.
- If you add a server, the keys of one existing neighbor are split between that neighbor and the new server.
- If you remove a server (failure), all of its keys are mapped to the next server on the ring.
- Load may not be balanced unless "virtual replicas" are used, so that each server maps to multiple points on the ring.
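
The ring can be sketched with a sorted list and binary search. This is one possible implementation under stated assumptions: the `HashRing` class is hypothetical, SHA-256 truncated to 64 bits is an arbitrary choice of hash, and 100 virtual replicas per server is an assumed default.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring (64-bit integer)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual replicas per server (assumed default)
        self._points = []         # sorted positions on the ring
        self._owner = {}          # position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place the server at several points so load spreads more evenly.
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # "Move clockwise": first point at or after the key, wrapping around.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["A", "B", "C"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("B")
after = {k: ring.lookup(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved after removing B")
```

Only the keys that lived on the removed server change owner; everything else stays where it was, which is the point of the technique.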

What is the difference between Ajax polling, long polling, and WebSockets?

- Regular (Ajax) polling involves the client repeatedly making requests to the server. The server responds to each request immediately, even if the response is empty.
- With long polling, the server holds on to the request and waits until it has data to respond with. A timeout can cancel this waiting period.
- WebSocket provides full-duplex communication channels over a single TCP connection. It provides a persistent connection between a client and a server that both parties can use to start sending data at any time.
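
The difference between the two polling styles is whether the server answers immediately or waits. The sketch below imitates the server side with an in-process queue; `short_poll`, `long_poll`, and the timings are hypothetical, and no real HTTP is involved.

```python
import queue
import threading


def short_poll(inbox):
    """Regular polling: answer right away, possibly with an empty response."""
    try:
        return inbox.get_nowait()
    except queue.Empty:
        return None


def long_poll(inbox, timeout):
    """Long polling: hold the request until data arrives or the timeout fires."""
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        return None


inbox = queue.Queue()
empty_reply = short_poll(inbox)  # nothing queued yet, so the reply is empty

# Data shows up 50 ms after the long poll starts (assumed timing).
threading.Timer(0.05, inbox.put, args=("hello",)).start()
reply = long_poll(inbox, timeout=2.0)       # returns as soon as data arrives
timed_out = long_poll(inbox, timeout=0.01)  # no more data, so the wait times out

print(empty_reply, reply, timed_out)
```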

What are WebSockets?

- A full-duplex communication channel over a single TCP connection.
- A persistent connection between client and server.
- Real-time data transfer.

What is SSE?

- SSE stands for Server-Sent Events.
- Under SSE, the client establishes a persistent, long-term connection with the server. The server uses this connection to send data to the client. If the client wants to send data to the server, it has to use another technology or protocol to do so.
- Best when we need real-time traffic from the server.
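
On the wire, each server-sent event is a small block of `field: value` text lines ending with a blank line, sent in a response with the `text/event-stream` content type. The helper below is a hypothetical formatter that sketches that framing; it does not open a connection or stream anything.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as one "data:" line per line of text.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"


print(format_sse("hello"), end="")
print(format_sse("price up\nprice down", event="tick", event_id="7"), end="")
```

A server would write strings like these to the open response as events occur, and the client would only ever read from that connection.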

Why partition data in a database?

- Improve scalability
- Improve performance
- Improve security
- Provide flexibility
- Improve availability

System Design Flashcards

(29 cards)