System Design Flashcards

1
Q

What questions would you ask before starting your design?

A
  • Are we focusing on the backend only or are we developing the front-end too?
  • What are we storing (images, videos, text)?
  • Do we need to search?
  • What scale is expected from the system?
  • How much storage?
  • What network bandwidth is needed?
  • What are the expected APIs? Are there example inputs and outputs?
  • What kind of database will be used?
2
Q

What components would you use in a block diagram?

A
  • Client
  • Load Balancer or Reverse Proxies
  • Application Server(s)
  • Database
  • File Storage
3
Q

What are some bottlenecks to consider when designing your architecture?

A
  • Are there single points of failure, and how can they be mitigated?
  • Is there enough Data Replication?
  • Are there enough copies of services?
  • How to handle performance monitoring of services? Alerts?
4
Q

What are the key characteristics of distributed systems?

A
  • Scalability
  • Reliability
  • Availability
  • Efficiency
  • Serviceability or manageability
5
Q

What are the benefits of load balancing?

A
  • Faster uninterrupted service
  • Less downtime and higher throughput
  • Easier to handle incoming requests
  • Fewer failed or stressed components
  • Predictive analytics: smart load balancers can surface traffic patterns that help spot bottlenecks before they happen
6
Q

How does the load balancer choose the backend server?

A
  • First, it ensures that the servers it can choose from are actually responding appropriately to requests (health checks)
  • Then, it uses a pre-configured algorithm to select one from the set of healthy servers
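The two steps above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer is implemented: the class name `LoadBalancer`, the server names, and the choice of round robin as the pre-configured algorithm are all assumptions made for the example.

```python
from itertools import count


class LoadBalancer:
    """Toy balancer: filter to healthy servers, then apply round robin."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)  # updated by health checks
        self._counter = count()

    def record_health_check(self, server, ok):
        # A real balancer would probe each server periodically;
        # here the result of the probe is simply passed in.
        if ok:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def choose(self):
        # Step 1: keep only servers that passed their last health check.
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # Step 2: apply the configured algorithm (round robin here).
        return candidates[next(self._counter) % len(candidates)]


lb = LoadBalancer(["app1", "app2", "app3"])  # hypothetical server names
lb.record_health_check("app2", ok=False)
picks = [lb.choose() for _ in range(4)]
```

With `app2` marked unhealthy, requests alternate between `app1` and `app3` only.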
7
Q

What are load balancing methods?

A
  • Least Connection Method
  • Least Response Time Method
  • Least Bandwidth Method
  • Round Robin Method
  • Weighted Round Robin Method
  • IP Hash: a hash of the IP address of the client is calculated to redirect the request to a server
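A rough sketch of four of these methods follows. The server names, the weights, and the use of SHA-256 for the IP hash are assumptions for illustration; production balancers use more refined variants (for example, smoother interleaving for weighted round robin).

```python
import hashlib
from itertools import cycle

servers = ["app1", "app2", "app3"]  # hypothetical backend pool

# Round robin: hand out servers in a fixed rotation.
_rotation = cycle(servers)

def round_robin():
    return next(_rotation)

# Weighted round robin: a server with weight w appears w times per cycle.
weights = {"app1": 3, "app2": 1, "app3": 1}  # hypothetical capacities
_weighted_rotation = cycle([s for s in servers for _ in range(weights[s])])

def weighted_round_robin():
    return next(_weighted_rotation)

# Least connection: pick the server with the fewest active connections.
def least_connection(active_connections):
    return min(servers, key=lambda s: active_connections[s])

# IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Least response time and least bandwidth follow the same shape as `least_connection`, with a different metric passed to `min`.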
8
Q

List the types of caches in system architecture.

A
  • Application server cache: placing a cache directly on a request layer node enables the local storage of response data
  • Content Distribution Network (CDN): a kind of cache that comes into play for sites serving large amounts of static media
9
Q

What are the cache invalidation schemes?

A

If the data is modified in the database, it should be invalidated in the cache; if not, this can cause inconsistent application behavior.

  • Write-through cache: data is written into the cache and the corresponding database at the same time
  • Write-around cache: data is written directly to permanent storage, bypassing the cache
  • Write-back cache: data is written to cache alone and completion is immediately confirmed to the client. The write to permanent storage happens later, so data can be lost if the cache fails before then.
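The three write policies can be compared side by side in a small sketch. The `Store` and `Cache` classes below are hypothetical stand-ins (plain dictionaries), and `flush()` is an assumed name for the deferred write that write-back relies on.

```python
class Store:
    """Stand-in for the database: a dict plus a write counter."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, key, value):
        self.data[key] = value
        self.writes += 1


class Cache:
    def __init__(self, store):
        self.store = store
        self.entries = {}
        self.dirty = set()  # keys written to the cache but not yet to the store

    def write_through(self, key, value):
        # Cache and store are both updated before the write is confirmed.
        self.entries[key] = value
        self.dirty.discard(key)
        self.store.write(key, value)

    def write_around(self, key, value):
        # Only the store is written; any stale cached copy is dropped.
        self.store.write(key, value)
        self.entries.pop(key, None)
        self.dirty.discard(key)

    def write_back(self, key, value):
        # Only the cache is written; the store is updated later by flush().
        self.entries[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in sorted(self.dirty):
            self.store.write(key, self.entries[key])
        self.dirty.clear()
```

Until `flush()` runs, a write-back entry exists only in the cache, which is where the speed and the risk of that policy both come from.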
10
Q

What is database partitioning?

A

In an RDBMS (SQL), partitioning is the process by which very large tables are divided into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan.

  • Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns.
  • Horizontal Partitioning (sharding) stores rows of a table in multiple database clusters.
  • Sharding spreads data across machines, which allows for cluster (distributed) computing. It is needed when a data set is too large to be stored in a single DB. Most importantly, sharding allows a DB to scale in line with its data growth. It also reduces table size (index size more specifically), which improves search performance.
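A minimal sketch of both partitioning styles, using in-memory dictionaries in place of real database servers. The shard count, the hash-based routing rule, and the function names are assumptions for illustration; real systems may shard by range, directory lookup, or other schemes.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of database shards
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> int:
    # Route each row by a stable hash of its key.
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


# Horizontal partitioning (sharding): whole rows live on different shards.
def insert_user(user_id, row):
    shards[shard_for(user_id)][user_id] = row


def find_user(user_id):
    # Only one shard has to be searched for a given key.
    return shards[shard_for(user_id)].get(user_id)


# Vertical partitioning: one wide row becomes two narrower rows that
# share the key column.
def split_columns(row, key, hot_columns):
    hot = {k: v for k, v in row.items() if k == key or k in hot_columns}
    cold = {k: v for k, v in row.items() if k == key or k not in hot_columns}
    return hot, cold
```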
11
Q

What are some common problems of DB data partitioning?

A
  • Joins and Denormalization: Performing joins on a database which is running on one server is straightforward, but once a database is partitioned and spread across multiple machines it is often not feasible to perform joins that span database partitions.
  • Referential integrity: enforcing data integrity constraints such as foreign keys in a partitioned database can be extremely difficult.
  • Rebalancing: the partitioning scheme may have to change when
    a) data distribution is not uniform
    b) there is a lot of load on one partition
12
Q

What is the goal of creating an index on a particular table in a database?

A

The goal is to make it faster to search through the table and find the row or rows that we want. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
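The effect of an index can be observed with SQLite from the Python standard library. The table, column, and index names here are made up for the example, and the exact wording of the query plan varies between SQLite versions, so the sketch only captures the plan text rather than asserting a specific string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE email = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(lookup, ("user42@example.com",))

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index the planner can search by email directly.
plan_after = query_plan(lookup, ("user42@example.com",))
```

Printing `plan_before` and `plan_after` shows the switch from a table scan to a search that uses `idx_users_email`.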

13
Q

What is a forward proxy?

A

A proxy server is an intermediate server between the client and the back-end server. Clients connect to proxy servers to make a request for a service like a web page, file, connection, etc. In short, a proxy server is a piece of software or hardware that acts as an intermediary for requests from clients seeking resources from other servers.
  • Can be used to bypass firewall restrictions

14
Q

What are the proxy types?

A
  • Open Proxy: accessible by any Internet user.
  • Reverse Proxy: retrieves resources on behalf of a client from one or more servers.

15
Q

What are the two well-known open proxy types?

A

  • Anonymous Proxy: this proxy reveals its identity as a server but does not disclose the initial IP address. Though this proxy server can be discovered easily, it can be beneficial for some users as it hides their IP address.
  • Transparent Proxy: this proxy server again identifies itself, and with the support of HTTP headers, the first IP address can be viewed. The main benefit of using this sort of server is its ability to cache websites.

16
Q

What is redundancy in a system design?

A

Duplication of critical components or functions of a system with the intention of increasing the reliability of the system. Removes single points of failure.

17
Q

What is Replication in a database?

A

Replication means sharing information to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility. In a DBMS it is commonly used with a master-slave relationship: the master receives the updates and propagates them to the slaves.

Data Replication is the process of storing data in more than one site or node. It is useful in improving the availability of data. It is simply copying data from a database from one server to another server so that all the users can share the same data without any inconsistency. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.
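A toy sketch of the master-slave arrangement, with dictionaries standing in for database nodes. The class names and the choice of synchronous replication (every write is copied before returning) are assumptions for illustration; real systems often replicate asynchronously and must deal with lag and failover.

```python
class Node:
    """A database node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Master(Node):
    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = slaves

    def write(self, key, value):
        # The master applies the write, then copies it to every slave.
        self.data[key] = value
        for slave in self.slaves:
            slave.data[key] = value


slaves = [Node("slave-1"), Node("slave-2")]  # hypothetical node names
master = Master("master", slaves)
master.write("user:1", "Ada")

# Reads can now be served by any copy.
reads = [slave.data.get("user:1") for slave in slaves]
```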

18
Q

What is SQL, and what are some well-known SQL databases?

A

SQL databases are relational databases that store data in rows and columns. Well-known examples: MySQL, Oracle, MS SQL Server, PostgreSQL.

19
Q

What are types of NoSQL?

A
  • Key-Value Stores: Data is stored in an array of key-value pairs.
  • Document Databases: In these databases, data is stored in documents (instead of rows and columns in a table) and these documents are grouped together in collections
  • Wide-Column Databases: Instead of ‘tables,’ in columnar databases we have column families, which are containers for rows.
  • Graph Databases: These databases are used to store data whose relations are best represented in a graph.
20
Q

What are the differences between SQL and NoSQL?

A
  • Storage: SQL stores data in tables, with each row representing an entity and each column representing a data point. NoSQL has different storage models.
  • Schema: In SQL, each record conforms to a fixed schema, meaning the columns must be decided and chosen before data entry and each row must have data for each column. In NoSQL, schemas are dynamic.
  • Querying: SQL uses Structured Query Language. NoSQL queries are focused on a collection of documents.
  • Scalability: SQL is vertically scalable by increasing horsepower (CPU, memory, etc.), which can get expensive. NoSQL is horizontally scalable, making it easy to add more servers to handle traffic. Any cheap commodity hardware can host NoSQL DBs.
  • Reliability or ACID (Atomicity, Consistency, Isolation, Durability): most SQL databases are ACID compliant. Most NoSQL solutions sacrifice ACID compliance for performance and scalability.
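The schema difference can be made concrete with a small sketch. SQLite stands in for a SQL database and a plain list of dictionaries stands in for a document store; both choices, and the table and field names, are assumptions made only to keep the example self-contained.

```python
import sqlite3

# Fixed schema: the columns are declared before any data is entered.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))

try:
    # "nickname" was never declared, so the insert is rejected.
    conn.execute(
        "INSERT INTO users (name, nickname) VALUES (?, ?)", ("Bob", "bobby")
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Dynamic schema: each document can carry its own set of fields.
documents = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob", "nickname": "bobby"},
]
nicknames = [doc.get("nickname") for doc in documents]
```

Adding the `nickname` field to the SQL table would take a schema change (`ALTER TABLE`), while the document collection accepts it without preparation.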
21
Q

What are reasons to choose SQL?

A
  • The need for ACID (Atomicity, Consistency, Isolation, Durability) compliance
  • Typical domains: finance, e-commerce
  • Data is structured and unchanging
  • The business is not growing so fast that it needs many more servers
22
Q

What are reasons to choose NoSQL?

A
  • Storing large amounts of data with little or no structure
  • Making the most of cloud-based storage: NoSQL can run on affordable commodity hardware, on-site or in the cloud
  • Easily scales across multiple data centers
  • Rapid development: the data doesn’t need to be prepped ahead of time, so the data model can change quickly
23
Q

What is the CAP theorem?

A

CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees:

  • Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.
  • Availability: Every request gets a response indicating success or failure. Availability is achieved by replicating the data across different servers.
  • Partition tolerance: The system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
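  • At most two of the three properties can be guaranteed at once, so a design has to decide which two it needs. Because network partitions cannot be ruled out in a distributed system, the practical trade-off is usually between consistency and availability while a partition lasts.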
24
Q

What is Consistent Hashing?

A
  • A strategy for distributing data across a cluster in a way that minimizes reorganization when nodes are added or removed.
25
Q

How does Consistent Hashing work?

A
  • Hash the servers to integers in a fixed range, treated as a ring
  • To map a key to a server, hash the key to an integer in the same range, then:
    1. Move clockwise on the ring until finding the first cache it encounters
    2. That cache is the one that contains the key. For example, key1 maps to cache A; key2 maps to cache C.
  • If you add a server, the keys held by its clockwise neighbor are split: some move to the new server, and all other keys are untouched
  • If you remove a server (failure), all of its keys are mapped to the next server on the ring
  • Load may not be balanced unless “virtual replicas” are used, so that each server maps to multiple points on the ring
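The ring described above can be sketched with a sorted list and binary search. This is a minimal illustration: the class name `HashRing`, the use of SHA-256 truncated to 64 bits, and the default of 100 virtual replicas per server are all assumptions, not values from the card.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map any string to a point on a 64-bit ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual replicas per server
        self._points = []         # sorted positions on the ring
        self._owner = {}          # position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server is placed at several points to even out the load.
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # Move clockwise to the first server point after the key's hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

For example, after `ring = HashRing(["A", "B", "C"])` and `ring.remove("C")`, every key that `lookup` previously sent to A or B still goes to the same server; only the keys that lived on C are reassigned.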
26
Q

What is the difference between Ajax polling, long polling, and WebSockets?

A
  • Regular (AJAX) polling involves the client repeatedly making requests to the server at intervals. The server responds to each request even if the response is empty.
  • With Long Polling, the server hangs on to the request and waits until it has data to respond with. A timeout can cancel this waiting period.
  • WebSocket provides Full duplex communication channels over a single TCP connection. It provides a persistent connection between a client and a server that both parties can use to start sending data at any time.
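The difference between the first two approaches can be simulated without any network code. In this sketch a `queue.Queue` stands in for data waiting on the server, and the function names and timings are assumptions for illustration; WebSockets are not modeled here because they need a real connection.

```python
import queue
import threading

messages = queue.Queue()  # stands in for data pending on the server


def short_poll():
    """Regular polling: answer immediately, even with nothing to send."""
    try:
        return messages.get_nowait()
    except queue.Empty:
        return None  # empty response


def long_poll(timeout):
    """Long polling: hold the request until data arrives or the timeout fires."""
    try:
        return messages.get(timeout=timeout)
    except queue.Empty:
        return None  # timed out with no data


# Nothing is queued yet, so regular polling gets an empty response.
first = short_poll()

# Data becomes available 0.1 s after the long poll starts waiting.
threading.Timer(0.1, messages.put, args=("new message",)).start()
second = long_poll(timeout=2.0)
```

Here `first` is an empty response, while `second` receives the message as soon as it is queued, without the client having to ask again.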
27
Q

What are WebSockets?

A
  • A full duplex communication channel over a single TCP connection.
  • Persistent connection between client and server.
  • Real time data transfer
28
Q

What is SSE?

A
  • Server-Sent Events
  • Under SSEs the client establishes a persistent and long-term connection with the server. The server uses this connection to send data to a client. If the client wants to send data to the server, it would require the use of another technology/protocol to do so.
  • Best when we need real-time traffic from the server
29
Q

Why partition data in a database?

A
  • Improve scalability
  • Improve performance
  • Improve security
  • Provide flexibility
  • Improve availability