Additional System Design Flashcards
How can caches go wrong?
- Thundering herd problem - a large number of keys in the cache expire at the same time, so the query requests hit the database directly and overload it.
Mitigations - 1. Avoid setting the same expiry time for the keys by adding a random jitter to each TTL. 2. Allow only core business data to hit the database, and prevent non-core data from accessing the database until the cache is back up.
- Cache penetration - the requested key exists in neither the cache nor the database, so the app can never retrieve data from the database to update the cache. This puts a lot of pressure on both the cache and the database.
Solutions - 1. Cache a null value for non-existing keys to avoid hitting the database. 2. Use a Bloom filter to check key existence first; if the key doesn't exist, avoid hitting the database.
- Cache breakdown - similar to the thundering herd problem, but triggered by a single hot key expiring, after which a large number of requests hit the database. Since hot keys can take up 80% of the queries, we don't set an expiry time for them.
- Cache crash - the cache is down and all the requests go to the database.
Solutions - 1. Set up a circuit breaker so that, when the cache is down, the application services stop hitting both the cache and the database. 2. Set up a cluster for the cache to improve cache availability.
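A minimal sketch of the first two mitigations (TTL jitter against the thundering herd, and caching null values against cache penetration), using a plain Python dictionary in place of a real cache:

    import random, time

    BASE_TTL = 3600                       # seconds
    cache = {}                            # key -> (value, expires_at); stands in for Redis

    def ttl_with_jitter():
        # Spread expirations over an extra 0-300 s so keys don't all expire together.
        return BASE_TTL + random.randint(0, 300)

    def get_product(product_id, db):
        entry = cache.get(product_id)
        if entry and entry[1] > time.time():
            return entry[0]                               # cache hit (may be a cached None)
        row = db.get(product_id)                          # fall through to the database
        ttl = ttl_with_jitter() if row is not None else 60
        cache[product_id] = (row, time.time() + ttl)      # cache misses briefly as None too
        return row

A Bloom filter placed in front of db.get would avoid even the first database lookup for keys that are known not to exist.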
4 most popular use cases for UDP (User Datagram Protocol)
UDP is used in various software architectures for its simplicity, speed and low overhead compared to other protocols like TCP.
- Live video streaming - many VoIP & video conferencing apps leverage UDP due to its lower overhead & ability to tolerate packet loss. Real-time communication benefits from UDP’s reduced latency compared to TCP.
- DNS (Domain Name System) - DNS queries typically use UDP because it is fast and lightweight. Although DNS can also use TCP for large responses or zone transfers, most queries are handled via UDP.
- Market data multicast - in low latency trading, UDP is utilised for efficient market data delivery to multiple recipients simultaneously.
- IoT - UDP is often used in IoT devices for communications, sending small packets of data between devices.
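A minimal sketch of UDP's connectionless, fire-and-forget style using Python's standard socket module (the loopback address and port are illustrative):

    import socket

    # Receiver: bind to a port and read whatever datagrams happen to arrive.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 9999))

    # Sender: no handshake and no delivery guarantee - just fire the datagram.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"sensor-reading:23.5", ("127.0.0.1", 9999))

    data, addr = receiver.recvfrom(1024)
    print(data, addr)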
How does a typical push notification system work?
The architecture of a notification system that covers major notification channels:
- In app notifications
- Email notifications
- SMS & OTP notifications
- Social media pushes
Steps
1. The business services send notifications to the notification gateway. The gateway supports two modes: one receives a single notification at a time, and the other receives notifications in batches.
2. The notification gateway forwards the notifications to the distribution service, where the messages are validated, formatted, and scheduled based on the settings. The notification template repository allows users to pre-define the message format, and the channel preference repository allows users to pre-define the preferred delivery channels.
3. The notifications are then sent to the routers, normally message queues.
4. The channel services communicate with various internal and external delivery channels, including in-app notifications, email delivery, SMS delivery, and social media apps.
5. The delivery metrics are captured by the notification tracking and analytics service, where the operations team can view the analytical reports and improve user experience.
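A minimal sketch of the validate-format-route hand-off in steps 1-3, with in-memory dictionaries and queues standing in for the template repository, preference repository, and message queues:

    from collections import defaultdict
    import queue

    channel_queues = defaultdict(queue.Queue)            # stands in for per-channel message queues
    templates = {"otp": "Your one-time code is {code}"}  # hypothetical template repository
    preferences = {"user-42": ["sms", "in_app"]}         # hypothetical channel preference repository

    def distribute(user_id, template_id, payload):
        message = templates[template_id].format(**payload)      # validate and format
        for channel in preferences.get(user_id, ["in_app"]):    # route by user preference
            channel_queues[channel].put({"user": user_id, "body": message})

    distribute("user-42", "otp", {"code": "491823"})
    print(channel_queues["sms"].get())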
Have you heard of the 12-Factor App?
The “12 Factor App” offers a set of best practices for building modern software applications.
Following these 12 principles can help developers and teams in building reliable, scalable, and manageable applications.
Here’s a brief overview of each principle:
1. Codebase:
Have one place to keep all your code, and manage it using version control like Git.
2. Dependencies:
List all the things your app needs to work properly, and make sure they're easy to install.
3. Config:
Keep important settings like database credentials separate from your code, so you can change them without rewriting code (see the sketch after this list).
4. Backing Services:
Use other services (like databases or payment processors) as separate components that your app connects to.
5. Build, Release, Run:
Make a clear distinction between preparing your app, releasing it, and running it in production.
6. Processes:
Design your app so that each part doesn't rely on a specific computer or memory. It's like making LEGO blocks that fit together.
7. Port Binding:
Let your app be accessible through a network port, and make sure it doesn't store critical information on a single computer.
8. Concurrency:
Make your app able to handle more work by adding more copies of the same thing, like hiring more workers for a busy restaurant.
9. Disposability:
Your app should start quickly and shut down gracefully, like turning off a light switch instead of yanking out the power cord.
10. Dev/Prod Parity:
Ensure that what you use for developing your app is very similar to what you use in production, to avoid surprises.
11. Logs:
Keep a record of what happens in your app so you can understand and fix issues, like a diary for your software.
12. Admin Processes:
Run special tasks separately from your app, like doing maintenance work in a workshop instead of on the factory floor.
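As a concrete illustration of factor 3 (Config), settings are commonly read from environment variables rather than hard-coded; the variable names below are illustrative:

    import os

    # Deployment-specific settings come from the environment, not the codebase.
    DATABASE_URL = os.environ.get("DATABASE_URL", "postgres://localhost:5432/dev")
    SMTP_HOST = os.environ.get("SMTP_HOST", "localhost")

    print(DATABASE_URL)   # the same code runs unchanged in dev, staging, and production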
Visualizing a SQL query
SQL statements are executed by the database system in several steps, including:
- Parsing the SQL statement and checking its validity
- Transforming the SQL into an internal representation, such as relational algebra
- Optimizing the internal representation and creating an execution plan that utilizes index
information
- Executing the plan and returning the results
The clauses of a typical query appear in this written order: SELECT, FROM, JOIN ... ON, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT.
How does Redis architecture evolve?
Redis is a popular in-memory cache. How did it evolve to the architecture it is today?
🔹 2010 - Standalone Redis
When Redis 1.0 was released in 2010, the architecture was quite simple. It was typically used as a cache in front of the business application.
However, Redis stores data in memory. When Redis restarts, all the data is lost and the traffic hits the database directly.
🔹 2013 - Persistence
When Redis 2.8 was released in 2013, it addressed the previous restrictions. Redis introduced RDB in-memory snapshots to persist data. It also supports AOF (Append-Only-File), where each write command is written to an AOF file.
🔹 2013 - Replication
Redis 2.8 also added replication to increase availability. The primary instance handles real-time read and write requests, while replicas synchronize the primary's data.
🔹 2013 - Sentinel
Redis 2.8 introduced Sentinel to monitor Redis instances in real time. Sentinel is a system designed to help manage Redis instances. It performs four tasks: monitoring, notification, automatic failover, and acting as a configuration provider.
🔹 2015 - Cluster
In 2015, Redis 3.0 was released. It added Redis Cluster. A Redis cluster is a distributed database solution that manages data through sharding. The data is divided into 16384 slots, and each node is responsible for a portion of the slots.
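A rough sketch of how cluster sharding maps a key to a slot and a node; Python's built-in hash stands in for the CRC16 function Redis actually uses, and the node list is hypothetical:

    SLOTS = 16384
    nodes = ["node-a", "node-b", "node-c"]        # hypothetical cluster members

    def slot_for_key(key):
        # Redis actually uses CRC16(key) % 16384; the built-in hash stands in here.
        return hash(key) % SLOTS

    def node_for_key(key):
        slot = slot_for_key(key)
        return nodes[slot * len(nodes) // SLOTS]  # each node owns a contiguous slot range

    print(slot_for_key("user:1001"), node_for_key("user:1001"))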
🔹 Looking Ahead
Redis is popular because of its high performance and rich data structures that dramatically reduce
the complexity of developing a business application.
In 2017, Redis 5.0 was released, adding the stream data type.
In 2020, Redis 6.0 was released, introducing multi-threaded I/O in the network module. The Redis model is divided into the network module and the main processing module. The Redis developers found that the network module tends to become a bottleneck in the system.
Over to you - have you used Redis before? If so, for what use case?
How does “scan to pay” work?
How do you pay from your digital wallet, such as PayPal, Venmo, or Paytm, by scanning a QR code?
To understand the process involved, we need to divide the “scan to pay” process into two
sub-processes:
- Merchant generates a QR code and displays it on the screen
- Consumer scans the QR code and pays
Here are the steps for generating the QR code:
1. When you want to pay for your shopping, the cashier tallies up all the goods and calculates the total amount due, for example, $123.45. The checkout has an order ID of SN129803. The cashier clicks the “checkout” button.
2. The cashier's computer sends the order ID and the amount to the payment service provider (PSP).
3. The PSP saves this information to the database and generates a QR code URL.
4. PSP’s Payment Gateway service reads the QR code URL.
5. The payment gateway returns the QR code URL to the merchant’s computer.
6. The merchant’s computer sends the QR code URL (or image) to the checkout counter.
7. The checkout counter displays the QR code.
These 7 steps complete in less than a second.
Now it’s the consumer’s turn to pay from their digital wallet by scanning the QR code:
1. The consumer opens their digital wallet app to scan the QR code.
2. After confirming the amount is correct, the consumer clicks the "pay" button.
3. The digital wallet App notifies the PSP that the consumer has paid the given QR code.
4. The PSP payment gateway marks this QR code as paid and returns a success message to the consumer’s digital wallet App.
5. The PSP payment gateway notifies the merchant that the consumer has paid the given QR code.
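A rough sketch of the two sub-processes as code, with PspClient standing in for a hypothetical PSP API rather than any real one (in reality the PSP would also push a notification to the merchant, as in step 5 above):

    class PspClient:
        """Hypothetical PSP client, used only to illustrate the message flow above."""

        def __init__(self):
            self.orders = {}

        def create_qr_code(self, order):
            url = f"https://psp.example.com/qr/{order['order_id']}"
            self.orders[url] = {**order, "paid": False}          # PSP stores the order
            return url                                           # returned to the merchant

        def pay(self, qr_code_url, wallet_account):
            self.orders[qr_code_url]["paid"] = True              # mark the QR code as paid
            return "success"                                     # sent back to the wallet app

    psp = PspClient()
    qr_url = psp.create_qr_code({"order_id": "SN129803", "amount": 123.45})  # merchant side
    print(psp.pay(qr_url, wallet_account="consumer-123"))                    # consumer side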
How do Search Engines Work?
● Step 1 - Crawling
Web Crawlers scan the internet for web pages. They follow the URL links from one page to
another and store URLs in the URL store. The crawlers discover new content, including web
pages, images, videos, and files.
● Step 2 - Indexing
Once a web page is crawled, the search engine parses the page and indexes the content
found on the page in a database. The content is analyzed and categorized. For example,
keywords, site quality, content freshness, and many other factors are assessed to
understand what the page is about.
● Step 3 - Ranking
Search engines use complex algorithms to determine the order of search results. These
algorithms consider various factors, including keywords, pages’ relevance, content quality,
user engagement, page load speed, and many others. Some search engines also personalize results based on the user’s past search history, location, device, and other personal factors.
● Step 4 - Querying
When a user performs a search, the search engine sifts through its index to provide the most relevant results.
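A toy sketch of the indexing and querying steps: an inverted index maps each keyword to the pages that contain it, and a query intersects those sets before ranking (ranking omitted here):

    from collections import defaultdict

    index = defaultdict(set)          # keyword -> set of page URLs (a toy inverted index)

    def index_page(url, text):
        for word in text.lower().split():
            index[word].add(url)

    def search(query):
        # Return pages containing every query term; a real engine would then rank them.
        terms = query.lower().split()
        results = set(index[terms[0]]) if terms else set()
        for term in terms[1:]:
            results &= index[term]
        return results

    index_page("https://example.com/a", "system design flashcards")
    index_page("https://example.com/b", "search engine design")
    print(search("design"))           # both pages match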
The Payments Ecosystem
How do fintech startups find new
opportunities among so many payment companies? What do PayPal, Stripe, and Square do exactly?
Steps 0-1: The cardholder opens an account in the issuing bank and gets the debit/credit card. The merchant registers with ISO (Independent Sales Organization) or MSP (Member Service Provider) for in-store sales. ISO/MSP partners with payment processors to open merchant accounts.
Steps 2-5: The acquiring process.
The payment gateway accepts the purchase transaction and collects payment information. It is then sent to a payment processor, which uses customer information to collect payments. The acquiring processor sends the transaction to the card network. It also owns and operates the merchant’s account during settlement, which doesn’t happen in real-time.
Steps 6-8: The issuing process. The issuing processor talks to the card network on the issuing bank’s behalf. It validates and operates the customer’s account.
I’ve listed some companies in different verticals in the diagram. Notice payment companies usually start from one vertical, but later expand to multiple verticals.
Cloud Cost Reduction Techniques
Uncontrolled cloud cost is one of the biggest challenges many organizations are battling as they navigate the complexities of cloud computing.
Efficiently managing these costs is crucial for optimizing cloud usage and maintaining financial health. The following techniques can help businesses effectively control and minimize their cloud expenses.
- Reduce Usage:
Fine-tune the volume and scale of resources to ensure efficiency without compromising on the performance of applications (e.g., downsizing instances, minimizing storage space, consolidating services).
- Terminate Idle Resources:
Locate and eliminate resources that are not in active use, such as dormant instances, databases, or storage units.
- Right Sizing:
Adjust instance sizes to adequately meet the demands of your applications, ensuring neither underuse nor overuse.
- Shutdown Resources During Off-Peak Times:
Set up automatic mechanisms or schedules for turning off non-essential resources when they are not in use, especially during low-activity periods (see the sketch after this list).
- Reserve to Reduce Rate:
Adopt cost-effective pricing models like Reserved Instances or Savings Plans that align with your specific workload needs.
Bonus Tip: Consider using Spot Instances and lower-tier storage options for additional cost savings.
- Optimize Data Transfers:
Utilize methods such as data compression and Content Delivery Networks (CDNs) to cut down on bandwidth expenses, and strategically position resources to reduce data transfer costs, focusing on intra-region transfers.
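One possible implementation of the off-peak shutdown idea, assuming AWS with boto3 and a hypothetical "env" tag that marks non-production instances:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")        # hypothetical region

    # Find running non-production instances (identified here by a hypothetical "env" tag).
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:env", "Values": ["dev", "staging"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]

    instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)           # run nightly; start again each morning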
Over to you: Which technique fits in well with your current cloud infra setup?
How do live streaming platforms like YouTube Live, TikTok Live, or Twitch work?
Live streaming is challenging because the video content is sent over the internet in near real-time. Video processing is compute-intensive. Sending a large volume of video content over the internet
takes time. These factors make live streaming challenging.
The diagram below explains what happens behind the scenes to make this possible.
Step 1: The streamer starts their stream. The source could be any video and audio source wired up to an encoder.
Step 2: To provide the best upload condition for the streamer, most live streaming platforms provide point-of-presence servers worldwide. The streamer connects to a point-of-presence server closest to them.
Step 3: The incoming video stream is transcoded to different resolutions, and divided into smaller video segments a few seconds in length.
Step 4: The video segments are packaged into different live streaming formats that video players can understand. The most common live-streaming format is HLS, or HTTP Live Streaming (a sample manifest is sketched after these steps).
Step 5: The resulting HLS manifest and video chunks from the packaging step are cached by the CDN.
Step 6: Finally, the video starts to arrive at the viewer’s video player.
Step 7-8: To support replay, videos can be optionally stored in storage such as Amazon S3.
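To make steps 3-4 concrete, here is a sketch that writes an HLS master manifest for a few hypothetical renditions produced by the transcoding step:

    # Hypothetical renditions from the transcoding step: (resolution, bandwidth in bits/s).
    renditions = [("1920x1080", 5_000_000), ("1280x720", 2_500_000), ("854x480", 1_000_000)]

    lines = ["#EXTM3U"]
    for resolution, bandwidth in renditions:
        height = resolution.split("x")[1]
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{height}p/playlist.m3u8")   # each rendition has its own segment playlist

    print("\n".join(lines))                        # the master manifest the player fetches first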
9 Best Practices for Building Microservices
Creating a system using microservices is extremely difficult unless you follow some strong principles.
1 - Design For Failure
A distributed system with microservices is going to fail. You must design the system to tolerate failure at multiple levels such as infrastructure, database, and individual services. Use circuit breakers, bulkheads, or graceful degradation methods to deal
with failures.
2 - Build Small Services
A microservice should not do multiple things at once. A good microservice is designed to do one thing well.
3 - Use Lightweight Protocols for Communication
Communication is the core of a distributed system. Microservices must talk to each other using lightweight protocols. Options include REST, gRPC, or message brokers.
4 - Implement service discovery
To communicate with each other, microservices need to discover each other over the network. Implement service discovery using tools such as Consul, Eureka, or Kubernetes Services.
5 - Data Ownership
In microservices, data should be owned and managed by the individual services. The goal should be to reduce coupling between services so that they can evolve independently.
6 - Use resiliency patterns
Implement specific resiliency patterns to improve the availability of the services.
Examples: retry policies, caching, and rate limiting (see the retry sketch after this list).
7 - Security at all levels
In a microservices-based system, the attack surface is quite large. You must implement security at every level of the service communication path.
8 - Centralized logging
Logs are important for finding issues in a system. With multiple services, they become critical.
9 - Use containerization techniques
To deploy microservices in an isolated manner, use containerization techniques.
Tools like Docker and Kubernetes can help with this as they are meant to simplify the scaling and deployment of a microservice.
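A minimal sketch of one resiliency pattern from practice 6, retries with exponential backoff and jitter; the wrapped call can be any function that raises on failure, and the payment client in the usage comment is hypothetical:

    import random
    import time

    def call_with_retries(operation, max_attempts=4, base_delay=0.2):
        """Retry a flaky downstream call with exponential backoff and jitter."""
        for attempt in range(1, max_attempts + 1):
            try:
                return operation()
            except Exception:
                if attempt == max_attempts:
                    raise                                        # give up after the last attempt
                delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
                time.sleep(delay)                                # back off before retrying

    # Usage: wrap a call to another microservice, e.g.
    # call_with_retries(lambda: payment_client.charge(order_id))   # hypothetical client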
Over to you: what other best practice would you recommend?
Linux Boot Process Illustrated
Step 1 - When we turn on the power, BIOS (Basic Input/Output System) or UEFI (Unified Extensible Firmware Interface) firmware is loaded from non-volatile memory, and executes POST (Power On Self Test).
Step 2 - BIOS/UEFI detects the devices connected to the system, including CPU, RAM, and storage.
Step 3 - Choose a booting device to boot the OS from. This can be the hard drive, the network server, or CD ROM.
Step 4 - BIOS/UEFI runs the boot loader (GRUB), which provides a menu to choose the OS or the kernel functions.
Step 5 - After the kernel is ready, we now switch to the user space. The kernel starts up systemd as the first user-space process, which manages the processes and services, probes all remaining hardware, mounts filesystems, and runs a desktop environment.
Step 6 - systemd activates the default.target unit when the system boots. Other units are executed as well.
Step 7 - The system runs a set of startup scripts and configures the environment.
Step 8 - The users are presented with a login window. The system is now ready.
How does Visa make money?
Why is the credit card called “𝐭𝐡𝐞 𝐦𝐨𝐬𝐭 𝐩𝐫𝐨𝐟𝐢𝐭𝐚𝐛𝐥𝐞 product in banks”? How does VISA/Mastercard make money?
1. The cardholder pays a merchant $100 to buy a product.
2. The merchant benefits from the use of the credit card with higher sales volume, and needs to compensate the issuer and the card network for providing the payment service. The acquiring bank sets a fee with the merchant, called the "𝐦𝐞𝐫𝐜𝐡𝐚𝐧𝐭 𝐝𝐢𝐬𝐜𝐨𝐮𝐧𝐭 𝐟𝐞𝐞."
3 - 4. The acquiring bank keeps $0.25 as the 𝐚𝐜𝐪𝐮𝐢𝐫𝐢𝐧𝐠 𝐦𝐚𝐫𝐤𝐮𝐩, and $1.75 is paid to the issuing bank as the 𝐢𝐧𝐭𝐞𝐫𝐜𝐡𝐚𝐧𝐠𝐞 𝐟𝐞𝐞. The merchant discount fee should cover the interchange fee. The interchange fee is set by the card network because it is less efficient for each issuing bank to negotiate fees with each merchant (a worked example follows this list).
5. The card network sets up the 𝐧𝐞𝐭𝐰𝐨𝐫𝐤 𝐚𝐬𝐬𝐞𝐬𝐬𝐦𝐞𝐧𝐭𝐬 𝐚𝐧𝐝 𝐟𝐞𝐞𝐬 with each bank, which pays the card network for its services every month. For example, VISA charges a 0.11% assessment, plus a $0.0195 usage fee, for every swipe.
6. The cardholder pays the issuing bank for its services. Why should the issuing bank be compensated?
● The issuer pays the merchant even if the cardholder fails to pay the issuer.
● The issuer pays the merchant before the cardholder pays the issuer.
● The issuer has other operating costs, including managing customer accounts, providing statements, fraud detection, risk management, clearing & settlement, etc.
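Putting the figures above together for the $100 purchase (the $2.00 merchant discount fee is simply the sum of the acquiring markup and the interchange fee):

    purchase_amount = 100.00

    interchange_fee = 1.75                                      # goes to the issuing bank
    acquiring_markup = 0.25                                     # kept by the acquiring bank
    merchant_discount_fee = interchange_fee + acquiring_markup  # 2.00, charged to the merchant

    # VISA's example network assessment: 0.11% of the amount plus a $0.0195 usage fee per swipe.
    network_assessment = purchase_amount * 0.0011 + 0.0195

    print(purchase_amount - merchant_discount_fee)   # 98.0   -> what the merchant receives
    print(round(network_assessment, 4))              # 0.1295 -> what the card network collects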
How do we manage configurations in a system?
A comparison between traditional configuration management and IaC
(Infrastructure as Code).
● Configuration Management
The practice is designed to manage and provision IT infrastructure through systematic and repeatable processes. This is critical for ensuring that the system performs as intended. Traditional configuration management focuses on maintaining the desired state of the system’s configuration items, such as servers, network devices, and applications, after they have been provisioned.
It usually involves initial manual setup by DevOps engineers, and changes are managed by step-by-step commands.
● What is IaC?
IaC, on the other hand, represents a shift in how infrastructure is provisioned and managed,
treating infrastructure setup and changes as software development practices.
IaC automates the provisioning of infrastructure, starting and managing the system through
code. It often uses a declarative approach, where the desired state of the infrastructure is described.
Tools like Terraform, AWS CloudFormation, Chef, and Puppet are used to define
infrastructure in code files that are source controlled.
IaC represents an evolution towards automation, repeatability, and consistency.
What is CSS (Cascading Style Sheets)?
Front-end development requires not only presenting content but also making it look good. CSS is a style sheet language used to describe how elements on a web page should be rendered.
▶️ What does CSS do?
CSS separates the content and presentation of a document. In the early days of web development, HTML acted as both content and style.
CSS divides structure (HTML) and style (CSS). This has many benefits, for example, when we
change the color scheme of a web page, all we need to do is to tweak the CSS file.
▶️ How does CSS work?
CSS consists of a selector and a set of properties, which can be thought of as individual rules.
Selectors are used to locate the HTML elements whose style we want to change, and properties are the specific style descriptions for those elements, such as color, size, position, etc. For example, if we want to make all the text in a paragraph blue, we write CSS code like this: p { color: blue; }
Here "p" is the selector and "color: blue" is the declaration that sets the paragraph text color to blue.
▶️ Cascading in CSS
The concept of cascading is crucial to understanding CSS. When multiple style rules conflict, the browser needs to decide which rule to use based on a specific prioritization rule. The one with the highest weight wins. The weight can be determined by a variety of factors, including selector type and the order of the source.
▶️ Powerful Layout Capabilities of CSS
In the past, CSS was only used for simple visual effects such as text colors, font styles, or backgrounds. Today, CSS has evolved into a powerful layout tool capable of handling complex design layouts. The “Flexbox” and “Grid” layout modules are two popular CSS layout modules that make it easy to create responsive designs and precise placement of web elements, so web developers no longer have to rely on complex tables or floating layouts.
▶️ CSS Animation
Animation and interactive elements can greatly enhance the user experience.
CSS3 introduces animation features that allow us to transform and animate elements without using JavaScript. For example, the "@keyframes" rule defines animation sequences, and the transition
property can be used to set animated transitions from one state to another.
▶️ Responsive Design
CSS allows the layout and style of a website to be adapted to different screen sizes and resolutions, so that we can provide an optimized browsing experience for different devices such as cell phones, tablets and computers.
Roadmap for Learning Cyber Security
Cybersecurity is crucial for protecting information and systems from theft, damage, and unauthorized access. Whether you’re a beginner or looking to advance your technical skills, there are numerous resources and paths you can take to learn more about cybersecurity. Here are some structured
suggestions to help you get started or deepen your knowledge:
🔹 Security Architecture
🔹 Frameworks & Standards
🔹 Application Security
🔹 Risk Assessment
🔹 Enterprise Risk Management
🔹 Threat Intelligence
🔹 Security Operations
How will you design the Stack Overflow website?
If your answer is on-premise servers and monolith (on the right), you would likely fail the interview, but that’s how it is built in reality!
𝐖𝐡𝐚𝐭 𝐩𝐞𝐨𝐩𝐥𝐞 𝐭𝐡𝐢𝐧𝐤 𝐢𝐭 𝐬𝐡𝐨𝐮𝐥𝐝 𝐥𝐨𝐨𝐤 𝐥𝐢𝐤𝐞
The interviewer is probably expecting something on the left side.
1. Microservices are used to decompose the system into small components.
2. Each service has its own database. Caches are used heavily.
3. The services are sharded.
4. The services talk to each other asynchronously through message queues.
5. The service is implemented using Event Sourcing with CQRS.
6. Showing off knowledge in distributed systems such as eventual consistency, CAP theorem, etc.
𝐖𝐡𝐚𝐭 𝐢𝐭 𝐚𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐢𝐬
Stack Overflow serves all the traffic with only 9 on-premise web servers, and it runs as a monolith! It has its own servers and does not run on the cloud.
This is contrary to all our popular beliefs these days.
Over to you: what is good architecture, the one that looks fancy during the interview or the one that works in reality?
The one-line change that reduced clone times by a whopping 99%, says Pinterest
While it may sound cliché, small changes can definitely create a big impact.
The Engineering Productivity team at Pinterest witnessed this first-hand.
They made a small change in the Jenkins build pipeline of their monorepo codebase called
Pinboard.
And it brought down clone times from 40 minutes to just 30 seconds.
For reference, Pinboard is the oldest and largest monorepo at Pinterest. Some facts about it:
- 350K commits
- 20 GB in size when cloned fully
- 60K git pulls on every business day
Cloning monorepos with a lot of code and history is time-consuming. This was exactly what was happening with Pinboard.
The build pipeline (written in Groovy) started with a “Checkout” stage where the repository was cloned for the build and test steps.
The clone options were set to shallow clone, no fetching of tags and only fetching the last 50 commits.
But it missed a vital piece of optimization.
The Checkout step didn’t use the Git refspec option.
This meant that Git was effectively fetching all refspecs for every build. For the Pinboard monorepo, it meant fetching more than 2500 branches.
𝐒𝐨 - 𝐰𝐡𝐚𝐭 𝐰𝐚𝐬 𝐭𝐡𝐞 𝐟𝐢𝐱?
The team simply added the refspec option and specified which ref they cared about. It was the “master” branch in this case.
This single change allowed Git clone to deal with only one branch and significantly reduced the overall build time of the monorepo.
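An illustrative before-and-after in plain git commands (not the actual Pinterest pipeline code, which lives in the Jenkins Groovy configuration):

    # Before: no refspec, so refs for every branch (2,500+) are fetched on each build
    git fetch origin

    # After: a narrow refspec fetches only the master branch, shallow and without tags
    git fetch --depth 50 --no-tags origin +refs/heads/master:refs/remotes/origin/master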
How does JavaScript Work?
The cheat sheet below shows the most important characteristics of JavaScript.
🔹 Interpreted Language
JavaScript code is executed by the browser or JavaScript engine rather than being compiled into machine language beforehand. This makes it highly portable across different platforms. Modern engines such as V8 utilize Just-In-Time (JIT) technology to compile code into directly executable machine code.
🔹 Function is First-Class Citizen
In JavaScript, functions are treated as first-class citizens, meaning they can be stored in variables, passed as arguments to other functions, and returned from functions.
🔹 Dynamic Typing
JavaScript is a loosely typed or dynamic language, meaning we don’t have to declare a variable’s type ahead of time, and the type can change at runtime.
🔹 Asynchronous Programming
JavaScript supports asynchronous programming, allowing operations like reading files, making HTTP requests, or querying databases to run in the background and trigger callbacks or promises when complete. This is particularly useful in web development for improving performance and user experience.
🔹 Prototype-Based OOP
Unlike class-based object-oriented languages, JavaScript uses prototypes for inheritance. This means that objects can inherit properties and methods from other objects.
🔹 Automatic Garbage Collection
Garbage collection in JavaScript is a form of automatic memory management. The primary goal of garbage collection is to reclaim memory occupied by objects that are no longer in use by the program, which helps prevent memory leaks and optimizes the performance of the application.
🔹 Compared with Other Languages
JavaScript is special compared to programming languages like Python or Java because of its position as a major language for web development. While Python is known to provide good code readability and versatility, and Java is known for its structure and robustness, JavaScript is an interpreted language that runs directly on the browser without compilation, emphasizing flexibility and dynamism.
🔹 Relationship with Typescript
TypeScript is a superset of JavaScript, which means that it extends JavaScript by adding features to the language, most notably type annotations. This relationship allows any valid JavaScript code to also be considered valid TypeScript code.
🔹 Popular JavaScript Frameworks
React is known for its flexibility and large number of community-driven plugins, while Vue is clean and intuitive with highly integrated and responsive features. Angular, on the other hand, offers a strict set of development specifications for enterprise-level JS development.
How does gRPC work?
RPC (Remote Procedure Call) is called “𝐫𝐞𝐦𝐨𝐭𝐞” because it enables communications between
remote services when services are deployed to different servers under microservice architecture.
From the user’s point of view, it acts like a local function call.
The diagram below illustrates the overall data flow for 𝐠𝐑𝐏𝐂.
Step 1: A REST call is made from the client. The request body is usually in JSON format.
Steps 2 - 4: The order service (gRPC client) receives the REST call, transforms it, and makes an RPC call to the payment service. gRPC encodes the 𝐜𝐥𝐢𝐞𝐧𝐭 𝐬𝐭𝐮𝐛 into a binary format and sends it to the low-level transport layer.
Step 5: gRPC sends the packets over the network via HTTP2. Because of binary encoding and network optimizations, gRPC is said to be 5X faster than JSON.
Steps 6 - 8: The payment service (gRPC server) receives the packets from the network, decodes them, and invokes the server application.
Steps 9 - 11: The result is returned from the server application, and gets encoded and sent to the transport layer.
Steps 12 - 14: The order service receives the packets, decodes them, and sends the result to the client application.
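A sketch of what the order service's call might look like with Python's grpcio package; the payment.proto service, its generated modules, and the field names are hypothetical, so this illustrates the client-stub idea rather than a runnable program:

    import grpc
    # payment_pb2 and payment_pb2_grpc would be generated from a hypothetical payment.proto.
    import payment_pb2, payment_pb2_grpc

    channel = grpc.insecure_channel("payment-service:50051")      # HTTP/2 under the hood
    stub = payment_pb2_grpc.PaymentServiceStub(channel)           # the client stub

    # Looks like a local function call, but the request is binary-encoded (Protocol Buffers)
    # and sent over the network to the payment service.
    response = stub.Charge(payment_pb2.ChargeRequest(order_id="SN129803", amount_cents=12345))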
How Netflix Really Uses Java
Netflix is predominantly a Java shop.
Every backend application (including internal apps, streaming, and movie production apps) at Netflix is a Java application.
However, the Java stack is not static and has gone through multiple iterations over the years.
Here are the details of those iterations:
1 - API Gateway
Netflix follows a microservices architecture. Every piece of functionality and data is owned by a microservice built using Java (initially version 8)
This means that rendering one screen (such as the List of List of Movies or LOLOMO) involved
fetching data from 10s of microservices. But making all these calls from the client created a
performance problem.
Netflix initially used the API Gateway pattern using Zuul to handle the orchestration.
2 - BFFs with Groovy & RxJava
Using a single gateway for multiple clients was a problem for Netflix because each client (such as TV, mobile apps, or web browser) had subtle differences. To handle this, Netflix used the Backend-for-Frontend (BFF) pattern. Zuul was moved to the role of a proxy. In this pattern, every frontend or UI gets its own mini backend that performs the request fanout and orchestration for multiple services. The BFFs were built using Groovy scripts and the service fanout was done using RxJava for thread management.
3 - GraphQL Federation
The Groovy and RxJava approach required more work from the UI developers in creating the Groovy scripts. Also, reactive programming is generally hard. Recently, Netflix moved to GraphQL Federation. With GraphQL, a client can specify exactly what set of fields it needs, thereby solving the problem of overfetching and underfetching with REST APIs.
The GraphQL Federation takes care of calling the necessary microservices to fetch the data.
These microservices are called Domain Graph Service (DGS) and are built using Java 17, Spring Boot 3, and Spring Boot Netflix OSS packages. The move from Java 8 to Java 17 resulted in 20% CPU gains.
More recently, Netflix has started to migrate to Java 21 to take advantage of features like virtual threads.
OSI Model
How is data sent over the network? Why do we need so many layers in the OSI model?
The diagram below shows how data is encapsulated and de-encapsulated when transmitting over the network.
Step 1: When Device A sends data to Device B over the network via the HTTP protocol, an HTTP header is first added at the application layer.
Step 2: Then a TCP or a UDP header is added to the data. It is encapsulated into TCP segments at the transport layer. The header contains the source port, destination port, and sequence number.
Step 3: The segments are then encapsulated with an IP header at the network layer. The IP header contains the source/destination IP addresses.
Step 4: A MAC header is added to the IP datagram at the data link layer, with source/destination MAC addresses.
Step 5: The encapsulated frames are sent to the physical layer and transmitted over the network as binary bits.
Steps 6-10: When Device B receives the bits from the network, it performs the de-encapsulation process, which is the reverse of the encapsulation process. The headers are removed layer by layer, and eventually, Device B can read the data.
We need layers in the network model because each layer focuses on its own responsibilities. Each layer can rely on the headers for processing instructions and does not need to know the meaning of the data from the last layer.
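A toy sketch of encapsulation and de-encapsulation, with byte-string prefixes standing in for the real TCP, IP, and MAC headers:

    # Toy illustration: each layer prepends its own header; the receiver strips them in reverse.
    LAYER_HEADERS = [b"TCP|", b"IP|", b"MAC|"]        # transport, network, data link

    def encapsulate(data):
        for header in LAYER_HEADERS:                  # wrap the payload layer by layer
            data = header + data
        return data

    def de_encapsulate(frame):
        for header in reversed(LAYER_HEADERS):
            assert frame.startswith(header)           # each layer only inspects its own header
            frame = frame[len(header):]
        return frame

    frame = encapsulate(b"HTTP|GET /index.html")      # what Device A puts on the wire
    print(de_encapsulate(frame))                      # Device B recovers the HTTP message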
Over to you: Do you know which layer is responsible for resending lost data?
8 Key Data Structures That Power Modern Databases
🔹Skiplist: a common in-memory index type. Used in Redis
🔹Hash index: a very common implementation of the “Map” data structure (or “Collection”)
🔹SSTable: immutable on-disk “Map” implementation
🔹LSM tree: Skiplist + SSTable. High write throughput
🔹B-tree: disk-based solution. Consistent read/write performance
🔹Inverted index: used for document indexing. Used in Lucene
🔹Suffix tree: for string pattern search
🔹R-tree: multi-dimension search, such as finding the nearest neighbor