SD Flashcards
Steps to System Design
- Clarify Requirements
- Capacity Estimation
- High Level Design
- Database Design
- API Design
- Dive into Key Components
- Address Key Issues: Scalability, Reliability
Or, the same steps in more detail:
- Requirements clarifications: Always ask questions to find the exact scope of the problem you are solving.
- Back-of-the-envelope estimation: It’s always a good idea to estimate the scale of the system you are going to design. This will also help later, when you will be focusing on scaling, partitioning, load balancing, caching, etc.
- System interface definition: Define what APIs are expected from the system. This will not only establish the exact contract expected from the system but also ensure that you have not gotten any requirements wrong.
- Define data model: Defining the system data model early on will clarify how data will flow among different components of the system and later will also guide towards the data partitioning and management.
- High-level design: Draw a block diagram with 5–6 boxes representing the core components of your system. You should identify enough components that are needed to solve the actual problem from end to end.
- Detailed design: Dig deeper into 2–3 components; the interviewer’s feedback should always guide you towards which parts of the system they want you to explain further. You should be able to present different options, their pros and cons, and explain why you would choose one of them.
- Identifying and resolving bottlenecks: Try to discuss as many bottlenecks (and different approaches to mitigate them) as possible.
Clarify Requirements
Understand the problem, clarify any ambiguities, and gather as much information as possible about the system.
Two types of requirements to clarify:
* Functional
* Non-functional
Understanding the scope early prevents you from heading in the wrong direction.
Functional Requirements Questions
What are the core features that the system should support?
Are there any particular features that are more critical than others?
Who will use this system (customers, internal teams etc..)?
What specific actions should users be able to perform on the system?
How will users interact with the system (web, mobile app, API, etc.)?
Does the system need to support multiple languages or locales?
What are the key data types the system must handle (text, images, structured data, etc.)? This can influence your database choices.
Are there any external systems or third-party services the system needs to integrate with?
Non-Functional Requirements Questions
What is the expected scale of the system in terms of users and requests?
How much data volume is expected to be handled by the system?
What are the inputs and outputs of the system?
What is the expected read-to-write ratio?
Can the system have some downtime, or does it need to be highly available?
Are there any specific latency requirements?
How critical is data consistency? Can some eventual consistency be tolerated for the sake of availability?
Are there any specific non-functional requirements (performance, scalability, reliability) we should focus on?
Capacity Estimation
After clarifying the requirements, you can do some calculations to estimate the capacity of the system you are going to design.
Note: Not every system design interview will require detailed capacity estimates. It’s always a good idea to check with your interviewer if it’s necessary.
That said, it’s usually helpful to at least get a rough idea of the number of requests and storage requirements.
Estimating the scale upfront helps guide your design decisions and ensures that the system can meet the desired criteria.
This can include things like expected daily/monthly users, read/write requests per second, data storage and network bandwidth needs.
USERS: Estimate the number of daily users and maximum concurrent users during peak hours.
TRAFFIC: Calculate the expected read/write requests per second. Consider peak traffic periods and potential spikes in usage.
STORAGE: Consider the different types of data (structured, unstructured, multimedia) and estimate the total amount of storage required (and its growth rate).
MEMORY: Evaluate the potential benefits of caching to reduce latency and improve performance. Estimate how much memory you might need to store frequently accessed data.
NETWORK: Estimate bandwidth requirements based on the estimated traffic volume and data transfer sizes.
It is always a good idea to estimate the scale of the system we’re going to design. This will also help later when we focus on scaling, partitioning, load balancing, and caching. A rough worked example follows the questions below.
- What scale is expected from the system (e.g., number of new tweets, number of tweet views, number of timeline generations per sec., etc.)?
- How much storage will we need? We will have different storage requirements if users can have photos and videos in their tweets.
- What network bandwidth usage are we expecting? This will be crucial in deciding how we will manage traffic and balance load between servers.
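A rough worked example for a Twitter-like service. Every number below is an assumption picked for illustration; the point is the order-of-magnitude arithmetic, not the exact figures.

```go
package main

import "fmt"

// Back-of-the-envelope estimates for a Twitter-like service.
// All inputs are illustrative assumptions, not requirements from the prompt.
func main() {
	const (
		dailyActiveUsers  = 200_000_000 // assumed DAU
		tweetsPerUserDay  = 2           // assumed writes per user per day
		readsPerUserDay   = 100         // assumed tweet/timeline views per user per day
		avgTweetSizeBytes = 300         // text + metadata only, ignoring photos/videos
		secondsPerDay     = 86_400
	)

	writesPerSec := dailyActiveUsers * tweetsPerUserDay / secondsPerDay
	readsPerSec := dailyActiveUsers * readsPerUserDay / secondsPerDay
	storageGBPerDay := float64(dailyActiveUsers*tweetsPerUserDay*avgTweetSizeBytes) / 1e9

	fmt.Printf("writes/sec ~ %d (plan for 2-3x at peak)\n", writesPerSec)       // ~4.6K
	fmt.Printf("reads/sec  ~ %d (read-heavy, ~50:1 read/write)\n", readsPerSec) // ~230K
	fmt.Printf("tweet storage ~ %.0f GB/day, excluding media\n", storageGBPerDay) // ~120 GB/day
}
```

Even this crude estimate already suggests a read-heavy design (separate read paths, heavy caching) and storage growth dominated by media rather than tweet text.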
High Level Design
With the requirements and expected capacity in mind, start designing the high-level architecture of the system.
Break down the system into major components or modules, such as the frontend, backend, databases, caches, and external services.
Draw a simple BLOCK DIAGRAM with 5–6 boxes representing the core components of the system. Identify enough components to solve the actual problem end to end, and outline the major system components and the high-level flow of data and requests through the system, from the client to the backend and back.
- Keep it simple and clean.
- Use appropriate notations and symbols to represent the components, their interactions, and the data flow.
- Use different colors, line styles, or symbols to differentiate between various types of components or interactions.
- Stick with simple boxes representing components and arrows showing directional data flow.
- Show how data flows through the system, from input to storage and retrieval using arrows.
- Avoid cluttering the diagram with too much detail or unnecessary elements.
- Don’t overthink the minor details, this is about the big picture.
For Twitter, at a high level, we will need multiple application servers to serve all the read/write requests, with load balancers in front of them for traffic distribution. If we expect much more read traffic than write traffic, we can decide to have separate servers for handling these scenarios. On the backend, we need an efficient database that can store all the tweets and support a huge number of reads. We will also need a distributed file storage system for storing photos and videos.
clients -> LB -> [appServer1, appServer2, …] <—> DB, File Storage
What to include in High Level Design Diagram
CLIENT APPLICATIONS: Indicates how users will interact with the system (web browser, mobile app, desktop application etc.).
WEB SERVERS: Servers that handle incoming requests from clients.
LOAD BALANCERS: Used to distribute incoming traffic evenly across servers so the system can handle significant load.
APPLICATION SERVICES: The backend logic layer where the core functionalities of the system are implemented.
DATABASES: Specify the type of database: SQL vs. NoSQL, and briefly explain why.
CACHING LAYER: Specify the caching technology (e.g., Redis, Memcached) if you’re using one to reduce load on the database.
MESSAGE QUEUES: If using asynchronous communication.
EXTERNAL SERVICES: If the system relies on third-party APIs (e.g., payment gateways), include them.
For every component, make sure to consider trade-offs and justify why you picked specific technologies or architectures (e.g., “We need strong consistency, so a relational database is a good fit”).
Database Design
This step involves modeling the data, choosing the right storage for the system, designing the database schema, and optimizing the storage and retrieval of data based on the access patterns.
- Data Modeling
- Choosing the Right Storage
- Design The Database Schema
- Define Data Access Patterns
Data Modeling
- Identify the main data entities or objects that the system needs to store and manage (e.g., users, products, orders).
- Consider the relationships between these entities and how they interact with each other.
- Determine the attributes or properties associated with each entity (e.g., a user has an email, name, address).
- Identify any unique identifiers or primary keys for each entity.
- Consider normalization techniques to ensure data integrity and minimize redundancy.
Defining the data model in the early part of the interview will clarify how data will flow between different components of the system. Later, it will also guide decisions around data partitioning and management. The candidate should be able to identify various entities of the system, how they will interact with each other, and different aspects of data management like storage, transportation, encryption, etc. Here are some entities for our Twitter-like service:
- User: UserID, Name, Email, DoB, CreationDate, LastLogin, etc.
- Tweet: TweetID, Content, TweetLocation, NumberOfLikes, TimeStamp, etc.
- UserFollows: UserID1, UserID2
- FavoriteTweets: UserID, TweetID, TimeStamp
Choosing the Right Storage
- Evaluate the requirements and characteristics of the data to determine the most suitable database type.
- Consider factors such as data structure, scalability, performance, consistency, and query patterns.
- Relational databases (e.g., MySQL, PostgreSQL) are suitable for structured data with complex relationships and ACID properties.
- NoSQL databases (e.g., MongoDB, Cassandra) are suitable for unstructured or semi-structured data, high scalability, and eventual consistency; they typically follow BASE properties (basically available, soft state, eventually consistent).
- Consider using a combination of databases if different data subsets have distinct requirements.
Common database types: Relational, Key-Value, Graph, Document, Column Store
Design the Database Schema
- Define the tables, columns, data types, and relationships based on the chosen database type.
- Specify primary keys, foreign keys, and any necessary indexes to optimize query performance.
- Consider denormalization techniques, such as duplication or pre-aggregation, to improve read performance if needed.
Table: User
- UserId: PK
- Name
- Email
- DOB
Table: Tweet
- TweetId: PK
- UserId
- Content
- Likes
- CreationTime
Table: UserFollow
- UserId1: FK
- UserId2: FK
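The tables above can also be written out as DDL. A sketch assuming PostgreSQL; the column types, snake_case names, and the extra index on (user_id, creation_time) are illustrative choices, not requirements.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // hypothetical choice of Postgres driver
)

// Schema sketch mirroring the flashcard tables; not a production design.
const schema = `
CREATE TABLE IF NOT EXISTS users (
    user_id  BIGSERIAL PRIMARY KEY,
    name     TEXT NOT NULL,
    email    TEXT NOT NULL UNIQUE,
    dob      DATE
);

CREATE TABLE IF NOT EXISTS tweets (
    tweet_id      BIGSERIAL PRIMARY KEY,
    user_id       BIGINT NOT NULL REFERENCES users(user_id),
    content       TEXT NOT NULL,
    likes         INT NOT NULL DEFAULT 0,
    creation_time TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS user_follow (
    user_id1 BIGINT NOT NULL REFERENCES users(user_id),
    user_id2 BIGINT NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (user_id1, user_id2)
);

-- Index supporting "a user's tweets, newest first", a likely access pattern.
CREATE INDEX IF NOT EXISTS idx_tweets_user_time ON tweets (user_id, creation_time DESC);
`

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/twitter_demo?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// Some drivers accept multiple statements per Exec; otherwise run them one by one.
	if _, err := db.Exec(schema); err != nil {
		log.Fatal(err)
	}
}
```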
What is Database Indexing?
Database indexing is a technique used to accelerate the retrieval of data within a database. Think of an index as the table of contents in a book 📖. Without it, the database would have to scan every row to find the needed data, which would be inefficient, especially as data volume grows.
When we create an index on a database column, we’re creating a structure that holds a sorted list of pointers to the rows where each unique value occurs. This makes retrieving rows by specific values (such as a specific user ID or product ID) significantly faster, especially as table size grows.
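A minimal sketch of the idea, assuming PostgreSQL and a hypothetical users table: once the index exists, an equality lookup can use it instead of a full table scan.

```go
package indexing

import "database/sql"

// Without an index, `SELECT * FROM users WHERE email = $1` scans every row;
// with the index below, the database can jump straight to the matching rows.
// Table and column names are hypothetical.
func AddEmailIndex(db *sql.DB) error {
	_, err := db.Exec(`CREATE INDEX IF NOT EXISTS idx_users_email ON users (email)`)
	return err
}

// To confirm the optimizer actually uses it, compare the query plan before and
// after, e.g. `EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';`.
```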
✨ Database indexing is a powerful technique for boosting query performance and scalability in large-scale systems. However, it requires a thoughtful and strategic approach to balance the costs and benefits.
🔍 By understanding the types of indexes available, their use cases, and the practical implications of index design, system architects and developers can build highly performant applications that scale gracefully under high traffic.
Why Database Indexing Is Essential
- Improves Query Performance: Indexes drastically reduce the time needed for data retrieval, especially on large datasets.
- Supports Query Optimization: Query optimizers in databases rely heavily on indexes to decide the most efficient path for retrieving data.
- Enhances System Scalability: In systems where performance needs to scale with user growth, indexing is crucial for maintaining high query throughput.
- Reduces Disk I/O: Since indexes allow databases to locate data with fewer reads, they decrease the amount of I/O operations, which is beneficial in both performance and cost.
Monitoring and Best Practices for Indexing in Production
Index Monitoring Tools:
- Database-Specific Tools: Most databases (e.g., MySQL’s EXPLAIN, PostgreSQL’s pg_stat_activity) provide tools for examining index usage and query plans.
- Performance Monitoring Tools: Tools like Prometheus, Datadog, and New Relic allow monitoring query performance and identifying slow queries affected by indexing.
- Automated Index Tuning: Cloud databases often have automatic index suggestions based on query patterns, helping optimize without manual intervention.
Types of Database Indexing Techniques
- B-Tree Indexes 🌲
  * Ideal: Range-based queries, ordered retrieval
  * Example: E-commerce product filtering by price
  * Performance Impact: Moderate storage cost, good for range queries
- Hash Indexes 🔢
  * Ideal: Exact-match lookups, unique constraints
  * Example: Social media authentication
  * Performance Impact: Fast exact matches, no range support
- Bitmap Indexes
  * Ideal: Low-cardinality columns, analytic queries with AND/OR
  * Example: Data warehouse analytics (e.g., status filters)
  * Performance Impact: Efficient for AND/OR queries, less so for high-frequency updates
- Full-Text Indexes
  * Ideal: Large text searching, natural language queries
  * Example: Blog or article search
  * Performance Impact: Resource-intensive, but powerful for text search
What are the best practices for Database indexing
📊 Index Only When Necessary: Avoid over-indexing; create indexes based on query needs and frequency to maximize value.
🔄 Review Index Performance Regularly: Applications evolve, and so do query patterns. Routinely review and adjust your indexing strategies to align with current access patterns.
⚡ Use Covering Indexes for Common Queries: A covering index can fulfill a query directly, reducing I/O by avoiding main table access and improving response times.
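For example, the covering-index practice above might look like this (a sketch assuming PostgreSQL 11+; the orders table and column names are hypothetical):

```go
package indexing

import "database/sql"

// The INCLUDE columns let the query below be answered from the index alone
// (an index-only scan in the best case), without touching the main table.
func AddCoveringIndex(db *sql.DB) error {
	_, err := db.Exec(`
		CREATE INDEX IF NOT EXISTS idx_orders_user_covering
		ON orders (user_id) INCLUDE (status, total)`)
	return err
}

// Query served by the covering index:
//   SELECT status, total FROM orders WHERE user_id = $1;
```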
What are B-Tree Indexes
B-Tree (Balanced Tree) indexes are among the most commonly used indexing structures in relational databases. A B-Tree is a balanced tree data structure whose nodes are kept in sorted order, which allows efficient range-based queries. It organizes data hierarchically, with each node holding multiple children, and it is self-balancing, so reads and writes stay at predictable cost as data grows.
- Advantages:
- Excellent for range queries (e.g., finding all users aged between 25 and 30).
- Self-balancing properties provide consistent access times.
- Disadvantages:
- Performance can degrade with high-frequency updates due to rebalancing.
- More complex to maintain with heavy write loads.
- Use Cases: Suited for large datasets in read-heavy applications (e.g., e-commerce product listings). For example, suppose an e-commerce platform has a products table with columns like product_id, price, and date_added. Users might want to filter products within a specific price range or list products added within a certain timeframe. A B-Tree index on the price or date_added column enables this efficiently, as in the sketch below:
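A sketch for the e-commerce example, assuming PostgreSQL; table and column names follow the description above.

```go
package indexing

import "database/sql"

// A B-Tree index keeps prices in sorted order, so range predicates and
// ORDER BY price can walk the index instead of scanning the whole table.
func AddPriceIndex(db *sql.DB) error {
	_, err := db.Exec(`CREATE INDEX IF NOT EXISTS idx_products_price ON products (price)`)
	return err
}

// Range query that benefits from the index:
//   SELECT product_id, price FROM products
//   WHERE price BETWEEN 10 AND 50
//   ORDER BY price;
```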
What are Hash Indexes
Hash indexes use a hash function to convert a search key into a location in a table. These indexes work well for equality comparisons (e.g., finding a user by user ID) but are not effective for range queries.
- Advantages:
- Very fast for equality comparisons (e.g., SELECT * FROM users WHERE id = ?).
- Less memory overhead compared to B-trees for single-column indexes.
- Disadvantages:
- Cannot handle range queries.
- May have performance issues with collisions if hash values are not unique.
- Use Cases: High-speed lookups in applications where queries are based primarily on unique IDs or keys (e.g., session token lookup). For example, consider a social media platform where user authentication checks whether the provided username and password_hash match a stored record. Since this query only requires an exact match and never a range scan, a hash index is ideal (see the sketch below).
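A sketch of the authentication example, assuming PostgreSQL (MySQL’s InnoDB does not expose explicit hash indexes); table and column names are hypothetical.

```go
package indexing

import "database/sql"

// Good for exact-match lookups; useless for ranges or ORDER BY.
func AddUsernameHashIndex(db *sql.DB) error {
	_, err := db.Exec(`CREATE INDEX IF NOT EXISTS idx_users_username_hash ON users USING HASH (username)`)
	return err
}

// Exact-match lookup served by the hash index:
//   SELECT user_id, password_hash FROM users WHERE username = $1;
```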
What are Bitmap Indexes
Bitmap indexes store columns as binary strings (bitmaps), where each bit indicates the presence or absence of a particular value. Bitmap indexes are very efficient for columns with low cardinality (i.e., columns with a limited number of distinct values, like a “status” field).
- Advantages:
- Efficient for columns with low cardinality (e.g., Boolean or status fields).
- Excellent for complex queries involving multiple fields.
- Disadvantages:
- Requires substantial storage space on high-cardinality fields.
- Can slow down write operations due to the need to update multiple bitmaps.
- Use Cases: Data warehouses and analytical databases where queries are read-intensive and based on low-cardinality fields. In a data warehouse storing millions of transactions for analysis, columns like status (with values like ‘completed,’ ‘pending,’ ‘failed’) or is_premium (yes/no) benefit from bitmap indexing. Analysts often need to filter and aggregate data based on these low-cardinality columns, and bitmap indexes allow for efficient query processing on them.
- Ideal Use Cases: Data warehouses, OLAP systems, report generation
- Poor Use Cases: OLTP systems, high-cardinality columns, frequent updates
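A sketch of the data-warehouse example above, assuming an Oracle-style database; PostgreSQL and MySQL do not expose an explicit CREATE BITMAP INDEX (PostgreSQL instead builds bitmap scans over ordinary indexes at query time). Table and column names are hypothetical.

```go
package indexing

import "database/sql"

// One bitmap per distinct status value; cheap to AND/OR together in analytics.
func AddStatusBitmapIndex(db *sql.DB) error {
	_, err := db.Exec(`CREATE BITMAP INDEX idx_txn_status ON transactions (status)`)
	return err
}

// Analytic filter that combines bitmaps efficiently:
//   SELECT COUNT(*) FROM transactions WHERE status = 'failed' AND is_premium = 'Y';
```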
What are Full-Text Indexes
Full-text indexes are specialized for searching text-based fields using keywords. They are widely used in applications where searching text data is essential, like document management systems.
- Advantages:
- Highly optimized for text search queries.
- Supports complex queries, including Boolean and proximity searches.
- Disadvantages:
- Can consume large amounts of storage and increase complexity.
- Slower to maintain on fields with frequent text updates.
- Use Cases: Search-heavy applications, such as social media and document search systems. For example, imagine a blog platform where users want to search articles by keywords across titles and body content. A full-text index on those columns allows efficient, flexible search across large text fields (see the sketch below).
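A sketch of the blog-search example, assuming MySQL’s FULLTEXT indexes (PostgreSQL would typically use a GIN index over to_tsvector instead); names are hypothetical.

```go
package indexing

import "database/sql"

// Indexes the title and body columns for keyword search.
func AddArticleFullTextIndex(db *sql.DB) error {
	_, err := db.Exec(`CREATE FULLTEXT INDEX idx_articles_text ON articles (title, body)`)
	return err
}

// Keyword search served by the full-text index:
//   SELECT article_id, title FROM articles
//   WHERE MATCH(title, body) AGAINST('distributed systems' IN NATURAL LANGUAGE MODE);
```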
Define Data Access Patterns
- Identify the common data access patterns and queries that the system will perform.
- Optimize the database schema and indexes based on these access patterns to ensure efficient data retrieval.
- Use appropriate caching mechanisms to store frequently accessed data and reduce database load.
- For scalability, consider partitioning or sharding your data across multiple databases or tables.
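For the partitioning/sharding bullet above, a minimal sketch of hash-based shard routing; the key format and fixed shard count are assumptions (a fixed modulo scheme makes resharding painful, so consistent hashing is a common alternative).

```go
package sharding

import "hash/fnv"

// shardFor maps a user ID to one of numShards databases. Keeping all of a
// user's rows on one shard simplifies queries but can create hot shards for
// very active users.
func shardFor(userID string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32() % numShards
}

// Example: shardFor("user_42", 8) always routes the same user to the same shard.
```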
Design API and Communication Protocols
Designing the API (Application Programming Interface) and communication protocols defines how different components of the system interact with each other and how external clients can access the system’s functionality.
- Identify the API Requirements:
- Choose the API Style:
- Define the API Endpoints:
- Specify the Data Formats:
- Choose Communication Protocols:
Define what APIs are expected from the system. This will not only establish the exact contract expected from the system but will also ensure that we haven’t gotten any requirements wrong. Some examples of APIs for our Twitter-like service will be:
postTweet(user_id, tweet_data, tweet_location, timestamp, …)
generateTimeline(user_id, current_time, user_location, …)
markTweetFavorite(user_id, tweet_id, timestamp, …)
Identify the API Requirements
- Determine the main functionalities and services that the system needs to expose through the API.
- Consider the different types of clients (e.g., web, mobile, third-party services) that will interact with the API.
- Identify the data inputs, outputs, and any specific requirements for each API endpoint.
Choose the API Style:
- Select an appropriate API style based on the system’s requirements and the clients’ needs.
- RESTful APIs (Representational State Transfer) are commonly used for web-based systems and provide a uniform interface for resource manipulation.
- GraphQL APIs offer a flexible and efficient approach for clients to query and retrieve specific data fields.
- RPC (Remote Procedure Call) APIs are suitable for systems with well-defined procedures or functions.
Define the API Endpoints:
- Design clear and intuitive API endpoints based on the system’s functionalities and data model.
- Use appropriate HTTP methods (e.g., GET, POST, PUT, DELETE) for each endpoint to indicate the desired action.
- createProfile(name, email, password string)
- postTweet(userID, content string, timestamp time.Time)
- followUser(userID1, userID2 string)
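As an illustration of the endpoints above, a minimal REST sketch in Go; the path, JSON field names, and in-handler logic are assumptions, not a prescribed design.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

type postTweetRequest struct {
	UserID  string `json:"user_id"`
	Content string `json:"content"`
}

type postTweetResponse struct {
	TweetID   string    `json:"tweet_id"`
	CreatedAt time.Time `json:"created_at"`
}

// handlePostTweet accepts POST /tweets with a JSON body and returns 201 Created.
func handlePostTweet(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req postTweetRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	// In a real system: authenticate, validate, write to the tweet store, enqueue fan-out.
	resp := postTweetResponse{TweetID: "t_123", CreatedAt: time.Now().UTC()}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusCreated)
	json.NewEncoder(w).Encode(resp)
}

func main() {
	http.HandleFunc("/tweets", handlePostTweet)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```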
Specify the Data Formats:
- Choose the data formats for the API requests and responses.
- Common formats include JSON (JavaScript Object Notation) and XML (eXtensible Markup Language).
- Consider factors such as readability, parsing efficiency, and compatibility with the clients and system components.
Choose Communication Protocols:
- HTTPS: Commonly used for RESTful APIs and web-based communication.
- WebSockets: Useful for real-time, bidirectional communication between clients and servers (e.g., chat applications).
- gRPC (gRPC Remote Procedure Call): Efficient for inter-service communication in microservices architectures.
- Messaging Protocols: AMQP, MQTT for asynchronous messaging (often used with message queues).
Dive Deeper into Key Components
Your interviewer will likely want to focus on specific areas, so pay attention and discuss those areas in more detail.
Common Areas for Deep Dives:
- Databases: How would you handle a massive increase in data volume? Discuss sharding (splitting data across multiple databases), replication (read/write replicas).
- Web Servers/Application Servers: How do you add more servers behind the load balancer for increased traffic?
- Load Balancers: Which Load Balancing techniques and algorithms to use (e.g., round-robin, least connections).
- Caching: Where would you add more cache layers (in front of web servers? in the application layer?), and how would you deal with cache invalidation?
- Single Points of Failure: Identify components whose failure would take down the system and discuss how to address it.
- Authentication/Authorization: How would you manage user access and permissions securely?
- Rate Limiting: How would you prevent excessive use or abuse of your APIs?
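For the rate-limiting question above, one common answer is a token bucket per client. A minimal in-process sketch; the capacity and refill rate are illustrative, and real deployments usually keep the counters in a shared store such as Redis so the limit holds across all API servers.

```go
package ratelimit

import (
	"sync"
	"time"
)

// TokenBucket allows bursts up to capacity and a sustained rate of refillRate requests/sec.
type TokenBucket struct {
	mu         sync.Mutex
	capacity   float64
	tokens     float64
	refillRate float64 // tokens added per second
	lastRefill time.Time
}

func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
	return &TokenBucket{capacity: capacity, tokens: capacity, refillRate: refillRate, lastRefill: time.Now()}
}

// Allow reports whether one request may proceed, consuming a token if so.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.lastRefill).Seconds() * b.refillRate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastRefill = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}
```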
The most important thing is to consider trade-offs between different options while keeping system constraints in mind.
- Since we will be storing a massive amount of data, how should we partition our data to distribute it to multiple databases? Should we try to store all the data of a user on the same database? What issue could it cause?
- How will we handle hot users who tweet a lot or follow lots of people?
- Since users’ timelines will contain the most recent (and relevant) tweets, should we try to store our data in such a way that is optimized for scanning the latest tweets?
- How much and at which layer should we introduce cache to speed things up?
- Which components need better load balancing?
clients
-> LB
-> [appServer1, appServer2, …]
-> [AggSvc1, AggSvc2, …]
-> [DB Shard1, DB Shard2, …]
   <-> [CacheSvc1, CacheSvc2, …]
-> LB
-> [CacheSvc1, CacheSvc2, …]
-> file storage
-> [CacheSvc1, CacheSvc2, …]
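For the caching questions above, the most common read path is cache-aside. A minimal sketch; the Cache and Store interfaces are placeholders for whatever technologies you pick (e.g., Redis in front of a sharded database), and the TTL is an assumption.

```go
package caching

import (
	"context"
	"time"
)

// Cache is a placeholder for a key-value cache such as Redis or Memcached.
type Cache interface {
	Get(ctx context.Context, key string) ([]byte, bool)
	Set(ctx context.Context, key string, value []byte, ttl time.Duration)
}

// Store is a placeholder for the authoritative database behind the cache.
type Store interface {
	Load(ctx context.Context, key string) ([]byte, error)
}

// GetWithCache implements the cache-aside read path: hit -> return, miss ->
// load from the database, then populate the cache with a TTL.
func GetWithCache(ctx context.Context, c Cache, s Store, key string) ([]byte, error) {
	if v, ok := c.Get(ctx, key); ok {
		return v, nil // cache hit
	}
	v, err := s.Load(ctx, key) // cache miss: fall back to the database
	if err != nil {
		return nil, err
	}
	c.Set(ctx, key, v, 5*time.Minute) // TTL bounds staleness; writes should also invalidate
	return v, nil
}
```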
Address Key Issues
This step involves identifying and addressing the core challenges that your system design is likely to encounter.
These challenges can range from scalability and performance to reliability, security, and cost concerns.
- Addressing Scalability and Performance Concerns
- Addressing Reliability
Try to discuss as many bottlenecks as possible and different approaches to mitigate them.
- Is there any single point of failure in our system? What are we doing to mitigate it?
- Do we have enough replicas of the data so that if we lose a few servers, we can still serve our users?
- Similarly, do we have enough copies of different services running such that a few failures will not cause a total system shutdown?
- How are we monitoring the performance of our service? Do we get alerts whenever critical components fail, or their performance degrades?