C1 Flashcards

Question

Why is a message queue like Kafka used in the banking app?

Answer 1

To handle asynchronous messaging for long-running tasks (e.g., notifications) and ensure reliable delivery with retries.

* Kafka is a distributed, partitioned log that is commonly used as a message queue
* High throughput and low latency
* Scalability: horizontal scaling; topics are partitioned across brokers, and consumers can read from these partitions concurrently, which further boosts scalability
* Durability and Reliability
  * Kafka stores messages on disk and can replicate data across multiple brokers, providing data persistence and fault tolerance.
  * If a broker fails, Kafka can fail over to in-sync replicas, so acknowledged messages are preserved and operation continues (given a suitable replication factor and producer acknowledgment settings).
* Data Retention and Replayability
* Fault Tolerance
* Decoupling of Producers and Consumers
* Real-time and Batch Processing Capabilities
* Exactly-Once Processing Semantics

Answer 2

Circuit breakers detect service failures and prevent further calls, enabling fallback responses and avoiding cascading failures.

* Transaction Processing: For each dependency within a transaction, wrap calls in a circuit breaker. If, for example, the Transaction Service needs to verify account balances and the balance service is unavailable, the circuit breaker will prevent further requests to it, avoiding a backlog.
* Cross-Service Dependencies: If the Transaction Service needs to interact with other services (e.g., fraud detection, payment authorization), circuit breakers ensure that failures in these services don’t disrupt the entire transaction processing workflow.
* Real-Time Alerts and Monitoring: Circuit breakers can be integrated with monitoring tools to alert administrators when certain services are unreachable, allowing proactive issue resolution.

Example

* Check Account Balance: If the balance service is down, the circuit breaker opens, and the Transaction Service can either return an error or use a cached balance (if available).
* Fraud Check: If fraud detection is unavailable, the circuit breaker will prevent retries, and the Transaction Service might flag the transaction for later review rather than delaying or failing the operation.
* Payment Processor: If the payment processor is down, the Transaction Service quickly returns an error response, allowing the user to retry later instead of experiencing extended delays.

Circuit breakers prevent cascading failures, improve response times by failing fast, enable graceful degradation, and support automatic recovery.
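The open, fail-fast, and recovery behavior described here can be sketched as a small state machine. This is a minimal illustration, not a production library: the class name `CircuitBreaker`, the `failure_threshold` and `reset_timeout` parameters, and the injected `clock` are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the timeout can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the dependency.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("dependency unavailable")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A balance lookup could pass `fallback=lambda: cached_balance` so that an open circuit returns the cached value instead of an error.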

Answer 3

A dead-letter queue stores messages that could not be processed, allowing for isolation of problematic messages and improving system resilience.

Answer 4

By adding more instances of each microservice, each capable of handling a portion of the load, ensuring better performance under high traffic.

Answer 5

ELK Stack (Elasticsearch, Logstash, Kibana) or Grafana and Prometheus for real-time monitoring, logging, and alerting.

Answer 6

Distributed tracing tracks requests across services, helping to identify latency and bottlenecks in complex microservices interactions.

Answer 7

Connection pooling reduces the overhead of opening and closing connections frequently, improving database throughput and performance.

Answer 8

By using a master-replica setup where the primary (master) handles writes, and replicas handle reads, ensuring redundancy and availability.

Answer 9

Use Redis for caching balance data, implement read-after-write consistency, and consider caching at the API Gateway level for frequently accessed data.

* Near-Real-Time Cache Invalidation for Critical Data
  * Some banking operations, like balance updates and funds transfers, are highly sensitive to consistency. In these cases, implement a near-real-time cache invalidation mechanism to clear or update cache entries immediately after a transaction.
  * For instance, if a user makes a transfer, immediately invalidate or update the balance cache to ensure accurate information for the next balance check.
  * This approach provides reliability for sensitive information and improves trust in the app’s data accuracy.
* Read-Through Cache
  * A read-through cache automatically loads data into the cache from the database or backend service when it's requested but not already cached.
  * In a banking app, this strategy is useful for frequently accessed but relatively static data like account information (e.g., account details, interest rates, and account types).
  * This approach reduces latency for users by caching data the first time it's requested, improving response times on subsequent requests.
* Write-Through Cache for Frequently Updated Data
  * In a write-through cache, data is written to both the cache and the database simultaneously, ensuring the cache always has the latest data.
  * This is beneficial for high-read, high-write items like user preferences and recent transactions.
  * By keeping frequently updated data in the cache, users experience faster load times, while the cache remains consistent with the database.
* Client-Side Caching for Static Data
  * For data that doesn’t change frequently (e.g., terms and conditions, product details), leverage client-side caching to reduce server requests.
  * This allows the app to store static data on the user's device, improving performance and reducing data usage, which can be important for users on mobile networks.
  * Implement versioning to refresh client-side cache when these resources are updated, ensuring users always have access to the latest information.
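As a rough illustration of the read-through, write-through, and invalidation strategies above, the sketch below uses plain dictionaries as stand-ins: `db` plays the database and `cache` plays Redis. The class and method names are hypothetical, and a real deployment would also need TTLs and error handling.

```python
class BalanceStore:
    """In-memory stand-ins: `db` plays the database, `cache` plays Redis."""

    def __init__(self):
        self.db = {}
        self.cache = {}
        self.db_reads = 0  # counts cache misses that hit the database

    def read_through(self, account_id):
        # Cache hit: serve without touching the database.
        if account_id in self.cache:
            return self.cache[account_id]
        # Cache miss: load from the database and populate the cache.
        self.db_reads += 1
        value = self.db[account_id]
        self.cache[account_id] = value
        return value

    def write_through(self, account_id, balance):
        # Write the database and the cache as part of the same operation.
        self.db[account_id] = balance
        self.cache[account_id] = balance

    def write_and_invalidate(self, account_id, balance):
        # Alternative: update the database and drop the cached entry,
        # so the next read reloads the fresh value.
        self.db[account_id] = balance
        self.cache.pop(account_id, None)
```

Write-through keeps the cached balance warm after a transfer, while invalidation trades one extra database read for a simpler write path.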

Answer 10

Sharding distributes data across multiple database instances, reducing load on each instance and enabling horizontal scaling.

* Enhanced Performance and Reduced Latency
  * Each shard contains only a subset of the data, so queries run against a smaller dataset, which reduces the time taken to retrieve data.
* Improved Fault Isolation and Reliability
  * Sharding isolates data into separate databases. If one shard encounters an issue (e.g., hardware failure or data corruption), the impact is limited to only that shard, not the entire database.
* Efficient Resource Management
  * By segmenting data, each shard requires fewer system resources (memory, CPU, and storage) than a monolithic database would. This reduces the load on each database instance and avoids the resource limitations associated with a single, large database.

Examples

* Hash-Based Sharding (Random Distribution)
  * Distribute customer accounts or transactions by hashing the account ID or transaction ID, ensuring an even distribution of data across shards.
  * This approach helps balance the load across shards but requires a routing layer that maps each request to the correct shard.
* Range-Based Sharding (Data Segmentation)
  * Partition data by ranges (e.g., customer account numbers or date ranges for transactions), useful for workloads where data locality is beneficial.
  * For instance, transaction records can be sharded by date range to keep recent data in a hot shard for quick access, while older records are stored in archival shards.
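The two routing schemes in the examples can be sketched as small pure functions. The shard count, boundary dates, and shard names below are assumptions chosen only for illustration.

```python
import hashlib
from bisect import bisect_right
from datetime import date

NUM_SHARDS = 4  # assumed shard count


def hash_shard(account_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Use a stable hash; Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range boundaries: each date is the first day served by the next shard.
RANGE_BOUNDARIES = [date(2023, 1, 1), date(2024, 1, 1)]
RANGE_SHARDS = ["archive", "warm", "hot"]


def range_shard(tx_date: date) -> str:
    # bisect_right counts how many boundaries are <= tx_date.
    return RANGE_SHARDS[bisect_right(RANGE_BOUNDARIES, tx_date)]
```

Note that plain modulo hashing remaps most keys when `num_shards` changes; consistent hashing is a common way to limit that, but it is outside this sketch.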

Answer 11

To offload read-heavy operations from the primary database, improving performance and allowing the primary to focus on write operations.

Answer 12

Use atomic transactions in the database, employ two-phase commits if needed, and lock resources to prevent concurrent modifications.

Answer 13

It ensures atomicity across distributed services, preventing partial updates in multi-service transactions (though it can add complexity).

Answer 14

Use HTTPS for data transmission, JWT or OAuth for authentication, role-based access control (RBAC), and data encryption at rest.

To secure a banking API, a combination of robust authentication (OAuth, MFA), encryption (TLS, AES), rate limiting, input validation, real-time monitoring, and secure integration with third-party services is necessary. Additionally, ensuring compliance with security standards and implementing fraud detection measures is critical for maintaining a secure and trustworthy API. By following these best practices, you can minimize the risk of data breaches, fraud, and other security threats in your banking system.

* Authentication
  * OAuth 2.0: Implement OAuth 2.0 for secure, token-based authentication.
  * MFA: Require MFA for users and administrators to reduce the risk of unauthorized access.
  * API Keys: For service-to-service communication, use API keys that are securely stored and associated with specific API clients to control access to resources.
* Authorization
  * Role-Based Access Control (RBAC): Implement RBAC to ensure users and applications have only the minimum permissions required.
  * Scope and Permissions: Use fine-grained access control, limiting API access based on specific user roles and API scopes.
* Data Encryption
  * TLS (Transport Layer Security): Enforce HTTPS for all communications between clients and the API to encrypt data in transit.
  * Encryption at Rest: Sensitive data such as account information, transaction history, and user credentials should be encrypted when stored in databases or file systems.
* Rate Limiting and Throttling
  * Rate Limiting: Implement rate limiting to prevent abuse and DoS (Denial of Service) attacks.
  * Throttling: Throttling can be used to slow down traffic when rate limits are exceeded.
* Input Validation and Sanitization
  * Sanitize Inputs: Validate and sanitize all user inputs to protect against injection attacks, including SQL injection, script injection (XSS), and other malicious payloads.
  * Output Encoding: Ensure that any data rendered back to the user (e.g., via API responses) is properly encoded to prevent injection attacks like cross-site scripting (XSS).
* Logging and Monitoring
  * Audit Logs: Maintain detailed logs of all API calls, including successful and failed login attempts, account changes, transactions, and other critical operations.
  * Real-Time Monitoring: Implement real-time monitoring to detect unusual activities such as multiple failed login attempts, large transactions, or spikes in API requests that could indicate abuse or fraud.
  * Centralized Logging System: Use centralized logging systems like ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or CloudWatch for easier tracking and analysis of security events.
* API Gateway and Web Application Firewall (WAF)
  * API Gateway: Place an API gateway in front of your API to handle security tasks such as routing, rate limiting, authentication, logging, and security policy enforcement.
  * WAF (Web Application Firewall): Deploy a WAF to filter and monitor HTTP traffic to the API, blocking malicious requests such as SQL injection, cross-site scripting (XSS), and other attacks based on predefined rules.
* IP Whitelisting and Geo-Blocking
  * IP Whitelisting: For sensitive API operations (e.g., admin access), restrict access to trusted IP addresses.
  * Geo-Blocking: In certain cases, you can block API access from countries or regions that are not relevant to your business, or where there are higher risks of fraud.
* Session Management
  * Token Expiry: Use short-lived tokens for API authentication, and refresh tokens periodically to limit the risk of a token being compromised. Ensure that tokens have a reasonable expiry time (e.g., 15 minutes to 1 hour).
  * Session Timeouts: Implement session timeouts for user sessions, especially for operations that require higher levels of security (e.g., transferring funds or modifying account settings).
  * Logout Mechanism: Provide users with the ability to log out or revoke active sessions to protect their accounts.
* Security Best Practices for External Integrations
  * Third-Party API Security: If the banking API integrates with external services (e.g., payment processors, fraud detection), ensure that these services use secure authentication methods (such as OAuth) and follow stringent security practices.
  * Limit Data Sharing: Ensure that only the necessary data is shared with third-party services, and use data anonymization or tokenization where possible to reduce the risk of exposing sensitive customer information.
* Compliance with Security Standards
  * PCI DSS Compliance: Ensure that your API complies with PCI DSS (Payment Card Industry Data Security Standard) if dealing with payment information or card details.
  * GDPR Compliance: For applications that handle European Union residents’ data, comply with GDPR by implementing appropriate data protection measures, like consent management and data anonymization.
  * SOC 2 or ISO 27001: If applicable, ensure your system is certified for security standards like SOC 2 or ISO 27001, which indicate that your system follows recognized security controls.

Answer 15

Use centralized logging, distributed tracing, real-time metrics, and alerting to quickly detect and troubleshoot issues.

Answer 16

Partitioning by time (e.g., monthly) keeps data manageable, improves query performance for recent transactions, and speeds up large table operations.

Answer 17

The interface includes:

- `POST createAccount`: Creates an account with necessary user info, ensuring no duplicate accounts.
- `PUT manageAccount`: Updates user account info, validating changes.
- `POST transfer`: Initiates a fund transfer, handling retries to avoid double transactions.
- `GET viewAccountBalances`: Retrieves account balance with up-to-date data.
- `GET transactionHistory`: Fetches paginated transaction history using cursor-based pagination.

Key considerations include idempotency for transfers, caching for balance retrieval, and strict authentication for sensitive data access.

Answer 18

Use read-after-write consistency with caching strategies like:

- **Cache Invalidation**: Invalidate or update cache on balance change, ensuring latest values.
- **Direct Database Reads**: For critical operations, read from the primary database instead of cache.

These techniques help avoid stale data when displaying balances after transactions.

Answer 19

1) **Account Management Service**: Manages account creation, updates, and deletion.
2) **Transaction Service**: Manages fund transfers, balance checks, and idempotency for transactions.
3) **Notification Service**: Sends notifications asynchronously for events like successful transactions.

Answer 20

2PC is a protocol ensuring distributed transactions are atomic:

- **Phase 1 (Prepare)**: Services prepare for a transaction and confirm readiness.
- **Phase 2 (Commit/Rollback)**: Services commit or rollback based on responses.

Useful for cross-service consistency, though it can add complexity and potential latency.
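The two phases can be sketched with an in-process coordinator. This is a simplified illustration under strong assumptions: the `Participant` class and its `can_commit` flag are hypothetical, and real 2PC also needs durable logs, timeouts, and recovery when the coordinator crashes.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether local work can succeed
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes; otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

The added latency mentioned above comes from the extra round trip: no participant can commit until every vote has been collected.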

Answer 21

The Transaction Service:

- Handles fund transfers and balance verification.
- Ensures atomic transactions for consistency.
- Handles duplicate transactions by checking unique transaction IDs, ensuring no double-withdrawals.
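Duplicate handling by transaction ID might look like the sketch below. The `TransactionService` class, its in-memory `processed` map, and the result strings are assumptions for illustration; in a real service the duplicate check and the balance update would have to happen inside one database transaction.

```python
class TransactionService:
    def __init__(self, balances):
        self.balances = dict(balances)
        self.processed = {}  # transaction_id -> result of the first attempt

    def transfer(self, transaction_id, from_id, to_id, amount):
        # A retried request with the same ID returns the stored result
        # instead of moving money a second time.
        if transaction_id in self.processed:
            return self.processed[transaction_id]
        if amount <= 0 or self.balances[from_id] < amount:
            result = "rejected"
        else:
            self.balances[from_id] -= amount
            self.balances[to_id] += amount
            result = "completed"
        self.processed[transaction_id] = result
        return result
```

Because the client supplies the transaction ID, a network retry of the same request is safe to replay.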

Answer 22

The Saga Pattern manages distributed transactions by coordinating smaller, independent transactions:

- Each service performs part of the workflow; on failure, compensating actions roll back partial changes.
- This pattern suits banking apps for operations like fund transfers where full rollback is complex.
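A minimal orchestration-style sketch of the compensation logic follows. The `run_saga` helper and the (action, compensation) pair convention are assumptions for the example; real sagas also persist progress so they can resume after a crash.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Compensate in reverse order of completion.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

For a transfer, the steps could be a debit paired with a refund and a credit paired with a reversal, so a failed credit triggers the refund.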

Answer 23

Partition by **Account ID** or **time (e.g., monthly)** for large tables:

- Account-based partitioning allows isolation of user-specific data.
- Time-based partitioning improves performance for recent transaction queries and maintenance.

Answer 24

- **JWT** or **OAuth** tokens for secure, stateless user authentication.
- **Role-based access control (RBAC)** for different user permissions.
- **Encrypted communications** (HTTPS) to protect data in transit.
- Multi-Factor Authentication (MFA)
- Password Policies
- Least Privilege Principle
- Fine-grained Authorization
- Time-based Access Control
- Audit Logs
- Data Encryption

* User Login:
  * The user enters their username and password.
  * If the credentials are valid, MFA is triggered (e.g., OTP sent to the user’s phone).
  * The system checks if the user’s device is trusted or if there are any anomalies (e.g., a login attempt from a new device).
  * If successful, the system generates a session token (with an expiration time).
* Performing a Transaction:
  * The user requests to transfer money.
  * The system checks if the user has the necessary balance and if the requested transaction falls within their allowed limits.
  * The system validates the transaction through a second authentication step (e.g., biometric verification).
  * The system checks authorization (e.g., does the user have permission to make such a transfer?).
  * The transaction is logged and processed.
* Admin Access:
  * An admin logs into the system with additional checks (e.g., stronger authentication, restricted to specific IPs).
  * The system verifies the admin’s role and grants access to sensitive data or settings based on the least privilege principle.

Answer 25

CQRS separates write operations (commands) from read operations (queries):

- Useful in systems with distinct read/write requirements or complex business logic.
- In a banking app, CQRS could optimize balance queries by separating them from account updates.

Answer 26

Event Sourcing stores state changes as a series of events:

- Useful for retaining a full history of changes, enabling better auditing.
- In banking, it could be applied to transaction history, preserving every balance update as an event.
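The idea that state is derived from the event history can be shown with a short replay function. The `Event` shape and the event kinds `"deposited"` and `"withdrawn"` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account_id: str
    kind: str      # "deposited" or "withdrawn"
    amount: int    # minor units (e.g., cents) to avoid float rounding


def balance_from_events(events, account_id):
    # Current state is derived by replaying the append-only event log.
    balance = 0
    for e in events:
        if e.account_id != account_id:
            continue
        if e.kind == "deposited":
            balance += e.amount
        elif e.kind == "withdrawn":
            balance -= e.amount
    return balance
```

Replaying only a prefix of the log yields the balance at an earlier point in time, which is what makes the history auditable.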

Answer 27

- **Primary-replica setup**: Primary handles writes, replicas handle read-heavy operations.
- **Auto-failover**: Automatic failover to a secondary replica in case of primary failure.
- **Partitioning and sharding**: Distributes data across multiple nodes for improved resilience.

Answer 28

Message queues provide:

- **Asynchronous processing** for non-critical tasks like notifications.
- **Retries** for failed messages, ensuring reliable event delivery.
- **Decoupling** between services, allowing independent service scaling.

Answer 29

A dead-letter queue stores messages that could not be processed:

- Allows for isolation of problematic messages.
- Helps identify and troubleshoot issues with transactions, ensuring system resilience.
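A consumer loop with bounded retries and a dead-letter list might look like the following. It is a sketch using an in-memory `deque` rather than a real broker, and the `consume` function and `max_attempts` parameter are assumptions for illustration.

```python
from collections import deque


def consume(queue, handler, dead_letters, max_attempts=3):
    """Process messages; park any message that keeps failing in dead_letters."""
    while queue:
        message = queue.popleft()
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                break  # processed successfully, move to the next message
            except Exception as exc:
                if attempt == max_attempts:
                    # Retries exhausted: isolate the message with its last error.
                    dead_letters.append((message, repr(exc)))
```

Keeping the last error next to the message makes later troubleshooting easier, and the healthy messages behind it are not blocked.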

Answer 30

Partitioning reduces the search space for queries:

- **Account ID** partitioning allows efficient access to user-specific data.
- **Time-based partitioning** makes recent transactions faster to query, especially beneficial for retrievals and maintenance.

Answer 31

Cursor-based pagination uses a "cursor" pointing to the last item retrieved:

- Instead of requesting a page number, clients request items after a specific cursor ID.
- Reduces database load compared to offset-based pagination, as it avoids scanning and discarding skipped rows and does not need a total record count.
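A small sketch of the cursor mechanics over an in-memory list follows. The `fetch_page` function and the dictionary row shape are assumptions for the example; against a database the filter would become a `WHERE id > :cursor ORDER BY id LIMIT :limit` query.

```python
def fetch_page(transactions, cursor=None, limit=2):
    """Return (items, next_cursor). `transactions` must be sorted by ascending id."""
    # Keep only rows after the cursor (the SQL equivalent is `WHERE id > :cursor`).
    remaining = [t for t in transactions if cursor is None or t["id"] > cursor]
    page = remaining[:limit]
    # A next cursor exists only if more rows follow this page.
    next_cursor = page[-1]["id"] if len(remaining) > limit else None
    return page, next_cursor
```

The client passes the returned `next_cursor` back on the following request and stops when it is `None`.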

Answer 32

The API Gateway should handle:

- **Authentication** and **authorization** for secure access.
- **Rate limiting** to prevent misuse or DDoS attacks.
- **Request routing** to the appropriate microservices.
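Rate limiting at the gateway is often implemented with a token bucket; the sketch below shows one possible version. The `TokenBucket` class, its parameters, and the injected `clock` are assumptions for illustration, and managed gateways usually provide this as configuration rather than code.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock          # injectable so refills can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per client or API key and return HTTP 429 when `allow()` is false.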

Answer 33

Read-after-write consistency ensures that a read issued after a successful write returns the updated value:

- Important for balances so users see accurate information post-transaction.
- Can be achieved with cache invalidation or reading directly from the primary database.

Answer 34

- Use a **message queue** to receive notifications from other services.
- **Consumers** in the Notification Service read from the queue and send alerts (e.g., emails, SMS).
- Ensures reliability, even if the Notification Service experiences temporary delays.

Answer 35

- **Time-based expiration**: Set TTL for cache entries to refresh data periodically.
- **Event-based invalidation**: Clear or update cache on specific events, such as a transaction completion.
- **Versioning**: Use cache versions to ensure outdated data is not accessed.
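The three invalidation strategies can be illustrated with a tiny in-memory cache. The `TTLCache` class, the `versioned_key` helper, and the key format are hypothetical; a Redis-backed version would rely on the server's own key expiry instead of the manual check shown here.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.entries = {}           # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Time-based expiration: drop the stale entry.
            del self.entries[key]
            return None
        return value

    def set(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        # Event-based invalidation, e.g. called when a transaction completes.
        self.entries.pop(key, None)


def versioned_key(account_id, version):
    # Versioning: bumping the version makes old entries unreachable.
    return f"balance:{account_id}:v{version}"
```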

Answer 36

Redis can store frequently accessed data, like account balances, reducing load on the main database:

- **Low-latency reads** improve response time for common queries.
- **Cache invalidation** strategies maintain data consistency.

Answer 37

- **Primary Key** on `accountId` for unique identification.
- **Unique Index** on `email` to prevent duplicate accounts.
- **Index** on `lastLogin` for querying recent activity.

Answer 38

- **Primary Key** on `transactionId`.
- **Composite Indexes** on `(fromAccountId, timestamp)` and `(toAccountId, timestamp)` for efficient history queries.
- **Index on timestamp** for time-based pagination.
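These indexes can be tried out quickly with SQLite from Python's standard library. The table layout, column types, and index names below are assumptions for the sketch (the cards name only the columns), and a production database would use its own DDL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactions (
        transactionId INTEGER PRIMARY KEY,
        fromAccountId INTEGER NOT NULL,
        toAccountId   INTEGER NOT NULL,
        amount        INTEGER NOT NULL,  -- minor units
        timestamp     INTEGER NOT NULL,  -- Unix epoch seconds
        isSuccessful  INTEGER NOT NULL
    );
    CREATE INDEX idx_tx_from_ts ON transactions (fromAccountId, timestamp);
    CREATE INDEX idx_tx_to_ts   ON transactions (toAccountId, timestamp);
    CREATE INDEX idx_tx_ts      ON transactions (timestamp);
    """
)

# Ask the planner how it would serve an account-history query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM transactions WHERE fromAccountId = ? ORDER BY timestamp DESC LIMIT 20",
    (1,),
).fetchall()
```

Printing `plan` shows which index the planner picks; the composite index lets it filter by account and read rows in timestamp order without a separate sort.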

Answer 39

- **Scalability**: Each service scales independently.
- **Fault isolation**: Failures are contained within each service, improving resilience.
- **Ease of maintenance**: Clear separation of concerns aids code management and updates.

Answer 40

Distributed tracing tracks request flows across services:

- Helps identify latency, bottlenecks, and failure points.
- Useful for troubleshooting in complex service interactions.

Answer 41

CQRS separates command (write) and query (read) responsibilities:

- Optimizes for scalability by allowing independent tuning of read and write pathways.
- In banking, CQRS allows high-frequency balance reads without impacting account updates.

Answer 42

Load balancers distribute requests across multiple instances:

- Prevents overload on any single instance.
- Supports health checks to route requests away from failed instances.
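Round-robin distribution with health-aware routing can be sketched as follows. The `RoundRobinBalancer` class and its `mark_health` method are assumptions for illustration; real load balancers run the health checks themselves and support other strategies such as least-connections.

```python
from itertools import count


class RoundRobinBalancer:
    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._counter = count()

    def mark_health(self, instance, is_healthy):
        # In practice a periodic health check would call this.
        if is_healthy:
            self.healthy.add(instance)
        else:
            self.healthy.discard(instance)

    def next_instance(self):
        # Try each instance at most once per call, skipping unhealthy ones.
        for _ in range(len(self.instances)):
            candidate = self.instances[next(self._counter) % len(self.instances)]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")
```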

Answer 43

Eventual consistency allows more flexible scalability at the cost of reads briefly returning stale data:

- Suitable for non-critical data, like certain analytics, where real-time updates aren’t essential.
- Not typically used for balances or sensitive operations.

Answer 44

- **Encryption**: Use HTTPS for data in transit, and encrypt sensitive data at rest.
- **Authentication**: Implement OAuth or JWT for secure logins.
- **Access Control**: Role-based access (RBAC) ensures users access only allowed resources.

Answer 45

- **ELK Stack (Elasticsearch, Logstash, Kibana)**: Useful for centralized logging and error tracking.
- **Prometheus and Grafana**: For real-time metrics and visualization.
- **Jaeger**: Distributed tracing tool to track request flows across services.

Answer 46

Sharding splits data across multiple nodes, enabling:

- Improved performance by distributing read/write operations.
- Scalability by adding more shards as data grows.

Banking apps might shard by account range or region.

Answer 47

Circuit breakers detect failing services and prevent further requests:

- Reduces load on troubled services and prevents cascading failures.
- Enables graceful degradation, improving user experience.

Answer 48

Around 2.72 GB (20 million accounts * 136 bytes per account).

Answer 49

20 million accounts (10 million users * 2 accounts per user).

Answer 50

20 million transactions per day (10 million users * 2 transactions per user per day).

Answer 51

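Approximately 2,315 TPS, i.e. ten times the daily average of about 231.5 TPS (20 million transactions / 86,400 seconds). Note that 2 million transactions over a 7,200-second peak window would give only about 278 TPS.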

Answer 52

Approximately 897 GB (820 MB per day * 365 days * 3 years).

Answer 53

Components include accountId (8 bytes), userId (8 bytes), balance (8 bytes), accountType (4 bytes), creationTime (8 bytes), and additional metadata (~100 bytes).

Answer 54

Components include transactionId (8 bytes), fromAccountId and toAccountId (16 bytes), amount (8 bytes), timestamp (8 bytes), and isSuccessful status (1 byte).

Answer 55

16 MB (2 million accounts * 8 bytes per balance).

Answer 56

Approximately 4.92 GB (2 million accounts * 60 transactions * 41 bytes per transaction).

Answer 57

Approximately 833.4 MB (2,315 TPS * 100 bytes * 3,600 seconds).

Answer 58

Around 900 GB (2.72 GB for accounts + 897 GB for transactions over 3 years).
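The storage estimates above can be reproduced with a few lines of arithmetic. The constant names are made up for this sketch, the inputs are the figures stated on the cards, and sizes use decimal gigabytes (10^9 bytes).

```python
USERS = 10_000_000
ACCOUNTS_PER_USER = 2
TX_PER_USER_PER_DAY = 2
ACCOUNT_BYTES = 8 + 8 + 8 + 4 + 8 + 100   # accountId, userId, balance, accountType, creationTime, metadata
TX_BYTES = 8 + 16 + 8 + 8 + 1             # transactionId, from/to IDs, amount, timestamp, isSuccessful
RETENTION_DAYS = 365 * 3
GB = 10**9  # decimal gigabytes, as in the estimates above

accounts = USERS * ACCOUNTS_PER_USER                          # 20 million
tx_per_day = USERS * TX_PER_USER_PER_DAY                      # 20 million
account_storage_gb = accounts * ACCOUNT_BYTES / GB            # about 2.72
tx_storage_gb = tx_per_day * TX_BYTES * RETENTION_DAYS / GB   # about 897.9
total_storage_gb = account_storage_gb + tx_storage_gb         # about 900.6
```

These are raw data sizes only; indexes and replication would add to the total.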

Answer 59

Around 5.77 GB (4.936 GB for caching + roughly 0.83 GB for the message queue).

Answer 60

Partitioning can help manage and query large volumes of transaction data, possibly by year or account ID range, to improve performance and scalability.

Answer 61

Replication is necessary for each component (e.g., Redis, Kafka) to achieve high availability, potentially doubling storage and memory requirements based on redundancy needs.

Answer 62

- **User Authentication and Management**: Cognito handles user sign-up, sign-in, forgot password, and MFA (multi-factor authentication) out of the box. It also allows for federated authentication, meaning you can integrate external identity providers like Google, Facebook, or corporate SSO (via SAML).
- **Token Issuance**: After successful login, Cognito issues tokens (ID token, access token, and refresh token) for the user. These tokens are used for stateless authentication and authorization across the application, so backend services only need to validate the token rather than implement their own login logic.
- **Authorization**: Cognito integrates with IAM (Identity and Access Management) to define roles and permissions. You can set up role-based access control (RBAC) where each user or group has specific permissions. You can include these roles and permissions in the JWT token, and your API Gateway or backend services can check the token’s claims to validate authorization.
- **Scaling and Security**: Cognito is a fully managed service, which means it scales automatically without you needing to worry about the underlying infrastructure. It comes with built-in security features like encryption and DDoS protection, and it supports compliance programs such as HIPAA, GDPR, and SOC 2.

Answer 63

1. **User Authentication (Cognito)**: The user logs in via Cognito (using a username/password, social login, or SSO). Upon successful login, Cognito issues JWT tokens for the user.
2. **API Gateway and Token Validation**: API Gateway receives requests with the JWT token in the `Authorization` header. The API Gateway validates the token by checking its signature and claims against the Cognito user pool.
3. **Backend Services**: Backend services (like Account Service, Transaction Service) extract the user identity and roles directly from the JWT token and process the request based on that.
4. **Session Management**: If you need session persistence or additional caching (e.g., user profiles), you can use something like Redis to cache certain user information, but the core authentication is handled by Cognito.

(87 cards)