System Design Flashcards
Delivery Framework
- Requirements
- Core Entities
- API or System Interface
- Data Flow
- High Level Design
- Deep Dive
Requirements
- Functional Requirements - “Users/clients should be able to…” Top 3
- Non-functional Requirements - “System should be / should be able to…” Top 3
- Capacity Estimations
Nonfunctional Requirements Checklist (8)
- CAP theorem: in a distributed system P is a given, so the real trade-off is between C and A
- Environment constraints, ie battery life or limited memory
- Scalability, unique requirements such as bursty traffic or a skewed read/write ratio
- Latency, specifically for anything with meaningful computation
- Durability, how important that data is not lost
- Security, ie data protection, access control
- Fault tolerance, ie redundancy, failover, recovery mechanisms
- Compliance, ie legal or regulatory requirements or standards
Bytes to store data
ASCII - 1 byte
Unicode - 2 bytes (rule of thumb; UTF-8 is 1-4 bytes per character)
Split seconds
Millisecond (ms) 1/1000
Microsecond (us) 1/1,000,000
Nanosecond (ns) 1/1,000,000,000
Read latency
Memory: 1 MB in 0.25 ms (~4 GB/s)
SSD (~4x slower than memory): 1 MB in 1 ms (~1 GB/s)
Disk (~20x slower than SSD): 1 MB in 20 ms (~50 MB/s)
Worldwide network round trip: ~150 ms (~6 per second)
Request Calculations by second
~2.5 million seconds per month (30 days × 86,400 s ≈ 2.6M)
1 million per month = .4/s
2.5 million per month = 1/s
10 million per month = 4/s
100 million per month = 40/s
1 billion per month = 400/s
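A quick sanity check of those conversions as a minimal C# sketch (the 2.5M figure is the rounded rule of thumb, not exact calendar math):

```csharp
// Back-of-the-envelope helper: ~2.5 million seconds per month, so divide monthly volume by 2.5M.
using System;

const double SecondsPerMonth = 2_500_000; // 30 days * 86,400 s ≈ 2.59M, rounded for mental math

double PerSecond(double requestsPerMonth) => requestsPerMonth / SecondsPerMonth;

Console.WriteLine(PerSecond(1_000_000));     // ≈ 0.4 req/s
Console.WriteLine(PerSecond(100_000_000));   // ≈ 40 req/s
Console.WriteLine(PerSecond(1_000_000_000)); // ≈ 400 req/s
```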
Storage estimates:
2 hr movie - 1 GB
Small plain text book - 1 MB
High res photo - 1 MB
Med res image - 100 KB
DB Writes vs Reads
A write is roughly 40x more expensive than a read (rule of thumb)
Core Entities
Spend ~2 minutes here.
What the API will exchange and what the system will persist in the data model. Ex: User, Tweet, Follow for Twitter.
Bullet list
API or System Interface
RESTful or GraphQL
Endpoints with path and parameters
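A minimal ASP.NET Core sketch of what an endpoint with path and body parameters can look like, using the Twitter example from Core Entities (the Tweet routes and types are hypothetical):

```csharp
using Microsoft.AspNetCore.Mvc;

// Hypothetical Tweet resource from the Core Entities example: GET /api/tweets/{id}, POST /api/tweets.
[ApiController]
[Route("api/tweets")]
public class TweetsController : ControllerBase
{
    [HttpGet("{id}")]
    public ActionResult<Tweet> GetById(long id) =>
        // Path parameter {id}; look up the tweet and return 404 if missing (lookup omitted here).
        NotFound();

    [HttpPost]
    public ActionResult<Tweet> Create([FromBody] CreateTweetRequest request) =>
        // Body parameter; persistence omitted, so the created id is a placeholder.
        CreatedAtAction(nameof(GetById), new { id = 1L }, new Tweet(1L, request.Text));
}

public record Tweet(long Id, string Text);
public record CreateTweetRequest(string Text);
```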
Data Flow
Actions or processes that the system performs on the input to produce the desired outputs
Core Concepts
Scaling - work distribution and data distribution
Consistency
Locking
Indexing
Communication Protocols
Security - authentication and authorization, encryption, data protection
Monitoring - infrastructure, system level, application level
Key Technologies
Core DB
Blob storage
Search optimized DB
API gateway
Load balancer
Queue
Streams / event sourcing
Distributed lock
Distributed cache
CDN
Patterns
DB backed CRUD with caching
Async job worker pool
2 stage architecture
Event driven architecture
Durable job processing
Proximity based services
Core API - high level overview
“Our Core API uses a layered .NET architecture, deployed in EKS. Controllers
handle HTTP routing, Services handle business logic, and a Data layer interacts
with Aurora and Redis. This lets us scale the service horizontally while keeping
the codebase maintainable.”
Core API - layered architecture justification
“We wanted to separate concerns—controllers focus on HTTP requests, services
encapsulate domain rules, and our data layer deals with Aurora and caching. This
approach cuts down on coupling and makes it easier to adapt or extract
microservices down the road.”
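A minimal sketch of that controller/service/data-layer split, with hypothetical names standing in for the real Core API classes:

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Controllers handle HTTP, services hold business rules, the data layer hides Aurora/Redis.
// All names here are hypothetical stand-ins for the real Core API classes.
[ApiController]
[Route("api/events")]
public class EventsController : ControllerBase
{
    private readonly IEventService _service;
    public EventsController(IEventService service) => _service = service;

    [HttpGet("{id}")]
    public async Task<ActionResult<SportEvent>> Get(long id)
    {
        var evt = await _service.GetEventAsync(id);
        if (evt is null) return NotFound();
        return Ok(evt);
    }
}

public interface IEventService                    // business logic boundary
{
    Task<SportEvent?> GetEventAsync(long id);
}

public interface IEventRepository                 // data layer boundary (Aurora + Redis behind it)
{
    Task<SportEvent?> FindAsync(long id);
}

public class EventService : IEventService
{
    private readonly IEventRepository _repo;
    public EventService(IEventRepository repo) => _repo = repo;
    public Task<SportEvent?> GetEventAsync(long id) => _repo.FindAsync(id);
}

public record SportEvent(long Id, string Name);
```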
Core processor - explanation
“We have a central ETL pipeline—the Core Processor—which ingests data from
multiple providers, stores raw payloads in S3, and then transforms/loads it into Aurora.
Tasks run on a cron-based scheduler and retry on failure with exponential backoff, keeping the pipeline resilient even if a provider is temporarily down.”
Core API - why K8s?
“Kubernetes gave us automated scaling and rolling updates out of the box. We
can spin up more pods during major sporting events and scale back when traffic is
low, all while ensuring near-zero downtime.”
Core API - EKS rolling updates
“We use a rolling update strategy so that when deploying a new version of the
API, only one old pod goes down at a time—our system stays online, and if
something fails, we can roll back quickly.”
Core API - stateless pods
“Even though our application manages a lot of data, we designed each pod to be stateless. Any persistent data—sessions, user info, or stats—resides in Aurora, Redis, or S3.
That means losing a pod doesn’t risk losing data.”
Core API - Ingress and Helm templating
“We have an internal ALB that terminates TLS and checks liveness via /health.
The ALB is configured via Ingress annotations in our Helm chart, ensuring only
healthy pods receive requests. We define everything in Helm charts, from replicas
and resource limits to Ingress rules. Environment-specific overrides like values-
stage.yaml and values-prod.yaml let us run the same code in staging vs.
production with minimal overhead.”
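A minimal sketch of wiring up the /health probe the ALB checks, assuming ASP.NET Core health checks and the newer minimal-hosting style (the real service may use Startup.cs and register Aurora/Redis checks as well):

```csharp
// /health wiring, minimal-hosting style; the real service may use Startup.cs instead,
// and would register concrete checks (Aurora, Redis) alongside the default liveness check.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();
builder.Services.AddHealthChecks();   // extra DB/cache checks can be registered here

var app = builder.Build();
app.MapHealthChecks("/health");       // the path the ALB target group probes
app.MapControllers();
app.Run();
```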
Core API - CI/CD pipeline
“We use CircleCI to build Docker images, run tests, push the image to ECR, then
automatically update our Helm chart. If linting or validation fails, the deployment
never proceeds—meaning we catch issues before they hit production.”
Core API - automatic rollbacks
“Our pipeline can roll back a Helm release if we detect a spike in 500 errors or
failing health checks. That safety net lets us move fast and confidently ship
updates.”
Core API - environment specific builds
“For each commit on the ‘master’ branch, CircleCI sets DOTNETCORE_ENVIRONMENT=production and deploys to our production
cluster. For ‘stable’, it uses stage—we keep these pipelines consistent, ensuring
minimal drift.”
Core API - Redis caching
“We cache frequently requested data in Redis—like top odds or event stats—for short TTLs.
This offloads read traffic from Aurora and drastically reduces latency on hot endpoints.”
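A minimal cache-aside sketch of that pattern, assuming StackExchange.Redis; the key format, 30-second TTL, and the Aurora loader are hypothetical placeholders:

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using StackExchange.Redis;

// Cache-aside with a short TTL: check Redis, fall back to Aurora on a miss, then repopulate.
// The key format, 30s TTL, and LoadTopOddsFromAurora are hypothetical placeholders.
public class OddsCache
{
    private static readonly TimeSpan Ttl = TimeSpan.FromSeconds(30);
    private readonly IDatabase _redis;

    public OddsCache(IConnectionMultiplexer mux) => _redis = mux.GetDatabase();

    public async Task<TopOdds?> GetTopOddsAsync(long eventId)
    {
        var key = $"odds:top:{eventId}";

        var cached = await _redis.StringGetAsync(key);       // 1. try the cache
        if (cached.HasValue)
            return JsonSerializer.Deserialize<TopOdds>(cached.ToString());

        var fresh = await LoadTopOddsFromAurora(eventId);     // 2. miss: hit the database
        if (fresh is not null)                                // 3. repopulate with a short TTL
            await _redis.StringSetAsync(key, JsonSerializer.Serialize(fresh), Ttl);
        return fresh;
    }

    private Task<TopOdds?> LoadTopOddsFromAurora(long eventId) =>
        Task.FromResult<TopOdds?>(null);                      // placeholder for the real query
}

public record TopOdds(long EventId, decimal Price);
```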
Core API - in memory caching
“Each pod has an in-memory cache for micro-optimizations, but it’s not critical if a pod restarts—it’s purely ephemeral. That’s a classic stateless approach, as all permanent state lives in external data stores.”
Core API - Metrics
“We use Prometheus and Grafana for real-time visibility into standard and custom metrics. That data helps us spot anomalies or performance regressions fast, and Grafana alerts trigger Slack notifications to the proper team.”
Core API - Rollbar
“Any exception in the Core API automatically logs to Rollbar, and critical errors trigger Slack notifications to the proper teams. During a major sporting event, if we see a surge of 500 errors, we can quickly pinpoint which endpoint or DB call is failing.”
Core API - latency tracking
“We keep a histogram of HTTP request durations. By tracking P95 and P99 latencies, we ensure that even our worst-case requests stay within acceptable bounds, especially during heavy game traffic.”
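A minimal sketch of recording that latency histogram with prometheus-net (metric name and buckets are hypothetical; prometheus-net's ASP.NET Core middleware can also record HTTP durations automatically). P95/P99 are then derived from the buckets in Grafana:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Prometheus;

// Records each request's duration into a histogram; Grafana computes P95/P99 from the buckets.
// Metric name and bucket layout are hypothetical.
public static class RequestMetrics
{
    private static readonly Histogram RequestDuration = Metrics.CreateHistogram(
        "core_api_http_request_duration_seconds",
        "HTTP request duration in seconds",
        new HistogramConfiguration
        {
            Buckets = Histogram.ExponentialBuckets(0.005, 2, 10)  // 5ms .. ~2.5s
        });

    public static async Task Measure(Func<Task> handler)
    {
        var sw = Stopwatch.StartNew();
        try { await handler(); }
        finally { RequestDuration.Observe(sw.Elapsed.TotalSeconds); }
    }
}
```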
Core API - CAP theorem
“We operate in a distributed AWS environment, so partition tolerance is mandatory.
We typically favor high availability over strict consistency by reading from Aurora replicas—though the primary itself is strongly consistent.
That means brief eventual consistency for read workloads, which is acceptable for this domain.”
Core API - consistency
“We do strongly consistent writes to Aurora’s primary. But for reads—especially from replicas or caches—we accept short-lived eventual consistency.
The lag is usually small, and it’s worth it to maintain high throughput under load.”
Core API - security
“All local developers must assume an MFA-secured AWS role. Secrets are stored in Parameter Store or K8s secrets, meaning we never expose plain-text creds in code or logs.”
Core API - internal ALB
“We use an internal ALB for traffic, so it’s not publicly accessible. On top of that, Kubernetes role based access control restricts who can modify deployments or read secrets, ensuring a tight security posture.”
Core API - estimating capacity
“We measure requests-per-second during major sporting events and compare it to CPU/memory usage. If we see pods hitting 80% CPU or if DB queries approach saturation, we scale out.
Aurora read replicas handle the read spikes, and Redis further reduces direct DB hits.”
Core API - main bottleneck
“Ultimately, Aurora can become the bottleneck for heavy writes. We mitigate that with indexing, short caches, and read replicas. If needed, we could further partition data, but so far Aurora’s performance has met our needs.
To stay ahead of that, I recently built a nightly archiving task that moves all market lines older than 18 months out of the hot path, which covered a few hundred million records from a terabyte-scale table.”
PSO - centralized data for all properties
Problem: Multiple newly acquired properties each ingested sports data differently, creating inconsistencies.
Solution: We built a Core API on .NET, containerized on EKS, and standardized data ingestion via the Core Processor.
Outcome: We reduced duplication, established a single source of truth, and scaled seamlessly for peak sports seasons.
PSO - zero downtime deployments
Problem: Rolling updates were risky with older infrastructure, often causing partial outages.
Solution: By using Helm with rolling updates and readiness probes, we can gradually shift
traffic to new pods while old pods are drained.
Outcome: Near-zero downtime deploys and the ability to roll back quickly if metrics or logs show a spike in errors.
PSO - real-time observability
Problem: We lacked insight into production performance; debugging took hours.
Solution: We integrated Telegraf for metrics and Rollbar for error logs.
Outcome: The moment error rates spike, we get Slack alerts and can see exactly which
endpoints or queries are failing, cutting response times in half.
Single Responsibility Principle
A class should have a single responsibility and only one reason to change. Everything it does should be closely related so the class doesn't bloat.
Open-Closed Principle
Code should be open to extension but closed to modification. Instead of modifying existing code, extend it, e.g. via a subclass that inherits from the base class or via extension methods.
Liskov Substitution Principle
A child class should be substitutable anywhere its parent is expected, without breaking behavior.
Interface Segregation Principle
Clients should never be forced to implement an interface they don't use, or to depend on methods they don't use.
Dependency Inversion
High-level modules shouldn't depend on low-level modules; both should depend on abstractions.
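A minimal C# sketch of Dependency Inversion with illustrative names: the high-level service depends only on an abstraction, and the low-level implementation depends on that same abstraction:

```csharp
// Dependency Inversion sketch with illustrative names: the high-level OrderService depends on
// the IOrderRepository abstraction, and the low-level SQL implementation depends on it too.
public interface IOrderRepository
{
    void Save(Order order);
}

public class OrderService                             // high-level module
{
    private readonly IOrderRepository _repo;
    public OrderService(IOrderRepository repo) => _repo = repo;

    public void PlaceOrder(Order order) => _repo.Save(order);
}

public class SqlOrderRepository : IOrderRepository    // low-level module
{
    public void Save(Order order) { /* write to the database */ }
}

public record Order(long Id, decimal Total);
```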
Pattern: CRUD service
The most common and simplest pattern. Backed by a DB and cache, fronted by an API and load balancer.
Client -> API -> Load Balancer -> Service -> Cache -> Database
Pattern: async job worker pool
For systems that have lots of processing and can tolerate some delay. Good for processing images and videos.
Queue -> Workers -> Database
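A minimal in-process sketch of the queue + worker pool idea using System.Threading.Channels (in production the queue would be an external system like SQS or Kafka; the names are illustrative):

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// In-process queue + worker pool sketch; in production the queue would be external (SQS, Kafka, ...).
// ThumbnailJobPool and ProcessImageAsync are illustrative names.
public class ThumbnailJobPool
{
    private readonly Channel<string> _queue = Channel.CreateBounded<string>(1000);

    public ValueTask Enqueue(string imageKey) => _queue.Writer.WriteAsync(imageKey);

    public Task Start(int workerCount, CancellationToken ct) =>
        Task.WhenAll(Enumerable.Range(0, workerCount).Select(_ => RunWorker(ct)));

    private async Task RunWorker(CancellationToken ct)
    {
        // Each worker pulls jobs until cancellation; slow work here doesn't block the producer.
        await foreach (var imageKey in _queue.Reader.ReadAllAsync(ct))
            await ProcessImageAsync(imageKey);        // resize, then persist the result
    }

    private Task ProcessImageAsync(string imageKey) => Task.CompletedTask;  // placeholder
}
```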
Pattern: 2 stage architecture
Good for recommendations, search, and route planning: a fast but inaccurate stage narrows the candidates, then a slow but precise stage finishes the job.
Ranking service (slow but precise) -> Vector DB (fast but inaccurate) <- Blob storage
Pattern: Event driven architecture
Centered around events; good for systems that need to react to changes in real time, e.g. e-commerce reacting when an order is placed
Event producers -> event routers/brokers (Kafka or eventbridge) -> event consumers to process the events and take necessary actions
Pattern: Durable job processing
For long running jobs. Store jobs in something like Kafka, then a pool of workers processes them. Workers periodically checkpoint progress to a durable log, and if a worker crashes another can pick up where it left off.
Phase 1 workers -> distributed, durable log -> Phase 2 workers -> durable log -> Phase 3 workers
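A minimal sketch of the checkpointing idea on Kafka, assuming Confluent.Kafka: auto-commit is off, so an offset is committed only after the job's work succeeds, and a replacement worker resumes from the last committed offset (topic, group id, and ProcessJob are hypothetical):

```csharp
using System.Threading;
using Confluent.Kafka;

// Checkpointing via Kafka offsets, assuming Confluent.Kafka: auto-commit is off, so an offset is
// only committed after the job's work succeeds; a replacement worker resumes from the last commit.
// Topic, group id, and ProcessJob are hypothetical.
public static class DurableWorker
{
    public static void Run(CancellationToken ct)
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "kafka:9092",
            GroupId = "job-workers",
            EnableAutoCommit = false              // commit manually, only after success
        };

        using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
        consumer.Subscribe("jobs");

        while (!ct.IsCancellationRequested)
        {
            var result = consumer.Consume(ct);    // next job from the durable log
            ProcessJob(result.Message.Value);     // do this phase's work
            consumer.Commit(result);              // checkpoint progress
        }
    }

    private static void ProcessJob(string payload) { /* phase work goes here */ }
}
```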
Pattern: Proximity based services
Ex Uber
Divide the geographical area into regions and index entities within each region. This lets the system exclude vast areas that don't contain relevant entities.
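A minimal sketch of grid-based proximity indexing with an illustrative cell size (real systems typically use geohash, S2, or H3 cells): only the query point's cell and its eight neighbors are scanned.

```csharp
using System;
using System.Collections.Generic;

// Grid-based proximity indexing sketch: entities are bucketed by cell, and a nearby-search only
// scans the query point's cell plus its eight neighbors. Cell size is illustrative; production
// systems often use geohash, S2, or H3 cells instead.
public class GridIndex
{
    private const double CellSizeDegrees = 0.01;   // roughly ~1 km near the equator
    private readonly Dictionary<(int, int), List<long>> _cells = new();

    private static (int, int) CellOf(double lat, double lng) =>
        ((int)Math.Floor(lat / CellSizeDegrees), (int)Math.Floor(lng / CellSizeDegrees));

    public void Add(long entityId, double lat, double lng)
    {
        var cell = CellOf(lat, lng);
        if (!_cells.TryGetValue(cell, out var list))
            _cells[cell] = list = new List<long>();
        list.Add(entityId);
    }

    public IEnumerable<long> Nearby(double lat, double lng)
    {
        var (row, col) = CellOf(lat, lng);
        for (var dr = -1; dr <= 1; dr++)            // vast empty regions are never touched
            for (var dc = -1; dc <= 1; dc++)
                if (_cells.TryGetValue((row + dr, col + dc), out var list))
                    foreach (var id in list)
                        yield return id;
    }
}
```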