Framework Flashcards
What are the main sections of the delivery framework
- Requirements
- Core Entities
- API or Interface
- Data Flow
- High-level Design
- Deep Dives
Describe the time limit and goal of the requirements section
Time limit: 5 mins
Goal: Gain a clear understanding of the system by breaking requirements into functional and non-functional requirements
What are functional requirements
- Core features of the system being designed
- “Users/Clients should be able to…” statements.
- Requirements should be targeted
- Prioritize the top 3
What are non-functional requirements
- System qualities important to users
- “The system should be able to…” or “The system should be…” statements
- Should be in context of the system and quantified where possible, e.g. “The system should have a low latency search, <500ms” instead of “The system should be low latency”
- Prioritize top 3-5
What are things to consider when creating non-functional requirements
- CAP Theorem: prioritize consistency or availability
- Environment Constraints: Web, mobile, etc.
- Scalability: Bursty traffic at certain times of day, read/write ratio
- Latency: how quickly does the system need to respond to user requests
- Durability: how important is it to not lose data
- Security: Data protection, regulations
- Fault Tolerance: How does the system handle failures
- Compliance: Any legal or regulatory requirements
Describe the time limit and goal of the core entities section
Time limit: 2 minutes
Goal: a bulleted list of the entities in the system
- Who are the core actors in the system?
- What are the nouns or resources necessary to satisfy the functional requirements?
- Use good names for entities
Describe the time limit and goal of the API or system interface section
Time limit: 5 minutes
Goal: Define the contract between the system and its users
- REST, GraphQL, or a wire protocol (generally use REST unless you are concerned with over-fetching)
- Generate a list of endpoints and what they would return (a small example follows below)
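As a hedged illustration only (the endpoints and resource names are hypothetical, for a ticket-booking style system):
```
GET  /events/{eventId}          -> event details
GET  /events/{eventId}/seats    -> available seats for the event
POST /bookings                  -> create a booking; returns a bookingId and status
GET  /bookings/{bookingId}      -> booking details
```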
Describe the time limit and goal of the data flow section
Time limit: 5 minutes
Goal: Describe the high level sequence of actions or processes that the system performs on the inputs to produce the desired outputs
- The data flow output will be a simple list
Describe the time limit and goal of the high level design section
Time limit: 10-15 minutes
Goal: A drawing of components and how they interact
- Ensure the architecture satisfies the functional requirements
- You may be able to go through your API one-by-one and build up your design
- While drawing, talk through the process and how the data flows
- Document relevant columns/fields in the DB
- Stay focused, this is only the high level design, complexity can be added later
Describe the time limit and goal of the deep dive section
Time limit: 10 minutes
Goal: harden the design
- Ensure the design meets all of the non-functional requirements
- Address edge cases
- Identify and address issues and bottlenecks
- Improve the design based on questions from the interviewer
- A senior candidate should identify the above cases and lead the discussion
What is a core database and what are choices for a core database
- A core database is the data storage for your product
- Choices are: Relational (SQL), NoSQL, or blob storage
What is a relational database (RDBMS) and when should you use it
- Relational databases store data in tables (relations) and are well suited to transactions
- This is the default choice for a product design interview
What is a NoSQL database and when should you use it
- NoSQL databases are a broad category of databases that are often schema-less
- Common data models are:
- key-value
- document
- column-family
- graph
- Great candidates for
- Flexible data models
- Scalability
- Handling big data and real-time web apps
What is a blob storage and when should you use it
- Blob storage is used to store large unstructured blobs of data, e.g. video, images, etc.
- You should avoid using blob storage as your primary database
What is a search optimized database and when should you use it
- You should use a search optimized database when you need full-text search
In the context of a search optimized database, what is an inverted index
An inverted index is a data structure that maps words to documents. This allows you to quickly find the documents that contain the words you are searching
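A minimal Python sketch of an inverted index (toy tokenization; the documents are hypothetical):
```python
# Map each token to the set of document ids that contain it.
from collections import defaultdict

docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}

inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():  # naive whitespace tokenization
        inverted_index[token].add(doc_id)

print(inverted_index["quick"])  # {1, 3} -- documents containing "quick"
```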
In the context of a search optimized database, what is tokenization
Tokenization is the process of breaking a piece of text into individual words. This allows the mapping of words to an inverted index
In the context of a search optimized database, what is stemming
Stemming is the process of reducing words to their root form. For example, “running” and “runs” would both be reduced to “run”
In the context of a search optimized database, what is fuzzy search
Fuzzy search is the ability to find words similar to a given search term. This can be done with algorithms like edit distance to find words that might be misspelled
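A minimal sketch using Python's standard-library difflib as a stand-in for the edit-distance style matching a real search engine would use:
```python
import difflib

vocabulary = ["running", "runner", "jumping", "swimming"]
query = "runing"  # misspelled

# Returns the vocabulary words most similar to the query, best match first.
matches = difflib.get_close_matches(query, vocabulary, n=2, cutoff=0.6)
print(matches)  # likely ['running', 'runner']
```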
In the context of a search optimized database, what is scaling
Search optimized databases can be scaled horizontally by adding more nodes to a cluster and sharding across those nodes
What are some examples of search optimized databases
- Elasticsearch
- Postgres with a GIN index
- Redis full text search
In the context of a blob storage, what is durability
Durability relates to the chance of data loss during a failure. Blob storages are quite durable
In the context of a blob storage, what is scalability
Blob storage can be considered effectively infinitely scalable
In the context of a blob storage, what is cost
Blob storages are cheap, generally an order of magnitude cheaper than NoSQL solutions
In the context of a blob storage, what is security
Blob storages have built-in security features like: encryption at rest and in transit, and access control
In the context of a blob storage, what is uploading and downloading from the client
Blob storage services allow you to upload and download directly from the client. They generally utilize presigned URLs
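A hedged sketch using boto3 to generate a presigned upload URL (assumes AWS credentials are configured; the bucket and key names are hypothetical):
```python
import boto3

s3 = boto3.client("s3")
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-app-uploads", "Key": "videos/video-123.mp4"},
    ExpiresIn=3600,  # the URL is valid for one hour
)
# Return upload_url to the client; it uploads the file directly to blob
# storage with an HTTP PUT, bypassing our application servers.
```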
In the context of a blob storage, what is chunking
Because large files are uploaded to and downloaded from blob storage, chunking allows the pieces to be transferred in parallel
In the context of a relational database, what are joins
SQL joins combine data from multiple tables in a single query. Joins can be a performance bottleneck, so it is advisable to minimize them
In the context of a relational database, what are indexes
Indexes are a way of storing data to make it faster to query. Indexes are often implemented using a B-tree or a hash table
In the context of a relational database, what is a transaction
Transactions are a way of grouping multiple operations together so they succeed or fail as a single unit
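A minimal sketch using Python's built-in sqlite3 module: both updates commit together or neither does.
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

with conn:  # opens a transaction; commits on success, rolls back on exception
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")

print(conn.execute("SELECT id, balance FROM accounts").fetchall())  # [(1, 50), (2, 50)]
```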
What are the ACID properties of a relational database
Atomicity: The entire transaction takes place at once or doesn’t happen at all
Consistency: The database must be consistent before and after the transaction
Isolation: Multiple transactions occur independently without interference
Durability: The changes of a successful transaction persist even if a system failure occurs
What is an API gateway and when should you use it
- An API gateway sits in front of your services and routes requests to the appropriate backend services, especially in microservice architectures
- Gateways should be included in almost all product designs as the first point of contact for your clients
- Gateways are typically responsible for things like authentication, rate limiting, and logging
What is a load balancer and when should you use it
A load balancer is useful in times of heavy traffic. It enables horizontal scaling by distributing traffic across multiple machines so no single machine is overloaded
When should you choose a L4 load balancer or a L7 load balancer
Choose an L4 load balancer when you are doing real-time updates with websockets. Otherwise, choose an L7 load balancer
What is a message queue and when should you use it
- A queue’s function is to smooth load across a system.
- A queue should be used to:
- buffer for bursty traffic
- distribute work across a system
- Be careful not to introduce a queue into a synchronous workload, as it will break latency requirements (a minimal producer/worker sketch follows below)
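A hedged producer/worker sketch using boto3 and SQS (the queue URL and message contents are hypothetical):
```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/thumbnail-jobs"

# Producer: enqueue the job instead of doing the slow work in the request path.
sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps({"video_id": "abc123"}))

# Worker: pull jobs off the queue and process them at its own pace.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    job = json.loads(msg["Body"])
    # ... process the job ...
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```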
In the context of a message queue, what is message ordering
It is the way in which messages are ordered in the queue. The most popular is FIFO
In the context of a message queue, what is a retry mechanism
A retry mechanism is a queue's ability to redeliver a message a certain number of times before it is considered a failure
In the context of a message queue, what is a dead letter queue
A dead letter queue is a queue used to store messages that cannot be processed. They are useful for debugging and auditing
In the context of a message queue, what is scaling with partitions
Queues can be partitioned across multiple machines, so increasing the number of partitions (and the machines backing them) scales the queue
In the context of a message queue, what is backpressure
Backpressure is a means of slowing down requests to make sure your system is not overwhelmed
What are streams/event sourcing and when should you use them
- Streams are continuous data flows that are stored and processed for a configurable period of time
- Event sourcing is a technique where application state is stored as a sequence of events, allowing the application state to be reconstructed at any point in time
Common use cases are:
- You need to process large amounts of data in real time
- You need to support complex processing scenarios like event sourcing
- When you need to support multiple consumers reading from the same stream
In the context of a stream, what is scaling with partitions
Partitions can be used to scale streams across multiple servers. Partition keys need to be specified to ensure related events are stored on the same partition
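A hedged sketch with the kafka-python client (the topic, broker address, and key are hypothetical): messages that share a key land on the same partition, so events for one user stay ordered.
```python
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send(
    "ride-events",
    key=b"user-42",                        # partition key
    value=b'{"event": "ride_requested"}',
)
producer.flush()
```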
In the context of a stream, what are multiple consumer groups
A stream can be read by multiple different consumers. One consumer might read a stream to populate a dashboard, while another consumer might populate a database for historical analysis
In the context of a stream, what is replication
Streams can replicate data on multiple servers to ensure that the service is fault tolerant
In the context of a stream, what is windowing
Windowing is a way to group events together based on time or count. This is great for aggregate analytics over a certain time, e.g. 15 mins, 1 hr, etc.
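A minimal pure-Python sketch of tumbling-window counts (stream processors like Flink provide this natively; the events are hypothetical):
```python
from collections import Counter

WINDOW_SECONDS = 900  # 15-minute windows

events = [
    {"ts": 1_700_000_010, "page": "/home"},
    {"ts": 1_700_000_500, "page": "/home"},
    {"ts": 1_700_001_200, "page": "/checkout"},
]

counts_per_window = Counter()
for event in events:
    window_start = event["ts"] - (event["ts"] % WINDOW_SECONDS)  # bucket by window
    counts_per_window[(window_start, event["page"])] += 1

print(counts_per_window)
```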
What are some common streaming technologies
- Kafka
- Flink
- Kinesis
- Spark Streaming
What is a distributed lock and when should you use them
A distributed lock is a way of locking something across multiple systems or processes for a reasonable amount of time. A distributed lock is generally implemented using a distributed key-value store
Common use cases are:
- E-commerce checkout system
- Ride-Sharing matchmaking
- Distributed Cron job
- Online auction bidding system
In the context of a distributed lock, what are locking mechanisms
A locking mechanism is how the lock is implemented. This is typically done using a key-value store. One specific example is Redis using Redlock
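A hedged sketch of a single-node Redis lock using SET NX EX with redis-py (key names are hypothetical; production systems typically use Redlock or a library that releases the lock atomically):
```python
import uuid
import redis

r = redis.Redis(host="localhost", port=6379)
token = str(uuid.uuid4())

# Acquire: succeeds only if the key does not already exist; the 30s expiry
# keeps a crashed process from holding the lock forever.
acquired = r.set("lock:ride:123", token, nx=True, ex=30)

if acquired:
    try:
        ...  # do the critical work
    finally:
        # Simplified release: only delete the lock if we still own it.
        if r.get("lock:ride:123") == token.encode():
            r.delete("lock:ride:123")
```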
In the context of a distributed lock, what is lock expiry
A lock expiry is an expiration time on a lock. This is important to make sure a lock doesn't get stuck in a locked state if a process dies or hangs
In the context of a distributed lock, what is locking granularity
A lock can be used to lock a single resource or a group of resources
In the context of a distributed lock, what are deadlocks
This occurs when two processes are waiting on each other to release a lock.
One process has a lock on A and needs to lock B. A second process has a lock on B and needs to lock A. Both are waiting for each other to release their current lock.
What are some common distributed locking systems
- Redis
- Zookeeper
What are common ways to prevent a deadlock
- Utilize resource ordering: ensure all processes acquire resources in a predefined global order
- Use timeouts: if a process cannot acquire a resource in a reasonable amount of time, it aborts its operation
- Employ a try-lock mechanism: use a non-blocking method that attempts to lock the resource and can try again later if the resource is currently locked (see the sketch below)
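A minimal sketch of two of these techniques with Python threading locks (the "resources" here are just two locks):
```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def do_work():
    # Resource ordering: every code path acquires lock_a before lock_b.
    with lock_a:
        # Try-lock with a timeout: back off instead of waiting forever.
        if lock_b.acquire(timeout=1.0):
            try:
                ...  # work that needs both resources
            finally:
                lock_b.release()
        else:
            ...  # could not get the second lock; back off and retry later

do_work()
```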
What is a distributed cache and when should you use it
A distributed cache is a server or cluster of servers that store frequently used data to help lower latency
Common use cases are:
- Save aggregated metrics
- Reduce the number of DB queries
- Speed up expensive queries
In the context of a distributed cache, what is an eviction policy
An eviction policy is a means of removing items from the cache
Example eviction policies are:
- LRU (Least Recently Used): evicts the least recently accessed item
- LFU (Least Frequently Used): evicts the least frequently accessed items
- FIFO (First In, First Out): queue-based eviction that evicts the oldest item
In the context of a distributed cache, what is a cache invalidation strategy
A cache invalidation strategy is a means to ensure that data being stored in cache is accurate and up to date
In the context of a distributed cache, what is a cache write strategy
A cache write strategy determines how data is written to the cache and the underlying data store
Example strategies are:
- Write-Through Cache: write to the cache and the underlying data store simultaneously (sketched below)
- Write-Around Cache: write to the data store and not to cache
- Write-Back Cache: writes the data to the cache then asynchronously writes to the data store
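A hedged sketch of a write-through cache using redis-py; the "database" here is an in-memory dict standing in for the real primary store:
```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)
database = {}  # stand-in for the primary data store

def save_user(user_id: str, profile: dict) -> None:
    # Write-through: write to the primary store and the cache together.
    database[user_id] = profile
    cache.set(f"user:{user_id}", json.dumps(profile), ex=3600)

def get_user(user_id: str) -> dict:
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    profile = database[user_id]                                  # cache miss: read the store
    cache.set(f"user:{user_id}", json.dumps(profile), ex=3600)   # repopulate the cache
    return profile
```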
What are some common cache systems
- Redis
- Memcached
What is a CDN and when should you use it
A CDN (Content Delivery Network) is a cache that uses distributed servers to deliver content based on a user's geographic region
Common use cases:
- Static assets: images, videos, JavaScript files
- Dynamic content that is accessed frequently, but changes infrequently: e.g. a daily blog post
- Cache API responses to reduce latency
- A social media site might store profile pictures in a CDN to serve them to all users globally
What are some common CDNs
- Cloudflare
- Akamai
- CloudFront
What is the CAP theorem
You can only have 2 out of the 3:
1. Consistency: all nodes/users see the same data at the same time
2. Availability: every request gets a response (successful or not)
3. Partition tolerance: system works despite network failures between nodes
In practice, network partitions are unavoidable in a distributed system, so the real trade-off is between consistency and availability
What is a strongly consistent system
Once data is written to a system, all subsequent reads will reflect the write
What is a weakly consistent or eventually consistent system
Once data is written to a system, subsequent reads might read the old data. Eventually, the new data will be read
What is Change Data Capture (CDC)
Change Data Capture is a process where changes to a (typically relational) database (inserts, updates, deletes) are captured and logged. The results of CDC can be used for auditing or to update other systems, such as updating the index in Elasticsearch
What are the 4 main communication protocols
- HTTP(S)
- Server-Sent Events (SSE)
- Long Polling
- Websockets
Describe the HTTP(S) communication protocol
HTTP(S) is a simple request/response protocol, e.g. a REST interface. Each request is stateless, so the API can scale horizontally
Describe the long polling communication protocol
Long polling is a blend of HTTP(S) and websockets. The client sends a request and the server holds on to the request until an update is available. Once the request is fulfilled, the client submits another request.
Describe the Server-Sent Events (SSE) communication protocol
The Server-Sent Events (SSE) protocol is best for unidirectional communication from the server to the client. The client makes one request and the server sends new data whenever it is available. This is achieved through a long-lived HTTP connection
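A hedged sketch of an SSE endpoint using Flask (the route and event contents are hypothetical; a counter stands in for real updates):
```python
import time
from flask import Flask, Response

app = Flask(__name__)

@app.route("/notifications/stream")
def stream():
    def generate():
        count = 0
        while True:
            count += 1
            yield f"data: update {count}\n\n"  # SSE frames are "data:" lines ending in a blank line
            time.sleep(1)
    return Response(generate(), mimetype="text/event-stream")
```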
Describe the websockets communication protocol
Websockets are best if you need realtime, bidirectional communication between the client and server. Since the client needs to maintain an active connection with the server, this can be troublesome for load balancers. One way to implement websockets is to use a message broker between the client and server. This ensures you don’t need long lived connections to every service in your backend
In the context of security what is authentication/authorization
- Authentication: verifying who the user is (are they who they claim to be)
- Authorization: is the authenticated user allowed to view a specific resource
- API Gateways generally handle auth
- Auth0 is also a good service to handle auth
In the context of security what is encryption
- Data in transit can be handled by protocol encryption (HTTPS with SSL/TLS)
- Data at rest can be handled by storage encryption
- For sensitive data it may be best to encrypt the data with a key that only the user has, so that no one else can view it. This is known as end-to-end (E2E) encryption
In the context of security what is data protection
Data protection is the process of ensuring data is protected from unauthorized access, use, or disclosure.
Using a rate limiter or throttler is a good way to hinder data from being scraped (a minimal sketch follows below)
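A hedged sketch of a fixed-window rate limiter with redis-py (the limit and key naming are hypothetical):
```python
import redis

r = redis.Redis(host="localhost", port=6379)
LIMIT_PER_MINUTE = 100

def allow_request(client_id: str) -> bool:
    key = f"ratelimit:{client_id}"
    count = r.incr(key)        # atomic per-client counter
    if count == 1:
        r.expire(key, 60)      # start a new one-minute window
    return count <= LIMIT_PER_MINUTE
```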
What are the 3 levels of monitoring
- Infrastructure monitoring
- Service-level monitoring
- Application-level monitoring
In the context of monitoring what is infrastructure monitoring
Infrastructure monitoring is monitoring the health and performance of your infrastructure: CPU usage, memory usage, disk usage, and network usage. Tools like Datadog and New Relic are useful
In the context of monitoring what is service-level monitoring
Service-level monitoring is the health and performance of your services: request latency, error rates, and throughput.
In the context of monitoring what is application-level monitoring
Application-level monitoring is the health and performance of your application: the number of users, the number of active sessions, and the number of active connections. These are often key business metrics. Useful tools are Google Analytics and Mixpanel
Describe the pattern Simple DB-backed CRUD service with caching
- Most common pattern for web-based applications
- Load balancer to distribute traffic across multiple instances of your service
Describe the pattern async job worker pool
- For systems that need to process a lot of data and can tolerate a delay
- Queue options: SQS, Kafka
- Worker options: Lambda, EC2 instances
Describe the pattern two stage architecture
- A two-stage architecture is good for scaling an algorithm with poor performance
- In the first stage, we use a fast algorithm to filter out the vast majority of dissimilar items
- In the second stage, we use a slower, more precise algorithm on the remaining candidates (sketched below)
- This architecture is common in:
- Recommendation systems (candidate generators)
- Search Engines (inverted indexes)
- Route planning (ETA services)
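A minimal sketch of the two stages in plain Python; the scoring functions are hypothetical placeholders for a cheap filter and an expensive, precise model:
```python
def cheap_score(query: str, item: str) -> float:
    # Stage 1: fast, rough similarity (token overlap).
    q, i = set(query.split()), set(item.split())
    return len(q & i) / max(len(q | i), 1)

def expensive_score(query: str, item: str) -> float:
    # Stage 2: stand-in for a slow, precise ranker (ML model, exact distance, ...).
    return cheap_score(query, item)

def search(query: str, corpus: list[str], k: int = 100) -> list[str]:
    # Stage 1: keep only the top-k candidates using the cheap score.
    candidates = sorted(corpus, key=lambda it: cheap_score(query, it), reverse=True)[:k]
    # Stage 2: re-rank the small candidate set with the expensive score.
    return sorted(candidates, key=lambda it: expensive_score(query, it), reverse=True)

print(search("quick brown fox", ["quick dog", "brown fox runs", "lazy cat"], k=2))
```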
Describe the pattern event-driven architecture
- Event-Driven Architecture (EDA) is useful in systems where it’s crucial to react to changes in real-time
- Core components are: event producer, event routers (brokers), and event consumers
- Event router options: Kafka, AWS Event Bridge
Describe the pattern durable job processing
- Durable job processing is a system that has jobs that might take hours or days to complete
- The common pattern is to use a checkpointing system to periodically save a worker's progress
- Common distributed durable logs and workflow engines: Kafka, Uber's Cadence, Temporal
Describe the pattern proximity-based services
- Proximity-based services require you to search for entities by location
- Geospatial indexes are key to querying and retrieving entities based on proximity
- Common geospatial solutions: Postgres PostGIS, Redis Geospatial data type, Elasticsearch with geo-queries
- The architecture typically involves dividing the geographical area into manageable regions, thus reducing your search space (a Redis-based sketch follows below)
- Geospatial indexes are only necessary when you need to index hundreds of thousands or millions of items. Otherwise, it’s better to just scan all of the items
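A hedged sketch using Redis geospatial commands via redis-py (assumes Redis 6.2+ and redis-py 4.x; coordinates and key names are hypothetical):
```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Index a few drivers as flat (longitude, latitude, member) triples.
r.geoadd("drivers", [-122.4194, 37.7749, "driver:1",   # San Francisco
                     -122.2712, 37.8044, "driver:2"])  # Oakland

# Find drivers within 10 km of a rider's location.
nearby = r.geosearch("drivers", longitude=-122.41, latitude=37.77, radius=10, unit="km")
print(nearby)  # e.g. [b'driver:1']
```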