Framework Flashcards

1
Q

What are the main sections of the delivery framework

A
  1. Requirements
  2. Core Entities
  3. API or Interface
  4. Data Flow
  5. High-level Design
  6. Deep Dives
2
Q

Describe the time limit and goal of the requirements section

A

Time limit: 5 mins
Goal: Gain a clear understanding of the system by breaking requirements into functional and non-functional requirements

3
Q

What are functional requirements

A
  • Core features of the system being designed
  • “Users/Clients should be able to…” statements.
  • Requirements should be targeted to the core use cases
  • Prioritize the top 3
4
Q

What are non-functional requirements

A
  • System qualities important to users
  • “The system should be able to…” or “The system should be…” statements
  • Should be in the context of the system and quantified where possible, e.g. “The system should have low-latency search, <500ms” instead of “The system should be low latency”
  • Prioritize the top 3-5
5
Q

What are things to consider when creating non-functional requirements

A
  1. CAP Theorem: prioritize consistency or availability
  2. Environment Constraints: Web, mobile, etc.
  3. Scalability: Bursty traffic at certain times of day, read/write ratio
  4. Latency: how quickly does the system need to respond to user requests
  5. Durability: how important is it to not lose data
  6. Security: Data protection, regulations
  7. Fault Tolerance: How does the system handle failures
  8. Compliance: Any legal or regulatory requirements
6
Q

Describe the time limit and goal of the core entities section

A

Time limit: 2 minutes
Goal: a bulleted list of the entities in the system

  • Who are the core actors in the system?
  • What are the nouns or resources necessary to satisfy the functional requirements?
  • Use good names for entities
7
Q

Describe the time limit and goal of the API or system interface section

A

Time limit: 5 minutes
Goal: Define the contract between the system and its users

  • REST, GraphQL, or Wire Protocol (generally use REST unless you are concerned with over-fetching)
  • Generate a list of endpoints and what they would return
8
Q

Describe the time limit and goal of the data flow section

A

Time limit: 5 minutes
Goal: Describe the high level sequence of actions or processes that the system performs on the inputs to produce the desired outputs

  • The data flow output will be a simple list
9
Q

Describe the time limit and goal of the high level design section

A

Time limit: 10-15 minutes
Goal: A drawing of components and how they interact

  • Ensure the architecture satisfies the API and the functional requirements
  • You may be able to go through your API one-by-one and build up your design
  • While drawing, talk through the process and how the data flows
  • Document relevant columns/fields in the DB
  • Stay focused: this is only the high-level design; complexity can be added later
10
Q

Describe the time limit and goal of the deep dive section

A

Time limit: 10 minutes
Goal: harden the design

  • Ensure the design meets all of the non-functional requirements
  • Address edge cases
  • Identify and address issues and bottlenecks
  • Improve the design based on questions from the interviewer
  • A senior candidate should identify the above cases and lead the discussion
11
Q

What is a core database and what are choices for a core database

A
  • A core database is the data storage for your product
  • Choices are: Relational (SQL), NoSQL, Blob
12
Q

What is a relational database (RDBMS) and when should you use it

A
  • Relational databases store data in tables with defined relationships between them and support ACID transactions
  • This is the default choice for a product design interview
13
Q

What is a NoSQL database and when should you use it

A
  • NoSQL databases are a broad category of databases that are often schema-less
  • Common data models are:
    • key-value
    • document
    • column-family
    • graph
  • Great candidates for
    • Flexible data models
    • Scalability
    • Handling big data and real-time web apps
14
Q

What is a blob storage and when should you use it

A
  • A blob storage is used to store large unstructured blobs of data, e.g. video, images, etc.
  • You should avoid using a blob storage as your primary database
15
Q

What is a search optimized database and when should you use it

A
  • You should use a search optimized database when you need full-text search
16
Q

In the context of a search optimized database, what is an inverted index

A

An inverted index is a data structure that maps words to the documents that contain them. This allows you to quickly find the documents that contain the words you are searching for

17
Q

In the context of a search optimized database, what is tokenization

A

Tokenization is the process of breaking a piece of text into individual words (tokens). The tokens become the keys that are mapped to documents in the inverted index
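Inverted indexes and tokenization fit together as sketched below. This is a minimal in-memory illustration, not how Elasticsearch or Postgres implement it; the regex tokenizer and the AND-only query semantics are simplifying assumptions.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercase, then split on any run of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each token to the set of document IDs that contain it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    # Return documents containing every query token (AND semantics).
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

docs = {1: "The quick brown fox", 2: "The lazy dog", 3: "A quick dog"}
index = build_inverted_index(docs)
print(search(index, "quick dog"))  # {3}
```

A query only touches the posting sets for its own tokens instead of scanning every document, which is why full-text search stays fast as the corpus grows.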

18
Q

In the context of a search optimized database, what is stemming

A

Stemming is the process of reducing words to their root form. For example, “running” and “runs” would both be reduced to “run”
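A toy suffix-stripping stemmer illustrates the idea. This is deliberately naive and is not the Porter or Snowball algorithm that real search engines use; the suffix list and the double-consonant rule are assumptions chosen to handle the example above.

```python
VOWELS = set("aeiou")
SUFFIXES = ("ing", "ed", "s")  # toy list, checked in order

def naive_stem(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Only strip if a reasonably long stem remains ("bus" stays "bus").
        if word.endswith(suffix) and len(stem) >= 3:
            # Undo consonant doubling: "runn" -> "run" (but keep "fall", "miss", "buzz").
            if stem[-1] == stem[-2] and stem[-1] not in VOWELS | {"l", "s", "z"}:
                stem = stem[:-1]
            return stem
    return word

print([naive_stem(w) for w in ["running", "runs", "run"]])  # ['run', 'run', 'run']
```

Applying the same stemmer at index time and at query time is what lets a search for “runs” match a document containing “running”.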

19
Q

In the context of a search optimized database, what is fuzzy search

A

Fuzzy search is the ability to find words similar to a given search term. This can be done with algorithms like edit distance to find words that might be misspelled
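Edit distance (Levenshtein distance) counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another. The brute-force vocabulary scan below is an illustrative assumption; production engines use automata or n-gram indexes to avoid comparing against every term.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via dynamic programming, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when chars match)
            ))
        prev = curr
    return prev[-1]

def fuzzy_search(term: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    return [w for w in vocabulary if edit_distance(term, w) <= max_distance]

print(edit_distance("kitten", "sitting"))                      # 3
print(fuzzy_search("serch", ["search", "church", "sketch"]))   # ['search']
```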

20
Q

In the context of a search optimized database, what is scaling

A

Search optimized databases can be scaled horizontally by adding more nodes to a cluster and sharding across those nodes

21
Q

What are some examples of search optimized databases

A
  • Elasticsearch
  • Postgres with a GIN index
  • Redis full-text search
22
Q

In the context of a blob storage, what is durability

A

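Durability refers to the chance of data loss during a failure. Blob storages are highly durable because they replicate data across multiple devices and locations; for example, Amazon S3 is designed for 99.999999999% (11 nines) durability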

23
Q

In the context of a blob storage, what is scalability

A

Blob storages can be considered infinitely scalable

24
Q

In the context of a blob storage, what is cost

A

Blob storages are cheap, generally an order of magnitude cheaper than NoSQL solutions

25
Q

In the context of a blob storage, what is security

A

Blob storages have built-in security features like encryption at rest and in transit, and access control

26
Q

In the context of a blob storage, what is uploading and downloading from the client

A

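Blob storage services allow clients to upload and download directly, bypassing your application servers. They generally use presigned URLs, which grant temporary access to a specific object without exposing your credentials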
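The core idea of a presigned URL is that the server signs the object path plus an expiry time with a secret the client never sees, and the storage service re-computes the signature before serving the request. The sketch below uses a hypothetical HMAC-SHA256 scheme, host, and query parameters of its own invention; it is not the AWS SigV4 format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"server-side-secret"  # hypothetical; stays on the server, never sent to clients

def _sign(bucket: str, key: str, expires: int) -> str:
    payload = f"{bucket}/{key}:{expires}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def presign(bucket: str, key: str, expires_in: int, now: Optional[float] = None) -> str:
    expires = int((time.time() if now is None else now) + expires_in)
    query = urlencode({"expires": expires, "signature": _sign(bucket, key, expires)})
    return f"https://blobs.example.com/{bucket}/{key}?{query}"  # hypothetical host

def verify(url: str, now: Optional[float] = None) -> bool:
    # What the storage service would check before allowing the upload/download.
    parsed = urlparse(url)
    bucket, key = parsed.path.lstrip("/").split("/", 1)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    if (time.time() if now is None else now) > expires:
        return False  # link has expired
    return hmac.compare_digest(params["signature"][0], _sign(bucket, key, expires))

url = presign("my-bucket", "videos/cat.mp4", expires_in=900, now=1000)
print(verify(url, now=1500))  # True
print(verify(url, now=2000))  # False: expired
```

Because any change to the path or expiry invalidates the signature, the client can only touch the one object it was granted, and only for a limited time.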

27
Q

In the context of a blob storage, what is chunking

A

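Because blob storage often holds large files, chunking splits a file into smaller parts that can be uploaded or downloaded in parallel, and a failed part can be retried on its own instead of restarting the whole transfer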
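A minimal sketch of chunked, parallel upload: split the data into numbered parts, upload them concurrently, then reassemble in part order. `FakeBlobStore` is a hypothetical in-memory stand-in for a multipart-upload API, and the 5-byte chunk size is only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 5  # bytes; tiny for illustration, real multipart uploads use MB-sized parts

def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[tuple[int, bytes]]:
    # (part_number, chunk) pairs so parts can arrive in any order.
    return [(i // size, data[i:i + size]) for i in range(0, len(data), size)]

class FakeBlobStore:
    """Hypothetical stand-in for a blob store's multipart-upload API."""

    def __init__(self):
        self.parts: dict[int, bytes] = {}

    def upload_part(self, part_number: int, chunk: bytes) -> int:
        self.parts[part_number] = chunk
        return part_number

    def complete(self) -> bytes:
        # Stitch parts back together in part-number order.
        return b"".join(self.parts[n] for n in sorted(self.parts))

def parallel_upload(data: bytes, store: FakeBlobStore, workers: int = 4) -> bytes:
    chunks = split_into_chunks(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all uploads to finish and re-raises any part failure.
        list(pool.map(lambda c: store.upload_part(*c), chunks))
    return store.complete()

store = FakeBlobStore()
print(parallel_upload(b"hello chunked world!", store))  # b'hello chunked world!'
```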

28
Q

In the context of a relational database, what are joins

A

SQL joins combine rows from multiple tables based on related columns. Joins can be a performance bottleneck, so it is advisable to minimize them

29
Q

In the context of a relational database, what are indexes

A

Indexes are auxiliary data structures that make queries faster by avoiding full table scans. Indexes are often implemented using a B-tree or a hash table
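To see an index change how a query runs, the sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database (SQLite indexes are B-trees). The table, data, and index name are made up for illustration; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def query_plan(sql, params=()):
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last is a readable description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

lookup = "SELECT name FROM users WHERE email = ?"
before_plan = query_plan(lookup, ("user42@example.com",))  # full table SCAN
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after_plan = query_plan(lookup, ("user42@example.com",))   # SEARCH using idx_users_email
print(before_plan)
print(after_plan)
```

The trade-off is that every insert or update must also maintain the index, so index the columns you actually filter or join on.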

30
Q

In the context of a relational database, what is a transaction

A

Transactions group multiple operations into a single unit of work that either succeeds completely or fails completely

31
Q

What are the ACID properties of a relational database

A

Atomicity: The entire transaction takes place at once or doesn’t happen at all
Consistency: The database must be consistent before and after the transaction
Isolation: Multiple transactions occur independently without interference
Durability: Once a transaction is committed, its changes persist even if a system failure occurs
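Atomicity and consistency are easiest to see in a money transfer. This sketch uses Python's `sqlite3` module; the `accounts` table and the `CHECK (balance >= 0)` constraint are illustrative assumptions. Using the connection as a context manager commits on success and rolls back if an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))

def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

transfer("alice", "bob", 30)        # both updates commit together
try:
    transfer("bob", "alice", 500)   # credit runs, then the debit violates the CHECK
except sqlite3.IntegrityError:
    pass                            # rollback undid the credit as well
print(balances())  # {'alice': 70, 'bob': 80}
```

Without the transaction, the failed transfer would have left Alice credited 500 with no matching debit.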

32
Q

What is an API gateway and when should you use it

A
  • An API gateway sits in front of your services and routes requests to the appropriate backend services, especially in microservice architectures
  • Gateways should be included in almost all product designs as the first point of contact for your clients
  • Gateways are typically responsible for things like authentication, rate limiting, and logging
33
Q

What is a load balancer and when should you use it

A

A load balancer is useful in times of heavy traffic. It allows horizontal scaling by routing traffic to different machines to avoid overloading a single machine
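A load balancer also needs a rule for choosing the next backend. The card does not name one, so the two policies below are illustrative choices: round robin rotates through servers evenly, while least-connections favors the server with the fewest open connections, which suits long-lived connections. Backend names are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Each request goes to the next server in a fixed rotation.
        return next(self._cycle)

class LeastConnectionsBalancer:
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # open connections per backend

    def acquire(self):
        # Ties go to the earliest-listed backend.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1

rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([rr.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
```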

34
Q

When should you choose an L4 load balancer or an L7 load balancer

A

Choose an L4 load balancer when you are doing real-time updates over persistent connections such as WebSockets. Otherwise, choose an L7 load balancer

35
Q

What is a message queue and when should you use it

A
  • A queue’s function is to smooth load across a system.
  • A queue should be used to:
    • buffer for bursty traffic
    • distribute work across a system

Be careful not to introduce a queue into a synchronous workload, as the added delay can break latency requirements

36
Q

In the context of a message queue, what is message ordering

A

Message ordering is the order in which messages are delivered to consumers. The most common is FIFO

37
Q

In the context of a message queue, what is a retry mechanism

A

A retry mechanism is a queue’s ability to redeliver a message a certain number of times before it’s considered a failure

38
Q

In the context of a message queue, what is a dead letter queue

A

A dead letter queue is a queue used to store messages that cannot be processed. They are useful for debugging and auditing
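Retries and a dead letter queue work together, as in the minimal in-memory sketch below. It is a simulation, not a real broker such as SQS or RabbitMQ; `max_attempts=3` and the sample messages are assumptions.

```python
from collections import deque

def process_with_retries(messages, handler, max_attempts=3):
    queue = deque((msg, 0) for msg in messages)  # (message, failed attempts so far)
    processed, dead_letter_queue = [], []
    while queue:
        msg, attempts = queue.popleft()
        try:
            handler(msg)
            processed.append(msg)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letter_queue.append(msg)  # give up: park it for debugging/auditing
            else:
                queue.append((msg, attempts))  # redeliver later
    return processed, dead_letter_queue

calls = {}

def handler(msg):
    calls[msg] = calls.get(msg, 0) + 1
    # "flaky" fails once then succeeds; "poison" always fails.
    if msg == "poison" or (msg == "flaky" and calls[msg] == 1):
        raise ValueError(f"cannot process {msg}")

processed, dlq = process_with_retries(["ok", "flaky", "poison"], handler)
print(processed, dlq)  # ['ok', 'flaky'] ['poison']
```

Transient failures recover on retry, while a poison message stops consuming resources after a bounded number of attempts instead of blocking the queue forever.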

39
Q

In the context of a message queue, what is scaling with partitions

A

Queues can be partitioned across multiple machines, so increasing the number of partitions (and the machines hosting them) scales the queue

40
Q

In the context of a message queue, what is backpressure

A

Backpressure is a means of slowing down producers (e.g., rejecting or delaying new messages) when consumers can’t keep up, so your system is not overwhelmed
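The simplest form of backpressure is a bounded buffer that refuses work when full. This sketch uses Python's standard `queue.Queue` with an assumed capacity of 3; what the caller does on rejection (return HTTP 429/503, retry with backoff) is left as an assumption.

```python
import queue

buffer = queue.Queue(maxsize=3)  # the bound is what creates backpressure

def try_enqueue(item) -> bool:
    # Non-blocking put: when the buffer is full, reject so the producer can back off
    # instead of letting memory and latency grow without limit.
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False

print([try_enqueue(i) for i in range(5)])  # [True, True, True, False, False]
buffer.get_nowait()                        # a consumer drains one item...
print(try_enqueue(99))                     # ...so there is room again: True
```

A blocking `buffer.put(item, timeout=...)` is the alternative: it slows the producer down rather than rejecting outright.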

41
Q

What are streams/event sourcing and when should you use them

A
  • Streams are continuous data flows that are stored and processed for a configurable period of time
  • Event sourcing is a technique where application state is stored as a sequence of events, allowing the application state to be reconstructed at any point in time

Common use cases are:
- You need to process large amounts of data in real time
- You need to support complex processing scenarios like event sourcing
- When you need to support multiple consumers reading from the same stream

42
Q

In the context of a stream, what is scaling with partitions

A

Partitions can be used to scale streams across multiple servers. Partition keys need to be specified to ensure related events are stored on the same partition
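Choosing a partition from a partition key is usually a hash modulo the partition count. This sketch uses MD5 only as a stable, process-independent hash (Python's built-in `hash()` of strings is randomized per process); Kafka's default partitioner uses a different hash (murmur2), so treat the exact mapping as an assumption.

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Same key -> same hash -> same partition, so a key's events stay in order.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

events = [("user-1", "login"), ("user-2", "click"), ("user-1", "purchase"), ("user-3", "logout")]
partitions: dict[int, list] = {p: [] for p in range(NUM_PARTITIONS)}
for key, event in events:
    partitions[partition_for(key)].append((key, event))

print(partitions)
```

Note that changing `NUM_PARTITIONS` remaps most keys, which is why partition counts are usually chosen up front.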

43
Q

In the context of a stream, what are multiple consumer groups

A

A stream can be read by multiple different consumers. One consumer might read a stream to populate a dashboard, while another consumer might populate a database for historical analysis

44
Q

In the context of a stream, what is replication

A

Streams can replicate data on multiple servers to ensure that the service is fault tolerant

45
Q

In the context of a stream, what is windowing

A

Windowing is a way to group events together based on time or count. This is great for aggregate analytics over a certain time, e.g. 15 mins, 1 hr, etc.
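A tumbling window is the simplest time-based window: fixed-size, non-overlapping buckets. The sketch below counts events per 60-second window in plain Python; it ignores late or out-of-order events, which real stream processors such as Flink handle with watermarks.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    # events: iterable of (timestamp_seconds, key). Each event lands in exactly one
    # window, identified by the window's start time.
    counts = Counter()
    for ts, key in events:
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return counts

events = [(0, "click"), (30, "click"), (59, "view"), (60, "click"), (125, "click")]
print(tumbling_window_counts(events, 60))
```

Here the window `[0, 60)` holds two clicks and one view, while `[60, 120)` and `[120, 180)` each hold one click.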

46
Q

What are some common streaming technologies

A
  • Kafka
  • Flink
  • Kinesis
  • Spark Streaming
47
Q

What is a distributed lock and when should you use it

A

A distributed lock is a way of locking something across multiple systems or processes for a reasonable amount of time. A distributed lock is generally implemented using a distributed key-value store

Common use cases are:
- E-commerce checkout system
- Ride-Sharing matchmaking
- Distributed Cron job
- Online auction bidding system

48
Q

In the context of a distributed lock, what are locking mechanisms

A

A locking mechanism is how the lock is implemented. This is typically done using a key-value store. One specific example is Redis using Redlock

49
Q

In the context of a distributed lock, what is lock expiry

A

A lock expiry is an expiration time (TTL) on a lock. This is important to make sure a lock doesn’t get stuck in a locked state if the process holding it dies or hangs
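Lock expiry can be sketched with a single-process simulation of a key-value store. `FakeKV` is hypothetical; its `set_if_absent` mimics the idea behind Redis `SET key value NX PX <ttl>` (TTL in seconds here for simplicity). This is a single-node illustration, not Redlock.

```python
import time
import uuid

class FakeKV:
    """Single-process stand-in for a key-value store such as Redis (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}  # key -> (value, expires_at)

    def set_if_absent(self, key, value, ttl):
        # Succeed only if the key is missing or the previous holder's expiry has passed.
        entry = self.data.get(key)
        if entry is not None and entry[1] > self.clock():
            return False
        self.data[key] = (value, self.clock() + ttl)
        return True

    def delete_if_value(self, key, value):
        # Release only if we still own the lock (in Redis this is usually a Lua script).
        entry = self.data.get(key)
        if entry is not None and entry[0] == value and entry[1] > self.clock():
            del self.data[key]
            return True
        return False

def acquire_lock(kv, resource, ttl):
    token = str(uuid.uuid4())  # unique per holder, so nobody else can release our lock
    return token if kv.set_if_absent(f"lock:{resource}", token, ttl) else None

def release_lock(kv, resource, token):
    return kv.delete_if_value(f"lock:{resource}", token)

now = [0.0]                                     # controllable fake clock
kv = FakeKV(clock=lambda: now[0])
holder = acquire_lock(kv, "seat-12A", ttl=10)   # succeeds
print(acquire_lock(kv, "seat-12A", ttl=10))     # None: still held
now[0] = 11.0                                   # holder hung; its lock expired
new_holder = acquire_lock(kv, "seat-12A", ttl=10)
print(release_lock(kv, "seat-12A", holder))     # False: stale holder can't release
```

The unique token matters as much as the TTL: without it, a process waking up after its lock expired could delete a lock that now belongs to someone else.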

50
Q

In the context of a distributed lock, what is locking granularity

A

A lock can be used to lock a single resource or a group of resources

51
Q

In the context of a distributed lock, what are deadlocks

A

This occurs when two processes are waiting on each other to release a lock.

One process holds lock A and needs to lock B. A second process holds lock B and needs to lock A. Both wait forever for the other to release its lock.

52
Q

What are some common distributed locking systems

A
  • Redis
  • ZooKeeper
53
Q

What are common ways to prevent a deadlock

A
  1. Utilize resource ordering to avoid deadlocks
    - Ensure all processes acquire resources in a predefined global order
  2. Use timeouts
    - If a process cannot acquire a resource in a reasonable amount of time, it aborts its operation
  3. Employ a try-lock mechanism
    - Use a non-blocking method that attempts to lock the resource and can try again later if the resource is currently locked
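Resource ordering and try-lock with a timeout can both be shown with local `threading.Lock` objects standing in for distributed locks (an assumption for illustration; the principle is the same across machines). The account names and amounts are hypothetical.

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}
balances = {"A": 100, "B": 100}

def transfer(src, dst, amount):
    # Resource ordering: always lock in sorted order, so opposite transfers
    # (A->B and B->A) can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst))
    with locks[first], locks[second]:
        balances[src] -= amount
        balances[dst] += amount

def try_transfer(src, dst, amount, timeout=0.1):
    # Timeout / try-lock: give up instead of waiting forever.
    first, second = sorted((src, dst))
    if not locks[first].acquire(timeout=timeout):
        return False
    try:
        if not locks[second].acquire(timeout=timeout):
            return False
        try:
            balances[src] -= amount
            balances[dst] += amount
            return True
        finally:
            locks[second].release()
    finally:
        locks[first].release()

threads = [threading.Thread(target=transfer, args=("A", "B", 1)) for _ in range(100)]
threads += [threading.Thread(target=transfer, args=("B", "A", 1)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balances)  # {'A': 100, 'B': 100}
```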
54
Q

What is a distributed cache and when should you use it

A

A distributed cache is a server or cluster of servers that store frequently used data to help lower latency

Common use cases are:
- Save aggregated metrics
- Reduce the number of DB queries
- Speed up expensive queries

55
Q

In the context of a distributed cache, what is an eviction policy

A

An eviction policy determines which items are removed from the cache when it is full

Example eviction policies are:
- LRU (Least Recently Used): evicts the item that was accessed least recently
- LFU (Least Frequently Used): evicts the item accessed the fewest times
- FIFO (First In, First Out): evicts the item that was inserted first
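LRU is commonly built from a hash map plus an ordering of recent use; Python's `collections.OrderedDict` provides both. A minimal single-node sketch (a distributed cache such as Redis implements its own approximated LRU internally):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # front = least recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.put("c", 3)  # evicts "b", not "a", even though "a" was inserted first
```

The last line is where LRU differs from FIFO: a FIFO cache would have evicted `"a"`.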

56
Q

In the context of a distributed cache, what is a cache invalidation strategy

A

A cache invalidation strategy is a means to ensure that data being stored in cache is accurate and up to date

57
Q

In the context of a distributed cache, what is a cache write strategy

A

A cache write strategy determines how writes are applied to the cache and the underlying data store

Example strategies are:
- Write-Through Cache: write to cache and underlying data store simultaneously
- Write-Around Cache: write to the data store and not to cache
- Write-Back Cache: writes the data to the cache then asynchronously writes to the data store
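The three write strategies differ only in where and when a write lands. In this sketch plain dicts stand in for the cache and the data store; the write-around version also drops any stale cached copy, which is a common companion step and an assumption here.

```python
class WriteThroughCache:
    def __init__(self, store):
        self.store, self.cache = store, {}

    def write(self, key, value):
        self.store[key] = value  # write the data store...
        self.cache[key] = value  # ...and the cache in the same request

class WriteAroundCache:
    def __init__(self, store):
        self.store, self.cache = store, {}

    def write(self, key, value):
        self.store[key] = value
        self.cache.pop(key, None)  # skip the cache; invalidate any stale copy

class WriteBackCache:
    def __init__(self, store):
        self.store, self.cache, self.dirty = store, {}, set()

    def write(self, key, value):
        self.cache[key] = value  # acknowledge after updating only the cache
        self.dirty.add(key)

    def flush(self):
        # Runs later (e.g., on a timer). Writes since the last flush are lost
        # if the cache node dies first: the durability cost of write-back.
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()

db = {}
wb = WriteBackCache(db)
wb.write("views:42", 10)
print(db)  # {}: not persisted yet
wb.flush()
print(db)  # {'views:42': 10}
```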

58
Q

What are some common cache systems

A
  • Redis
  • Memcached
59
Q

What is a CDN and when should you use it

A

A CDN (Content Delivery Network) is a cache that uses geographically distributed servers to deliver content from a location close to the user

Common use cases:
- Static assets: images, videos, JavaScript files
- Dynamic content that is accessed frequently, but changes infrequently: e.g. a daily blog post
- Cache API responses to reduce latency
- Social media might store profile pictures in a CDN to serve to all users globally

60
Q

What are some common CDNs

A
  • Cloudflare
  • Akamai
  • CloudFront
61
Q

What is the CAP theorem

A

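A distributed system can only guarantee 2 out of the 3 properties below. Because network partitions cannot be avoided in practice, the real choice is between consistency and availability when a partition occurs: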
1. Consistency: all nodes/users see the same data at the same time
2. Availability: every request gets a response (successful or not)
3. Partition tolerance: system works despite network failures between nodes

62
Q

What is a strongly consistent system

A

Once data is written to a system, all subsequent reads will reflect the write

63
Q

What is a weakly consistent or eventually consistent system

A

Once data is written to a system, subsequent reads might read the old data. Eventually, the new data will be read

64
Q

What is Change Data Capture (CDC)

A

Change Data Capture is a process that captures row-level changes (inserts, updates, deletes) made to a database, often by reading its transaction log, and publishes them as a stream of events. The results of CDC can be used for auditing or to update other systems, such as updating an Elasticsearch index

65
Q

What are the 4 main communication protocols

A
  1. HTTP(S)
  2. Server-Sent Events (SSE)
  3. Long Polling
  4. WebSockets
66
Q

Describe the HTTP(S) communication protocol

A

HTTP(S) is a simple request/response protocol, typically used for REST APIs. Each request is stateless, so the API can scale horizontally

67
Q

Describe the long polling communication protocol

A

Long polling is a blend of HTTP(S) and WebSockets. The client sends a request and the server holds the request open until an update is available (or a timeout expires). Once the response is received, the client immediately sends another request.
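The server side of long polling amounts to holding a request until there is something new. This in-process sketch (no real HTTP) uses `threading.Condition.wait_for` with a timeout; the cursor-based message list is an assumed design so each client can ask only for messages it has not seen.

```python
import threading

class LongPollChannel:
    """In-process sketch of a long-poll endpoint's server side (no real HTTP)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._messages = []

    def publish(self, msg):
        with self._cond:
            self._messages.append(msg)
            self._cond.notify_all()  # wake every request currently held open

    def poll(self, cursor, timeout):
        # Hold the request until there is a message past `cursor` or the timeout
        # expires; returns (new messages, next cursor). The client then polls again.
        with self._cond:
            self._cond.wait_for(lambda: len(self._messages) > cursor, timeout=timeout)
            return self._messages[cursor:], len(self._messages)

channel = LongPollChannel()
threading.Timer(0.1, channel.publish, args=("new message",)).start()
messages, cursor = channel.poll(cursor=0, timeout=5)  # returns after ~0.1 s, not 5 s
print(messages, cursor)  # ['new message'] 1
```

If nothing arrives before the timeout, the client simply gets an empty list and reconnects with the same cursor.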

68
Q

Describe the Server-Sent Events (SSE) communication protocol

A

The Server-Sent Events (SSE) protocol is best for unidirectional communication from the server to the client. The client can make one request and the server can send new data whenever available. This is achieved through a long-lived HTTP connection

69
Q

Describe the WebSockets communication protocol

A

WebSockets are best if you need real-time, bidirectional communication between the client and server. Since the client needs to maintain a persistent connection with the server, this can be troublesome for load balancers. A common approach is to terminate WebSocket connections at a dedicated connection service and place a message broker between it and the backend services. This ensures you don’t need long-lived connections to every service in your backend

70
Q

In the context of security what is authentication/authorization

A
  • Authentication: verifying who the user is (is this user allowed on the system?)
  • Authorization: deciding what an authenticated user may do (are they allowed to view a specific resource?)
  • API gateways generally handle auth
  • Auth0 is also a good service to handle auth
71
Q

In the context of security what is encryption

A
  • Data in transit can be protected by protocol encryption (TLS, as used by HTTPS)
  • Data at rest can be protected by storage encryption
  • For sensitive data it may be best to encrypt the data with keys that only the communicating users hold, so that no one else, including the service provider, can read it. This is known as end-to-end (E2E) encryption
72
Q

In the context of security what is data protection

A

Data protection is the process of ensuring data is protected from unauthorized access, use, or disclosure.

Using a rate limiter or throttler is a good way to hinder data from being scraped
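A token bucket is one common rate-limiting algorithm (the card does not prescribe one, so this choice is an assumption). Each client gets a bucket that refills at a steady rate and allows short bursts up to its capacity. The injectable `clock` parameter exists only to make the example deterministic.

```python
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would typically respond with HTTP 429

now = [0.0]
bucket = TokenBucket(capacity=3, refill_per_sec=1, clock=lambda: now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
now[0] = 1.0
print(bucket.allow())  # True: one token refilled
```

In a distributed deployment the bucket state per API key or IP would live in a shared store rather than in process memory.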

73
Q

What are the 3 levels of monitoring

A
  1. Infrastructure monitoring
  2. Service-level monitoring
  3. Application-level monitoring
74
Q

In the context of monitoring what is infrastructure monitoring

A

Infrastructure monitoring is monitoring the health and performance of your infrastructure: CPU usage, memory usage, disk usage, and network usage. Tools like Datadog and New Relic are useful

75
Q

In the context of monitoring what is service-level monitoring

A

Service-level monitoring is the health and performance of your services: request latency, error rates, and throughput.

76
Q

In the context of monitoring what is application-level monitoring

A

Application-level monitoring is the health and performance of your application: the number of users, the number of active sessions, and the number of active connections. This could include key business metrics. Useful tools are Google Analytics and Mixpanel

77
Q

Describe the pattern Simple DB-backed CRUD service with caching

A
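  • Most common pattern for web-based applications
  • A load balancer distributes traffic across multiple instances of your service
  • A cache (e.g., Redis) in front of the database serves frequently read data and reduces DB load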
78
Q

Describe the pattern async job worker pool

A
  • For systems that need to process a lot of data and can tolerate a delay
  • Queue options: SQS, Kafka
  • Worker options: Lambda functions, EC2 instances
79
Q

Describe the pattern two stage architecture

A
  • A two-stage architecture is good for scaling an algorithm with poor performance
  • In the first stage, we use a fast algorithm to filter out the vast majority of dissimilar items
  • In the second stage, we use a slower algorithm that is more precise
  • This architecture is common in:
    • Recommendation systems (candidate generators)
    • Search Engines (inverted indexes)
    • Route planning (ETA services)
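A toy recommender shows the two stages. The item data, tag filter, and cosine-similarity scorer are illustrative assumptions; in a real system stage 1 would be an index lookup (e.g., an inverted index on tags) so it never touches most items.

```python
import math

def cosine(a, b):
    # Assumes non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def two_stage_recommend(user_tags, user_vec, items, k_candidates=100, k_final=3):
    # Stage 1 (cheap, coarse): keep only items sharing at least one tag with the user.
    candidates = [it for it in items if user_tags & it["tags"]][:k_candidates]
    # Stage 2 (expensive, precise): score just the survivors and rank them.
    ranked = sorted(candidates, key=lambda it: cosine(user_vec, it["vec"]), reverse=True)
    return [it["id"] for it in ranked[:k_final]]

items = [
    {"id": 1, "tags": {"sports"}, "vec": (1.0, 0.0)},
    {"id": 2, "tags": {"music"}, "vec": (1.0, 0.0)},
    {"id": 3, "tags": {"sports", "news"}, "vec": (0.6, 0.8)},
    {"id": 4, "tags": {"news"}, "vec": (0.0, 1.0)},
]
print(two_stage_recommend({"sports"}, (1.0, 0.0), items))  # [1, 3]
```

Item 2 would have scored perfectly in stage 2 but never reached it: the price of a coarse first stage is occasionally filtering out a good match.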
80
Q

Describe the pattern event-driven architecture

A
  • Event-Driven Architecture (EDA) is useful in systems where it’s crucial to react to changes in real-time
  • Core components are: event producer, event routers (brokers), and event consumers
  • Event router options: Kafka, Amazon EventBridge
81
Q

Describe the pattern durable job processing

A
  • Durable job processing is for systems with jobs that might take hours or days to complete
  • The common pattern is to use a checkpointing system to periodically save a worker’s progress, so a replacement worker can resume after a failure
  • Common tools: Kafka (a distributed durable log), Uber’s Cadence, Temporal
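Checkpointing boils down to periodically persisting "how far have I got" and reading it back on restart. This sketch writes a JSON checkpoint to a local temp file (an assumption; real systems persist to a durable store or use Temporal/Cadence) and simulates a worker crash.

```python
import json
import os
import tempfile

def run_job(items, checkpoint_path, process, checkpoint_every=2):
    # Resume from the last saved position if a checkpoint exists.
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for i in range(start, len(items)):
        process(items[i])
        done = i + 1
        if done % checkpoint_every == 0 or done == len(items):
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"next_index": done}, f)
            os.replace(tmp, checkpoint_path)  # atomic rename on POSIX: no torn checkpoint

path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
seen = []

def crashes_on_3(x):
    if x == 3:
        raise RuntimeError("worker died")
    seen.append(x)

try:
    run_job([0, 1, 2, 3, 4], path, crashes_on_3)
except RuntimeError:
    pass
run_job([0, 1, 2, 3, 4], path, seen.append)  # a new worker resumes from item 2
print(seen)  # [0, 1, 2, 2, 3, 4]
```

Item 2 ran twice because it finished after the last checkpoint: checkpointing gives at-least-once processing, so each step should be idempotent.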
82
Q

Describe the pattern proximity-based services

A
  • Proximity-based services require you to search for entities by location
  • Geospatial indexes are key to querying and retrieving entities based on proximity
  • Common geospatial solutions: Postgres PostGIS, Redis geospatial data type, Elasticsearch with geo-queries
  • The architecture typically involves dividing the geographical area into manageable regions, thus reducing your search space
  • Geospatial indexes are only necessary when you need to index hundreds of thousands or millions of items. Otherwise, it’s better to just scan all of the items
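Dividing the map into regions can be sketched with a fixed lat/lon grid: items are bucketed by cell, and a query checks only its own cell and the 8 neighbors before computing exact distances. This is a simplified stand-in for geohashes or quadtrees; the 0.01° cell size and the locations are assumptions, and the search assumes the radius is smaller than one cell's width.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size in degrees (~1.1 km north-south)

def cell_of(lat, lon):
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance on a sphere with Earth's mean radius.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)

    def add(self, item_id, lat, lon):
        self.cells[cell_of(lat, lon)].append((item_id, lat, lon))

    def nearby(self, lat, lon, radius_km):
        # Examine only the query's cell and its 8 neighbors, not every item.
        cx, cy = cell_of(lat, lon)
        hits = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for item_id, ilat, ilon in self.cells.get((cx + dx, cy + dy), []):
                    d = haversine_km(lat, lon, ilat, ilon)
                    if d <= radius_km:
                        hits.append((d, item_id))
        return [item_id for _, item_id in sorted(hits)]

index = GridIndex()
index.add("cafe", 40.7128, -74.0060)
index.add("bakery", 40.7138, -74.0060)  # ~110 m north
index.add("museum", 40.7800, -73.9630)  # several km away, different cell
print(index.nearby(40.7128, -74.0060, radius_km=0.5))  # ['cafe', 'bakery']
```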