Framework Flashcards
What are the main sections of the delivery framework
- Requirements
- Core Entities
- API or Interface
- Data Flow
- High-level Design
- Deep Dives
Describe the time limit and goal of the requirements section
Time limit: 5 mins
Goal: Gain a clear understanding of the system by breaking requirements into functional and non-functional requirements
What are functional requirements
- Core features of the system being designed
- “Users/Clients should be able to…” statements.
- Requirements should be targeted
- Prioritize the top 3
What are non-functional requirements
- System qualities important to users
- “The system should be able to…” or “The system should be…” statements
- Should be in context of the system and quantified where possible, e.g. “The system should have a low latency search, <500ms” instead of “The system should be low latency”
- Prioritize top 3-5
What are things to consider when creating non-functional requirements
- CAP Theorem: prioritize consistency or availability
- Environment Constraints: Web, mobile, etc.
- Scalability: Bursty traffic at certain times of day, read/write ratio
- Latency: how quickly does the system need to respond to user requests
- Durability: how important is it to not lose data
- Security: Data protection, regulations
- Fault Tolerance: How does the system handle failures
- Compliance: Any legal or regulatory requirements
Describe the time limit and goal of the core entities section
Time limit: 2 minutes
Goal: a bulleted list of the entities in the system
- Who are the core actors in the system?
- What are the nouns or resources necessary to satisfy the functional requirements?
- Use good names for entities
Describe the time limit and goal of the API or system interface section
Time limit: 5 minutes
Goal: Define the contract between the system and its users
- REST, GraphQL, or a wire protocol (generally use REST unless you are concerned with over-fetching)
- Generate a list of endpoints and what they would return (a small example follows below)
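As a hedged illustration only (the endpoints and resource names are hypothetical, for a ticket-booking style system):
```
GET  /events/{eventId}          -> event details
GET  /events/{eventId}/seats    -> available seats for the event
POST /bookings                  -> create a booking; returns a bookingId and status
GET  /bookings/{bookingId}      -> booking details
```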
Describe the time limit and goal of the data flow section
Time limit: 5 minutes
Goal: Describe the high level sequence of actions or processes that the system performs on the inputs to produce the desired outputs
- The data flow output will be a simple list
Describe the time limit and goal of the high level design section
Time limit: 10-15 minutes
Goal: A drawing of components and how they interact
- Ensure the architecture satisfies the functional requirements
- You may be able to go through your API one-by-one and build up your design
- While drawing, talk through the process and how the data flows
- Document relevant columns/fields in the DB
- Stay focused, this is only the high level design, complexity can be added later
Describe the time limit and goal of the deep dive section
Time limit: 10 minutes
Goal: harden the design
- Ensure the design meets all of the non-functional requirements
- Address edge cases
- Identify and address issues and bottlenecks
- Improve the design based on questions from the interviewer
- A senior candidate should identify the above cases and lead the discussion
What is a core database and what are choices for a core database
- A core database is the data storage for your product
- Choices are: Relational (SQL), NoSQL, or blob storage
What is a relational database (RDBMS) and when should you use it
- Relational databases store data in tables (relations) and are well suited to transactions
- This is the default choice for a product design interview
What is a NoSQL database and when should you use it
- NoSQL databases are a broad category of databases that are often schema-less
- Common data models are:
- key-value
- document
- column-family
- graph
- Great candidates for
- Flexible data models
- Scalability
- Handling big data and real-time web apps
What is a blob storage and when should you use it
- Blob storage is used to store large unstructured blobs of data, e.g. video, images, etc.
- You should avoid using blob storage as your primary database
What is a search optimized database and when should you use it
- You should use a search optimized database when you need full-text search
In the context of a search optimized database, what is an inverted index
An inverted index is a data structure that maps words to documents. This allows you to quickly find the documents that contain the words you are searching
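A minimal Python sketch of an inverted index (toy tokenization; the documents are hypothetical):
```python
# Map each token to the set of document ids that contain it.
from collections import defaultdict

docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}

inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():  # naive whitespace tokenization
        inverted_index[token].add(doc_id)

print(inverted_index["quick"])  # {1, 3} -- documents containing "quick"
```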
In the context of a search optimized database, what is tokenization
Tokenization is the process of breaking a piece of text into individual words. This allows the mapping of words to an inverted index
In the context of a search optimized database, what is stemming
Stemming is the process of reducing words to their root form. For example, “running” and “runs” would both be reduced to “run”
In the context of a search optimized database, what is fuzzy search
Fuzzy search is the ability to find words similar to a given search term. This can be done with algorithms like edit distance to find words that might be misspelled
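A minimal sketch using Python's standard-library difflib as a stand-in for the edit-distance style matching a real search engine would use:
```python
import difflib

vocabulary = ["running", "runner", "jumping", "swimming"]
query = "runing"  # misspelled

# Returns the vocabulary words most similar to the query, best match first.
matches = difflib.get_close_matches(query, vocabulary, n=2, cutoff=0.6)
print(matches)  # likely ['running', 'runner']
```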
In the context of a search optimized database, what is scaling
Search optimized databases can be scaled horizontally by adding more nodes to a cluster and sharding across those nodes
What are some examples of search optimized databases
- Elasticsearch
- Postgres with a GIN index
- Redis full text search
In the context of a blob storage, what is durability
Durability relates to the chance of data loss during a failure. Blob storages are quite durable
In the context of a blob storage, what is scalability
Blob storage can be considered effectively infinitely scalable
In the context of a blob storage, what is cost
Blob storages are cheap, generally an order of magnitude cheaper than NoSQL solutions
In the context of a blob storage, what is security
Blob storages have built-in security features like: encryption at rest and in transit, and access control
In the context of a blob storage, what is uploading and downloading from the client
Blob storage services allow you to upload and download directly from the client. They generally utilize presigned URLs
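A hedged sketch using boto3 to generate a presigned upload URL (assumes AWS credentials are configured; the bucket and key names are hypothetical):
```python
import boto3

s3 = boto3.client("s3")
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-app-uploads", "Key": "videos/video-123.mp4"},
    ExpiresIn=3600,  # the URL is valid for one hour
)
# Return upload_url to the client; it uploads the file directly to blob
# storage with an HTTP PUT, bypassing our application servers.
```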
In the context of a blob storage, what is chunking
Because large files are uploaded to and downloaded from blob storage, chunking allows the pieces to be transferred in parallel
In the context of a relational database, what are joins
SQL joins combine data from multiple tables in a single query. Joins can be a performance bottleneck, so it is advisable to minimize them
In the context of a relational database, what are indexes
Indexes are a way of storing data to make it faster to query. Indexes are often implemented using a B-tree or a hash table
In the context of a relational database, what is a transaction
Transactions are a way of grouping multiple operations together so they succeed or fail as a single unit
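A minimal sketch using Python's built-in sqlite3 module: both updates commit together or neither does.
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

with conn:  # opens a transaction; commits on success, rolls back on exception
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")

print(conn.execute("SELECT id, balance FROM accounts").fetchall())  # [(1, 50), (2, 50)]
```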
What are the ACID properties of a relational database
Atomicity: The entire transaction takes place at once or doesn’t happen at all
Consistency: The database must be consistent before and after the transaction
Isolation: Multiple transactions occur independently without interference
Durability: The changes of a successful transaction persist even if a system failure occurs
What is an API gateway and when should you use it
- An API gateway sits in front of your services and routes requests to the appropriate backend services, especially in microservice architectures
- Gateways should be included in almost all product designs as the first point of contact for your clients
- Gateways are typically responsible for things like authentication, rate limiting, and logging
What is a load balancer and when should you use it
A load balancer is useful in times of heavy traffic. It enables horizontal scaling by distributing traffic across multiple machines so no single machine is overloaded
When should you choose a L4 load balancer or a L7 load balancer
Choose an L4 load balancer when you are doing real-time updates with websockets. Otherwise, choose an L7 load balancer
What is a message queue and when should you use it
- A queue’s function is to smooth load across a system.
- A queue should be used to:
- buffer for bursty traffic
- distribute work across a system
- Be careful not to introduce a queue into a synchronous workload, as it will break latency requirements (a minimal producer/worker sketch follows below)
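A hedged producer/worker sketch using boto3 and SQS (the queue URL and message contents are hypothetical):
```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/thumbnail-jobs"

# Producer: enqueue the job instead of doing the slow work in the request path.
sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps({"video_id": "abc123"}))

# Worker: pull jobs off the queue and process them at its own pace.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    job = json.loads(msg["Body"])
    # ... process the job ...
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```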
In the context of a message queue, what is message ordering
It is the way in which messages are ordered in the queue. The most popular is FIFO
In the context of a message queue, what is a retry mechanism
A retry mechanism is a queue's ability to redeliver a message a certain number of times before it is considered a failure
In the context of a message queue, what is a dead letter queue
A dead letter queue is a queue used to store messages that cannot be processed. They are useful for debugging and auditing
In the context of a message queue, what is scaling with partitions
Queues can be partitioned across multiple machines, so increasing the number of partitions (and the machines backing them) scales the queue
In the context of a message queue, what is backpressure
Backpressure is a means of slowing down requests to make sure your system is not overwhelmed
What are streams/event sourcing and when should you use them
- Streams are continuous data flows that are stored and processed for a configurable period of time
- Event sourcing is a technique where application state is stored as a sequence of events, allowing the application state to be reconstructed at any point in time
Common use cases are:
- You need to process large amounts of data in real time
- You need to support complex processing scenarios like event sourcing
- When you need to support multiple consumers reading from the same stream
In the context of a stream, what is scaling with partitions
Partitions can be used to scale streams across multiple servers. Partition keys need to be specified to ensure related events are stored on the same partition
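A hedged sketch with the kafka-python client (the topic, broker address, and key are hypothetical): messages that share a key land on the same partition, so events for one user stay ordered.
```python
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send(
    "ride-events",
    key=b"user-42",                        # partition key
    value=b'{"event": "ride_requested"}',
)
producer.flush()
```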
In the context of a stream, what are multiple consumer groups
A stream can be read by multiple different consumers. One consumer might read a stream to populate a dashboard, while another consumer might populate a database for historical analysis
In the context of a stream, what is replication
Streams can replicate data on multiple servers to ensure that the service is fault tolerant
In the context of a stream, what is windowing
Windowing is a way to group events together based on time or count. This is great for aggregate analytics over a certain time, e.g. 15 mins, 1 hr, etc.
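A minimal pure-Python sketch of tumbling-window counts (stream processors like Flink provide this natively; the events are hypothetical):
```python
from collections import Counter

WINDOW_SECONDS = 900  # 15-minute windows

events = [
    {"ts": 1_700_000_010, "page": "/home"},
    {"ts": 1_700_000_500, "page": "/home"},
    {"ts": 1_700_001_200, "page": "/checkout"},
]

counts_per_window = Counter()
for event in events:
    window_start = event["ts"] - (event["ts"] % WINDOW_SECONDS)  # bucket by window
    counts_per_window[(window_start, event["page"])] += 1

print(counts_per_window)
```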
What are some common streaming technologies
- Kafka
- Flink
- Kinesis
- Spark Streaming
What is a distributed lock and when should you use them
A distributed lock is a way of locking something across multiple systems or processes for a reasonable amount of time. A distributed lock is generally implemented using a distributed key-value store
Common use cases are:
- E-commerce checkout system
- Ride-Sharing matchmaking
- Distributed Cron job
- Online auction bidding system
In the context of a distributed lock, what are locking mechanisms
A locking mechanism is how the lock is implemented. This is typically done using a key-value store. One specific example is Redis using Redlock
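A hedged sketch of a single-node Redis lock using SET NX EX with redis-py (key names are hypothetical; production systems typically use Redlock or a library that releases the lock atomically):
```python
import uuid
import redis

r = redis.Redis(host="localhost", port=6379)
token = str(uuid.uuid4())

# Acquire: succeeds only if the key does not already exist; the 30s expiry
# keeps a crashed process from holding the lock forever.
acquired = r.set("lock:ride:123", token, nx=True, ex=30)

if acquired:
    try:
        ...  # do the critical work
    finally:
        # Simplified release: only delete the lock if we still own it.
        if r.get("lock:ride:123") == token.encode():
            r.delete("lock:ride:123")
```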
In the context of a distributed lock, what is lock expiry
A lock expiry is an expiration time on a lock. This is important to make sure a lock doesn't get stuck in a locked state if a process dies or hangs
In the context of a distributed lock, what is locking granularity
A lock can be used to lock a single resource or a group of resources
In the context of a distributed lock, what are deadlocks
This occurs when two processes are waiting on each other to release a lock.
One process has a lock on A and needs to lock B. A second process has a lock on B and needs to lock A. Both are waiting for each other to release their current lock.
What are some common distributed locking systems
- Redis
- Zookeeper
What are common ways to prevent a deadlock
- Utilize resource ordering: ensure all processes acquire resources in a predefined global order
- Use timeouts: if a process cannot acquire a resource in a reasonable amount of time, it aborts its operation
- Employ a try-lock mechanism: use a non-blocking method that attempts to lock the resource and can try again later if the resource is currently locked (see the sketch below)
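A minimal sketch of two of these techniques with Python threading locks (the "resources" here are just two locks):
```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def do_work():
    # Resource ordering: every code path acquires lock_a before lock_b.
    with lock_a:
        # Try-lock with a timeout: back off instead of waiting forever.
        if lock_b.acquire(timeout=1.0):
            try:
                ...  # work that needs both resources
            finally:
                lock_b.release()
        else:
            ...  # could not get the second lock; back off and retry later

do_work()
```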
What is a distributed cache and when should you use it
A distributed cache is a server or cluster of servers that store frequently used data to help lower latency
Common use cases are:
- Save aggregated metrics
- Reduce the number of DB queries
- Speed up expensive queries
In the context of a distributed cache, what is an eviction policy
An eviction policy is a means of removing items from the cache
Example eviction policies are:
- LRU (Least Recently Used): evicts the least recently accessed item
- LFU (Least Frequently Used): evicts the least frequently accessed items
- FIFO (First In, First Out): queue-based eviction that evicts the oldest item
In the context of a distributed cache, what is a cache invalidation strategy
A cache invalidation strategy is a means to ensure that data being stored in cache is accurate and up to date
In the context of a distributed cache, what is a cache write strategy
A cache write strategy determines how data is written to the cache and the underlying data store
Example strategies are:
- Write-Through Cache: write to the cache and the underlying data store simultaneously (sketched below)
- Write-Around Cache: write to the data store and not to cache
- Write-Back Cache: writes the data to the cache then asynchronously writes to the data store
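A hedged sketch of a write-through cache using redis-py; the "database" here is an in-memory dict standing in for the real primary store:
```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)
database = {}  # stand-in for the primary data store

def save_user(user_id: str, profile: dict) -> None:
    # Write-through: write to the primary store and the cache together.
    database[user_id] = profile
    cache.set(f"user:{user_id}", json.dumps(profile), ex=3600)

def get_user(user_id: str) -> dict:
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    profile = database[user_id]                                  # cache miss: read the store
    cache.set(f"user:{user_id}", json.dumps(profile), ex=3600)   # repopulate the cache
    return profile
```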
What are some common cache systems
- Redis
- Memcached
What is a CDN and when should you use it
A CDN (Content Delivery Network) is a cache that uses distributed servers to deliver content based on a user's geographic region
Common use cases:
- Static assets: images, videos, JavaScript files
- Dynamic content that is accessed frequently, but changes infrequently: e.g. a daily blog post
- Cache API responses to reduce latency
- A social media site might store profile pictures in a CDN to serve them to all users globally
What are some common CDNs
- Cloudflare
- Akamai
- CloudFront
What is the CAP theorem
You can only have 2 out of the 3:
1. Consistency: all nodes/users see the same data at the same time
2. Availability: every request gets a response (successful or not)
3. Partition tolerance: system works despite network failures between nodes
In practice, network partitions are unavoidable in a distributed system, so the real trade-off is between consistency and availability
What is a strongly consistent system
Once data is written to a system, all subsequent reads will reflect the write
What is a weakly consistent or eventually consistent system
Once data is written to a system, subsequent reads might read the old data. Eventually, the new data will be read
What is Change Data Capture (CDC)
Change Data Capture is a process where changes to a (typically relational) database (inserts, updates, deletes) are captured and logged. The results of CDC can be used for auditing or to update other systems, such as updating the index in Elasticsearch
What are the 4 main communication protocols
- HTTP(S)
- Server-Sent Events (SSE)
- Long Polling
- Websockets
Describe the HTTP(S) communication protocol
HTTP(S) is a simple request/response protocol, e.g. a REST interface. Each request is stateless, so the API can scale horizontally
Describe the long polling communication protocol
Long polling is a blend of HTTP(S) and websockets. The client sends a request and the server holds on to the request until an update is available. Once the request is fulfilled, the client submits another request.
Describe the Server-Sent Events (SSE) communication protocol
The Server-Sent Events (SSE) protocol is best for unidirectional communication from the server to the client. The client makes one request and the server sends new data whenever it is available. This is achieved through a long-lived HTTP connection
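A hedged sketch of an SSE endpoint using Flask (the route and event contents are hypothetical; a counter stands in for real updates):
```python
import time
from flask import Flask, Response

app = Flask(__name__)

@app.route("/notifications/stream")
def stream():
    def generate():
        count = 0
        while True:
            count += 1
            yield f"data: update {count}\n\n"  # SSE frames are "data:" lines ending in a blank line
            time.sleep(1)
    return Response(generate(), mimetype="text/event-stream")
```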
Describe the websockets communication protocol
Websockets are best if you need realtime, bidirectional communication between the client and server. Since the client needs to maintain an active connection with the server, this can be troublesome for load balancers. One way to implement websockets is to use a message broker between the client and server. This ensures you don’t need long lived connections to every service in your backend
In the context of security what is authentication/authorization
- Authentication: verifying who the user is (are they who they claim to be)
- Authorization: is the authenticated user allowed to view a specific resource
- API Gateways generally handle auth
- Auth0 is also a good service to handle auth
In the context of security what is encryption
- Data in transit can be handled by protocol encryption (HTTPS with SSL/TLS)
- Data at rest can be handled by storage encryption
- For sensitive data it may be best to encrypt the data with a key that only the user has, so that no one else can view it. This is known as end-to-end (E2E) encryption
In the context of security what is data protection
Data protection is the process of ensuring data is protected from unauthorized access, use, or disclosure.
Using a rate limiter or throttler is a good way to hinder data from being scraped (a minimal sketch follows below)
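A hedged sketch of a fixed-window rate limiter with redis-py (the limit and key naming are hypothetical):
```python
import redis

r = redis.Redis(host="localhost", port=6379)
LIMIT_PER_MINUTE = 100

def allow_request(client_id: str) -> bool:
    key = f"ratelimit:{client_id}"
    count = r.incr(key)        # atomic per-client counter
    if count == 1:
        r.expire(key, 60)      # start a new one-minute window
    return count <= LIMIT_PER_MINUTE
```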
What are the 3 levels of monitoring
- Infrastructure monitoring
- Service-level monitoring
- Application-level monitoring
In the context of monitoring what is infrastructure monitoring
Infrastructure monitoring is monitoring the health and performance of your infrastructure: CPU usage, memory usage, disk usage, and network usage. Tools like Datadog and New Relic are useful
In the context of monitoring what is service-level monitoring
Service-level monitoring is the health and performance of your services: request latency, error rates, and throughput.
In the context of monitoring what is application-level monitoring
Application-level monitoring is the health and performance of your application: the number of users, the number of active sessions, and the number of active connections. These are often key business metrics. Useful tools are Google Analytics and Mixpanel
Describe the pattern Simple DB-backed CRUD service with caching
- Most common pattern for web-based applications
- Load balancer to distribute traffic across multiple instances of your service
Describe the pattern async job worker pool
- For systems that need to process a lot of data and can tolerate a delay
- Queue options: SQS, Kafka
- Worker options: Lambda, EC2 instances
Describe the pattern two stage architecture
- A two-stage architecture is good for scaling an algorithm with poor performance
- In the first stage, we use a fast algorithm to filter out the vast majority of dissimilar items
- In the second stage, we use a slower, more precise algorithm on the remaining candidates (sketched below)
- This architecture is common in:
- Recommendation systems (candidate generators)
- Search Engines (inverted indexes)
- Route planning (ETA services)
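A minimal sketch of the two stages in plain Python; the scoring functions are hypothetical placeholders for a cheap filter and an expensive, precise model:
```python
def cheap_score(query: str, item: str) -> float:
    # Stage 1: fast, rough similarity (token overlap).
    q, i = set(query.split()), set(item.split())
    return len(q & i) / max(len(q | i), 1)

def expensive_score(query: str, item: str) -> float:
    # Stage 2: stand-in for a slow, precise ranker (ML model, exact distance, ...).
    return cheap_score(query, item)

def search(query: str, corpus: list[str], k: int = 100) -> list[str]:
    # Stage 1: keep only the top-k candidates using the cheap score.
    candidates = sorted(corpus, key=lambda it: cheap_score(query, it), reverse=True)[:k]
    # Stage 2: re-rank the small candidate set with the expensive score.
    return sorted(candidates, key=lambda it: expensive_score(query, it), reverse=True)

print(search("quick brown fox", ["quick dog", "brown fox runs", "lazy cat"], k=2))
```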
Describe the pattern event-driven architecture
- Event-Driven Architecture (EDA) is useful in systems where it’s crucial to react to changes in real-time
- Core components are: event producer, event routers (brokers), and event consumers
- Event router options: Kafka, AWS Event Bridge
Describe the pattern durable job processing
- Durable job processing is a system that has jobs that might take hours or days to complete
- The common pattern is to use a checkpointing system to periodically save a worker's progress
- Common distributed durable logs and workflow engines: Kafka, Uber's Cadence, Temporal
Describe the pattern proximity-based services
- Proximity-based services require you to search for entities by location
- Geospatial indexes are key to querying and retrieving entities based on proximity
- Common geospatial solutions: Postgres PostGIS, Redis Geospatial data type, Elasticsearch with geo-queries
- The architecture typically involves dividing the geographical area into manageable regions, thus reducing your search space (a Redis-based sketch follows below)
- Geospatial indexes are only necessary when you need to index hundreds of thousands or millions of items. Otherwise, it’s better to just scan all of the items
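A hedged sketch using Redis geospatial commands via redis-py (assumes Redis 6.2+ and redis-py 4.x; coordinates and key names are hypothetical):
```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Index a few drivers as flat (longitude, latitude, member) triples.
r.geoadd("drivers", [-122.4194, 37.7749, "driver:1",   # San Francisco
                     -122.2712, 37.8044, "driver:2"])  # Oakland

# Find drivers within 10 km of a rider's location.
nearby = r.geosearch("drivers", longitude=-122.41, latitude=37.77, radius=10, unit="km")
print(nearby)  # e.g. [b'driver:1']
```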