Scaling, Caches, LBs Flashcards
Key characteristics of distributed systems
scalability
reliability
availability
efficiency
manageability
Scalability
the capability of a system, process or network to grow and manage increased demand
Horizontal Scaling
Scale out by adding more servers to your pool of resources
Vertical Scaling
Scale up by adding more power (CPU, RAM, Storage) to an existing server
vertical scaling is usually limited to the capacity of a single server
Reliability
Reliability refers to a system's ability to perform its intended functions correctly and consistently
Reliability can be achieved through redundancy
Redundancy
the intentional duplication of critical components or functions of a system with the goal of increasing reliability and eliminating single points of failure
Database Replication
Database replication is the process of copying and synchronizing data from one database to one or more additional databases.
Common in distributed systems, replication ensures data availability, fault tolerance, and scalability.
Synchronous Replication
with synchronous replication, changes made to the primary database are immediately replicated to the replica databases before the write operation is considered complete
this ensures strong consistency between the primary database and its replicas
Asynchronous Replication
a type of database replication where changes made to the primary database are not immediately replicated to the replicas
Instead, changes are queued and replicated to the replicas at a later time.
PROS - faster performance, and writes won't fail because of issues with a replica
CONS - This delay can result in temporary inconsistencies between the primary and replica databases
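To make the tradeoff concrete, here is a minimal sketch (all names hypothetical, replicas modeled as plain dicts): the synchronous write applies to every replica before returning, while the asynchronous write queues the change and returns immediately.

```python
import queue
import threading

class ToyPrimary:
    """Toy primary database contrasting sync vs. async replication."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas          # plain dicts standing in for replica DBs
        self.pending = queue.Queue()      # changes awaiting async replication
        threading.Thread(target=self._drain, daemon=True).start()

    def write_sync(self, key, value):
        self.data[key] = value
        for replica in self.replicas:     # apply to every replica before returning
            replica[key] = value          # (a real system waits on network acks here)
        return "ok"                       # strong consistency, higher latency

    def write_async(self, key, value):
        self.data[key] = value
        self.pending.put((key, value))    # replicas catch up later
        return "ok"                       # low latency, temporary inconsistency

    def _drain(self):
        while True:
            key, value = self.pending.get()
            for replica in self.replicas:
                replica[key] = value
```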
Availability
A measure of the percentage of time that a system, service, or machine remains operational under normal conditions. For example, 99.9% availability ("three nines") allows roughly 8.8 hours of downtime per year.
Reliability vs. Availability
A system is RELIABLE if it’s able to function correctly
A system's AVAILABILITY is the % of the time that it is operational under normal conditions
Good reliability translates to good availability
But availability does not indicate reliability
You can have high availability on an unreliable product by minimizing repair times and ensuring that replicas are always ready
Efficiency
Efficiency of a system is measured through latency and throughput
Latency
The time it takes for a certain operation to complete in a system
Throughput
The number of operations that a system can handle properly per time unit. Commonly measured in RPS (requests per second)
Serviceability
AKA manageability
describes how easy it is to maintain and repair a distributed system
key considerations:
- ease of diagnosing and understanding problems
- ease of making updates / modifications
- how simple the system is to operate
Load Balancer
a device that sits between clients and servers, accepting incoming network traffic and distributing it across multiple backend servers using various algorithms
reduces individual server load
prevents single-point-of-failure
improves availability
HEALTH CHECKS:
important to note that load balancers should keep track of the status of available resources while distributing requests. If a server is unavailable or has an elevated error rate, the LB will stop sending traffic to it
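A minimal health-check sketch under assumed conventions (a /health endpoint returning HTTP 200, a 1-second timeout); a real LB would run this on a timer and also watch error rates:

```python
import urllib.request

def healthy_servers(servers, path="/health", timeout=1.0):
    """Return only the servers whose health endpoint answers with HTTP 200."""
    alive = []
    for host in servers:
        try:
            with urllib.request.urlopen(f"http://{host}{path}", timeout=timeout) as resp:
                if resp.status == 200:
                    alive.append(host)
        except OSError:
            pass  # unreachable or erroring servers receive no traffic
    return alive
```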
Where can we place load balancers?
Between the user and the web servers
Between web servers and an internal platform layer, like application servers or cache servers
Between internal platform layer and a database
Benefits of Load Balancing
- Users experience uninterrupted service
- less downtime / higher throughput
- smart load balancers can use predictive analytics to anticipate bottlenecks
- fewer stressed / failing system components; no one device is shouldering all the work
Least Connection Method
Load Balancing Algorithm
directs traffic to the server with the fewest active connections. useful for evening out distribution between servers
Least Bandwidth Method
Load Balancing Algorithm
directs traffic to the server currently serving the least amount of traffic (measured in Mbps)
Least Response Time Method
Load Balancing Algorithm
directs traffic to the server with the lowest average response time
Round Robin Method
Load Balancing Algorithm
cycles through a list of available servers and sends each new request to the next server. useful when the servers are of equal specification and there are not many persistent connections
Weighted Round Robin Method
Load Balancing Algorithm
Round robin method, but servers are assigned a weight and servers with higher weights get more new connections
IP Hash Method
Load Balancing Algorithm
A hash of the IP address of the client is calculated to redirect the request to a server
Useful for ensuring that requests from the same client go to the same server
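A minimal sketch of three of the algorithms above; the server pool and connection counts are illustrative assumptions:

```python
import hashlib
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical backend pool
active_connections = {s: 0 for s in servers}      # the LB would track these live

rr = cycle(servers)
def round_robin():
    # Cycle through the pool in order.
    return next(rr)

def least_connection():
    # Pick the server with the fewest active connections.
    return min(servers, key=lambda s: active_connections[s])

def ip_hash(client_ip):
    # The same client IP always maps to the same server.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(round_robin())            # 10.0.0.1, then 10.0.0.2, ...
print(ip_hash("203.0.113.7"))   # stable choice for this client
```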
Cache
A cache is like short-term memory for a system. It stores recently or frequently accessed data for fast retrieval.
It's often used to store responses to network requests or the results of computationally expensive operations
Caches can quickly return data without taxing downstream systems
Application Server Cache
cache placed directly on the request layer node
this enables the local storage of response data
each time a request is made to the service, the node will return local data if it exists, or fetch the data from disk
Global Cache
With a global cache, all distributed nodes use the same cache.
it's easy to overwhelm as the number of clients and requests increases
Distributed Cache
The cache is divided up using a consistent hashing function, and each of its nodes owns part of the cached data
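A minimal consistent-hash ring sketch (node names hypothetical); real implementations add many virtual nodes per server for smoother balance:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Each cache node owns the arc of the key space up to its hash."""

    def __init__(self, nodes):
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first node at or after the key's hash.
        idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))  # the node that owns this key
```

Adding or removing a node only remaps the keys on its arc, instead of reshuffling the whole cache.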
CDN
Content Delivery Network OR
Content Distribution Network
CDNs are a kind of cache that we can use for serving large amounts of static media.
A request first asks the CDN for the media; if it is locally available, the CDN serves it. If not, the CDN queries the backend servers, then serves and caches the result
Cache Invalidation
Cache invalidation is maintenance that is required to keep the cache coherent with the source of truth (for example, a database)
If the data is modified in the database, it should be invalidated in the cache
Write-Through Cache
Cache-Writing Policy
data is written into the cache and corresponding database simultaneously
PROS: complete consistency between cache and storage. minimizes risk of data loss
CONS: higher latency (takes longer)
Write-Around Cache
Cache-Writing Policy
Data is written only to storage, bypassing cache
PROS: prevents cache from being flooded with write operations
CONS: a read request for recently-written data will be a cache miss (higher latency)
Write-Back Cache
Cache-Writing Policy
Data is written ONLY to the cache, and writes to storage are done after specified intervals (or under other conditions)
PROS: low latency, high throughput for write-intensive applications
CONS: risk of data loss, since the only up-to-date copy of the data is in the cache
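A toy store contrasting the three policies (the cache and DB are plain dicts; the dirty-key tracking is an illustrative simplification):

```python
class ToyStore:
    def __init__(self):
        self.cache = {}
        self.db = {}
        self.dirty = set()          # keys written to cache but not yet flushed to the DB

    def write_through(self, key, value):
        self.cache[key] = value
        self.db[key] = value        # both writes complete before returning

    def write_around(self, key, value):
        self.db[key] = value        # cache untouched; the next read of key is a miss

    def write_back(self, key, value):
        self.cache[key] = value     # fast: only the cache is written now
        self.dirty.add(key)         # flushed later; lost if the cache dies first

    def flush(self):
        for key in self.dirty:      # the "specified interval" work for write-back
            self.db[key] = self.cache[key]
        self.dirty.clear()
```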
Cache Invalidation Methods
Methods to invalidate a cache
PURGE - removes cached content for a specific object or URL, typically when there’s an update to that content
REFRESH - fetches requested content from the origin server, even if it is available in the cache, then updates the cache
BAN - invalidates cached content based on specific criteria such as URL pattern or header
Time to Live (TTL) - expires cached values after a certain amount of time
Stale While Revalidate - serves cached content immediately, then makes a request to update the cache with fresh content from the server
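A minimal sketch of the TTL and PURGE methods above (the 60-second default is an arbitrary choice):

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self.store = {}                  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None                  # never cached
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]          # TTL expired: invalidate on read
            return None
        return value

    def purge(self, key):
        # PURGE: explicitly drop one object, e.g. after its content changed.
        self.store.pop(key, None)
```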
Read Through Cache
Cache Reading Strategy
if data is not in the cache, the cache makes the read request to the source, updates itself, and returns the data
ideal for scenarios where cache misses are infrequent
Read Around Cache
Cache Reading Strategy
if the data is not in the cache, the application makes the request to the source, then decides how / when to update the cache
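A sketch of both reading strategies; `source` stands in for any loader (a DB query, an HTTP call) and is an assumption:

```python
class ReadThroughCache:
    """Read-through: the CACHE owns the fallback to the source on a miss."""

    def __init__(self, source):
        self.source = source                     # callable: key -> value
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.source(key)   # cache fetches and fills itself
        return self.store[key]

def read_around_get(cache, source, key):
    """Read-around (cache-aside): the APPLICATION handles the miss."""
    value = cache.get(key)
    if value is None:
        value = source(key)                      # app talks to the source directly...
        cache[key] = value                       # ...and decides to populate the cache
    return value
```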
Cache Eviction Policies
FIFO - first in, first out
LIFO - last in, first out
LRU - least recently used
MRU - most recently used
LFU - least frequently used
RR - random replacement
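A minimal sketch of the LRU policy above, using Python's OrderedDict (the capacity of 2 is illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()               # keys kept in recency order

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)              # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)       # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                                   # "a" is now most recent
cache.put("c", 3)                                # evicts "b"
```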