8 - Load Balancing Flashcards
Where might load balancing apply?
Master-worker parallel computation
Traffic management
Collision avoidance
Challenges of load balancing
Procedure complexity
Outdated info
Fault Tolerance
Scalability
Heterogeneous hardware - different speeds/capacity
Static load balancing
Doesn’t consider state
All set in advance
Static: Attributes
Do not change
Tasks need to be regular/predictable
Round robin algorithm
Assign each ball to the next bin in turn; after the last bin, start again from the first (sketch below, after the weighted variant)
Weighted round robin
A larger or faster bin is given proportionally more balls/jobs
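A minimal sketch of both schemes; the bin names and weights are illustrative:

```python
from itertools import cycle

# Plain round robin: hand each incoming ball/job to the next bin in a fixed cycle.
def round_robin(jobs, bins):
    assignment = {b: [] for b in bins}
    ring = cycle(bins)
    for job in jobs:
        assignment[next(ring)].append(job)
    return assignment

# Weighted round robin: a bin with weight w appears w times in the cycle,
# so a larger/faster bin receives proportionally more jobs.
def weighted_round_robin(jobs, weights):
    expanded = [b for b, w in weights.items() for _ in range(w)]
    return round_robin(jobs, expanded)

print(weighted_round_robin(range(9), {"big": 2, "small": 1}))
# {'big': [0, 1, 3, 4, 6, 7], 'small': [2, 5, 8]}
```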
Round Robin General problem
Because state is not considered, jobs of different durations or temporarily slow servers degrade the balance.
Simple Randomised
For each ball, choose a random bin. No state info
Simple randomised: Examples
Client-side load balancing - randomly choose an IP
Hash tables - hash value can be used
Hash function with n buckets
Simple hash using modulo, e.g. h(ID) = ID mod n
h(30) with n = 6 buckets: 30 mod 6 = 0
h(11) with n = 6 buckets: 11 mod 6 = 5
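The modulo example above as a quick, runnable check (a sketch; n = 6 buckets comes from the card):

```python
def h(item_id: int, n: int) -> int:
    # Hash an item ID to one of n buckets using simple modulo.
    return item_id % n

assert h(30, 6) == 0   # 30 mod 6 = 0
assert h(11, 6) == 5   # 11 mod 6 = 5
```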
Random vs Hash
Random: the same item cannot be located again later.
Hashing: an item can be found again knowing its ID and the hash function.
Dynamic Load Balancer
Based on current state.
Can move tasks where necessary
Allows greater modularity
Least loaded Bin
A server-side balancer forwards each request to the backend with the fewest queued jobs (sketch below, together with least-time).
Least time allocation
Allocate to the bin that will serve the ball at the earliest time; considers both server speeds and queue lengths.
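A minimal sketch of both dynamic rules; the queue and speed fields are illustrative assumptions:

```python
# Least loaded: forward the request to the backend with the fewest queued jobs.
def least_loaded(backends):
    return min(backends, key=lambda b: b["queue"])

# Least time: forward to the backend that would finish the request earliest,
# accounting for both its queue length and its processing speed.
def least_time(backends):
    return min(backends, key=lambda b: b["queue"] / b["speed"])

backends = [{"name": "s1", "queue": 4, "speed": 1.0},
            {"name": "s2", "queue": 6, "speed": 3.0}]
print(least_loaded(backends)["name"])  # s1 (shortest queue)
print(least_time(backends)["name"])    # s2 (faster, so earlier finish despite longer queue)
```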
Multiple Dispatcher Problem
Without coordination, all dispatchers send their requests to the same (least loaded) server (herd behaviour), which has the opposite of the intended effect on performance.
Goren et al Multi dispatch solution
Round-based algorithm: allocate to bins according to probabilities computed for the current round.
Requires knowledge of server queue lengths, processing rates, and job arrival rates.
Dynamic Randomised
Choose multiple random bins, allocate the ball to the bin of lowest load
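A sketch of this d-choice process (GREEDY[d]); d = 2 gives the classic two-choice scheme, and the parameters below are illustrative:

```python
import random

def greedy_d(m, n, d=2, rng=random):
    # Throw m balls into n bins; for each ball sample d bins uniformly at random
    # and place the ball into the least loaded of the sampled bins.
    loads = [0] * n
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=lambda b: loads[b])
        loads[target] += 1
    return loads

loads = greedy_d(m=10_000, n=1_000, d=2)
print(max(loads))  # typically much closer to the average (10) than with a single choice
```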
Power of two choices improvements
In terms of m balls and n bins
Case m = n: exponential improvement in the maximum load.
Case m > n: the amount by which the maximum load exceeds the average is independent of the number of balls m.
When does multiple-choice lose some power?
In parallel settings
Parallel Scenario
Balls arrive in batches of n
Allocated using GREEDY[d] but bin loads only updated between batches.
Decisions may be based on outdated load info
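A sketch of this batched setting: every decision within a batch uses the load snapshot taken before the batch started (batch sizes are illustrative):

```python
import random

def batched_greedy_d(batches, n, d=2, rng=random):
    # Balls arrive in batches; each batch is allocated using GREEDY[d], but every
    # decision in the batch sees only the (possibly outdated) loads recorded
    # before the batch began.
    loads = [0] * n
    for batch_size in batches:
        snapshot = list(loads)          # stale view used for the whole batch
        for _ in range(batch_size):
            candidates = [rng.randrange(n) for _ in range(d)]
            target = min(candidates, key=lambda b: snapshot[b])
            loads[target] += 1          # real loads drift away from the snapshot
    return loads

loads = batched_greedy_d(batches=[1000] * 10, n=1000, d=2)
```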
g-bounded process
Balls are allocated using two choices, but whenever the two sampled bins differ in load by at most g, the process may make a mistake (allocate to the more loaded bin).
Gap between maximum and average load in the parallel, 1 + β, and g-bounded scenarios
Independent of m (number of balls)
1 + β process
(balls into bins)
Some balls allocated using 2 choices, the rest using 1.
With probability β, two bins are queried and the ball goes to the less loaded one; with probability 1 - β, the ball is allocated to a single random bin. On average 1 + β choices per ball.
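A sketch of the 1 + β process; β = 0.5 is an illustrative value:

```python
import random

def one_plus_beta(m, n, beta=0.5, rng=random):
    # With probability beta, query two random bins and place the ball in the
    # less loaded one; otherwise place it in a single random bin.
    # On average each ball uses 1 + beta choices.
    loads = [0] * n
    for _ in range(m):
        if rng.random() < beta:
            a, b = rng.randrange(n), rng.randrange(n)
            target = a if loads[a] <= loads[b] else b
        else:
            target = rng.randrange(n)
        loads[target] += 1
    return loads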
Probability that none of m = n balls falls into a particular bin
Approximately 1/e (derivation below)
Expected number of empty bins
n/e
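A short derivation of both facts (one random choice per ball, m = n):
Pr[a fixed bin gets none of the n balls] = (1 - 1/n)^n → 1/e as n → ∞
E[number of empty bins] = n · (1 - 1/n)^n ≈ n/e (by linearity of expectation)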
This is a greatly simplified process.
Collision
Any two or more balls falling into one bin
Bernoulli process (probability that a ball falls into a given bin, and that it does not)
Ball falls into bin: 1/n
Ball does not fall into bin: 1 - 1/n
Probability that a particular bin receives at least k balls
At most (e/k)^k
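Where the bound comes from (m = n balls, union bound over which k balls could land in a fixed bin):
Pr[a fixed bin receives ≥ k balls] ≤ C(n, k) · (1/n)^k ≤ (e·n/k)^k · (1/n)^k = (e/k)^k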
Maximum load in any bin
At most (3ln(n))/ln(ln(n))
With high probability, i.e. probability 1 - o(1)
Distributed hash table
Set of nodes that store items
Chord
Nodes are organised in a logical ring of N nodes
Scalable key location run time
O(log(N))
Every shortcut (finger) cuts the remaining distance to the target at least in half.
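A simulation-style sketch of finger-based lookup (the node IDs and the 2^6 identifier space are illustrative; a real Chord node would query other nodes rather than consult a shared list):

```python
# Each node's finger i points to the first node at or after (node + 2**i) mod RING.
RING_BITS = 6
RING = 2 ** RING_BITS
nodes = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])   # example ring members

def in_interval(x, a, b, inclusive_right=True):
    # Is x inside the circular interval (a, b] (or (a, b) if not inclusive_right)?
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)  # interval wraps past 0

def first_node_at_or_after(ident):
    return next((n for n in nodes if n >= ident), nodes[0])

def closest_preceding_finger(node, key):
    # Furthest finger of `node` that still lies strictly between node and key.
    for i in reversed(range(RING_BITS)):
        f = first_node_at_or_after((node + 2 ** i) % RING)
        if in_interval(f, node, key, inclusive_right=False):
            return f
    return node

def lookup(start, key):
    # Follow fingers until key falls between the current node and its successor.
    # Each hop cuts the remaining clockwise distance at least in half: O(log N) hops.
    node, hops = start, 0
    while True:
        succ = first_node_at_or_after((node + 1) % RING)
        if in_interval(key, node, succ):
            return succ, hops
        node = closest_preceding_finger(node, key)
        hops += 1

print(lookup(start=1, key=54))   # (56, 3): node 56 stores key 54, reached in 3 hops
```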
Chord: Adding a Node x
Let y be the successor of x.
Move the items stored at y that now belong to x over to x (item-movement sketch after the removal card).
Update finger tables
Chord: Removing a node
(Move items to successor)
Let y be the successor of x
Move items of x to y
Update finger tables
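A sketch of item movement on join and leave in a Chord-style ring; items live at their successor, i.e. the first node clockwise from the item's hash. Node IDs, item hashes, and the ring size are illustrative:

```python
RING = 64

def owner(item_hash, nodes):
    ordered = sorted(nodes)
    return next((n for n in ordered if n >= item_hash), ordered[0])

def add_node(x, nodes, store):
    # Let y be the successor of x; items at y that now belong to x move to x.
    nodes.add(x)
    y = owner((x + 1) % RING, nodes)
    store[x] = [h for h in store.get(y, []) if owner(h, nodes) == x]
    store[y] = [h for h in store.get(y, []) if owner(h, nodes) == y]

def remove_node(x, nodes, store):
    # Let y be the successor of x; all of x's items move to y.
    nodes.discard(x)
    y = owner((x + 1) % RING, nodes)
    store.setdefault(y, []).extend(store.pop(x, []))

nodes, store = {8, 21, 40}, {8: [3, 7], 21: [12, 19], 40: [25, 39]}
add_node(28, nodes, store)      # hash 25 lies in (21, 28], so it moves from node 40 to node 28
remove_node(28, nodes, store)   # node 28's items move back to its successor, node 40
```

(Finger-table updates are omitted here; the sketch only shows which items move.)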
Chord: Advantages
Scalable
Small routing tables
Fully decentralised
Is a Chord ring well load-balanced?
No - nodes are responsible for arcs of different lengths.
E.g. node 28 is assigned everything hashed to 22-28, while node 21 only gets items hashed to 21.
Load balancing a Chord
Map every node to r points on the cycle (virtual nodes).
With r arcs per node, the distribution of items across nodes is more even (sketch below).
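A sketch of the r-cycle-points idea (the hash function, node names, and r are illustrative):

```python
import hashlib
import random
from collections import Counter

RING = 2 ** 32

def points(node, r):
    # Map each physical node to r pseudo-random points on the cycle (virtual nodes).
    return [int(hashlib.sha1(f"{node}#{i}".encode()).hexdigest(), 16) % RING
            for i in range(r)]

def build_ring(node_names, r):
    ring = []
    for name in node_names:
        ring.extend((p, name) for p in points(name, r))
    return sorted(ring)

def owner(key_hash, ring):
    # First cycle point clockwise from the key's hash.
    return next((name for p, name in ring if p >= key_hash), ring[0][1])

# With r points per node, each node's total arc length concentrates around 1/N
# of the cycle, so items spread far more evenly than with r = 1.
ring = build_ring(["a", "b", "c", "d"], r=64)
counts = Counter(owner(random.randrange(RING), ring) for _ in range(10_000))
print(counts)   # per-node item counts come out fairly close
```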