Resource management II - Week 5 Flashcards
Bag of tasks
Term commonly used for independent tasks that can be scheduled on any machine
Bag of tasks - common approaches
Shortest job first
- When choosing one task from many it may be unfair to long jobs
When we have heterogeneous resources:
- Max-min: Maximum duration task on the machine giving minimum completion time
- Min-min: Minimum duration task on the machine giving minimum completion time (minimum completion time should be interpreted as earliest completion time)
Choosing the resource that gives the earliest completion time is a good rule of thumb when resources have different characteristics (i.e. heterogeneous resources)
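A minimal sketch of the two heuristics, assuming every task's run time on every machine is known in advance. It follows the usual textbook formulation, in which each task is first matched to the machine giving its earliest completion time, and min-min then picks the task with the smallest such time while max-min picks the largest. The function name schedule_bag and the task/machine names are hypothetical, chosen only for illustration.

```python
def schedule_bag(exec_time, policy="min-min"):
    """Greedy bag-of-tasks scheduling on heterogeneous machines.

    exec_time[task][machine] is the (assumed known) run time of task on machine.
    Returns (plan, makespan), where plan is a list of (task, machine, finish).
    """
    machines = list(next(iter(exec_time.values())))
    ready = {m: 0.0 for m in machines}   # time at which each machine becomes free
    unscheduled = set(exec_time)
    plan = []
    while unscheduled:
        # For every remaining task, find the machine giving the earliest completion.
        best = {}
        for t in unscheduled:
            m = min(machines, key=lambda m: ready[m] + exec_time[t][m])
            best[t] = (ready[m] + exec_time[t][m], m)
        # Min-min takes the smallest of these completion times, max-min the largest.
        pick = min if policy == "min-min" else max
        task = pick(best, key=lambda t: (best[t][0], t))
        finish, machine = best[task]
        ready[machine] = finish
        plan.append((task, machine, finish))
        unscheduled.remove(task)
    return plan, max(ready.values())


# Hypothetical run times (arbitrary units) for three tasks on two machines.
times = {
    "A": {"m1": 2, "m2": 4},
    "B": {"m1": 3, "m2": 1},
    "C": {"m1": 7, "m2": 5},
}
min_plan, min_makespan = schedule_bag(times, "min-min")
max_plan, max_makespan = schedule_bag(times, "max-min")
```

On this particular input max-min finishes earlier than min-min because the long task C is placed first; that is a property of the example, not a general result.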
When we have tasks with deadlines the task with the earliest deadline should be scheduled first.
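The earliest-deadline-first rule above can be sketched as follows, assuming a single machine, no preemption, and known durations. The function name edf_schedule and the task data are hypothetical.

```python
def edf_schedule(tasks):
    """Run tasks back to back on one machine in order of increasing deadline.

    tasks is a list of (name, duration, deadline) tuples.
    Returns a list of (name, finish_time, deadline_met).
    """
    clock = 0
    result = []
    for name, duration, deadline in sorted(tasks, key=lambda t: t[2]):
        clock += duration                      # the task runs to completion
        result.append((name, clock, clock <= deadline))
    return result


# Hypothetical tasks: (name, duration, deadline), all in the same time unit.
jobs = [("report", 3, 10), ("backup", 2, 4), ("render", 4, 9)]
edf = edf_schedule(jobs)
```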
Not all workloads are bags of tasks, e.g. tasks with dependencies are not
Tasks with dependencies
Dependencies impose an order of execution on tasks
Divide and Conquer on tasks with dependencies
Tasks can be partitioned so that parallel tasks each operate on different data.
See the graph on the “Not all application are a bag of words” slide in “Resource Management II - Week 5”
Task Graphs
Also known as directed acyclic graphs (DAGs) or, more recently, scientific workflows.
Task graph scheduling techniques are a way of choosing the number of resources and an approach to schedule tasks onto said resources so that overall completion time is minimised.
Applications in astronomy, earth sciences, bioinformatics, etc. may consist of thousands of tasks.
Heterogeneous Earliest Finish Time (HEFT)
Produces a schedule for the execution of a DAG (directed acyclic graph) on a given number of resources aiming to minimise overall completion time while meeting task dependencies.
HEFT is a heuristic: it finds local optima, but a local optimum does not imply global optimality.
Key idea: give priority to the critical path
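A simplified sketch of the HEFT idea, not a full implementation. It assumes known, strictly positive execution costs, a single known transfer time per edge (paid only when the two tasks run on different machines), and it appends each task to the end of a machine's queue rather than using the insertion-based slot search of the published algorithm. Tasks are prioritised by upward rank (average cost plus the longest remaining path to the exit), which is how the critical path gets priority. All names and numbers below are hypothetical.

```python
from functools import lru_cache


def heft(succ, cost, comm, n_machines):
    """succ[t]: list of successors of t; cost[t][m]: run time of t on machine m;
    comm[(u, v)]: transfer time if u and v run on different machines.
    Returns {task: (machine, start, finish)}."""
    pred = {t: [] for t in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    @lru_cache(maxsize=None)
    def rank(t):
        # Upward rank: average cost of t plus the longest path from t to the exit.
        avg = sum(cost[t]) / n_machines
        return avg + max((comm.get((t, s), 0) + rank(s) for s in succ[t]), default=0)

    ready = [0.0] * n_machines          # time at which each machine becomes free
    placed = {}
    # With positive costs, decreasing rank order is also a valid topological order.
    for t in sorted(succ, key=rank, reverse=True):
        best = None
        for m in range(n_machines):
            # Inputs arrive when each predecessor finishes, plus transfer if remote.
            data_ready = max(
                (placed[p][2] + (0 if placed[p][0] == m else comm.get((p, t), 0))
                 for p in pred[t]),
                default=0.0,
            )
            start = max(ready[m], data_ready)
            finish = start + cost[t][m]
            if best is None or finish < best[2]:
                best = (m, start, finish)   # keep the earliest finish time
        placed[t] = best
        ready[best[0]] = best[2]
    return placed


# Hypothetical diamond-shaped DAG: A -> B, A -> C, B -> D, C -> D, on two machines.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {"A": [2, 3], "B": [3, 2], "C": [4, 4], "D": [2, 1]}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 2, ("C", "D"): 1}
schedule = heft(succ, cost, comm, n_machines=2)
makespan = max(finish for _, _, finish in schedule.values())
```

In this example C has a higher upward rank than B, so it is scheduled straight after A on the same machine, while B is sent to the other machine.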
Scheduling trade-offs
The more machines we use, the shorter the execution time should be, but utilisation drops, and beyond a certain point additional machines may not improve completion time
HEFT is a heuristic: different prioritisation schemes may lead to different outcomes. Which one do we choose?
The more time we spend on scheduling (algorithm complexity), the better the schedule is supposed to be
Many assumptions are inherent. What if communication is more likely to cause delays than computation? What if we know nothing about either? We also assume execution cost is known.
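The first trade-off above (more machines versus utilisation) can be illustrated with a small sketch. It assumes independent tasks with known durations and uses a longest-task-first greedy placement, which is a stand-in heuristic chosen for brevity, not something the notes prescribe. Utilisation is taken here as total work divided by (machines x makespan). The function names and durations are hypothetical.

```python
import heapq


def lpt_makespan(durations, machines):
    """Place the longest remaining task on the least loaded machine each time."""
    loads = [0] * machines
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + d)
    return max(loads)


def utilisation(durations, machines):
    """Fraction of the paid-for machine time that is spent doing work."""
    return sum(durations) / (machines * lpt_makespan(durations, machines))


# Hypothetical task durations (arbitrary units), 16 units of work in total.
work = [4, 3, 3, 2, 2, 2]
for m in range(1, 8):
    print(m, lpt_makespan(work, m), round(utilisation(work, m), 2))
```

For these durations the makespan cannot drop below 4, the length of the longest task, so adding machines past that point only lowers utilisation.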
Cloud computing trade-offs
Cloud user
Buy more resources (at the risk of paying more) or fewer (at the risk of failing to meet expectations). Deciding is not trivial and requires a lot of computation.
Cloud Provider
- Balance commitments and demand against the number of machines that are switched on. Switching machines off reduces energy consumption
- Spot instances - Offer excess capacity with big discounts
There are also security, social and legal challenges to consider
Cloud to Fog / Edge computing
Essentially a continuum: Cloud-fog-edge computing
Cloud computing data centres act as a central infrastructure for handling data.
Nowadays, data is often produced by Internet-connected devices. We don’t need to send all of that to a cloud server. We can use edge and fog computing instead.
Edge computing
Try to do some processing at the edge, near the source of data (reduces the amount of data sent to the cloud)
Fog computing
Some extra layers between the edge devices (that produce data) and the central cloud servers