L5 - Autoscaling 1/2 Flashcards
Why an elastic application?
- reduce over/under-provisioning
- reduce cost + increase customer satisfaction
What are 4 typical resources applications use?
- CPU
- Memory
- Disk
- Network
Dynamism for desktop apps on the laptop
seconds, thread scheduling
Dynamism for HPC with a cluster as a shared resource
hours, days for job scheduling
Dynamism for banking with mainframe
periodically every day for processor allocation
Dynamism for web and server clusters
highly dynamic, limited predictability
What is increased throughput?
Ability to handle more workload (requests) in the same time
What is decreased latency?
Individual requests are handled faster
Can we normally decrease latency or increase throughput for web-applications?
Normally we can only increase throughput
Is there a scalability limit for throughput?
Yes, the curve converges to a certain limit in the long-run
Why is there a scalability limit for throughput?
- overhead with parallelization
- bottleneck: initiating the parallelization is itself a sequential process -> at a certain point, the sequential part dominates the execution time (Amdahl’s law)
- shared databases limit the load that can be processed
- programming influences whether applications can scale
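A minimal sketch of the Amdahl's law limit mentioned above. It assumes a fraction `s` of the work is strictly sequential; the name `s` and the function `amdahl_speedup` are illustrative and not part of the lecture material. Under that assumption the speedup on `p` processors is 1 / (s + (1 - s) / p), which approaches 1 / s as `p` grows.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Speedup on p processors when a fraction s of the work is sequential."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be >= 1")
    # The sequential part s is unaffected; only (1 - s) is divided among p processors.
    return 1.0 / (s + (1.0 - s) / p)


if __name__ == "__main__":
    # With 10% sequential work (an assumed value), the speedup stays below 1 / 0.1 = 10.
    for p in (1, 2, 8, 64, 1024):
        print(p, round(amdahl_speedup(0.1, p), 2))
```

However many processors are added, the speedup in this model never reaches 1 / s, which is one way to see why the throughput curve converges.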
What is scalability of applications?
The ability of an application to increase its capacity (throughput)
What does the capacity of an application depend on?
- available resource capacities
- application design (whether the app is programmed for scalability)
What are scalability limits?
- maximum application capacity
- throughput can be limited by max resource capacities or application design
What happens when applications with poor scalability are scaled?
- significant drop in efficiency
What is speedup?
performance (p processors) / performance (1 processor)
e.g. for CPU the transactions per second
What is efficiency?
efficiency (p processors) = speedup (p processors) / p
Does speedup linearly scale?
No. With one processor the efficiency is 1, but it drops as more processors are added
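The speedup and efficiency definitions can be checked with a small sketch. The throughput figures (100 and 320 transactions per second) are made-up illustration values, and the function names are hypothetical.

```python
def speedup(perf_p: float, perf_1: float) -> float:
    """Speedup = performance with p processors / performance with 1 processor."""
    return perf_p / perf_1


def efficiency(perf_p: float, perf_1: float, p: int) -> float:
    """Efficiency = speedup with p processors / p."""
    return speedup(perf_p, perf_1) / p


if __name__ == "__main__":
    # Assumed measurements: 100 tps on 1 processor, 320 tps on 4 processors.
    print(speedup(320.0, 100.0))        # speedup of the 4-processor run
    print(efficiency(320.0, 100.0, 4))  # efficiency below 1: scaling is not linear
```

With these assumed numbers the speedup is about 3.2 on 4 processors, so the efficiency is about 0.8 rather than 1.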
What is parallel computing?
Many processors work simultaneously to deliver high computational power and to significantly reduce the total computation time.
What is elasticity?
- dynamic adaptation of the capacity to a change in the workload
- no shutdown/restart required
- shrink capacity, if workload decreases
- increase capacity, if workload increases
What is autoscaling?
Cloud computing feature that enables organizations to scale cloud services such as server capacities or VMs up or down automatically, based on defined situations such as traffic or utilization levels.
What is a backend service?
A service that is needed to answer requests that arrive at the frontend
What is vertical scaling - scaling up?
Scale the server on which the service is running.
You can increase the capacity of a single service instance by increasing its resources:
- increase CPU time percentage
- increase clock frequency
- add more cores
- replace existing resources with more powerful ones
Pros of vertical scaling
- easy to replace a resource with a more powerful one
- it does not require a re-design of the application
Cons of vertical scaling
- more powerful resources might be too expensive
- resource capacity is limited
- replacement of resources causes service interruption
What is horizontal scaling / scaling out?
- capacity increase of service by creating more instances (assumption = each service instance comes with its own resources)
What are the pros of horizontal scaling?
- no requirement for more powerful hardware
- provides a long term solution for scaling
What are the cons of horizontal scaling?
- increased amount of resources comes with more management overhead
- horizontal scaling requires a distributed software architecture
What is an auto-scaler?
System that defines how many servers (resources) are provided to the application. A monitor (e.g. Amazon CloudWatch) measures metrics from the servers and provides them to the auto-scaler.
What is the autoscaling policy about?
The autoscaling system uses this to adapt the amount of resources
3 autoscaling approaches
- Reactive
- Scheduled
- Predictive
What is reactive autoscaling?
- detect under/overloaded service
- scale in/out or down/up according to policy
What is scheduled autoscaling?
- policy specifies scaling events (time-stamped scaling actions)
- apply scaling actions at appropriate time
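A minimal sketch of a scheduled policy as a list of time-stamped scaling actions. The data layout (tuples of timestamp and desired instance count), the function name `capacity_at`, and the example times are assumptions for illustration only.

```python
from bisect import bisect_right
from datetime import datetime


def capacity_at(events, now, default):
    """Return the desired instance count at time `now`.

    events: list of (timestamp, desired_instances), sorted by timestamp.
    The most recent event at or before `now` applies; before the first
    event, `default` is used.
    """
    times = [t for t, _ in events]
    i = bisect_right(times, now)
    return default if i == 0 else events[i - 1][1]


if __name__ == "__main__":
    # Hypothetical schedule: scale out for the working day, scale in at night.
    events = [
        (datetime(2024, 1, 1, 8, 0), 10),
        (datetime(2024, 1, 1, 20, 0), 2),
    ]
    print(capacity_at(events, datetime(2024, 1, 1, 9, 30), default=1))
```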
What is predictive autoscaling?
- continuously predict future workloads
- if workloads will change, schedule scaling actions ahead in time
- lets you circumvent scaling latency and enables more time consuming scaling decisions
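A rough sketch of the predictive idea: forecast the next workload and size the capacity ahead of time. The forecasting method here (linear extrapolation from the last two samples) and the per-instance capacity are assumptions chosen for brevity; a real predictor would typically be more elaborate.

```python
import math


def forecast_next(history):
    """Naive forecast: extend the trend of the last two workload samples."""
    if not history:
        raise ValueError("history must not be empty")
    if len(history) < 2:
        return float(history[-1])
    trend = history[-1] - history[-2]
    return max(0.0, float(history[-1] + trend))


def instances_needed(workload, per_instance_capacity):
    """Instances required to serve `workload`, keeping at least one instance."""
    return max(1, math.ceil(workload / per_instance_capacity))


if __name__ == "__main__":
    history = [100, 150]                 # assumed requests/s in the last two intervals
    predicted = forecast_next(history)   # trend of +50 continues
    print(predicted, instances_needed(predicted, per_instance_capacity=60))
```

Because the scaling action is computed before the load arrives, the time needed to start new instances no longer delays the response to the workload change.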
Two types of auto-scalers
- resource centric
- service centric
What is a resource-centric auto-scaler?
- scaling actions modify resources
- services are implicitly adapted
What is a service-centric auto-scaler?
- scaling actions modify the number of service instances
- resources are implicitly adapted
What is AWS reactive autoscaling?
resource centric, scaling the number of VMs
What is the AWS Auto Scaling Group?
- set of VMs with same launch template
- contains a collection of EC2 instances (virtual servers) that are treated as a logical grouping for the purpose of automatic scaling and management.
- can optionally have a load balancer; scales out by creating more instances from the launch template
AWS Scaling Policies
- target tracking scaling
- simple scaling
- step scaling
What is target tracking scaling?
- automatically adjust resources to meet target
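One way to picture target tracking is a proportional rule: scale the instance count by the ratio of the measured metric to the target. This rule is an assumption for illustration, not a description of how AWS computes its adjustments; the function name and the min/max bounds are hypothetical.

```python
import math


def target_tracking_capacity(current, metric, target, min_size=1, max_size=20):
    """Proportional sketch: desired = ceil(current * metric / target), clamped."""
    if target <= 0:
        raise ValueError("target must be positive")
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))


if __name__ == "__main__":
    # 4 instances at 75% average CPU with a 50% target (assumed values).
    print(target_tracking_capacity(4, 75.0, 50.0))
```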
What is simple scaling?
- trigger based on: metric, threshold, condition
e.g. metric > threshold: - #VMs
e.g. we want a CPU load of 50%: if it is higher, we scale out, and if it goes below 50%, we scale in (the number of VMs changes by a fixed amount or a fixed percentage once the threshold is passed).
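The CPU example can be sketched as a single-threshold rule with a fixed step. The step size of 1 VM and the min/max bounds are assumed values, and `simple_scaling` is a hypothetical helper, not an AWS API.

```python
def simple_scaling(current_vms, cpu_load, threshold=50.0, step=1,
                   min_vms=1, max_vms=10):
    """Add a fixed number of VMs above the threshold, remove them below it."""
    if cpu_load > threshold:
        desired = current_vms + step   # scale out
    elif cpu_load < threshold:
        desired = current_vms - step   # scale in
    else:
        desired = current_vms          # exactly on target: no change
    # Keep the group within its configured size limits.
    return max(min_vms, min(max_vms, desired))


if __name__ == "__main__":
    print(simple_scaling(3, cpu_load=80.0))  # above 50%: one VM added
    print(simple_scaling(3, cpu_load=20.0))  # below 50%: one VM removed
```

Note that the adjustment is the same whether the load is 51% or 95%, which is the main difference from step scaling.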
What is step scaling?
- the scaling adjustment depends on the size of the threshold breach
specify metric, threshold, and steps based on the breach amount
0 to 10%: 0%
10 to 20%: 10%
20 to infinity%: 30%
0 to minus infinity%: 10%
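The step table above can be sketched as a lookup from breach size to a percentage adjustment. Several points are assumptions: the breach is measured in percentage points above the threshold, intervals include their lower bound and exclude their upper bound, and the last row (breach below zero) is read as scaling in by 10%. The names are hypothetical.

```python
import math

# (lower bound, upper bound, capacity adjustment in percent), taken from the card.
STEPS = [
    (0.0, 10.0, 0.0),
    (10.0, 20.0, 10.0),
    (20.0, math.inf, 30.0),
]
# Assumption: a negative breach removes 10% of the capacity.
SCALE_IN_PERCENT = -10.0


def step_adjustment_percent(metric, threshold):
    """Return the percentage adjustment for the current breach."""
    breach = metric - threshold
    if breach < 0:
        return SCALE_IN_PERCENT
    for lower, upper, percent in STEPS:
        if lower <= breach < upper:
            return percent
    return 0.0


def apply_step(current_vms, metric, threshold):
    """Apply the adjustment, rounding away from zero, keeping at least one VM."""
    change = current_vms * step_adjustment_percent(metric, threshold) / 100.0
    delta = math.ceil(change) if change > 0 else math.floor(change)
    return max(1, current_vms + delta)


if __name__ == "__main__":
    for cpu in (40.0, 55.0, 65.0, 80.0):
        print(cpu, apply_step(10, cpu, threshold=50.0))
```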