System Performance Flashcards
What are the main performance metrics used?
Capacity: Consistent measure of a service’s size or amount of resources
Utilization: Percentage of that resource used for a workload
Overhead: Percentage of that utilization used for bookkeeping
Useful Work: Percentage of that utilization used for what we actually need to do
Throughput: Number of operations conducted per unit of time
Latency: Response time; the time from issuing a request to receiving its response
Regarding latency and throughput, which metric is more important? What are the requirements for each?
It is application-dependent. The requirements will drive the application design.
What are some of the factors that can impose a limitation on the performance of the system, and that depend neither on the users nor on the application?
Physics: (e.g. the speed of light limits how fast signals can travel from one end of a chip to the other)
Economics: (e.g. we cannot throw infinite money at the problem)
Technology: (e.g. we depend on what the current technology offers, since each technology generation eventually hits a wall)
How can we identify a system’s performance limitations, for instance, latency or throughput?
By decomposing the system into its constituents, as in a pipeline, and identifying the bottlenecks. Depending on the requirements, we can target specific stages over others.
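A minimal sketch of this decomposition in Go (the stage names and timings are hypothetical): measuring each stage in isolation exposes the bottleneck, since the slowest stage bounds throughput while the sum of all stages bounds latency.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical stages of a request-processing pipeline.
var stages = []struct {
	name string
	run  func()
}{
	{"parse", func() { time.Sleep(1 * time.Millisecond) }},
	{"lookup", func() { time.Sleep(8 * time.Millisecond) }}, // likely bottleneck
	{"respond", func() { time.Sleep(2 * time.Millisecond) }},
}

func main() {
	// Measure each stage in isolation: the slowest stage bounds the
	// pipeline's throughput; the sum of all stages bounds its latency.
	for _, s := range stages {
		start := time.Now()
		s.run()
		fmt.Printf("stage %-8s took %v\n", s.name, time.Since(start))
	}
}
```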
In a system, if we have opportunities to improve the system’s throughput or the system’s latency, which should we choose?
There is no general answer: it depends on the application's requirements.
What strategy(-ies)/technique(s) do we have to reduce latency in a request processing pipeline?
Exploit the common case by adding a cache (sketched below).
Exploit request properties by running independent stages at the same time. Sometimes this does not reduce latency but improves throughput, with latency remaining nearly the same.
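A minimal sketch of the first strategy in Go, assuming a hypothetical expensive stage `compute`: repeated requests (the common case) hit the cache and skip the expensive stage entirely.

```go
package main

import "fmt"

// compute stands in for an expensive pipeline stage (hypothetical).
func compute(key string) string {
	return "result-for-" + key // imagine this is slow
}

// cache exploits the common case: repeated keys are served from
// memory instead of re-running the expensive stage.
var cache = map[string]string{}

func cachedCompute(key string) string {
	if v, ok := cache[key]; ok {
		return v // fast path: cache hit
	}
	v := compute(key) // slow path: cache miss
	cache[key] = v
	return v
}

func main() {
	fmt.Println(cachedCompute("a")) // miss: pays full latency
	fmt.Println(cachedCompute("a")) // hit: low latency
}
```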
What strategy(-ies)/technique(s) do we have to hide latency in a request processing pipeline?
Instead of taking actions to reduce latency, we can hide it by improving throughput instead. For example, give each stage the ability to process multiple requests at the same time: each request takes the same time (or slightly longer), but we process more requests per unit of time.
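A minimal sketch of this in Go (the worker count and timings are hypothetical): one stage handles several requests at once, so total time drops even though per-request latency does not.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// stage processes one request; its per-request latency is unchanged.
func stage(req int) int {
	time.Sleep(10 * time.Millisecond) // simulated work
	return req * 2
}

func main() {
	in := make(chan int)
	out := make(chan int)

	// Give the stage the ability to process several requests at once:
	// four workers overlap independent requests, hiding latency.
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range in {
				out <- stage(req)
			}
		}()
	}
	go func() { wg.Wait(); close(out) }()

	go func() {
		for i := 0; i < 8; i++ {
			in <- i
		}
		close(in)
	}()

	start := time.Now()
	for range out {
	}
	// 8 requests of 10 ms each finish in ~20 ms instead of ~80 ms.
	fmt.Println("total:", time.Since(start))
}
```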
What is the difference between a concurrent and a parallel design?
Concurrency: the ability to make progress on more than one task at the same time (i.e. concurrently)
Parallelism: the ability to make progress on a task by splitting it into multiple subtasks that can be processed at the same time (i.e. in parallel)
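A small Go sketch contrasting the two (the task names and the summing task are hypothetical examples): the first function makes progress on several different tasks, the second splits one task into subtasks.

```go
package main

import (
	"fmt"
	"sync"
)

// Concurrency: make progress on more than one task at the same time.
func runConcurrently(tasks []string) {
	var wg sync.WaitGroup
	for _, t := range tasks {
		wg.Add(1)
		go func(task string) {
			defer wg.Done()
			fmt.Println("done:", task)
		}(t)
	}
	wg.Wait()
}

// Parallelism: make progress on ONE task (summing a slice) by
// splitting it into subtasks processed at the same time.
func parallelSum(xs []int) int {
	mid := len(xs) / 2
	res := make(chan int, 2)
	for _, part := range [][]int{xs[:mid], xs[mid:]} {
		go func(p []int) {
			sum := 0
			for _, x := range p {
				sum += x
			}
			res <- sum
		}(part)
	}
	return <-res + <-res
}

func main() {
	runConcurrently([]string{"taskA", "taskB"}) // two different tasks
	fmt.Println("sum:", parallelSum([]int{1, 2, 3, 4, 5, 6}))
}
```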
What strategy(-ies) are there to improve throughput in a request processing pipeline?
It is possible to run stages concurrently or in parallel.
If a stage operates at full capacity, we can apply a queueing strategy. The queue size depends on the workload, and it can be sized to absorb short or longer overload bursts.
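A minimal sketch of a queueing strategy in Go, assuming a hypothetical queue size of 16: a bounded buffer between a fast producer stage and a slower consumer stage absorbs short bursts of offered load.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// A bounded queue between a fast producer stage and a slower
	// consumer stage; the buffer absorbs short overload bursts.
	queue := make(chan int, 16) // size chosen from the expected burst length (hypothetical)

	go func() {
		for i := 0; i < 32; i++ {
			queue <- i // blocks only once the queue is full
		}
		close(queue)
	}()

	for req := range queue {
		time.Sleep(time.Millisecond) // the slower stage
		fmt.Println("served", req)
	}
}
```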
When we apply a queueing strategy but the system is overloaded for long periods of time, what solutions can be applied?
Increase the capacity of the system if possible (within feasible limits), or shed load by reducing or limiting the offered load.
What does the load shedding technique consist of?
Load shedding is a technique used to fight long-term overload of the system, achieved by reducing or limiting the system's load.
Reducing load: refuse to serve some requests, exploiting workload properties.
Limiting load: add bounded buffers between stages, starting at the bottleneck stage and cascading back toward the beginning of the pipeline if needed (sketched below).
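A minimal sketch of load shedding in Go, assuming a hypothetical bounded buffer of size 4 in front of the bottleneck stage: when the buffer is full, new requests are refused rather than queued.

```go
package main

import "fmt"

// A bounded buffer in front of the bottleneck stage; when it is
// full we refuse the request instead of queueing it indefinitely.
var queue = make(chan int, 4)

// offer sheds load: it returns false (request refused) when the
// bounded buffer is full.
func offer(req int) bool {
	select {
	case queue <- req:
		return true
	default:
		return false // shed: the caller sees "server busy"
	}
}

func main() {
	// No consumer drains the queue here, so after 4 accepted
	// requests the remaining ones are shed.
	for i := 0; i < 8; i++ {
		if offer(i) {
			fmt.Println("accepted", i)
		} else {
			fmt.Println("shed", i)
		}
	}
}
```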
What are the studied bottleneck removal techniques?
Exploit workload properties
Concurrency
Queuing
What are the studied bottleneck fighting techniques?
Batching
Dallying
Speculation
Describe the batching technique.
Group (batch) several requests into a single request; the fixed cost of sending a request is amortized over the batch size.
Latency increases proportionally to the size of the batch.
Throughput increases with the batch size, up to a certain point, as the fixed cost is amortized over more requests.
If the batch size is very large, the system will alternate between periods of overload and periods of idleness.
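A minimal batching sketch in Go (`send`, the batch size, and the request stream are hypothetical): the fixed cost of `send` is paid once per batch instead of once per request.

```go
package main

import "fmt"

// send stands in for the operation whose fixed cost we want to
// amortize (e.g. one disk write or one network round trip).
func send(batch []int) {
	fmt.Printf("sending batch of %d: %v\n", len(batch), batch)
}

func main() {
	const batchSize = 4 // hypothetical; larger batches amortize more but add latency
	in := make(chan int, 16)
	for i := 0; i < 10; i++ {
		in <- i
	}
	close(in)

	// Group (batch) requests and pay the fixed send cost once per batch.
	batch := make([]int, 0, batchSize)
	for req := range in {
		batch = append(batch, req)
		if len(batch) == batchSize {
			send(batch)
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		send(batch) // flush the partial final batch
	}
}
```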
What kind of optimizations can be done when using batching?
If we have multiple writes to the same object, we can drop all except the last one, decreasing utilization. This depends on the workload properties.
For media with sequential access or high locality, we can reorder the requests in the batch (both optimizations are sketched below).
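A minimal sketch of both optimizations in Go (the keys and values are hypothetical): coalesce writes to the same object, then reorder the survivors so a sequential medium sees them in order.

```go
package main

import (
	"fmt"
	"sort"
)

type write struct {
	key   int // e.g. a disk block number
	value string
}

func main() {
	batch := []write{{7, "a"}, {2, "b"}, {7, "c"}, {5, "d"}}

	// Coalesce: keep only the last write to each object.
	last := map[int]string{}
	for _, w := range batch {
		last[w.key] = w.value
	}

	// Reorder: sort the surviving writes by key so a medium with
	// sequential access (e.g. a disk) sees them in order.
	keys := make([]int, 0, len(last))
	for k := range last {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	for _, k := range keys {
		fmt.Printf("write key=%d value=%q\n", k, last[k])
	}
}
```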