Performance Flashcards
what is latency?
the time it takes to send a request and receive a response (the amount of time it takes for one customer to place an order and get it back)
what is throughput?
amount of work that can be successfully processed per unit time (the number of customers that can be served in a certain amount of time)
given different stages in a server what is the latency of a server?
sum of all stages
given different stages in a server what is the throughput of a server?
min of throughput of stages
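The two rules above can be sketched in a few lines; the stage names and numbers here are hypothetical:

```python
# Hypothetical server with three stages, each with its own latency and throughput.
stages = [
    {"name": "parse",   "latency_ms": 2,  "throughput_rps": 5000},
    {"name": "compute", "latency_ms": 10, "throughput_rps": 800},
    {"name": "store",   "latency_ms": 5,  "throughput_rps": 1200},
]

# Latency of the server: a request passes through every stage, so sum them.
total_latency = sum(s["latency_ms"] for s in stages)

# Throughput of the server: the slowest stage limits the whole pipeline.
total_throughput = min(s["throughput_rps"] for s in stages)

print(total_latency)     # 17
print(total_throughput)  # 800
```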
what is utilization?
percentage of capacity that is being used
what is overhead w/ examples?
the amount of resources that is “wasted”, i.e., not spent on the application’s actual work (usually resource use at lower layers, e.g., allocation, virtualization, the operating system)
what is useful work?
resources spent on the actual work
(usually resource use at the current layer in the system, i.e., the application)
what is the formula for overhead as percentage?
overhead = capacity - useful work
percentage = overhead / useful work
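A quick worked example of the formula, with hypothetical numbers:

```python
# Hypothetical server: capacity of 1000 ops/sec, of which 800 ops/sec
# go to useful application work.
capacity = 1000
useful_work = 800

overhead = capacity - useful_work      # resources not spent on useful work
percentage = overhead / useful_work    # overhead as a fraction of useful work

print(overhead)     # 200
print(percentage)   # 0.25, i.e., 25% overhead
```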
what is a bottleneck?
the component that restricts throughput
if you’re optimizing for throughput what should you focus on?
the bottleneck
if you’re optimizing for latency what should you focus on?
the component with the highest latency
what is the number one rule for optimization and why?
don’t prematurely optimize, because we are bad at predicting where the performance problems will be — measure first
what is Amdahl’s law?
speedup = 1/((1 - p) + (p/s)), where p is the fraction of the work that is improved and s is the speedup of that fraction
what is the best case speedup?
as s -> inf, speedup -> 1/(1 - p)
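A minimal sketch of both formulas, using hypothetical values of p and s:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is sped up by factor s."""
    return 1 / ((1 - p) + (p / s))

# Speed up 90% of the work by 10x: overall speedup is only ~5.26x.
print(round(amdahl_speedup(0.9, 10), 2))

# Best case as s -> infinity: 1/(1 - p) = 10x, no matter how fast s gets.
print(round(1 / (1 - 0.9), 2))
```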
what are the three solutions for latency?
fast-path, parallelize, speculation
what is fast-path?
reduce latency for some requests
how do we parallelize (concurrency) to help with latency?
run independent steps in parallel (if two steps don’t depend on each other’s results, you can run them at the same time)
what is speculation and what does it cost?
predict what work might be done, if the prediction is correct latency goes down
- trades work for latency; if the system is running near capacity, you may be trading off throughput for latency
what are the three solutions for throughput?
batching, dallying, concurrency or pipelining
what is batching?
grouping multiple tasks or inputs together and processing them as a single batch instead of one by one
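Why batching helps can be shown with a toy cost model; the numbers here are hypothetical, standing in for something like a fixed per-syscall cost plus per-item work:

```python
FIXED_COST = 10  # fixed cost paid once per call (e.g., a syscall), hypothetical units
PER_ITEM = 1     # marginal cost per item processed

def cost_one_by_one(n):
    # Each item pays the fixed cost separately.
    return n * (FIXED_COST + PER_ITEM)

def cost_batched(n):
    # One batch pays the fixed cost once, amortized over all items.
    return FIXED_COST + n * PER_ITEM

print(cost_one_by_one(100))  # 1100
print(cost_batched(100))     # 110
```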
what is dallying?
intentionally delaying a request in the hope that it can be combined with later work or becomes unnecessary (e.g., a later write overwrites it)
what is pipelining?
give each stage its own threads/memory
connect stages using bounded buffers
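The two bullet points above can be sketched with Python’s threads and a bounded `queue.Queue`; the stage logic here is a made-up placeholder:

```python
import queue
import threading

# Bounded buffer connecting the two stages: put() blocks when full,
# which gives the pipeline backpressure.
buf = queue.Queue(maxsize=4)
results = []

def stage1():
    for i in range(5):
        buf.put(i * 2)   # hypothetical stage-1 work
    buf.put(None)        # sentinel: no more items

def stage2():
    while True:
        item = buf.get()
        if item is None:
            break
        results.append(item + 1)  # hypothetical stage-2 work

# Each stage gets its own thread.
t1 = threading.Thread(target=stage1)
t2 = threading.Thread(target=stage2)
t1.start(); t2.start()
t1.join(); t2.join()

print(results)  # [1, 3, 5, 7, 9]
```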
what is the working set?
the set of distinct items among the last k accesses
(the subset of data that is currently relevant to the program’s execution)
what is the working set hypothesis?
for some number k, the actual working set (the number of distinct items among the last k accesses) is much smaller than k — programs reuse a small set of data, so a small cache can capture most accesses
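The hypothesis is easy to check on a trace; the access sequence below is a made-up example:

```python
# Hypothetical access trace: 10 accesses, but heavy reuse of a few items.
accesses = ["a", "b", "a", "a", "c", "b", "a", "b", "a", "a"]
k = 8

# Working set: distinct items among the last k accesses.
working_set = set(accesses[-k:])

print(sorted(working_set))   # ['a', 'b', 'c']
print(len(working_set) < k)  # True: 3 distinct items, far fewer than k = 8
```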