Performance Flashcards
what is latency?
the time it takes to send a request and receive a response (the amount of time it takes for one customer to place an order and get it back)
what is throughput?
amount of work that can be successfully processed per unit time (the number of customers that can be served in a certain amount of time)
given different stages in a server what is the latency of a server?
sum of all stages
given different stages in a server what is the throughput of a server?
min of throughput of stages
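The two rules above can be sketched in a few lines; the stage names and numbers here are hypothetical:

```python
# Hypothetical server with three stages, each with its own latency and throughput.
stages = [
    {"name": "parse",   "latency_ms": 2,  "throughput_rps": 5000},
    {"name": "compute", "latency_ms": 10, "throughput_rps": 800},
    {"name": "store",   "latency_ms": 5,  "throughput_rps": 1200},
]

# Latency of the server: a request passes through every stage, so sum them.
total_latency = sum(s["latency_ms"] for s in stages)

# Throughput of the server: the slowest stage limits the whole pipeline.
total_throughput = min(s["throughput_rps"] for s in stages)

print(total_latency)     # 17
print(total_throughput)  # 800
```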
what is utilization?
percentage of capacity that is being used
what is overhead w/ examples?
the amount of resources that is “wasted”, i.e., not spent on the application’s actual work (usually resource use at lower layers, e.g., allocation, virtualization, the operating system)
what is useful work?
resources spent on the actual work
(usually resource use at the current layer in the system, i.e., the application)
what is the formula for overhead as percentage?
overhead = capacity - useful work
percentage = overhead / useful work
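A quick worked example of the formula, with hypothetical numbers:

```python
# Hypothetical server: capacity of 1000 ops/sec, of which 800 ops/sec
# go to useful application work.
capacity = 1000
useful_work = 800

overhead = capacity - useful_work      # resources not spent on useful work
percentage = overhead / useful_work    # overhead as a fraction of useful work

print(overhead)     # 200
print(percentage)   # 0.25, i.e., 25% overhead
```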
what is a bottleneck?
the component that restricts throughput
if you’re optimizing for throughput what should you focus on?
the bottleneck
if you’re optimizing for latency what should you focus on?
the component with the highest latency
what is the number one rule for optimization and why?
don’t prematurely optimize, because we are bad at predicting where the performance problems will be — measure first
what is Amdahl’s law?
speedup = 1/((1 - p) + (p/s)), where p is the fraction of the work that is improved and s is the speedup of that fraction
what is the best case speedup?
as s -> inf, speedup -> 1/(1 - p)
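A minimal sketch of both formulas, using hypothetical values of p and s:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is sped up by factor s."""
    return 1 / ((1 - p) + (p / s))

# Speed up 90% of the work by 10x: overall speedup is only ~5.26x.
print(round(amdahl_speedup(0.9, 10), 2))

# Best case as s -> infinity: 1/(1 - p) = 10x, no matter how fast s gets.
print(round(1 / (1 - 0.9), 2))
```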
what are the three solutions for latency?
fast-path, parallelize, speculation
what is fast-path?
reduce latency for some requests
how do we parallelize (concurrency) to help with latency?
run independent steps in parallel (if two steps don’t depend on each other’s results, you can run them at the same time)
what is speculation and what does it cost?
predict what work might be done, if the prediction is correct latency goes down
- trades work for latency; if the system is running near capacity, you may be trading off throughput for latency
what are the three solutions for throughput?
batching, dallying, concurrency or pipelining
what is batching?
grouping multiple tasks or inputs together and processing them as a single batch instead of one by one
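Why batching helps can be shown with a toy cost model; the numbers here are hypothetical, standing in for something like a fixed per-syscall cost plus per-item work:

```python
FIXED_COST = 10  # fixed cost paid once per call (e.g., a syscall), hypothetical units
PER_ITEM = 1     # marginal cost per item processed

def cost_one_by_one(n):
    # Each item pays the fixed cost separately.
    return n * (FIXED_COST + PER_ITEM)

def cost_batched(n):
    # One batch pays the fixed cost once, amortized over all items.
    return FIXED_COST + n * PER_ITEM

print(cost_one_by_one(100))  # 1100
print(cost_batched(100))     # 110
```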
what is dallying?
intentionally delaying a request in the hope that it can be combined with later work or becomes unnecessary (e.g., a later write overwrites it)
what is pipelining?
give each stage its own threads/memory
connect stages using bounded buffers
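The two bullet points above can be sketched with Python’s threads and a bounded `queue.Queue`; the stage logic here is a made-up placeholder:

```python
import queue
import threading

# Bounded buffer connecting the two stages: put() blocks when full,
# which gives the pipeline backpressure.
buf = queue.Queue(maxsize=4)
results = []

def stage1():
    for i in range(5):
        buf.put(i * 2)   # hypothetical stage-1 work
    buf.put(None)        # sentinel: no more items

def stage2():
    while True:
        item = buf.get()
        if item is None:
            break
        results.append(item + 1)  # hypothetical stage-2 work

# Each stage gets its own thread.
t1 = threading.Thread(target=stage1)
t2 = threading.Thread(target=stage2)
t1.start(); t2.start()
t1.join(); t2.join()

print(results)  # [1, 3, 5, 7, 9]
```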
what is the working set?
the set of distinct items among the last k accesses
(the subset of data that is currently relevant to the program’s execution)
what is the working set hypothesis?
for some number k, the actual working set (the number of distinct items among the last k accesses) is much smaller than k — programs reuse a small set of data, so a small cache can capture most accesses
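The hypothesis is easy to check on a trace; the access sequence below is a made-up example:

```python
# Hypothetical access trace: 10 accesses, but heavy reuse of a few items.
accesses = ["a", "b", "a", "a", "c", "b", "a", "b", "a", "a"]
k = 8

# Working set: distinct items among the last k accesses.
working_set = set(accesses[-k:])

print(sorted(working_set))   # ['a', 'b', 'c']
print(len(working_set) < k)  # True: 3 distinct items, far fewer than k = 8
```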