Back-of-the-napkin math flashcards
L1 cache reference time
0.5 ns
Branch mispredict time
5 ns
L2 cache reference time
7 ns
Mutex lock/unlock time
100 ns
Main memory reference time
100 ns
Compress 1K bytes with Zippy time
10,000 ns
Send 2K bytes over 1 Gbps network time
20,000 ns
Read 1 MB sequentially from memory time
250,000 ns
Round trip within same datacenter time
500,000 ns
Disk seek time
10,000,000 ns
Read 1 MB sequentially from network time
10,000,000 ns
Read 1 MB sequentially from disk time
30,000,000 ns
Send packet CA->Netherlands->CA time
150,000,000 ns
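A minimal back-of-the-napkin sketch in Python using the constants above; the 1 GB scenarios are illustrative assumptions, not part of the original list.

NS = 1
US = 1_000 * NS          # microsecond in nanoseconds
MS = 1_000 * US          # millisecond in nanoseconds

READ_1MB_FROM_MEMORY = 250 * US   # from the flashcards above
READ_1MB_FROM_DISK   = 30 * MS
DISK_SEEK            = 10 * MS

def fmt(ns):
    # Render a nanosecond count as milliseconds for readability.
    return f"{ns / MS:,.0f} ms"

# Illustrative scenario: read 1 GB (~1,000 MB) sequentially.
print("1 GB from memory:", fmt(1_000 * READ_1MB_FROM_MEMORY))            # ~250 ms
print("1 GB from disk:  ", fmt(DISK_SEEK + 1_000 * READ_1MB_FROM_DISK))  # ~30,010 ms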
Which is more expensive and why: reads or writes?
Writes are 40 times more expensive than reads. Locking forces writes to become sequential, which greatly reduces throughput, and synchronizing data across machines is expensive whenever updates have to be propagated.
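A rough sketch of that asymmetry, assuming a write must take a lock and propagate to one replica over an in-datacenter round trip; this toy model is an assumption for illustration, not the source of the 40x figure.

MAIN_MEMORY_REF   = 100       # ns
MUTEX_LOCK_UNLOCK = 100       # ns
DC_ROUND_TRIP     = 500_000   # ns, round trip within the same datacenter

# A read served from local memory.
read_cost = MAIN_MEMORY_REF

# A write that locks, updates memory, and waits for one replica to acknowledge.
write_cost = MUTEX_LOCK_UNLOCK + MAIN_MEMORY_REF + DC_ROUND_TRIP

print(f"write is ~{write_cost / read_cost:,.0f}x a local read in this toy model")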
What observation can we make from these estimates?
Globally shared data is expensive; this is a fundamental limitation of distributed systems. Lock contention on heavily written shared objects kills performance, because transactions become serialized and slow.
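A quick sketch of that serialization ceiling: if every transaction must hold one global lock across a replication round trip (an assumed scenario), throughput is capped no matter how many clients you add.

DC_ROUND_TRIP = 500_000   # ns, round trip within the same datacenter

# Critical section held across one replication round trip.
critical_section_ns = DC_ROUND_TRIP
max_serialized_writes_per_sec = 1_000_000_000 / critical_section_ns
print(f"ceiling with one global lock: {max_serialized_writes_per_sec:,.0f} writes/sec")  # ~2,000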
What should the architecture be built around: reads or writes? Why?
Architect for scaling writes. Writes will be the bottleneck in any architecture, so making them as parallel as possible creates a high-capacity architecture.
What should writes be optimized around?
Optimize for low write contention and for as many parallel writes as possible.
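One common way to get there is to partition (shard) the data so unrelated writes never contend on the same lock. A minimal sketch, assuming a hypothetical shard count and the ~2,000 writes/sec single-lock ceiling estimated above.

NUM_SHARDS = 16                    # hypothetical shard count
serialized_writes_per_sec = 2_000  # single-global-lock ceiling from the sketch above

def shard_for(key: str) -> int:
    # Route each key to a shard; writes to different shards never contend.
    return hash(key) % NUM_SHARDS

# With one lock per shard, the ideal ceiling scales with the shard count
# (ignoring hot keys and skew).
parallel_ceiling = NUM_SHARDS * serialized_writes_per_sec
print(f"ideal parallel write ceiling: {parallel_ceiling:,} writes/sec")  # 32,000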
What is BUD optimization, and where do you apply it?
Bottlenecks, Unnecessary work, and Duplicate work. You apply it after you have come up with an initial solution to a problem.
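A tiny illustration of the "Duplicate work" part of BUD, with made-up data: the same total is recomputed inside a loop, then hoisted out after a BUD pass.

prices = [3.0, 1.5, 2.25, 4.0]

# Duplicate work: sum(prices) is recomputed for every element, O(n^2) overall.
shares_slow = [p / sum(prices) for p in prices]

# After spotting the duplicate work: compute the total once, O(n) overall.
total = sum(prices)
shares_fast = [p / total for p in prices]

assert shares_slow == shares_fast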