1.2 Latency Numbers Flashcards
What are latency numbers, and why are they important in system design?
Latency numbers represent the time it takes for a system to complete a specific operation, such as accessing memory, making a network request, or reading data from a disk. They are crucial in system design because they help developers understand the performance characteristics of different components. For example, knowing that accessing CPU registers is much faster than accessing main memory can guide decisions about optimizing code or choosing the right data storage solution. While exact numbers may change over time, understanding the relative differences (e.g., how much slower disk access is compared to memory access) is more important for designing efficient systems.
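The relative differences across these flashcards can be collected into a small reference table. The figures below are the approximate orders of magnitude discussed in this section, not measurements from any particular machine:

```python
# Approximate latency figures from these flashcards, in nanoseconds.
# Illustrative orders of magnitude only; real numbers vary by hardware.
LATENCIES_NS = {
    "CPU register access / clock cycle": 0.5,
    "L1/L2 cache access": 5,
    "Branch misprediction penalty": 6,
    "Main memory (RAM) access": 100,
    "System call trap overhead (Linux)": 500,
    "Thread context switch (best case)": 3_000,
    "SSD read (8K page)": 100_000,
    "Same-zone network round trip": 300_000,
    "Memcache/Redis get (client-observed)": 1_000_000,
    "HDD seek": 5_000_000,
    "US East <-> Europe round trip": 70_000_000,
    "bcrypt password hash": 300_000_000,
    "Transfer 1 GB within a cloud region": 10_000_000_000,
}

def humanize(ns: float) -> str:
    """Render a nanosecond figure in the most readable unit."""
    for unit, factor in (("s", 1e9), ("ms", 1e6), ("us", 1e3)):
        if ns >= factor:
            return f"{ns / factor:g} {unit}"
    return f"{ns:g} ns"

for op, ns in LATENCIES_NS.items():
    print(f"{op:<40} ~{humanize(ns)}")
```

Printing the table this way makes the ten-orders-of-magnitude spread between a register access and a 1 GB transfer easy to see at a glance.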
What is the significance of the sub-nanosecond range in latency?
The sub-nanosecond range (less than 1 nanosecond) is the fastest latency range in computing. It includes operations like accessing CPU registers and completing a CPU clock cycle. CPU registers are tiny, ultra-fast storage locations inside the CPU, and accessing them is extremely quick. A modern CPU’s clock cycle (the time it takes to execute a basic operation) is also in this range. These operations are so fast that they form the foundation of all computing tasks. However, because CPU registers are few in number, compilers reserve them for the values a program accesses most frequently.
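The cycle time follows directly from the clock frequency: period = 1 / frequency. A sketch, assuming a hypothetical 3.2 GHz core for illustration:

```python
# Clock cycle time is the reciprocal of clock frequency.
# A 3.2 GHz core is assumed here purely for illustration.
freq_hz = 3.2e9                      # 3.2 billion cycles per second
cycle_ns = (1 / freq_hz) * 1e9       # period converted to nanoseconds
print(f"One clock cycle at 3.2 GHz is ~{cycle_ns:.2f} ns")
```

Any clock in the low gigahertz range lands comfortably below one nanosecond per cycle, which is why the sub-nanosecond range belongs to the CPU itself.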
What happens in the 1 to 10 nanosecond latency range?
In the 1 to 10 nanosecond range, we find operations like accessing L1 and L2 caches. These are small, fast memory caches located close to the CPU. Accessing them is slower than accessing CPU registers but still much faster than accessing main memory. This range also includes penalties for branch mispredictions, which occur when the CPU guesses the wrong path in a conditional statement (e.g., an if-else block). A branch misprediction can cost up to 20 CPU clock cycles (roughly 6 nanoseconds on a ~3 GHz core), which still falls within this range.
What is the latency of accessing main memory (RAM)?
Accessing main memory (RAM) typically falls in the 10 to 100 nanosecond range. For example, on a modern processor like the Apple M1, accessing RAM is at the slower end of this range. This means that accessing RAM is roughly 100 times slower than accessing CPU registers and at least 10 times slower than hitting the L1/L2 caches. This difference highlights why optimizing code to minimize memory access is important for performance.
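The cache-versus-RAM gap can be made visible even from Python with a pointer-chasing sketch: hop through a random single-cycle permutation, once in an array small enough to stay in cache and once in one too large to fit. This is an illustration under heavy caveats; the interpreter adds a large constant per hop, so the measured gap understates the true hardware ratio:

```python
import random
import time
from array import array

def single_cycle(n: int) -> array:
    """Sattolo's algorithm: a random permutation forming one full cycle,
    so the chase visits every slot instead of looping in a short cycle."""
    perm = array("q", range(n))
    for i in range(n - 1, 0, -1):
        j = random.randrange(i)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def ns_per_hop(n: int, steps: int = 200_000) -> float:
    """Time dependent loads: each hop's address comes from the previous
    load, so the CPU cannot hide the memory latency by prefetching."""
    perm = single_cycle(n)
    i = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        i = perm[i]
    return (time.perf_counter_ns() - start) / steps

small = ns_per_hop(1_000)      # ~8 KB: comfortably cache-resident
large = ns_per_hop(4_000_000)  # ~32 MB: hops mostly miss to main memory
print(f"cache-resident: ~{small:.0f} ns/hop, memory-bound: ~{large:.0f} ns/hop")
```

On most machines the large array is noticeably slower per hop, which is the RAM-latency penalty showing through the interpreter overhead.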
What is the cost of a system call in terms of latency?
A system call (e.g., requesting a service from the operating system kernel) typically takes several hundred nanoseconds on Linux. This latency accounts for the time it takes to switch from user mode to kernel mode and back (a process called a trap). However, this does not include the time it takes to execute the system call itself. System calls are relatively expensive compared to CPU operations, so minimizing their use can improve performance.
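One way to get a feel for this overhead is to compare a trivial pure-Python call against a function that enters the kernel. A sketch, assuming `os.getpid()` performs a real system call on the target platform (some older libc versions cached it); the numbers also include Python's own call overhead:

```python
import os
import time

def avg_ns(fn, n: int = 100_000) -> float:
    """Average wall-clock nanoseconds per call of fn over n calls."""
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n

def noop() -> None:
    pass

py_ns = avg_ns(noop)        # pure user-space function call
sys_ns = avg_ns(os.getpid)  # crosses into the kernel and back
print(f"pure-Python call: ~{py_ns:.0f} ns, os.getpid(): ~{sys_ns:.0f} ns")
```

The gap between the two figures approximates the trap overhead the flashcard describes, layered on top of the interpreter's per-call cost.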
What is the latency of a context switch between threads?
A context switch (switching from one thread to another) takes at least a few microseconds on Linux. This is the best-case scenario. If the new thread requires loading data from memory (e.g., paging), the latency can increase significantly. Context switches are necessary for multitasking but can become a bottleneck in high-performance systems.
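A rough way to observe switch cost is a ping-pong between two threads, where each round forces the scheduler to swap threads at least twice. This sketch measures an upper bound on one machine; Python's events and the GIL add overhead on top of the raw kernel switch:

```python
import threading
import time

def ping_pong(rounds: int = 10_000) -> float:
    """Two threads alternately wake each other via events; returns an
    estimated ns per switch (each round costs at least two switches)."""
    ping, pong = threading.Event(), threading.Event()

    def partner() -> None:
        for _ in range(rounds):
            ping.wait()
            ping.clear()
            pong.set()

    t = threading.Thread(target=partner)
    t.start()
    start = time.perf_counter_ns()
    for _ in range(rounds):
        ping.set()
        pong.wait()
        pong.clear()
    elapsed = time.perf_counter_ns() - start
    t.join()
    return elapsed / (rounds * 2)

per_switch_ns = ping_pong()
print(f"~{per_switch_ns:.0f} ns per switch (includes Python/GIL overhead)")
```

Even this inflated figure usually lands in the microsecond range, hundreds of times the cost of a plain function call, which is why tight loops should avoid forcing switches.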
What is the latency of reading data from an SSD?
Reading an 8K page from an SSD typically takes about 100 microseconds. SSDs are much faster than traditional hard drives because they have no moving parts. However, SSD write latency is about 10 times slower than read latency, taking close to 1 millisecond to write a page. This difference arises from how flash memory works: cells must be erased in large blocks before they can be rewritten, so a write often triggers extra erase and copy work behind the scenes.
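Timing a raw page read is straightforward to sketch, with one important caveat: unless the operating system's page cache is bypassed (e.g., via `O_DIRECT`, which is platform-specific), a freshly written file is usually served from RAM and will look far faster than the SSD figure above:

```python
import os
import tempfile
import time

PAGE = 8 * 1024  # the 8K page size cited in the flashcard

# Create a throwaway file, then time a single 8K read from it.
# The result likely reflects the OS page cache, not raw SSD latency.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(PAGE * 128))
    path = f.name

fd = os.open(path, os.O_RDONLY)
start = time.perf_counter_ns()
data = os.read(fd, PAGE)
elapsed_us = (time.perf_counter_ns() - start) / 1_000
os.close(fd)
os.remove(path)

print(f"read {len(data)} bytes in ~{elapsed_us:.1f} us (likely cache-served)")
```

The gap between a cache-served read here and the ~100 microsecond SSD figure is itself a useful illustration of why databases work hard to keep hot pages in memory.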
What is the latency of a network round trip within a cloud zone?
A network round trip within the same cloud zone (e.g., between servers in the same data center) typically takes a few hundred microseconds. In the 2020s, this latency has improved and can sometimes be less than 100 microseconds. This low latency is crucial for distributed systems, where frequent communication between servers is required.
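A loopback ping-pong gives a feel for round-trip measurement, though it skips the physical network entirely, so it is a floor on same-zone latency rather than a measurement of it:

```python
import socket
import time

# One byte each way over a local socket pair, averaged over many rounds.
# Loopback bypasses the NIC and wire, so real same-zone RTTs are higher.
a, b = socket.socketpair()
ROUNDS = 1_000

start = time.perf_counter_ns()
for _ in range(ROUNDS):
    a.sendall(b"x")
    b.recv(1)
    b.sendall(b"x")
    a.recv(1)
rtt_us = (time.perf_counter_ns() - start) / ROUNDS / 1_000
a.close()
b.close()

print(f"loopback round trip: ~{rtt_us:.1f} us")
```

The same ping-pong pattern, pointed at a real peer in the same zone, is how the few-hundred-microsecond figure is typically measured in practice.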
What is the latency of a Memcache or Redis operation?
A typical Memcache or Redis get operation takes about 1 millisecond as measured by the client. This includes the time for the network round trip and the time to retrieve the data from the cache. Caching systems like Redis are designed to provide fast access to frequently used data, reducing the need to query slower databases.
What is the latency of a hard disk drive (HDD) seek?
The seek time of a hard disk drive (HDD) is about 5 milliseconds. This is the time it takes for the disk’s read/write head to move to the correct position on the disk. HDDs are much slower than SSDs because they rely on mechanical parts, which introduce latency.
What is the latency of a network round trip between the US and Europe?
A network round trip between the US East Coast and Europe typically takes 10 to 100 milliseconds. This latency is due to the physical distance between the two locations and the speed of light, which limits how fast data can travel. Such latencies are important to consider when designing globally distributed systems.
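The physical floor on this latency can be computed directly. A sketch using the approximate New York to London great-circle distance and light's speed in optical fiber (slowed by the glass's refractive index, assumed ~1.47 here):

```python
# Lower bound on a transatlantic round trip imposed by physics.
# Distance and refractive index are approximations for illustration.
distance_km = 5_570                       # New York to London, great circle
light_in_fiber_kms = 299_792 / 1.47       # ~204,000 km/s in glass

one_way_ms = distance_km / light_in_fiber_kms * 1_000
rtt_floor_ms = 2 * one_way_ms
print(f"physics floor: ~{rtt_floor_ms:.0f} ms round trip")
```

The floor comes out near 55 ms; real routes are longer than the great circle and add switching delays, which is why observed transatlantic round trips sit comfortably inside the 10 to 100 millisecond band.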
What is the latency of a bcrypt password hash?
Hashing a password using bcrypt takes about 300 milliseconds at a typical work factor (the cost parameter can be tuned up or down). This is intentionally slow to make brute-force password cracking impractical. While this latency is high compared to other operations, it is necessary for security purposes.
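bcrypt itself is a third-party package, but the standard library's `hashlib.pbkdf2_hmac` illustrates the same idea: a tunable work factor that makes each password guess deliberately expensive. A sketch showing how latency scales with the iteration count:

```python
import hashlib
import os
import time

# PBKDF2 stands in for bcrypt here: both are deliberately slow,
# work-factor-tunable password hashes (the iteration counts are arbitrary).
password = b"correct horse battery staple"
salt = os.urandom(16)

for iterations in (1_000, 100_000, 1_000_000):
    start = time.perf_counter()
    key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    ms = (time.perf_counter() - start) * 1_000
    print(f"{iterations:>9} iterations: {ms:7.1f} ms")
```

In production the work factor is chosen so one hash costs a few hundred milliseconds on current hardware: negligible for a single login, ruinous for an attacker trying billions of guesses.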
What is the latency of transferring 1GB of data within a cloud region?
Transferring 1GB of data within the same cloud region takes about 10 seconds. This latency depends on network bandwidth and congestion. For large data transfers, optimizing network usage and compressing the data can reduce the effective transfer time.
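The figure follows from simple bandwidth arithmetic: transfer time = data size / bandwidth. Assuming a 1 Gbps effective throughput, which is a common per-flow ceiling in cloud networks:

```python
# Transfer time = data size / bandwidth.
size_bits = 1 * 8e9      # 1 GB = 8 * 10^9 bits (decimal gigabyte)
bandwidth_bps = 1e9      # assumed 1 Gbps effective throughput

seconds = size_bits / bandwidth_bps
print(f"1 GB at 1 Gbps: {seconds:.0f} s")  # 8 s, in line with ~10 s above
```

Protocol overhead and congestion push the real figure above this ideal 8 seconds, which is how the roughly 10 second rule of thumb arises.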