Latency, throughput, availability Flashcards

1
Q

Latency

A

“how long it takes you to drive from A to B”

The amount of time, in milliseconds (ms), it takes a single message to be delivered. Caused by:

  • Physical distance: the speed of light is the fastest anything can travel, so no matter how you design your system, transferring data through space will always take some time.
  • Complex computation: if a computation is complex, it’s going to take longer to execute and increase latency. For example, a complex relational database query with lots of joins will take longer than a simple lookup by id.
  • Congestion: when there are many message requests coming in at once and the system doesn’t have the capacity to process them all, some requests will have to wait, increasing latency. Either these requests will be dropped and sent again, or they’ll sit in a queue waiting to be processed.
  • Too many nodes: every decision point in a request’s path (routers, proxies, intermediate services) adds time to process the request and decide where to route it, so more hops mean more latency.
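To make the unit concrete, here is a minimal sketch in Python that measures the end-to-end latency of a single request in milliseconds (the URL is a placeholder):

```python
import time
import urllib.request

def measure_latency_ms(url: str) -> float:
    """Round-trip time for one request, in milliseconds."""
    start = time.perf_counter()
    urllib.request.urlopen(url).read()  # block until the full response arrives
    return (time.perf_counter() - start) * 1000

# Placeholder URL. Physical distance, server-side computation, and
# congestion along the path all show up in this one number.
print(f"{measure_latency_ms('https://example.com'):.1f} ms")
```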
2
Q

How to improve latency

A
  • Better paths: minimizing the number of nodes a request has to travel through reduces the time added at each hop.
  • Caching: storing a copy of repeatedly accessed data for faster retrieval can dramatically improve latency when applied correctly (see the sketch after this list).
  • Protocol choice: certain protocols, like HTTP/2, reduce the protocol overhead associated with a request and so keep latency lower. TCP’s congestion avoidance features can also mitigate congestion-related causes of high latency.
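As a minimal sketch of the caching idea, Python’s functools.lru_cache memoizes a slow lookup so that repeated requests for the same key skip the expensive work; slow_lookup and its ~50 ms delay are hypothetical stand-ins for a database query:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def slow_lookup(user_id: int) -> str:
    # Hypothetical stand-in for an expensive relational query.
    time.sleep(0.05)  # simulate ~50 ms of query latency
    return f"user-{user_id}"

slow_lookup(42)  # first call pays the full ~50 ms
slow_lookup(42)  # repeat call is served from the in-process cache in microseconds
```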
3
Q

Throughput

A

“how many cars make it from A to B per hour”

The amount of data successfully transmitted through a system in a given amount of time, measured in bits per second (bps). Throughput measures how much is actually transmitted; it is not the theoretical capacity (bandwidth) of the system.
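The arithmetic is straightforward; a short sketch (the byte count and duration are made-up numbers):

```python
def throughput_bps(bytes_transferred: int, seconds: float) -> float:
    """Data actually delivered per second, in bits per second."""
    return bytes_transferred * 8 / seconds

# Made-up numbers: 125 MB delivered in 10 s is 100 Mbps of achieved
# throughput, whatever the link's theoretical bandwidth happens to be.
print(throughput_bps(125_000_000, 10.0))  # 100000000.0 bits/s
```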

4
Q

What causes low throughput

A
  • Congestion - just like road traffic is caused by many people trying to get to the same destination, low throughput in a software system can be caused by too many requests on the same network. Essentially, the hardware can’t handle the number of requests going through it.
  • Protocol overhead - if the protocols used in message transmission require handshakes and other back-and-forth communication patterns, the network can be overloaded with overhead from just the protocols and not the message content itself.
  • Latency - since throughput is the amount of data transmitted over a set time period, high latency (slow message delivery) reduces how much data can be transmitted overall; the sketch below quantifies this for TCP.
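One way to make the latency point quantitative: a single TCP connection can have at most one receive window of unacknowledged data in flight, so its throughput is capped at roughly window_size / round_trip_time. A worked sketch:

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: one window in flight per round trip."""
    return window_bytes * 8 / rtt_seconds

# A 64 KiB window over a 100 ms round trip caps out near 5.2 Mbps,
# no matter how much bandwidth the underlying link has.
print(max_tcp_throughput_bps(65_536, 0.100))  # ~5.24e6 bits/s
```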
5
Q

How to improve throughput

A
  • Increasing bandwidth - if you improve the capacity of a system to transport data (bandwidth), the actual amount of data transferred (throughput) can increase too, provided bandwidth was the bottleneck. This generally means adding new hardware or upgrading the hardware at bottlenecks in the system.
  • Improving latency - since latency limits throughput, improving latency can improve throughput.
  • Protocol choice - TCP has congestion avoidance features that can help mitigate congestion that causes low throughput.
6
Q

Availability

A

The proportion of time that a system is able to respond, i.e. the ratio Uptime / (Uptime + Downtime). Availability is a critical performance metric for a service, because downtime both harms users who rely on the system and can cost a business a large amount of revenue in a short time.

The gold standard for “highly available” systems is the five nines: 99.999% uptime.
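The nines translate directly into a yearly downtime budget; here is the arithmetic as a short sketch:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (ignoring leap years)

def downtime_seconds_per_year(availability: float) -> float:
    """Allowed downtime per year at a given Uptime / (Uptime + Downtime) ratio."""
    return (1 - availability) * SECONDS_PER_YEAR

for a in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{a:.3%} uptime -> {downtime_seconds_per_year(a) / 60:.1f} min/year of downtime")
# Five nines (99.999%) leaves a budget of only about 5.3 minutes per year.
```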

7
Q

What causes low availability

A

Downtime happens when part of the service breaks: a hardware component fails, a deployment goes wrong and leaves the software in an inconsistent state, and so on. Systems can fail in many ways; here are some of the most common:

  • Hardware failure - computer components eventually fail, and a failed component can take down a key server and, with it, the whole system. An entire data center can also go offline in a power outage or natural disaster.
  • Software bugs - when something in the code is wrong, a request can hit the bug and kill the process, e.g. by dereferencing a null pointer (see the sketch after this list).
  • Complex architectures - the more complex a system design is, the more points of failure there are, and the harder it gets to synchronize more computers and make them fault tolerant to other computers in the system failing.
  • Dependent service outages - an outage of a service that the system relies on, like DNS or an authorization server, can cause the system to become unavailable even if nothing is broken internally.
  • Request overload - if a system reaches its maximum capacity, it can start failing to respond to some requests. Too many requests can also take a server down entirely if it runs out of a key resource (e.g. the disk fills up) and can’t process any more operations.
  • Deployment issues - when a deployment’s changes to software or configuration don’t go as expected, many problems can make the system unavailable: servers can end up in an inconsistent state, fail to start, fail to talk to each other, or run short on resources.
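To illustrate the software-bugs bullet, here is a minimal hypothetical handler (all names are made up): without the None guard, a lookup miss dereferences a missing value, the exception kills the request or the worker process, and availability drops.

```python
# Hypothetical handler; find_user may return None for an unknown id.
def handle_request(user_id, find_user):
    user = find_user(user_id)
    if user is None:
        # Without this guard, `user.name` below raises AttributeError on None,
        # crashing the request (or the whole worker) -- the "software bugs" case.
        return {"status": 404, "body": "user not found"}
    return {"status": 200, "body": user.name}
```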