Datacenters Flashcards

Question 1

Q

How can workloads be measured in the network? Give 3 approaches, with the downside of each.

Answer

A

Log everything all the time?
Generally expensive and infeasible
Log with sampling?
Samples may miss events of interest
Replay with logging turned on?
Completely overlooks “Heisenbugs”
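
A minimal sketch of why sampling can miss events of interest. The trace contents, the event name, and the 1-in-10 sampling rate are all made up for illustration:

```python
def sample_every_nth(events, n):
    """Keep every n-th event (deterministic 1-in-n sampling)."""
    return [e for i, e in enumerate(events) if i % n == 0]

# Hypothetical trace: 1000 ordinary events with one rare event in the middle.
trace = ["ok"] * 1000
trace[501] = "buffer_overflow"

sampled = sample_every_nth(trace, 10)
print(len(sampled))                  # number of events actually logged
print("buffer_overflow" in sampled)  # did the rare event survive sampling?
```

Logging drops to a tenth of the volume, but the one event we cared about sits at an index the sampler never looks at.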

Question 2

Q

What does a rack look like?

Answer

A

A ToR (top-of-rack) switch is connected to the rack and acts as the entry point to the servers on the rack. The ToR switch is in turn connected to other fabric switches.

Question 3

Q

What are the different ways to connect racks? Give 4.

Answer

A

One big switch: sucks, because it is a single point of failure/congestion
Lots of big switches: increases complexity
Tree: yay, but back to a single point of failure at the top
Fat tree: yay
  built entirely from k-port switches: k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches; each edge switch connects to k/2 machines
  low cost because of commodity switches
  increased throughput between racks: full bisection bandwidth, so any distinct pair of hosts can in principle talk at full link rate
  redundant connections
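
A rough sizing sketch, assuming the standard three-layer k-ary fat tree (k pods, each with k/2 edge and k/2 aggregation switches, (k/2)^2 core switches, k/2 hosts per edge switch). The function name and the returned keys are made up for this example:

```python
def fat_tree_sizes(k):
    """Component counts for a k-ary fat tree built from k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    edge = k * half          # k pods, k/2 edge switches each
    aggregation = k * half   # k pods, k/2 aggregation switches each
    core = half * half       # (k/2)^2 core switches
    hosts = edge * half      # k/2 hosts per edge switch, so k^3/4 in total
    return {
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,
        "hosts": hosts,
    }

print(fat_tree_sizes(4))
print(fat_tree_sizes(48)["hosts"])
```

With 48-port commodity switches this layout reaches tens of thousands of hosts, which is where the "low cost" argument comes from.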

Question 4

Q

what is bisection bandwidth

Answer

A

In computer networking, if the network is bisected into two partitions, the bisection bandwidth of a network topology is the bandwidth available between the two partitions.[1] Bisection should be done in such a way that the bandwidth between two partitions is minimum.[2] Bisection bandwidth gives the true bandwidth available in the entire system. Bisection bandwidth accounts for the bottleneck bandwidth of the entire network. Therefore bisection bandwidth represents bandwidth characteristics of the network better than any other metric.
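
A brute-force sketch of the definition above, only practical for tiny topologies. It tries every split of the nodes into two halves and keeps the smallest total capacity crossing the cut. The function name and the example topologies are invented for illustration:

```python
from itertools import combinations

def bisection_bandwidth(nodes, links):
    """Minimum capacity crossing any split of the nodes into two halves.

    links maps an undirected edge (u, v) to its capacity.
    Exponential in the number of nodes, so only for small examples.
    """
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for side in combinations(nodes, half):
        side = set(side)
        # A link crosses the cut when exactly one endpoint is in `side`.
        cut = sum(cap for (u, v), cap in links.items()
                  if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best

# Four nodes in a ring, every link 10 Gb/s.
ring = {(0, 1): 10, (1, 2): 10, (2, 3): 10, (3, 0): 10}
# Same nodes in a line: the missing link halves the worst-case cut.
line = {(0, 1): 10, (1, 2): 10, (2, 3): 10}

print(bisection_bandwidth(range(4), ring))
print(bisection_bandwidth(range(4), line))
```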

Question 5

Q

How might you calculate an upper bound on throughput?

Answer

A

throughput per flow ≤ total capacity / (# of flows × mean path length)
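
A worked instance of the bound. The numbers (48 links of 10 Gb/s, 16 flows, 4-hop mean path) are made up, and the function name is just for this sketch:

```python
def throughput_upper_bound(total_capacity, num_flows, mean_path_length):
    """Per-flow bound: total capacity / (number of flows * mean path length)."""
    if num_flows <= 0 or mean_path_length <= 0:
        raise ValueError("num_flows and mean_path_length must be positive")
    return total_capacity / (num_flows * mean_path_length)

# Hypothetical network: 48 links at 10 Gb/s each, so 480 Gb/s in total.
total_capacity_gbps = 48 * 10
bound = throughput_upper_bound(total_capacity_gbps, num_flows=16, mean_path_length=4)
print(bound)  # Gb/s per flow, at best
```

The intuition: each flow uses up capacity on every hop of its path, so longer paths mean less capacity left per flow.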

Question 6

Q

What does TCP REALLY look like?

Answer

A

The congestion window (cwnd) caps the number of in-flight bytes awaiting an ACK. The sender never has more than min(cwnd, rwnd) in flight, where rwnd is dictated by flow control. Initial value of cwnd: 10 MSS.
Slow start (exponential growth) until a timeout
On timeout, ssthresh = 1/2 cwnd
Slow start again UNTIL cwnd > ssthresh
At which point it just becomes linear increase (congestion avoidance)
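
A toy round-by-round sketch of the behaviour above, with cwnd counted in MSS and one step per RTT. It assumes cwnd restarts from 1 MSS after a timeout (as in classic TCP) and ignores fast retransmit, rwnd, and everything else; the function and parameter names are made up:

```python
def simulate_cwnd(rounds, timeout_rounds, init_cwnd=10):
    """Return cwnd (in MSS) at the start of each round."""
    cwnd = init_cwnd
    ssthresh = float("inf")  # no threshold until the first loss
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in timeout_rounds:
            ssthresh = max(cwnd // 2, 2)  # on timeout, ssthresh = 1/2 cwnd
            cwnd = 1                      # assumption: restart from 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd += 1                       # linear increase: +1 MSS per RTT
    return history

print(simulate_cwnd(rounds=12, timeout_rounds={3}))
```

The trace doubles up to the timeout, collapses, doubles again up to ssthresh, then creeps up by one per round.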

Question 7

Q

What is the TCP incast problem? What are the preconditions for it?

Answer

A

A client requests data from lots of servers at once and they all respond simultaneously over TCP. This many-to-one traffic through a single switch can overflow the switch buffers, causing a drastic reduction in throughput.

RTT in the data centre is << RTO, so after a loss the application may be idle for a relatively long time

Preconditions for TCP Incast
• High-bandwidth, low-latency networks
  - with small switch buffers (as it should be)
• Concurrent barrier-synchronized requests
  - nothing happens until all responses received
• Servers returning a relatively small amount of data per request

Imbalance between low link latency (µs) and RTO (ms)
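
Back-of-the-envelope arithmetic for that imbalance. The 100 µs RTT, the 200 ms RTO, and the 10-RTT transfer are illustrative assumptions, not measurements:

```python
def rtts_per_timeout(rtt_us, rto_us):
    """How many round trips fit inside a single retransmission timeout."""
    return rto_us // rtt_us

def useful_fraction(transfer_rtts, rtt_us, rto_us, timeouts=1):
    """Share of wall-clock time spent moving data when timeouts occur."""
    busy = transfer_rtts * rtt_us
    idle = timeouts * rto_us
    return busy / (busy + idle)

RTT_US = 100        # assumed data centre round-trip time: 100 microseconds
RTO_US = 200_000    # assumed retransmission timeout: 200 milliseconds

print(rtts_per_timeout(RTT_US, RTO_US))
print(useful_fraction(10, RTT_US, RTO_US))
```

Under these assumptions a single timeout costs as much time as two thousand round trips, so a transfer that should take 10 RTTs spends well over 99% of its time idle.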

Question 8

Q

How can we solve the TCP incast problem?

Answer

A

Modify the TCP in use to remove the RTO minimum

BUT this isn't good enough by itself: it'll still lead to a drop in throughput, because the data centre's RTTs are << TCP's clock granularity

So we use a microsecond-granularity RTO timer rather than a millisecond one

then we gucci
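
A sketch of why both changes are needed, using the RFC 6298 shape RTO = SRTT + max(G, 4 × RTTVAR) with a floor on top. All values are in microseconds and are illustrative assumptions; the function name is made up:

```python
def compute_rto(srtt_us, rttvar_us, granularity_us, rto_min_us):
    """RTO in microseconds: SRTT + max(G, 4 * RTTVAR), clamped to a minimum."""
    rto = srtt_us + max(granularity_us, 4 * rttvar_us)
    return max(rto_min_us, rto)

SRTT_US, RTTVAR_US = 100, 20  # assumed data centre RTT estimates

# Stock settings: 1 ms clock granularity, 200 ms minimum RTO.
stock = compute_rto(SRTT_US, RTTVAR_US, granularity_us=1000, rto_min_us=200_000)
# Remove the minimum only: the coarse clock still dominates.
no_min = compute_rto(SRTT_US, RTTVAR_US, granularity_us=1000, rto_min_us=0)
# Remove the minimum AND use a microsecond clock.
fine = compute_rto(SRTT_US, RTTVAR_US, granularity_us=1, rto_min_us=0)

print(stock, no_min, fine)
```

Dropping the minimum alone still leaves the timeout an order of magnitude above the RTT; only with the fine-grained clock does the RTO track the measured round-trip time.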

Question 9

Q

How do mice and elephants differ, and how do they affect DCTCP?

Answer

A

Small 'Mice' flows:
< 1 MB (micebyte :3)
(query, control state, advertising/bidding, etc.)
delay-sensitive!

Large 'Elephant' flows:
1 MB → 100s of MB
(backups, updates)
throughput-sensitive flows

Most flows are mice, but most bytes come from elephants

Question 10

Q

What is ECN?

Answer

A

Explicit Congestion Notification
ECN allows end-to-end notification of network congestion without dropping packets.

An ECN-aware router may set a mark in the IP header instead of dropping a packet in order to signal impending congestion. The receiver of the packet echoes the congestion indication to the sender, which reduces its transmission rate as if it detected a dropped packet.

Two Key Ideas (how DCTCP uses ECN)

React in proportion to the extent of congestion, not its presence.
Mark based on instantaneous queue length.
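
A minimal sketch of those two ideas, assuming they refer to DCTCP's use of ECN. The update rules follow the usual DCTCP description (alpha ← (1 − g)·alpha + g·F, then cwnd ← cwnd·(1 − alpha/2)); the function names, the threshold, and the sample numbers are made up for illustration:

```python
def should_mark(queue_len, threshold_k):
    """Switch side: mark when the instantaneous queue exceeds K."""
    return queue_len > threshold_k

def dctcp_update(cwnd, alpha, marked, total, g=1 / 16):
    """Sender side, once per window of ACKs.

    alpha is a running estimate of the fraction of marked packets;
    the window is cut in proportion to it rather than always halved.
    """
    frac = marked / total if total else 0.0
    alpha = (1 - g) * alpha + g * frac
    if marked:
        cwnd = cwnd * (1 - alpha / 2)
    return cwnd, alpha

# Mild congestion: 1 of 10 packets marked, so the window barely shrinks.
print(dctcp_update(cwnd=100, alpha=0.0, marked=1, total=10))
# Persistent heavy congestion: everything marked, so the window halves.
print(dctcp_update(cwnd=100, alpha=1.0, marked=10, total=10))
```

When every packet is marked this degenerates to the classic halving; with only a few marks the sender backs off gently, which keeps queues short without starving the elephants.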

(10 cards)