Datacenters Flashcards

1
Q

How can workloads be measured in the network? Give 3 approaches and the downside of each.

A
  • Log everything all the time?
    • Generally expensive and infeasible
  • Log with sampling?
    • Samples may miss events of interest
  • Replay with logging turned on?
    • Completely overlooks “Heisenbugs”
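
The sampling downside can be illustrated with a minimal sketch. The trace, the event name, and the 1-in-100 sampling rate are all hypothetical values chosen for illustration, not taken from any real monitoring tool.

```python
def sample_one_in_n(events, n):
    """Keep every n-th event (indices 0, n, 2n, ...), like a simple 1-in-n sampler."""
    return events[::n]

# Hypothetical trace: 1000 ordinary records with one rare event at index 503.
trace = ["ok"] * 1000
trace[503] = "buffer_overflow"

full_log = trace                            # log everything: 1000 records to store
sampled_log = sample_one_in_n(trace, 100)   # 1-in-100 sampling: far fewer records

print(len(full_log), "buffer_overflow" in full_log)
print(len(sampled_log), "buffer_overflow" in sampled_log)
```

Full logging keeps the rare event but stores 100 times as many records; the sampler is cheap but never sees index 503.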
2
Q

What does a rack look like?

A

A top-of-rack (ToR) switch is connected to the servers in the rack and acts as their entry point. The ToR switch is in turn connected to other fabric switches.

3
Q

What are the different ways to connect racks? (4)

A

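One big switch: bad, because it is a single point of failure and congestion
Lots of big switches: increases complexity
Tree: better, but there is still a single point of failure at the top (the root)
Fat tree: the best of the four
  • built entirely from identical k-port switches: edge switches connect the machines, and several switches at each layer above give multiple paths instead of one big root
  • low cost because it uses commodity switches
  • increased throughput between racks: full bisection bandwidth, so any two distinct hosts can talk at full link rate
  • redundant connections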
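
To make the fat-tree sizes concrete, here is a small sketch. It assumes the standard three-layer k-ary fat-tree layout (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches), which the card itself does not spell out; the function name `fat_tree_size` is made up for this example.

```python
def fat_tree_size(k):
    """Element counts for a k-ary fat tree built from k-port switches.

    Assumes the standard three-layer layout: k pods, each with k/2 edge
    and k/2 aggregation switches, and (k/2)^2 core switches.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        "hosts": k * half * half,  # each edge switch serves k/2 hosts
    }

print(fat_tree_size(4))
print(fat_tree_size(48)["hosts"])
```

Under that layout, 4-port switches give 16 hosts, while 48-port commodity switches already give 27,648 hosts.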

4
Q

What is bisection bandwidth?

A

In computer networking, if the network is bisected into two equal-sized partitions, the bisection bandwidth of a network topology is the bandwidth available between the two partitions.[1] Bisection should be done in such a way that the bandwidth between the two partitions is minimum.[2] Bisection bandwidth gives the true bandwidth available in the entire system. Bisection bandwidth accounts for the bottleneck bandwidth of the entire network. Therefore bisection bandwidth represents bandwidth characteristics of the network better than any other metric.
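
The definition above can be checked by brute force on tiny topologies. This is only a sketch: it tries every equal split, which is exponential in the number of nodes, and the two example topologies (a 4-node ring and a 4-node line with 10 Gbps links) are hypothetical.

```python
from itertools import combinations

def bisection_bandwidth(nodes, links):
    """Minimum total bandwidth crossing any split of the nodes into two equal halves.

    links maps an undirected link (u, v) to its bandwidth.
    """
    nodes = list(nodes)
    if len(nodes) % 2 != 0:
        raise ValueError("this sketch needs an even number of nodes")
    half = len(nodes) // 2
    best = None
    for side in combinations(nodes, half):
        side = set(side)
        # A link crosses the cut when exactly one endpoint is in `side`.
        cut = sum(bw for (u, v), bw in links.items() if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best

ring = {(0, 1): 10, (1, 2): 10, (2, 3): 10, (3, 0): 10}
line = {(0, 1): 10, (1, 2): 10, (2, 3): 10}

print(bisection_bandwidth(range(4), ring))
print(bisection_bandwidth(range(4), line))
```

The ring always has two links crossing the worst-case cut, while the line can be cut through its single middle link, so the line's bisection bandwidth is half the ring's.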

5
Q

How might you calculate an upper bound on throughput?

A

throughput per flow ≤ total capacity / (# of flows × mean path length)
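
The reasoning behind the bound: each flow uses up capacity on every link along its path, so (# of flows × mean path length × throughput per flow) cannot exceed the total link capacity. A small sketch with hypothetical numbers (48 links of 10 Gbps, 100 flows, mean path length of 4 links):

```python
def throughput_upper_bound(total_capacity, num_flows, mean_path_length):
    """Upper bound on per-flow throughput, in the same units as total_capacity."""
    if num_flows <= 0 or mean_path_length <= 0:
        raise ValueError("num_flows and mean_path_length must be positive")
    return total_capacity / (num_flows * mean_path_length)

# Hypothetical network: 48 links of 10 Gbps each.
total_capacity_gbps = 48 * 10
bound = throughput_upper_bound(total_capacity_gbps, num_flows=100, mean_path_length=4)
print(f"{bound:.2f} Gbps per flow at most")
```

Shorter paths raise the bound, which is one reason topology choice matters for throughput.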

6
Q

What does TCP congestion control REALLY look like?

A

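The congestion window (cwnd) caps the number of in-flight bytes awaiting ACK. The sender is also capped by the receive window rwnd (dictated by flow control), so in-flight bytes ≤ min(cwnd, rwnd). Initial cwnd is 10 MSS.
Slow start (cwnd doubles every RTT) until a timeout
On timeout, ssthresh = 1/2 cwnd and cwnd is reset
Slow start again UNTIL cwnd > ssthresh
at which point it just becomes linear increase (congestion avoidance)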
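
The steps above can be sketched as a per-RTT simulation. Several simplifications are assumptions of this sketch rather than statements from the card: cwnd restarts at 1 MSS after a timeout (standard TCP behaviour), slow-start growth is capped at ssthresh, rwnd is ignored, and the function name and parameters are made up.

```python
def simulate_cwnd(num_rtts, timeouts, init_cwnd=10, restart_cwnd=1):
    """Return cwnd (in MSS) at the start of each RTT.

    timeouts: set of RTT indices at which a retransmission timeout fires.
    """
    cwnd = init_cwnd
    ssthresh = float("inf")  # no threshold until the first loss
    trace = []
    for rtt in range(num_rtts):
        trace.append(cwnd)
        if rtt in timeouts:
            ssthresh = max(cwnd // 2, 2)     # ssthresh = 1/2 cwnd
            cwnd = restart_cwnd              # assumed restart value of 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # linear increase
    return trace

print(simulate_cwnd(num_rtts=13, timeouts={3}))
```

With a timeout at RTT 3, the trace doubles up to 80 MSS, drops to 1, doubles again up to the new ssthresh of 40, then grows by 1 MSS per RTT.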

7
Q

What is the TCP incast problem? What are the preconditions for it?

A

Lots of servers using TCP all respond simultaneously to the same requester. This many-to-one traffic converges on one switch port and can overflow the switch buffers, causing packet loss, timeouts, and a drastic reduction in throughput.

RTT in the data centre is << RTO, so the application may be idle for a relatively long time

Preconditions for TCP Incast
• High-bandwidth, low-latency networks
  • with small switch buffers (as it should be)
• Concurrent barrier-synchronized requests
  • nothing happens until all responses received
• Servers returning a relatively small amount of data per request

Imbalance between low link latency (µs) and RTO (ms)
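
A rough back-of-the-envelope sketch of why one timeout hurts so much. All numbers are hypothetical (1 MB block, 1 Gbps link, 200 ms RTO), and the model simply adds the RTO as idle time on top of the ideal transfer time.

```python
def goodput_bps(block_bytes, link_bps, rto_s, num_timeouts):
    """Goodput for one barrier-synchronized block when the client sits idle
    for one full RTO per timeout (a deliberately crude model)."""
    transfer_s = block_bytes * 8 / link_bps
    total_s = transfer_s + num_timeouts * rto_s
    return block_bytes * 8 / total_s

# Hypothetical values: 1 MB block, 1 Gbps link, 200 ms RTO.
no_loss = goodput_bps(1_000_000, 1e9, 0.2, num_timeouts=0)
one_timeout = goodput_bps(1_000_000, 1e9, 0.2, num_timeouts=1)

print(f"no timeout:  {no_loss / 1e6:.0f} Mbps")
print(f"one timeout: {one_timeout / 1e6:.1f} Mbps")
```

The transfer itself takes 8 ms, so a single 200 ms timeout leaves the link idle for most of the request and goodput falls to a few percent of line rate.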

8
Q

How can we solve the TCP incast problem?

A

Modify TCP to remove the RTO minimum

BUT this isn’t good enough by itself: it still leads to a drop in throughput, because datacentre RTTs are << TCP’s clock granularity

So we use microsecond-granularity RTO timers rather than millisecond ones

With both changes, throughput recovers
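
Why both changes are needed can be seen from the standard RTO formula, RTO = SRTT + max(G, 4 × RTTVAR), clamped to a minimum, where G is the clock granularity. The RTT values, the 200 ms minimum, and the 1 ms granularity below are hypothetical illustration values.

```python
def rto(srtt_s, rttvar_s, granularity_s, min_rto_s):
    """RTO = max(min_rto, SRTT + max(G, 4 * RTTVAR)), all in seconds."""
    return max(min_rto_s, srtt_s + max(granularity_s, 4 * rttvar_s))

# Hypothetical datacentre measurements: 100 us smoothed RTT, 20 us variance.
srtt, rttvar = 100e-6, 20e-6

stock = rto(srtt, rttvar, granularity_s=1e-3, min_rto_s=200e-3)
no_min = rto(srtt, rttvar, granularity_s=1e-3, min_rto_s=0.0)
fine_grained = rto(srtt, rttvar, granularity_s=1e-6, min_rto_s=0.0)

for name, value in [("stock", stock), ("no minimum", no_min), ("microsecond timers", fine_grained)]:
    print(f"{name}: {value * 1e6:.0f} us")
```

Removing the minimum alone still leaves the RTO dominated by the 1 ms clock granularity, about 11 times the RTT; only with microsecond timers does the RTO track the actual RTT.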

9
Q

How do mice and elephant flows differ, and how do they affect DCTCP?

A

  • Small ‘mice’ flows:
    • < 1 MB (micebyte :3)
    • (query, control state, advertising/bidding, etc.)
    • delay-sensitive flows

  • Large ‘elephant’ flows:
    • 1 MB → 100s of MB
    • (backups, updates)
    • throughput-sensitive flows

Most flows are mice, but most bytes come from elephants
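
The "most flows are mice, most bytes are elephants" point can be illustrated with a small sketch. The 1 MB cut-off comes from the card; the flow-size mix (90 flows of 20 KB, 10 flows of 50 MB) is invented for illustration.

```python
MB = 1_000_000

def classify(size_bytes):
    """Mice are flows under 1 MB; everything else counts as an elephant."""
    return "mouse" if size_bytes < MB else "elephant"

def summarize(flow_sizes):
    """Return (flow counts, byte totals) per class."""
    counts = {"mouse": 0, "elephant": 0}
    total_bytes = {"mouse": 0, "elephant": 0}
    for size in flow_sizes:
        kind = classify(size)
        counts[kind] += 1
        total_bytes[kind] += size
    return counts, total_bytes

# Hypothetical mix: 90 small queries and 10 large backups.
flows = [20_000] * 90 + [50 * MB] * 10
counts, total_bytes = summarize(flows)

mice_share_of_flows = counts["mouse"] / len(flows)
elephant_share_of_bytes = total_bytes["elephant"] / sum(flows)
print(f"mice: {mice_share_of_flows:.0%} of flows")
print(f"elephants: {elephant_share_of_bytes:.1%} of bytes")
```

In this made-up mix, 90% of flows are mice yet elephants carry over 99% of the bytes, so elephants fill the queues that delay-sensitive mice have to wait behind.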

10
Q

What is ECN?

A

Explicit Congestion Notification
ECN allows end-to-end notification of network congestion without dropping packets.

An ECN-aware router may set a mark in the IP header instead of dropping a packet in order to signal impending congestion. The receiver of the packet echoes the congestion indication to the sender, which reduces its transmission rate as if it had detected a dropped packet.

Two Key Ideas

  1. React in proportion to the extent of congestion, not its presence.
  2. Mark based on instantaneous queue length.
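
The two key ideas can be sketched as follows. This assumes they refer to DCTCP's use of ECN, which the card does not state explicitly: the switch marks when the instantaneous queue exceeds a threshold K, and the sender keeps a running estimate alpha of the fraction of marked packets and cuts cwnd by alpha/2. The function names, the gain g = 1/16, and the example numbers are illustrative choices.

```python
def should_mark(queue_len_pkts, threshold_k):
    """Switch side: mark based on the instantaneous queue length."""
    return queue_len_pkts > threshold_k

def dctcp_update(cwnd, alpha, marked_acks, total_acks, g=1 / 16):
    """Sender side, once per window of ACKs: react in proportion to congestion."""
    fraction_marked = marked_acks / total_acks
    alpha = (1 - g) * alpha + g * fraction_marked  # running estimate of congestion
    if marked_acks > 0:
        cwnd = cwnd * (1 - alpha / 2)  # small alpha gives a small cut
    else:
        cwnd = cwnd + 1                # no marks: keep growing
    return cwnd, alpha

# Mild congestion: 1 of 10 packets marked, starting from alpha = 0.
mild_cwnd, mild_alpha = dctcp_update(cwnd=100, alpha=0.0, marked_acks=1, total_acks=10)
# Heavy congestion: every packet marked, alpha already at 1.
heavy_cwnd, heavy_alpha = dctcp_update(cwnd=100, alpha=1.0, marked_acks=10, total_acks=10)

print(mild_cwnd, heavy_cwnd)
```

Under mild congestion the window barely shrinks, while under sustained heavy congestion it halves, matching the classic ECN reaction described above.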