Lecture 5: 29th October 2019 Flashcards
Datacentres
What is a workflow?
An orchestrated and repeatable pattern of activity, enabled by the systematic organization of resources into processes that transform materials, provide services, or process information. It can be depicted as a sequence of operations, the work of a person or group, the work of an organization of staff, or one or more simple or complex mechanisms.
From a more abstract or higher-level perspective, workflow may be considered a view or representation of real work. The flow being described may refer to a document, service, or product that is being transferred from one step to another.
Workflows may be viewed as one fundamental building block to be combined with other parts of an organization’s structure such as information technology, teams, projects, and hierarchies.
The coordinated execution of multiple tasks or activities.
What is a workload?
The amount of work that a computer or computer system has been given to do at a given time.
How can workloads be measured?
- log everything all the time? : generally expensive and infeasible
- log with sampling? : samples may miss events of interest - outliers are important in networking (see the sketch after this list)
- replay with logging turned on? : completely overlooks “Heisenbugs”
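A minimal sketch (my own illustration, not from the lecture) of the sampling problem: a rare outlier class in a synthetic "trace" is almost certainly absent from a 0.1% sample, so the sampled view misses exactly the events that matter most. All numbers are made up.

```python
import random

# Synthetic workload trace: mostly ~100 us events, plus a handful of 10 ms outliers.
# Illustrates how a sampled log can miss rare but important events entirely.
random.seed(1)

trace = [100 + random.random() * 20 for _ in range(1_000_000)]   # "normal" events (us)
for i in random.sample(range(len(trace)), 50):                   # 50 rare outliers
    trace[i] = 10_000                                            # 10 ms each

sample = random.sample(trace, 1_000)                             # 0.1% sampling rate

print("max latency, full trace:", max(trace))                    # 10000 us -- outliers visible
print("max latency, sample:    ", max(sample))                   # very likely ~120 us
print("outliers caught by sample:", sum(x == 10_000 for x in sample))  # typically 0
```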
Where could we measure workloads?
Nowhere, really, at the scale of large companies' datacentres. At 10 Gbit/s with 84-byte (minimum-size) packets, you have ~70 ns to process each packet. CPU operations take on the order of 10s of ns, and once packet I/O, context switching, packet classification, etc. are included, measurement becomes infeasible.
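A quick back-of-the-envelope check of that budget (a sketch I've added, not from the lecture; it just redoes the arithmetic in Python):

```python
# Per-packet time budget on a 10 Gbit/s link with 84-byte minimum-size
# packets (64-byte frame plus preamble and inter-frame gap).
link_rate_bps = 10e9        # 10 Gbit/s
packet_bits   = 84 * 8      # 672 bits on the wire per minimum-size packet

time_per_packet_s = packet_bits / link_rate_bps
packets_per_s     = link_rate_bps / packet_bits

print(f"{time_per_packet_s * 1e9:.1f} ns per packet")    # ~67.2 ns
print(f"{packets_per_s / 1e6:.2f} Mpps")                  # ~14.88 million packets/s
```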
What is the scale of traffic inside datacentres?
Big. Google saw 50x growth in traffic between 2008 and 2014. In 2015, Facebook web servers had 100s to 1000s of simultaneous connections, and Facebook's traffic within its datacentres was several times larger than the traffic going out to the Internet.
Is there any relationship between the rate of packet drops and utilisation in datacentres?
No. Because of incast, drops come from synchronized bursts overflowing switch buffers, which can happen even at low average utilisation (see the sketch below).
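A rough arithmetic sketch (my own, with made-up numbers) of why incast decouples drops from utilisation: many synchronized replies can overflow a shallow switch buffer in one burst even though the long-run utilisation of the link stays low. Buffer drain during the burst is ignored for simplicity.

```python
# Incast arithmetic with assumed, illustrative numbers.
buffer_bytes  = 128 * 1024     # shallow per-port buffer on a ToR switch
senders       = 40             # workers replying to one aggregator simultaneously
reply_bytes   = 32 * 1024      # each worker's synchronized reply
link_rate_bps = 10e9           # receiver's 10 Gbit/s link
window_s      = 0.010          # one request/response round every 10 ms

burst_bytes = senders * reply_bytes
print(f"burst {burst_bytes} B vs buffer {buffer_bytes} B")   # burst >> buffer -> drops

# Yet the long-run utilisation of the receiver's link is modest:
utilisation = (burst_bytes * 8) / (link_rate_bps * window_s)
print(f"average utilisation over the window: {utilisation:.0%}")   # ~10%
```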
What is a top-of-rack?
A switch at the top of a rack of servers which connects them to each other and to servers outside the rack.
What is a server rack?
A framework cage that contains a number of specialised servers which slide into bays like shelves. The servers are commodity hardware.
What are containers?
The normal atomic unit at which servers are bought for datacentres. They are groups of server racks, usually around the size of a shipping container - and sometimes literally housed in one.
When do containers get replaced?
When ~ 10 of their machines fail
What are containers aka?
blocks
What is locality?
The degree to which network traffic does not travel far topologically, i.e. staying within the same server rack vs container or datacentre vs the wider Internet.
What locality properties do servers in datacentres hold?
Most of a block's traffic stays within its cluster, with about a fifth of that staying within the same rack. Around 12% of its traffic goes elsewhere within its own DC, and around 18% goes outside the DC.
What do the properties of datacentre traffic depend on?
the function of the application; scale; network topology; the protocols used
What are the implications of datacentre networking?
large internal traffic; tight deadlines for (network) I/O; congestion and TCP incast must be prevented; networks are complex and shared by different applications; centralised control at the per-flow level is hard
What are the pros and cons of using few larger data centres versus using more smaller data centres?
fewer, larger DCs mean less management complexity and lower cost overall, but more per site; higher latency with fewer, larger DCs; greater application complexity with fewer, larger DCs; may need a hierarchical cache structure with progressively more authoritative DCs; more multiplexing with fewer, larger DCs since everything connects to one place (?)
What is the biggest design choice within datacentres? Why is it important?
How to connect racks together; the design needs to allow rack- and machine-level addressing and routing while maximising performance.
What is bisection bandwidth?
The minimum total capacity of the links that must be cut to bisect (partition into two equal halves) a network. Over all ways of splitting the network into two halves, take the split whose crossing links have the smallest total bandwidth; the sum of the bandwidth of those cut links is the bisection bandwidth. It represents the bandwidth available between the two halves and, thereby, a measure of the true bandwidth available in the entire system.
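A small brute-force sketch of the definition (my own toy example, not from the lecture): enumerate every way of splitting a tiny assumed topology into two equal halves, sum the capacity of the links crossing each split, and take the minimum.

```python
from itertools import combinations

# Toy 4-node ring topology; links are (a, b, capacity in Gbit/s). Assumed for illustration.
links = [("h0", "h1", 10), ("h1", "h3", 10),
         ("h3", "h2", 10), ("h2", "h0", 10)]
nodes = sorted({n for a, b, _ in links for n in (a, b)})

def bisection_bandwidth(nodes, links):
    best = float("inf")
    # Try every equal-size split of the nodes (each split appears twice; the min is unaffected).
    for half in combinations(nodes, len(nodes) // 2):
        half = set(half)
        cut = sum(cap for a, b, cap in links if (a in half) != (b in half))
        best = min(best, cut)
    return best

print(bisection_bandwidth(nodes, links), "Gbit/s")   # 20 Gbit/s for this ring
```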
What is the big switch approach?
The idea of connecting the racks (their ToR switches) with “bigger” switches.
What are some problems with the big switch approach?
It presents single points of failure (mitigated by duplicating switches, at the cost of cable-management complexity) as well as scaling issues.