Test 2 L8 Flashcards
The Web and Caching
Protocol: Hypertext Transfer Protocol (HTTP)
- application-layer protocol for transferring web content
- the protocol a browser uses to request web pages
- the protocol servers use to return requested objects to the browser
- layered on top of a byte-stream protocol like TCP
Server is stateless
HTTP Requests
Request line
- indicates the request method (GET retrieves a resource, POST sends data to the server, HEAD returns only the headers of a GET request)
- includes URL
- includes version number
Optional headers
- referrer: what caused the page to be requested
- user agent: the client software making the request
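As a rough illustration (not from the lecture), a request is just text sent over the TCP connection; the host, path, and header values below are made-up examples:

```python
# Minimal sketch of an HTTP/1.1 GET sent as raw text over a TCP socket.
# Host, path, and header values are illustrative assumptions.
import socket

request = (
    "GET /index.html HTTP/1.1\r\n"        # request line: method, URL, version
    "Host: example.com\r\n"
    "Referer: http://example.com/\r\n"    # optional: what caused this request
    "User-Agent: flashcard-demo/0.1\r\n"  # optional: the client software
    "Connection: close\r\n"
    "\r\n"                                # blank line ends the headers
)

with socket.create_connection(("example.com", 80)) as sock:  # HTTP rides on TCP
    sock.sendall(request.encode("ascii"))
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.split(b"\r\n", 1)[0].decode())  # the response's status line
```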
HTTP Response
Status line
- HTTP version
- response code indicating the outcome (100s informational, 200s success, 300s redirection, 400s client errors, 500s server errors)
Optional headers
- Server
- Location
- Allow
- Content-Encoding
- Content-Length
- Expires
- Last-Modified
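A hedged sketch of how the status line and headers split apart; the response text here is invented for illustration:

```python
# Sketch: splitting a raw HTTP response into status line, headers, and body.
# The response below is a made-up example.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Content-Length: 13\r\n"
    "Expires: Thu, 01 Jan 2026 00:00:00 GMT\r\n"
    "Last-Modified: Mon, 01 Sep 2025 12:00:00 GMT\r\n"
    "\r\n"
    "Hello, world!"
)

head, _, body = raw.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")
version, code, reason = status_line.split(" ", 2)            # HTTP/1.1, 200, OK
headers = dict(line.split(": ", 1) for line in header_lines)

print(version, code, reason)
print(headers["Content-Length"], "==", len(body))            # 13 == 13
```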
Early HTTP
v0.9/1.0
One request/response per TCP Connection
Advantage: simple to implement
Disadvantage: a new TCP connection for every request adds overhead and slows transfers:
- TCP three-way handshake for every request
- TCP must start in slow start every time a connection opens
- servers end up holding many completed connections in TIME-WAIT state until timers expire, reserving resources even after the connections have finished
Solution to increase efficiency: persistent connections
Persistent Connections
Multiple HTTP requests/responses are multiplexed on a single TCP connection
- delimiters indicate the end of requests
- Content-Length allows the receiver to identify the length of the response
- so the server must know the size of the transfer in advance (see the sketch below)
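A minimal sketch of why Content-Length matters on a persistent connection; it assumes `sock` is an already-open TCP socket whose response headers have been parsed into `headers`:

```python
# On a persistent connection there is no connection close to mark the end of a
# response, so the receiver reads exactly Content-Length bytes; the next bytes
# on the same socket belong to the next response.
def read_exact(sock, n):
    """Read exactly n bytes (one response body) from the socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed before the full body arrived")
        data += chunk
    return data

# body = read_exact(sock, int(headers["Content-Length"]))
# ...the very next bytes on the socket belong to the next response.
```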
Pipelining
Client sends the next request as soon as it encounters a referenced object
So as little as one RTT elapses before all of the referenced objects begin to be fetched
Persistent connections with pipelining became the default in HTTP/1.1
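A back-of-envelope comparison of how many RTTs each scheme needs to fetch one HTML page plus its referenced objects; it ignores transmission time and slow start, and the object counts are arbitrary:

```python
# Rough RTT counts for fetching 1 HTML page + n referenced objects.
# Ignores transmission time and TCP slow start; purely illustrative.
def non_persistent(n):
    # every object costs a TCP handshake (1 RTT) + request/response (1 RTT)
    return 2 + 2 * n          # the "2 +" is the base HTML page

def persistent(n):
    # one handshake, then one request/response RTT per object, in sequence
    return 2 + n

def pipelined(n):
    # one handshake, 1 RTT for the HTML, then ~1 RTT before all objects start arriving
    return 2 + 1

for n in (5, 10):
    print(f"{n} objects: non-persistent={non_persistent(n)} RTTs, "
          f"persistent={persistent(n)}, pipelined={pipelined(n)}")
```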
Caching
Improves performance
- Origin web server hosts content but is far away
- TCP throughput is inversely proportional to RTT
- So the farther away the web content is, the slower the page loads, because latency is higher and throughput is lower
- Solution: fetch content from nearby location that has been cached
- Also a benefit when multiple clients request the same content: local clients see lower latency and higher throughput, and the ISP doesn't have to keep paying to transfer the same content over expensive links
Can occur in multiple places
- Browser locally
- Network (ISP, content distribution networks)
Caches expire to ensure clients see the most recent version of a page
- based on the Expires header
- a cache can also check with the origin server whether the original content has been modified (a 304 Not Modified response means it has not; see the sketch below)
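A sketch of that revalidation as a conditional GET, using Python's standard http.client; the host and date are illustrative assumptions:

```python
# A cache revalidating with the origin: send If-Modified-Since; a 304 reply
# means the cached copy is still good.  Host and date are made-up examples.
import http.client

conn = http.client.HTTPConnection("example.com", 80)
conn.request("GET", "/index.html", headers={
    "If-Modified-Since": "Mon, 01 Sep 2025 12:00:00 GMT",  # when the cache last fetched it
})
resp = conn.getresponse()

if resp.status == 304:
    print("304 Not Modified: keep serving the cached copy")
else:
    body = resp.read()                   # fresh content; update the cache entry
    print(f"{resp.status}: cached {len(body)} new bytes")
conn.close()
```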
Ways to redirect clients to caches:
- browser configuration: explicitly configure the browser to point to a local cache so all HTTP requests are first directed to the local cache before being forwarded to the origin
- the origin server or the service hosting the content directs the browser to a cache
- done by a special reply to the DNS request, after pinging potential cache servers to find a nearby one
TCP Throughput is inversely proportional to…
RTT
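A rough numeric illustration of this, using a Mathis-style approximation (throughput roughly MSS / (RTT * sqrt(loss))); the RTT and loss values are assumptions:

```python
# Why a nearby cache speeds things up: steady-state TCP throughput scales
# roughly as 1/RTT for a fixed loss rate and segment size.
# RTT and loss values below are assumptions.
from math import sqrt

def tcp_throughput_bps(mss_bytes, rtt_s, loss):
    return (mss_bytes * 8) / (rtt_s * sqrt(loss))

far = tcp_throughput_bps(1460, 0.100, 0.001)    # distant origin server, 100 ms RTT
near = tcp_throughput_bps(1460, 0.010, 0.001)   # local cache, 10 ms RTT

print(f"far origin:  {far / 1e6:.1f} Mbps")
print(f"local cache: {near / 1e6:.1f} Mbps  ({near / far:.0f}x higher)")
```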
Locations of caches
Can occur in multiple places
- Browser locally
- Network (ISP, content distribution networks)
Ways to redirect clients to caches
Ways to redirect clients to caches:
- browser configuration: explicitly configure the browser to point to a local cache so all HTTP requests are first directed to the local cache before being forwarded to the origin
- the origin server or the service hosting the content directs the browser to a cache
- done by a special reply to the DNS request, after pinging potential cache servers to find a nearby one
Benefits of caching
Reduced transit costs for ISP
Improved performance for clients
Web Content Distribution Networks (CDN) defined
Overlay network of web caches designed to deliver content to a client from optimal location
- Optimal can be geographically closest or something else
CDNs often have geographically disparate servers
Purpose to place caches as close to users as possible
Owned by content providers (Google), networks (AT&T), or run independently (Akamai)
Non-network CDNs typically place servers in other autonomous systems or ISPs
The number of cache nodes varies.
How server selection works in CDN
Criteria for choosing which server:
- least loaded server
- lowest latency (most typical)
- any alive server
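A minimal sketch of the "lowest latency among alive servers" policy; the replica addresses and the TCP-connect probe are assumptions, not how any particular CDN does it:

```python
# Pick the replica with the lowest measured latency that is still alive.
# Replica IPs and the TCP-connect probe are illustrative assumptions.
import socket
import time

REPLICAS = ["203.0.113.10", "198.51.100.20", "192.0.2.30"]   # made-up cache IPs

def probe_rtt(ip, port=80, timeout=1.0):
    """Return the TCP connect time to a replica, or None if it is unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

alive = {ip: rtt for ip in REPLICAS if (rtt := probe_rtt(ip)) is not None}
best = min(alive, key=alive.get) if alive else None
print("selected replica:", best)
```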
Content Routing: How clients get redirected to the right server in CDN
How to direct clients
1- Routing systems (e.g., anycast): give every replica the same IP address, then let routing carry clients to the closest replica
- simple, but the service has very little control since it is at the whims of Internet routing (simple but coarse)
2- Application-based (e.g., HTTP redirect): requires the client to first go to the origin server to get the redirect, increasing latency (simple but adds delay)
3- Naming system (e.g., DNS): the client looks up the domain and the response contains the IP address of a nearby cache; significant flexibility in directing different clients to different server replicas (fine-grained control and fast)
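A small sketch of what DNS-based redirection looks like from the client side; the hostname is made up, and a real CDN-hosted name typically CNAMEs into the CDN and resolves to nearby cache IPs that differ by client location:

```python
# Resolve a CDN-hosted name: the DNS answer points this client at nearby caches.
# The hostname is an illustrative assumption.
import socket

hostname = "www.example-cdn-customer.com"
try:
    canonical, aliases, addresses = socket.gethostbyname_ex(hostname)
    print("canonical name:", canonical)   # often a CNAME into the CDN's namespace
    print("aliases:", aliases)
    print("cache replica IPs:", addresses)
except OSError as err:
    print("lookup failed:", err)
```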
CDNs + ISPs
CDNs and ISPs have a symbiotic relationship when it comes to peering
CDNs like to peer directly with the ISPs where their customers are located because:
- better throughput since there are no intermediate AS hops and network latency is lower
- redundancy: more vectors to deliver content increases reliability
- burstiness: during large request events, having connectivity to multiple networks where the content is hosted allows traffic to be spread across multiple transit links, thereby reducing the 95th-percentile traffic level and lowering transit costs
ISPs like peering with CDNs (or host caches locally) because:
- serving content closer to the ISP's customers lets the ISP provide good performance for those customers
- lower transit costs because traffic isn’t traversing costly links