8 - Content Distribution Flashcards
Contents of an HTTP Request
Request line:
Method -> GET, POST, HEAD
URL -> /index.html
Version Number
Additional Headers:
Referer -> what caused the page to be requested (the header name keeps HTTP's historical misspelling)
User-Agent -> client software
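For reference, a request with these parts might look like this on the wire (the path, host, and header values here are illustrative):

```
GET /index.html HTTP/1.1
Host: www.example.com
Referer: http://www.example.com/home.html
User-Agent: Mozilla/5.0
```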
Contents of an HTTP Response
HTTP Version
Response Code
100s -> informational
200s -> success (e.g., 200 OK)
300s -> redirection (e.g., 301 Moved Permanently)
400s -> client errors (e.g., 404 Not Found)
500s -> server errors
Additional Headers:
Location
Server
Allow
Content-Encoding
Content-Length
Expires
Last-Modified
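A matching response might look like this (status line, then headers, then the body); the values are illustrative:

```
HTTP/1.1 200 OK
Server: Apache
Last-Modified: Mon, 01 Dec 2025 12:00:00 GMT
Expires: Thu, 01 Jan 2026 00:00:00 GMT
Content-Encoding: gzip
Content-Length: 1024

(1024 bytes of compressed body follow)
```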
Early HTTP
One request/response per TCP connection
+ Simple to implement
- A new TCP connection for every request: 3-way handshake, slow start, servers left in TIME_WAIT
Solution: Persistent Connections
Persistent Connections
Multiple request/response on a single TCP connection
Delimiters indicate the end of each request/response
-> Content-Length tells the client where one response ends and the next begins
+ “Pipelining” -> default in HTTP/1.1
Client sends a request as soon as it encounters a referenced object
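A minimal sketch of pipelining on a persistent connection using raw sockets, assuming a server (example.com here) that keeps the connection open; both requests are written before any response is read:

```python
import socket

HOST = "example.com"  # assumed to support persistent connections

# Two GET requests back to back on one TCP connection; the second asks the
# server to close the connection when it is done.
pipelined = (
    f"GET / HTTP/1.1\r\nHost: {HOST}\r\n\r\n"
    f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n"
)

with socket.create_connection((HOST, 80)) as sock:
    sock.sendall(pipelined.encode())
    response = b""
    while chunk := sock.recv(4096):   # both responses arrive in order;
        response += chunk             # Content-Length delimits each one

print(response.decode(errors="replace")[:300])
```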
Caching
Clients can cache documents in the browser (locally) or in the network
- Browser configuration: the browser can be pointed at a local cache explicitly
- Server-directed: the origin server can direct clients to a cache
Benefits of caching
Reduced transit costs for local ISP
Improved performance for local clients
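A minimal sketch of how a cached copy can be revalidated with a conditional GET, tying back to the Last-Modified header above; the URL is a placeholder and a real server may not send Last-Modified at all:

```python
import urllib.request
import urllib.error

URL = "http://example.com/index.html"  # placeholder

# First fetch: keep the body and remember when the server says it last changed.
with urllib.request.urlopen(URL) as resp:
    cached_body = resp.read()
    last_modified = resp.headers.get("Last-Modified")

# Revalidation: only transfer the body again if it has changed since then.
req = urllib.request.Request(URL, headers={"If-Modified-Since": last_modified or ""})
try:
    with urllib.request.urlopen(req) as resp:
        cached_body = resp.read()     # 200 OK: cached copy was stale, replace it
        print("cache updated")
except urllib.error.HTTPError as err:
    if err.code == 304:               # 304 Not Modified: reuse the cached copy
        print("cache still fresh")
    else:
        raise
```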
Content Distribution Networks
Overlay network of web caches designed to deliver content to clients from the optimal location
Geographically disparate servers
Some CDNs are provided by content providers or networks/ISPs
Challenges in Running a CDN
Goal: Replicate content on many servers
How and where to replicate content?
How do clients find the replicated content?
How to choose a server replica? (server selection)
How to direct clients to a server? (content routing)
Server Selection
Which server?
Lowest load
Lowest latency (CDNs typically direct clients to the lowest-latency server)
Any “alive” server
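A minimal sketch of latency-based selection: time a TCP handshake to each candidate replica and send the client to the fastest one that is alive; the replica hostnames are made up, and a real CDN measures much more than this:

```python
import socket
import time

REPLICAS = ["cache1.example.net", "cache2.example.net", "cache3.example.net"]

def connect_time(host, port=80, timeout=2.0):
    """Time a TCP handshake to host, or return None if the replica is not reachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

measurements = {}
for host in REPLICAS:
    rtt = connect_time(host)
    if rtt is not None:               # only "alive" servers stay in the pool
        measurements[host] = rtt

best = min(measurements, key=measurements.get) if measurements else None
print("direct client to:", best)
```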
Content Routing
How to direct clients to a server?
- Routing (e.g., anycast) -> simple but coarse-grained
- Application-based (e.g., HTTP redirect) -> adds delays
- Naming-based (e.g., DNS) -> most common approach; fine-grained control and fast
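A minimal sketch of the naming-based approach: the CDN's authoritative DNS answers the same name with a different replica address depending on the client's (resolver's) region; the zone name, regions, and addresses are made up:

```python
REPLICA_BY_REGION = {
    "us-east": "203.0.113.10",
    "eu-west": "203.0.113.20",
    "apac": "203.0.113.30",
}

def resolve(name, client_region):
    """Answer an A-record query differently depending on the client's region."""
    if name != "www.cdn-example.com":
        raise KeyError("not a name this CDN is authoritative for")
    # A short TTL keeps control fine-grained: clients re-ask often, so the CDN
    # can shift them to another replica quickly.
    return {"name": name, "A": REPLICA_BY_REGION[client_region], "ttl": 30}

print(resolve("www.cdn-example.com", "eu-west"))
```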
CDNs and ISPs
have a symbiotic relationship
CDNs peer with ISPs
+ better throughput (lower latency)
+ redundancy
+ burstiness -> lower transit costs
ISPs Peer with CDNs
+ good performance for customers
+ lower transit costs
BitTorrent
Peer-to-peer content distribution
File sharing
Large file distribution
Rather than having everyone fetch the content from the origin, have clients fetch content from other peers.
We can take the original file, chop it into many pieces, and replicate different pieces on different peers in the network as soon as possible.
Each peer assembles the file by picking up different pieces and then retrieving the pieces it doesn't have from the remaining peers in the network.
By trading different pieces of the same file, everyone eventually gets the full file.
BitTorrent Publishing
- Peer creates a “torrent”: metadata describing
-> the tracker
-> the pieces of the file
- “Seeders” create the initial copy
The client first contacts the tracker, which provides the metadata above, including a list of seeders that hold an initial copy of the file.
Next, the client starts to download pieces of the file from a seeder.
Once the client accumulates some initial chunks, ideally those chunks differ from the ones held by other clients that are also trading the file.
At this point clients can begin to swap chunks.
As clients swap distinct chunks with one another, after enough swapping everyone eventually gets a copy of the complete file.
Clients that contain incomplete copies of the file are called leechers.
The tracker allows peers to find each other and it also returns a random list of peers that any particular leecher can use to swap chunks of the file.
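A minimal sketch of that exchange: a tracker handing a joining client the piece list plus a random subset of peers; the field names here are illustrative, not the real .torrent or tracker wire format:

```python
import random

SWARM = {
    "pieces": ["p0", "p1", "p2", "p3"],              # hashes of the file's pieces
    "seeders": ["198.51.100.1"],                     # peers with a complete copy
    "leechers": ["198.51.100.2", "198.51.100.3", "198.51.100.4"],
}

def tracker_announce(swarm, num_peers=2):
    """Return torrent metadata plus a random peer list for a joining client."""
    peers = swarm["seeders"] + swarm["leechers"]
    return {
        "pieces": swarm["pieces"],
        "peers": random.sample(peers, min(num_peers, len(peers))),
    }

print(tracker_announce(SWARM))
```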
Earlier p2p file-sharing systems used similar swapping techniques but faced a problem called free-riding: a client might leave the network as soon as it finished downloading the file, providing no benefit to the other clients that still want it. (BitTorrent solves this.)
Solution to Freeriding
BitTorrent's solution to free-riding is called “choking”
Choking: temporary refusal to upload chunks to another peer that is requesting them
-> if a client can't download from a peer, it doesn't upload to that peer
-> eliminates the free-rider problem
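A minimal sketch of a choking decision: keep uploading only to the peers we are currently downloading from fastest and choke the rest (real BitTorrent also periodically unchokes a random peer optimistically); the peer names and rates are made up:

```python
def choose_unchoked(upload_rate_from_peer, max_unchoked=4):
    """Unchoke the peers we download from fastest; choke everyone else."""
    ranked = sorted(upload_rate_from_peer, key=upload_rate_from_peer.get, reverse=True)
    unchoked = set(ranked[:max_unchoked])
    choked = set(ranked[max_unchoked:])
    return unchoked, choked

rates = {"peerA": 120.0, "peerB": 0.0, "peerC": 45.0, "peerD": 0.0, "peerE": 80.0}
unchoked, choked = choose_unchoked(rates, max_unchoked=3)
print("upload to:", unchoked)   # peers that reciprocate
print("choke:", choked)         # free-riders get nothing
```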
Getting Chunks to Swap
-> rarest piece first
Determine which pieces are rarest among clients -> download those first
This ensures that the most common pieces are left until the end and that a wide variety of pieces end up being downloaded from the seeder
Additionally, a brand-new client has nothing to trade, and it's important for it to get a complete piece as soon as possible.
Rare pieces are typically available at fewer peers, so downloading a rare piece first may not be a good idea.
One policy clients use is therefore to select a random first piece and download it from the seeder.
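A minimal sketch combining both policies: pick a random first piece when the client holds nothing, then switch to rarest-first; the peers and their piece sets are made up:

```python
import random
from collections import Counter

def next_piece(have, peer_piece_maps, all_pieces):
    """Return the next piece to request, or None if the download is complete."""
    missing = [p for p in all_pieces if p not in have]
    if not missing:
        return None
    if not have:                          # random first piece: get something
        return random.choice(missing)     # complete to trade as soon as possible
    # Rarest first: count how many peers hold each missing piece.
    availability = Counter()
    for pieces in peer_piece_maps.values():
        for p in pieces:
            if p in missing:
                availability[p] += 1
    return min(missing, key=lambda p: availability[p])

peers = {"peerA": {0, 1, 2}, "peerB": {0, 1}, "peerC": {0, 3}}
print(next_piece(have=set(), peer_piece_maps=peers, all_pieces=range(4)))
print(next_piece(have={0}, peer_piece_maps=peers, all_pieces=range(4)))
```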
End-game: the client actively requests missing pieces from all peers, and redundant requests are canceled when the missing piece arrives. This ensures that a single peer with a slow transfer rate doesn't prevent the download from completing.
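A minimal sketch of the end-game idea: request the last missing piece from every peer at once and drop the redundant requests when the first copy arrives; fetch_piece and the peer names are hypothetical stand-ins for real transfers:

```python
import concurrent.futures
import random
import time

def fetch_piece(peer, piece):
    """Pretend download: each peer takes a different amount of time."""
    time.sleep(random.uniform(0.1, 1.0))
    return f"{piece} received from {peer}"

peers = ["peerA", "peerB", "peerC"]
with concurrent.futures.ThreadPoolExecutor(max_workers=len(peers)) as pool:
    # Request the same missing piece from every peer at once.
    futures = [pool.submit(fetch_piece, p, "piece-41") for p in peers]
    done, redundant = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    for f in redundant:
        f.cancel()   # best-effort cancel; transfers already in flight just get ignored
    print(next(iter(done)).result())
```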