Content Distribution Flashcards
HTTP Request Line
Method (verb: GET, POST, HEAD, etc.), URL, HTTP version
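A minimal sketch of splitting a request line into its three fields. The function name parse_request_line is hypothetical, and the sketch assumes a well-formed HTTP/1.x line with single-space separators.

```python
def parse_request_line(line: str) -> dict:
    """Split an HTTP/1.x request line into method, target (URL), and version."""
    # Drop the trailing CRLF, then split on the two separating spaces.
    method, target, version = line.rstrip("\r\n").split(" ", 2)
    return {"method": method, "target": target, "version": version}


print(parse_request_line("GET /index.html HTTP/1.1\r\n"))
```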
Common HTTP headers
Referer (the header's own spelling of "referrer"), User-Agent
HTTP response status line
HTTP version, response code, reason phrase; followed by headers such as Location, Server, Allow, Content-Encoding, Content-Length, Last-Modified
HTTP 100s
Informational
HTTP 200s
Success
HTTP 300s
Redirection (resource moved)
HTTP 400s
Client error (e.g. 404 Not Found)
HTTP 500s
Server Error
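The status classes can be recovered from the first digit of the code. This is only an illustrative sketch; the names STATUS_CLASSES and status_class are made up for the example.

```python
# First digit of the status code -> class of response.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its class."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


print(status_class(404))
```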
How many TCP connections per request/response in early HTTP?
1 (a new connection for every request/response pair)
Problems with multiple short-lived TCP connections
Each connection pays for a 3-way handshake and slow start; servers accumulate connections in TIME_WAIT
Solution to the one-TCP-connection-per-request problem
Persistent Connections
What are persistent connections?
Multiple requests/responses on a single TCP connection (multiplexed); delimiters indicate the end of a request, and Content-Length gives the length of the body
TIME_WAIT
The timeout in TIME_WAIT is just an amount of time after which we can safely assume that if the other end didn’t send anything, then it’s because it received the final ACK and closed the connection.
What is pipelining in persistent connections?
The client sends a request as soon as it encounters a referenced object, without waiting for earlier responses
Where can caching occur?
Clients can cache, networks can cache, and servers can cache
CDN
Content Distribution Network
How does 304 relate to cache?
304 means Not Modified: the content has not changed, so the client should use its locally cached version.
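A rough sketch of the server-side decision behind a 304, assuming the client revalidates with an If-Modified-Since date. The function name respond is hypothetical; real servers also handle ETags and malformed dates, which this ignores.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def respond(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # No cached copy to validate: send the full response.
    resource_time = parsedate_to_datetime(last_modified)
    cached_time = parsedate_to_datetime(if_modified_since)
    # Unchanged since the client cached it: no body needs to be sent.
    return 304 if resource_time <= cached_time else 200


print(respond("Wed, 21 Oct 2015 07:28:00 GMT", "Wed, 21 Oct 2015 07:28:00 GMT"))
```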
Can DNS return nearby servers?
Yes; dig shows this information
What is a CDN?
Overlay network of caches on geographically disparate servers; delivers content to each client from the optimal location
Challenges in running a CDN
How to replicate content? Where to replicate it? How to find replicated content? How to choose a server? How to direct clients?
Goal of a CDN
Replicate content on many servers
Server Selection
Which server? Lowest Load, Lowest Latency, Any “alive” servers
Which server selection criterion is normally chosen?
Lowest Latency
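One way to picture lowest-latency selection among "alive" servers. The record fields (name, alive, latency_ms) and the function pick_server are invented for this sketch; a real CDN measures latency and liveness in its own way.

```python
def pick_server(servers: list):
    """Return the name of the alive server with the lowest latency, or None."""
    alive = [s for s in servers if s["alive"]]
    if not alive:
        return None
    return min(alive, key=lambda s: s["latency_ms"])["name"]


replicas = [
    {"name": "edge-1", "alive": True, "latency_ms": 40},
    {"name": "edge-2", "alive": False, "latency_ms": 5},
    {"name": "edge-3", "alive": True, "latency_ms": 12},
]
print(pick_server(replicas))
```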
Content routing
How to direct clients to a server
Types of Content Routing
Routing-based (e.g. anycast; coarse), application-based (e.g. HTTP redirect; adds delay), naming-based (e.g. DNS; fast)
Benefits to CDNs of peering with ISPs
Better throughput (lower latency), redundancy, burstiness (lower transit cost)
Benefits to ISPs of peering with CDNs
good performance for customers, lower transit costs
BitTorrent
Peer-to-peer file sharing for large file distribution; content is fetched from peers, with the file split into pieces among them
Seeders
Have a full copy of the file “Create the initial copies”
Leechers
Have only a partial copy of the file
Freeloading
Disconnecting as soon as the file has finished downloading, without uploading to other peers
BitTorrent solution for freeloading
Choking
Choking
Temporary refusal to upload chunks to a peer. If you can’t download from a peer, don’t upload to it. Eliminates the freeloading problem
How does BitTorrent ensure chunks get swapped?
Rarest piece first
What is rarest piece first?
Determine which pieces are rarest among peers and download those first
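A small sketch of rarest-piece-first ordering, assuming each peer advertises the set of piece indices it holds. The function name rarest_first and the tie-break by piece index are choices made for this example, not part of the protocol description above.

```python
from collections import Counter


def rarest_first(have: set, peer_pieces: dict) -> list:
    """Order the pieces we still need by how few peers hold them."""
    # Count how many peers hold each piece.
    counts = Counter(p for pieces in peer_pieces.values() for p in pieces)
    wanted = [p for p in counts if p not in have]
    # Fewest holders first; ties broken by piece index (arbitrary choice).
    return sorted(wanted, key=lambda p: (counts[p], p))


peers = {"a": {0, 1, 2}, "b": {1, 2}, "c": {2, 3}}
print(rarest_first({1}, peers))
```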
Since the rarest piece may not be available initially, what is used?
Random piece first
What is BitTorrent's end game?
Actively request missing pieces from all peers
A distributed hash table is a form of?
Structured content overlay
Types of distributed hash tables
Chord (built on consistent hashing)
Chord
Scalable distributed “lookup service” mapping keys to values
Benefits of Chord
Scalability, provable correctness, good performance
Chord Motivation
Scalable location of data in a large distributed system
What is the key problem of Chord?
key lookup
Main idea of Consistent Hashing
Keys and nodes map to same ID space
Two options when implementing consistent hashing
Option 1: every node knows the location of every other node; lookups O(1), tables O(N). Option 2: each node knows only its successor; tables O(1), lookups O(N)
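A sketch of the shared ID space: nodes and keys hash onto the same ring, and a key belongs to the first node at or after its ID. The ring size M, the node names, and the helpers ring_id and successor are assumptions for illustration. The lookup below has a global view of all nodes, so it corresponds to Option 1.

```python
import bisect
import hashlib

M = 16  # Bits in the ID space (assumed small here for readability).


def ring_id(name: str) -> int:
    """Hash a node name or key onto the ring [0, 2**M)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids: list, key_id: int) -> int:
    """First node ID >= key_id in a sorted list, wrapping around the ring."""
    i = bisect.bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]


nodes = sorted(ring_id(n) for n in ["node-a", "node-b", "node-c"])
owner = successor(nodes, ring_id("some-key"))
print(owner in nodes)
```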
Chord's solution to the key lookup problem
Finger tables: each node keeps O(log N) entries, giving O(log N) lookups (see the Chord paper)
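A sketch of how one node's finger table could be built, assuming an m-bit ring where entry i points to the successor of (n + 2^i) mod 2^m. The function name finger_table is hypothetical, and the sketch uses a global list of node IDs, which a real Chord node does not have; consult the paper for the actual protocol.

```python
import bisect


def finger_table(n: int, node_ids: list, m: int) -> list:
    """Entry i is the first node at or after (n + 2**i) mod 2**m."""
    ring = sorted(node_ids)
    table = []
    for i in range(m):
        start = (n + 2 ** i) % (2 ** m)
        j = bisect.bisect_left(ring, start)
        table.append(ring[j % len(ring)])  # Wrap past the highest ID.
    return table


# 3-bit ring with nodes 0, 1, and 3.
print(finger_table(0, [0, 1, 3], 3))
```

Each entry covers a span twice as large as the previous one, which is why a lookup can roughly halve the remaining distance at every hop.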