8 - Content Distribution Flashcards

1
Q

Contents of an HTTP Request

A

Request line:
Method -> GET, POST, HEAD
URL -> /index.html
Version Number

Additional Headers:
Referer -> URL of the page that caused this page to be requested (HTTP spells the header name "Referer")
User-Agent -> client software
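
To make the layout above concrete, here is a minimal sketch that assembles the text of a request. The function name build_request, the host www.example.com and the user-agent string are illustrative assumptions. The Host header is not listed on this card but is included because HTTP/1.1 requires it.

```python
def build_request(method, path, host, referer=None, user_agent="example-client/0.1"):
    """Return the text of a minimal HTTP/1.1 request (illustrative sketch)."""
    lines = [
        f"{method} {path} HTTP/1.1",   # request line: method, URL, version
        f"Host: {host}",               # required by HTTP/1.1 (assumption: not on the card)
        f"User-Agent: {user_agent}",   # client software
    ]
    if referer is not None:
        lines.append(f"Referer: {referer}")  # page that caused this request
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request("GET", "/index.html", "www.example.com",
                        referer="http://www.example.com/")
print(request)
```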

2
Q

Contents of an HTTP Response

A

HTTP Version

Response Code 
   100s - informational
   200s - success
      -> 200 OK
   300s - redirect
      -> 301 moved permanently
   400s - client errors
      -> 404 Not Found
   500s - server errors

Location

Server

Allow

Content-encoding

Content-length

Expires

Last-modified
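
As a rough illustration of the status line and the response-code classes listed above, the sketch below parses a status line and maps the code to its class. The helper names parse_status_line and status_class are hypothetical.

```python
def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code):
    """Map a numeric response code to the class given by its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


version, code, reason = parse_status_line("HTTP/1.1 301 Moved Permanently")
print(version, code, reason, "->", status_class(code))
```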

3
Q

Early HTTP

A
One Request/response per TCP connection
   + Simple to implement
   - New TCP connection for every request, each one paying for:
      3-way handshake
      slow start
      servers left in TIME_WAIT

Solution: Persistent Connections

4
Q

Persistent Connections

A

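Multiple requests/responses on a single TCP connection
Delimiters indicate the end of each request
Content-Length indicates the length of each response

+ “Pipelining”: the client sends a request as soon as it encounters a referenced object, without waiting for earlier responses
Persistent connections are the default in HTTP/1.1, which also permits pipelining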
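
A minimal sketch of why Content-Length matters on a persistent connection: when several responses arrive back to back on one byte stream, the receiver uses the header to find where each body ends. The function name split_responses is hypothetical, and the sketch assumes every response carries a Content-Length header (it ignores chunked encoding).

```python
def split_responses(stream):
    """Split back-to-back HTTP responses in one byte stream into their bodies."""
    bodies = []
    pos = 0
    while pos < len(stream):
        # The header section ends at the first blank line.
        header_end = stream.index(b"\r\n\r\n", pos)
        header_lines = stream[pos:header_end].decode("ascii").split("\r\n")
        length = 0
        for line in header_lines[1:]:          # skip the status line
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        body_start = header_end + 4
        bodies.append(stream[body_start:body_start + length])
        pos = body_start + length              # next response starts here
    return bodies


stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
          b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye")
print(split_responses(stream))
```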

5
Q

Caching

A

Clients can cache documents in the browser (local) or in the network

Ways to direct a client to a cache:
  1. Browser configuration: the browser is explicitly configured to point to a local cache
  2. Server-directed: the origin server directs the client to a cache
6
Q

Benefits of caching

A

Reduced transit costs for local ISP

Improved performance for local clients

7
Q

Content Distribution Networks

A

Overlay network of web caches designed to deliver content to a client from the optimal location

Geographically dispersed servers

Some CDNs are run by content providers, others by networks/ISPs

8
Q

Challenges in Running a CDN

A

Goal: Replicate content on many servers

How to replicate content?
Where to replicate it?
How do clients find the replicated content?
How to choose a server replica? (server selection)
How to direct clients to it? (content routing)
9
Q

Server Selection

A

Which server?
Lowest load
Lowest latency (CDNs typically direct clients to servers with this)
Any “alive” server
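
The criteria above can be sketched as a small selection function: discard servers that are not alive, then prefer the lowest latency and break ties on load. The record fields (name, alive, latency_ms, load) and the replica names are made up for illustration. A real CDN measures these values in its own way.

```python
def select_server(servers):
    """Pick the alive replica with the lowest latency, breaking ties on load."""
    alive = [s for s in servers if s["alive"]]
    if not alive:
        return None
    best = min(alive, key=lambda s: (s["latency_ms"], s["load"]))
    return best["name"]


replicas = [
    {"name": "replica-east", "alive": True,  "latency_ms": 40, "load": 0.7},
    {"name": "replica-west", "alive": True,  "latency_ms": 15, "load": 0.9},
    {"name": "replica-eu",   "alive": False, "latency_ms": 5,  "load": 0.1},
]
print(select_server(replicas))
```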

10
Q

Content Routing

A

How to direct clients to a server?

  1. Routing-based (e.g., anycast) - simple but coarse
  2. Application-based (e.g., HTTP redirect) - adds delay
  3. Naming-based (e.g., DNS) -> most common way; gives fine-grained control and is fast
11
Q

CDNs and ISPs

A

CDNs and ISPs have a symbiotic relationship

CDNs peer with ISPs
+ better throughput
   (lower latency)
+ redundancy
+ burstiness: bursty traffic can be spread across ISPs -> lower transit costs

ISPs peer with CDNs
+ good performance for customers
+ lower transit costs

12
Q

BitTorrent

A

Peer-to-peer content distribution
File sharing
Large file distribution

Rather than having everyone fetch the content from the origin, have them fetch it from other peers.

The original file is chopped into many pieces, and different pieces are replicated on different peers in the network as soon as possible.

Each peer assembles the file by picking up different pieces and retrieving the pieces it doesn’t have from the other peers in the network.

By trading different pieces of the same file, everyone eventually gets the full file.
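
The idea of chopping a file into pieces and reassembling it from what different peers hold can be sketched as follows. This is a toy model with in-memory byte strings and no networking. The names split_into_pieces, assemble, peer_a and peer_b are hypothetical.

```python
def split_into_pieces(data, piece_size):
    """Chop a byte string into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def assemble(num_pieces, *peer_stores):
    """Rebuild the file from the pieces held by several peers, if all are present."""
    collected = {}
    for store in peer_stores:
        collected.update(store)          # index -> piece
    if any(i not in collected for i in range(num_pieces)):
        return None                      # some piece is still missing
    return b"".join(collected[i] for i in range(num_pieces))


original = b"peer-to-peer content distribution"
pieces = split_into_pieces(original, 8)

# Two hypothetical peers, each holding a different subset of the pieces.
peer_a = {i: p for i, p in enumerate(pieces) if i % 2 == 0}
peer_b = {i: p for i, p in enumerate(pieces) if i % 2 == 1}

print(assemble(len(pieces), peer_a))           # incomplete on its own
print(assemble(len(pieces), peer_a, peer_b))   # complete after trading
```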

13
Q

BitTorrent Publishing

A
  1. Peer creates a “torrent” (metadata):
    -> tracker
    -> pieces of file
  2. “Seeders” hold the initial copy

The client first contacts the tracker, which provides the above metadata about the file, including a list of seeders that hold an initial copy of the file.

Next, the client starts downloading parts of the file from a seeder.

Once the client has accumulated some initial chunks, those chunks are hopefully different from the ones held by other clients that are also trading the file.

At this point clients can begin to swap chunks.

As clients swap distinct chunks with one another, everyone eventually gets a copy of the complete file.

Clients that hold incomplete copies of the file are called leechers.

The tracker lets peers find each other: it returns a random list of peers that a leecher can use to swap chunks of the file.

Earlier P2P file-sharing systems used similar swapping techniques but suffered from freeriding (free-loading), whereby a client leaves the network as soon as it finishes downloading the file, providing no benefit to other clients that also want it. BitTorrent addresses this problem.

14
Q

Solution to Freeriding

A

BitTorrent’s solution to freeriding is called “choking”

Choking: temporary refusal to upload chunks to another peer that is requesting them

  -> if a client can’t download from a peer, it doesn’t upload to that peer
  -> eliminates the freerider problem
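
One way to picture choking is a client that only keeps upload slots open for the peers it is currently downloading from fastest. The sketch below is a simplified assumption, not the full protocol: the function name choose_unchoked, the slot count and the rate values are hypothetical, and real clients also periodically unchoke a random peer, which is omitted here.

```python
def choose_unchoked(download_rates, slots=4):
    """Return the set of peers we keep uploading to; everyone else is choked.

    download_rates maps a peer name to the rate at which we currently
    download from that peer. Peers that give us nothing are never unchoked.
    """
    contributors = {p: r for p, r in download_rates.items() if r > 0}
    # Highest rate first; the name is a deterministic tie-breaker.
    ranked = sorted(contributors, key=lambda p: (-contributors[p], p))
    return set(ranked[:slots])


rates = {"peer_a": 50, "peer_b": 0, "peer_c": 20, "peer_d": 70}
print(choose_unchoked(rates, slots=2))
```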
15
Q

Getting Chunks to Swap

A

-> Rarest piece first

Determine which pieces are rarest among clients and download those first.

This ensures that the most common pieces are left until the end of the download and that a large variety of pieces is downloaded from the seeder.

Exception at the start: a new client has nothing to trade, so it is important for it to get a complete piece as soon as possible.

Rare pieces are available at fewer peers, so starting with a rare piece may be slow and is not a good idea.

One policy clients use at this stage is to select a random piece of the file and download it from the seeder.

End-game: the client actively requests its missing pieces from all peers, and redundant requests are cancelled when a missing piece arrives. This ensures that a single peer with a slow transfer rate doesn’t prevent the download from completing.
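
The rarest-first rule can be sketched by counting how many peers hold each piece and choosing the least-replicated piece the client still lacks. The function name rarest_first and the peer data are hypothetical. The sketch covers only the steady-state rule, not the random first piece or the end-game.

```python
from collections import Counter


def rarest_first(have, peer_pieces):
    """Return the missing piece index held by the fewest peers, or None."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)            # how many peers hold each piece
    candidates = [p for p in counts if p not in have]
    if not candidates:
        return None
    # Fewest holders first; the piece index breaks ties deterministically.
    return min(candidates, key=lambda p: (counts[p], p))


peers = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0, 3}}
print(rarest_first(have={3}, peer_pieces=peers))
```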

16
Q

Distributed Hash Tables

A

Chord: scalable, distributed “lookup service”, maps keys to values (eg, DNS, directories)

+ scalability
+ provable correctness
+ performance

17
Q

Chord: Motivation

A

Scalable location of data in a large distributed system

A publisher might want to publish a particular piece of data, such as an MP4 named “Annie Hall” (key=“Annie Hall”, value=MP4), so that when a client performs a lookup of “Annie Hall” it is directed to the location that hosts the data.

Key problem to solve is LOOKUP using a hash table.

The hash table is distributed across the network

We are trying to build a distributed hash table (DHT)

18
Q

Consistent Hashing

A

Main Idea: Keys and nodes map to same ID space

Hash function assigns IDs

Node: hash(IP)
Key: hash(key)

Chord: key stored at successor

Consistent hashing
+ Load balance
+ Flexibility
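
A minimal sketch of the mapping described above: node addresses and keys are hashed into the same ID space, and a key is stored at its successor, the first node whose ID is at or after the key's ID, wrapping around the ring. The 16-bit ID space, the use of SHA-1 and the example IP addresses are assumptions for illustration.

```python
import hashlib
from bisect import bisect_left

M = 16  # number of bits in the ID space (assumption for this sketch)


def chord_id(name):
    """Hash a node address or a key into the shared ID space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids, key_id):
    """First node ID at or after key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    return ring[bisect_left(ring, key_id) % len(ring)]


nodes = [chord_id(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
key = chord_id("Annie Hall")
print("key", key, "is stored at node", successor(nodes, key))
```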

19
Q

Options for Implementing Consistent Hashing

A

Option: Every node knows location of every other node
Lookups: O(1) +
Tables: O(N) -

Option: Node knows only successor
Lookups: O(N) -
Tables: O(1) +

Solution for best of both worlds: Finger Table

20
Q

Finger Tables

A

Every node knows m other nodes in the ring (m = number of bits in the ID space)

The distance to the nodes it knows increases exponentially: finger i of node n points to the successor of (n + 2^i) mod 2^m

Lookups: O(log N)
Table: O(log N)

results in efficient lookups
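
The sketch below builds a finger table and runs a lookup that forwards each query to the farthest finger that still precedes the key. It assumes a global sorted list of node IDs, which a real node does not have, and uses it only to simulate the ring. The ring and the 6-bit ID space are an illustrative example.

```python
from bisect import bisect_left


def successor(ring, ident):
    """First node ID at or after ident on the sorted ring, wrapping around."""
    return ring[bisect_left(ring, ident) % len(ring)]


def finger_table(node, ring, m):
    """Finger i is the successor of (node + 2**i) mod 2**m."""
    return [successor(ring, (node + 2 ** i) % 2 ** m) for i in range(m)]


def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise on the ring."""
    return a < x < b if a < b else (x > a or x < b)


def lookup(start, key, ring, m):
    """Return (node responsible for key, number of forwarding hops)."""
    node, hops = start, 0
    while True:
        succ = successor(ring, (node + 1) % 2 ** m)
        # If key falls in (node, succ], succ is responsible for it.
        if key == succ or in_open(key, node, succ):
            return succ, hops
        # Otherwise forward to the farthest finger that still precedes the key.
        nxt = succ
        for f in reversed(finger_table(node, ring, m)):
            if in_open(f, node, key):
                nxt = f
                break
        node, hops = nxt, hops + 1


ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(finger_table(8, ring, 6))
print(lookup(8, 54, ring, 6))
```

In this example the query for key 54 starting at node 8 is forwarded to node 42, then to node 51, whose successor 56 holds the key. Each hop covers at least half of the remaining distance around the ring.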

What happens when nodes join and leave the network?

When a new node joins: first initialize the fingers of the new node, then update the fingers of existing nodes so that they point to the new node where appropriate, and finally transfer the keys the new node is now responsible for from its successor to the new node.