Web & HTTP Flashcards

1
Q

Web and HTTP

A
  • A web page consists of objects
  • An object can be an HTML file, a JPEG image, a Java applet, an audio file, …
  • A web page consists of a base HTML file that includes several referenced objects
  • Each object is addressable by a URL, e.g.,
2
Q

HTTP overview
HyperText Transfer Protocol

A
  • The Web’s application-layer protocol
  • Client/server model
    ¤ client: browser that requests, receives (using HTTP), and “displays” Web objects
    ¤ server: Web server sends objects (using HTTP) in response to requests
3
Q

HTTP overview (continued)
(Underlying protocol, HTTP state, Protocol state)

A
  • Underlying protocol: TCP
    • Client initiates TCP connection (creates socket) to server, port 80
    • Server accepts TCP connection from client
    • HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
    • TCP connection closed
  • HTTP is “stateless”
    • Server maintains no information about past client requests
  • Protocols that maintain “state” are complex!
    • Past history (state) must be maintained
    • If the server or client crashes, their views of “state” may be inconsistent and must be reconciled
4
Q

HTTP connections

A

a. non-persistent HTTP
- At most one object sent over TCP connection
- Connection then closed
- Downloading multiple objects required multiple connections

b. persistent HTTP
- Multiple objects can be sent over a single TCP connection between client and server.
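A minimal sketch of how many TCP connections one page needs under each scheme, assuming every object lives on the same server and requests are made serially (no parallel connections). The function name `tcp_connections_needed` is hypothetical:

```python
def tcp_connections_needed(num_referenced_objects: int, persistent: bool) -> int:
    """Count TCP connections to fetch a base HTML file plus its referenced objects.

    Assumes every object is on one server and requests are made serially.
    """
    total_objects = 1 + num_referenced_objects  # base HTML file + referenced objects
    if persistent:
        return 1  # every object reuses the one open connection
    return total_objects  # non-persistent: at most one object per connection


# Hypothetical page: one base HTML file referencing 10 JPEG images.
print(tcp_connections_needed(10, persistent=False))  # 11
print(tcp_connections_needed(10, persistent=True))   # 1
```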

5
Q

A web page can consist of many different files

A
6
Q

Median number of objects embedded in a webpage

A
7
Q

Non-Persistent HTTP

A

The “classic” approach in HTTP/1.0 is to use one HTTP request per TCP connection, serially.

8
Q

Non-persistent HTTP: response time

A
  • RTT: time for a packet to travel from client to server and back
  • HTTP response time:
    • one RTT to initiate TCP connection
    • one RTT for HTTP request and first few bytes of HTTP response to return
      • This assumes the HTTP GET is piggybacked on the final ACK of the TCP handshake
    • File transmission time
    • Non-persistent HTTP response time =
      2RTT + file transmission time
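The formula above can be checked with a short calculation. This is a sketch under hypothetical numbers (50 ms RTT, an 800,000-bit file, an 8 Mbps link); the function name `nonpersistent_response_time` is illustrative only:

```python
def nonpersistent_response_time(rtt_s: float, object_bits: float, link_bps: float) -> float:
    """Response time for one object over a fresh (non-persistent) TCP connection."""
    connection_setup_s = rtt_s               # 1 RTT: TCP connection setup
    request_s = rtt_s                        # 1 RTT: HTTP request + first bytes of response
    transmission_s = object_bits / link_bps  # time to push the whole file onto the link
    return connection_setup_s + request_s + transmission_s


# Hypothetical numbers: 50 ms RTT, 800,000-bit file, 8 Mbps link.
t = nonpersistent_response_time(0.05, 800_000, 8_000_000)
print(f"{t:.3f} s")  # 0.200 s
```

With these numbers the two RTTs (0.1 s) cost as much as transmitting the file itself, which is why connection setup dominates for small objects.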
9
Q

Persistent HTTP

A

non-persistent HTTP issues:
- requires 2 RTTs per object
- OS overhead for each TCP connection
- browsers often open parallel TCP connections to fetch referenced objects

persistent HTTP:
- server leaves connection open after sending response
- subsequent HTTP messages between same client/server sent over open connection
- client sends requests as soon as it encounters a referenced object
- as little as one RTT for all the referenced objects
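A rough comparison of the three approaches, counting only RTTs for a base HTML file plus N small referenced objects. It assumes transmission time is negligible and that each scheme uses a single connection (no parallel connections); `page_load_rtts` is a hypothetical helper:

```python
def page_load_rtts(num_referenced: int) -> dict:
    """RTTs to fetch a base HTML file plus `num_referenced` small objects.

    Ignores transmission time and assumes a single connection per scheme
    (no parallel connections).
    """
    base = 2  # 1 RTT TCP setup + 1 RTT to request and receive the base HTML file
    return {
        # non-persistent: every object costs its own 2 RTTs
        "non-persistent": 2 * (1 + num_referenced),
        # persistent, requests sent one at a time: 1 RTT per referenced object
        "persistent, no pipelining": base + num_referenced,
        # persistent, requests sent back to back: 1 RTT for all referenced objects
        "persistent, pipelined": base + (1 if num_referenced > 0 else 0),
    }


for scheme, rtts in page_load_rtts(10).items():
    print(f"{scheme}: {rtts} RTTs")
```

For 10 referenced objects this gives 22, 12, and 3 RTTs respectively.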

10
Q

Persistent HTTP 2

A
  • The “persistent HTTP” approach can re-use the same TCP connection for multiple HTTP transfers, one after another, serially.
  • Amortizes TCP overhead, but maintains TCP state longer at server
  • The “pipelining” feature in HTTP/1.1 allows requests to be issued asynchronously on a persistent connection. Requests must be processed, and responses returned, in the order received. Several requests (or responses) can be packed into the same TCP segments
11
Q

HTTP request message

A
  • Two types of HTTP messages: request, response
  • HTTP request message:
    • ASCII (human-readable format)
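Because the request message is plain ASCII text, it can be assembled as a string. A minimal sketch of a GET request (request line, header lines, blank line); the host, path, header values, and the name `build_get_request` are hypothetical:

```python
from typing import Dict, Optional


def build_get_request(host: str, path: str,
                      headers: Optional[Dict[str, str]] = None) -> bytes:
    """Assemble an HTTP/1.1 GET request message as ASCII bytes."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]  # request line, then header lines
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF; one extra CRLF (an empty line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


msg = build_get_request("www.example.com", "/index.html",
                        {"Connection": "keep-alive", "Accept-Language": "fr"})
print(msg.decode("ascii"))
```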
12
Q

HTTP request message: general format

A
13
Q

Uploading form input

A

POST method:
- web page often includes form input
- input is uploaded to server in entity body

URL method:
- uses GET method
- input is uploaded in URL field of request line:
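Both upload styles can be sketched with the standard library's `urllib.parse.urlencode`. The form fields, the `/animalsearch` path, and the host are hypothetical; the point is that the same encoded input lands in the request line for GET and in the entity body for POST:

```python
from urllib.parse import urlencode

# Hypothetical form fields and path; urlencode percent-encodes and joins them.
form = {"animal": "monkeys", "fruit": "banana"}
query = urlencode(form)  # "animal=monkeys&fruit=banana"

# URL method: the input rides in the request line of a GET.
get_request_line = f"GET /animalsearch?{query} HTTP/1.1"

# POST method: the same input goes in the entity body after the blank line.
body = query.encode("ascii")
post_request = (
    "POST /animalsearch HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii") + body

print(get_request_line)
print(post_request.decode("ascii"))
```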

14
Q

HTTP response message

A
15
Q

HTTP response status codes

A
  • The status code appears in the first line of the server-to-client response message.
  • Some sample codes:
    200 OK
    ¤ request succeeded, requested object later in this msg
    301 Moved Permanently
    ¤ requested object moved, new location specified later in this msg (Location:)
    400 Bad Request
    ¤ request msg not understood by server
    404 Not Found
    ¤ requested document not found on this server
    505 HTTP Version Not Supported
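A small sketch of reading a status line and classifying its code by first digit. It uses Python's `http.HTTPStatus` for the standard reason phrases; `parse_status_line` and `status_class` are hypothetical helper names:

```python
from http import HTTPStatus


def parse_status_line(line: str):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its three parts."""
    version, code, reason = line.split(" ", 2)  # reason phrase may contain spaces
    return version, int(code), reason


def status_class(code: int) -> str:
    """Name the class a status code belongs to, from its first digit."""
    return {1: "informational", 2: "success", 3: "redirection",
            4: "client error", 5: "server error"}[code // 100]


for code in (200, 301, 400, 404, 505):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```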
16
Q

User-server state: cookies

A
  • Cookie == a small file with data (up to 4KB)
    ¤ Sent to the Web browser by the Web server
    ¤ Saved locally inside the browser
    ¤ Sent back by the browser in all subsequent requests
  • Example:
    ¤ Susan always accesses the Internet from her PC
    ¤ She visits a specific e-commerce site for the first time
    ¤ When the initial HTTP request arrives at the site, the site creates:
      • a unique ID
      • an entry in its back-end database for that ID
17
Q

Four components of cookies

A

1) Cookie header line of HTTP response message
2) Cookie header line in next HTTP request message
3) Cookie file kept on user’s host, managed by user’s browser
4) Back-end database at Web site
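The four components can be traced in one toy exchange. This sketch uses Python's `http.cookies.SimpleCookie` to format and parse the cookie header lines; the starting ID 1678, the site name, the `visits` field, and `handle_request` are all hypothetical:

```python
import itertools
from http.cookies import SimpleCookie

backend_db = {}                   # (4) back-end database at the Web site
_ids = itertools.count(1678)      # hypothetical ID generator


def handle_request(cookie_header: str = ""):
    """Server side: return (Set-Cookie header or None, user ID)."""
    parsed = SimpleCookie(cookie_header)   # (2) Cookie line in the request
    if "id" in parsed and parsed["id"].value in backend_db:
        user_id = parsed["id"].value
        backend_db[user_id]["visits"] += 1  # known user: update stored state
        return None, user_id
    user_id = str(next(_ids))               # first visit: create ID + DB entry
    backend_db[user_id] = {"visits": 1}
    cookie = SimpleCookie()
    cookie["id"] = user_id
    return cookie.output(), user_id          # (1) Set-Cookie line in the response


cookie_jar = {}  # (3) browser-managed cookie store on the user's host

set_cookie, uid = handle_request()                             # no cookie yet
cookie_jar["shop.example.com"] = set_cookie.split(": ", 1)[1]  # stores "id=1678"
_, uid_again = handle_request(cookie_jar["shop.example.com"])  # sent back later
print(set_cookie, uid_again, backend_db[uid_again])
```

HTTP itself stays stateless here: the server recognizes Susan only because the browser sends the ID back and the back-end database maps it to stored state.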

18
Q

Cookies: keeping “state” (cont.)

A
19
Q

Cookies (continued)

A
  • What cookies can be used for:
    • Authorization
    • Shopping carts
    • Recommendations
    • User session state (Web e-mail)
  • How to keep “state”
    • Protocol endpoints: maintain state at sender/receiver over multiple transactions
    • Cookies: HTTP messages carry state
  • Cookies and privacy:
    • Cookies permit sites to learn a lot about you
    • You may supply name and e-mail to sites
20
Q

Cookies + Third Parties

A
21
Q

How it works

A
22
Q

50% of the websites are using more than 100 3rd party cookies

A
23
Q

What can we do about it?

A
  • Ad-blocking products block cookies and connections to third-party sites
    • Ghostery, Ad Block, etc.
  • Doesn’t completely solve the problem…
    • Trackers are getting smarter: they use browser features to fingerprint users
    • E.g., the combination of installed extensions, fonts, etc.
      • Surprisingly unique!
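To see why fingerprinting needs no cookie, here is a minimal sketch that hashes a canonical form of some browser attributes into a stable ID. The attribute names and values are hypothetical, and real trackers combine many more signals; `fingerprint` is an illustrative name:

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Derive a stable ID from browser attributes without storing any cookie."""
    # Sort list values and keys so the same browser always hashes the same way.
    canonical = json.dumps(
        {k: sorted(v) if isinstance(v, list) else v for k, v in attributes.items()},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values as a tracking script might collect them.
browser_a = {"fonts": ["Arial", "Fira Code", "Noto Sans"],
             "extensions": ["ext-1", "ext-7"], "timezone": "Europe/London"}
browser_b = dict(browser_a, fonts=["Arial", "Noto Sans"])  # one font fewer

print(fingerprint(browser_a))
print(fingerprint(browser_b))  # different ID
```

Blocking third-party cookies does not affect this: the ID is recomputed from the browser itself on every visit.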
24
Q

Evolution of Serving Web Content

A

  • In the beginning…
    ¤ …there was a single server
    ¤ Probably located in a closet
    ¤ And it probably served blinking text
  • Issues with this model
    ¤ Site reliability
      • Unplugging cable, hardware failure, natural disaster
    ¤ Scalability
      • Flash crowds (aka Slashdotting)

25
Q

Replicated Web service

A
  • Load balancer: a device that multiplexes requests across a collection of servers
    ¤ All servers share one public IP
    ¤ The balancer transparently directs requests to different servers
  • How should the balancer assign clients to servers?
    ¤ Random / round-robin
      • When is this a good idea?
    ¤ Load-based
      • When might this fail?
  • Challenges
    ¤ Scalability (must support traffic for n hosts)
    ¤ State (must keep track of previous decisions)
      • RESTful APIs reduce this limitation
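A minimal sketch of the two assignment policies plus the per-client state a balancer may have to keep. Server names, client IPs, and class names are hypothetical; a real balancer also handles health checks and failures, which this ignores:

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Hand requests to servers in a fixed rotation, ignoring their load."""

    def __init__(self, servers: List[str]):
        self._rotation = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._rotation)


class LeastLoadedBalancer:
    """Send each request to the server with the fewest requests in progress."""

    def __init__(self, servers: List[str]):
        self.active: Dict[str, int] = {s: 0 for s in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)  # ties go to the earliest server
        self.active[server] += 1
        return server

    def finished(self, server: str) -> None:
        self.active[server] -= 1


class StickyBalancer:
    """Remember each client's server: the per-client state the balancer must keep."""

    def __init__(self, inner):
        self.inner = inner
        self.assignments: Dict[str, str] = {}

    def pick(self, client_ip: str) -> str:
        if client_ip not in self.assignments:
            self.assignments[client_ip] = self.inner.pick()
        return self.assignments[client_ip]


rr = RoundRobinBalancer(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']
```

Round-robin works well when requests cost about the same; the sticky table is exactly the "previous decisions" state that grows with the number of clients, and stateless (RESTful) requests let the balancer drop it.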

26
Q

Load balancing: Are we done?

A

  • Advantages
    ¤ Allows scaling of hardware independent of IPs
    ¤ Relatively easy to maintain
  • Disadvantages
    ¤ Expensive
    ¤ Still a single point of failure
    ¤ Location!

27
Q

Popping up: HTTP performance

A

  • For Web pages
    ¤ RTT matters most
    ¤ Where should the server go?
  • For video
    ¤ Available bandwidth matters most
    ¤ Where should the server go?
  • Is there one location that is best for everyone?
  • Impact on user experience
    ¤ Users navigating away from pages
    ¤ Video startup delay
  • Impact on revenue
    ¤ Amazon: increased revenue 1% for every 100ms reduction in page load time (PLT)
    ¤ Shopzilla: 12% increase in revenue by reducing PLT from 6 seconds to 1.2 seconds
  • Ping from LON to NYC: ~100ms

28
Q

Web caches (proxy server)

A
  • Goal: satisfy client request without involving origin server
  • User sets browser: Web accesses via cache
  • Browser sends all HTTP requests to cache
    ¤ Object in cache: cache returns object
    ¤ Else cache requests object from origin server, then returns object to client
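The hit/miss logic above can be sketched as a dictionary-backed proxy. The origin server is replaced by a hypothetical stand-in function so the example runs offline, and stored objects never expire here (freshness checks are left out):

```python
from typing import Callable, Dict


class WebCache:
    """Proxy cache: serve stored objects, otherwise fetch from the origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._store: Dict[str, bytes] = {}
        self._fetch = fetch_from_origin  # the cache acts as a client of the origin
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:  # the cache acts as a server to the browser
        if url in self._store:
            self.hits += 1
            return self._store[url]
        self.misses += 1
        obj = self._fetch(url)
        self._store[url] = obj
        return obj


origin_calls = []


def fake_origin(url: str) -> bytes:
    """Hypothetical stand-in for a real origin server."""
    origin_calls.append(url)
    return f"<contents of {url}>".encode()


cache = WebCache(fake_origin)
cache.get("http://www.example.com/logo.gif")  # miss: fetched from origin
cache.get("http://www.example.com/logo.gif")  # hit: origin not contacted
print(cache.hits, cache.misses, len(origin_calls))  # 1 1 1
```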
29
Q

More about Web caching

A
  • Cache acts as both client and server
    ¤ Server for original requesting client
    ¤ Client to origin server
  • Typically the cache is installed by an ISP (university, company, residential ISP)
  • Why Web caching?
    ¤ Reduce response time for client request
    ¤ Reduce traffic on an institution’s access link
    ¤ The Internet is dense with caches: this enables “poor” content providers to deliver content effectively (so does P2P file sharing)
30
Q

Caching example

A

Scenario:
- access link rate: 1.54 Mbps
- RTT from institutional router to server: 2 sec
- web object size: 100K bits
- average request rate from browsers to origin servers: 15/sec
- avg data rate to browsers: 1.50 Mbps

Performance:
- access link utilization = .97
- LAN utilization: .0015
- end-end delay = Internet delay + access link delay + LAN delay = 2 sec + minutes + usecs
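The utilization figures follow from traffic divided by link capacity. A worked sketch of that arithmetic; the 1 Gbps LAN rate is an assumption, since the scenario lists only the resulting LAN utilization:

```python
# Scenario values from the card; the 1 Gbps LAN rate is an assumption
# (the card gives only the resulting LAN utilization).
access_link_bps = 1.54e6
object_bits = 100e3
requests_per_s = 15
lan_bps = 1e9  # assumed

traffic_bps = requests_per_s * object_bits  # 1.5 Mbps of objects flowing to browsers
access_link_util = traffic_bps / access_link_bps
lan_util = traffic_bps / lan_bps

print(f"access link utilization = {access_link_util:.2f}")  # 0.97
print(f"LAN utilization = {lan_util:.4f}")                 # 0.0015
```

With the access link almost fully loaded, queueing delay at that link grows very large, which is where the "minutes" term in the end-end delay comes from.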

31
Q

Option 1: buy a faster access link

A
32
Q

Option 2: install a web cache

A
33
Q

Calculating access link utilization, end-end delay with cache

A
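A sketch of the same calculation once a cache is installed. The 0.4 hit rate is a hypothetical value, and delays for cache hits and for the now lightly loaded access link and LAN are assumed to be only milliseconds, so they are treated as zero:

```python
# Same scenario, now with a cache whose hit rate (0.4) is a hypothetical value.
hit_rate = 0.4
access_link_bps = 1.54e6
traffic_bps = 15 * 100e3      # 15 requests/s of 100 Kb objects
internet_delay_s = 2.0        # RTT from institutional router to origin server

miss_traffic_bps = (1 - hit_rate) * traffic_bps   # only misses cross the access link
access_link_util = miss_traffic_bps / access_link_bps

# Assumes hits and access-link/LAN delays (at this utilization) cost only milliseconds,
# so they are treated as zero here.
avg_delay_s = (1 - hit_rate) * internet_delay_s + hit_rate * 0.0

print(f"access link utilization = {access_link_util:.2f}")  # 0.58
print(f"average end-end delay ≈ {avg_delay_s:.1f} s")       # 1.2
```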
34
Q

Conditional GET

A
  • Goal: don’t send object if cache has up-to-date cached version
    ¤ No object transmission delay
    ¤ Lower link utilization
  • Cache: specify date of cached copy in HTTP request
    ¤ If-modified-since: <date>
  • Server: response contains no object if cached copy is up-to-date:
    ¤ HTTP/1.0 304 Not Modified
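A minimal sketch of the server's decision, using the standard library's `email.utils` helpers to parse and format HTTP-style dates. The object's timestamp and the function name `respond` are hypothetical, and the header is assumed to carry a GMT date as HTTP requires:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def respond(last_modified: datetime, body: bytes,
            if_modified_since: Optional[str]) -> Tuple[int, bytes]:
    """Server side of a conditional GET: 304 with no body if the cached copy is current."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)  # e.g. "... GMT"
        # HTTP dates have one-second resolution, so drop microseconds before comparing.
        if last_modified.replace(microsecond=0) <= cached_copy_date:
            return 304, b""   # "Not Modified": no object in the response
    return 200, body          # object changed since then (or no condition sent)


# Hypothetical object last modified at a fixed UTC time.
last_modified = datetime(2015, 9, 9, 9, 23, 24, tzinfo=timezone.utc)
cached_date = format_datetime(last_modified, usegmt=True)  # "Wed, 09 Sep 2015 09:23:24 GMT"

print(respond(last_modified, b"<html>...</html>", cached_date)[0])                      # 304
print(respond(last_modified, b"<html>...</html>", "Tue, 08 Sep 2015 09:23:24 GMT")[0])  # 200
```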
35
Q

HTTP/2

A

Key goal: decreased delay in multi-object HTTP requests

  • HTTP/1.1: introduced multiple, pipelined GETs over a single TCP connection
    • server responds in order (FCFS: first-come-first-served scheduling) to GET requests
    • with FCFS, a small object may have to wait for transmission behind large object(s) (head-of-line (HOL) blocking)
    • loss recovery (retransmitting lost TCP segments) stalls object transmission
  • HTTP/2: [RFC 7540, 2015] increased flexibility at server in sending objects to client:
    • methods, status codes, most header fields unchanged from HTTP/1.1
    • transmission order of requested objects based on client-specified object priority (not necessarily FCFS)
    • push unrequested objects to client
    • divide objects into frames, schedule frames to mitigate HOL blocking
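A toy comparison of when each object finishes under FCFS versus frame interleaving, sending one frame per time unit. Plain round-robin over frames stands in here for HTTP/2's priority-based scheduling, and the object sizes (one 8-frame object followed by three 1-frame objects) are hypothetical:

```python
from collections import deque
from typing import List


def fcfs_completion(sizes: List[int]) -> List[int]:
    """Finish time of each object when whole objects are sent in request order."""
    t, done = 0, []
    for size in sizes:
        t += size  # every frame of this object goes out before the next object starts
        done.append(t)
    return done


def interleaved_completion(sizes: List[int]) -> List[int]:
    """Finish time of each object when one frame per object is sent in rotation."""
    remaining = list(sizes)
    done = [0] * len(sizes)
    queue = deque(i for i, s in enumerate(sizes) if s > 0)
    t = 0
    while queue:
        i = queue.popleft()
        t += 1  # send one frame of object i
        remaining[i] -= 1
        if remaining[i] == 0:
            done[i] = t
        else:
            queue.append(i)  # object i still has frames left: back of the line
    return done


sizes = [8, 1, 1, 1]  # hypothetical: large object O1 requested before three small ones
print(fcfs_completion(sizes))         # [8, 9, 10, 11]
print(interleaved_completion(sizes))  # [11, 2, 3, 4]
```

The small objects no longer wait behind O1, while O1 itself finishes only slightly later.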
36
Q

HTTP/2: mitigating HOL blocking

A
37
Q

HTTP/2 to HTTP/3

A
  • HTTP/2 over single TCP connection means:
    • recovery from packet loss still stalls all object transmissions
      • as in HTTP/1.1, browsers have an incentive to open multiple parallel TCP connections to reduce stalling and increase overall throughput
    • no security over vanilla TCP connection
    • HTTP/3: adds security and per-object error and congestion control (more pipelining) over UDP
      • more on HTTP/3 in transport layer