Web & HTTP Flashcards

1
Q

Web and HTTP

A
  • A Web page consists of objects
  • An object can be an HTML file, JPEG image, Java applet, audio file, …
  • A Web page consists of a base HTML file that includes several referenced objects
  • Each object is addressable by a URL, e.g.,
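A URL names both the server that holds an object (host name) and the object on that server (path name). A minimal Python sketch using the standard library's `urllib.parse.urlsplit`; the URL below is a hypothetical placeholder, not one taken from this card:

```python
from urllib.parse import urlsplit

# Hypothetical URL, for illustration only.
url = "http://www.example.com/someDept/pic.gif"

parts = urlsplit(url)
print(parts.hostname)  # host name: which Web server to contact
print(parts.path)      # path name: which object to request from it
```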
2
Q

HTTP overview
HyperText Transfer Protocol

A
  • Web’s application layer protocol
  • client/server model
    • client: browser that requests, receives (using HTTP protocol), and “displays” Web objects
    • server: Web server sends (using HTTP protocol) objects in response to requests
3
Q

HTTP overview (continued)
(Underlying protocol, HTTP state, Protocol state)

A
  • Underlying protocol: TCP
    • Client initiates TCP connection (creates socket) to server, port 80
    • Server accepts TCP connection from client
    • HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
    • TCP connection closed
  • HTTP is “stateless”
    • Server maintains no information about past client requests
  • Protocols that maintain “state” are complex!
    • Past history (state) must be maintained
    • If server/client crashes, their views of “state” may be inconsistent and must be reconciled
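The connect / accept / exchange / close steps can be sketched with plain sockets in Python. This is a toy, assuming a local server on an ephemeral port (binding port 80 usually needs administrator privileges) and a request small enough to arrive in a single `recv` call; it is not a real Web server:

```python
import socket
import threading

def serve_one_request(listener: socket.socket) -> None:
    """Toy HTTP server: accept one TCP connection, answer one GET, then close."""
    conn, _addr = listener.accept()          # server accepts TCP connection from client
    with conn:
        request = conn.recv(4096)            # assumption: the whole request arrives in one recv
        if request.startswith(b"GET "):
            conn.sendall(b"HTTP/1.1 200 OK\r\n"
                         b"Content-Length: 5\r\n"
                         b"Connection: close\r\n"
                         b"\r\n"
                         b"hello")
    # leaving the with-block closes the TCP connection

# Server side: listen on an ephemeral localhost port instead of port 80.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_one_request, args=(listener,))
server.start()

# Client side: create a socket, initiate the TCP connection, exchange HTTP messages.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"GET /index.html HTTP/1.1\r\n"
                   b"Host: localhost\r\n"
                   b"Connection: close\r\n"
                   b"\r\n")
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:                         # empty read: server closed the connection
            break
        chunks.append(data)
response = b"".join(chunks)

server.join()
listener.close()
print(response.split(b"\r\n")[0].decode())   # HTTP/1.1 200 OK
```

Because the server closes the connection after one response, this mirrors the non-persistent pattern: any further object would need a fresh TCP connection.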
4
Q

HTTP connections

A

a. non-persistent HTTP
- At most one object sent over TCP connection
- Connection then closed
- Downloading multiple objects requires multiple connections

b. persistent HTTP
- Multiple objects can be sent over single TCP connection between client, server.

5
Q

A web page can consist of many different files

A
6
Q

Median number of objects embedded in a webpage

A
7
Q

Non-Persistent HTTP

A

The “classic” approach in HTTP/1.0 is to use one HTTP request per TCP connection, serially.

8
Q

Non-persistent HTTP: response time

A
  • RTT: time for a packet to travel from client to server and back
  • HTTP response time:
    • one RTT to initiate TCP connection
    • one RTT for HTTP request and first few bytes of HTTP response to return
      • This assumes the HTTP GET is piggybacked on the ACK that completes the TCP three-way handshake
    • File transmission time
    • Non-persistent HTTP response time =
      2RTT + file transmission time
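The response-time formula can be written as a small Python helper, taking file transmission time as object size divided by link rate. The RTT, object size, and link rate below are hypothetical values chosen only to make the arithmetic easy to follow:

```python
def nonpersistent_response_time(rtt_s: float, object_bits: float, link_bps: float) -> float:
    """Response time for ONE object fetched over a fresh TCP connection:
    1 RTT for TCP setup + 1 RTT for request/first bytes + file transmission time."""
    transmission_s = object_bits / link_bps
    return 2 * rtt_s + transmission_s

# Hypothetical numbers: 100 ms RTT, 1 Mbit object, 10 Mbps link.
t = nonpersistent_response_time(rtt_s=0.1, object_bits=1_000_000, link_bps=10_000_000)
print(f"{t:.2f} s")  # 0.30 s
```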
9
Q

Persistent HTTP

A

non-persistent HTTP issues:
- requires 2 RTTs per object
- OS overhead for each TCP connection
- browsers often open parallel TCP connections to fetch referenced objects

persistent HTTP:
- server leaves connection open after sending response
- subsequent HTTP messages between same client/server sent over open connection
- client sends requests as soon as it encounters a referenced object
- as little as one RTT for all the referenced objects
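A rough way to compare the approaches is to count RTTs for a page made of a base HTML file plus n referenced objects. This sketch ignores transmission times and server processing (simplifying assumptions), and the `mode` names are labels made up for illustration:

```python
import math

def page_load_rtts(n_objects: int, mode: str, parallel: int = 1) -> int:
    """Approximate RTTs to fetch a base HTML file plus n_objects referenced objects.

    Simplifying assumptions: transmission times and server processing are ignored,
    and the base file always costs 2 RTTs (TCP setup + request/response).
    """
    base = 2
    if mode == "non-persistent":
        # every object needs its own TCP connection: 2 RTTs each,
        # with up to `parallel` connections in flight at once
        return base + 2 * math.ceil(n_objects / parallel)
    if mode == "persistent":
        # connection stays open, but each request waits for the previous response
        return base + n_objects
    if mode == "pipelined":
        # all requests sent back-to-back: about one RTT for all referenced objects
        return base + (1 if n_objects > 0 else 0)
    raise ValueError(f"unknown mode: {mode}")

for mode in ("non-persistent", "persistent", "pipelined"):
    print(mode, page_load_rtts(10, mode))
# non-persistent 22
# persistent 12
# pipelined 3
```

The `parallel` argument shows why browsers open parallel connections under non-persistent HTTP: with 5 connections, 10 objects cost 2 + 2·2 = 6 RTTs instead of 22.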

10
Q

Persistent HTTP (continued)

A
  • The “persistent HTTP” approach can re-use the same TCP connection for multiple HTTP transfers, one after another, serially.
  • Amortizes TCP overhead, but maintains TCP state longer at server
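  • The “pipelining” feature in HTTP/1.1 allows a client to issue multiple requests on a persistent connection without waiting for each response. The server must still send responses in the order the requests were received. Requests can also be packaged cleverly (e.g., several requests in one TCP segment)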
11
Q

HTTP request message

A
  • Two types of HTTP messages: request, response
  • HTTP request message:
    • ASCII (human-readable format)
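An HTTP/1.x request is plain ASCII text: a request line, header lines, a blank line, then an optional body, with every line ending in carriage return + line feed. A minimal Python sketch that assembles one; `build_request`, the host, and the path are hypothetical:

```python
from typing import Dict, Optional

CRLF = "\r\n"  # every line of an HTTP/1.x message ends with carriage return + line feed

def build_request(method: str, path: str, host: str,
                  extra_headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"") -> bytes:
    """Assemble a request: request line, header lines, blank line, optional body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]   # request line + Host header
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")                      # one header line per field
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = CRLF.join(lines) + CRLF + CRLF                     # blank line ends the headers
    return head.encode("ascii") + body

msg = build_request("GET", "/index.html", "www.example.com", {"Connection": "close"})
print(msg.decode("ascii"))
```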
12
Q

HTTP request message: general format

A
13
Q

Uploading form input

A

POST method:
- web page often includes form input
- input is uploaded to server in entity body

URL method:
- uses GET method
- input is uploaded in URL field of request line:
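Both upload methods carry the same URL-encoded form data; only its location in the message differs. A sketch using Python's `urllib.parse.urlencode`, with hypothetical form fields, path, and host:

```python
from urllib.parse import urlencode

form = {"animal": "monkeys", "food": "banana"}   # hypothetical form fields
encoded = urlencode(form)                        # "animal=monkeys&food=banana"

# URL method: the input rides in the request line of a GET.
get_request = (f"GET /search?{encoded} HTTP/1.1\r\n"
               "Host: www.example.com\r\n"
               "\r\n")

# POST method: the same input travels in the entity body.
post_body = encoded.encode("ascii")
post_request = (("POST /search HTTP/1.1\r\n"
                 "Host: www.example.com\r\n"
                 "Content-Type: application/x-www-form-urlencoded\r\n"
                 f"Content-Length: {len(post_body)}\r\n"
                 "\r\n").encode("ascii") + post_body)

print(get_request.split("\r\n")[0])   # GET /search?animal=monkeys&food=banana HTTP/1.1
```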

14
Q

HTTP response message

A
15
Q

HTTP response status codes

A
  • Status code appears in 1st line in server-to-client response message.
  • Some sample codes:
    • 200 OK
      • request succeeded, requested object later in this msg
    • 301 Moved Permanently
      • requested object moved, new location specified later in this msg (Location:)
    • 400 Bad Request
      • request msg not understood by server
    • 404 Not Found
      • requested document not found on this server
    • 505 HTTP Version Not Supported
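Parsing a status line and looking up standard reason phrases can be sketched with Python's `http.HTTPStatus` enum. `parse_status_line` and `status_class` are hypothetical helpers; the class names follow the usual grouping of status codes by their first digit:

```python
from http import HTTPStatus

def parse_status_line(line: str):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its three fields."""
    version, code, reason = line.split(" ", 2)   # the reason phrase may contain spaces
    return version, int(code), reason

def status_class(code: int) -> str:
    """The first digit of the status code gives its class."""
    return {1: "informational", 2: "success", 3: "redirection",
            4: "client error", 5: "server error"}[code // 100]

version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
print(version, code, reason, status_class(code))   # HTTP/1.1 404 Not Found client error

for c in (200, 301, 400, 404, 505):
    print(c, HTTPStatus(c).phrase, status_class(c))
```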
16
Q

User-server state: cookies

A
  • Cookie: a small file of data (up to 4 KB)
    • Sent to the Web browser by the Web server
    • Saved locally by the browser
    • Sent back by the browser in subsequent requests to the same site
  • Example:
    • Susan always accesses the Internet from her PC
    • She visits a specific e-commerce site for the first time
    • When the initial HTTP request arrives at the site, the site creates:
      • a unique ID
      • an entry in its back-end database for that ID
17
Q

Four components of cookies

A

1) Cookie header line of HTTP response message (Set-Cookie:)
2) Cookie header line in next HTTP request message (Cookie:)
3) Cookie file kept on user’s host, managed by user’s browser
4) Back-end database at Web site
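The four components can be traced in a toy simulation using Python's `http.cookies.SimpleCookie`. Everything here is hypothetical: the site name, the cookie name `id`, and the in-memory dictionaries that stand in for the browser's cookie file and the site's back-end database:

```python
import uuid
from http.cookies import SimpleCookie
from typing import Dict, Optional

backend_db: Dict[str, dict] = {}   # (4) back-end database at the Web site: ID -> state
browser_jar: Dict[str, str] = {}   # (3) cookie file on the user's host: site -> "name=value"

def site_handle_request(cookie_header: Optional[str]) -> Optional[str]:
    """Hypothetical site logic: returns a Set-Cookie value for new visitors, else None."""
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)
        if "id" in jar and jar["id"].value in backend_db:
            backend_db[jar["id"].value]["visits"] += 1   # known user: update stored state
            return None
    user_id = uuid.uuid4().hex                  # first visit: create a unique ID
    backend_db[user_id] = {"visits": 1}         # ... and a back-end entry for it
    reply = SimpleCookie()
    reply["id"] = user_id
    return reply["id"].OutputString()           # (1) value of the Set-Cookie: response header

def browser_visit(site: str) -> None:
    cookie_header = browser_jar.get(site)       # (2) Cookie: header in the next request
    set_cookie = site_handle_request(cookie_header)
    if set_cookie:
        browser_jar[site] = set_cookie.split(";")[0]

for _ in range(3):
    browser_visit("www.example.com")
print(len(backend_db), list(backend_db.values())[0])   # 1 {'visits': 3}
```

HTTP itself stays stateless here: the server recognizes Susan only because the browser echoes the ID back and the site looks it up in its own database.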

18
Q

Cookies: keeping “state” (cont.)

19
Q

Cookies (continued)

A
  • What cookies can be used for:
    • Authorization
    • Shopping carts
    • Recommendations
    • User session state (Web e-mail)
  • How to keep “state”
    • Protocol endpoints: maintain state at sender/receiver over multiple transactions
    • Cookies: HTTP messages carry state
  • Cookies and privacy:
    • Cookies permit sites to learn a lot about you
    • You may supply name and e-mail to sites
20
Q

Cookies + Third Parties

21
Q

How it works

22
Q

50% of websites use more than 100 third-party cookies

23
Q

What can we do about it?

A
  • Ad-blocking products (block cookies and connections to third-party sites)
    • Ghostery, Ad Block, etc.
  • Doesn’t completely solve the problem…
    • Trackers are getting smarter: they use browser features to fingerprint users
    • e.g., the combination of installed extensions, fonts, etc.
      • Surprisingly unique!
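The idea behind fingerprinting can be sketched in Python: a tracker hashes a canonical form of whatever browser attributes it can observe, so no cookie needs to be stored on the client. The attribute names and values below are hypothetical, and real fingerprinting scripts may collect many more signals:

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a canonical (key-sorted) JSON form of observed browser attributes."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attributes a tracking script might observe for one visitor.
visitor_a = {"fonts": ["Arial", "Fira Code"], "extensions": 7,
             "screen": "2560x1440", "timezone": "Europe/London"}
# Same visitor on a later page view, with keys reported in a different order.
visitor_a_later = {"timezone": "Europe/London", "screen": "2560x1440",
                   "extensions": 7, "fonts": ["Arial", "Fira Code"]}
# A different visitor who differs by a single installed font.
visitor_b = dict(visitor_a, fonts=["Arial", "Helvetica"])

print(fingerprint(visitor_a) == fingerprint(visitor_a_later))  # True
print(fingerprint(visitor_a) == fingerprint(visitor_b))        # False
```

Blocking third-party cookies does not affect this: the identifier is recomputed from the browser's own features on every visit.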
24
Q

Evolution of Serving Web Content

A

  • In the beginning…
    • …there was a single server
    • Probably located in a closet
    • And it probably served blinking text
  • Issues with this model
    • Site reliability
      • Unplugging cable, hardware failure, natural disaster
    • Scalability
      • Flash crowds (aka Slashdotting)

25
Q

Replicated Web service

A
  • Load balancer: device that multiplexes requests across a collection of servers
    • All servers share one public IP
    • Balancer transparently directs requests to different servers
  • How should the balancer assign clients to servers?
    • Random / round-robin
      • When is this a good idea?
    • Load-based
      • When might this fail?
  • Challenges
    • Scalability (must support traffic for n hosts)
    • State (must keep track of previous decisions)
      • RESTful APIs reduce this limitation
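The two assignment policies from this card can be sketched in Python. Class names and server labels are hypothetical. Note that the load-based balancer must keep per-server counters, which is the State challenge listed on the card:

```python
import itertools
from typing import List

class RoundRobinBalancer:
    """Assign each new request to the next server in a fixed cycle (no load tracking)."""
    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

class LeastLoadedBalancer:
    """Load-based: assign to the server with the fewest active requests.
    Requires the balancer to remember current load per server."""
    def __init__(self, servers: List[str]):
        self.load = {s: 0 for s in servers}

    def pick(self) -> str:
        server = min(self.load, key=self.load.get)  # ties go to the first server listed
        self.load[server] += 1
        return server

    def done(self, server: str) -> None:
        self.load[server] -= 1                      # request finished: free capacity

rr = RoundRobinBalancer(["s1", "s2", "s3"])
picks = [rr.pick() for _ in range(4)]
print(picks)               # ['s1', 's2', 's3', 's1']

lb = LeastLoadedBalancer(["s1", "s2", "s3"])
a = lb.pick()              # s1
b = lb.pick()              # s2
lb.done(a)                 # s1 becomes idle again
c = lb.pick()              # s1 (tied with s3 at zero load, listed first)
print(a, b, c)             # s1 s2 s1
```

Round-robin works well when requests cost roughly the same; the load-based policy adapts to uneven requests but depends on accurate, up-to-date load information.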
26
Q

Load balancing: Are we done?

A
  • Advantages
    • Allows scaling of hardware independent of IPs
    • Relatively easy to maintain
  • Disadvantages
    • Expensive
    • Still a single point of failure
    • Location!
27
Q

Popping up: HTTP performance

A
  • For Web pages
    • RTT matters most
    • Where should the server go?
  • For video
    • Available bandwidth matters most
    • Where should the server go?
  • Is there one location that is best for everyone?
  • Impact on user experience
    • Users navigating away from pages
    • Video startup delay
  • Impact on revenue
    • Amazon: increased revenue 1% for every 100 ms reduction in page load time (PLT)
    • Shopzilla: 12% increase in revenue by reducing PLT from 6 seconds to 1.2 seconds
  • Ping from LON to NYC: ~100 ms
28
Q

Web caches (proxy server)

A
  • Goal: satisfy client request without involving origin server
  • User sets browser: Web accesses via cache
  • Browser sends all HTTP requests to cache
    • Object in cache: cache returns object
    • Else cache requests object from origin server, then returns object to client
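The cache's decision logic can be sketched as a Python class that acts as a server toward the browser and a client toward the origin server. `fake_origin` is a hypothetical stand-in for a real HTTP request, and this toy never checks whether a cached copy is stale:

```python
from typing import Callable, Dict, List

class WebCache:
    """Toy proxy cache: serve from local storage if present, else fetch from origin."""
    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin       # cache acts as a *client* to the origin server
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:         # cache acts as a *server* to the browser
        if url in self._store:
            self.hits += 1
            return self._store[url]           # object in cache: return it directly
        self.misses += 1
        obj = self._fetch(url)                # else request object from origin server
        self._store[url] = obj
        return obj

origin_calls: List[str] = []

def fake_origin(url: str) -> bytes:
    """Hypothetical origin server: records each fetch and returns dummy content."""
    origin_calls.append(url)
    return b"<html>object at " + url.encode() + b"</html>"

cache = WebCache(fake_origin)
cache.get("http://www.example.com/a.html")   # miss: fetched from origin
cache.get("http://www.example.com/a.html")   # hit: served from cache
print(cache.hits, cache.misses, len(origin_calls))   # 1 1 1
```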
29
Q

More about Web caching

A
  • Cache acts as both client and server
    • Server for original requesting client
    • Client to origin server
  • Typically, the cache is installed by an ISP (university, company, residential ISP)
  • Why Web caching?
    • Reduce response time for client request
    • Reduce traffic on an institution’s access link
    • Internet dense with caches: enables “poor” content providers to effectively deliver content (so does P2P file sharing)
30
Q

Caching example

A
  • Scenario:
    • access link rate: 1.54 Mbps
    • RTT from institutional router to server: 2 sec
    • Web object size: 100K bits
    • average request rate from browsers to origin servers: 15/sec
    • avg data rate to browsers: 1.50 Mbps
  • Performance:
    • access link utilization = .97
    • LAN utilization: .0015
    • end-end delay = Internet delay + access link delay + LAN delay
      = 2 sec + minutes + usecs
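The utilization figures follow from simple arithmetic. A Python sketch; the 1 Gbps LAN rate is an assumption not stated in the scenario, chosen because it reproduces the .0015 figure:

```python
object_bits = 100e3          # 100K bits per object
request_rate = 15            # requests/sec from browsers to origin servers
access_link_bps = 1.54e6     # 1.54 Mbps access link
lan_bps = 1e9                # assumption: 1 Gbps institutional LAN (not given above)

data_rate_bps = object_bits * request_rate      # 1.5 Mbps flowing to browsers
access_util = data_rate_bps / access_link_bps   # ~0.97
lan_util = data_rate_bps / lan_bps              # 0.0015

print(f"{data_rate_bps / 1e6:.2f} Mbps, access {access_util:.2f}, LAN {lan_util:.4f}")
# 1.50 Mbps, access 0.97, LAN 0.0015
```

With access-link utilization this close to 1, queueing delay at the link grows very large, which is why that term of the end-end delay is on the order of minutes.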
31
Q

Option 1: buy a faster access link
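A quick check of what a faster link buys, assuming a hypothetical upgrade to a 154 Mbps access link with the same 1.5 Mbps demand as in the caching example. Utilization drops to about 1%, so access-link queueing becomes negligible, although the 2 sec Internet delay is unchanged:

```python
data_rate_bps = 15 * 100e3   # 15 requests/sec x 100K bits = 1.5 Mbps, from the scenario
faster_link_bps = 154e6      # hypothetical upgraded access link: 154 Mbps

access_util = data_rate_bps / faster_link_bps
print(f"access link utilization = {access_util:.4f}")   # 0.0097
```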
32
Q

Option 2: install a web cache
33
Q

Calculating access link utilization, end-end delay with cache
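A worked version of the cache calculation in Python. The 40% hit rate and the 10 ms delays for cache hits and for the lightly loaded access link are assumptions for illustration, not values from the card:

```python
hit_rate = 0.4                # assumption: 40% of requests are satisfied by the cache
data_rate_bps = 15 * 100e3    # 1.5 Mbps total demand, from the scenario
access_link_bps = 1.54e6      # 1.54 Mbps access link, from the scenario
internet_delay_s = 2.0        # RTT from institutional router to origin server

# Only cache misses cross the access link.
access_util = (1 - hit_rate) * data_rate_bps / access_link_bps

hit_delay_s = 0.01                      # assumption: ~10 ms to serve a hit from the cache
miss_delay_s = internet_delay_s + 0.01  # assumption: ~10 ms access-link delay at this load
avg_delay_s = hit_rate * hit_delay_s + (1 - hit_rate) * miss_delay_s

print(f"access link utilization = {access_util:.2f}")   # 0.58
print(f"average end-end delay = {avg_delay_s:.2f} s")   # 1.21 s
```

Under these assumptions the cache lowers both utilization and average delay below what the original link achieved, without buying a faster link.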
34
Q

Conditional GET

A
  • Goal: don’t send object if cache has up-to-date cached version
    • No object transmission delay
    • Lower link utilization
  • Cache: specify date of cached copy in HTTP request
    • If-modified-since:
  • Server: response contains no object if cached copy is up-to-date:
    • HTTP/1.0 304 Not Modified
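The server-side decision can be sketched in Python with the standard `email.utils` date helpers, which read and write the HTTP date format. `handle_conditional_get` and the modification date are hypothetical, and the sketch returns only the status line:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

def handle_conditional_get(last_modified: datetime, if_modified_since: Optional[str]) -> str:
    """Return the status line for a GET that may carry an If-modified-since: header."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_copy_date:
            return "HTTP/1.0 304 Not Modified"   # cached copy is up-to-date: send no object
    return "HTTP/1.0 200 OK"                      # object follows in the response body

# Hypothetical last-modified time of the object on the origin server.
last_modified = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)

fresh_copy = format_datetime(last_modified, usegmt=True)  # "Wed, 10 Jan 2024 12:00:00 GMT"
stale_copy = format_datetime(datetime(2024, 1, 1, tzinfo=timezone.utc), usegmt=True)

print(handle_conditional_get(last_modified, fresh_copy))  # HTTP/1.0 304 Not Modified
print(handle_conditional_get(last_modified, stale_copy))  # HTTP/1.0 200 OK
print(handle_conditional_get(last_modified, None))        # HTTP/1.0 200 OK
```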
35
Q

HTTP/2

A
  • Key goal: decreased delay in multi-object HTTP requests
  • HTTP/1.1: introduced multiple, pipelined GETs over single TCP connection
    • server responds in-order (FCFS: first-come-first-served scheduling) to GET requests
    • with FCFS, small object may have to wait for transmission (head-of-line (HOL) blocking) behind large object(s)
    • loss recovery (retransmitting lost TCP segments) stalls object transmission
  • HTTP/2 [RFC 7540, 2015]: increased flexibility at server in sending objects to client
    • methods, status codes, most header fields unchanged from HTTP/1.1
    • transmission order of requested objects based on client-specified object priority (not necessarily FCFS)
    • push unrequested objects to client
    • divide objects into frames, schedule frames to mitigate HOL blocking
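The effect of splitting objects into frames can be sketched with a toy schedule in Python: one frame is sent per time unit, and we record when each object finishes. Round-robin interleaving stands in for HTTP/2's scheduling here; real HTTP/2 also involves stream priorities and flow control, which this sketch omits. The object names and sizes are hypothetical:

```python
from typing import Dict, List, Tuple

def fcfs_completion(objects: List[Tuple[str, int]]) -> Dict[str, int]:
    """HTTP/1.1-style: whole objects sent in request order (sizes in frames)."""
    t, done = 0, {}
    for name, frames in objects:
        t += frames              # the whole object goes out before the next one starts
        done[name] = t
    return done

def interleaved_completion(objects: List[Tuple[str, int]]) -> Dict[str, int]:
    """HTTP/2-style sketch: objects split into frames, sent round-robin one frame each."""
    remaining = {name: frames for name, frames in objects}
    t, done = 0, {}
    while remaining:
        for name in list(remaining):
            t += 1               # send one frame of this object
            remaining[name] -= 1
            if remaining[name] == 0:
                done[name] = t
                del remaining[name]
    return done

# Hypothetical request: one large object (O1) followed by three small ones.
reqs = [("O1", 8), ("O2", 1), ("O3", 1), ("O4", 1)]
print(fcfs_completion(reqs))         # {'O1': 8, 'O2': 9, 'O3': 10, 'O4': 11}
print(interleaved_completion(reqs))  # {'O2': 2, 'O3': 3, 'O4': 4, 'O1': 11}
```

The small objects no longer wait behind O1, while O1 itself finishes at the same time (11) because the total number of frames sent is unchanged.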
36
Q

HTTP/2: mitigating HOL blocking
37
Q

HTTP/2 to HTTP/3

A
  • HTTP/2 over single TCP connection means:
    • recovery from packet loss still stalls all object transmissions
      • as in HTTP/1.1, browsers have incentive to open multiple parallel TCP connections to reduce stalling, increase overall throughput
    • no security over vanilla TCP connection
  • HTTP/3: adds security, per-object error- and congestion-control (more pipelining) over UDP
    • more on HTTP/3 in transport layer