Web & HTTP Flashcards
Web and HTTP
- web pages consist of objects
- An object can be HTML file, JPEG image, Java applet, audio file, …
- web page consists of base HTML-file which includes several referenced objects
- Each object is addressable by a URL, e.g.,
HTTP overview
HyperText Transfer Protocol
- Web’s application layer protocol
- client/server model
¤ client: browser that requests, receives (using HTTP protocol), and “displays” Web objects
¤ server: Web server sends (using HTTP protocol) objects in response to requests
HTTP overview (continued)
(Underlying protocol, HTTP state, Protocol state)
- Underlying protocol: TCP
- Client initiates TCP connection (creates socket) to server, port 80
- Server accepts TCP connection from client
- HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
- TCP connection closed
- HTTP is “stateless”
- Server maintains no information about past client requests
- Protocols that maintain “state” are complex!
- Past history (state) must be maintained
- If server/client crashes, their views of “state” may be inconsistent and must be reconciled
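The connection steps above (open TCP connection, exchange HTTP messages, close) can be sketched with a raw socket. This is only an illustration under assumptions: the toy server, the names `HelloHandler`, `fetch_once`, and `demo`, and the loopback address are all hypothetical, and an OS-chosen port is used instead of port 80 so the sketch runs without privileges.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy Web server: answers every GET with one tiny HTML object."""

    def do_GET(self):
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_once(host: str, port: int, path: str = "/") -> bytes:
    """Open a TCP connection, send one HTTP request, read the reply, close."""
    # Client initiates the TCP connection (creates a socket) to the server.
    with socket.create_connection((host, port), timeout=5) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))  # HTTP request message
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed its side of the connection
                break
            chunks.append(data)
    # Leaving the `with` block closes the client socket.
    return b"".join(chunks)


def demo() -> bytes:
    """Start the toy server on a free loopback port and fetch one object."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_once("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Each call to `fetch_once` is independent: the toy server keeps nothing about earlier requests, which is the "stateless" property described above.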
HTTP connections
a. non-persistent HTTP
- At most one object sent over TCP connection
- Connection then closed
- Downloading multiple objects requires multiple connections
b. persistent HTTP
- Multiple objects can be sent over single TCP connection between client, server.
A web page can consist of many different files
Median number of objects embedded in a webpage
Non-Persistent HTTP
The “classic” approach in HTTP/1.0 is to use one HTTP request per TCP connection, serially.
Non-persistent HTTP: response time
- RTT: time for a packet to travel from client to server and back
- HTTP response time:
- one RTT to initiate TCP connection
- one RTT for HTTP request and first few bytes of HTTP response to return
- This assumes the HTTP GET is piggybacked on the ACK that completes the TCP handshake
- File transmission time
- Non-persistent HTTP response time = 2RTT + file transmission time
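As a quick worked example of the formula above, the following sketch computes the response time for one object. The function name and the sample numbers (100 ms RTT, 100 kbit object, 1 Mbps link) are assumptions chosen only for illustration.

```python
def non_persistent_response_time(rtt_s: float, object_bits: float, link_bps: float) -> float:
    """One RTT for TCP setup, one RTT for request/response, plus transmission time."""
    transmission_s = object_bits / link_bps
    return 2 * rtt_s + transmission_s


# Hypothetical numbers: 100 ms RTT, 100 kbit object, 1 Mbps link.
example = non_persistent_response_time(rtt_s=0.1, object_bits=100_000, link_bps=1_000_000)
print(f"{example:.2f} s")
```

With these assumed values the two RTTs (0.2 s) dominate the transmission time (0.1 s), which is why RTT matters so much for small objects.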
Persistent HTTP
non-persistent HTTP issues:
- requires 2 RTTs per object
- OS overhead for each TCP connection
- browsers often open parallel TCP connections to fetch referenced objects
persistent HTTP:
- server leaves connection open after sending response
- subsequent HTTP messages between same client/server sent over open connection
- client sends requests as soon as it encounters a referenced object
- as little as one RTT for all the referenced objects
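To make the comparison concrete, here is a rough sketch of the total time to fetch a base page plus its referenced objects under both approaches. It assumes serial non-persistent connections (no parallel connections), the best case for persistent HTTP where one extra RTT covers all referenced objects, and the same transmission time for every object; the function names and numbers are hypothetical.

```python
def non_persistent_serial(rtt_s: float, n_objects: int, tx_s: float) -> float:
    """Base page plus n referenced objects, each on its own TCP connection, one at a time."""
    return (1 + n_objects) * (2 * rtt_s + tx_s)


def persistent_best_case(rtt_s: float, n_objects: int, tx_s: float) -> float:
    """One connection: 2 RTT for the base page, then one RTT for all referenced objects."""
    base_page = 2 * rtt_s + tx_s
    referenced = rtt_s + n_objects * tx_s
    return base_page + referenced


# Hypothetical page: 10 referenced objects, 100 ms RTT, 10 ms transmission time each.
slow = non_persistent_serial(0.1, 10, 0.01)
fast = persistent_best_case(0.1, 10, 0.01)
print(f"non-persistent: {slow:.2f} s, persistent: {fast:.2f} s")
```

The gap grows with the number of referenced objects, since the non-persistent approach pays 2 RTTs for every one of them.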
Persistent HTTP 2
- The “persistent HTTP” approach can re-use the same TCP connection for multiple HTTP transfers, one after another, serially.
- Amortizes TCP overhead, but maintains TCP state longer at server
- The “pipelining” feature in HTTP/1.1 allows the client to issue requests on a persistent connection without waiting for earlier responses. The server must still return responses in the order the requests arrived. Several requests can be packed into a single TCP segment.
HTTP request message
- Two types of HTTP messages: request, response
- HTTP request message:
- ASCII (human-readable format)
HTTP request message: general format
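As a sketch of that general format, a request message is a request line, zero or more header lines, a blank line, and an optional entity body, with lines ending in CR LF. The helper name `build_request` and the host `www.example.com` are assumptions for illustration only.

```python
def build_request(method: str, path: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble request line, header lines, blank line, and optional entity body."""
    lines = [f"{method} {path} HTTP/1.1"]                        # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header lines
    head = "\r\n".join(lines) + "\r\n\r\n"                       # blank line ends the headers
    return head.encode("ascii") + body


message = build_request(
    "GET",
    "/index.html",
    {"Host": "www.example.com", "Connection": "close"},
)
print(message.decode("ascii"))
```

Because the message is plain ASCII, it can be read (and typed) by a human, which is what the "human-readable format" note above refers to.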
Uploading form input
POST method:
- web page often includes form input
- input is uploaded to server in entity body
URL method:
- uses GET method
- input is uploaded in URL field of request line:
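The two upload methods can be compared side by side. In this sketch the form fields, the `/search` path, and the host are hypothetical; only the placement of the encoded input (entity body versus URL) reflects the notes above.

```python
from urllib.parse import urlencode

form = {"q": "http", "lang": "en"}  # hypothetical form fields
encoded = urlencode(form)           # "q=http&lang=en"

# POST method: the input travels in the entity body.
post_request = (
    "POST /search HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded)}\r\n"
    "\r\n"
    f"{encoded}"
)

# URL method: the same input travels in the URL field of a GET request line.
get_request = (
    f"GET /search?{encoded} HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "\r\n"
)

print(post_request)
print(get_request)
```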
HTTP response message
HTTP response status codes
- Status code appears in the first line of the server-to-client response message.
- Some sample codes:
200 OK
¤ request succeeded, requested object later in this msg
301 Moved Permanently
¤ requested object moved, new location specified later in this msg (Location:)
400 Bad Request
¤ request msg not understood by server
404 Not Found
¤ requested document not found on this server
505 HTTP Version Not Supported
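A small sketch of reading the status code out of the first line of a response. The helper name `parse_status_line` is an assumption; `http.HTTPStatus` is the Python standard library's table of codes and reason phrases.

```python
from http import HTTPStatus


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into (protocol version, status code, reason phrase)."""
    version, code, phrase = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), phrase


# Look up the reason phrases for the sample codes listed above.
for code in (200, 301, 400, 404, 505):
    print(code, HTTPStatus(code).phrase)

print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```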
User-server state: cookies
- Cookie == a small file with data (up to 4KB)
¤ Sent to the Web browser by the Web server
¤ Saved locally inside the browser
¤ Sent back by the browser in all subsequent requests to that site
- Example:
¤ Susan always accesses the Internet from a PC
¤ visits specific e-commerce site for first time
¤ when initial HTTP request arrives at site, site creates:
  - unique ID
  - entry in backend database for ID
Four components of cookies
1) Cookie header line (Set-Cookie:) in HTTP response message
2) Cookie header line (Cookie:) in next HTTP request message
3) Cookie file kept on user’s host, managed by user’s browser
4) Back-end database at Web site
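The four components can be walked through in a toy simulation. Everything here is a simplified assumption: the in-memory `backend_db`, the cookie name `id`, and the function names are hypothetical, and a real browser and server handle many more attributes (expiry, domain, path).

```python
import uuid
from http.cookies import SimpleCookie

backend_db: dict[str, dict] = {}  # component 4: back-end database (toy, in memory)


def server_first_response() -> str:
    """Component 1: server creates a unique ID, a DB entry, and a Set-Cookie header line."""
    user_id = uuid.uuid4().hex
    backend_db[user_id] = {"visits": 1}
    cookie = SimpleCookie()
    cookie["id"] = user_id
    return cookie.output()  # e.g. "Set-Cookie: id=<hex>"


def browser_store(set_cookie_line: str, jar: SimpleCookie) -> None:
    """Component 3: browser keeps the cookie in its local cookie store."""
    _, value = set_cookie_line.split(":", 1)
    jar.load(value.strip())


def browser_request_header(jar: SimpleCookie) -> str:
    """Component 2: browser sends the cookie back in the next request."""
    return "Cookie: " + "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


def server_later_request(cookie_line: str) -> dict:
    """Server uses the ID from the Cookie header to find this user's DB entry."""
    _, value = cookie_line.split(":", 1)
    received = SimpleCookie(value.strip())
    entry = backend_db[received["id"].value]
    entry["visits"] += 1
    return entry
```

HTTP itself stays stateless in this sketch: the state lives in the cookie carried by each message and in the site's database.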
Cookies: keeping “state” (cont.)
Cookies (continued)
- What cookies can be used for:
- Authorization
- Shopping carts
- Recommendations
- User session state (Web e-mail)
- How to keep “state”
- Protocol endpoints: maintain state at sender/receiver over multiple transactions
- Cookies: HTTP messages carry state
- Cookies and privacy:
- Cookies permit sites to learn a lot about you
- You may supply name and e-mail to sites
Cookies + Third Parties
How it works
50% of the websites are using more than 100 3rd party cookies
What can we do about it?
- Different ad block products (block cookies/connections to third party sites)
- Ghostery, Ad Block etc.
- Doesn’t completely solve the problem…
- Trackers are getting smarter: they use browser features to fingerprint users
- E.g., combination of installed extensions/fonts etc.
- Surprisingly unique!
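A rough sketch of why such a combination works as an identifier without any cookie: hashing a set of browser attributes yields a stable value that changes when any attribute differs. The attribute names and values are made up for illustration, and real trackers use many more signals.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a canonical encoding of browser attributes into a short identifier."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical browsers that differ only in one installed font.
browser_a = {"fonts": ["Arial", "Fira Code"], "extensions": ["adblock"], "screen": "1920x1080"}
browser_b = {"fonts": ["Arial", "Comic Sans"], "extensions": ["adblock"], "screen": "1920x1080"}

print(fingerprint(browser_a))
print(fingerprint(browser_b))
```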
Evolution of Serving Web Content
- In the beginning…
¤ …there was a single server
¤ Probably located in a closet
¤ And it probably served blinking text
- Issues with this model
¤ Site reliability
  - Unplugging cable, hardware failure, natural disaster
¤ Scalability
  - Flash crowds (aka Slashdotting)
Replicated Web service
- Device that multiplexes requests across a collection of servers
¤ All servers share one public IP
¤ Balancer transparently directs requests to different servers
- How should the balancer assign clients to servers?
¤ Random / round-robin
  - When is this a good idea?
¤ Load-based
  - When might this fail?
- Challenges
¤ Scalability (must support traffic for n hosts)
¤ State (must keep track of previous decisions)
  - RESTful APIs reduce this limitation
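The two assignment policies can be sketched as follows. The class names and the "active request count" used as the load measure are assumptions for illustration; a real balancer would measure load differently and handle failures.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastLoadedBalancer:
    """Picks the server with the fewest active requests (state must be tracked)."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)  # ties go to the earliest server
        self.active[server] += 1
        return server

    def done(self, server):
        self.active[server] -= 1


rr = RoundRobinBalancer(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])
```

Round-robin needs no per-request state, which fits short, similar requests; the load-based policy must remember its previous decisions, which is the "State" challenge noted above.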
Load balancing: Are we done?
- Advantages
¤ Allows scaling of hardware independent of IPs
¤ Relatively easy to maintain
- Disadvantages
¤ Expensive
¤ Still a single point of failure
¤ Location!
Popping up: HTTP performance
- For Web pages
¤ RTT matters most
¤ Where should the server go?
- For video
¤ Available bandwidth matters most
¤ Where should the server go?
- Is there one location that is best for everyone?
- Impact on user experience
¤ Users navigating away from pages
¤ Video startup delay
- Impact on revenue
¤ Amazon: increased revenue 1% for every 100 ms reduction in page load time (PLT)
¤ Shopzilla: 12% increase in revenue by reducing PLT from 6 seconds to 1.2 seconds
- Ping from LON to NYC: ~100 ms
Web caches (proxy server)
- Goal: satisfy client request without involving origin server
- User sets browser: Web accesses via cache
- Browser sends all HTTP requests to cache
¤ Object in cache: cache returns object
¤ Else cache requests object from origin server, then returns object to client
More about Web caching
- Cache acts as both client and server
¤ Server for original requesting client
¤ Client to origin server
- Typically cache is installed by ISP (university, company, residential ISP)
- Why Web caching?
¤ Reduce response time for client request
¤ Reduce traffic on an institution’s access link
¤ Internet dense with caches: enables “poor” content providers to effectively deliver content (so too does P2P file sharing)
Caching example
Scenario:
- access link rate: 1.54 Mbps
- RTT from institutional router to server: 2 sec
- web object size: 100K bits
- average request rate from browsers to origin servers: 15/sec
- avg data rate to browsers: 1.50 Mbps
Performance:
- access link utilization = .97
- LAN utilization: .0015
- end-end delay = Internet delay + access link delay + LAN delay = 2 sec + minutes + usecs
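The utilization figures above follow from the scenario numbers. In this sketch the 1 Gbps LAN rate is an assumption (it is not stated in the scenario, but it is the rate that reproduces the .0015 figure), and the function name is hypothetical.

```python
def utilization(requests_per_sec: float, object_bits: float, link_bps: float) -> float:
    """Fraction of a link's capacity consumed by the offered traffic."""
    return requests_per_sec * object_bits / link_bps


ACCESS_LINK_BPS = 1.54e6   # from the scenario
OBJECT_BITS = 100e3        # from the scenario
REQUESTS_PER_SEC = 15      # from the scenario
LAN_BPS = 1e9              # assumption: 1 Gbps LAN

access_util = utilization(REQUESTS_PER_SEC, OBJECT_BITS, ACCESS_LINK_BPS)
lan_util = utilization(REQUESTS_PER_SEC, OBJECT_BITS, LAN_BPS)
print(f"access link: {access_util:.2f}, LAN: {lan_util:.4f}")
```

An access link running this close to full capacity has long queues, which is why the access link delay term dominates the end-end delay.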
Option 1: buy a faster access link
Option 2: install a web cache
Calculating access link utilization, end-end delay with cache
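A sketch of that calculation, under loudly stated assumptions: a cache hit rate of 0.4, about 2.01 s for a request served by the origin server (Internet delay plus a now lightly loaded access link), and about 10 ms for a request served by the cache. None of these three values comes from the scenario above; they are illustrative only.

```python
def with_cache(hit_rate, requests_per_sec, object_bits, access_link_bps,
               origin_delay_s, cache_delay_s):
    """Return (access link utilization, average end-end delay) with a local cache."""
    miss_rate = 1 - hit_rate
    # Only cache misses cross the access link.
    access_util = miss_rate * requests_per_sec * object_bits / access_link_bps
    # Average delay weights the origin path and the cache path by how often each is used.
    avg_delay_s = miss_rate * origin_delay_s + hit_rate * cache_delay_s
    return access_util, avg_delay_s


util, delay = with_cache(
    hit_rate=0.4,            # assumption
    requests_per_sec=15,     # from the scenario
    object_bits=100e3,       # from the scenario
    access_link_bps=1.54e6,  # from the scenario
    origin_delay_s=2.01,     # assumption
    cache_delay_s=0.01,      # assumption
)
print(f"utilization: {util:.2f}, average delay: {delay:.2f} s")
```

Under these assumptions the cache brings utilization down to roughly 0.58 without upgrading the access link at all.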
Conditional GET
- Goal: don’t send object if cache has up-to-date cached version
¤ No object transmission delay
¤ Lower link utilization
- Cache: specify date of cached copy in HTTP request
¤ If-modified-since: <date>
- Server: response contains no object if cached copy is up-to-date:
¤ HTTP/1.0 304 Not Modified
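The server-side decision can be sketched as a date comparison. The function name `respond` and its return shape are assumptions; the date helpers come from Python's standard `email.utils` module, which handles the HTTP date format.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond(last_modified: datetime, if_modified_since: Optional[str]) -> tuple[int, bool]:
    """Return (status code, whether the object is included in the response)."""
    if if_modified_since is not None:
        cached_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_date:
            return 304, False  # cached copy is up to date: no object sent
    return 200, True           # no cached copy, or it is stale: send the object


# Hypothetical object last modified on 9 Sep 2015, cached at that same time.
modified = datetime(2015, 9, 9, 9, 23, 24, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)
print(header, respond(modified, header))
```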
HTTP/2
Key goal: decreased delay in multi-object HTTP requests
- HTTP/1.1: introduced multiple, pipelined GETs over single TCP connection
- server responds in-order (FCFS: first-come-first-served scheduling) to GET requests
- with FCFS, small object may have to wait for transmission (head-of-line (HOL) blocking) behind large object(s)
- loss recovery (retransmitting lost TCP segments) stalls object transmission
- HTTP/2: [RFC 7540, 2015] increased flexibility at server in sending objects to client:
- methods, status codes, most header fields unchanged from HTTP/1.1
- transmission order of requested objects based on client-specified object priority (not necessarily FCFS)
- push unrequested objects to client
- divide objects into frames, schedule frames to mitigate HOL blocking
HTTP/2: mitigating HOL blocking
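A toy simulation of the idea, assuming one large object requested ahead of three small ones and a link that sends one frame per time unit. Plain round-robin interleaving stands in for HTTP/2's scheduling here; real HTTP/2 ordering depends on client-specified priorities, and the object names and sizes are hypothetical.

```python
def fcfs_finish_times(sizes: dict[str, int]) -> dict[str, int]:
    """Send each object completely before starting the next (in-order, HTTP/1.1 style)."""
    finish, clock = {}, 0
    for name, frames in sizes.items():
        clock += frames
        finish[name] = clock
    return finish


def interleaved_finish_times(sizes: dict[str, int]) -> dict[str, int]:
    """Send one frame per unfinished object in round-robin order (framing, HTTP/2 style)."""
    remaining = dict(sizes)
    finish, clock = {}, 0
    while remaining:
        for name in list(remaining):  # copy, since finished objects are removed
            clock += 1
            remaining[name] -= 1
            if remaining[name] == 0:
                finish[name] = clock
                del remaining[name]
    return finish


objects = {"O1": 6, "O2": 1, "O3": 1, "O4": 1}  # sizes in frames
print(fcfs_finish_times(objects))         # small objects wait behind O1
print(interleaved_finish_times(objects))  # small objects finish early
```

In this sketch the small objects finish at times 2, 3, and 4 instead of 7, 8, and 9, while the large object still completes at time 9 in both cases.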
HTTP/2 to HTTP/3
- HTTP/2 over single TCP connection means:
- recovery from packet loss still stalls all object transmissions
- as in HTTP/1.1, browsers have incentive to open multiple parallel TCP connections to reduce stalling, increase overall throughput
- no security over vanilla TCP connection
- HTTP/3: adds security, per object error- and congestion-control (more pipelining) over UDP
- more on HTTP/3 in transport layer