Web & HTTP Flashcards
Web and HTTP
- Web pages consist of objects
- An object can be an HTML file, JPEG image, Java applet, audio file, …
- A web page consists of a base HTML file, which includes several referenced objects
- Each object is addressable by a URL, e.g.,
HTTP overview
HyperText Transfer Protocol
- Web’s application layer protocol
- client/server model
¤ client: browser that requests and receives (using the HTTP protocol) and “displays” Web objects
¤ server: Web server that sends (using the HTTP protocol) objects in response to requests
HTTP overview (continued)
(Underlying protocol, HTTP state, Protocol state)
- Underlying protocol: TCP
- Client initiates TCP connection (creates socket) to server, port 80
- Server accepts TCP connection from client
- HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
- TCP connection closed
- HTTP is “stateless”
- Server maintains no information about past client requests
- Protocols that maintain “state” are complex!
- Past history (state) must be maintained
- If server/client crashes, their views of “state” may be inconsistent and must be reconciled
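The TCP steps above (connect, exchange HTTP messages, close) can be sketched with a raw socket. This is only an illustrative sketch: the toy server, the handler name `HelloHandler`, the helper `fetch_once`, and the path `/index.html` are assumptions, and the server listens on a free local port rather than port 80 so the example runs without privileges.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy server (hypothetical): answers every GET with the same small HTML object."""

    def do_GET(self):
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_once(host, port, path):
    """Open a TCP connection, send one HTTP request, read the reply, close."""
    with socket.create_connection((host, port)) as sock:  # client initiates TCP connection
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))              # HTTP request message
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                                   # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Two independent requests: the server keeps nothing between them (stateless).
first = fetch_once("127.0.0.1", port, "/index.html")
second = fetch_once("127.0.0.1", port, "/index.html")
server.shutdown()
server.server_close()

status_line = first.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```

Each call to `fetch_once` uses its own TCP connection, and the second request is answered exactly like the first because the server remembers nothing about the earlier one.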
HTTP connections
a. non-persistent HTTP
- At most one object sent over TCP connection
- Connection then closed
- Downloading multiple objects requires multiple connections
b. persistent HTTP
- Multiple objects can be sent over single TCP connection between client, server.
A web page can consist of many different files
Median number of objects embedded in a webpage
Non-Persistent HTTP
The “classic” approach in HTTP/1.0 is to use one HTTP request per TCP connection, serially.
Non-persistent HTTP: response time
- RTT: time for a packet to travel from client to server and back
- HTTP response time:
- one RTT to initiate TCP connection
- one RTT for HTTP request and first few bytes of HTTP response to return
- This assumes the HTTP GET is piggybacked on the
- File transmission time
- Non-persistent HTTP response time = 2 RTT + file transmission time
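A quick worked example of the response-time formula. The numbers are hypothetical, and the file transmission time is assumed here to be file size divided by link rate, which the notes do not spell out.

```python
def non_persistent_response_time(rtt_s, file_bits, rate_bps):
    """2 RTT (TCP setup, then request/response) plus file transmission time."""
    transmission_time = file_bits / rate_bps  # assumption: size / link rate
    return 2 * rtt_s + transmission_time


# Hypothetical numbers: 100 ms RTT, 100 kbit object, 1 Mbit/s link.
t = non_persistent_response_time(rtt_s=0.100, file_bits=100_000, rate_bps=1_000_000)
print(f"{t:.3f} s")  # 0.300 s
```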
Persistent HTTP
non-persistent HTTP issues:
- requires 2 RTTs per object
- OS overhead for each TCP connection
- browsers often open parallel TCP connections to fetch referenced objects
persistent HTTP:
- server leaves connection open after sending response
- subsequent HTTP messages between same client/server sent over open connection
- client sends requests as soon as it encounters a referenced object
- as little as one RTT for all the referenced objects
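The RTT savings can be compared with a small timing model. This is a rough sketch under stated assumptions: a page is one base file plus `n_objects` referenced objects, every object has the same transmission time `tx_s`, fetches are serial for non-persistent HTTP (parallel connections are ignored), and the persistent client sends all its requests back-to-back once the base file has arrived.

```python
def non_persistent_page_time(rtt_s, tx_s, n_objects):
    """Serial non-persistent HTTP: every object pays 2 RTT plus its transmission time."""
    return (1 + n_objects) * (2 * rtt_s + tx_s)


def persistent_page_time(rtt_s, tx_s, n_objects):
    """Persistent HTTP with requests sent as soon as objects are referenced."""
    base_file = 2 * rtt_s + tx_s            # TCP setup + request/response for base HTML
    referenced = rtt_s + n_objects * tx_s   # one RTT covers all referenced objects
    return base_file + referenced


# Hypothetical page: 10 referenced objects, 100 ms RTT, 10 ms transmission time each.
slow = non_persistent_page_time(0.100, 0.010, 10)
fast = persistent_page_time(0.100, 0.010, 10)
print(f"non-persistent: {slow:.2f} s, persistent: {fast:.2f} s")
```

With these numbers the non-persistent fetch takes about 2.31 s and the persistent one about 0.41 s; the gap is almost entirely saved RTTs.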
Persistent HTTP 2
- The “persistent HTTP” approach can re-use the same TCP connection for multiple HTTP transfers, one after another, serially.
- Amortizes TCP overhead, but maintains TCP state longer at server
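- The “pipelining” feature in HTTP/1.1 lets the client issue several requests on a persistent connection without waiting for each response. The server must send responses in the same order as the requests. Requests can be packaged cleverly (e.g., several requests in one TCP segment)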
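Connection re-use can be observed with Python's standard `http.client`. This is a sketch, not a benchmark: the toy server, the handler name `KeepAliveHandler`, and the three paths are assumptions. Note that `http.client` sends requests serially on the open connection; it does not pipeline.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

client_ports = []  # client source port seen by the server for each request


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Toy server (hypothetical) that echoes the requested path."""

    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One TCP connection, three HTTP transfers, one after another.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
bodies = []
for path in ("/index.html", "/logo.png", "/style.css"):
    conn.request("GET", path)
    response = conn.getresponse()
    bodies.append(response.read().decode("ascii"))  # drain before the next request
conn.close()
server.shutdown()
server.server_close()

print(bodies)
print("distinct client ports:", len(set(client_ports)))
```

The server sees the same client port for all three requests, which shows that a single TCP connection carried them all.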
HTTP request message
- Two types of HTTP messages: request, response
- HTTP request message:
- ASCII (human-readable format)
HTTP request message: general format
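As a sketch of the general format: a request line, then header lines, then a blank line, then an optional entity body, with each line ending in CR LF. The helper name `build_request`, the host `www.example.com`, and the header values are illustrative assumptions.

```python
def build_request(method, url_path, headers, body=b""):
    """Assemble an HTTP request: request line, header lines, blank line, body."""
    request_line = f"{method} {url_path} HTTP/1.1\r\n"
    header_lines = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    return (request_line + header_lines + "\r\n").encode("ascii") + body


msg = build_request(
    "GET",
    "/index.html",
    {"Host": "www.example.com", "Connection": "close"},  # hypothetical values
)
print(msg.decode("ascii"))
```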
Uploading form input
POST method:
- web page often includes form input
- input is uploaded to server in entity body
URL method:
- uses GET method
- input is uploaded in URL field of request line:
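A minimal sketch of both upload styles, built as plain strings. The form fields, the path `/search`, and the host `www.example.com` are hypothetical; `urlencode` is from Python's standard library.

```python
from urllib.parse import urlencode

form = {"animal": "monkeys", "food": "banana"}  # hypothetical form input
encoded = urlencode(form)

# URL method: the input travels in the URL field of the request line.
get_request = f"GET /search?{encoded} HTTP/1.1\r\nHost: www.example.com\r\n\r\n"

# POST method: the same input travels in the entity body.
post_request = (
    "POST /search HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded)}\r\n"  # ASCII only here, so chars == bytes
    "\r\n"
    f"{encoded}"
)

print(get_request)
print(post_request)
```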
HTTP response message
HTTP response status codes
- The status code appears in the first line of the server-to-client response message.
- Some sample codes:
200 OK
¤ request succeeded, requested object later in this msg
301 Moved Permanently
¤ requested object moved, new location specified later in this msg (Location:)
400 Bad Request
¤ request msg not understood by server
404 Not Found
¤ requested document not found on this server
505 HTTP Version Not Supported
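A small sketch of pulling the status code out of the first line of a response. The helper name `parse_status_line` is an assumption; `HTTPStatus` is Python's standard table of codes and phrases.

```python
from http import HTTPStatus


def parse_status_line(line):
    """Split 'HTTP/1.1 301 Moved Permanently' into version, code, and phrase."""
    version, code, phrase = line.strip().split(" ", 2)
    return version, int(code), phrase


version, code, phrase = parse_status_line("HTTP/1.1 301 Moved Permanently\r\n")
print(version, code, phrase)

# The sample codes from the list above, looked up in the standard table.
for sample in (200, 301, 400, 404, 505):
    print(sample, HTTPStatus(sample).phrase)
```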
User-server state: cookies
- Cookie: a small piece of data (up to about 4 KB)
¤ Sent to the Web browser by the Web server
¤ Saved locally inside the browser
¤ Sent back by the browser in all subsequent requests to that site
- Example:
¤ Susan always accesses the Internet from her PC
¤ She visits a specific e-commerce site for the first time
¤ When the initial HTTP request arrives at the site, the site creates:
- a unique ID
- an entry in the backend database for that ID
Four components of cookies
1) Cookie header line (Set-Cookie:) in the HTTP response message
2) Cookie header line (Cookie:) in the next HTTP request message
3) Cookie file kept on user’s host, managed by user’s browser
4) Back-end database at Web site
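The four components can be sketched as a toy simulation with no real networking. Everything named here is an assumption for illustration: the dictionaries `backend_db` and `browser_jar`, the cookie name `id`, the site `shop.example`, and the visit counter. Real browsers also strip cookie attributes before sending the value back, which this sketch skips.

```python
import uuid
from http.cookies import SimpleCookie

backend_db = {}   # component 4: back-end database at the Web site
browser_jar = {}  # component 3: cookie file kept by the browser, keyed by site


def server_handle(request_headers):
    """Toy server: issue an ID on the first visit, recognize it afterwards."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "id" in cookie and cookie["id"].value in backend_db:
        user_id = cookie["id"].value
        response_headers = {}
    else:
        user_id = uuid.uuid4().hex                          # unique ID
        backend_db[user_id] = {"visits": 0}                 # entry in backend database
        response_headers = {"Set-Cookie": f"id={user_id}"}  # component 1
    backend_db[user_id]["visits"] += 1
    return response_headers, backend_db[user_id]["visits"]


def browser_request(site):
    """Toy browser: attach the stored cookie, remember any new one."""
    headers = {}
    if site in browser_jar:
        headers["Cookie"] = browser_jar[site]               # component 2
    response_headers, visits = server_handle(headers)
    if "Set-Cookie" in response_headers:
        browser_jar[site] = response_headers["Set-Cookie"]
    return visits  # returned only so the demo can show what the server knows


first_visit = browser_request("shop.example")
second_visit = browser_request("shop.example")
print(first_visit, second_visit)
```

The HTTP exchange itself stays stateless; the state lives in the cookie the browser sends back and in the database entry it points to.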
Cookies: keeping “state” (cont.)
Cookies (continued)
- What cookies can be used for:
- Authorization
- Shopping carts
- Recommendations
- User session state (Web e-mail)
- How to keep “state”
- Protocol endpoints: maintain state at sender/receiver over multiple transactions
- Cookies: HTTP messages carry state
- Cookies and privacy:
- Cookies permit sites to learn a lot about you
- You may supply name and e-mail to sites
Cookies + Third Parties
How it works
50% of the websites are using more than 100 3rd party cookies
What can we do about it?
- Different ad block products (block cookies/connections to third party sites)
- Ghostery, Ad Block etc.
- Doesn’t completely solve the problem…
- Trackers are getting smarter: they use browser features to fingerprint users
- E.g., combination of installed extensions/fonts etc.
- Surprisingly unique!
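A rough sketch of why such combinations work as an identifier: hashing a set of browser attributes gives a stable value even when no cookie is stored. The attribute names and values are invented for illustration, and real trackers are assumed to use many more signals than this.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a canonical form of the browser attributes into a short identifier."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical browsers that differ only in one installed font.
browser_a = {"fonts": ["Arial", "Fira Code"], "extensions": ["adblock"], "screen": "1920x1080"}
browser_b = {"fonts": ["Arial", "Comic Sans"], "extensions": ["adblock"], "screen": "1920x1080"}

print(fingerprint(browser_a))
print(fingerprint(browser_b))
print(fingerprint(dict(browser_a)) == fingerprint(browser_a))  # stable across visits
```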
Evolution of Serving Web Content
- In the beginning…
¤ …there was a single server
¤ Probably located in a closet
¤ And it probably served blinking text
- Issues with this model
¤ Site reliability: unplugged cable, hardware failure, natural disaster
¤ Scalability: flash crowds (aka Slashdotting)