Web Architecture Flashcards
How can resources be represented?
As media (MIME) types: text (plain, html, csv), image, audio, video, application
How can resources be identified?
With URIs (Uniform Resource Identifiers)
How can resources be interacted with?
Using network protocols like HTTP
What does the bow-tie model represent?
The shape of the web
Sections such as the LSCC (largest strongly connected component) core, IN, OUT, and disconnected components
What is the web?
A distributed information system that provides access to resources
What is hypertext?
A way to link information in a non-linear interactive way - cannot be represented conveniently on paper
What are 2 disadvantages of hypertext?
Disorienting - easy to lose sense of direction
Cognitive overhead - additional effort to maintain several trails at once
What is hypermedia?
Hypertext extended to include non-textual media (e.g. images, audio, video)
What are nodes, links, anchors, and endpoints?
Node - a point in the network e.g. a webpage
Link - a connection between nodes
Anchor - the clickable element that links pages
Endpoint - the destination of a link
<a href="http://www.endpoint.com/">anchor</a>
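A minimal sketch using Python's standard `html.parser` to pull the anchor text and endpoint out of the example link above. The class name `AnchorExtractor` is hypothetical and only `<a href>` elements are handled.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collects (anchor text, endpoint) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (anchor_text, endpoint) tuples
        self._href = None  # endpoint of the <a> element currently open
        self._text = []    # text fragments seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text), self._href))
            self._href = None


parser = AnchorExtractor()
parser.feed('<a href="http://www.endpoint.com/">anchor</a>')
print(parser.links)  # [('anchor', 'http://www.endpoint.com/')]
```

The node is the page containing this markup, the anchor is the text "anchor", and the endpoint is the `href` target.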
What are embedded links?
Links that are encapsulated in a node, and form part of the document content
What are first class links?
Links stored separately from nodes (e.g. in linkbases), allowing multiple link overlays over the same node and creating different connections without changing the node
Linkbases can be tailored to the reader
Has 2 endpoints
What are bidirectional links?
Links that can be traversed backwards as well as forwards
What are N-ary links?
Links involving more than 2 nodes, allowing relationships between multiple entities
What are generic links?
Links where, by using locspecs (location specifiers), all occurrences of a word can be linked to the same endpoint
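A rough sketch of a generic link, assuming the locspec is simply "every whole-word occurrence of a given word". The function name and the example.com endpoint are hypothetical.

```python
import re


def apply_generic_link(text, word, endpoint):
    """Link every whole-word occurrence of `word` in `text` to `endpoint`."""
    # \b restricts matches to whole words; re.escape guards special characters.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return pattern.sub(lambda m: f'<a href="{endpoint}">{m.group(0)}</a>', text)


html = apply_generic_link("REST uses HTTP. HTTP is stateless.", "HTTP",
                          "http://www.example.com/http")
print(html)
```

Both occurrences of "HTTP" now point at the same endpoint without anyone authoring each link by hand.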
What are functional links?
Links that represent predefined relationships
What are typed links?
Links that define the nature or relationship of the link, such as “friend,” “parent,” or “employee”
What is REST?
Representational State Transfer
A web architectural style that uses stateless communication to manipulate resources through their representations
What does REST aim to do (2)?
Minimise latency and network communication
Maximise independence and scalability of components
What are the 4 components of REST?
Origin servers - the ultimate place you get a resource from
Gateways - for integrating legacy servers
Proxies - to filter & cache
User agents - e.g. web browsers
User agent & origin server are end points that communicate using HTTP
If using gateway, origin server & gateway don’t communicate in HTTP
What are the 5 constraints of REST?
Client-server - separation of concerns (client: user interface, server: data storage)
Stateless - no context stored on server, session state kept on client
Caching - response data labelled as cacheable or non-cacheable
Uniform interface between components - identify what next possible actions could be
Layered - system components have no knowledge of components they don’t directly interact with
What are the 3 advantages of client-server constraint?
- Improves portability
- Improves scalability (as server simplified)
- Allows components to evolve separately
What are the 3 advantages and 1 disadvantage of stateless constraint?
Advantages:
* Improves reliability
* Improves scalability
* Improves visibility (of requests)
Disadvantage:
* Increases per-action overhead
What are the 2 advantages and 1 disadvantage of caching constraint?
Advantages:
* Eliminates some actions
* Reduces latency
Disadvantage:
* Reduces reliability
What are the 2 advantages and 1 disadvantage of uniform interface constraint?
Advantages:
* Improves visibility (of interactions)
* Implementations decoupled from services they provide
Disadvantage:
* Reduces efficiency
What are the 2 advantages and 1 disadvantage of layered constraint?
Advantages:
* Limits system complexity
* Improves scalability
Disadvantage:
* Adds latency & overhead
What are the 3 principles of address identifiers?
Global - addresses should be unambiguous & human readable
Distinct identifiers - using same URI for different resources creates a URI collision
Avoid aliases - don’t use different URIs for same resource
How should documents be named?
Use logical names rather than physical addresses to avoid issues when documents are moved
URL vs URN
URL specifies location of resource on internet
URN uniquely identifies resource by name - arguably unnecessary, as HTTP URIs can already serve as names
What are IRIs (Internationalised Resource Identifiers)?
An extension to URIs, allowing Unicode characters
Why shouldn’t you change URIs?
Links pointing to the old URI break -> 404 Not Found
What are the 5 principles of representation?
- W3C representation principles - follow a format to future proof
- Separate content, presentation & interaction
- Identify links to other resources
- Links should be navigable
- Links should be web-wide
Data vs metadata
Data is the actual information/content
Metadata is data that describes other data e.g. file size, creation date
What are the 5 principles of interaction?
- Provide representations
- Safe retrieval
- Reference doesn’t imply dereference - just because you can retrieve a representation, doesn’t mean you must
- Reuse representation formats
- Representations should be consistent
What is a safe method?
One that doesn’t change the state of the resource
What is an idempotent method?
One where applying it multiple times has the same effect as applying it once
Of the common HTTP methods, only POST isn’t idempotent
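A minimal sketch contrasting an idempotent PUT with a non-idempotent POST, assuming a hypothetical in-memory store rather than a real HTTP server.

```python
# Hypothetical in-memory resource store, used only to contrast the two methods.
store = {}
next_id = 1


def put(uri, body):
    """PUT replaces the resource at `uri`; repeating it leaves the same state."""
    store[uri] = body


def post(collection, body):
    """POST creates a new subordinate resource each time it is called."""
    global next_id
    uri = f"{collection}/{next_id}"
    next_id += 1
    store[uri] = body
    return uri


for _ in range(3):
    put("/notes/a", "hello")
print(len(store))  # 1: three PUTs have the same effect as one

for _ in range(3):
    post("/notes", "hello")
print(len(store))  # 4: each POST added a new resource
```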
What do the HTTP responses (1xx - 5xx) represent?
1xx - informational message
2xx - success
3xx - redirection
4xx - client error
5xx - server error
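The class of a status code is just its first digit, so a lookup like the following sketch is enough (the function name is hypothetical).

```python
STATUS_CLASSES = {
    1: "informational message",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map an HTTP status code (100-599) to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


print(status_class(404))  # client error
```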
What are the 2 styles of content negotiation?
Server-driven - server makes final choice of representation
Client-driven - client makes final choice of representation
What are the 3 stages of server-driven content negotiation?
1) Client tells server what it is able to accept in request header
2) Server chooses appropriate representation to return to client based on “quality” (provided by client)
3) Server tells client its choice in response header
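A simplified sketch of step 2: the server parses the client's Accept header (with its q "quality" values) and picks the best available media type. The function names are hypothetical, and media-type parameters other than q are ignored. As in HTTP, the most specific matching range decides a type's quality.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((fields[0], q))
    return prefs


def matches(media_range, media_type):
    """True if media_type (e.g. text/html) falls within media_range (e.g. text/*)."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def specificity(media_range):
    """*/* is least specific, type/* next, an exact type/subtype most specific."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def negotiate(accept_header, available):
    """Return the available type with the highest client quality, or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(rng), q) for rng, q in prefs
                    if matches(rng, media_type)]
        # The most specific matching range determines this type's quality.
        q = max(matching)[1] if matching else 0.0
        if q > best_q:
            best, best_q = media_type, q
    return best


chosen = negotiate("text/html;q=0.8, application/json, */*;q=0.1",
                   ["text/html", "application/json"])
print(chosen)  # application/json (q=1.0 beats q=0.8)
```

In step 3 the server would then report its choice, e.g. in the Content-Type response header.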
Name three properties that can be negotiated
Media type
Language
Encoding
What are the 3 stages of client-driven content negotiation?
1) Client requests resource representation
2) Server returns HTTP redirection status (“300 Multiple Choices”) with list of URIs
3) Client requests a representation of one of the URIs
What is “Client Hints”?
An HTTP extension that allows browsers to state their capabilities & preferences
How can you avoid the lost update problem?
When carrying out unsafe methods, check whether the state of the resource has changed since it was retrieved with GET
What 2 ways are there for validating if resources are the same?
Strong validation - checks if representations are byte-for-byte identical
Weak validation - checks if representations contain “the same content”
What are ETags? What headers can be applied to them?
Entity tags are identifiers for resource versions
Headers:
* If-Match: <etag>, <etag>, ...
* If-None-Match: <etag>, <etag>, ...
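A minimal sketch of avoiding the lost update problem with an ETag and If-Match, assuming a hypothetical in-memory `Resource` class. For simplicity it compares against a single ETag, whereas the real If-Match header may list several. 412 Precondition Failed is the HTTP status for a failed precondition.

```python
import hashlib


class Resource:
    """Hypothetical server-side resource that supports conditional updates."""

    def __init__(self, body):
        self.body = body

    @property
    def etag(self):
        # Strong ETag derived from the exact bytes of the representation.
        return '"' + hashlib.sha256(self.body.encode()).hexdigest()[:16] + '"'

    def get(self):
        return self.body, self.etag

    def put(self, new_body, if_match):
        """Apply the update only if the client's ETag is still current."""
        if if_match != self.etag:
            return 412  # Precondition Failed: someone else changed it first
        self.body = new_body
        return 200


doc = Resource("version 1")
_, alice_tag = doc.get()
_, bob_tag = doc.get()

alice_status = doc.put("Alice's edit", if_match=alice_tag)
bob_status = doc.put("Bob's edit", if_match=bob_tag)
print(alice_status, bob_status)  # 200 412: Bob must GET again before editing
```

Without the If-Match check, Bob's PUT would silently overwrite Alice's edit.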
What are cookies?
A way for web servers to persist state across HTTP requests (even though HTTP is supposed to be stateless)
What are “Secure” and “HttpOnly” cookies?
Secure - indicates that cookies should only ever be sent over HTTPS
HttpOnly - cookies should not be accessible to JavaScript via the Document.cookie interface
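A short sketch using Python's standard `http.cookies` module to build a Set-Cookie header carrying both flags. The cookie name and value are hypothetical.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"          # hypothetical session identifier
cookie["session"]["secure"] = True    # only ever send over HTTPS
cookie["session"]["httponly"] = True  # hide from Document.cookie in JavaScript
cookie["session"]["path"] = "/"

header = cookie.output()
print(header)  # Set-Cookie: session=abc123; HttpOnly; Path=/; Secure
```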
Discuss the physical limits on data transmission (3)
Sending a message at c (3×10^8 m/s) still takes ~0.067s to go halfway round the world
Signals travel at ~70% of c in optical fibres and >80% of c in coaxial cables
Routers, switches, etc. introduce further delays
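The 0.067s figure follows from distance divided by speed. This sketch assumes half the Earth's circumference is about 20,000 km (the full circumference is roughly 40,000 km) and ignores switching and queuing delays.

```python
C = 3e8                # speed of light in a vacuum, m/s
HALF_EARTH = 20_000e3  # approx. half the Earth's circumference, m


def one_way_delay(distance_m, fraction_of_c):
    """Propagation delay in seconds, ignoring routers, switches and queuing."""
    return distance_m / (C * fraction_of_c)


vacuum = one_way_delay(HALF_EARTH, 1.0)
fibre = one_way_delay(HALF_EARTH, 0.7)
print(f"vacuum: {vacuum:.3f}s, fibre: {fibre:.3f}s")  # vacuum: 0.067s, fibre: 0.095s
```

A request/response round trip over fibre therefore costs around 0.19s before any processing happens.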
How does TCP delay HTTP?
HTTP runs on top of TCP
TCP establishes connections with a three-way handshake (>=0.2s)
What 2 methods reduce TCP delay for HTTP?
Keep-Alive - TCP connections reused for multiple HTTP requests
Pipelining - multiple requests made without waiting for responses
What 4 improvements were made to work within data transfer capacity limits?
Compressed headers to reduce amount of data sent
Prioritised requests - sends important content first
Multiplex requests - when client requests an HTML document with multiple images, stylesheets & scripts, use a single connection for all resources
Server push - when a client requests an HTML doc with an image, the server pushes the image pre-emptively instead of waiting for the client to request it
What are tunnels in the context of proxies?
CONNECT method establishes tunnel between client & server
With tunnel established, proxy server no longer inspects/modifies data; just forwards
How is data secured in HTTPS?
Using the TLS (Transport Layer Security) protocol
What are the 4 cryptography principles?
Confidentiality - no unauthorised reading
Integrity - no unauthorised modification
Authentication - proof of identity
Non-repudiation - data author can’t deny authorship
How are digital signatures created (3 steps)?
Combines asymmetric encryption & cryptographic hash
1) Generate cryptographic hash of message
2) Encrypt hash with private key
3) Attach encrypted hash to message
How are digital signatures verified (3 steps)?
1) Generate cryptographic hash of received message
2) Decrypt attached hash with sender’s public key
3) Compare hashes
If hashes match, message has not been altered and signature is valid
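A toy sketch of hash-then-sign, assuming textbook RSA with tiny primes (p=61, q=53) so the arithmetic is visible. It is insecure and for illustration only; the helper names are hypothetical, and the hash is reduced modulo N so it fits the toy key.

```python
import hashlib

# Toy RSA key pair from textbook values p=61, q=53. Far too small to be secure.
N = 61 * 53  # modulus, 3233
E = 17       # public exponent
D = 2753     # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1


def digest(message):
    # Hash the message, then reduce mod N so the toy key can "encrypt" it.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message):
    # Signing steps 1-2: hash the message, encrypt the hash with the private key.
    return pow(digest(message), D, N)


def verify(message, signature):
    # Verification steps 1-3: re-hash, decrypt with the public key, compare.
    return digest(message) == pow(signature, E, N)


msg = b"Pay Bob 10 pounds"
sig = sign(msg)  # signing step 3: attach sig to msg
print(verify(msg, sig))  # True
print(verify(b"Pay Bob 1000 pounds", sig))  # normally False: the hash changes
```

With a modulus this small, about 1 in 3233 altered messages would still collide; real keys are thousands of bits long.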
What is a Certificate Authority (CA)?
A trusted organisation that issues digital certificates
How does the Diffie-Hellman Key Exchange work?
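1) A prime number $p$ and a primitive root modulo $p$, $g$, are shared publicly
2) A and B pick random large integers $a$ and $b$, which they keep secret
3) A and B calculate $PU_a = g^a \bmod p$ and $PU_b = g^b \bmod p$ respectively and send the results publicly
4) A calculates $PU_b^a \bmod p$ and B calculates $PU_a^b \bmod p$; both equal $g^{ab} \bmod p$, the shared secret key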
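The exchange can be checked with small numbers. This sketch uses toy values (p=23, g=5, a=4, b=3); real deployments use primes thousands of bits long.

```python
# Publicly shared parameters (toy values only).
p = 23  # prime modulus
g = 5   # primitive root modulo 23

# Private values, never transmitted.
a = 4  # chosen by A
b = 3  # chosen by B

# Each side publishes g^x mod p.
PU_a = pow(g, a, p)  # 4
PU_b = pow(g, b, p)  # 10

# Each side combines the other's public value with its own secret.
secret_A = pow(PU_b, a, p)
secret_B = pow(PU_a, b, p)
print(secret_A, secret_B)  # 18 18, both equal g^(a*b) mod p
```

An eavesdropper sees p, g, 4 and 10, but recovering 18 requires solving a discrete logarithm, which is infeasible for large p.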
Authentication vs authorisation
Authentication - verifying identity of user/system
Authorisation - granting access/permissions to resources
What are the 6 steps in granting authorisation in OAuth (the diagram)?
1) Client requests authorisation from the authorisation server via the resource owner
2) Resource owner authenticates the request
3) Authorisation server sends an authorisation code to client via the resource owner
4) Client sends the authorisation code to the authorisation server
5) Authorisation server sends access token to client
6) Client accesses resource on resource server
What is cross-site request forgery?
When a user (unintentionally) allows one origin to talk to a different origin
User clicks on a link/form while authenticated on a site allowing attacker to perform actions on site with user’s authentication
How is cross-site request forgery prevented?
Same origin policy
Restricts web pages from making requests to domains other than the one that served them
How is “same origin” determined?
- URIs use same protocol
- URIs have same host
- URIs have same port
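A sketch of the same-origin comparison using Python's `urllib.parse`, filling in default ports when a URI omits them (80 for HTTP, 443 for HTTPS). The function names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple, filling in the default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port


def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)


print(same_origin("http://example.com/a", "http://example.com:80/b"))  # True
print(same_origin("http://example.com/", "https://example.com/"))     # False (protocol)
print(same_origin("http://example.com/", "http://api.example.com/"))  # False (host)
```

The path is irrelevant: only protocol, host and port are compared.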
What is the default port for HTTP?
80
What exceptions are there to same origin policy blocking?
Embedded resources (media, stylesheets, scripts, etc)
What is CORS (cross-origin resource sharing)?
Security feature that relaxes SOP, allowing certain origins to make requests to a domain different to the one that served the web page
Servers indicate which origins may make requests and restrict headers sent & received
What criteria must simple requests satisfy for CORS?
- Only methods: GET, HEAD, POST
- Only headers: Accept:, Accept-Language:, Content-Type:, Content-Language:
- Content-Type: text/plain, (application/…), (multipart/…)
What is a CORS preflight?
Used for more complex requests (other methods, custom headers)
A preliminary HTTP request to check if actual request is allowed by the server before sending actual request
What is SGML (Standard Generalised Markup Language)?
An older markup language standard (early HTML was defined as an SGML application)
A language for defining markup languages
What is XML (eXtensible Markup Language)?
A general purpose markup language
A W3C-defined subset of SGML
A language for defining domain-specific markup languages
What is DTD (Document Type Definition)?
A formal definition of the grammar for an XML document
Tells document processor how to parse the document
What 2 schema language competitors does DTD have?
XML Schema
RELAX NG
Document well-formedness vs validity
Well-formedness - obeys syntax rules in XML spec
Validity - well-formed and structure is based on a defined schema (e.g. DTD)
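A sketch of a well-formedness check using Python's standard `xml.etree.ElementTree`, which parses but does not validate against a DTD; checking validity would need a validating parser. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Check XML syntax only; validity against a DTD/schema is not tested."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<note><to>Bob</to></note>"))  # True
print(is_well_formed("<note><to>Bob</note></to>"))  # False: tags overlap
```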
What is SVG (Scalable Vector Graphics)?
XML-based language for describing 2D graphics
Uses CSS for styling & animation
Integrates with HTML5
What is MathML?
XML-based language for expressing mathematical expressions
Integrates with HTML5
2 sub-languages:
* Presentation MathML
* Content (semantic) MathML