Web Flashcards
Node
Chunk of information corresponding to a semantic unit e.g. a page
Link
Association between nodes that may be navigable
Endpoint
Component of a link that points to an anchor
Anchor
Representation of a link on a node
Embedded/Web Links
Embedded in source code.
One-way.
Connect only a pair of nodes (binary).
Explicitly defined nodes.
Usually no additional information regarding link relationship
First Class Links
Links separate from nodes in link-bases.
Multiple overlays (Personalisation).
Can be bidirectional.
Can be N-ary.
Can be generic (based on value).
Can be functional (destination is a function of the source)
Typed Links
Contain additional information about relationship.
Rel (Relationship),
Rev (Reverse Relationship).
Rarely used
HTTP URIs
http://〈host〉〈:port〉〈/path〉?〈query〉#〈fragment〉
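A minimal sketch of how these components split apart, using Python's standard urllib.parse. The URL itself (host, port, path, query and fragment) is made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL chosen only so that every component is present.
url = "http://example.org:8080/cards/web?topic=http&page=2#http3"
parts = urlsplit(url)

print(parts.scheme)            # scheme before "://"
print(parts.hostname)          # host
print(parts.port)              # port as an integer
print(parts.path)              # path
print(parse_qs(parts.query))   # query string parsed into a dict of lists
print(parts.fragment)          # fragment after "#"
```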
HTTP Requests
GET / HTTP/1.1
HOST: example.org
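A rough sketch of how the request above looks on the wire: lines are separated by CRLF and the header section ends with an empty line. The helper name build_get_request and the Connection header are assumptions added for illustration.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close after one response
        "",                   # empty line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("example.org")
print(request.decode("ascii"))
```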
HTTP 1.0
Separate TCP connection for every request so 3-way TCP Handshake (SYN, SYN-ACK, ACK) for each TCP open and a 4-way handshake for each TCP close (Latency issues).
HTTP 1.1
Keep-alive introduced. Same TCP connection used for multiple HTTP requests (One TCP open/close).
Pipelining introduced allowing multiple requests without waiting for a response (Responses sent in the same order as the requests).
Compressible Body
HTTP 2
Multiplexed Requests (Asynchronous requests, each with their own stream, with responses returned in any order)
Stream Prioritisation (Data in high priority streams sent before low priority).
Compressible Headers.
Server Push (Server pre-emptively sends resources to a client in response to a single request)
HTTP 3
Replaces TCP with QUIC over UDP, addressing the HTTP 2 issue where multiplexed streams are not visible to TCP's loss recovery mechanism, so a single lost packet stalls every stream.
UDP is unreliable but QUIC adds its own loss detection and retransmission.
Why was HTML5 developed?
Developed in response to
* The increased use of the web for web applications using client-side JavaScript (XHR, AJAX etc)
* Concerns about Adobe Flash in relation to openness, reliability, security, and performance
* Concerns about the inconsistency of handling invalid markup across browsers
* Overuse of semantic-light markup (div, span) which goes against the semantic intent of HTML
HTML5 Design Principles
Compatibility (Support old content).
Utility (Separate content and presentation).
Interoperability (Well-defined behaviour with graceful error handling).
Universal Access (Work on all platforms)
XSL
Extensible Stylesheet Language. Family of XML technologies to define how XML data should be presented.
XSLT
Extensible Stylesheet Language Transformations
Transform one XML language into another e.g. HTML
XPath
Navigate XML documents
E Selects all nodes with name E
/ Selects from the root node
// Selects nodes anywhere under current node
. Selects current node
.. Selects parent of current node
@ Selects attributes
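A small sketch of these selectors using Python's xml.etree.ElementTree, which supports only a subset of XPath. The deck/card/term document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
doc = ET.fromstring(
    "<deck>"
    "<card id='1'><term>Node</term></card>"
    "<card id='2'><term>Link</term></card>"
    "</deck>"
)

# E: child elements named "card" of the current node
cards = doc.findall("card")
# . and //: "term" elements anywhere under the current node
terms = [t.text for t in doc.findall(".//term")]
# @: select by attribute value
second = doc.find("card[@id='2']")
# ..: parent of each matched "term" element
parents = doc.findall(".//term/..")

print(len(cards), terms, second.get("id"), [p.tag for p in parents])
```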
CSS vs XSL
CSS is simple and cascading (consideration of needs) but cannot modify document structure.
XSL is able to transform document structure but is complex and cumbersome with no consideration for needs of users, authors, implementers (c.f. cascading)
Internet Engineering Task Force (IETF)
Practical standards organisation
IESG
Internet Engineering Steering Group.
Manage and oversee standards process, Approves RFCs
IESG Areas
Categories of interest overseen by an area director
IETF Working Groups
Collaborative groups to develop and standardise practices
IAB
Internet Architecture Board
Provide architectural advice and oversight ensuring all proposals from IESG areas function together
IRTF
Internet Research Task Force.
Research for topics that may need standards in the future
IETF Internet Drafts
Preliminary technical specifications valid for only 6 months
IETF Requests for Comments (RFC)
Documents produced by working groups that describe specifications, guidelines and best practices.
STD describe standards.
BCP describe Best Current Practice with policies and procedures.
W3C
Corporate membership standards organisation (companies, universities, etc.)
W3C Roles
Director.
Team (Support workings of W3C).
Advisory Committee (AC) (Contains a representative from each member organisation. Reviews director’s proposals).
Advisory Board (AB) (Guide W3C in non-technical matters i.e. policy).
Technical Architecture Group (TAG) (Oversight ensuring all standards work together, like IAB)
W3C Working Group
Chartered for a specific duration to deliver a standard. May invite non-member experts
W3C Interest Group
Chartered discussion forum
W3C Community Group
Discussion forum open to non-members
First Public Working Draft
Draft made public signifying beginning of work on a specification
Working Draft
Document actively being worked on by a Working Group
Last Call Working Draft
Document that the Working Group believes is ready to be published, open for review by others
Candidate Recommendation
Demonstrates the standard has multiple independent implementations
Proposed Recommendation
Awaiting director approval
Recommendation
Formally approved, has become a W3C recommendation
Member Notes
From a member of a working group detailing a technology they want to be considered as part of a Working Group deliberation
Working Group Notes
Document decisions made
Cross-Site Request Forgery (CSRF)
Web security vulnerability. Malicious origin tricks browser into making unauthorised requests to trusted sites and carrying out actions on the user's behalf without consent
Same-Origin Policy (SOP)
Restricts how resources from different origins interact. Blocks all cross-origin reads except embedded resources (scripts can't read responses from other origins)
Cross-Origin Resource Sharing (CORS)
Selectively relaxes the Same-Origin Policy
CORS Response Headers
Specify allowed origins, methods and headers
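A minimal sketch of a server choosing CORS response headers. The Access-Control-* header names are the standard ones; the allow-list, the permitted methods and headers, and the function name cors_headers are assumptions for illustration.

```python
ALLOWED_ORIGINS = {"https://app.example.org"}  # hypothetical allow-list


def cors_headers(request_origin: str) -> dict[str, str]:
    """Return CORS response headers for an allowed origin, or none at all."""
    if request_origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers, so the browser keeps blocking the read
    return {
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Methods": "GET, POST",
        "Access-Control-Allow-Headers": "Content-Type",
        "Vary": "Origin",  # the response differs per requesting origin
    }


print(cors_headers("https://app.example.org"))
print(cors_headers("https://evil.example.net"))
```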
Cross-Site Scripting (XSS)
User input allows malicious code (XSS Vector) injection.
Non-persistent has malicious code in parameters of a GET request which gets executed.
Persistent has malicious code in the state of a resource that is displayed to all users
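One common mitigation, sketched here as an illustration rather than a complete defence, is to escape user input before placing it in HTML. The render_comment helper and the payload are hypothetical.

```python
from html import escape


def render_comment(comment: str) -> str:
    """Wrap user-supplied text in a paragraph, escaping HTML metacharacters."""
    return f"<p>{escape(comment)}</p>"


payload = "<script>alert('xss')</script>"  # example XSS vector
safe = render_comment(payload)
print(safe)
```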
Content Security Policy
Describes what a resource is allowed to access with
Fetch Directives (controls the places from which resources can be loaded).
Document Directives (Controls properties of a document).
Navigation Directives (Controls where a user can navigate to).
Reporting Directives (Controls where violations are reported to).
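A sketch of assembling a Content-Security-Policy header value with one or more directives of each kind. The directive names are real CSP directives; the sources, the report path and the helper build_csp are assumptions for the example.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives into a single Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )


policy = build_csp({
    "default-src": ["'self'"],                             # fetch directive
    "img-src": ["'self'", "https://images.example.org"],   # fetch directive
    "base-uri": ["'self'"],                                # document directive
    "form-action": ["'self'"],                             # navigation directive
    "report-uri": ["/csp-reports"],                        # reporting directive
})
print("Content-Security-Policy:", policy)
```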
Marketing Funnel
Narrows down a wide audience to those who actually purchase something
Awareness
Interest
Decision
Action
Advertising Types
Display (Banners, images, etc.).
Contextual (Displayed in content relevant to the product)
Search.
Behavioural (Based on user profiling)
Advertisement Billing
CPM (Cost per Mille) (Pay per thousand impressions).
CPC (Cost per Click).
CPA (Cost per Action) (Pay by actions linked to ads e.g. sign up to a mailing list, or purchase something)
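The three billing models reduce to simple arithmetic. The rates and counts below are made-up figures for illustration.

```python
def cpm_cost(impressions: int, rate_per_thousand: float) -> float:
    """Cost when paying per thousand impressions."""
    return impressions / 1000 * rate_per_thousand


def cpc_cost(clicks: int, rate_per_click: float) -> float:
    """Cost when paying per click."""
    return clicks * rate_per_click


def cpa_cost(actions: int, rate_per_action: float) -> float:
    """Cost when paying per completed action (sign-up, purchase, ...)."""
    return actions * rate_per_action


# Hypothetical campaign: 50,000 impressions, 400 clicks, 20 sign-ups.
print(cpm_cost(50_000, 2.0), cpc_cost(400, 0.25), cpa_cost(20, 5.0))
```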
Google AdWords
Auction to show advertisements in search. Calculates Ad Rank for how likely a user is to click a result
Google AdSense
Contextual advertising based on keyword matching in the content. Auctions for ad positions
Fingerprinting
Stateless tracking technique using an identifier derived from device information (timezone, language, OS, etc.)
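A minimal sketch of deriving a stable identifier from device information, assuming the attributes have already been collected. Real fingerprinting uses many more signals; the attribute names and the fingerprint helper are illustrative.

```python
import hashlib


def fingerprint(attributes: dict[str, str]) -> str:
    """Hash device attributes, in sorted key order, into a stable identifier."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical device attributes.
device = {"timezone": "Europe/London", "language": "en-GB", "os": "Linux"}
print(fingerprint(device))
```

No cookie or other client-side state is needed: the same attributes always hash to the same identifier.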
Real-time Bidding (RTB)
Mimics stock exchange for buying and selling ads in real time.
Open Hypermedia
An information system consisting of nodes connected with associative links with an interface by which third party programs can access the functionality of the system
Link Service
Takes a document and returns hyperlinks that can be applied
Dexter Hypertext Reference Model
A model of open hypertext systems, concerned with the database of nodes and links
Dexter Storage Layer
Consists of three Components: Atoms (Documents), First-class Links, Composites (Sequences of components)
Dexter Specifiers
Consist of component reference, anchor reference and direction
Referential Integrity
Ability to follow any link
Hyper-G Core Links
Relate documents stored on the same server
Hyper-G Surface Links
Relate documents stored on different servers
P-flood Algorithm
Allows surface link updates to be propagated across servers (alternative to expensive broadcasting). Arrange servers in a ring and send link updates to successor with random chance of sending to another location in ring.
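A simplified simulation of the idea described above, not the published algorithm in full: every informed server forwards the update to its ring successor each round and, with some probability, to one other randomly chosen server. The function name, the round-based model and the probability parameter are assumptions.

```python
import random


def p_flood(num_servers: int, origin: int, extra_prob: float, rng: random.Random) -> int:
    """Return the number of rounds until every server has seen the update."""
    informed = {origin}
    rounds = 0
    while len(informed) < num_servers:
        newly_informed = set()
        for server in informed:
            # Always forward to the successor in the ring.
            newly_informed.add((server + 1) % num_servers)
            # Occasionally also forward to a random position in the ring.
            if rng.random() < extra_prob:
                newly_informed.add(rng.randrange(num_servers))
        informed |= newly_informed
        rounds += 1
    return rounds


print(p_flood(8, 0, 0.0, random.Random(0)))  # successor-only propagation
print(p_flood(8, 0, 0.5, random.Random(0)))  # with random extra sends
```

Successor-only forwarding already guarantees delivery; the random extra sends can only shorten the number of rounds.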
Microcosm
Hypermedia for read-only media
Microcosm Filters
Take a document and add / remove links. Filters organised into chains. Filters can be published for other Microcosm instances to use
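A sketch of a filter chain in the spirit of this card, not Microcosm's actual interface: each filter receives the document and the links gathered so far, and returns a new link list. The filter names and the glossary are hypothetical.

```python
from typing import Callable

Link = tuple[str, str]  # (anchor text, destination)
Filter = Callable[[str, list[Link]], list[Link]]


def glossary_filter(document: str, links: list[Link]) -> list[Link]:
    """Add a link for every glossary term that appears in the document."""
    glossary = {"anchor": "cards/anchor", "node": "cards/node"}  # hypothetical
    text = document.lower()
    return links + [(term, dest) for term, dest in glossary.items() if term in text]


def dedupe_filter(document: str, links: list[Link]) -> list[Link]:
    """Remove duplicate links while keeping their first-seen order."""
    return list(dict.fromkeys(links))


def run_chain(document: str, filters: list[Filter]) -> list[Link]:
    """Pass the link list through each filter in turn."""
    links: list[Link] = []
    for apply_filter in filters:
        links = apply_filter(document, links)
    return links


print(run_chain("A Node has an Anchor", [glossary_filter, dedupe_filter]))
```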
Web Hypermedia Link Injection
Batch Processing.
Origin Server (Apply with algorithm on demand).
Proxy (User configures browser to go via link injector proxy).
User Agent (Browser injection)
Spatial Hypermedia
Ambiguous and partial relationships expressed through space i.e. close objects have a close relationship
Sculptural Hypermedia
Assumes all nodes are linked to each other and removes links until desired structure is reached
Temporal Hypermedia
Links continuous media (sound, video) to other contexts with annotations and synchronisation
Conceptual Hypermedia
Structure and links derived from relationships between objects in the real world
Pervasive Hypermedia
Integration of hypermedia into everyday life as technology is integrated
Cacheable Assets
Images, stylesheets, fonts, media
Carefully Cacheable Assets
Data, HTML, Frequently modified JS and CSS
Never Cacheable Assets
Sensitive data, frequently changing user specific data
Proxy Cache
Located close to clients to decrease latency and bandwidth usage
Reverse Proxy Cache
Located close to origin server intended to decrease load on a web service
Forward Proxy
Located close to client
For content filtering and content translation (injection of adverts, compress images)
Open Proxy
Located anywhere between client and server
For anonymity
Reverse Proxy
Located near server
Load balancing (distribute web requests across servers).
Content Switching (each server stores different content, direct to appropriate server).
Protocol Translation.
Monitoring & Filtering (credential checks, rate limits)
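Load balancing can be sketched with a simple round-robin picker. Round-robin is only one possible strategy and is an assumption here, as are the class name and backend names.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend servers in a fixed rotation."""

    def __init__(self, backends: list[str]) -> None:
        self._rotation = cycle(backends)

    def pick(self) -> str:
        """Return the backend that should serve the next request."""
        return next(self._rotation)


# Hypothetical backend names.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.pick() for _ in range(4)])
```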
CDN Server Deployment Strategies
Deep in Network (Improved performance, harder management and maintenance).
Clusters near internet exchanges (Poorer performance, easier management and maintenance)
DASH
Dynamic Adaptive Streaming over HTTP: divides video files into chunks stored at multiple bit rates and a manifest file provides URLs to each chunk.
Client periodically measures server-to-client bandwidth and consults the manifest to find where to request chunks from with the maximum sustainable coding rate given the current bandwidth
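The client-side choice can be sketched as picking the highest bit rate that fits the measured bandwidth. The manifest contents and the fallback to the lowest rate are assumptions for illustration.

```python
def choose_bitrate(available_kbps: list[int], measured_kbps: float) -> int:
    """Pick the highest bit rate that the measured bandwidth can sustain."""
    sustainable = [rate for rate in available_kbps if rate <= measured_kbps]
    # Assumption: fall back to the lowest rate when nothing is sustainable.
    return max(sustainable) if sustainable else min(available_kbps)


# Hypothetical manifest mapping bit rate (kbit/s) to a chunk URL prefix.
manifest = {
    400: "https://cdn.example.org/video/400k/",
    1500: "https://cdn.example.org/video/1500k/",
    4000: "https://cdn.example.org/video/4000k/",
}

rate = choose_bitrate(list(manifest), measured_kbps=2000)
print(rate, manifest[rate])
```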
Specific Query
User knows what and where to look e.g. Database
Broad Query
User knows where to look but not what to look for e.g. Searching for the manager of HR
Vague Query
User doesn’t know where or what to look for e.g. Search engines
Web Crawler Selection Policy
States which pages to index using a search strategy
Web Crawler Re-visit Policy
Describes how often and what pages to re-visit for the most up to date pages
Web Crawler Politeness Policy
Describes strategies that consider server load i.e. parallel calls policy, frequency of requests.
Web Crawler Parallelisation Policy
Describes how to handle parallel crawlers reaching the same page via two routes.
Dynamically assign pages to crawler processes or map to crawler processes using a hash function
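A sketch of the hash-based mapping: the same URL always lands on the same crawler process, so a page reached via two routes is handled once. The use of SHA-256 over the full URL and the function name are assumptions.

```python
import hashlib


def assign_crawler(url: str, num_crawlers: int) -> int:
    """Map a URL to a crawler process index with a stable hash."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


# Hypothetical URL.
url = "http://example.org/cards/web"
print(assign_crawler(url, 4))
```

Python's built-in hash() is avoided because string hashing is randomised per process, which would break agreement between crawler processes.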
Challenges of Search Engine Indexing
Distributed data.
Changing Data.
Large Volume of Data.
Data Issues (Redundancy, unstructured data, quality of data, heterogeneous formats)
Challenges of Searching
User has poor understanding of proper word sequencing.
Unexpected answers.
Boolean logic is not well understood.
Results beyond first page ignored
Legitimate SEO
Good design,
valid metadata,
alt tags on images
Illegitimate SEO
Deception,
meta tag abuse,
heavy repetition,
invisible text,
domain spam (duplicate sites)
RDF
Resource Description Framework.
Subject, Predicate, Object
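A minimal sketch of triples as plain tuples with a pattern-matching query, without an RDF library. The example.org resources are hypothetical; the predicates are Dublin Core terms.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical resources under example.org.
graph = [
    Triple("http://example.org/cards/rdf", TITLE, "RDF"),
    Triple("http://example.org/cards/rdf", CREATOR, "http://example.org/people/alice"),
    Triple("http://example.org/cards/xsl", TITLE, "XSL"),
]


def match(
    triples: list[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list[Triple]:
    """Return triples matching the given fields; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


print([t.obj for t in match(graph, predicate=TITLE)])
```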
Gold Open Access
Gold standard. Articles are freely, permanently and immediately accessible while copyright is retained by authors. Authors may have to pay for publishing
Green Open Access
Viable option. Authors can disseminate an OA copy of their work (pre or post publication copy).
Net Neutrality
ISPs must treat internet communications equally and not discriminate based on user, content, website, platform, application, source/destination address or method of communication
Arguments for Net Neutrality
Accessibility (Web access is a human right).
Equal access for start ups and big companies.
Free Speech.
Unbiased.
Free Choice (Promotes competition).
Innovation
Arguments against Net Neutrality
Provides funds for infrastructure.
Regulation (stop illegal content).
Increase quality of information.
Anti-Competitive (Limits freedom of how ISPs operate. Could stifle competition)
Copyright
Protects skill and labour expended by someone creating something new
Primary Copyright Infringement
Anyone who does any of the things that the owner of copyright has exclusive rights to do (Civil matter)
Secondary Copyright Infringement
Someone facilitates another person or group to infringe on copyright. Can be a criminal offense.
Database Rights
Protects skill and labour to create collections of data e.g. Now that’s what I call music
Trademarks
Recognisable signs which make products identifiable.
Considers impression of two similar marks,
similarity of goods/services,
evidence of confusion,
likelihood of expansion of product lines
Cyber Squatting
Registering a domain name with no legitimate reason other than to benefit from others' good reputation. Redirects users to unrelated material or sells the domain at an inflated price
Creative Commons
Copy, Modify, Redistribute, Show. Must attribute original work
Creative Commons ShareAlike
Copy, Modify, Redistribute, Show. Must attribute original work and share under same license
Creative Commons NoDerivs
Copy, Redistribute, Show. Must attribute original work and cannot transform or build upon
Creative Commons NonCommercial
Copy, Modify, Redistribute, Show. Must attribute original work and cannot use for commercial purposes. Can be combined with ShareAlike and NoDerivs
Nelson Definition of Hypertext
Knowledge that cannot be conveniently represented on paper and instead is a way of organising knowledge through relationships
Otlet
Established the RBU where bibliographic information can be accessed by author or by subject.
Defined UDC for library organisation.
Popularised microfiche, the scaling down of documents.
Ostwald
Established ‘The Bridge’. Reduce literature down to small units of information that can be arranged and linked with other units
Bush’s Memex
A personal library of information with trails: paths through information formed by a sequence of linked pages. Branches may exist. Comparable to encyclopaedias made with trails running through them
Project Xanadu
Comparison of documents in parallel
Transclusion: Embed content in other documents (instead of copy and pasting)
Transcopyright: Allows integration and sharing of content while maintaining intellectual property rights
Versioning: See revisions and variants
Links that don’t break:
- Can be embedded or first class
- Can be n-ary
Halasz Seven Issues of Hypertext Systems:
- Search and Query: Link navigation may not be the best way to find things, consider searching
- Composites: Documents consisting of bits of other documents
- Virtual Structures: Documents may not exist until navigated to, they are constructed from the query
- Computation: APIs to allow executions when certain events occur
- Versioning
- Support for Collaborative Work
- Extensibility & Tailorability: Change the system to preferences e.g. instead of links, a graphical view.
TLS
Transport Layer Security.
Secure communication using asymmetric key encryption to agree on a shared symmetric key
Digital Signatures
Allow for authentication, non-repudiation and integrity of messages
Digital Certificates
Sign public keys to ensure they are authentic