Option C — Web Science Flashcards
WWW
The system used for accessing web pages and websites
Web 1.0
A library where you can look for information, but cannot change anything
Web 2.0
- more interactive
- more multimedia
- more social
- users can add and change the information on a web page themselves (e.g. wikis, blogs)
Web 3.0
All the data on the web is interconnected like a super database
Software agents
Programs that crawl through the Web, searching for relevant information
Ontology
- a file that defines the relationships among a group of terms.
- concept of linking context to content
Semantic Web
Proposes to help computers “read” and use the web by adding machine-readable meaning (metadata) to web content
Metadata
Data about data: information included in a web page’s code that is invisible to humans but readable by computers
Internet
The entire network of connected computers and routers used for sending data
HTTP
- hypertext transfer protocol
- the client sends an HTTP request message to the server
- the server returns a response message
- the response contains status information about the request (e.g. 200 OK, 404 Not Found)
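HTTP messages are plain text, which makes the request/response exchange easy to see. The sketch below is a minimal illustration only; the helper names build_get_request and parse_status_line are hypothetical, and it builds and parses message text without touching the network.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Build a minimal HTTP/1.1 GET request message as text."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str) -> tuple[str, int, str]:
    """Extract (version, status code, reason phrase) from a response message."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("example.com", "/index.html")
print(request.splitlines()[0])  # GET /index.html HTTP/1.1

sample_response = "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
print(parse_status_line(sample_response))  # ('HTTP/1.1', 404, 'Not Found')
```

The status code in the first line of the response is the "completion status" the card refers to.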
HTTPS
- hypertext transfer protocol secure
- a communications protocol for secure communication over a computer network
- the result of layering HTTP on top of the SSL/TLS protocol
- provides authentication of the website and encrypts the data exchanged
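The authentication part of HTTPS comes from TLS certificate checks. As a small illustration using Python's standard ssl module, the default client context insists on a valid certificate whose name matches the website; no connection is made here.

```python
import ssl

# Default client-side TLS settings, as an HTTPS client would use them.
context = ssl.create_default_context()

# The server must present a certificate signed by a trusted authority...
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
# ...and that certificate must be issued for the host name being visited.
print(context.check_hostname)  # True
```

A real client would then call context.wrap_socket(sock, server_hostname="example.com") on a TCP socket and send ordinary HTTP over the encrypted connection.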
HTML
- hypertext markup language
- uses tags to determine how the webpage will be displayed in the web browser
XML
- extensible markup language
- a markup language that defines a set of rules for encoding documents in a format that is both human- and machine-readable
- users can define their own set of tags
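Because XML lets you invent your own tags, a program can read the data by tag name. This sketch uses Python's standard xml.etree.ElementTree; the tags deck, flashcard, term and definition are made up for the example.

```python
import xml.etree.ElementTree as ET

# Custom tags invented for this example; XML does not predefine them.
document = """
<deck topic="Web Science">
  <flashcard>
    <term>HTTP</term>
    <definition>hypertext transfer protocol</definition>
  </flashcard>
  <flashcard>
    <term>URL</term>
    <definition>uniform resource locator</definition>
  </flashcard>
</deck>
""".strip()

root = ET.fromstring(document)
print(root.get("topic"))  # Web Science
for card in root.findall("flashcard"):
    # The same file is readable by a human and by this program.
    print(card.findtext("term"), "->", card.findtext("definition"))
```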
XSLT
- extensible style sheet language transformations
- a language that transforms XML documents into other formats
JavaScript
An object-oriented programming language used to create interactive effects within web browsers
CSS
- cascading style sheets
- a style sheet language used for describing the presentation of a document written in a markup language
- designed primarily to separate document content from document presentation
CSS Advantages
- improves content accessibility
- provides more flexibility and control in specifying presentation characteristics
- enables multiple HTML pages to share the same style sheet
URI
- uniform resource identifier
- a string of characters that identifies a resource (e.g. on the WWW)
- the most common form of URI is the URL
URL
- uniform resource locator
- a specific character string that references an Internet resource and specifies where it is located
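A URL is made of named parts, which Python's standard urllib.parse module can split apart. The URL below is an invented example.

```python
from urllib.parse import parse_qs, urlparse

url = "https://www.example.com:443/docs/page.html?lang=en&id=7#section2"
parts = urlparse(url)

print(parts.scheme)           # https (the protocol)
print(parts.hostname)         # www.example.com (domain name, resolved by DNS)
print(parts.port)             # 443
print(parts.path)             # /docs/page.html (location of the resource on the server)
print(parse_qs(parts.query))  # {'lang': ['en'], 'id': ['7']}
print(parts.fragment)         # section2 (a point within the page)
```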
How does a DNS work
It translates a user-friendly domain name (e.g. www.example.com) into an IP address, which computers use to identify each other on the network
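The lookup can be modelled as a chain of referrals: root server, then top-level-domain server, then the domain's own (authoritative) server, with the answer cached afterwards. This is a toy model under simplifying assumptions, not a real DNS client; the server names and the address 203.0.113.10 are invented. Real programs ask the operating system's resolver instead (e.g. socket.gethostbyname in Python).

```python
# Hypothetical name servers; every name and address here is invented.
ROOT = {"com": "tld-com"}
SERVERS = {
    "tld-com": {"example.com": "ns-example"},
    "ns-example": {"www.example.com": "203.0.113.10"},
}

cache: dict[str, str] = {}


def resolve(domain: str) -> str:
    """Translate a domain name into an IP address by following referrals."""
    if domain in cache:                      # answer from the local cache if possible
        return cache[domain]
    labels = domain.split(".")
    tld = labels[-1]                         # e.g. "com"
    zone = ".".join(labels[-2:])             # e.g. "example.com"
    tld_server = ROOT[tld]                   # 1. root server refers us to the TLD server
    auth_server = SERVERS[tld_server][zone]  # 2. TLD server refers us to the authoritative server
    ip = SERVERS[auth_server][domain]        # 3. authoritative server returns the IP address
    cache[domain] = ip
    return ip


print(resolve("www.example.com"))  # 203.0.113.10
```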
IP
A numerical label (IP address) assigned to each device connected to the network
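An IP address really is just a number, and part of it identifies the network the device belongs to. Python's standard ipaddress module shows this; the addresses are private or documentation-range examples.

```python
import ipaddress

host = ipaddress.ip_address("192.168.1.10")
network = ipaddress.ip_network("192.168.1.0/24")

print(int(host))        # 3232235786: an IPv4 address is a 32-bit number
print(host in network)  # True: the first 24 bits identify the network
print(host.version)     # 4
print(ipaddress.ip_address("2001:db8::1").version)  # 6: IPv6 uses 128-bit numbers
```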
TCP
A connection is established and maintained until the two hosts have finished exchanging messages
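The connection lifecycle can be seen with two sockets on the same machine: connect, exchange messages over the open connection, then close. This is a minimal sketch using Python's standard socket and threading modules on the loopback address; the echo behaviour and the function name echo_once are illustrative choices.

```python
import socket
import threading


def echo_once(server: socket.socket) -> None:
    """Accept one connection and echo data back until the client closes it."""
    conn, _ = server.accept()
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)


# Server: listen on the loopback interface; port 0 lets the OS choose a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

message = b"hello"
# Client: connecting triggers TCP's three-way handshake (SYN, SYN-ACK, ACK).
with socket.create_connection(server.getsockname()) as client:
    client.sendall(message)
    reply = b""
    while len(reply) < len(message):  # TCP is a byte stream, so read until complete
        reply += client.recv(1024)
# Leaving the with-block closes the connection now that both hosts are finished.

worker.join()
server.close()
print(reply)  # b'hello'
```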
FTP
A standard protocol used to transfer files between two locations (e.g. a client and a server)
Meta tags
Snippets of text that describe a page’s content; they appear only in the page’s source code, not on the displayed page
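Meta tags sit in the page's head, so a browser does not display them but a program (such as a search engine crawler) can read them. This sketch uses Python's standard html.parser; the sample page and the class name MetaTagParser are made up.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from the <meta> tags of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]


page = """<html><head>
<meta name="description" content="Flashcards for Option C: Web Science">
<meta name="keywords" content="HTTP, DNS, PageRank">
<title>Web Science</title>
</head><body><p>Only this paragraph is shown in the browser.</p></body></html>"""

parser = MetaTagParser()
parser.feed(page)
print(parser.meta["description"])  # Flashcards for Option C: Web Science
```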
Protocol
A set of rules that enables compatibility between systems through a common, internationally agreed language
Standard
An agreed way of doing/measuring something
Static web page
A web page delivered to the user exactly as it is stored
Advantages and Disadvantages of Static Web pages
Adv:
- cheap to host
- quick and cheap to develop
Disadv:
- requires web development skills to update
- offers little interaction for the user
- content can get stagnant
Dynamic web page
A web page that displays different content each time you access it
Scripts + what are they used for
A set of instructions, used mainly in dynamic web pages, to find query results, place an ad, display a list of products, etc.
Client side scripts
- interpreted by the browser
- used to make the web page change AFTER it has arrived at the browser
- relies on the user’s computer
Client side script process
- the client requests a web page from the server
- the server returns the web page (including the script)
- the browser displays the page and runs the script during/after display
Server side scripts
- the page is customized to the user/the user’s situation
- allows a level of privacy, since the script’s code is never sent to the user
- the script is interpreted by the server, so more scripts = more workload on the server
- the script always works in the same way, regardless of the user’s browser
Server side script process
- the client requests a web page from the server
- the script in the page is interpreted by the server, which creates/changes the page content to match the user’s customization
- the final form of the page is sent; once sent, its content CANNOT be changed by server-side scripting
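The server-side process can be sketched as a function that runs on the server and returns finished HTML. This is a hypothetical example: the USERS data and render_page are invented, and a real site would run such code inside a web framework or server.

```python
from html import escape

# Hypothetical user data that stays on the server and is never sent to the browser.
USERS = {"alice": {"name": "Alice", "cart": ["book", "pen"]}}


def render_page(username: str) -> str:
    """Server-side script: build the HTML for this user before it is sent."""
    user = USERS.get(username)
    if user is None:
        return "<html><body><p>Welcome, guest!</p></body></html>"
    items = "".join(f"<li>{escape(item)}</li>" for item in user["cart"])
    return (
        f"<html><body><p>Welcome back, {escape(user['name'])}!</p>"
        f"<ul>{items}</ul></body></html>"
    )


# The browser only ever receives this finished HTML, not the script itself.
print(render_page("alice"))
```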
CGI + describe the process of the CGI
- common gateway interface
- a standard method for passing data back and forth between the web server and an external application (e.g. a script)
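The CGI process: the web server receives a request, starts the program, passes the request data in environment variables (such as QUERY_STRING for a GET request, with POST data on standard input), and sends whatever the program writes to standard output back to the client. A minimal Python sketch, where handle_request and the name parameter are hypothetical:

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ: dict[str, str]) -> str:
    """Build a CGI response from the request data the web server passes in."""
    # For a GET request the server puts the query string in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = html.escape(query.get("name", ["stranger"])[0])
    body = f"<html><body><p>Hello, {name}!</p></body></html>"
    # A CGI response is header lines, a blank line, then the body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server runs this program as a separate process; whatever it writes
    # to standard output is sent back to the client as the response.
    sys.stdout.write(handle_request(dict(os.environ)))
```

For a request to a URL ending in ?name=Ada, the server would set QUERY_STRING to name=Ada and the page would greet Ada.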
Search engine
Software that allows the user to search for information on the WWW with specific key terms
PageRank
- each page is given an importance score (calculated in advance, independently of any particular search)
- a page is important if it has inlinks from other important pages
- uses the probability of a random surfer landing on a page after clicking on a number of links
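The random-surfer idea can be computed by repeatedly passing each page's score along its outlinks (power iteration). This is a simplified sketch, not Google's production method: the four-page graph is invented, and the damping factor 0.85 (the chance the surfer follows a link rather than jumping to a random page) is the commonly quoted value, assumed here.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}  # the surfer starts on any page with equal probability
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}  # random jump to any page
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share  # follow one of the page's links
            else:
                # A page with no outlinks spreads its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

C ends up highest because three pages link to it, and A benefits from being linked to by the important page C; D, with no inlinks, keeps only the random-jump share.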
HITS Algorithm
- link analysis algorithm that ranks web pages
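- uses hubs and authorities: a hub is a page that links to many good authorities; an authority is a page that is linked to by many good hubs
- an iterative process that is executed at query time, so it is slow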
How does the HITS algorithm work?
Refer to notebook
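In outline, HITS takes the pages matching the query (the root set), adds the pages they link to and from (the base set), gives every page hub and authority scores of 1, then repeats two updates until the scores settle: authority = sum of the hub scores of pages linking to it, hub = sum of the authority scores of pages it links to, normalizing after each round. A sketch of the iteration, assuming the base set has already been collected; the page names are hypothetical.

```python
import math


def hits(links: dict[str, list[str]], iterations: int = 20):
    """Iteratively compute hub and authority scores for a base set of pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages that link TO this page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub update: sum of authority scores of the pages this page links TO.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


base_set = {
    "portal": ["docs1", "docs2", "docs3"],
    "blog": ["docs1", "docs2"],
    "forum": ["docs1"],
    "docs1": [], "docs2": [], "docs3": [],
}
hub, auth = hits(base_set)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # portal docs1
```

Because these loops run only after the query arrives, the algorithm is slower at search time than PageRank, whose scores are precomputed.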
Web Crawler + how does it function?
- computer programs that systematically scan the web, ‘reading’ the pages they find and following their links
- the key terms found are then indexed
- refer to notebook
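A crawler keeps a queue (the frontier) of URLs to visit and a set of URLs already seen, so each page is fetched once. The sketch below crawls a hypothetical in-memory "web" instead of downloading real pages; the URLs use the reserved .example domain.

```python
from collections import deque

# A hypothetical in-memory web: each URL maps to the links found on that page.
WEB = {
    "https://a.example/": ["https://a.example/about", "https://b.example/"],
    "https://a.example/about": ["https://a.example/"],
    "https://b.example/": ["https://c.example/", "https://a.example/about"],
    "https://c.example/": [],
}


def crawl(seed: str, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: visit a page, queue its unseen links, repeat."""
    frontier = deque([seed])  # URLs waiting to be downloaded
    seen = {seed}             # avoids downloading the same page twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)   # a real crawler would download and index the page here
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl("https://a.example/"))
```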
What is the relationship between data in a meta tag?
The relationship is NOT always transitive.
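Define transitive
A relationship is transitive if, whenever A is related to B and B is related to C, A is also related to C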
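Transitivity can be checked mechanically: for every pair (a, b) and (b, c) in the relationship, (a, c) must also be present. A small sketch with made-up relationships, showing why "links to" between web pages is not transitive:

```python
def is_transitive(relation: set[tuple[str, str]]) -> bool:
    """Return True if (a, b) and (b, c) in the relation always imply (a, c)."""
    for a, b in relation:
        for c, d in relation:
            if b == c and (a, d) not in relation:
                return False
    return True


# "links to": page A links to B and B links to C, but A does not link to C.
links_to = {("A", "B"), ("B", "C")}
print(is_transitive(links_to))  # False

# "is older than" among three people is transitive.
older_than = {("Ann", "Bob"), ("Bob", "Cy"), ("Ann", "Cy")}
print(is_transitive(older_than))  # True
```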
Parallel Web Crawler + Goal and how does it achieve its goal
- A crawler that runs multiple processes simultaneously
- maximize the download rate while minimizing overhead and avoiding repeated downloads
- the system requires a policy for assigning newly discovered URLs to the crawling processes
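One common kind of assignment policy partitions URLs by host: a hash of the host name decides which process owns the URL, so the same page can never be queued by two processes. This is a sketch of that one policy under assumed settings (three processes, SHA-256 hashing of the host name); the names assign and discover are hypothetical, and real crawlers may use other policies.

```python
import hashlib
from urllib.parse import urlparse

NUM_PROCESSES = 3  # hypothetical number of crawler processes


def assign(url: str, num_processes: int = NUM_PROCESSES) -> int:
    """Static policy: every URL from the same host goes to the same process."""
    host = urlparse(url).hostname or ""
    digest = hashlib.sha256(host.encode()).digest()  # stable across runs, unlike hash()
    return int.from_bytes(digest[:8], "big") % num_processes


queues: dict[int, list[str]] = {i: [] for i in range(NUM_PROCESSES)}
seen: dict[int, set[str]] = {i: set() for i in range(NUM_PROCESSES)}


def discover(url: str) -> None:
    """Route a newly discovered URL to its owning process, at most once."""
    p = assign(url)
    # A URL always maps to the same process, so each process only needs its
    # own "seen" set to avoid repeated downloads; no coordination is required.
    if url not in seen[p]:
        seen[p].add(url)
        queues[p].append(url)


for u in ["https://a.example/1", "https://a.example/2",
          "https://b.example/", "https://a.example/1"]:
    discover(u)
print(queues)
```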
Index + its Purpose
- the data structure where the key terms found by the web crawler are stored
- to optimize speed and performance in finding relevant documents for a search query
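Search engines typically store this as an inverted index: each term maps to the set of documents containing it, so a query only looks up its terms instead of rereading every page. A minimal sketch over hypothetical documents, with an AND-style search:

```python
import re
from collections import defaultdict

# Hypothetical documents that a crawler has already downloaded.
documents = {
    "page1": "HTTP is the protocol of the web",
    "page2": "DNS turns a domain name into an IP address",
    "page3": "HTTPS layers HTTP on top of TLS",
}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of documents containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


index = build_index(documents)
print(sorted(search(index, "HTTP")))  # ['page1', 'page3']
```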
Black hat techniques
- hidden texts
- scraping
- keyword stuffing
- blog spam
- link farms
- paid links
- doorway pages
- parasite hosting
- cloaking
White hat techniques
- guest blogging
- link baiting
- quality content
- internal linking
Gray hat techniques
- 3 way link exchange
- buying old/expired domains
- article spinning
- Google bombing