C.2 Searching the Web Flashcards
C.2.2
Distinguish between the surface web and the deep web
Surface web:
* Pages that are reachable (and indexed) by a search engine
* Pages that can be reached through links from other sites in the surface web
* Pages that do not require special access configurations
Deep web:
* Pages not reachable by search engines
* Substantially larger than the surface web
* Examples: pages that require authentication (private social media, email inboxes) and content blocked by paywalls (newspaper articles, Netflix)
C.2.3
Outline the principles of searching algorithms used by search engines
- The time a page has existed
- The time a page takes to load
- Dwell time (how long the user stays on the website)
- The frequency of search keywords on the page
C.2.3
What is the Page Rank Algorithm?
PageRank works by counting the number and quality of backlinks to a page to determine a rough estimate of how important the website is. A page with more backlinks is considered more important.
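The counting-and-spreading idea above can be sketched in a few lines of Python. This is a toy version over an in-memory link graph (the damping factor and iterative update follow the standard formulation, but the graph and page names are made up for illustration):

```python
# Minimal PageRank sketch: each page spreads its score across its outlinks.
DAMPING = 0.85  # standard damping factor

def pagerank(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - DAMPING) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = DAMPING * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
# Page C receives backlinks from both A and B, so it ends up ranked highest.
```

Note that a page's score depends not just on how many backlinks it has, but on the rank of the pages linking to it — that is the "quality" part of the definition.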
C.2.3
What is the HITS Algorithm?
The HITS algorithm splits sites into hubs and authorities.
Authorities have many inlinks and contain the valuable information the user wants. An authority is considered good if it is linked to by many high-quality hubs.
Hubs contain outlinks to authorities. A hub is considered good if it links to many high-quality authorities.
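The mutual hub/authority definition can be computed iteratively. Below is a toy sketch over an in-memory link graph (real search engines run HITS on the subgraph of pages matching a query; the page names here are invented):

```python
# Toy HITS iteration: authority and hub scores reinforce each other.
def hits(links, iterations=20):
    """links: dict mapping each page to the pages it links to."""
    pages = set(links) | {t for out in links.values() for t in out}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores linking to it.
        auth = {p: 0.0 for p in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                auth[target] += hub[page]
        # A page's hub score is the sum of the authority scores it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise so scores do not grow without bound.
        for scores in (hub, auth):
            norm = sum(v * v for v in scores.values()) ** 0.5
            for p in scores:
                scores[p] /= norm
    return hub, auth

graph = {"hub1": ["page", "other"], "hub2": ["page"], "page": []}
hub, auth = hits(graph)
# "page" is linked to by both hubs, so it gets the top authority score.
```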
C.2.4
Describe how a web crawler functions
A web crawler crawls through the web, downloading and indexing webpages from all over the internet. For each page it indexes, it extracts all the links in the webpage and adds them to the list of webpages to crawl.
The goal of such a bot is to learn what (almost) every webpage on the web is about, so that the information can be retrieved when it’s needed.
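The loop described above is essentially a breadth-first traversal. Here is a simplified sketch; `fetch_links` is an assumed helper standing in for real HTTP fetching and link extraction, and real crawlers additionally respect robots.txt, rate limits and politeness delays:

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=1000):
    """fetch_links(url) is assumed to return the URLs that page links to."""
    frontier = deque(seed_urls)   # URLs waiting to be crawled
    visited = set()               # URLs already crawled/indexed
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)          # a real crawler would index the page here
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return visited

# Toy in-memory "web" standing in for real HTTP fetches:
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
crawled = crawl(["a"], lambda url: web.get(url, []))
```

Starting from the single seed `"a"`, the crawler discovers and visits every reachable page.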
C.2.5
Discuss the relationship between data in a meta tag and how it is accessed by a web-crawler
Meta tags are HTML tags meant for machines rather than human readers; they tell computers what the website is about.
The description meta tag provides the indexer with a short description of the page.
The keywords meta tag provides keywords describing the page's content.
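A sketch of how an indexer might read these tags, using Python's standard-library HTML parser (real crawlers use more robust parsers; the page content here is invented):

```python
from html.parser import HTMLParser

class MetaTagReader(HTMLParser):
    """Collects name/content pairs from <meta> tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"]] = attrs["content"]

page = """<html><head>
<meta name="description" content="Flashcards for searching the web">
<meta name="keywords" content="search engine, crawler, PageRank">
</head></html>"""

reader = MetaTagReader()
reader.feed(page)
# reader.meta now maps "description" and "keywords" to their content strings.
```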
C.2.6
Discuss the use of parallel web-crawling
The web is growing at an astonishing pace. As such, it is necessary to parallelise the crawling process to speed it up.
Advantages
* Faster
* Network load dispersion: as the web is geographically dispersed, dispersing crawlers disperses the network load
Disadvantages
* Web crawlers may overlap and index the same page more than once
* Parallel web crawlers need to communicate with each other to effectively crawl the web. This takes up communication bandwidth
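One common way to reduce the overlap and communication problems above is to partition URLs among crawlers by a hash of the host name, so each crawler owns a disjoint slice of the web. A sketch under that assumption (real systems also rebalance load and exchange links that cross partitions):

```python
import hashlib
from urllib.parse import urlparse

def assigned_crawler(url, num_crawlers):
    """Deterministically assign a URL to one of num_crawlers workers."""
    host = urlparse(url).netloc
    digest = hashlib.sha256(host.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_crawlers

# Every URL on the same host maps to the same crawler, so no two
# crawlers fetch the same page, and no coordination message is needed
# to decide who owns a URL.
worker = assigned_crawler("https://example.com/page", 4)
```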
C.2.7
Outline the purpose of web-indexing in search engines
Indexing websites allows search engines to quickly locate relevant information for users. Information is stored about each indexed website, such as its ranking, relevant keywords and metadata. This helps search engines rank websites and return helpful results for search queries.
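The core data structure behind this is the inverted index, which maps each word to the set of pages containing it. A minimal sketch (real indexes also store word positions, rankings and metadata per page; the pages here are made up):

```python
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping URL -> page text."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)   # this word appears on this page
    return index

pages = {
    "a.html": "search engines index the web",
    "b.html": "crawlers download the web",
}
index = build_index(pages)
# index["web"] == {"a.html", "b.html"}; index["search"] == {"a.html"}
```

Answering a query is then a fast set lookup (and intersection, for multi-word queries) instead of scanning every page.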
C.2.8-9
Suggest how developers can create pages that appear more prominently in search engine results. Describe the different metrics used by search engines.
- How many websites link to this website.
- The clickthrough rate (how likely a user is to click on your website)
- The bounce rate (how likely a user is to immediately leave your site after clicking)
- Dwell time (how long a user stays on your webpage)
- Using more semantic tags in your HTML which tell the bot what your website is about (article tags, section tags, h1 tag, h2 tag, footer tag)
C.2.11
Discuss the use of white hat search engine optimisation
- Guest blogging: Writing a blog post in someone else’s blog. At the end of the blog post you can insert a link to your site, thereby increasing the number of incoming links to your site.
- Quality content: Writing quality content encourages users to stay longer, increasing dwell time.
- Link baiting: creating content that entices users to click on your link, increasing click-through rate.
C.2.11
Discuss the use of black hat search engine optimisation
- Keyword stuffing
- Link farming: Creating groups of websites with hyperlinks that all link to your own.
- Blog comment spamming: Automated posting of hyperlinks for promotion on any kind of publicly accessible online discussion board
C.2.12
Outline future challenges to search engines as the web continues to grow
As the web grows, it becomes harder to surface the most relevant information, and paid results (ads) play an increasingly important role in what users see.