Unit 5 - Web Searching Flashcards

Question 1

Q

Data structures used for storing indices:

Answer

A

Suffix tree
Inverted index
Citation index
N-gram index
Term document matrix
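
As an illustration of one structure from this list, here is a minimal sketch of an inverted index. The toy documents and the function name build_inverted_index are assumptions made for this example, and real engines also handle tokenisation, stemming and compression.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical toy collection, not taken from the flashcards.
docs = {
    1: "web search engines crawl the web",
    2: "an inverted index maps terms to documents",
    3: "search engines build an inverted index",
}
index = build_inverted_index(docs)
print(index["search"])    # ids of documents containing "search"
print(index["inverted"])  # ids of documents containing "inverted"
```

Looking up a term is then a single dictionary access, which is why this layout suits keyword queries.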

Question 2

Q

What are indices?

Answer

A

Indices are short descriptions of each webpage that may include its title, creation date, size, first line, etc.

Question 3

Q

What is XML?

Answer

A

XML stands for Extensible Markup Language. It is used for exchanging data on the Web.

It enables separation of content (XML) from presentation (XSL).
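
A small sketch of reading self-describing XML data with Python's standard xml.etree.ElementTree module. The element names (page, title, size) and the unit attribute are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element and attribute names are illustrative only.
xml_text = """
<page>
  <title>Web Searching</title>
  <size unit="KB">12</size>
</page>
""".strip()

root = ET.fromstring(xml_text)          # parse the string into an element tree
title = root.find("title").text         # text content of <title>
size = int(root.find("size").text)      # text content of <size>, as a number
unit = root.find("size").get("unit")    # value of the unit attribute
print(title, size, unit)
```

The tags carry the meaning of the data, so a program can pull out the content without knowing how it will be displayed.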

Question 4

Q

Who created XML?

Answer

A

The W3C (World Wide Web Consortium), to provide an easy-to-use, standardised way to store self-describing data.

Question 5

Q

INEX 2002 defined:

Answer

A

Two dimensions for assessing relevance: component coverage and topical relevance.

Question 6

Q

Four cases in the component coverage dimension:

Answer

A

Exact coverage (E)
Too small (S)
Too large (L)
No coverage (N)

Question 7

Q

Cases in Topical relevance:

Answer

A

Highly relevant (3)
Fairly relevant (2)
Marginally relevant (1)
Non relevant (0)
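
One possible way to hold the two INEX 2002 dimensions together in code, shown only as a sketch. The class names and the helper assessment_code are assumptions for this example. The helper joins the two values into a short code such as "3E".

```python
from enum import Enum

class Coverage(Enum):
    EXACT = "E"
    TOO_SMALL = "S"
    TOO_LARGE = "L"
    NONE = "N"

class Relevance(Enum):
    HIGHLY = 3
    FAIRLY = 2
    MARGINALLY = 1
    NON = 0

def assessment_code(relevance, coverage):
    """Join both dimensions into one short code (hypothetical helper)."""
    return f"{relevance.value}{coverage.value}"

code = assessment_code(Relevance.HIGHLY, Coverage.EXACT)
print(code)
```

Each assessed component gets one value from each dimension, so a highly relevant component with exact coverage is written here as 3E.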

Question 8

Q

What is a search engine?

Answer

A

A search engine is a program that helps users find information stored on computers across the World Wide Web.

Question 9

Q

Centralised crawler index architecture:

Answer

A

It is used by most search engines. Crawlers gather information and send it to a single site, where it is indexed by the indexer.

Question 10

Q

Components of crawler indexer architecture

Answer

A

Crawlers, indexer, query engine, user interface
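
A toy sketch of how these components fit together. It assumes an in-memory dictionary WEB in place of real HTTP fetching, and all URLs, page texts and function names are made up for this example.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/": ("search engines crawl pages", ["a.example/x", "b.example/"]),
    "a.example/x": ("crawlers follow links", ["a.example/"]),
    "b.example/": ("an indexer builds the index", []),
}

def crawl(seed):
    """Crawler: breadth-first traversal from a seed, returning fetched pages."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Indexer: term -> set of URLs containing it, held at one central site."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def query(index, terms):
    """Query engine: sorted URLs that contain every query term."""
    results = [index.get(t.lower(), set()) for t in terms.split()]
    return sorted(set.intersection(*results)) if results else []

index = build_index(crawl("a.example/"))
print(query(index, "crawl pages"))
```

The user interface is left out. It would simply pass the user's text to query and display the returned URLs.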

Question 11

Q

Problems using crawler indexer architecture

Answer

A

Dynamic nature of the web
High load on web servers
Large volume of data
Communication link problem

Question 12

Q

Harvest distributed crawler-indexer architecture

Answer

A

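Problems of the crawler-indexer architecture that Harvest addresses:
Requests from many different crawlers increase the load on web servers
Most of the content of objects retrieved by the crawlers is useless and discarded
No coordination among the crawlers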

Question 13

Q

Components of Harvest:

Answer

A

Gatherers
Brokers
Replicator
Object cache
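
A simplified sketch of the first two components, assuming the usual description of Harvest in which gatherers collect and extract indexing information and brokers index it and answer queries. The class layout, URLs and page texts are made up for this example, and the replicator and object cache are left out.

```python
class Gatherer:
    """Collects pages from the servers it is assigned and extracts index terms."""
    def __init__(self, pages):
        self.pages = pages  # hypothetical: URL -> page text

    def summaries(self):
        return {url: set(text.lower().split()) for url, text in self.pages.items()}

class Broker:
    """Builds an index from gatherers' summaries and answers queries."""
    def __init__(self, gatherers):
        self.index = {}
        for gatherer in gatherers:
            for url, terms in gatherer.summaries().items():
                for term in terms:
                    self.index.setdefault(term, set()).add(url)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

g1 = Gatherer({"a.example/": "harvest uses gatherers"})
g2 = Gatherer({"b.example/": "brokers index what gatherers collect"})
broker = Broker([g1, g2])
print(broker.search("gatherers"))
```

Because one broker can draw on several gatherers, each web server only needs to be visited by its own gatherer rather than by every search engine's crawler.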