1.3.4.3 Search Engine Indexing & PageRank algorithm Flashcards
What are the programs that scour the World Wide Web (WWW) called?
Spiders/crawlers
What do spiders do?
Spiders index the pages, content and metadata they find, and map the links between pages by following all:
Internal links
External links
This updates the index. The index must be updated continuously because pages are constantly being added, removed and updated.
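A spider's crawl can be sketched as a breadth-first traversal that records the words on each page and the links it finds. This is a minimal sketch, not how any real search engine works: the `WEB` dictionary is a hypothetical in-memory stand-in for fetching pages over HTTP, and `crawl` is an illustrative name.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, list of linked URLs).
# A real spider would download each page and parse its HTML instead.
WEB = {
    "site.com/home": ("welcome to the home page", ["site.com/about", "other.org/news"]),
    "site.com/about": ("about our site", ["site.com/home"]),
    "other.org/news": ("latest news", ["site.com/home"]),
}

def crawl(start):
    """Breadth-first crawl from start, building a word index and a link map."""
    index = {}   # word -> set of pages containing that word
    links = {}   # page -> list of pages it links to
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        if url not in WEB:            # dead link: page missing or removed
            continue
        text, outbound = WEB[url]
        for word in text.split():     # index the page's content
            index.setdefault(word, set()).add(url)
        links[url] = outbound         # map this page's links
        for target in outbound:       # follow internal and external links alike
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index, links

index, links = crawl("site.com/home")
print(sorted(index["home"]))  # ['site.com/home']
```

Here `site.com/home` to `site.com/about` is an internal link and `site.com/home` to `other.org/news` is an external one; the crawler follows both. Re-running the crawl is what keeps the index up to date as pages change.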
What does the PageRank algorithm do and aim to achieve?
It is used to help rank the webpages in the list of results returned by a search engine.
It counts the number and quality of links to a page to estimate roughly how important that page is. The underlying assumption is that more important pages are more likely to be linked to from other websites.
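PageRank is normally computed by starting every page with the same rank and repeatedly applying PR(A) = (1 – d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)) until the values settle. The sketch below assumes a small hypothetical three-page link map and a fixed number of iterations instead of a convergence test; `pagerank` is an illustrative name.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)) for every page A.

    links: page -> list of pages it links to (hypothetical format).
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}          # initial guess: every page starts at 1
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            # Sum the share of the vote A receives from each page T linking to it.
            votes = sum(pr[t] / len(links[t]) for t in pages if a in links[t])
            new_pr[a] = (1 - d) + d * votes
        pr = new_pr                        # update all pages together
    return pr

# Hypothetical web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print({p: round(r, 2) for p, r in ranks.items()})  # {'A': 1.16, 'B': 0.64, 'C': 1.19}
```

C ends up highest because it is linked to by both A and B. With this form of the formula, and every page having at least one outbound link, the ranks add up to the number of pages (3 here).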
PR(A) = (1 – d) + d (PR(T1)/C(T1) + … + PR(Tn) / C(Tn))
What does PR(A) mean?
The PageRank of page A
PR(A) = (1 – d) + d (PR(T1)/C(T1) + … + PR(Tn) / C(Tn))
What does C(Tn) mean?
The total number of outbound links on page Tn, including its link to A
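C(T) can be read straight off a link map built during crawling. The sketch assumes a hypothetical dictionary mapping each page to the pages it links to; `outbound_count` is an illustrative name, and pages missing from the map are treated as having no outbound links.

```python
# Hypothetical link map: page -> pages it links to.
links = {
    "T1": ["A", "B"],
    "T2": ["A"],
    "B": ["T1", "T2", "A"],
}

def outbound_count(links, page):
    """C(page): number of outbound links on page, counting its link to A too.

    Pages not present in the map are treated as having no outbound links.
    """
    return len(links.get(page, []))

print({p: outbound_count(links, p) for p in links})  # {'T1': 2, 'T2': 1, 'B': 3}
```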
PR(A) = (1 – d) + d (PR(T1)/C(T1) + … + PR(Tn) / C(Tn))
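What does the term PR(T1)/C(T1) represent?
The share of page T1's vote that goes to page A: T1's own PageRank is split equally across its C(T1) outbound links. Adding these shares for pages T1 to Tn gives the total vote page A receives.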
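A worked example with made-up numbers shows how the shares combine. Assume, hypothetically, that only T1 and T2 link to A, that T1 has PageRank 1.0 and 2 outbound links, and that T2 has PageRank 0.5 and 1 outbound link.

```python
# Hypothetical pages linking to A: (name, PR(T), C(T)).
inbound = [("T1", 1.0, 2), ("T2", 0.5, 1)]
d = 0.85  # damping factor

# Each linking page passes PR(T) / C(T) to A.
shares = {name: pr / c for name, pr, c in inbound}
votes = sum(shares.values())     # total vote A receives
pr_a = (1 - d) + d * votes       # PR(A)

print(shares)            # {'T1': 0.5, 'T2': 0.5}
print(round(pr_a, 2))    # 1.0
```

T1 has the higher rank, but because it splits that rank over two links, it gives A the same share as T2.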
PR(A) = (1 – d) + d (PR(T1)/C(T1) + … + PR(Tn) / C(Tn))
d stands for a damping factor. What is meant by a damping factor?
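The damping factor scales down the total vote, PR(T1)/C(T1) + … + PR(Tn)/C(Tn), so that links from other pages do not have too much influence. It also means a page with no inbound links still gets a PageRank of 1 – d. In the "random surfer" model, d is the probability that a user keeps following links rather than jumping to a random page. It is typically set to 0.85.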
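The effect of d can be seen by applying the formula once for different values. The sketch assumes a page whose total vote (the sum of PR(T)/C(T)) is either 0 or a hypothetical 2.0; `pr_update` is an illustrative name.

```python
def pr_update(votes, d=0.85):
    """One application of PR(A) = (1 - d) + d * votes, where votes = sum of PR(T)/C(T)."""
    return (1 - d) + d * votes

for d in (0.5, 0.85, 1.0):
    no_links = pr_update(0.0, d)     # page with no inbound links
    well_linked = pr_update(2.0, d)  # page with a hypothetical total vote of 2.0
    print(d, round(no_links, 2), round(well_linked, 2))
```

A smaller d gives every page a higher minimum rank and reduces how much inbound links matter. With d = 1 there is no damping, so a page with no inbound links would get a rank of 0.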