Week 1 Flashcards
What are some of the challenges in building a web search engine?
A large number of web pages
Web pages are not ordered or indexed
No classification of web pages
What are the two principles of the PageRank algorithm?
Hyperlink tricks
Random surfer model
What is the hyperlink trick?
Pages can be ranked by the number of links from which this page is linked to
What is the matrix form of pagerank?
P_n+1 = H x P_n
Where H is the transition matrix
What are the two issues in PageRank?
Sink Pages
Cycle Pages
How do we solve the sink page problem?
Replace the column of all 0s with 1/N in the P matrix
What is the cycle problem?
Few webpages may be linked to a closed cycle, leading to an infinite increase of the authority score
What is the random surfer model?
A surfer will randomly click on a link (accessing the site directly) or click a random hyperlink on the website.
What does the damping factor do?
It’s a chance that a user will stop clicking links and get bored.
What is the complexity of PageRank?
O(MN^2)
M = number of iterations