PageRank Algorithm¨ Flashcards
What is the PageRank algorithm used for? How is it performed?
It’s used to determine the importance of a webpage. It uses a random surfer model where a visitor start their session at random pages and walk along links that all have the same chance and the choices are made uniformly. This can be modeled by a Markov process with iterative calculation of a transition matrix. The solution is then the principal eigen vector of the transition matrix.
How much harddisk is needed in pagerank?
About 10xRAM
What is teleporting?
The method of letting the random surfer have a small probability of teleporting to a random page to prevent spider traps for example.
What are spider traps? How can they be prevented?
Spider traps are dead ends where the only transition probability is itself so the solution becomes for example [0,0,1,0]. Spider traps are sets of nodes which pools the probabilities to themselves. Can be prevented via teleporting.
What are dead-ends? How can they be prevented?
Nodes without a chance to transition away from them. So the solution becomes for example [0,0,0,0]. Can be prevented by removing the dead end recursively.
How much memory does a node need in pagerank?
32 bits/4 byte
How can the sparse matrix of pagerank be encoded to ensure less memory?
Only encode the nonzero entries of the matrix.