Week 6B (large scale data analysis using MR) Flashcards
at the abstract level, how do we modify the calculation of pagerank to avoid spider traps
allow each random surfer a small probability of teleporting to a random page; this is called taxation
what is this equation used to do
compute a new estimate v' of the pagerank vector from the current pagerank estimate v and the transition matrix M
it is the taxation equation
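For reference (the equation itself is not reproduced in these cards), the standard taxation update, written in the cards' notation with B as the taxation parameter, is:

v' = B M v + (1 - B) e / n

The cards below break down each term.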
in this equation, what is B usually chosen as, and what does it represent
0.8-0.9
the probability that the random surfer decides to follow an outlink from their present page
in this equation, what is e
a vector of all 1's with the appropriate number of components
in this equation, what is n
the number of nodes in the web graph
in this equation, what case does the term BMv represent
with probability B, the random surfer decides to follow an outlink from their present page
in this equation, what data structure does (1-B)e/n evaluate to
a vector
in this equation, what does (1-B)e/n represent
the introduction, with probability 1-B, of a new random surfer at a random page
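To make the two terms concrete, here is a minimal numeric sketch of one taxation iteration in Python; the 3-page graph, the value B = 0.85, and all variable names are illustrative, not taken from the cards.

# One taxation iteration: v' = B*M*v + (1-B)*e/n
# Hypothetical 3-page graph: P1 -> P2, P1 -> P3, P2 -> P1, P3 -> P1, P3 -> P2
B = 0.85            # probability of following an outlink (illustrative value)
n = 3               # number of nodes in the graph

# Column-stochastic transition matrix M: column j spreads page j's rank
# equally over its outlinks.
M = [
    [0.0, 1.0, 0.5],   # rank flowing into P1
    [0.5, 0.0, 0.5],   # rank flowing into P2
    [0.5, 0.0, 0.0],   # rank flowing into P3
]

v = [1.0 / n] * n       # start from the uniform distribution

# The first term below is B*M*v (follow an outlink with probability B);
# the second is (1-B)*e/n (a new random surfer at a random page).
v_new = [
    B * sum(M[i][j] * v[j] for j in range(n)) + (1 - B) / n
    for i in range(n)
]
print(v_new)   # ~ [0.475, 0.333, 0.192]; every entry keeps a (1-B)/n floor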
in a graph what is a clique
set of nodes with all possible arcs from one to another
is there a dead end or spider trap
dead end: E
spider trap: no, having a clique does not mean there is a spider trap
if we do recursive deletion, what would be the pagerank assigned to each of the nodes
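The graph this card refers to is not shown; as a general illustration, here is a minimal sketch of recursive deletion of dead ends on a hypothetical graph (node names and structure are made up).

# Recursive deletion of dead ends: repeatedly remove any node with no
# outlinks, together with the arcs into it, because removing one dead
# end can turn its predecessors into new dead ends.
graph = {               # hypothetical adjacency lists
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["D"],
    "D": [],            # D is a dead end; once D goes, C becomes one too
}

removed = []
while True:
    dead = [u for u, outs in graph.items() if not outs]
    if not dead:
        break
    for u in dead:
        removed.append(u)
        del graph[u]
    for u in graph:     # drop arcs that now point at deleted nodes
        graph[u] = [w for w in graph[u] if w in graph]

print("deleted:", removed)   # ['D', 'C']; the A-B cycle survives
# Pagerank is then computed on the reduced graph, and the deleted nodes
# receive their pagerank afterwards, in the reverse of the order in which
# they were deleted, from the pagerank of their remaining predecessors.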
2 basic steps for a search engine when generating its final results
select candidate pages that match the query
among the qualified pages, a pagerank score is computed for each
how do we reduce the amount of data that must be passed from the Map tasks to the Reduce tasks during pagerank calculation
M is a very sparse matrix, so represent it by only its nonzero elements
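A minimal sketch of what this looks like in practice, simulating one Map/Reduce pass in plain Python; the small graph, B = 0.85, and the function names map_page / reduce_page are illustrative assumptions, not from the cards.

from collections import defaultdict

# Sparse representation of M: for each page keep only its list of
# successors (and hence its out-degree), instead of an n x n matrix
# that is almost entirely zeros.
B, n = 0.85, 4
links = {                        # hypothetical web graph
    "P1": ["P2", "P3"],
    "P2": ["P1", "P4"],
    "P3": ["P1", "P2"],
    "P4": ["P3"],
}
v = {p: 1.0 / n for p in links}  # current pagerank estimate

def map_page(page, outlinks):
    # Map: a page sends v[page]/out-degree only to the pages it links to,
    # so the data shuffled to the reducers is proportional to the number
    # of arcs, not to n^2.
    share = v[page] / len(outlinks)
    return [(dest, share) for dest in outlinks]

def reduce_page(page, shares):
    # Reduce: sum the shares arriving at one page and apply taxation.
    return B * sum(shares) + (1 - B) / n

grouped = defaultdict(list)      # simulate the shuffle: group shares by destination
for page, outs in links.items():
    for dest, share in map_page(page, outs):
        grouped[dest].append(share)

v_new = {page: reduce_page(page, shares) for page, shares in grouped.items()}
print(v_new)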
what is a spam farm
a collection of pages whose purpose is to increase the pagerank of a certain page or pages
what does the graph of a spam farm look like