Week 6B (large scale data analysis using MR) Flashcards

1
Q

at an abstract level, how do we modify the calculation of PageRank to avoid spider traps

A

allow each random surfer a small probability of teleporting to a random page (taxation)
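
A minimal sketch of the teleporting idea, simulating a single random surfer. The 4-page graph, the `surf` function, and the teleport probability of 0.2 are all assumptions made for illustration; page C links only to itself, so it acts as a spider trap.

```python
import random

# Hypothetical 4-page web: page "C" links only to itself (a spider trap).
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["C"],
    "D": ["B", "C"],
}

def surf(links, steps, teleport_prob, seed=0):
    """Return the fraction of steps a random surfer spends on each page."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        if rng.random() < teleport_prob:
            current = rng.choice(pages)           # teleport to a random page
        else:
            current = rng.choice(links[current])  # follow a random out-link
        visits[current] += 1
    return {page: count / steps for page, count in visits.items()}

no_teleport = surf(LINKS, 100_000, teleport_prob=0.0)
with_teleport = surf(LINKS, 100_000, teleport_prob=0.2)
print("share of time in trap C without teleporting:", round(no_teleport["C"], 3))
print("share of time in trap C with teleporting:   ", round(with_teleport["C"], 3))
```

Without teleporting the surfer ends up stuck in C almost all of the time; with teleporting C still scores highly, but the other pages keep a nonzero share.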

2
Q

what is the equation v' = BMv + (1 - B)e/n used to do

A

compute a new vector estimate of PageRanks v' from the current PageRank estimate v and the transition matrix M

it is the taxation equation
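
A rough sketch of iterating the taxation update, assuming the equation in question is v' = BMv + (1 - B)e/n. The function names, the dense list-of-rows layout for M, and the 4-page graph are assumptions made for illustration, not something given on the card.

```python
def taxation_step(M, v, B):
    """One update v' = B*M*v + (1 - B)*e/n, with M given as a list of rows."""
    n = len(v)
    return [
        B * sum(M[i][j] * v[j] for j in range(n)) + (1 - B) / n
        for i in range(n)
    ]

def pagerank(M, B, iterations=200):
    """Start from the uniform vector and apply the update repeatedly."""
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = taxation_step(M, v, B)
    return v

# Hypothetical graph, page order A, B, C, D. Column j describes the out-links of page j:
# A -> B, C, D;  B -> A, D;  C -> C (spider trap);  D -> B, C
M = [
    [0,   1/2, 0, 0  ],
    [1/3, 0,   0, 1/2],
    [1/3, 0,   1, 1/2],
    [1/3, 1/2, 0, 0  ],
]

untaxed = pagerank(M, B=1.0)  # B = 1 means no teleporting at all
taxed = pagerank(M, B=0.8)
print("C without taxation:", round(untaxed[2], 4))
print("C with B = 0.8:    ", round(taxed[2], 4))
```

With B = 1 the trap page C absorbs essentially all of the PageRank; with B = 0.8 it settles at 95/148 (about 0.64) and the remaining pages keep the rest.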

3
Q

in this equation, what is B usually chosen as and what does it represent

A

0.8-0.9

the probability that the random surfer decides to follow an outlink from their present page

4
Q

in this equation, what is e

A

a vector of all 1's with the appropriate number of components

5
Q

in this equation, what is n

A

the number of nodes in the web graph

6
Q

in this equation, what case does the term BMv represent

A

with probability B, the random surfer decides to follow an outlink from their present page

7
Q

in this equation, what data structure does (1 - B)e/n evaluate to

A

a vector

8
Q

in this equation, what does (1 - B)e/n represent

A

the introduction, with probability 1 - B, of a new random surfer at a random page

9
Q

in a graph what is a clique

A

set of nodes with all possible arcs from one to another
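
A small sketch of checking this definition in code. The arc set and the `is_clique` helper are hypothetical; arcs are treated as directed, so both directions must be present for every pair.

```python
from itertools import permutations

def is_clique(arcs, nodes):
    """True if every ordered pair of distinct nodes in `nodes` is joined by an arc."""
    return all((u, v) in arcs for u, v in permutations(nodes, 2))

# Hypothetical directed graph given as a set of (source, destination) arcs.
arcs = {
    ("A", "B"), ("B", "A"),
    ("A", "C"), ("C", "A"),
    ("B", "C"), ("C", "B"),
    ("C", "D"),
}

print(is_clique(arcs, {"A", "B", "C"}))  # True
print(is_clique(arcs, {"A", "C", "D"}))  # False
```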

10
Q

is there a dead end or spider trap

A

dead end: E

spider trap: No, having a clique does not mean there is a spider trap
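
To illustrate the distinction, here is a sketch on a hypothetical graph (not necessarily the one this card refers to). It assumes a dead end is a page with no out-links and a spider trap is a set of pages with no dead ends and no arcs leaving the set; the helper names are made up for the example.

```python
def dead_ends(links):
    """Pages with no out-links."""
    return {page for page, targets in links.items() if not targets}

def is_spider_trap(links, nodes):
    """True if `nodes` is non-empty, has no dead ends, and no arc leaves the set."""
    nodes = set(nodes)
    return bool(nodes) and all(
        links[page] and set(links[page]) <= nodes for page in nodes
    )

# Hypothetical graph: A, B, C form a clique, but C also links out to D.
links = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["E"],
    "E": [],
}

print(dead_ends(links))                        # {'E'}
print(is_spider_trap(links, {"A", "B", "C"}))  # False: the arc C -> D escapes

# Removing the escaping arc turns the same clique into a spider trap.
trapped = dict(links, C=["A", "B"])
print(is_spider_trap(trapped, {"A", "B", "C"}))  # True
```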

11
Q

if we do recursive deletion of dead ends, what PageRank would be assigned to each of the nodes

A
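
As a general sketch of recursive dead-end deletion (using a hypothetical graph, not necessarily the one this card refers to): delete dead ends until none remain, compute PageRank on what is left, then restore the deleted pages in reverse order, giving each one the sum of its predecessors' PageRank divided by their out-degrees. The function names and the choice to run without taxation are assumptions.

```python
def remove_dead_ends(links):
    """Repeatedly delete pages with no out-links; return (remaining graph, deletion order)."""
    graph = {page: list(targets) for page, targets in links.items()}
    removed = []
    while True:
        dead = [page for page, targets in graph.items() if not targets]
        if not dead:
            return graph, removed
        for page in dead:
            removed.append(page)
            del graph[page]
        # Drop arcs that pointed at the pages just deleted.
        graph = {p: [t for t in ts if t in graph] for p, ts in graph.items()}

def pagerank_no_taxation(graph, iterations=200):
    """Plain power iteration v' = Mv on a graph without dead ends."""
    rank = {page: 1.0 / len(graph) for page in graph}
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in graph}
        for page, targets in graph.items():
            for target in targets:
                new_rank[target] += rank[page] / len(targets)
        rank = new_rank
    return rank

def pagerank_with_recursive_deletion(links):
    core, removed = remove_dead_ends(links)
    rank = pagerank_no_taxation(core)
    # Restore in reverse order of deletion, using out-degrees from the full graph.
    for page in reversed(removed):
        rank[page] = sum(
            rank[src] / len(targets)
            for src, targets in links.items()
            if page in targets and src in rank
        )
    return rank

# Hypothetical graph: E is a dead end, and C becomes one once E is deleted.
links = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["E"], "D": ["B", "C"], "E": []}
ranks = pagerank_with_recursive_deletion(links)
print({page: round(value, 4) for page, value in sorted(ranks.items())})
```

On this example A, B and D get 2/9, 4/9 and 3/9, while the restored pages C and E each get 13/54, so the values no longer sum to 1.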
12
Q

what are the 2 basic steps a search engine takes when generating its final results

A

select candidate pages based on the query

among the qualified pages, a PageRank score is computed for each
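
The two steps can be sketched roughly as follows. The page texts, URLs, PageRank values, and the `search` function are all made up for illustration, and ranking by PageRank alone is a simplifying assumption; real engines combine it with other signals.

```python
def search(query, pages, pagerank):
    """Return the URLs matching `query`, ordered from highest to lowest PageRank."""
    terms = set(query.lower().split())
    # Step 1: select candidate pages that contain every query term.
    candidates = [
        url for url, text in pages.items()
        if terms <= set(text.lower().split())
    ]
    # Step 2: order the candidates by their score (here, PageRank only).
    return sorted(candidates, key=lambda url: pagerank[url], reverse=True)

# Hypothetical corpus and precomputed PageRank scores.
pages = {
    "a.example": "pagerank taxation explained",
    "b.example": "cooking with spiders",
    "c.example": "taxation and pagerank for spider traps",
}
scores = {"a.example": 0.2, "b.example": 0.5, "c.example": 0.3}

print(search("pagerank taxation", pages, scores))  # ['c.example', 'a.example']
```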

13
Q

how do we reduce the amount of data that must be passed from the Map tasks to the Reduce tasks during the PageRank calculation

A

M is a very sparse matrix, represent it by only its nonzero elements
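
A sketch of one update step in a map/reduce style using only the nonzero elements of M. It assumes each column of M is stored as the list of pages that the column's page links to, so every nonzero entry equals 1 divided by the length of that list. The function names and the in-memory "shuffle" are illustrative, not a real MapReduce API.

```python
from collections import defaultdict

def map_phase(sparse_M, v, B):
    """Emit (destination, contribution) pairs for the nonzero entries of M only."""
    for src, dests in sparse_M.items():
        if not dests:
            continue  # a dead end has no nonzero entries in its column
        share = B * v[src] / len(dests)
        for dest in dests:
            yield dest, share

def reduce_phase(pairs, pages, B):
    """Sum the contributions per page and add the teleport term (1 - B)/n."""
    n = len(pages)
    totals = defaultdict(float)
    for dest, share in pairs:
        totals[dest] += share
    return {page: totals[page] + (1 - B) / n for page in pages}

# Hypothetical 4-page graph: 8 nonzero entries instead of 16 dense ones.
sparse_M = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["C"], "D": ["B", "C"]}
v = {page: 0.25 for page in sparse_M}

pairs = list(map_phase(sparse_M, v, B=0.8))
v_next = reduce_phase(pairs, sparse_M, B=0.8)
print(len(pairs), {page: round(value, 4) for page, value in v_next.items()})
```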

14
Q

what is a spam farm

A

a collection of pages whose purpose is to increase the PageRank of a certain page or pages
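
A rough illustration of the effect, assuming a commonly described farm layout: a target page `t` links to a set of supporting pages, and every supporting page links back to `t`. The ring of ordinary pages, the page names, and B = 0.85 are all hypothetical choices for the example.

```python
def pagerank(links, B=0.85, iterations=100):
    """Power iteration with taxation on an adjacency-list graph without dead ends."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - B) / n for page in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += B * rank[page] / len(targets)
        rank = new_rank
    return rank

def build_web(num_supporting):
    """Ring of 10 ordinary pages, one of which links to the target 't',
    plus `num_supporting` (at least 1) farm pages that exchange links with 't'."""
    links = {f"p{i}": [f"p{(i + 1) % 10}"] for i in range(10)}
    links["p0"].append("t")
    farm = [f"s{i}" for i in range(num_supporting)]
    links["t"] = list(farm)
    for page in farm:
        links[page] = ["t"]
    return links

small = pagerank(build_web(1))["t"]
large = pagerank(build_web(20))["t"]
print("target PageRank with 1 supporting page:  ", round(small, 3))
print("target PageRank with 20 supporting pages:", round(large, 3))
```

In this toy web, adding supporting pages raises the target's PageRank even though only one ordinary page links to it.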

15
Q

what does the graph of a spam farm look like

A