Week 6 (large scale data analysis using mapreduce Flashcards

1
Q

what are these examples of?

A

distance measures

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

what is the goal behind link analysis

A

understanding large problem w/ unstructured data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

how did early search engines work

A

crawled the web and listed the terms in an inverted index

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

what is an inverted index in the the context of storage of terms

A

data structure which makes it easy, given a term, to find pointer to all places where that term appears

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

create an inverted index for the given texts

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

a term search for the terms Colorado State University, would give what set

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

what is term spam

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

goals behind a page rank algorithim

A

provide effective sumnmaries for search results

ordering/ranking results

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

why does a pagerank algorithim simulate random surfers

A

pages that would have a large number of surfers were considered more “important” than pages that would be barely visited

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

how is a page judged in a page rank algorithim (basic)

A

terms occuring on the page

and terms used in or near links to that page

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

what is the highest level definition of pagerank

A

function which assigns a real number to each page on the web

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

what does a higher pagerank of a page mean

A

the more important it is

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

given this graph, suppose a random surfer starts at page A, what are the probs that a surfer will be on each page in the next step

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

now suppose surfer at B, whats the prob of next step

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

for pagerank, what is the dimension of the transition matrix M

A

n columns and n rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

what does the transition matrix of this graph look like

A
17
Q

what does the first and second column of this transition matrix mean

A
18
Q

what does this matrix mean/represent?

A
19
Q

for the transition matrix M, if we surf at any of the n page of web with equal prob. What will the inital vector look like and where is it located

A

1/n for each component

farthest left, first col vector (i think)

20
Q

what does the matrix mean and what does the vector mean?

and what does this equal

A

M = prob for next step from current location

Vec = probability for being in the current location

== distribution of the surfer after this step, probability for being in the next possible location

21
Q

what is v_j and m_ij ?

A

m_ij is prob that a surfer at node J will move to node i at the next step

v_j is ther prob that the surfer was at node j at prev step

22
Q

what is x_i

A

prob that a random surfer will be at node i at the next step

23
Q

what 2 conditions

A

graph is strongly connected

there are no dead ends (nodes that have no arcs out)

24
Q

how many iters to have v_j converge

A

50-75 iters

25
Q
A
26
Q

why do we need to do matrix vector multiplication in mapreduce with a sufficiently large dataset

A

vector v cannot fit in main memory

more than 1.4 billion pages

27
Q

in matrix-vector multiplication, what will be stored in the files

A

matrix M and vector V stored seperatly

28
Q

in matrix-vector multiplication, what does the map function do

A
29
Q

in matrix-vector multiplication, what does the reduce function look like

A
30
Q

in matrix-vector multiplication, what do you do if the vector v cant fit in main memory

A
31
Q

in matrix-vector multiplication, what does stripe-ing look like

A
32
Q

what is a dead end and why is it a problem

A

a page with no links out, surfers reaching page will dissapear

33
Q

what is a spider trap

A

groups of pages that all have outlinks but never link to any other pages

34
Q

what would happen if dead ends were allowed

A
35
Q

2 solutions for dealing with dead ends

A
36
Q

in recursive deletion, on the pages that are deleted, whats their page rank

A

the parent, or if the parent is deleted then its the parent of that

it just goes down the tree

37
Q

with recursive deletion, the sums will eventually exceed 1. how does this change the definition of our matrix

A
38
Q
A