Week 6 (Large-Scale Data Analysis Using MapReduce) Flashcards
what are these examples of?
distance measures
what is the goal behind link analysis
understanding large problems with unstructured data
how did early search engines work
crawled the web and listed the terms in an inverted index
what is an inverted index in the context of storage of terms
a data structure that makes it easy, given a term, to find pointers to all places where that term appears
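A minimal Python sketch of building an inverted index, assuming a simple lowercase tokenizer and three made-up documents (hypothetical, not from the course). Each term maps to a list of (document ID, word position) pairs, which play the role of the "pointers to all places" where the term appears.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to a list of (doc_id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Hypothetical tokenizer: lowercase, keep runs of letters/digits.
        for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[term].append((doc_id, pos))
    return dict(index)

# Hypothetical documents, used only for illustration.
docs = {
    1: "Colorado State University",
    2: "State of Colorado",
    3: "University of Denver",
}
index = build_inverted_index(docs)
print(index["colorado"])  # [(1, 0), (2, 2)]
```

Storing positions (not just document IDs) also lets a search engine check whether terms appear next to each other.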
create an inverted index for the given texts
a term search for the terms "Colorado", "State", and "University" would give what set
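One common way to answer a multi-term search is to intersect the sets of documents listed for each term (AND semantics; that choice is an assumption here). The sketch below uses a small hypothetical index mapping terms to document-ID sets, not the texts from the card above, and a hypothetical `term_search` helper.

```python
def term_search(index, terms):
    """Return the set of doc IDs that contain every query term (AND semantics)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical index: term -> set of document IDs.
index = {
    "colorado": {1, 2, 4},
    "state": {1, 2, 3},
    "university": {1, 3, 4},
}
print(term_search(index, ["Colorado", "State", "University"]))  # {1}
```

Replacing the intersection with a union would give OR semantics instead (any document containing at least one term).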
what is term spam
goals behind a PageRank algorithm
provide effective summaries for search results
ordering/ranking results
why does a PageRank algorithm simulate random surfers
pages that many surfers would end up on are considered more “important” than pages that would rarely be visited
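A random surfer can be simulated directly: start on some page and repeatedly follow an out-link chosen uniformly at random. The fraction of steps spent on each page estimates how "important" it is. The four-page graph and the `simulate_surfers` helper are hypothetical, and the sketch assumes every page has at least one out-link (no dead ends).

```python
import random
from collections import Counter

def simulate_surfers(graph, steps, seed=0):
    """Follow random out-links for `steps` moves; return the fraction of visits per page."""
    rng = random.Random(seed)
    page = rng.choice(sorted(graph))  # random starting page
    visits = Counter()
    for _ in range(steps):
        page = rng.choice(graph[page])  # pick one out-link uniformly at random
        visits[page] += 1
    return {p: visits[p] / steps for p in sorted(graph)}

# Hypothetical graph: page -> list of pages it links to.
graph = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}
freqs = simulate_surfers(graph, 100_000)
print(freqs)
```

On this graph the visit fractions should settle near 1/3 for A and 2/9 for each of B, C, and D; more steps give a tighter estimate.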
how is a page judged in a PageRank algorithm (basic)
by the terms occurring on the page
and by the terms used in or near links to that page
what is the highest-level definition of PageRank
a function that assigns a real number to each page on the web
what does a higher PageRank of a page mean
the higher the PageRank, the more important the page is
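In its basic form, the real number PageRank assigns to each page can be computed by power iteration: start from the uniform distribution and repeatedly apply one random-surfer step until the values stop changing. This sketch assumes a strongly connected graph with no dead ends or spider traps, so it omits the taxation (teleport) step that practical implementations add; the graph and the `pagerank` helper are hypothetical.

```python
def pagerank(graph, iterations=50):
    """Basic (untaxed) PageRank by power iteration over a dict-of-out-links graph."""
    pages = sorted(graph)
    v = {p: 1.0 / len(pages) for p in pages}  # start with the uniform distribution
    for _ in range(iterations):
        nxt = {p: 0.0 for p in pages}
        for src in pages:
            share = v[src] / len(graph[src])  # split src's value evenly across its out-links
            for dst in graph[src]:
                nxt[dst] += share
        v = nxt
    return v

# Hypothetical graph: page -> list of pages it links to.
graph = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Sorting pages by their value gives the ranking; on this graph A ends up highest (1/3) and B, C, and D tie at 2/9.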
given this graph, suppose a random surfer starts at page A; what are the probabilities that the surfer will be on each page at the next step
now suppose the surfer is at B; what are the probabilities for the next step
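At each step the surfer follows one of the current page's out-links with equal probability, so every linked page gets probability 1/(out-degree) and unlinked pages get 0. Since the card's graph is not shown, this sketch assumes a hypothetical four-page graph (A links to B, C, D; B to A, D; C to A; D to B, C); the `next_step_probs` helper is also illustrative.

```python
from fractions import Fraction

def next_step_probs(graph, start):
    """Probability of being on each page after one random-surfer step from `start`."""
    out_links = graph[start]
    probs = {page: Fraction(0) for page in graph}
    for dst in out_links:
        probs[dst] += Fraction(1, len(out_links))  # each out-link is equally likely
    return probs

# Hypothetical graph: page -> list of pages it links to.
graph = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}
print({p: str(q) for p, q in next_step_probs(graph, "A").items()})
# {'A': '0', 'B': '1/3', 'C': '1/3', 'D': '1/3'}
print({p: str(q) for p, q in next_step_probs(graph, "B").items()})
# {'A': '1/2', 'B': '0', 'C': '0', 'D': '1/2'}
```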
for PageRank, what are the dimensions of the transition matrix M
n rows and n columns (n × n), where n is the number of pages
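The transition matrix has one row and one column per page. With the convention assumed here, column j holds the next-step probabilities from page j: M[i][j] = 1/outdeg(j) if page j links to page i, else 0, so every column sums to 1. Multiplying M by a vector that puts all probability on one page reproduces that page's next-step distribution. The graph and the `transition_matrix`/`step` helpers are hypothetical.

```python
from fractions import Fraction

def transition_matrix(graph):
    """Build the n x n column-stochastic matrix M (rows = destination, columns = source)."""
    pages = sorted(graph)
    idx = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    M = [[Fraction(0)] * n for _ in range(n)]
    for src in pages:
        for dst in graph[src]:
            M[idx[dst]][idx[src]] += Fraction(1, len(graph[src]))
    return pages, M

def step(M, v):
    """One random-surfer step: v' = M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Hypothetical graph: page -> list of pages it links to.
graph = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}
pages, M = transition_matrix(graph)
print(len(M), len(M[0]))  # 4 4
v0 = [Fraction(1) if p == "A" else Fraction(0) for p in pages]  # surfer starts at A
print([str(x) for x in step(M, v0)])  # ['0', '1/3', '1/3', '1/3']
```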