Mapreduce Flashcards
What is the MapReduce Programming Model’s Data type?
Key-value records
What is the MapReduce Programming Model’s map function?
(Kin, Vin)–>list(Kinter, Vinter)
What is the MapReduce Programming Model’s reduce function?
(Kinter, list(Vinter))–>list(Kout, Vout)
Check notes for wordcount example
Check notes for wordcount example
What is Apache Hadoop MapReduce?
An open source implementation of Google’s MapReduce framework.
There are two ways to write jobs:
–Java
–Hadoop Streaming(for Python, Perl, etc)
Explain Google PageRank.
The PageRank algorithm rates linked documents (web pages)
It is the basis of the Google search engine for the ranking of web pages.
Principle: The numerical weight (PageRank) PRp of a web page p depends on the number and the numerical weight of the web pages, which link to p.
What is the PageRank algorithm?
Go to notes
Parallelisation improves computation efficiency but what are some challenges it has faced?
Managing multiple servers is difficult because servers need coordination and server failure should not affect job execution.
Achieving parallelisation is difficult because tasks must be made fully independent. It’s hardware to rationale program irregularities (debugging).