4 - MapReduce Flashcards
MapReduce is what kind of model? Developed by who?
Simple data-parallel programming model for scalability and fault tolerance
How is MapReduce used by Google?
Index construction for Google Search
Article clustering for Google News
Statistical machine translation
Functional Programming
Computation as application of functions
How is functional programming different?
Traditional notions of data and instruction not applicable
Data flows are implicit in program
Different orders of execution possible
Lisp
List Processing
Lists are primitives and functions written in prefix notation
Concept: Map
Do something to everything on a list to make a new list from the results
Concept: Fold
Combine or accumulate results in a list
How does “fold” work?
Accumulator set to init value
Func applied to list elem and the accumulator
Result stored in the accumulator
Repeated for every item in the list
Result is the final value in accumulator
Map example
(map (lambda(x) (*x x))
‘(1 2 3 4 5 ) -> ???
‘(1 4 9 16 25)
Fold example
Ignore this card for now
When can we reorder folding?
If the fold function is cumulative and associative (order is irrelevant)
Is there a limit to map parallelisation?
No since maps are independent
Apache Hadoop Architecture: Master
communicates with workers, tracks resources and orchestrates work
Apache Hadoop: Worker
Launches and tracks processes spawned on worker host
For a job to be parallelisable, its tasks need to be …
Independent