MapReduce Flashcards
What is MapReduce, and what is Hadoop MapReduce?
MapReduce is a programming model for processing and generating large datasets in parallel across a cluster.
Hadoop MapReduce is an open-source implementation of this model and is part of the Apache Hadoop framework.
How does MapReduce work?
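1) Iterate over a large number of input records.
2) Extract data from each record as key-value pairs (the map step).
3) Group the pairs by key and aggregate the values for each key (the shuffle and reduce steps).
4) Save the results to HDFS.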
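The four steps can be illustrated with a single-process word count. This is a minimal sketch, not Hadoop code: the function names are hypothetical, the "shuffle" is a plain dictionary grouping, and step 4 (writing to HDFS) is replaced by returning a Python dict.

```python
from collections import defaultdict

def map_phase(records):
    """Steps 1-2: iterate over the records and emit a (word, 1) pair per word."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Group all values that share a key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Step 3: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

def run_job(records):
    # Step 4 would save the output to HDFS; this sketch just returns it.
    return reduce_phase(shuffle(map_phase(records)))

result = run_job(["the cat sat", "the dog sat"])
print(sorted(result.items()))  # [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```

In a real cluster, the map calls run in parallel on different input splits and the reduce calls run in parallel on different key groups; the data flow is the same.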
What is a combiner, and how does it work with MapReduce?
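A combiner is like a mini reducer: it runs on each mapper's output during the map phase to pre-aggregate data locally. It can only be used safely when the aggregation function is associative and commutative (for example, a sum or a maximum).
It runs before the shuffle, that is, before map outputs are sent to the reducers for final aggregation.
It reduces the amount of intermediate data and therefore the network traffic between mappers and reducers.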
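The effect of a combiner can be shown by counting how many key-value pairs would leave the mappers with and without local pre-aggregation. This is a minimal single-process sketch with hypothetical function names; it assumes a word count, where summing is associative and commutative, so combining early does not change the final result.

```python
from collections import Counter

def mapper(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def combiner(pairs):
    """Pre-aggregate one mapper's output locally (safe because + is associative and commutative)."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reducer(all_pairs):
    """Final aggregation across all mappers."""
    totals = Counter()
    for key, value in all_pairs:
        totals[key] += value
    return dict(totals)

splits = ["a b a a", "b b a"]             # two hypothetical input splits, one per mapper
raw = [mapper(s) for s in splits]
combined = [combiner(p) for p in raw]

raw_count = sum(len(p) for p in raw)            # pairs shuffled without a combiner
combined_count = sum(len(p) for p in combined)  # pairs shuffled with a combiner

raw_result = reducer(pair for p in raw for pair in p)
combined_result = reducer(pair for p in combined for pair in p)
print(raw_count, combined_count)        # 7 4
print(raw_result == combined_result)    # True
```

Because Hadoop treats the combiner as an optimization and does not guarantee how many times it runs, the job must produce the same answer whether or not the combiner is applied, which is what the final comparison checks.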
What is partitioning in MapReduce?
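The partitioner decides which reducer receives each map output record by applying a function to the record's key. Hadoop's default partitioner hashes the key and takes the result modulo the number of reducers, so all records with the same key go to the same reducer.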
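Hash partitioning can be sketched in a few lines. This is an illustration under stated assumptions, not Hadoop's implementation: Hadoop's default HashPartitioner uses the key's Java hashCode(), while this sketch substitutes zlib.crc32 because Python's built-in hash() for strings is randomized per process. The function names are hypothetical.

```python
import zlib

def partition(key: str, num_reducers: int) -> int:
    """Return the index of the reducer (0 .. num_reducers - 1) that should receive this key."""
    # Stand-in hash: crc32 is deterministic across runs and never negative in Python 3.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def route(pairs, num_reducers):
    """Send each (key, value) pair to the bucket chosen by the partitioner."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets

pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
buckets = route(pairs, num_reducers=3)
for i, bucket in enumerate(buckets):
    print(i, bucket)
```

Whatever the hash function, the key property is that it depends only on the key, so both "apple" pairs always end up in the same bucket and therefore at the same reducer.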