MapReduce Flashcards
Which technology is MapReduce a part of?
Hadoop. Hadoop consists of HDFS and MapReduce.
What is the data format of the input in MapReduce?
(key, value) pairs of arbitrary serializable types; each pair should fit in memory.
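A minimal illustration of such input, assuming (purely as an example) that the keys are document names and the values are their contents:

```python
# Hypothetical MapReduce input: a small list of (key, value) pairs,
# here assumed to be (document name, document text). Any serializable
# types would do, as long as each pair fits in memory.
input_pairs = [
    ("doc1.txt", "w1 w2 w3"),
    ("doc2.txt", "w2 w3 w3 w3"),
]
```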
What strategy should be employed when cluster components fail during computation in MapReduce?
Split the computation into many small tasks. If a task fails to deliver its result, restart just that task.
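A minimal sketch of this restart-on-failure idea; the helper name and the retry cap are assumptions for illustration, not part of the actual Hadoop scheduler:

```python
# Run a small, self-contained task and, if it fails, simply run it again
# (capped here at an assumed retry limit).
def run_with_retries(task, max_retries=3):
    for _ in range(max_retries):
        try:
            return task()      # a small unit of work
        except Exception:
            continue           # the task failed: restart it from scratch
    raise RuntimeError("task failed after all retries")
```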
Where does the data come from in MapReduce?
The (H)DFS
What are the 4 steps in the Map task?
- Read the input (key, value) pairs from the DFS
- One Map task per chunk of input (the task is scheduled on or near the machine where that chunk is stored)
- Compute any number of intermediate (key, value) pairs, as defined by your Map function (see the sketch after this list)
- Write the output to a buffer region on the local (!) disk
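A minimal sketch of a user-defined Map function, assuming a word-count job where each input value is a line of text (the function name and the word-count scenario are illustrative assumptions):

```python
# Sketch of a user-defined Map function for word counting.
# Input: one (key, value) pair, e.g. (line number, line of text).
# Output: a list of intermediate (key, value) pairs, here (word, 1).
def map_fn(key, value):
    return [(word, 1) for word in value.split()]

# map_fn(0, "w1 w2 w3 w2") -> [("w1", 1), ("w2", 1), ("w3", 1), ("w2", 1)]
```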
What is the main operation of the Shuffle (Master controller) task?
It keeps track of the (key, value) pairs output by all Map tasks, then performs a distributed group-by-key, producing each key together with the list of all values associated with it.
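A minimal, single-machine sketch of what this group-by-key conceptually does (in a real cluster the grouping is distributed, not a single dictionary):

```python
from collections import defaultdict

# Conceptual shuffle: collect the pairs emitted by all Map tasks and
# group the values by key, yielding (key, [values]) pairs for Reduce.
def shuffle(mapped_pairs):
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return list(groups.items())

# shuffle([("w2", 1), ("w3", 1), ("w2", 1)]) -> [("w2", [1, 1]), ("w3", [1])]
```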
What 3 qualities define the Reduce task?
- One Reduce call works on one key at a time
- Computes a combined value for each key, as defined by your Reduce function (see the sketch after this list)
- The output is saved to (H)DFS files (one output file per Reduce task)
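A minimal sketch of a user-defined Reduce function, continuing the assumed word-count example:

```python
# Sketch of a user-defined Reduce function for word counting: it receives
# one key together with the list of all its values and combines them into
# a single output value, here their sum.
def reduce_fn(key, values):
    return (key, sum(values))

# reduce_fn("w3", [1, 1, 1, 1]) -> ("w3", 4)
```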
What are the 3 network (switch) levels of data locality when a MapReduce task reads its input from HDFS, in order of fastest to slowest?
- Data local (the data is on the same machine as the task)
- Rack local (the data is on a different machine in the same rack)
- Off rack (the data is on a machine in a different rack)
In general terms, what does a MapReduce job do?
It condenses many data entries that share the same value into a single, user-specified new (key, value) pair, where the value is often a count. Example: given the input “w1, w2, w3, w2, w3, w3, w3”, the output could be “(w1, 1), (w2, 2), (w3, 4)”.
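Putting the earlier sketches together, a toy single-machine run that reproduces this word-count output (no Hadoop involved, purely illustrative):

```python
from collections import defaultdict

# Toy end-to-end word count mimicking the MapReduce flow: map every input
# pair, group the intermediate pairs by key, then reduce each group.
def word_count(input_pairs):
    mapped = []
    for _, text in input_pairs:
        mapped.extend((word, 1) for word in text.split())

    groups = defaultdict(list)
    for word, one in mapped:
        groups[word].append(one)

    return [(word, sum(counts)) for word, counts in groups.items()]

print(word_count([("doc", "w1 w2 w3 w2 w3 w3 w3")]))
# -> [('w1', 1), ('w2', 2), ('w3', 4)]
```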