MapReduce Flashcards

1
Q

What is MapReduce?

A

A programming paradigm (a way of structuring programs) that harnesses the combined computational resources of a cluster by spreading the work across its machines.

2
Q

What does distributed computing need?

A

Storage and computation.

3
Q

What is MapReduce a programming model for and which languages can implement it?

A

It is a programming model for processing large data sets, and it can be implemented in any programming language.

4
Q

What is the Map Phase?

A

1) Divide the data set into chunks.
2) Have a separate process work on each chunk.
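
The two steps above can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: the names make_chunks and process_chunk are hypothetical, and threads stand in for the separate processes that a real cluster would run on different machines.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chunks(records, chunk_size):
    """Step 1: divide the data set into chunks (the last one may be shorter)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Hypothetical per-chunk work: count the words in each line of the chunk."""
    return [len(line.split()) for line in chunk]


records = ["a b c", "d e", "f", "g h i j", "k"]
chunks = make_chunks(records, chunk_size=2)

# Step 2: one worker per chunk. Threads are an assumption made to keep the
# sketch self-contained; a cluster would use processes on separate nodes.
with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    results = list(pool.map(process_chunk, chunks))

print(chunks)
print(results)
```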

5
Q

What’s another name for chunks?

A

Input splits.

6
Q

What’s another name for the process working on the chunks/input splits?

A

Mappers.

7
Q

What are some qualities of a mapper?

A

1) Each mapper processes one record at a time.
2) Each mapper executes the same set of code on each record.
3) The output of the mapper will be a key-value pair.
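
A minimal sketch of these three qualities, assuming a word-count job. The function names word_count_mapper and run_mapper are hypothetical and are not part of any Hadoop API.

```python
def word_count_mapper(record):
    """Hypothetical mapper: emit a (word, 1) key-value pair per word in one record."""
    for word in record.lower().split():
        yield (word, 1)


def run_mapper(input_split):
    """Run the same mapper code on each record, one record at a time."""
    pairs = []
    for record in input_split:
        pairs.extend(word_count_mapper(record))
    return pairs


pairs = run_mapper(["the cat", "The dog"])
print(pairs)
```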

8
Q

What are some features of an input split?

A

1) Input split respects logical record boundaries.
2) An input split is a logical abstraction (a Java class that works behind the scenes and holds pointers to the start and end locations within blocks).
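
One way to picture both features is a split as a pair of (start, end) offsets that is stretched to the end of the current record. The sketch below is a simplified illustration under that assumption, with newline-terminated records and a hypothetical compute_splits function. It is not how Hadoop's own split classes are implemented.

```python
def compute_splits(data: bytes, block_size: int):
    """Return (start, end) offsets; every split ends just after a newline."""
    splits = []
    start = 0
    while start < len(data):
        end = min(start + block_size, len(data))
        if end < len(data):
            # Extend the split until the record that crosses the block
            # boundary is complete, so that no record is cut in half.
            newline = data.find(b"\n", end - 1)
            end = len(data) if newline == -1 else newline + 1
        splits.append((start, end))
        start = end
    return splits


data = b"alpha\nbeta\ngamma\n"
splits = compute_splits(data, block_size=8)
print(splits)
print([data[s:e] for s, e in splits])
```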

9
Q

What is a mapper?

A

A program that the Hadoop framework invokes once for every record in the input split. The output of the mapper should be a key-value pair.

Example: 10 records in the input split means the mapper will be executed 10 times.

10
Q

What is the Reduce Phase?

A

The reducers work on the output of the mappers.

The outputs of the individual mappers are grouped by key, and each key's group of values is passed to a reducer.
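
A minimal sketch of grouping by key and reducing, assuming a word-count job with a summing reducer. The names group_by_key and sum_reducer are hypothetical.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def sum_reducer(key, values):
    """Hypothetical reducer: add up all values seen for one key."""
    return (key, sum(values))


# Output of two separate mappers.
mapper_outputs = [
    [("the", 1), ("cat", 1)],
    [("the", 1), ("dog", 1)],
]
all_pairs = [pair for output in mapper_outputs for pair in output]
grouped = group_by_key(all_pairs)
reduced = [sum_reducer(key, values) for key, values in sorted(grouped.items())]
print(reduced)
```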

11
Q

What is the shuffle phase?

A

The process in which the output of the mappers is transferred to the reducers.

12
Q

What is the shuffle phase sort?

A

In the map phase, each key is assigned to a partition by a class called the partitioner. Within each partition, the key-value pairs are sorted by key.
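
A rough sketch of partitioning followed by a per-partition sort. The hash used here (zlib.crc32 modulo the number of partitions) is an assumption chosen so the result is repeatable in Python. It only stands in for whatever hash a real partitioner uses, and the function names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Assign a key to a partition; the same key always gets the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_and_sort(pairs, num_partitions):
    """Place each pair in its partition, then sort every partition by key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    for part in partitions:
        part.sort(key=lambda kv: kv[0])
    return partitions


pairs = [("dog", 1), ("ant", 1), ("cat", 1), ("ant", 1), ("bee", 1)]
partitions = partition_and_sort(pairs, num_partitions=2)
for index, part in enumerate(partitions):
    print(index, part)
```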

13
Q

What is the shuffle phase copy?

A

Once the key-value pairs are sorted, they are copied to the appropriate reducer based on the partition they belong to.

One partition == one reducer.
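
The "one partition == one reducer" rule can be sketched as follows, assuming every mapper has already produced one sorted list per partition. The function copy_to_reducers and the sample data are hypothetical.

```python
def copy_to_reducers(mapper_partitions, num_reducers):
    """Reducer i receives partition i from every mapper."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for partitions in mapper_partitions:
        for index, sorted_run in enumerate(partitions):
            reducer_inputs[index].append(sorted_run)
    return reducer_inputs


# Two mappers, each with two partitions that are already sorted by key.
mapper_partitions = [
    [[("ant", 1)], [("bee", 1)]],
    [[("ant", 2)], [("cat", 1)]],
]
reducer_inputs = copy_to_reducers(mapper_partitions, num_reducers=2)
print(reducer_inputs[0])
print(reducer_inputs[1])
```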

14
Q

What is Shuffle Merge?

A

The merging of the sorted key-value pairs that arrive from different mappers into a single stream that maintains the sort order.

Each key is handled by exactly one reducer.
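
A small sketch of the merge step, assuming each mapper has delivered a list that is already sorted by key. It uses the standard library's heapq.merge to keep the sort order; the sample data is made up.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Sorted runs that one reducer received from two different mappers.
runs = [
    [("ant", 1), ("cat", 1)],
    [("ant", 1), ("bee", 1), ("cat", 1)],
]

# Merge the runs into one stream without breaking the sort order.
merged = list(heapq.merge(*runs, key=itemgetter(0)))

# Equal keys are now adjacent, so they can be grouped for the reducer.
grouped = [
    (key, [value for _, value in group])
    for key, group in groupby(merged, key=itemgetter(0))
]
print(merged)
print(grouped)
```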

15
Q

What is the combiner?

A

An optional step in the map phase: the combiner is used to reduce the amount of data that is sent to the reducer.

It acts as a mini-reducer that runs after the mapper and before the reducer.
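
A minimal sketch of the "mini-reducer" idea for a word-count job, where the combiner pre-sums one mapper's output before anything is sent on. The function name combine is hypothetical.

```python
from collections import Counter


def combine(mapper_output):
    """Pre-aggregate one mapper's pairs locally, yielding one pair per key."""
    totals = Counter()
    for key, value in mapper_output:
        totals[key] += value
    return sorted(totals.items())


mapper_output = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]
combined = combine(mapper_output)
print(len(mapper_output), "pairs before,", len(combined), "pairs after")
print(combined)
```

In this example four pairs shrink to two, which is the saving in data transferred to the reducer.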

16
Q

Can we use a combiner program in any MapReduce problem?

A

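No. A combiner can be used in many MapReduce problems, but not all: it is safe only when the reduce operation is commutative and associative (for example sum, count, or max).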
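
As a caution, the sketch below shows a case where reusing the reducer as a combiner gives a wrong answer: computing an average. The data and variable names are made up for illustration. The usual workaround, shown at the end, is to have the combiner emit (sum, count) pairs instead of averages.

```python
def mean(values):
    return sum(values) / len(values)


# Values for the same key seen by two different mappers.
mapper_a = [1, 2, 3, 4]
mapper_b = [10]

true_mean = mean(mapper_a + mapper_b)

# Wrong: each combiner averages locally, then the reducer averages the averages.
mean_of_means = mean([mean(mapper_a), mean(mapper_b)])

# Workaround: combiners emit (sum, count); the reducer adds them up and divides.
partials = [(sum(mapper_a), len(mapper_a)), (sum(mapper_b), len(mapper_b))]
total, count = map(sum, zip(*partials))
fixed_mean = total / count

print(true_mean, mean_of_means, fixed_mean)
```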