E5 Flashcards

1
Q

How do distributed file systems work?

A
  1. Each file is broken into fixed-size blocks
  2. These blocks are replicated across the HDFS servers (datanodes)
  3. The namenode provides a lookup service that tells clients which datanodes hold each block, and it ensures the blocks are correctly replicated across the cluster
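The three steps above can be sketched as a toy in-memory simulation. This is not HDFS code: `NameNode`, `write_file`, `read_file`, the round-robin placement policy, the 4-byte block size, and the datanode names are all hypothetical choices made for illustration. The sketch only shows the division of labour: data is split into blocks, each block is copied to several datanodes, and the namenode keeps just the metadata that clients use to find blocks.

```python
import itertools

BLOCK_SIZE = 4   # bytes per block; hypothetical and tiny, for illustration only
REPLICATION = 3  # number of copies kept of each block (assumed value)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Step 1: break a file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class NameNode:
    """Hypothetical stand-in for the namenode: it holds metadata only, never file data."""

    def __init__(self, datanodes: list[str], replication: int = REPLICATION):
        if replication > len(datanodes):
            raise ValueError("need at least as many datanodes as replicas")
        self.datanodes = datanodes
        self.replication = replication
        self.block_map: dict[str, list[list[str]]] = {}  # filename -> replica nodes per block
        self._start = itertools.cycle(range(len(datanodes)))

    def allocate(self, filename: str, num_blocks: int) -> list[list[str]]:
        """Step 2: pick `replication` distinct datanodes for each block (simple round-robin)."""
        placements = []
        for _ in range(num_blocks):
            s = next(self._start)
            placements.append([self.datanodes[(s + k) % len(self.datanodes)]
                               for k in range(self.replication)])
        self.block_map[filename] = placements
        return placements

    def lookup(self, filename: str) -> list[list[str]]:
        """Step 3: tell a client which datanodes hold each block of a file."""
        return self.block_map[filename]


# Each simulated datanode stores (filename, block index) -> block bytes.
DATANODES = ["dn1", "dn2", "dn3", "dn4"]
storage: dict[str, dict[tuple[str, int], bytes]] = {dn: {} for dn in DATANODES}
namenode = NameNode(DATANODES)


def write_file(filename: str, data: bytes) -> None:
    blocks = split_into_blocks(data)
    for idx, nodes in enumerate(namenode.allocate(filename, len(blocks))):
        for dn in nodes:  # copy the block to every replica chosen by the namenode
            storage[dn][(filename, idx)] = blocks[idx]


def read_file(filename: str) -> bytes:
    parts = []
    # The client asks the namenode where the blocks live, then reads from datanodes.
    for idx, nodes in enumerate(namenode.lookup(filename)):
        for dn in nodes:  # use the first replica that still has the block
            block = storage[dn].get((filename, idx))
            if block is not None:
                parts.append(block)
                break
        else:
            raise IOError(f"all replicas of block {idx} lost")
    return b"".join(parts)
```

For example, after `write_file("log.txt", b"hello distributed world")`, wiping one datanode with `storage["dn1"].clear()` still lets `read_file("log.txt")` return the original bytes, because every block has two other replicas.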
2
Q

Hadoop MapReduce

A
  • MapReduce is a distributed computing paradigm originally pioneered by Google; Hadoop MapReduce is an open-source implementation of it
  • Used to process data in the batch layer
3
Q

Map

A

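input data is split into discrete chunks, and a map function is applied to each chunk independently (so chunks can be processed in parallel), emitting intermediate key-value pairs

(split-apply)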
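The split-apply idea can be sketched with a word-count job, the classic MapReduce example. This is plain Python, not the Hadoop API: `split_input`, `word_count_map`, `map_phase`, and the use of a thread pool to stand in for parallel map tasks are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_input(lines: list[str], num_chunks: int) -> list[list[str]]:
    """Split: divide the input into roughly equal, independent chunks."""
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def word_count_map(chunk: list[str]) -> list[tuple[str, int]]:
    """Apply: a hypothetical word-count mapper emitting (word, 1) for every word."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def map_phase(lines: list[str], num_chunks: int = 2) -> list[tuple[str, int]]:
    chunks = split_input(lines, num_chunks)
    # Chunks share no state, so each one can be handed to a separate worker.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(word_count_map, chunks))
    # Concatenate every mapper's intermediate (key, value) pairs.
    return [pair for chunk_pairs in results for pair in chunk_pairs]
```

Because each mapper sees only its own chunk, adding more workers (or, in a real cluster, more nodes) speeds up this phase without changing the map function.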

4
Q

Reduce

A

output of the map phase is grouped by key and aggregated to produce the desired result

(combine)
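The reduce step can be sketched as grouping intermediate pairs by key (often called the shuffle) and then applying a reducer to each group. Again this is illustrative plain Python, not Hadoop's API: `shuffle`, `reduce_phase`, and `sum_reducer` are hypothetical names, and the sample input is the kind of `(word, 1)` output a word-count mapper would emit.

```python
from collections import defaultdict
from typing import Callable, Iterable


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all intermediate values by key so each reducer sees one key's values."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(pairs: Iterable[tuple[str, int]],
                 reducer: Callable[[str, list[int]], int]) -> dict[str, int]:
    """Combine: apply the reducer to each key's values to produce the final result."""
    return {key: reducer(key, values) for key, values in sorted(shuffle(pairs).items())}


def sum_reducer(key: str, values: list[int]) -> int:
    """Hypothetical word-count reducer: total occurrences of one word."""
    return sum(values)


map_output = [("the", 1), ("cat", 1), ("the", 1), ("sat", 1), ("the", 1)]
word_counts = reduce_phase(map_output, sum_reducer)  # {'cat': 1, 'sat': 1, 'the': 3}
```

Since each key's group is reduced independently, different keys can be assigned to different reducers and aggregated in parallel.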

5
Q

The simplicity of the MapReduce programming model lends itself to

A

efficient and large-scale implementations across thousands of cheap nodes

6
Q

Key benefits of MapReduce

A
  1. Simplicity
  2. Scalability
  3. Speed
  4. Recovery
  5. Minimal data motion
7
Q

Limitations of MapReduce

A
  1. MapReduce is designed specifically for batch processing
  2. Low-level framework (hard to use)
