E5 Flashcards
1
Q
How do distributed file systems (such as HDFS) work?
A
- All files are broken into blocks
- These blocks are replicated among the HDFS servers (datanodes)
- The namenode provides a lookup service for clients accessing data and ensures the blocks are correctly replicated across the cluster
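A minimal sketch of the idea in this answer, not how HDFS is actually implemented. The tiny block size, the replication factor of 3, the datanode names, and the round-robin placement are all assumptions made for illustration; real HDFS uses much larger blocks and its own placement policy.

```python
# Toy model of block splitting and replica placement (illustrative only).
BLOCK_SIZE = 4      # bytes per block; assumed, kept tiny so the example is readable
REPLICATION = 3     # copies kept of each block; assumed value


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Break a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Build a namenode-style lookup table: block index -> datanodes holding a copy.

    Round-robin placement is a simplifying assumption of this sketch.
    """
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(b"hello world!")
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])

# A client asks the lookup table where block 1 lives, then reads from any replica.
replicas_for_block_1 = block_map[1]
```

Because every block lives on several datanodes, losing one node leaves at least two copies of each block available in this toy setup.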
2
Q
Hadoop MapReduce
A
- Hadoop MapReduce is an open-source implementation of MapReduce, a distributed computing paradigm originally pioneered by Google
- Used to process data in the batch layer
3
Q
Map
A
input data is split into discrete chunks, and each chunk is processed independently
(split-apply)
4
Q
Reduce
A
output of the map phase is aggregated to produce the desired result
(combine)
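The Map and Reduce cards can be illustrated with a word count run in a single process. This is a sketch of the paradigm only, not the Hadoop API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample chunks are hypothetical, and the explicit grouping step stands in for what a framework would do between the two phases.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: turn one input chunk into (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key so each key can be reduced on its own."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: aggregate the values for each key, here by summing the counts."""
    return {key: sum(values) for key, values in grouped.items()}


# Input already split into discrete chunks (assumed sample data).
chunks = ["the quick fox", "the lazy dog", "the fox"]

# Each chunk is mapped independently; in a cluster this would run in parallel.
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
```

Since `map_phase` touches only its own chunk, chunks can be handed to different nodes without coordination, which is what makes the model easy to scale.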
5
Q
The simple nature of the programming model (MapReduce) lends itself to
A
efficient and large-scale implementations across thousands of cheap nodes
6
Q
Key benefits of MapReduce
A
- Simplicity
- Scalability
- Speed
- Recovery
- Minimal data motion
7
Q
Limitations of MapReduce
A
- MapReduce is designed specifically for batch processing
- Low-level framework (hard to use)