E5 Flashcards

Question 1

Q

How do distributed file systems such as HDFS work?

Answer

A

All files are broken into blocks
These blocks are replicated among the HDFS servers (datanodes)
The namenode provides a lookup service for clients accessing data and ensures the blocks are correctly replicated across the cluster
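
A minimal sketch of the idea, not HDFS's actual code. The block size, replication factor, datanode names, and the round-robin placement rule are all assumptions chosen for illustration (real HDFS uses far larger blocks and a rack-aware placement policy).

```python
BLOCK_SIZE = 4    # bytes; hypothetical and tiny, only for illustration
REPLICATION = 2   # hypothetical number of copies kept of each block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Break a file's contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, datanodes: list, replication: int = REPLICATION) -> dict:
    """Toy namenode table: block index -> datanodes holding a replica.

    Uses simple round-robin placement (an assumption, not HDFS's policy).
    Requires replication <= len(datanodes) so replicas land on distinct nodes.
    """
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


blocks = split_into_blocks(b"hello world")
table = place_blocks(len(blocks), ["dn1", "dn2", "dn3"])

# A client asks the "namenode" table where block 1 lives, then reads it there.
print(table[1])
```

The lookup table plays the namenode's role: clients consult it to find a block, and it records that every block has the required number of replicas.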

Question 2

Q

Hadoop MapReduce

Answer

A

Hadoop MapReduce is an open-source implementation of MapReduce, a distributed computing paradigm originally pioneered by Google
Used to process data in the batch layer

Question 3

Q

Map

Answer

A

input data is split into discrete chunks to be processed

(split-apply)
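
A rough word-count sketch of the map phase in plain Python, not the Hadoop API. The helper names `split_into_chunks` and `map_chunk`, the chunk size, and the sample lines are hypothetical.

```python
def split_into_chunks(lines: list, chunk_size: int) -> list:
    """Split the input into discrete chunks that can be processed independently."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk: list) -> list:
    """Apply the map function to one chunk: emit a (word, 1) pair per word."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


lines = ["a b", "b c", "c a", "a"]      # made-up input
chunks = split_into_chunks(lines, 2)    # split
mapped = [map_chunk(c) for c in chunks] # apply (could run on separate nodes)
```

Each chunk is mapped without reference to any other chunk, which is what allows the work to be spread across many machines.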

Question 4

Q

Reduce

Answer

A

output of the map phase is aggregated to produce the desired result

(combine)
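
A rough word-count sketch of the reduce phase in plain Python, not the Hadoop API. The function name `reduce_counts` and the sample `map_output` pairs are hypothetical; the grouping step stands in for the shuffle that a real framework performs between map and reduce.

```python
from collections import defaultdict


def reduce_counts(pairs: list) -> dict:
    """Group (key, value) pairs by key, then combine each group into one total."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)  # group values by key (the "shuffle")
    return {key: sum(values) for key, values in grouped.items()}


# Made-up map output: one (word, 1) pair per word occurrence.
map_output = [("a", 1), ("b", 1), ("b", 1), ("c", 1), ("c", 1), ("a", 1), ("a", 1)]
totals = reduce_counts(map_output)
print(totals)  # {'a': 3, 'b': 2, 'c': 2}
```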

Question 5

Q

The simple nature of the programming model (MapReduce) lends itself to

Answer

A

efficient and large-scale implementations across thousands of cheap nodes

Question 6

Q

Key benefits of MapReduce

Answer

A

Question 7

Q

Limitations of MapReduce

Answer

A

2. Low-level framework (hard to use)

(7 cards)