EMR Flashcards
1
Q
What are the different nodes for EMR?
A
- Master Node
- Core Node
- Task Node
2
Q
What file system can EMR use?
A
- HDFS
- EMRFS
- Local file system (instance store or EBS volumes)
3
Q
Is there data stored on Task Nodes?
A
No. Task Nodes only run tasks; they do not store HDFS data
4
Q
Should you use HDFS for a large number of small files?
A
No. Every file adds metadata that the NameNode must hold in memory, so use HBase instead
5
Q
What are the two types of clusters?
A
- Persistent Clusters
- Transient Clusters
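A minimal sketch of how the two cluster types can differ in a launch request, assuming the cluster is created through boto3's `run_job_flow`. The helper name `cluster_request` and the cluster names are hypothetical, and the dictionary is only a fragment: a real request also needs instance, release, and role settings.

```python
def cluster_request(name, transient):
    """Build a partial run_job_flow request (illustrative, not complete)."""
    return {
        "Name": name,
        "Instances": {
            # False: the cluster terminates once all steps finish (transient).
            # True: the cluster keeps running with no steps (persistent).
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
    }


if __name__ == "__main__":
    print(cluster_request("nightly-etl", transient=True))
    print(cluster_request("shared-analytics", transient=False))
```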
6
Q
What processing paradigm does EMR use?
A
MapReduce
7
Q
What are the 7 main steps of MapReduce?
A
Input, Split, Map, Shuffle, Sort, Reduce, Output
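The seven steps can be sketched as a single-process word count. This is a toy illustration in plain Python, not Hadoop code; all function names are made up for the example, and a real job would run the map and reduce phases on separate nodes.

```python
from collections import defaultdict


def split_input(text, n_splits):
    """Split: divide the input lines into n_splits chunks."""
    lines = text.splitlines()
    return [lines[i::n_splits] for i in range(n_splits)]


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_splits):
    """Shuffle: gather all values that share a key."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def sort_keys(groups):
    """Sort: order the grouped keys before reducing."""
    return sorted(groups.items())


def reduce_phase(sorted_groups):
    """Reduce: collapse each key's values into one count."""
    return [(key, sum(values)) for key, values in sorted_groups]


def word_count(text, n_splits=2):
    splits = split_input(text, n_splits)      # Input, Split
    mapped = [map_phase(s) for s in splits]   # Map
    grouped = shuffle(mapped)                 # Shuffle
    ordered = sort_keys(grouped)              # Sort
    return reduce_phase(ordered)              # Reduce, Output


if __name__ == "__main__":
    for word, count in word_count("the cat\nthe dog\nthe end"):
        print(word, count)
```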
8
Q
Why use HDFS?
A
It is fault tolerant: each data block is replicated across multiple nodes
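A toy sketch of why replication gives fault tolerance. It assumes a replication factor of 3 (the HDFS default) and a simple round-robin placement; real HDFS placement is rack-aware, and the function, node, and block names here are hypothetical.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(block_ids)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


if __name__ == "__main__":
    placement = place_replicas(["b0", "b1", "b2"], ["n1", "n2", "n3", "n4"])
    # With two of four nodes down, every block keeps a live copy.
    print(readable_blocks(placement, failed_nodes={"n1", "n2"}))
```

A block becomes unreadable only when every node holding one of its replicas has failed.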