Hadoop Ecosystem Flashcards
1
Q
Hadoop Ecosystem
A
One possible Hadoop ecosystem stack: HDFS for storage, YARN for scheduling, with processing and database tools layered on top
2
Q
HDFS
A
- Scalable Storage
- Fault Tolerance
3
Q
YARN
A
Flexible scheduling and resource management
==> YARN schedules jobs on > 40,000 servers at Yahoo
4
Q
MapReduce
A
Simplified programming model
- Map -> apply()
- Reduce -> summarize()
==> Google used MapReduce to index websites
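A minimal single-process sketch of the map -> reduce idea on this card, using word count as the example. It is not Hadoop's API: the names `map_phase`, `shuffle`, `reduce_phase`, `word_mapper`, and `count_reducer` are hypothetical and chosen only for illustration. A real cluster would run the same phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record; each call yields (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Summarize each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Map: emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Reduce: add up the 1s emitted for a word."""
    return sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

The mapper and reducer are the only parts a programmer writes in this model; grouping by key is left to the framework, which is what makes the programming model simple.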
5
Q
Hive-Pig
A
Higher-level programming models
- Pig = dataflow scripting
- Hive = SQL-like queries
==> Pig created at Yahoo, Hive created at Facebook
6
Q
Giraph
A
Specialized models for graph processing
==> Giraph used by Facebook to analyze social graphs
7
Q
Storm-Spark-Flink
A
Real-time and in-memory processing
==> In-memory processing can be up to 100x faster for some tasks
8
Q
HBase-Cassandra-MongoDB
A
NoSQL stores for data that is not file-based
- Key-values
- Sparse tables
==> HBase used for Facebook’s Messaging Platform
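A rough sketch of what "sparse table" means on this card: each row stores only the columns it actually has, so empty cells cost nothing. This is a plain in-memory illustration, not the HBase, Cassandra, or MongoDB API; the class `SparseTable` and its methods are hypothetical names.

```python
class SparseTable:
    """Row key -> {column name -> value}; absent cells are simply not stored."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        """Store one cell, creating the row on first write."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        """Look up one cell; a missing row or column returns the default."""
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        """List the columns present for a row, in sorted order."""
        return sorted(self._rows.get(row_key, {}))

    def cell_count(self):
        """Count stored cells across all rows."""
        return sum(len(cells) for cells in self._rows.values())


if __name__ == "__main__":
    table = SparseTable()
    table.put("user1", "name", "Ada")
    table.put("user1", "email", "ada@example.com")
    table.put("user2", "name", "Lin")  # user2 has no email cell at all
    print(table.columns("user2"), table.cell_count())
```

Each lookup is a key-value access (row key plus column to value), which is why the key-value and sparse-table bullets sit together on the same card.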
9
Q
ZooKeeper
A
ZooKeeper for management
- Synchronization
- High-availability
- Configuration
==> Created by Yahoo to wrangle services named after animals