Hadoop Ecosystem Flashcards

1
Q

Hadoop Ecosystem

A

One possible Hadoop Ecosystem

2
Q

HDFS

A
  • Scalable Storage
  • Fault Tolerance
3
Q

YARN

A

Flexible scheduling and resource management

==> YARN schedules jobs on more than 40,000 servers at Yahoo

4
Q

MapReduce

A

Simplified programming model

  • Map -> apply()
  • Reduce -> summarize()

==> Google used MapReduce for indexing web sites
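The Map and Reduce roles above can be sketched in plain Python as a word count. This is a single-process illustration only, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are hypothetical, and the explicit shuffle step stands in for the grouping a real framework does between the two phases.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: apply a function to every record, emitting (key, value) pairs."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key (done by the framework between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: summarize the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input records
documents = ["big data big clusters", "data pipelines"]
word_counts = reduce_phase(shuffle(map_phase(documents)))
print(word_counts)
```

In a real cluster the map and reduce functions run in parallel on many machines; only the two small functions would be written by the programmer.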

5
Q

Hive-Pig

A

Higher-level programming models

  • Pig = dataflow scripting
  • Hive = SQL-like queries

==> Pig created at Yahoo, Hive created at Facebook

6
Q

Giraph

A

Specialized models for graph processing

==> Giraph used by Facebook to analyze social graphs
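As a rough illustration of a graph-specific model, the sketch below imitates the vertex-centric, superstep-based style that Giraph follows (each vertex reads messages, updates its value, and messages its neighbors). It is a toy single-process version in Python, not Giraph's Java API; the function name `connected_components` and the sample edges are hypothetical.

```python
def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component."""
    # Build an undirected adjacency map from the edge list
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Each vertex starts with its own id as its label
    labels = {v: v for v in neighbors}
    # Superstep 0: every vertex sends its label to its neighbors
    inbox = {v: [labels[n] for n in neighbors[v]] for v in neighbors}

    # Keep running supersteps until no messages are in flight
    while any(inbox.values()):
        outbox = {v: [] for v in neighbors}
        for v, messages in inbox.items():
            # A vertex only acts when it learns of a smaller label
            if messages and min(messages) < labels[v]:
                labels[v] = min(messages)
                for n in neighbors[v]:
                    outbox[n].append(labels[v])
        inbox = outbox
    return labels


# Hypothetical graph with two components: {1, 2, 3} and {4, 5}
labels = connected_components([(1, 2), (2, 3), (4, 5)])
print(labels)
```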

7
Q

Storm-Spark-Flink

A

Real-time and in-memory processing

==> In-memory processing can be up to 100x faster for some tasks

8
Q

HBase-Cassandra-MongoDB

A

NoSQL storage for non-file data

  • Key-value pairs
  • Sparse tables

==> HBase used for Facebook’s Messaging Platform
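The idea of a sparse, key-value table can be sketched with nested dictionaries: each row key maps only to the columns it actually has, so absent cells cost nothing. This is a toy in-memory model, not the HBase, Cassandra, or MongoDB client API; the class `SparseTable` and the row and column names are hypothetical.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse table: row key -> {column name -> value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Only cells that are written are ever stored
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # Missing rows or columns return the default instead of failing
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        # Each row can have a different set of columns
        return sorted(self._rows.get(row_key, {}))


table = SparseTable()
table.put("user:1", "info:name", "Ada")
table.put("user:1", "msg:last", "hello")
table.put("user:2", "info:name", "Lin")

print(table.columns("user:1"))
print(table.get("user:2", "msg:last"))
```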

9
Q

ZooKeeper

A

ZooKeeper for management

  • Synchronization
  • High availability
  • Configuration

==> Created by Yahoo to wrangle services named after animals
