Big Data Self-Study Flashcards
1
Q
Which apps simplify writing MapReduce jobs?
A
Pig, Hive
2
Q
Which apps ingest data for MapReduce?
A
Flume, Sqoop
3
Q
Which apps query data directly, bypassing MapReduce batch jobs?
A
HBase, Impala
4
Q
What is Hive?
A
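- Hive is a data warehousing system built on Hadoop
- Works with non-relational data stored in HDFS
- Queries are written in HiveQL, a SQL-like language
- Queries run as batch jobs (traditionally compiled into MapReduce jobs)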
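Conceptually, a HiveQL aggregation such as `SELECT page, COUNT(*) FROM clicks GROUP BY page` runs as a map phase, a shuffle, and a reduce phase. The plain-Python sketch below imitates that flow; the `clicks` rows and the function names are hypothetical, and Hive's real query planner is far more involved.

```python
from collections import defaultdict

# Hypothetical input rows: (user, page) pairs, standing in for a Hive table on HDFS.
clicks = [("u1", "/home"), ("u2", "/cart"), ("u1", "/home"), ("u3", "/home")]

def map_phase(rows):
    # Emit (key, 1) for each row, keyed by the GROUP BY column.
    for _user, page in rows:
        yield page, 1

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # COUNT(*) per key.
    return {key: sum(values) for key, values in groups.items()}

result = reduce_phase(shuffle(map_phase(clicks)))
print(result)  # {'/home': 3, '/cart': 1}
```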
5
Q
What is Pig?
A
- Compiles a high-level scripting language, Pig Latin, into MapReduce jobs that run on Hadoop
- Pig Latin is a scripting language and is therefore procedural (a step-by-step data flow)
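Because Pig Latin is procedural, a script is a sequence of named steps (typically LOAD, FILTER, GROUP, then FOREACH ... GENERATE), each producing a new relation. The Python sketch below mirrors that step-by-step style for counting errors per service; the log records are hypothetical, and real Pig compiles such steps into MapReduce jobs rather than running them in-process.

```python
from itertools import groupby

# Step 1 (like LOAD): hypothetical log records of (service, level).
logs = [("auth", "ERROR"), ("db", "INFO"), ("auth", "ERROR"), ("db", "ERROR")]

# Step 2 (like FILTER ... BY level == 'ERROR'): keep only error records.
errors = [rec for rec in logs if rec[1] == "ERROR"]

# Step 3 (like GROUP ... BY service): sort, then group by service name.
grouped = groupby(sorted(errors), key=lambda rec: rec[0])

# Step 4 (like FOREACH ... GENERATE group, COUNT(...)): count per group.
error_counts = {service: len(list(recs)) for service, recs in grouped}
print(error_counts)  # {'auth': 2, 'db': 1}
```

Each intermediate name (`errors`, `grouped`) plays the role of a Pig relation, which is what makes the style procedural rather than a single declarative query.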
6
Q
What is Flume?
A
- Ingests data into HDFS
- Harvests large volumes of data from server log files, such as clickstream data from web server logs
- Can transform data while it is being harvested
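A Flume agent is built from a source, a channel, and a sink, and interceptors can modify events in flight. The sketch below models that pipeline in plain Python: it parses raw web-server log lines (the transform step) and buffers them in a queue before a sink writes them out. The log format, the `parse_click` function, and the list standing in for HDFS are assumptions for illustration, not Flume's API.

```python
import queue

# Hypothetical raw web-server log lines (the "source" input).
raw_lines = [
    '10.0.0.1 - - "GET /home HTTP/1.1" 200',
    '10.0.0.2 - - "GET /cart HTTP/1.1" 404',
]

def parse_click(line):
    # Transform while harvesting: keep only client IP, path, and status.
    ip = line.split()[0]
    path = line.split('"')[1].split()[1]
    status = int(line.rsplit(" ", 1)[1])
    return {"ip": ip, "path": path, "status": status}

channel = queue.Queue()  # buffers events between source and sink
for line in raw_lines:
    channel.put(parse_click(line))

hdfs_stand_in = []  # sink target; a real agent would write files to HDFS
while not channel.empty():
    hdfs_stand_in.append(channel.get())

print(hdfs_stand_in[1])  # {'ip': '10.0.0.2', 'path': '/cart', 'status': 404}
```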
7
Q
What is Sqoop?
A
- Transfers data back and forth between a relational database and HDFS
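A Sqoop import moves a relational table into HDFS (as delimited text files by default), and an export moves such files back into a table. The sketch below imitates that round trip with Python's built-in `sqlite3` and a local temporary directory standing in for HDFS; the table names and file name are hypothetical, and nothing here uses Sqoop itself.

```python
import csv
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])

# "Import": dump the table to a delimited text file in the HDFS stand-in.
hdfs_dir = tempfile.mkdtemp()
part_file = os.path.join(hdfs_dir, "part-m-00000")  # mimics map-output naming
with open(part_file, "w", newline="") as f:
    csv.writer(f).writerows(conn.execute("SELECT id, name FROM customers"))

# "Export": load the file's rows back into a (new) relational table.
conn.execute("CREATE TABLE customers_copy (id INTEGER, name TEXT)")
with open(part_file, newline="") as f:
    rows = [(int(i), name) for i, name in csv.reader(f)]
conn.executemany("INSERT INTO customers_copy VALUES (?, ?)", rows)

print(conn.execute("SELECT * FROM customers_copy ORDER BY id").fetchall())
# [(1, 'Ada'), (2, 'Lin')]
```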
8
Q
What is HBase?
A
- Column-oriented NoSQL database
- Highly distributed; designed to scale out
- Avoids the delays of batch processing by supporting real-time reads and writes
- Accessed through lower-level languages such as Java
- Used by Facebook for its messaging system
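HBase stores each row under a row key, with cells addressed by column family and qualifier, and keeps rows sorted by key so range scans are cheap. The in-memory Python sketch below models that layout for a messaging-style table; the `put` and `scan_prefix` helpers and the `msg` family are hypothetical stand-ins, not the HBase client API (which is typically used from Java).

```python
# Hypothetical table: row key -> {"family:qualifier": value}.
table = {}

def put(row_key, column, value):
    # Write one cell; rows are sparse, so each row stores only its own columns.
    table.setdefault(row_key, {})[column] = value

def scan_prefix(prefix):
    # HBase keeps rows sorted by key; sorting here imitates a range scan.
    return [(k, table[k]) for k in sorted(table) if k.startswith(prefix)]

# Row keys combine user and message id so one user's messages sit together.
put("alice#0002", "msg:body", "See you at 5")
put("alice#0001", "msg:body", "Hi Bob")
put("alice#0001", "msg:from", "bob")
put("bob#0001", "msg:body", "Hello")

for key, cells in scan_prefix("alice#"):
    print(key, cells)
# alice#0001 {'msg:body': 'Hi Bob', 'msg:from': 'bob'}
# alice#0002 {'msg:body': 'See you at 5'}
```

The row-key design is the main modelling decision in this kind of store: grouping by user here lets one scan fetch a whole conversation history without touching other users' rows.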
9
Q
What is Impala?
A
- Lets you write SQL queries directly against data while it is still in HDFS
- Makes heavy use of in-memory caching on data nodes
10
Q
What is YARN?
A
- Introduced in Hadoop version 2 (YARN: Yet Another Resource Negotiator)
- Separates cluster resource management from job management
- Suitable for multitenancy (many users and applications sharing one cluster)
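YARN splits duties that the Hadoop 1 JobTracker handled alone: a cluster-wide ResourceManager hands out containers, while a per-application ApplicationMaster manages its own job's tasks. The Python sketch below is a toy model of that split, with two applications (tenants) sharing one cluster; the class names echo YARN's components, but the methods, memory figures, and first-come allocation policy are assumptions, not YARN's real scheduler.

```python
class ResourceManager:
    """Cluster-wide: tracks free memory and grants containers, nothing else."""

    def __init__(self, total_mb):
        self.free_mb = total_mb

    def allocate(self, app_name, mb):
        # Grant a container if capacity allows (simple first-come policy).
        if mb <= self.free_mb:
            self.free_mb -= mb
            return f"container({app_name},{mb}MB)"
        return None


class ApplicationMaster:
    """Per-application: requests containers and tracks its own tasks."""

    def __init__(self, name, rm):
        self.name, self.rm, self.containers = name, rm, []

    def run_tasks(self, n_tasks, mb_per_task):
        for _ in range(n_tasks):
            container = self.rm.allocate(self.name, mb_per_task)
            if container is None:
                break  # cluster is full; a real AM would wait and retry
            self.containers.append(container)
        return len(self.containers)


rm = ResourceManager(total_mb=4096)
etl = ApplicationMaster("etl", rm)
adhoc = ApplicationMaster("adhoc-query", rm)
print(etl.run_tasks(3, 1024), adhoc.run_tasks(2, 1024), rm.free_mb)  # 3 1 0
```

Keeping task tracking inside each ApplicationMaster is what lets the ResourceManager stay small and serve many tenants at once.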