Big Data Self-Study Flashcards

1
Q

Which applications simplify writing MapReduce jobs?

A

Pig, Hive
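
To see what these tools simplify, here is a minimal pure-Python sketch of a word count written in the map, shuffle, reduce style. It does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In HiveQL the same job is roughly a single GROUP BY query, and in Pig Latin a few GROUP and FOREACH statements, which is the simplification this card refers to.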

2
Q

Which applications ingest data into Hadoop?

A

Flume, Sqoop

3
Q

Which applications support direct queries on Hadoop data?

A

HBase, Impala

4
Q

What is Hive?

A
  • A data warehousing system built on Hadoop
  • Works with data that is not stored in a relational database (files in HDFS)
  • Queries are written in HiveQL, an SQL-like language
  • Queries run as batch jobs
5
Q

What is Pig?

A
  • Compiles a high-level scripting language, Pig Latin, into MapReduce jobs that run on Hadoop
  • Pig Latin is a scripting language and therefore procedural
6
Q

What is Flume?

A
  • Ingests data into HDFS
  • Harvests large volumes of data from server log files, such as clickstream data from web server logs
  • Can transform the data while it is being harvested
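
The harvest, transform, store flow can be pictured with a small Python generator pipeline. This is only a sketch of the idea, not Flume itself: the `source`, `transform`, and `sink` functions and the three-field log format are assumptions made up for illustration.

```python
def source(log_lines):
    """Yield raw events one at a time, like a source reading a log file."""
    for line in log_lines:
        yield line.rstrip("\n")


def transform(events):
    """Parse each event in flight and drop the ones that are malformed."""
    for event in events:
        parts = event.split()
        if len(parts) == 3:
            ip, method, path = parts
            yield {"ip": ip, "method": method, "path": path}


def sink(events, store):
    """Append every transformed event to the destination (HDFS in a real setup)."""
    for event in events:
        store.append(event)
    return store
```

Because each stage is a generator, events are transformed as they stream through rather than after the whole log has been collected.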
7
Q

What is Sqoop?

A
  • Transfers data in both directions between relational databases and HDFS
8
Q

What is HBase?

A
  • Column-oriented NoSQL database
  • Highly distributed, designed to scale out
  • Avoids the delays caused by batch processing
  • Accessed from lower-level languages such as Java
  • Used by Facebook for its messaging system
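
As a rough mental model of "column-oriented" here, each row key maps to column families, and each family holds its own set of column qualifiers. The toy class below sketches that layout with nested dictionaries; `ToyTable` and its methods are hypothetical and are not the HBase client API, and it leaves out versioning and distribution entirely.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table laid out as row key -> family -> qualifier."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        """Store one cell; rows may have entirely different qualifiers."""
        self._rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier=None):
        """Return one cell, or the whole family when no qualifier is given."""
        columns = self._rows.get(row_key, {}).get(family, {})
        if qualifier is None:
            return dict(columns)
        return columns.get(qualifier)
```

A lookup names the row key and column family directly, so reading one cell does not involve scanning the table in a batch job.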
9
Q

What is Impala?

A
  • Lets you write SQL queries directly against data while it is still in HDFS
  • Makes heavy use of in-memory caching on the data nodes
10
Q

What is YARN?

A
  • Introduced with Hadoop version 2
  • Separates cluster resource management from job management
  • Suitable for multitenancy