Hadoop Flashcards

1
Q

Hadoop: The first phase in a MapReduce program is the

A

map phase

2
Q

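Hadoop: The map job basically

A

splits the input across multiple nodes, each of which processes its share in parallel and emits the relevant data as intermediate key-value pairs. Collecting and combining those pairs is the job of the later shuffle and reduce phases.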
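
As a rough illustration of how the map phase relates to the phases that follow it, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample `splits` are assumptions made up for this example, and on a real cluster each split would be handled by a different node.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}

# Two hypothetical input splits; on a cluster each would go to a different node.
splits = [
    ["big data big cluster"],
    ["data node data"],
]

mapped = [pair for split in splits for pair in map_phase(split)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

The map step only looks at its own split, which is what lets it run in parallel; all cross-split combination happens after the shuffle.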

3
Q

Hadoop: Apache Spark is

A

an open-source cluster computing framework that lets you load data into a cluster’s memory and query it repeatedly, which suits iterative workloads such as machine learning.

4
Q

Hadoop: ETL stands for

A

Extraction, Transformation, and Loading
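
To make the three steps concrete, here is a small sketch of an ETL pipeline using only the Python standard library. Everything in it is assumed for illustration: the inline `RAW_CSV` data, the `payments` table, and the cleaning rules. An in-memory SQLite database stands in for the target system, which in a Hadoop setting would more likely be HDFS or a warehouse table.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; a real pipeline would read a file or database.
RAW_CSV = """name,amount
alice, 10.50
bob,7
,3.00
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no name, normalize names, parse amounts."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue
        cleaned.append((name.title(), float(row["amount"])))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the acronym and makes each stage easy to swap out on its own.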

5
Q

Big Data: The three Vs that define big data are

A

velocity, variety, volume

6
Q

Hadoop: The cluster is usually made up of

A

mid-range, rack-mounted servers

7
Q

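Hadoop: Hive allows you to

A

write SQL-like queries (HiveQL) that Hive turns into jobs on the cluster (traditionally MapReduce), instead of having to code MapReduce programs by hand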

8
Q

Hadoop: Impala is

A

a SQL query engine that queries data stored in Hadoop directly, without using MapReduce

9
Q

Hadoop: Sqoop is used to

A

transfer bulk data between relational databases and HDFS: importing tables into HDFS and exporting results back out

10
Q

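Spark: Spark is

A

often much faster than MapReduce, especially for iterative workloads, because it can keep intermediate data in memory instead of writing it to disk between stages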

11
Q

Spark: RDD stands for

A

Resilient Distributed Dataset
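
The three words in the name can be illustrated with a toy model in plain Python. This is not Spark's API: `ToyRDD`, `lose_partition`, and the sample data are all hypothetical, invented for this sketch. The idea shown is that the data is split into partitions (distributed), transformations are only recorded as lineage until an action runs, and a lost partition can be rebuilt by replaying that lineage (resilient).

```python
class ToyRDD:
    """Toy stand-in for an RDD: partitions plus the lineage that derives them."""

    def __init__(self, source_partitions, lineage=()):
        self._source = source_partitions   # base data, one list per partition
        self._lineage = tuple(lineage)     # recorded transformations, applied lazily
        self._cache = {}                   # computed partitions held "in memory"

    def map(self, fn):
        # Transformations are lazy: only record the step, compute nothing yet.
        return ToyRDD(self._source, self._lineage + (fn,))

    def _compute(self, index):
        # Rebuild a partition from the base data by replaying the lineage.
        if index not in self._cache:
            part = self._source[index]
            for fn in self._lineage:
                part = [fn(x) for x in part]
            self._cache[index] = part
        return self._cache[index]

    def lose_partition(self, index):
        # Simulate a node failure that drops one computed partition.
        self._cache.pop(index, None)

    def collect(self):
        # An action triggers computation of every partition.
        return [x for i in range(len(self._source)) for x in self._compute(i)]

rdd = ToyRDD([[1, 2], [3, 4]]).map(lambda x: x * 10).map(lambda x: x + 1)
first = rdd.collect()
rdd.lose_partition(1)        # pretend the node holding partition 1 failed
recovered = rdd.collect()    # partition 1 is recomputed from its lineage
print(first, recovered)
```

Only the lost partition is recomputed on the second `collect()`; the surviving one is served from the cache.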
