Hadoop Flashcards
hadoop: The first phase in a MapReduce program is the
map phase
hadoop: The map job basically
distributes the search for relevant data across multiple nodes; the relevant results are then collected back together on one node
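The map step can be sketched in plain Python using word count, the canonical MapReduce example. This is a stand-in for illustration, not Hadoop's actual API:

```python
# Sketch of the "map" step from word count. In Hadoop, a function like this
# would run in parallel on each node against its local slice of the input.
def map_phase(line):
    # Emit a (key, value) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

pairs = map_phase("the quick brown fox the fox")
# pairs == [("the", 1), ("quick", 1), ("brown", 1), ("fox", 1), ("the", 1), ("fox", 1)]
```

The reduce phase would then group these pairs by key and sum the counts.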
hadoop: Apache Spark is
an open-source cluster computing framework that lets you load data into a cluster's memory and query it repeatedly, which makes it well suited to iterative workloads such as machine learning
hadoop: ETL stands for
Extraction, Transformation, and Loading
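The three ETL steps can be sketched in plain Python; the record format and field names here are illustrative, not part of any specific ETL tool:

```python
# Extract: raw CSV-style records pulled from a source system.
source = ["1,alice,30", "2,bob,25"]

def transform(record):
    # Transform: parse each record and reshape it for the target schema.
    rid, name, age = record.split(",")
    return {"id": int(rid), "name": name.title(), "age": int(age)}

# Load: write the transformed rows into the target store.
target = [transform(r) for r in source]
```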
Big Data: The three Vs that define big data are
velocity, variety, volume
hadoop: The cluster is usually made up of
mid-range, rack-mounted servers
hadoop: Hive allows you to
write SQL-style queries (HiveQL) that are translated into MapReduce jobs, instead of having to code the MapReduce yourself
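As an illustration of the idea, the aggregation below is expressed as one SQL query rather than hand-written map and reduce code. It uses Python's sqlite3 as a stand-in for Hive; HiveQL syntax for a query like this is essentially the same:

```python
import sqlite3

# In-memory database standing in for a Hive table over HDFS data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("west", 200), ("east", 50)])

# One declarative query replaces a whole map/shuffle/reduce pipeline.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
# totals == {"east": 150, "west": 200}
```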
hadoop: Impala is
a way to query data in HDFS using SQL directly, without going through MapReduce
hadoop: Sqoop is used to
move data between relational databases and HDFS
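A typical Sqoop import invocation might look like the following sketch; the connection string, credentials, table, and HDFS path are all hypothetical:

```shell
# Import the "customers" table from a (hypothetical) MySQL database into HDFS.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user \
  --table customers \
  --target-dir /data/customers
```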
spark: Spark is
much faster than MapReduce, largely because it keeps intermediate data in memory instead of writing it to disk between stages
spark: RDD stands for
Resilient Distributed Dataset