Revature Hadoop Flashcards

Question 1

Q

What is a name node?

Answer

A

The name node is the HDFS component that acts as the master server: it manages the file system namespace and regulates access to files. It also manages the data nodes in an HDFS cluster. There is only one active name node per cluster, but there can be multiple backup name nodes.

Question 2

Q

What is a data node?

Answer

A

A data node manages the storage attached to the node it runs on. Data is stored in blocks of 128 MB by default, and the blocks are replicated across other data nodes in the cluster.
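
As a rough illustration of the block arithmetic, the sketch below counts how many blocks a file is split into and how much raw storage its replicas use. The function names are hypothetical helpers, not part of any Hadoop API. The 128 MB block size and the replication factor of 3 are the HDFS defaults.

```python
import math

BLOCK_SIZE_MB = 128   # default HDFS block size
REPLICATION = 3       # default HDFS replication factor


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file is split into (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb, replication=REPLICATION):
    """Total cluster storage used once every block is replicated."""
    return file_size_mb * replication


print(block_count(300))      # 3 blocks: 128 MB + 128 MB + 44 MB
print(raw_storage_mb(300))   # 900 MB across the cluster
```

The last block only occupies the space it actually needs, so a 300 MB file uses 900 MB of raw storage rather than 3 x 3 x 128 MB.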

Question 3

Q

What is YARN?

Answer

A

YARN stands for Yet Another Resource Negotiator. It is the resource management and job scheduling technology used with Hadoop's distributed processing framework.

Question 4

Q

Explain the YARN component Node Manager

Answer

A

The Node Manager manages the application containers assigned to its node by the Resource Manager, monitors their resource usage, and reports that usage back to the Resource Manager.

Question 5

Q

With the YARN Node Manager, explain containers

Answer

A

A container is a set of resources, such as RAM, CPU, and storage, on a single node. The resources are allocated by the Resource Manager and monitored by the Node Manager.

Question 6

Q

Explain the YARN component Resource Manager

Answer

A

The Resource Manager is the master daemon that manages resource allocation and scheduling across all nodes in the Hadoop cluster.

Question 7

Q

Explain MapReduce

Answer

A

MapReduce is a programming model and processing framework for distributed computing. It is used to process large data sets in parallel across clusters of machines. It is the core processing mechanism of Hadoop.

Question 8

Q

Explain what map does in Hadoop

Answer

A

The input data is split into smaller chunks, and a map task processes each chunk, generating key-value pairs from its records. The pairs are then grouped by key and passed to the reducer.
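
The map and grouping steps can be sketched in plain Python with a word count, the usual teaching example. This is a single-process simulation, not Hadoop code: `map_words` and `shuffle` are hypothetical names, and in a real job the grouping is done by the framework between the map and reduce phases.

```python
from collections import defaultdict


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the mapped values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


chunks = ["big data big", "data nodes"]  # two input chunks
pairs = [pair for chunk in chunks for pair in map_words(chunk)]
print(shuffle(pairs))  # {'big': [1, 1], 'data': [1, 1], 'nodes': [1]}
```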

Question 9

Q

Explain what reduce does in Hadoop

Answer

A

The reducer processes each group of values that share a key and produces an output for that key, typically an aggregate such as a sum or count.
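
A minimal sketch of the reduce step, again as a plain Python simulation rather than Hadoop code. The function name `reduce_counts` and the sample input are assumptions made for illustration; the input stands in for word-count pairs that have already been grouped by key.

```python
def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


# Hypothetical grouped input, one list of values per key.
grouped = {"big": [1, 1], "data": [1, 1], "nodes": [1]}
results = dict(reduce_counts(key, values) for key, values in grouped.items())
print(results)  # {'big': 2, 'data': 2, 'nodes': 1}
```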

Question 10

Q

How are data nodes fault tolerant?

Answer

A

They are fault tolerant through data replication: multiple copies of each block are stored on different data nodes, so the data stays available if a node fails.
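
The effect of replication can be shown with a toy model. The sketch below places three replicas of each block on distinct nodes and checks which blocks survive a node failure. The round-robin placement, the function names, and the node and block names are all assumptions for illustration; real HDFS uses a rack-aware placement policy. The model also assumes there are at least as many nodes as replicas.

```python
REPLICATION = 3  # default HDFS replication factor


def place_replicas(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replication)}
    return placement


def available_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_a", "blk_b"], nodes)
# Two of four nodes fail, yet every block still has a live replica.
print(sorted(available_blocks(placement, ["dn1", "dn2"])))  # ['blk_a', 'blk_b']
```

A block is lost only when every node holding one of its replicas fails at the same time.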

Question 11

Q

How many name nodes exist in a cluster?

Answer

A

There is one name node per cluster.

Question 12

Q

What is the default number of replications for each block?

Answer

A

The default replication factor is 3.

(12 cards)