class 2 Flashcards

Question 1

Q

file system types

Answer

A

FS (file system)

DFS (distributed file system)

HDFS (Hadoop distributed file system)

Question 2

Q

drawbacks of ‘FS’

Answer

A

Difficult to store large amounts of data
Difficult to process large amounts of data
Data loss due to:
- power failure
- network failure
- hardware or software failure
No automatic metadata concept

DFS:

Adds the multi-node concept (data stored across multiple machines)

All the remaining drawbacks are the same as for FS

Question 3

Q

cluster

Answer

A

A group of machines connected in a network

Question 4

Q

Hadoop distributions

Answer

A

Cloudera

Hortonworks

IBM BigInsights

Pivotal HD

MapR

Cloud computing:

Amazon EMR

Google Cloud

Windows Azure

Question 5

Q

Hadoop framework

Answer

A

Hadoop 1.x -> 2010

Hadoop 2.x (introduces YARN) -> 2013

Question 6

Q

Hadoop 1.x

Answer

A

NameNode -> stores the metadata
DataNode -> stores the actual data
Secondary NameNode -> keeps a periodic checkpoint (backup) of the NameNode metadata
JobTracker -> splits the job into tasks and assigns them to TaskTrackers
TaskTracker -> executes the tasks

Question 7

Q

Client request processing sequence

Answer

A

The client sends a request to the NameNode (which stores the metadata)
If it is a new request (new data), the metadata is updated
If it is a request for existing data, the NameNode returns the location of the actual data (the DataNodes)
The Secondary NameNode periodically backs up (checkpoints) the NameNode metadata
The JobTracker splits the job into tasks and assigns them to TaskTrackers
Tasks are started on the TaskTrackers where the data resides (data locality)
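A minimal Python sketch of the sequence above, assuming a toy in-memory model: the class `NameNode`, the functions `handle_request` and `assign_tasks`, and the host names are hypothetical illustrations, not Hadoop's real API. It shows the two NameNode branches (new data vs existing data) and how the JobTracker prefers a TaskTracker on a host that already holds the block.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    # Hypothetical toy model, not Hadoop's API.
    # Metadata: file path -> list of (block_id, [hosts holding a replica]).
    metadata: dict = field(default_factory=dict)

    def handle_request(self, path, blocks=None):
        """New data: record its metadata. Existing data: return block locations."""
        if path not in self.metadata:
            self.metadata[path] = list(blocks or [])
            return "metadata updated"
        return self.metadata[path]


def assign_tasks(block_locations, task_trackers):
    """Hypothetical JobTracker step: one task per block, placed on a
    TaskTracker whose host already stores that block (data locality).
    Falls back to the first TaskTracker if no replica host runs one."""
    assignments = {}
    for block_id, hosts in block_locations:
        local = [h for h in hosts if h in task_trackers]
        assignments[block_id] = local[0] if local else task_trackers[0]
    return assignments


if __name__ == "__main__":
    nn = NameNode()
    print(nn.handle_request("/logs/day1", [("blk_1", ["dn1", "dn2"]),
                                           ("blk_2", ["dn3"])]))
    locations = nn.handle_request("/logs/day1")   # existing data -> locations
    print(assign_tasks(locations, ["dn2", "dn4"]))
```

Here `blk_2` has no TaskTracker on its host, so the sketch simply falls back to another one; how real Hadoop chooses that non-local TaskTracker is not modeled.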

Question 8

Q

NameNode contains

Answer

A

Metadata
Location of the data (which blocks are on which DataNodes)
IP addresses of the DataNodes
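A rough Python sketch of those three kinds of information as two lookup tables. The names `namespace`, `block_locations` and `locate`, the file path and the IP addresses are made up for illustration; they are not HDFS's internal data structures.

```python
# Hypothetical sketch of NameNode contents, not HDFS's real classes.

# Metadata: each file path maps to the ordered list of its blocks.
namespace = {
    "/data/sales.csv": ["blk_1001", "blk_1002"],
}

# Location of the data: each block maps to the IP addresses of the
# DataNodes holding a replica (illustrative addresses only).
block_locations = {
    "blk_1001": ["10.0.0.11", "10.0.0.12", "10.0.0.13"],
    "blk_1002": ["10.0.0.12", "10.0.0.14", "10.0.0.15"],
}


def locate(path):
    """Return (block_id, [DataNode IPs]) for every block of a file."""
    return [(blk, block_locations[blk]) for blk in namespace[path]]


if __name__ == "__main__":
    for blk, ips in locate("/data/sales.csv"):
        print(blk, ips)
```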

The Secondary NameNode only backs up the NameNode metadata; it does not interact with the JobTracker

Hadoop 1.x is also called a master-slave architecture

Question 9

Q

Master-slave architecture

Answer

A

The JobTracker (master) and the TaskTrackers (slaves) communicate via RPC

RPC - remote procedure call

Heartbeat interval by default - 3 seconds

Every 3 seconds each TaskTracker sends a heartbeat to the JobTracker to signal that it is alive and not down.
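A minimal Python sketch of how a master could use these heartbeats to notice a dead TaskTracker. The 3-second interval comes from the notes; the timeout (`expiry_s`), the class name `HeartbeatMonitor` and the injectable clock are assumptions for this sketch, not Hadoop's actual settings or classes.

```python
import time

HEARTBEAT_INTERVAL_S = 3                       # heartbeat interval from the notes
DEFAULT_EXPIRY_S = 10 * HEARTBEAT_INTERVAL_S   # hypothetical timeout, not Hadoop's default


class HeartbeatMonitor:
    """Hypothetical JobTracker-side bookkeeping of TaskTracker heartbeats."""

    def __init__(self, expiry_s=DEFAULT_EXPIRY_S, clock=time.monotonic):
        self.expiry_s = expiry_s
        self.clock = clock        # injectable so the logic can be tested
        self.last_seen = {}       # tracker name -> time of last heartbeat

    def heartbeat(self, tracker):
        """Called every HEARTBEAT_INTERVAL_S by each live TaskTracker."""
        self.last_seen[tracker] = self.clock()

    def dead_trackers(self):
        """Trackers silent for longer than expiry_s are treated as down."""
        now = self.clock()
        return sorted(t for t, ts in self.last_seen.items()
                      if now - ts > self.expiry_s)


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(expiry_s=30, clock=lambda: fake_now[0])
    monitor.heartbeat("tt1")
    monitor.heartbeat("tt2")
    fake_now[0] = 20.0
    monitor.heartbeat("tt2")          # tt2 keeps reporting, tt1 goes quiet
    fake_now[0] = 40.0
    print(monitor.dead_trackers())    # ['tt1']
```

The timeout is deliberately several heartbeat intervals long, so a single delayed heartbeat does not mark a tracker as down.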

Case 1: a DataNode is down

Because the data is replicated on other DataNodes, the task can still be executed

Case 2: a TaskTracker is down

The task can be reassigned to another TaskTracker

Case 3: the Secondary NameNode is down

Job execution does not stop; only the backup (checkpointing) stops

Case 4: the JobTracker is down

Execution stops: the JobTracker is a SPOF (single point of failure)

Case 5: the NameNode is down

The Secondary NameNode cannot take over processing, so the NameNode is also a SPOF

So Hadoop 1.x has two drawbacks: if either the NameNode or the JobTracker is down, processing stops
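The five cases can be summarised as a small lookup, sketched here in Python. The component names and the `replicas_left` parameter are hypothetical; the point is only that the NameNode and the JobTracker are the two components whose failure halts processing in Hadoop 1.x.

```python
# Hypothetical summary of the Hadoop 1.x failure cases from the notes.
SPOF = {"namenode", "jobtracker"}   # single points of failure


def outcome(failed, replicas_left=2):
    """Describe what happens to processing when one component fails.

    replicas_left: hypothetical count of other replicas of the affected
    block still available when a DataNode fails.
    """
    if failed in SPOF:
        return "processing stops (SPOF)"                      # cases 4 and 5
    if failed == "datanode":
        # Case 1: replication lets the task read another copy of the block.
        return "task runs on another replica" if replicas_left > 0 else "block unavailable"
    if failed == "tasktracker":
        return "task reassigned to another TaskTracker"      # case 2
    if failed == "secondary_namenode":
        return "jobs continue; only the backup stops"        # case 3
    raise ValueError(f"unknown component: {failed}")


if __name__ == "__main__":
    components = ["datanode", "tasktracker", "secondary_namenode",
                  "jobtracker", "namenode"]
    for c in components:
        print(f"{c:>20}: {outcome(c)}")
    print([c for c in components if c in SPOF])   # ['jobtracker', 'namenode']
```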


(9 cards)