Ch 3 - class 3 Hadoop File System Flashcards
what are the two components of hadoop
mapreduce and hdfs
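hdfs
Hadoop Distributed File System. a distributed file system that manages storage across the cluster's hard drives. it runs on top of the native file system on each machine's hard drive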
command-line interface
used to interact with hdfs, e.g. to move files between hdfs and the local hard drive
tool to transfer files between the local hard drive and the server
WinSCP
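what is hdfs designed to deal with
very large files, a write-once, read-many-times access pattern, high throughput
data size?
measured in blocks. hdfs files are divided into blocks. 64 MB by default in Hadoop 1.x; 128 MB in practice, and the default from Hadoop 2 onward.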
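The block arithmetic on the card above can be sketched in a few lines. This is a minimal illustration, not part of Hadoop: the function names `blocks_needed` and `last_block_mb` are made up for this example, and the 128 MB block size is an assumed value (pass 64 to model the older default).

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size; pass 64 to model the older default


def blocks_needed(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return how many blocks a file of the given size is split into."""
    if file_size_mb <= 0:
        return 0  # an empty file has no blocks
    return math.ceil(file_size_mb / block_size_mb)


def last_block_mb(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of the final (possibly partial) block."""
    if file_size_mb <= 0:
        return 0
    remainder = file_size_mb % block_size_mb
    # A file that is an exact multiple of the block size ends on a full block.
    return remainder if remainder else block_size_mb


print(blocks_needed(300))      # 3
print(blocks_needed(300, 64))  # 5
print(last_block_mb(300))      # 44
```

A 300 MB file therefore takes two full 128 MB blocks plus one 44 MB block. The last block only uses as much disk space as the data it holds.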
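can many files be on same block?
NO. a block holds data from only one file. one large file spans many blocks, and a file smaller than a block does not use up a full block of disk space.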
check status of file system blocks
% hadoop fsck / -files -blocks
Namenode
manages the filesystem namespace, keeps track of blocks and block locations, maintains the namespace image
cluster
one namenode (master) and a number of datanodes (workers)
single point of failure
the namenode, which holds the persistent metadata files
system has 2 namenodes
active and standby
datanode known as
workhorse of the file system. stores and retrieves blocks, reports to namenode.
HDFS high availability
use a pair of namenodes in an active-standby configuration.
standby has the latest log entries and an up-to-date block mapping in memory
how do you set the replication factor for blocks stored on datanodes
set the dfs.replication property in hdfs-site.xml, e.g. dfs.replication=3
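To make the effect of the replication factor concrete, here is a rough sketch of the storage arithmetic. It is only an illustration: `raw_storage_mb` and `block_replicas` are hypothetical helper names, the 128 MB block size is an assumption, and the calculation ignores metadata and any other overhead.

```python
def raw_storage_mb(file_size_mb, replication=3):
    """Approximate raw disk space (MB) used across the cluster by one file."""
    return file_size_mb * replication


def block_replicas(file_size_mb, block_size_mb=128, replication=3):
    """Total number of block replicas stored on datanodes for one file."""
    # Ceiling division: a partial final block still counts as one block.
    blocks = -(-file_size_mb // block_size_mb)
    return blocks * replication


print(raw_storage_mb(300))  # 900
print(block_replicas(300))  # 9
```

With dfs.replication=3, a 300 MB file takes about 900 MB of raw disk space, spread over 9 block replicas on the datanodes.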