Class 3 Flashcards
1
Q
Block
A
Block: a small chunk of data.
Example: 200 MB of data with a 64 MB block size gives 4 blocks (three full 64 MB blocks plus one partial 8 MB block).
Concepts of Hadoop:
- Hadoop automatically splits the file into blocks based on the block size
- Hadoop automatically replicates each block based on the replication factor
min replication - 1
default replication - 3
max replication - 512
- Hadoop automatically maintains the metadata in the namenode
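The block and replication arithmetic above can be sketched as follows; the 200 MB file, 64 MB block size, and replication factor 3 are the example values from this card:

```python
import math

def num_blocks(file_size_mb, block_size_mb=64):
    """Number of HDFS blocks needed; the last block may be only partially filled."""
    return math.ceil(file_size_mb / block_size_mb)

def raw_storage_mb(file_size_mb, replication=3):
    """Total raw cluster storage consumed once every block is replicated."""
    return file_size_mb * replication

print(num_blocks(200))       # 200 MB / 64 MB -> 4 blocks (3 full + one 8 MB)
print(raw_storage_mb(200))   # 600 MB of raw storage at the default replication of 3
```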
2
Q
Design of HDFS
A
- Storing large amounts of data
- Streaming data access -> write once, read many times concept
You can update or overwrite the entire file, but record-level updates are not supported
- Commodity hardware - e.g.: 1 GB RAM, 10 GB storage
3
Q
HDFS drawbacks
A
- Low-latency data access -> slow response time -> use Hadoop ecosystem tools as a workaround
- Lots of small files
e.g.: a 1 MB folder with 10,000 files means the namenode must maintain 10,000 filenames in its metadata. We can overcome this problem with the SequenceFile input format.
- Multiple writers and arbitrary file modifications are not allowed; overcome using HBase
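The small-files overhead can be sketched with a rough estimate. The ~150 bytes of namenode heap per file/block object used below is a commonly cited rule of thumb, not an exact figure; the point is that metadata cost scales with the file count, not the data size:

```python
BYTES_PER_OBJECT = 150  # rough rule of thumb for namenode heap per file or block object

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Approximate namenode heap: one object per file plus one per block."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# 10,000 tiny files (one block each) vs. the same data packed into one SequenceFile
print(namenode_heap_bytes(10_000))  # ~3,000,000 bytes of namenode heap
print(namenode_heap_bytes(1))       # ~300 bytes for a single packed file
```

This is why packing many small records into one SequenceFile helps: the namenode tracks one file instead of thousands.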
4
Q
cluster summary
A
- Configured capacity, e.g.: 2000 GB
- Non-DFS used - system programs and OS, e.g.: 800 GB
- DFS used: 200 GB
- DFS remaining: 1000 GB
df -h -> to check disk usage; a web UI is also available
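The capacity figures above should balance out. A minimal check using the example values from this card:

```python
configured_capacity_gb = 2000
non_dfs_used_gb = 800   # space taken by the OS and system programs
dfs_used_gb = 200       # space holding HDFS block data

# What is left over for new HDFS blocks
dfs_remaining_gb = configured_capacity_gb - non_dfs_used_gb - dfs_used_gb
print(dfs_remaining_gb)  # 1000 GB
```

On a real cluster, `hdfs dfsadmin -report` prints this same summary per datanode and for the cluster as a whole.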