Class 3 Flashcards

1
Q

Block

A

Block: a fixed-size chunk of a file's data

200 MB of data with a 64 MB block size gives 4 blocks (3 full 64 MB blocks plus one 8 MB block)
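
The block count above is just a ceiling division. A minimal sketch of the arithmetic, assuming sizes in whole MB; `split_into_blocks` is a hypothetical helper for illustration, not a Hadoop API:

```python
def split_into_blocks(file_size_mb, block_size_mb):
    """Return the size of each block; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The leftover data occupies one final, partially filled block.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(200, 64)
print(len(blocks), blocks)  # 4 [64, 64, 64, 8]
```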

Concepts of Hadoop:

  1. Hadoop automatically splits the file into blocks based on the block size
  2. Hadoop automatically replicates each block based on the replication factor

min replication - 1

default replication - 3

max replication - 512

  3. Hadoop automatically maintains the metadata in the NameNode
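
To see what the replication factor costs in disk space, here is a rough sketch using the min/default/max values from this card. `raw_storage_mb` is a hypothetical helper, and rejecting out-of-range values with an error is an assumption made for the example:

```python
MIN_REPLICATION = 1      # values taken from the card above
DEFAULT_REPLICATION = 3
MAX_REPLICATION = 512


def raw_storage_mb(file_size_mb, replication=DEFAULT_REPLICATION):
    """Raw disk space consumed across the cluster by one replicated file."""
    if not MIN_REPLICATION <= replication <= MAX_REPLICATION:
        raise ValueError(f"replication must be between "
                         f"{MIN_REPLICATION} and {MAX_REPLICATION}")
    # Every block is stored `replication` times, so the total scales linearly.
    return file_size_mb * replication


print(raw_storage_mb(200))  # 600
```

So the 200 MB file from the block example takes about 600 MB of raw storage at the default replication of 3.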
2
Q

Design of HDFS

A
  1. Storing very large amounts of data
  2. Streaming data access -> write once, read many times

A file can be replaced or overwritten as a whole, but record-level updates are not supported

  3. Commodity hardware, e.g. 1 GB RAM, 10 GB storage
3
Q

HDFS drawbacks

A
  1. Low-latency data access is not supported -> slow response time -> use Hadoop ecosystem tools as the solution
  2. Lots of small files
    e.g. a 1 MB folder with 10,000 files: the NameNode has to maintain metadata for all 10,000 file names. We can overcome this problem by packing the small files into a SequenceFile.
  3. Multiple writers and arbitrary file modifications are not allowed; overcome using HBase
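
The small-files problem can be put into numbers. This is a rough sketch only: the figure of about 150 bytes of NameNode memory per file or block object is a commonly quoted rule of thumb and an assumption here, not something stated on the card, and `namenode_objects` is a hypothetical helper:

```python
import math

BYTES_PER_OBJECT = 150  # assumed rule of thumb, not an exact figure


def namenode_objects(num_files, file_size_mb, block_size_mb=64):
    """Count the file and block objects the NameNode must track."""
    # Even a tiny file occupies at least one block entry.
    blocks_per_file = max(1, math.ceil(file_size_mb / block_size_mb))
    return num_files * (1 + blocks_per_file)


small = namenode_objects(10_000, 1 / 10_000)  # 1 MB spread over 10,000 files
packed = namenode_objects(1, 1)               # the same 1 MB as a single file
print(small, packed)                          # 20000 2
print(small * BYTES_PER_OBJECT, packed * BYTES_PER_OBJECT)  # 3000000 300
```

The data volume is identical in both cases; only the number of metadata entries differs, which is why packing small files together helps.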
4
Q

Cluster summary

A
  1. Configured capacity, e.g. 2000 GB
  2. Non-DFS used - system programs and OS, e.g. 800 GB
  3. DFS used: 200 GB
  4. DFS remaining: 1000 GB

df -h -> shows the usage; a web UI is also available
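
The four figures in the summary are tied together by simple subtraction. A minimal sketch using the example numbers from this card; `cluster_summary` is a hypothetical helper and does not query a real cluster:

```python
def cluster_summary(configured_gb, non_dfs_gb, dfs_used_gb):
    """Derive DFS remaining and DFS used % from the other figures."""
    dfs_remaining_gb = configured_gb - non_dfs_gb - dfs_used_gb
    return {
        "configured_gb": configured_gb,
        "non_dfs_gb": non_dfs_gb,
        "dfs_used_gb": dfs_used_gb,
        "dfs_remaining_gb": dfs_remaining_gb,
        "dfs_used_percent": 100 * dfs_used_gb / configured_gb,
    }


summary = cluster_summary(configured_gb=2000, non_dfs_gb=800, dfs_used_gb=200)
print(summary["dfs_remaining_gb"])   # 1000
print(summary["dfs_used_percent"])   # 10.0
```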
