Ch 3 - Class 3 Hadoop File System Flashcards
What are the two components of Hadoop?
MapReduce and HDFS.
HDFS
A file system for managing data across machines' hard drives; it sits on top of the native file system on each hard drive.
Command interface
Used to communicate with HDFS and the hard drive.
What do you use to communicate with the server from your hard drive?
WinSCP.
What does the file system deal with?
Very large files; write once, read many times; high throughput.
Data size?
Block size: HDFS is divided into blocks, 64 MB by default, 128 MB in practice.
Can many files be on the same block?
No. Each HDFS block holds data from only one file, and a file smaller than the block size does not occupy a full block of underlying storage.
How do you check the status of the file system's blocks?
% hadoop fsck / -files -blocks   (the first / is the path to check)
Namenode
Manages the filesystem namespace: keeps the namespace image and tracks the blocks of every file and the block locations.
Cluster
Made up of a namenode and datanodes.
Single point of failure
The namenode. Its persistent metadata files are backed up so the filesystem can be recovered if it fails.
The system has two namenodes:
active and standby.
The datanode is known as
The workhorse of the file system: it stores and retrieves blocks and reports back to the namenode.
HDFS high availability
Uses a pair of namenodes in an active-standby configuration.
The standby has the latest edit log entries and an up-to-date block mapping in memory, so it can take over if the active namenode fails.
How do you set replication across the datanodes?
Set dfs.replication=3.
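In hdfs-site.xml that property looks like this (a minimal sketch):

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>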
Pseudo-distributed configuration
fs.default.name=hdfs://localhost/
dfs.replication=1
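As config files, that corresponds to something like the following (a sketch; newer Hadoop versions name the first property fs.defaultFS):

core-site.xml:
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>

hdfs-site.xml:
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>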
Where is the default filesystem?
On the master machine, the namenode.
Where is the local filesystem?
On the server (the local disk of the machine where you run the command).
Command to copy a file from the local hard drive to HDFS
% hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/quangle.txt
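Since fs.default.name points the default filesystem at hdfs://localhost/, the scheme and host can be left out, and a relative path is resolved against the user's HDFS home directory (/user/tom here):

  % hadoop fs -copyFromLocal input/docs/quangle.txt /user/tom/quangle.txt
  % hadoop fs -copyFromLocal input/docs/quangle.txt quangle.txt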
For checksums
Use MD5 to check file integrity by comparing hashes.
md5sum uses a hash function that produces a 128-bit hash value, which gives a checksum to verify data integrity.
128 bits = 16 bytes = 32 hex characters.
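For example (file path reused from the earlier card; run it on the original and on the copy and compare the two 32-character digests):

  % md5sum input/docs/quangle.txt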
Split of data
You want each split of the data to fit on one block,
so use the block size for the split size.
HDFS
Just one implementation of the Hadoop filesystem; S3 is another.
Two ways to catch an exception
try/catch or try/finally.
finally
Runs regardless of whether an exception occurred; that is how it differs from catch,
and what makes it a strong method for cleanup.
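A minimal Java sketch of the difference (class name and file handling are illustrative):

  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;

  public class FinallyDemo {
    public static void main(String[] args) throws IOException {
      InputStream in = null;
      try {
        in = new FileInputStream(args[0]);  // may throw IOException
        System.out.println(in.read());      // print the first byte
      } finally {
        if (in != null) {
          in.close();  // runs whether or not an exception was thrown
        }
      }
    }
  }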
How do you tell whether something is an HDFS shell command or not?
An HDFS shell command starts with hadoop fs,
not hadoop URLCat, etc.
hadoop URLCat runs a Java program, not a built-in filesystem command.
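Side by side (reusing the quangle.txt path from the earlier card, and assuming the URLCat class is on the Hadoop classpath):

  % hadoop fs -cat hdfs://localhost/user/tom/quangle.txt   # HDFS shell command
  % hadoop URLCat hdfs://localhost/user/tom/quangle.txt    # runs a Java class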
FileSystemCat
A Java Hadoop program that reads a file from HDFS as a stream and writes it to standard output.
What makes it a complete Java program?
It has a public class and a main method.
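A sketch of such a program, using the Hadoop FileSystem API (pass an HDFS URI such as hdfs://localhost/user/tom/quangle.txt as the argument):

  import java.io.InputStream;
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;

  public class FileSystemCat {
    public static void main(String[] args) throws Exception {
      String uri = args[0];
      Configuration conf = new Configuration();
      // Pick the FileSystem implementation (HDFS here) that matches the URI
      FileSystem fs = FileSystem.get(URI.create(uri), conf);
      InputStream in = null;
      try {
        in = fs.open(new Path(uri));                    // open the file stream
        IOUtils.copyBytes(in, System.out, 4096, false); // copy it to stdout
      } finally {
        IOUtils.closeStream(in);                        // closed even on error
      }
    }
  }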
Glob characters
Wildcards such as * and ?, used to match sets of file paths, similar in spirit to regular expressions.
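For example (directory layout is illustrative; quote the glob so the local shell does not expand it first):

  % hadoop fs -ls '/logs/2007/*'   # every entry directly under /logs/2007
  % hadoop fs -ls '/logs/200?'     # /logs/2000 through /logs/2009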
Datanode error during a write: what does the client do?
Adds the packets in the ack queue back to the front of the data queue, and
removes the failed datanode from the pipeline.
Namenode: arranges for the under-replicated block to get further replicas.
Failed datanode: deletes the partial block when the node recovers later on.
ack
Sent when the data is received by the datanode, not when it is written to disk.
Way to write in parallel to speed up the process of copying data
distcp: used for copying large amounts of data to and from Hadoop filesystems in parallel.
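Typical usage (host names are illustrative):

  % hadoop distcp hdfs://namenode1/foo hdfs://namenode2/bar

distcp is implemented as a MapReduce job, so the copy is done in parallel by the map tasks.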
RPC
Remote procedure call; how communication with the nodes is carried out.
HAR files
A file archiving facility that packs files into HDFS blocks more efficiently.
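Creating and then listing an archive looks like this (paths are illustrative; archived files are read back through the har:// scheme):

  % hadoop archive -archiveName files.har /my/files /my
  % hadoop fs -ls har:///my/files.har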
-l, -r
-l = long format; -r = recursive, shows all the entries in the subtree.
-p
Preserves file attributes (timestamp, ownership, permissions, etc.).