Ch 3 - Class 3 Hadoop File System Flashcards

1
Q

What are the two components of Hadoop?

A

MapReduce and HDFS

2
Q

HDFS

A

A file system that manages storage on the hard drives. It sits on top of the local file system on each hard drive.

3
Q

command interface

A

Used to communicate between HDFS and the local hard drive.

4
Q

tool to communicate with the server from the local hard drive

A

WinSCP

5
Q

what is the file system designed to deal with?

A

Large files, with a write-once, read-many-times access pattern and high throughput.

6
Q

data size?

A

Block size. HDFS files are divided into blocks: 64 MB by default in Hadoop 1.x, 128 MB in practice (and the default from Hadoop 2.x on).
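
A rough worked example of how a file maps onto blocks. This is only a sketch: the class and method names (BlockMath, blocksNeeded, lastBlockBytes) are made up for illustration, and the 1000 MB file is an assumed example value.

```java
public class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed for a file: ceiling of fileBytes / blockBytes.
    static long blocksNeeded(long fileBytes, long blockBytes) {
        if (fileBytes == 0) return 0;
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Bytes held by the last block; a short last block is not padded out.
    static long lastBlockBytes(long fileBytes, long blockBytes) {
        if (fileBytes == 0) return 0;
        long rem = fileBytes % blockBytes;
        return rem == 0 ? blockBytes : rem;
    }

    public static void main(String[] args) {
        long file = 1000 * MB; // assumed example: a 1000 MB file
        System.out.println(blocksNeeded(file, 64 * MB));          // 16
        System.out.println(blocksNeeded(file, 128 * MB));         // 8
        System.out.println(lastBlockBytes(file, 128 * MB) / MB);  // 104
    }
}
```

With 128 MB blocks the file needs 7 full blocks plus one 104 MB block, so a larger block size means fewer blocks for the namenode to track.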

7
Q

can many files be on same block?

A

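No. An HDFS block holds data from only one file. A file smaller than the block size does not take up a full block of underlying storage, and many small files can be packed together with a HAR archive.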

8
Q

check status of file system blocks

A

% hadoop fsck / -files -blocks

9
Q

Namenode

A

Manages the filesystem namespace and keeps track of each file's blocks and block locations. The namespace is persisted in the namespace image.

10
Q

cluster

A

A namenode (the master) and datanodes (the workers)

11
Q

single point of failure

A

The namenode, which holds the persistent metadata files

12
Q

system has 2 namenodes

A

active and standby

13
Q

datanode known as

A

Workhorse of the file system. Datanodes store and retrieve blocks and report to the namenode.

14
Q

HDFS high availability

A

Use a pair of namenodes in an active-standby configuration.

The standby has the latest log entries and an up-to-date block mapping in memory.

15
Q

how do you set the replication factor for blocks on datanodes

A

Set dfs.replication=3 (3 is the default).

16
Q

Pseudo-distributed configuration

A

fs.default.name=hdfs://localhost/

dfs.replication=1

17
Q

where is default filesystem

A

On the master computer, the namenode

18
Q

where is local filesystem?

A

on the server

19
Q

command to copy from the local hard drive to HDFS

A

hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/quangle.txt

20
Q

for checksum

A

Use MD5 to check file integrity by comparing checksums.
md5sum uses a hash function that produces a 128-bit hash value, which serves as a checksum to verify data integrity.
128 bits = 16 bytes = 32 hexadecimal characters.
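
A minimal sketch of the same check done from Java with the standard library's MessageDigest, to show the length of an MD5 checksum. The class name Md5Check and helper md5Hex are made up for illustration; in practice the bytes would come from the file being verified.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Hex-encode the MD5 digest of the given bytes.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(data); // MD5 digests are 16 bytes long
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b)); // two hex characters per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String sum = md5Hex("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(sum);          // 5d41402abc4b2a76b9719d911017c592
        System.out.println(sum.length()); // 32
    }
}
```

Two copies of a file are taken to be identical when their checksums match; a single changed byte gives a different checksum.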

21
Q

split size of data

A

You want to split the data so that each split fits on 1 block,

so use the block size as the split size.

22
Q

HDFS

A

Just one implementation of the Hadoop filesystem; S3 is another.

23
Q

2 ways to catch an exception

A

try/catch or try/finally: the 2 ways to handle an exception

24
Q

finally

A

Runs regardless of whether an exception is thrown. That's how it differs from catch.

It's a strong guarantee.
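
A small sketch of the difference between catch and finally. The class FinallyDemo and its run method are made up for illustration; the log just records which blocks executed.

```java
import java.util.ArrayList;
import java.util.List;

public class FinallyDemo {
    // Records which blocks ran, in order, so the behaviour can be inspected.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            log.add("try");
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } catch (IllegalStateException e) {
            log.add("catch");   // runs only when the exception is thrown
        } finally {
            log.add("finally"); // runs in both cases
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [try, finally]
        System.out.println(run(true));  // [try, catch, finally]
    }
}
```

This is why stream clean-up (closing an input stream, for example) is usually placed in finally.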

25
Q

how to tell if it's an HDFS command or not

A

You will see hadoop fs,
not hadoop URLCat, etc.
hadoop URLCat runs a Java program.

26
Q

FileSystemCat

A

A public Java Hadoop program that opens a file as a stream and handles the file stream


27
Q

complete Java program

A

A public class and a main method

28
Q

glob characters

A

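Wildcard patterns such as * and ? that match multiple files with one expression. They are similar to, but simpler than, regular expressions.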
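
A short sketch of glob matching. It uses the Java standard library's glob support (java.nio PathMatcher) as a stand-in, on the assumption that its * , ? and {a,b} behave like the Hadoop glob characters in these simple cases; Hadoop's own globbing is done through the FileSystem API and is not shown here. GlobDemo and matches are made-up names, and the file names are examples.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    // True if the file name matches the glob pattern.
    static boolean matches(String glob, String name) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(name));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.txt", "quangle.txt"));              // true
        System.out.println(matches("*.txt", "quangle.csv"));              // false
        System.out.println(matches("part-0000?", "part-00001"));          // true
        System.out.println(matches("{2007,2008}-12-31", "2008-12-31"));   // true

        // The same first match written as a regular expression looks different:
        System.out.println("quangle.txt".matches(".*\\.txt"));            // true
    }
}
```

In a glob, * means "any characters" on its own; in a regular expression the same idea is written .* and the dot has to be escaped.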

29
Q

datanode error during a write - what the client, namenode, and failed datanode do

A

Client: adds any packets in the ack queue back to the data queue and
removes the failed datanode from the pipeline.
Namenode: arranges for under-replicated blocks to get further replicas.
Failed datanode: deletes the partial block when the node recovers later on.
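
The client-side part of this recovery can be sketched as a toy simulation. This is not Hadoop's actual client code: the class PipelineRecovery, its methods, the packet numbers, and the datanode names dn1 to dn3 are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class PipelineRecovery {
    final Deque<Integer> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
    final Deque<Integer> ackQueue = new ArrayDeque<>();  // packets sent but not yet acknowledged
    final List<String> pipeline = new ArrayList<>();     // datanodes currently in the pipeline

    // Sending a packet moves it from the data queue to the ack queue.
    void sendNext() {
        Integer p = dataQueue.pollFirst();
        if (p != null) ackQueue.addLast(p);
    }

    // The oldest in-flight packet was acknowledged, so it can be dropped.
    void ackOldest() {
        ackQueue.pollFirst();
    }

    // On failure: put unacknowledged packets back at the front of the data
    // queue (keeping their order) and drop the failed node from the pipeline.
    void onDatanodeFailure(String node) {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
        pipeline.remove(node);
    }

    public static void main(String[] args) {
        PipelineRecovery r = new PipelineRecovery();
        r.dataQueue.addAll(Arrays.asList(1, 2, 3, 4, 5));
        r.pipeline.addAll(Arrays.asList("dn1", "dn2", "dn3"));
        r.sendNext(); r.sendNext(); r.sendNext(); // packets 1-3 in flight
        r.ackOldest();                            // packet 1 acknowledged
        r.onDatanodeFailure("dn2");
        System.out.println(r.dataQueue); // [2, 3, 4, 5]
        System.out.println(r.ackQueue);  // []
        System.out.println(r.pipeline);  // [dn1, dn3]
    }
}
```

Because the unacknowledged packets go back on the data queue, no packet is lost when a node fails partway through a write.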

30
Q

ack

A

Sent when the data is received by the datanode, not when it is written.

31
Q

way to write in parallel to speed up the process of copying data

A

distcp is used for copying large amounts of data to and from Hadoop filesystems in parallel.

32
Q

RPC

A

Remote procedure call. Used for communication with the nodes.

33
Q

HAR Files

A

A file archiving facility that packs files into HDFS blocks more efficiently.

34
Q

L=

r=

A

l = long; -r = recursive, meaning show all the entries in the subtree

35
Q

-p

A

The -p option preserves file attributes (timestamps, ownership, permissions, etc.)