Technology and Tools Flashcards
What is Hadoop?
An open-source framework for distributed storage and processing of large data sets across clusters of computers
What is Hadoop written in?
Java
What are the 4 main components of Hadoop?
MapReduce
YARN
HDFS (Hadoop Distributed File System)
Hadoop Common
What block size does the Hadoop Distributed File System (HDFS) use?
128 MB blocks by default (configurable)
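The block size determines how many blocks a file occupies. A minimal Python sketch of that arithmetic, assuming the 128 MB default (the `hdfs_blocks` helper is hypothetical, not part of any Hadoop API):

```python
from typing import List

BLOCK_SIZE_MB = 128  # default dfs.blocksize in Hadoop 2.x and later (134217728 bytes)

def hdfs_blocks(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> List[float]:
    """Return the sizes (in MB) of the blocks a file is split into."""
    if file_size_mb <= 0:
        return []
    full_blocks = int(file_size_mb // block_size_mb)
    remainder = file_size_mb - full_blocks * block_size_mb
    sizes = [float(block_size_mb)] * full_blocks
    if remainder > 0:
        # The last block only holds the leftover data; it does not
        # occupy a full block's worth of disk.
        sizes.append(float(remainder))
    return sizes

print(hdfs_blocks(300))  # [128.0, 128.0, 44.0]
```

A 300 MB file therefore becomes two full blocks plus one 44 MB block.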
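In Hadoop's HDFS, is hardware failure considered normal?
Yes; HDFS treats failure as the norm and is highly fault tolerant because it replicates each block on several DataNodes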
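Fault tolerance comes mainly from replication. The sketch below is a toy model, not HDFS code: the helper names are hypothetical, placement is simple round-robin rather than HDFS's rack-aware policy, and the replication factor of 3 matches the HDFS default (`dfs.replication`).

```python
from typing import Dict, List, Set

REPLICATION = 3  # HDFS default replication factor (dfs.replication)

def place_replicas(blocks: List[str], nodes: List[str], replication: int = REPLICATION) -> Dict[str, List[str]]:
    """Toy placement: put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    placement: Dict[str, List[str]] = {}
    for i, block in enumerate(blocks):
        # (i + k) % len(nodes) gives distinct nodes for k < len(nodes)
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement

def surviving_blocks(placement: Dict[str, List[str]], failed: Set[str]) -> Dict[str, List[str]]:
    """Return, for each block, the replicas that remain on live nodes."""
    return {b: [n for n in replicas if n not in failed] for b, replicas in placement.items()}

placement = place_replicas(["blk_1", "blk_2", "blk_3"], ["dn1", "dn2", "dn3", "dn4"])
after_failure = surviving_blocks(placement, failed={"dn2"})
# Every block still has at least one live replica, so no data is lost.
print(all(after_failure.values()))  # True
```

With three replicas per block on distinct nodes, losing any one or two DataNodes still leaves a copy of every block.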
What is the NameNode?
The master server of HDFS
What does the NameNode do?
Holds the file system namespace (metadata)
Executes file and directory operations such as opening, closing and renaming
Maps blocks to DataNodes
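The NameNode stores only metadata; the block data itself lives on the DataNodes. A hypothetical, heavily simplified Python model of the two mappings described above (file to blocks, block to DataNodes); the class and method names are illustrative and not Hadoop's:

```python
from typing import Dict, List, Tuple

class ToyNameNode:
    """Hypothetical, heavily simplified model of NameNode metadata."""

    def __init__(self) -> None:
        self.namespace: Dict[str, List[str]] = {}  # file path -> block IDs in file order
        self.block_map: Dict[str, List[str]] = {}  # block ID -> DataNodes holding a replica

    def add_file(self, path: str, blocks: Dict[str, List[str]]) -> None:
        """Register a file; `blocks` maps each block ID (in file order) to its DataNodes."""
        self.namespace[path] = list(blocks)
        self.block_map.update(blocks)

    def rename(self, src: str, dst: str) -> None:
        """Namespace operation: only metadata changes; no block data moves."""
        self.namespace[dst] = self.namespace.pop(src)

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """Return the (block ID, DataNodes) pairs a client would read, in order."""
        return [(b, self.block_map[b]) for b in self.namespace[path]]

nn = ToyNameNode()
nn.add_file("/logs/a.txt", {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3", "dn4"]})
nn.rename("/logs/a.txt", "/archive/a.txt")
print(nn.locate("/archive/a.txt"))
```

A client asks the NameNode for block locations and then reads the blocks directly from the DataNodes, which is why a rename never moves any data.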
What is a DataNode?
A worker server that stores the actual data: each file is split into one or more blocks, and the blocks are stored on DataNodes
What do DataNodes do?
Serve read and write requests from clients
Report back to the NameNode (heartbeats and block reports)
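The NameNode learns about DataNode health from these reports. A toy sketch, assuming a hypothetical `timeout` value and helper names: nodes whose last heartbeat is too old are treated as dead, and any block that drops below its replication factor is flagged for re-replication.

```python
from typing import Dict, List

def dead_datanodes(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return DataNodes whose most recent heartbeat is older than `timeout` seconds."""
    return sorted(node for node, t in last_heartbeat.items() if now - t > timeout)

def under_replicated(block_map: Dict[str, List[str]], dead: List[str], replication: int = 3) -> Dict[str, int]:
    """Map each block that lost replicas to the number of new copies it needs."""
    dead_set = set(dead)
    missing: Dict[str, int] = {}
    for block, nodes in block_map.items():
        live = [n for n in nodes if n not in dead_set]
        if len(live) < replication:
            missing[block] = replication - len(live)
    return missing

heartbeats = {"dn1": 100.0, "dn2": 40.0, "dn3": 98.0}  # time of last heartbeat, in seconds
dead = dead_datanodes(heartbeats, now=101.0, timeout=30.0)
print(dead)  # ['dn2']
blocks = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn1", "dn3", "dn4"]}
print(under_replicated(blocks, dead))  # {'blk_1': 1}
```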
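What are the weaknesses of HDFS?
Not good for small, low-latency reads
Not good for many small files (every file's metadata takes up NameNode memory)
Files can be appended to but not amended (no in-place edits)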
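The append-only rule can be modelled with a small Python class. This is a hypothetical illustration of the semantics only; `AppendOnlyFile` is not an HDFS API. `io.UnsupportedOperation` is the standard-library exception Python uses for unsupported file operations.

```python
import io

class AppendOnlyFile:
    """Toy model of HDFS write semantics: append is allowed, in-place edits are not."""

    def __init__(self) -> None:
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        self._data.extend(chunk)  # new bytes always go at the end of the file

    def write_at(self, offset: int, chunk: bytes) -> None:
        # Amending existing bytes is rejected; to change a file you rewrite it.
        raise io.UnsupportedOperation("HDFS files cannot be amended in place")

    def read(self) -> bytes:
        return bytes(self._data)

f = AppendOnlyFile()
f.append(b"line1\n")
f.append(b"line2\n")
try:
    f.write_at(0, b"LINE")
except io.UnsupportedOperation as err:
    print("rejected:", err)
print(f.read())
```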
What is MapReduce?
A Java-based programming paradigm for processing large data sets in parallel across a cluster
When should you use MapReduce?
For problems that are embarrassingly parallel, i.e. that split into independent tasks
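An embarrassingly parallel job splits into chunks that need nothing from each other, so they can run at the same time and be combined at the end. A minimal Python sketch, assuming a hypothetical log-scanning task; threads stand in for cluster nodes here:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def count_errors(chunk: List[str]) -> int:
    """Process one chunk; needs no data from any other chunk."""
    return sum(1 for line in chunk if "ERROR" in line)

def parallel_error_count(lines: List[str], n_chunks: int = 4) -> int:
    # Split the input into contiguous, independent chunks.
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Each chunk is processed independently, potentially at the same time.
        partial_counts = list(pool.map(count_errors, chunks))
    # Combining the partial results is the only step that needs every chunk.
    return sum(partial_counts)

logs = ["INFO start", "ERROR disk", "INFO ok", "ERROR net", "WARN slow"]
print(parallel_error_count(logs))  # 2
```

Only the final `sum` needs all the partial results; that independence is what MapReduce exploits.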
What does the Map step of MapReduce do?
Performs a map function on input key-value pairs to generate intermediate key-value pairs
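A word-count Map step written as a plain Python function, as a hedged illustration (Hadoop's real API is Java, where this logic would go in a `Mapper` subclass). The input keys mimic what Hadoop's text input format supplies: the byte offset of each line.

```python
from typing import Iterator, Tuple

def word_count_map(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Map: input pair (line offset, line text) -> intermediate pairs (word, 1)."""
    for word in value.lower().split():
        yield (word, 1)

# Input key-value pairs: (byte offset of the line, line text).
records = [(0, "the quick fox"), (14, "the lazy dog")]
intermediate = [pair for k, v in records for pair in word_count_map(k, v)]
print(intermediate)
```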
What does the Reduce step of MapReduce do?
Performs a reduce function on intermediate key-value groups to generate output key-value pairs
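Between Map and Reduce, the framework groups intermediate pairs by key (the shuffle) and sorts the keys. A Python sketch of that grouping followed by a word-count Reduce function; the function names are hypothetical, and in Hadoop the reduce logic would go in a Java `Reducer` subclass:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))  # reducers receive keys in sorted order

def word_count_reduce(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: (word, [1, 1, ...]) -> (word, total count)."""
    return (key, sum(values))

intermediate = [("the", 1), ("quick", 1), ("fox", 1), ("the", 1), ("lazy", 1), ("dog", 1)]
output = [word_count_reduce(k, vs) for k, vs in shuffle(intermediate).items()]
print(output)
```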
Name some cases where you would use MapReduce.
Data mining
Spam detection
Ad optimisation
Index building in search engines
Article clustering for news
Statistical machine translation
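Index building maps naturally onto MapReduce. A minimal sketch of an inverted index in plain Python, under the assumption that documents are short strings keyed by ID (all names are illustrative): Map emits (term, document ID), the grouping step collects IDs per term, and Reduce produces a sorted posting list.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

def index_map(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map: (document ID, text) -> one (term, document ID) pair per distinct term."""
    for term in set(text.lower().split()):
        yield (term, doc_id)

def index_reduce(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce: (term, [document IDs]) -> (term, sorted posting list)."""
    return (term, sorted(set(doc_ids)))

def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc_id, text in docs.items():            # map phase
        for term, d in index_map(doc_id, text):
            groups[term].append(d)               # group intermediate pairs by term
    return dict(index_reduce(t, ids) for t, ids in groups.items())  # reduce phase

docs = {"d1": "Hadoop stores data", "d2": "Hadoop processes data fast"}
index = build_inverted_index(docs)
print(index["hadoop"])  # ['d1', 'd2']
```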
What does YARN stand for?
Yet Another Resource Negotiator
What does YARN do?
Manages cluster resources and schedules and monitors workloads (applications)
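In YARN, the ResourceManager hands out containers (bundles of memory and CPU) on NodeManagers. The toy first-fit allocator below only hints at that idea: it tracks memory alone, and the function and container IDs are hypothetical; real YARN schedulers such as the CapacityScheduler and FairScheduler also weigh queues, CPU vcores and data locality.

```python
from typing import Dict, List, Optional, Tuple

def allocate_containers(
    node_free_mb: Dict[str, int], requests: List[Tuple[str, int]]
) -> Dict[str, Optional[str]]:
    """Toy first-fit allocator: place each (container ID, memory MB) request
    on the first node with enough free memory; None means it must wait."""
    free = dict(node_free_mb)  # copy so the caller's dict is not mutated
    assignment: Dict[str, Optional[str]] = {}
    for container_id, mem_mb in requests:
        node = next((n for n, avail in free.items() if avail >= mem_mb), None)
        if node is not None:
            free[node] -= mem_mb  # reserve the memory on that node
        assignment[container_id] = node
    return assignment

nodes = {"nm1": 4096, "nm2": 2048}
requests = [("c1", 3072), ("c2", 2048), ("c3", 2048)]
print(allocate_containers(nodes, requests))  # {'c1': 'nm1', 'c2': 'nm2', 'c3': None}
```

Request `c3` waits (`None`) until memory frees up on some node.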
What are the main features of YARN? (A) Shared (B) Fast (C) Scalability (D) Flexibility (E) Efficiency
A (Shared)
C (Scalability)
D (Flexibility)
E (Efficiency)
What is Pig?
A high-level data flow language (Pig Latin) for processing data on Hadoop
What is Hive/HiveQL?
An SQL-style query language (HiveQL), provided by Apache Hive, for querying data stored in Hadoop