Book - Chapter 10: MapReduce and Hadoop Flashcards
What can the MapReduce paradigm offer
It offers a means to break a large task into smaller tasks, run those tasks in parallel, and consolidate the outputs of the individual tasks into the final output
What are examples of organizations that use MapReduce
IBM, LinkedIn, Yahoo
MapReduce consists of what two basic parts
Map and reduce
What does the map part of MapReduce do
Applies an operation to a piece of data and provides some intermediate output
What does the reduce part of MapReduce do
Consolidates the intermediate outputs from the map steps and provides the final output
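To make the map and reduce steps above concrete, here is a minimal single-process sketch of a word count in plain Python. It is an illustration only, not Hadoop code: the function names (`map_step`, `shuffle`, `reduce_step`, `word_count`) are hypothetical, and the `shuffle` helper stands in for the grouping-by-key that a MapReduce framework performs between the two steps.

```python
from collections import defaultdict

def map_step(line):
    """Map: emit an intermediate (word, 1) pair for each word in one line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real job)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_step(key, values):
    """Reduce: consolidate all intermediate values for one key."""
    return key, sum(values)

def word_count(lines):
    # Each line could be mapped on a different machine; here it runs in one loop.
    intermediate = [pair for line in lines for pair in map_step(line)]
    return dict(reduce_step(k, v) for k, v in shuffle(intermediate).items())
```

For example, `word_count(["the quick fox", "the lazy dog"])` counts "the" twice and every other word once.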
What did Grace Hopper say about scaling up computing
That rather than building a bigger, more expensive machine, you should add more machines
What is HDFS based on
The Google File System (GFS)
What does HDFS depend on to manage data on each disk
Each disk drive's own file system, which manages the data being stored to the drive media
How does the Hadoop file system store files
In blocks, typically of 64 MB or 128 MB
How many copies of each block are there
Three copies by default
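The block size and replication figures above lend themselves to a quick calculation. The sketch below estimates how many blocks a file is split into and how much raw disk space its copies use. The helper name `hdfs_footprint` is hypothetical, and it assumes the final, partially filled block only occupies as much space as the data it holds.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size_bytes=64 * MB, replication=3):
    """Return (number of blocks, total block copies, raw bytes stored)."""
    # Ceiling division: a partially filled last block still counts as a block.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    block_copies = num_blocks * replication
    # Assumption: the last block is not padded, so raw usage is size x replication.
    raw_bytes = file_size_bytes * replication
    return num_blocks, block_copies, raw_bytes
```

Under these assumptions, a 300 MB file with 64 MB blocks is split into 5 blocks, stored as 15 block copies, and takes about 900 MB of raw disk space across the cluster.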
What does the NameNode do
Determines and tracks where the various blocks of a data file are stored
What does the DataNode do
Manages the data stored on each machine
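The split of responsibilities between the NameNode and the DataNodes can be pictured with a toy in-memory model. This is a sketch for intuition only: the `ToyNameNode` class is hypothetical, and its round-robin placement is an assumption chosen for simplicity, not the placement policy HDFS actually uses.

```python
class ToyNameNode:
    """Tracks which DataNodes hold each block; it stores no file data itself."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = list(datanodes)
        self.replication = replication
        # (filename, block index) -> names of the DataNodes holding a copy
        self.block_locations = {}

    def add_file(self, filename, num_blocks):
        n = len(self.datanodes)
        for i in range(num_blocks):
            # Round-robin placement: an illustrative assumption, not HDFS policy.
            replicas = [self.datanodes[(i + r) % n] for r in range(self.replication)]
            self.block_locations[(filename, i)] = replicas

    def locate(self, filename, block_index):
        """Answer a client's question: which DataNodes hold this block?"""
        return self.block_locations[(filename, block_index)]
```

A client would first ask the NameNode where a block lives, then read the block's contents from one of the DataNodes it names.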
What is a secondary NameNode
A node that can perform some of the NameNode's tasks to reduce the load on the NameNode
What three classes are typical in a MapReduce program in Java
The driver, the mapper, and the reducer
What is the Hadoop Streaming API
An API that allows the user to write and run Hadoop jobs with no direct knowledge of Java
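As an illustration of what a non-Java job can look like, here is a hedged sketch of a word count written in the Hadoop Streaming style, where the mapper and reducer exchange tab-separated `key<TAB>value` text lines. In a real streaming job each step would be a separate script reading standard input and writing standard output. To keep the sketch self-contained, the hypothetical `mapper` and `reducer` functions below take and yield lines directly, and `sorted()` stands in for the sort-by-key that Hadoop performs between the two steps.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

def run_locally(lines):
    """Simulate map -> sort -> reduce on one machine for testing."""
    return list(reducer(sorted(mapper(lines))))
```

Calling `run_locally(["b a", "a"])` is a quick way to check the logic before submitting the scripts to a cluster.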