2. MapReduce & YARN Flashcards
Which of the following is NOT a phase in the MapReduce model?
A) Map
B) Shuffle
C) Sort
D) Reduce
C) Sort
In the MapReduce framework, what is the output of the Map phase?
A) A list of values
B) A single key-value pair
C) Multiple key-value pairs
D) A single value
C) Multiple key-value pairs
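To illustrate the answer above, here is a minimal Python sketch that assumes word count as the example job. The function `map_words` is hypothetical and not part of any Hadoop API; it only shows that a single input record can produce many key-value pairs.

```python
def map_words(line):
    """Emit one (word, 1) pair for every word in a single input line."""
    return [(word.lower(), 1) for word in line.split()]


# One input record yields several key-value pairs, including repeated keys.
pairs = map_words("to be or not to be")
print(pairs)
```

Repeated keys such as "to" and "be" are left as separate pairs here; combining them is the job of later phases.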
What is the role of the Shuffle phase in MapReduce?
A) To sort the output of the Map phase
B) To combine the results of different mappers
C) To distribute data among reducers based on the key
D) To write the final output to HDFS
C) To distribute data among reducers based on the key
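The routing described in the answer above can be sketched in plain Python. This is a simplified model, not Hadoop code: the names `partition` and `shuffle` are hypothetical, the CRC32 hash is an assumption chosen only because it is deterministic, and the per-reducer sorting that Hadoop also performs is omitted.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer index from the key alone (CRC32 is an assumed hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route pairs from every mapper to a reducer and group values by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for mapper_output in map_outputs:
        for key, value in mapper_output:
            reducers[partition(key, num_reducers)][key].append(value)
    return [dict(r) for r in reducers]


# Two mappers both emit the key "apple"; the shuffle sends both to one reducer.
mapper_a = [("apple", 1), ("pear", 1)]
mapper_b = [("apple", 1), ("plum", 1)]
grouped = shuffle([mapper_a, mapper_b], num_reducers=2)
print(grouped)
```

Because the reducer index depends only on the key, all values for a given key end up at the same reducer regardless of which mapper produced them.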
How many classes are typically required to develop a MapReduce program using the Java API?
A) One
B) Two
C) Three
D) Four
C) Three
What is the main role of the ApplicationMaster in YARN?
A) To manage resources of a single machine/server
B) To handle client requests and monitor NodeManagers
C) To apply for resources and allocate them to internal tasks
D) To isolate a portion of the machine’s resources
C) To apply for resources and allocate them to internal tasks
Which component of YARN is responsible for managing resources in the whole cluster?
A) NodeManager
B) ApplicationMaster
C) ResourceManager
D) Container
C) ResourceManager
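The division of roles in the two YARN cards above can be sketched as a toy Python model. Everything here is an assumption made for illustration: the class and method names mirror the YARN component names but are not the YARN API, resources are reduced to memory only, and the ResourceManager launches containers directly, whereas in a real cluster the ApplicationMaster contacts the NodeManager after a container is granted.

```python
class NodeManager:
    """Tracks the free memory of one machine and carves containers out of it."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb

    def launch_container(self, memory_mb):
        self.free_mb -= memory_mb
        return {"node": self.name, "memory_mb": memory_mb}


class ResourceManager:
    """Holds the cluster-wide view and grants containers on request."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, memory_mb):
        for nm in self.node_managers:
            if nm.free_mb >= memory_mb:
                return nm.launch_container(memory_mb)
        return None  # no node has enough free memory


class ApplicationMaster:
    """Requests containers for one application and assigns its tasks to them."""

    def __init__(self, resource_manager):
        self.rm = resource_manager

    def run_tasks(self, tasks, memory_mb):
        return {task: self.rm.allocate(memory_mb) for task in tasks}


rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 1024)])
am = ApplicationMaster(rm)
placement = am.run_tasks(["map-0", "map-1", "reduce-0"], memory_mb=1024)
print(placement)
```

In this sketch the ApplicationMaster never inspects individual machines; it only asks the ResourceManager, which is the one component with a view of the whole cluster.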
In MapReduce, what happens if you set the number of reducers to zero?
A) The Map phase is skipped
B) The Shuffle phase is skipped
C) The Reduce phase is skipped
D) The entire job fails
C) The Reduce phase is skipped
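A map-only job can be sketched in Python as follows. The `run_job` function is a hypothetical stand-in for a job runner, not a Hadoop API. In this simplified model, `num_reducers` only switches the reduce step on or off; it does not partition the data.

```python
def run_job(records, mapper, reducer, num_reducers):
    """Run a toy job; with num_reducers == 0 the map output is the final output."""
    map_output = [pair for record in records for pair in mapper(record)]
    if num_reducers == 0:
        return map_output  # no grouping and no reduce step
    grouped = {}
    for key, value in map_output:
        grouped.setdefault(key, []).append(value)
    return [reducer(key, values) for key, values in sorted(grouped.items())]


def mapper(line):
    return [(word, 1) for word in line.split()]


def reducer(key, values):
    return (key, sum(values))


lines = ["a b a"]
map_only = run_job(lines, mapper, reducer, num_reducers=0)
full = run_job(lines, mapper, reducer, num_reducers=1)
print(map_only)
print(full)
```

With zero reducers the mapper output is returned unchanged and ungrouped, which matches the behavior of a map-only job writing its map output directly as the job result.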
Which of the following is NOT a feature of the MapReduce computational paradigm?
A) Scalability beyond thousands of machines
B) Input/output data stored in HDFS
C) Real-time data processing
D) Ease of programming with only map and reduce functions
C) Real-time data processing
Which of the following is true about MapReduce?
A) The reduce function runs only after all mappers have finished
B) Mappers and reducers can run in parallel
C) Each reducer processes data from a single mapper
D) The number of reducers is determined by the input data size
A) The reduce function runs only after all mappers have finished
Which of the following is NOT a component of YARN?
A) ApplicationMaster
B) ResourceManager
C) DataNode
D) NodeManager
C) DataNode
MapReduce is suitable for:
A) Real-time data processing
B) Processing large datasets in a distributed manner
C) Small-scale data analysis
D) Online transaction processing
B) Processing large datasets in a distributed manner
The output of the Reduce phase in MapReduce is:
A) A set of key-value pairs
B) A single value
C) A list of values
D) Intermediate data for the next MapReduce job
A) A set of key-value pairs
In Hadoop’s MapReduce, the number of reducers:
A) Is determined by the size of the input data
B) Is fixed and cannot be changed
C) Can be specified by the user
D) Is equal to the number of mappers
C) Can be specified by the user
Which of the following is a benefit of using YARN in a Hadoop cluster?
A) It reduces the need for data replication
B) It allows for the dynamic allocation of cluster resources
C) It eliminates the need for MapReduce
D) It simplifies the process of writing MapReduce jobs
B) It allows for the dynamic allocation of cluster resources