Basics Flashcards
Hadoop uses _____ as its basic data unit
Key value pairs
Difference between SQL and Mapreduce
SQL = Declaritive
Mapreduce Specify the steps to solve the problem
Hadoop is best used as a write _____ read _____ type of data store
Once
Many
Under the MapReduce model, the data processing primitives are called
mappers and reducers
A _____ is a pair of consecutive words
Bigram
MapReduce uses ____ and _________ as its main data primitives
Lists
Key Value Pairs
In the MapReduce framework you write applications by specifying the ______ and the _______
mapper
reducer
In simple terms the mapper is meant to __________ the input into something that the reducer can ___________ over
filter and transform
aggregate over
The input format for processing multiple files is usually
list()
The input format for processing one large file, such as a log file is
list()
Authentication can be based on 1. ______________ or if security is turned on 2._______________
- user.name query parameter (as part of the HTTP query string)
- then it relies on Kerberos.
Security features of Hadoop consist of
authentication,
service level authorisation,
authentication for Web consoles and data confidentiality.
The simplest way to do authentication is
using kinit command of Kerberos.
What is Yarn
Yet Another Resource Negotiator - A cluster management technology.
A large-scale, distributed operating system for big data applications.
Your MapReduce programs then process data, but they usually don’t read any HDFS files directly. Instead they …
Instead they rely on the MapReduce framework to read and parse the HDFS files into individual records (key/ value pairs), which are the unit of data MapReduce programs do work on.
After distributing the data to different nodes, the only time nodes communicate with each other is at the
“shuffle” step
The ________ servers as the base class for mappers and reducers
MapReduceBase class
To serve as a mapper, a class should
Inherit the MapReduceBase class and implement the Mapper interface