Big Data Flashcards
1
Q
What are the two main components of the Hadoop framework?
A
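The Hadoop Distributed File System (HDFS), which stores data across the cluster, and MapReduce, which processes that data in parallel. Since Hadoop 2, a third component, YARN, manages cluster resources and schedules jobs.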
2
Q
Explain how MapReduce works as simply as possible.
A
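“MapReduce is a programming model that enables distributed processing of large data sets on compute clusters of commodity hardware.” Hadoop MapReduce first splits the input into chunks, and in the map phase each chunk is turned into intermediate key-value pairs. The framework then shuffles and sorts those pairs so that all values for the same key end up together, and in the reduce phase each key's values are combined into the final output.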
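To make the map, shuffle, and reduce steps concrete, here is a minimal single-process word-count sketch. It only simulates the data flow on one machine and does not use Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each input line stands in for an input split that a real cluster would hand to a separate map task.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in one record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all values by key, as the framework does
    # between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine every value for one key into a single result.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # Hypothetical driver: in Hadoop, each line (split) would go to its own map task.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # → {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster the map and reduce functions run on many machines at once, and the shuffle moves data over the network. The shape of the computation, however, is the same as in this sketch.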
3
Q
How would you sort a large list of numbers?
A
Answer
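One standard technique for sorting a list too large to fit in memory is an external merge sort, though it is not necessarily the answer this card intended. It works by sorting fixed-size chunks in memory, spilling each sorted "run" to disk, and then doing a k-way merge of the runs with a heap. The sketch below assumes integer input and stores runs as one number per line in temporary files. The `chunk_size` default and the helper names (`_write_run`, `_read_run`, `external_sort`) are illustrative assumptions. If the list does fit in memory, Python's built-in `sorted` is enough.

```python
from __future__ import annotations

import heapq
import os
import tempfile
from typing import Iterable, Iterator


def _write_run(sorted_chunk: list[int]) -> str:
    # Spill one sorted chunk to a temporary file, one number per line.
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{n}\n" for n in sorted_chunk)
    return path


def _read_run(path: str) -> Iterator[int]:
    # Stream a run back from disk without loading it all into memory.
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(numbers: Iterable[int], chunk_size: int = 100_000) -> Iterator[int]:
    # Phase 1: sort fixed-size chunks in memory and write each to disk as a run.
    run_paths: list[str] = []
    chunk: list[int] = []
    for n in numbers:
        chunk.append(n)
        if len(chunk) >= chunk_size:
            run_paths.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        run_paths.append(_write_run(sorted(chunk)))

    # Phase 2: k-way merge of the sorted runs; heapq.merge keeps one
    # element per run in a heap, so memory use is O(number of runs).
    readers = [_read_run(p) for p in run_paths]
    try:
        yield from heapq.merge(*readers)
    finally:
        for r in readers:
            r.close()  # close any partially read run files
        for p in run_paths:
            os.remove(p)  # clean up temporary run files


if __name__ == "__main__":
    print(list(external_sort([5, 3, 9, 1, 7, 2], chunk_size=2)))
    # → [1, 2, 3, 5, 7, 9]
```

In a distributed setting, the same idea is typically scaled out by range-partitioning the numbers across machines so that each partition can be sorted independently and the outputs concatenated in order.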
4
Q
Say you’re given a large data set. What would be your plan for dealing with outliers? How about missing values? How about transformations?
A
Answer
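One possible plan, sketched for a single numeric column, has three steps. First, fill missing values with the median. Second, clamp outliers to Tukey's fences (values beyond 1.5 × IQR from the quartiles). Third, apply a log transform to compress a long right tail. All three choices are assumptions made for illustration, not the card's intended answer. In practice you would first check whether an outlier is a data error or real signal before clipping it, and on truly large data you would run the same steps in a distributed tool rather than on Python lists. The function names are hypothetical.

```python
from __future__ import annotations

import math
import statistics


def impute_median(values: list[float | None]) -> list[float]:
    # Missing values: replace None with the median of the observed values
    # (the median is robust to the outliers handled in the next step).
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers_iqr(values: list[float], k: float = 1.5) -> list[float]:
    # Outliers: clamp anything outside [Q1 - k*IQR, Q3 + k*IQR].
    # statistics.quantiles needs at least two values.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def log_transform(values: list[float]) -> list[float]:
    # Transformation: log1p compresses a right-skewed distribution;
    # this assumes all values are >= 0.
    return [math.log1p(v) for v in values]


def clean(values: list[float | None]) -> list[float]:
    # Impute first, then clip, then transform.
    return log_transform(clip_outliers_iqr(impute_median(values)))


if __name__ == "__main__":
    raw = [12.0, None, 15.0, 14.0, 13.0, 400.0, None, 16.0]
    print(clip_outliers_iqr(impute_median(raw)))
    print(clean(raw))
```

Here the two missing entries become the median (14.5), and the extreme value 400.0 is pulled down to the upper fence before the log transform.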