Big Data Flashcards

1
Q

What are the main components of the Hadoop framework?

A

The Hadoop Distributed File System (HDFS) for storage, MapReduce for processing, and YARN for cluster resource management and job scheduling.

2
Q

Explain how MapReduce works as simply as possible.

A

MapReduce is a programming model for distributed processing of large data sets on clusters of commodity hardware. In the map phase, the input is split into pieces and each piece is transformed into intermediate key-value pairs. The framework then groups those pairs by key (the shuffle), and the reduce phase combines the values for each key into the final result.
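
As a rough illustration of the model, here is a single-process Python sketch of a word count. It is not Hadoop code and does not use any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to show how the map, shuffle, and reduce steps fit together.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: two "pieces" of a larger file.
    print(word_count(["big data is big", "data is everywhere"]))
```

In an actual Hadoop job the map and reduce functions would run in parallel across many machines, and the framework would handle the grouping and data movement between them.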

3
Q

How would you sort a large list of numbers?

A

Answer

4
Q

Say you’re given a large data set. What would be your plan for dealing with outliers? How about missing values? How about transformations?

A

Answer
