Book - Chapter 10 MapReduce and Hadoop Flashcards

1
Q

What does the MapReduce paradigm offer?

A

It offers a means to break a large task into smaller tasks, run the tasks in parallel, and consolidate the outputs of the individual tasks into the final output.

2
Q

Which organizations are examples of MapReduce (Hadoop) users?

A

IBM, LinkedIn, Yahoo

3
Q

MapReduce consists of which two basic parts?

A

Map and reduce

4
Q

What does the map part of MapReduce do?

A

Applies an operation to a piece of data and provides some intermediate output.

5
Q

What does the reduce part of MapReduce do?

A

Consolidates the intermediate outputs from the map steps and provides the final output.
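
The map and reduce steps can be illustrated with a small word count. This is only a sketch in plain Python, not Hadoop code: the function names `map_words`, `reduce_counts`, and `word_count` are made up for illustration, and a local thread pool stands in for the machines of a cluster.

```python
from collections import defaultdict
from multiprocessing.dummy import Pool  # thread pool; stands in for cluster nodes


def map_words(line):
    """Map step: emit an intermediate (word, 1) pair for each word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: consolidate intermediate pairs into a final count per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    # Run the map step on each piece of data in parallel.
    with Pool(4) as pool:
        mapped = pool.map(map_words, lines)
    # Flatten the intermediate outputs and hand them to the reduce step.
    intermediate = [pair for pairs in mapped for pair in pairs]
    return reduce_counts(intermediate)


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

Each input line is a smaller task, the map calls run independently of one another, and the reduce call produces the single final output.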

6
Q

What idea did Grace Hopper describe?

A

She described the idea that, rather than building a bigger, more expensive machine, you add more machines.

7
Q

What is HDFS based on?

A

The Google File System (GFS)

8
Q

What does HDFS depend on each disk to do?

A

It depends on each disk drive's file system to manage the data being stored on the drive media.

9
Q

How does the Hadoop file system store files?

A

In blocks of a fixed size, typically 64 MB or 128 MB

10
Q

How many copies of each block are there?

A

Three copies by default
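
The block size and copy count from these two cards combine into simple arithmetic. The sketch below uses a hypothetical helper, `hdfs_footprint`, and assumes that a final partial block takes up only as much disk space as the data it holds.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=64, replication=3):
    """Return (blocks, block copies, raw storage in MB) for one file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the cluster.
    block_copies = blocks * replication
    # Assumption: a partial block only uses as much space as its data.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_copies, raw_storage_mb


if __name__ == "__main__":
    print(hdfs_footprint(1000))                     # 64 MB blocks
    print(hdfs_footprint(1000, block_size_mb=128))  # 128 MB blocks
```

For a 1,000 MB file, 64 MB blocks give 16 blocks and 48 stored block copies, while 128 MB blocks give 8 blocks and 24 copies; the raw storage is 3,000 MB in both cases.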

11
Q

What does the NameNode do?

A

Determines and tracks where the various blocks of a data file are stored

12
Q

What does the DataNode do?

A

Manages the data stored on each machine

13
Q

What does the Secondary NameNode do?

A

Provides the capability to perform some of the NameNode tasks to reduce the load on the NameNode

14
Q

What three classes are typical in a MapReduce program in Java?

A

The driver, the mapper, and the reducer

15
Q

What is the Hadoop Streaming API?

A

Allows the user to write and run Hadoop jobs with no direct knowledge of Java
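
A streaming job is usually written as a mapper and a reducer that read lines from standard input and write tab-separated key/value lines to standard output. The word-count sketch below follows that convention in Python. The `map`/`reduce` command-line argument and the function names are assumptions made for this example, not part of Hadoop.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per word; the records must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: the first argument selects the step to run.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

If the script were saved as `wc.py` (a made-up name), a shell pipeline such as `cat input.txt | python wc.py map | sort | python wc.py reduce` would imitate the map, sort, and reduce sequence on one machine.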

16
Q

What is Pig?

A

A high-level data flow programming language

17
Q

What is Hive?

A

Provides SQL-like access

18
Q

What is Mahout?

A

Provides analytical tools

19
Q

What is HBase?

A

Provides real-time reads and writes

20
Q

What is the data flow language in Pig?

A

Pig Latin

21
Q

What are the three main characteristics of Pig?

A

Ease of programming, behind-the-scenes code optimisation, and extensibility of capabilities

22
Q

Pig allows execution of user-defined functions. What are these known as?

A

UDFs

23
Q

If you have a table structure, what tool might you use?

A

Hive