MapReduce Flashcards

1
Q

How does MapReduce work?

A

a

2
Q

How does a BSP superstep work?

A

a
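A minimal sketch of BSP supersteps, assuming a Pregel-style vertex-centric formulation. Each superstep is (1) local computation on the messages received in the previous superstep, (2) communication, where new messages are buffered rather than delivered, and (3) a barrier, after which the buffered messages become visible. The function `bsp_max_value` and the toy graph are hypothetical, and the vertices run sequentially here instead of on parallel workers.

```python
from typing import Dict, List


def bsp_max_value(graph: Dict[str, List[str]], values: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical example: propagate the maximum vertex value via BSP supersteps."""
    values = dict(values)
    inbox: Dict[str, List[int]] = {v: [] for v in graph}
    superstep = 0
    while True:
        outbox: Dict[str, List[int]] = {v: [] for v in graph}
        for v in graph:
            # Phase 1: local computation on messages from the previous superstep.
            best = max(inbox[v], default=values[v])
            changed = superstep == 0 or best > values[v]
            values[v] = max(values[v], best)
            # Phase 2: communication; messages are buffered, not delivered yet.
            if changed:
                for neighbour in graph[v]:
                    outbox[neighbour].append(values[v])
        # Phase 3: barrier; buffered messages become visible only after
        # every vertex has finished the current superstep.
        inbox = outbox
        superstep += 1
        if not any(inbox.values()):
            return values  # no messages in flight: all vertices halt
```

Because a vertex only sees messages after the barrier, an iteration-heavy graph algorithm maps onto a sequence of supersteps instead of a chain of separate jobs.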

3
Q

What’s wrong with MapReduce for graphs?

A

a

4
Q

Name two uses of dataflow that you learned from the course (we covered three).

A

a. Pig Latin, for MapReduce tasks

b. TensorFlow, for neural network architectures

5
Q

Data can also be processed in multiple stages (pipelining). Name two architectures we covered that permit such multistage data analysis.

A

a. YARN, i.e. MapReduce v2

b. BSP

6
Q

Motivation for MapReduce

A
• MapReduce allows programmers without any experience in parallel and distributed systems to easily use the resources of a large distributed system.
7
Q

Example: Count word occurrences

A
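A minimal single-process sketch of word count in the MapReduce style, assuming whitespace tokenisation and lower-casing. The names `map_fn`, `reduce_fn` and `run_word_count` are hypothetical; in a real framework the grouping step (the shuffle) is done by the system across machines, not by a local dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word occurrence; doc_id is the input key.
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reduce: sum all the counts collected for one word.
    return word, sum(counts)


def run_word_count(docs: Dict[str, str]) -> Dict[str, int]:
    # Shuffle (simulated): group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        for word, one in map_fn(doc_id, text):
            groups[word].append(one)
    return dict(reduce_fn(w, c) for w, c in sorted(groups.items()))


print(run_word_count({"d1": "the cat sat", "d2": "The dog sat"}))
```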
8
Q

Visual diagram of MapReduce execution

A
9
Q

What is a combiner?

A

Optionally, before map output is forwarded to the shuffle, a ‘combiner’ can be set up on each node to perform a local per-key reduction of that node's map output. If specified, this is ‘step 1.5’ of the MapReduce workflow, i.e. between the map step and the shuffle.
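A minimal sketch of a combiner for word count, assuming the reduce operation is a sum. The helper names `map_words`, `combine` and `reduce_all` are hypothetical, and each inner list stands for the output of one node. The combiner shrinks the number of pairs sent over the network without changing the final result.

```python
from collections import Counter
from typing import Dict, List, Tuple


def map_words(text: str) -> List[Tuple[str, int]]:
    # Map: one (word, 1) pair per word occurrence.
    return [(w, 1) for w in text.split()]


def combine(pairs: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Combiner: local per-key reduction on a single node, before the shuffle.
    local: Counter = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())


def reduce_all(per_node: List[List[Tuple[str, int]]]) -> Dict[str, int]:
    # Reducer: sums the (possibly pre-combined) counts from every node.
    total: Counter = Counter()
    for pairs in per_node:
        for word, count in pairs:
            total[word] += count
    return dict(total)


raw_a = map_words("to be or not to be")  # 6 pairs leave the map step
print(combine(raw_a))                    # 4 pairs after local combining
```

This only works because summing is associative and commutative; a reduce function without those properties cannot safely be reused as a combiner.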

10
Q

What is functional programming?

A

In functional programming, functions are first-class objects: they can be passed into a function as arguments, and a function can also return another function as output. Languages that support this include JavaScript and Python.
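A minimal Python sketch of functions as first-class values. The names `apply_twice` and `make_adder` are hypothetical illustrations: the first takes a function as input, the second returns a new function as output (a closure over `n`).

```python
from typing import Callable


def apply_twice(f: Callable[[int], int], x: int) -> int:
    # Function as input: f is passed in like any other value.
    return f(f(x))


def make_adder(n: int) -> Callable[[int], int]:
    # Function as output: returns a new function that remembers n.
    def add(x: int) -> int:
        return x + n
    return add


add_three = make_adder(3)
print(apply_twice(add_three, 10))  # 16
```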

11
Q

Examples of functions that take a function as input

A

map(), filter() and reduce()
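A short Python sketch of the three, assuming a simple list of integers as input. In Python 3, `map` and `filter` are built-ins, while `reduce` must be imported from `functools`. MapReduce takes its name from the map and reduce primitives of functional languages.

```python
from functools import reduce

nums = [1, 2, 3, 4, 5, 6]
squares = list(map(lambda x: x * x, nums))           # apply a function to each item
evens = list(filter(lambda x: x % 2 == 0, squares))  # keep items where the predicate holds
total = reduce(lambda acc, x: acc + x, evens, 0)     # fold the list into one value
print(squares, evens, total)  # [1, 4, 9, 16, 25, 36] [4, 16, 36] 56
```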

12
Q

MapReduce use cases

A

Processing large amounts of raw data, such as crawled documents and web request logs, to compute various kinds of derived data, such as inverted indices, representations of the graph structure of web documents, summaries of the number of pages crawled per host, and the set of most frequent queries in a given day.
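One of the derived datasets listed above is an inverted index. A minimal single-process sketch of building one in the MapReduce style, assuming whitespace tokenisation and lower-casing. The names `map_index`, `reduce_index` and `build_inverted_index` are hypothetical, and a dictionary stands in for the framework's shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_index(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    # Map: emit (word, doc_id) once per distinct word in the document.
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_index(word: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    # Reduce: the posting list is the sorted list of documents containing the word.
    return word, sorted(doc_ids)


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    grouped: Dict[str, List[str]] = defaultdict(list)  # simulated shuffle
    for doc_id, text in docs.items():
        for word, d in map_index(doc_id, text):
            grouped[word].append(d)
    return dict(reduce_index(w, ids) for w, ids in grouped.items())


print(build_inverted_index({"p1": "big data", "p2": "Big graphs"}))
```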

13
Q

What is GFS?

A

GFS (the Google File System) is a scalable distributed file system for large, distributed, data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

14
Q

Why was GFS needed?

A
15
Q

Why Hadoop? What was the need?

A

It all started in 2002 with the Apache Nutch project, in which Doug Cutting and Mike Cafarella were building a web search engine that would crawl and index websites.

The project proved too expensive to scale to billions of web pages, so they looked for a solution that would reduce the cost.

To solve this, they needed a way to store very large files and to process that data efficiently.

16
Q

What was Hadoop's solution?

A

Doug and Mike implemented open-source versions of GFS and MapReduce (which became HDFS and Hadoop MapReduce) to store and process big data.

17
Q

What is Hadoop Hive?

A
18
Q

What is Hadoop Pig?

A

a

19
Q

What is Hadoop Musketeer?

A

s

20
Q

What is YARN?

A

k

21
Q

Spark vs. Storm

A

a

22
Q

Flink

A

j

23
Q

Dask

A

a

24
Q

Spark

A

a

25
Q
A