MapReduce Flashcards
How does MapReduce work?
1. Map: each worker applies a user-supplied map function to its input split and emits intermediate (key, value) pairs.
2. Shuffle: the framework groups all intermediate values by key and routes each key to one reducer.
3. Reduce: each worker applies a user-supplied reduce function to a key and its grouped values, writing the final output.
How does a BSP superstep work?
In each superstep, every worker (1) performs local computation on its own partition of the data, (2) sends messages to other workers, and (3) waits at a global synchronization barrier; messages sent during one superstep are delivered at the start of the next. The computation ends when all workers vote to halt.
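A minimal sketch of that cycle in plain Python (the setup is mine, not from the course): three workers compute locally, mail a value to a neighbor, and meet at a barrier whose action swaps mailbox generations, so messages become readable only in the next superstep.

```python
import threading

NUM_WORKERS = 3
NUM_SUPERSTEPS = 4

# Two mailbox generations: workers read from 'current', write into 'nxt'.
current = [[] for _ in range(NUM_WORKERS)]
nxt = [[] for _ in range(NUM_WORKERS)]

def deliver():
    # Barrier action: runs in exactly one thread while all workers are
    # parked, so mail sent this superstep becomes next superstep's input.
    global current, nxt
    current, nxt = nxt, [[] for _ in range(NUM_WORKERS)]

barrier = threading.Barrier(NUM_WORKERS, action=deliver)

def worker(w):
    value = w + 1
    for _ in range(NUM_SUPERSTEPS):
        value += sum(current[w])                  # 1. local computation
        nxt[(w + 1) % NUM_WORKERS].append(value)  # 2. message a neighbor
        barrier.wait()                            # 3. global barrier
    print(f"worker {w} finished with {value}")

threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```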
What’s wrong with MapReduce for graphs?
Graph algorithms such as PageRank are iterative, but MapReduce has no notion of iteration: every pass must run as a separate job, so the entire graph state is re-read from and re-written to the distributed file system between iterations, and nothing is kept in memory across jobs. The I/O and job-startup overhead dominates, which is exactly what BSP avoids by holding the partitioned graph in memory across supersteps.
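To make that overhead concrete, here is a runnable toy (all names, paths, and data are illustrative, not a real framework) in which every PageRank iteration is its own "job" that serializes the full rank state to storage and reads it back:

```python
import json, os, tempfile

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # toy adjacency lists
storage = tempfile.mkdtemp()                        # stand-in for HDFS/GFS

def write_state(path, ranks):
    with open(path, "w") as f:
        json.dump(ranks, f)

def read_state(path):
    with open(path) as f:
        return json.load(f)

write_state(os.path.join(storage, "iter_0"), {v: 1 / len(graph) for v in graph})

for i in range(10):   # one whole MapReduce job per PageRank iteration
    ranks = read_state(os.path.join(storage, f"iter_{i}"))     # re-read ALL state
    contrib = {v: 0.0 for v in graph}
    for v, outlinks in graph.items():                          # "map": spread rank mass
        for u in outlinks:
            contrib[u] += ranks[v] / len(outlinks)
    ranks = {v: 0.15 / len(graph) + 0.85 * c for v, c in contrib.items()}  # "reduce"
    write_state(os.path.join(storage, f"iter_{i + 1}"), ranks) # re-write ALL state

print(read_state(os.path.join(storage, "iter_10")))
```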
Name two uses of dataflow that you learned from the course (we covered three).
a. Pig Latin, for MapReduce tasks
b. TensorFlow, for neural network architectures
Data can also be processed in multiple stages (pipelining). Name two architectures we covered that permit such multistage data analysis.
a. YARN, i.e., MapReduce v2
b. BSP
Motivation of MapReduce
- MapReduce allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system: they write only a map and a reduce function, and the runtime takes care of partitioning the input, scheduling execution across machines, inter-machine communication, and machine failures.
Example: Count word occurrences
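A minimal in-memory sketch of the classic job (function names are mine, not a framework's): the map phase emits a (word, 1) pair per word, the shuffle groups pairs by key, and the reduce phase sums each group.

```python
# Word count as map -> shuffle -> reduce, simulated in plain Python.
from collections import defaultdict

def mapper(document):
    # Emit one intermediate (key, value) pair per word.
    for word in document.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Sum every value that arrived for one key.
    return (word, sum(counts))

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle phase: group every intermediate value by its key.
groups = defaultdict(list)
for doc in documents:
    for word, count in mapper(doc):
        groups[word].append(count)

result = dict(reducer(w, c) for w, c in groups.items())
print(result)   # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```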
Visual diagram of MapReduce execution
What is a combiner?
Optionally, before map output is forwarded to the shufflers, a ‘combiner’ operation can be set up on each node to perform a local per-key reduction; if specified, this would be ‘step 1.5’ in the workflow above.
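Sticking with word count, a hedged sketch of that step 1.5 (names and data are illustrative): each node sums its own map output per key before anything is shuffled, so far fewer pairs cross the network.

```python
# Combiner sketch: local per-key reduction on each node before the shuffle.
from collections import Counter, defaultdict

def combine(pairs):
    # Pre-aggregate one node's map output; far fewer pairs leave the node.
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return local.items()

node_outputs = [
    [("the", 1), ("fox", 1), ("the", 1)],   # map output on node 1
    [("the", 1), ("dog", 1)],               # map output on node 2
]

shuffled = defaultdict(list)
for pairs in node_outputs:
    for word, count in combine(pairs):      # combiner runs before forwarding
        shuffled[word].append(count)

counts = {word: sum(vals) for word, vals in shuffled.items()}
print(counts)   # {'the': 3, 'fox': 1, 'dog': 1}
```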
What is functional programming?
In functional programming, functions are first-class objects: they can be passed into other functions as arguments, and a function can also be returned from a function as output. JavaScript and Python, among others, support this style.
Examples of functions as input
map(), filter(), and reduce()
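In Python, for instance, each of these takes a function as an argument, and a closure shows a function being returned as output (note that reduce() lives in functools in Python 3):

```python
from functools import reduce  # reduce() moved to functools in Python 3

nums = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x * x, nums))         # function passed into map()
evens = list(filter(lambda x: x % 2 == 0, nums))   # predicate passed into filter()
total = reduce(lambda acc, x: acc + x, nums, 0)    # combining fn passed into reduce()

def make_adder(n):
    # A function returned from a function: the other half of "first-class".
    return lambda x: x + n

add3 = make_adder(3)
print(squares, evens, total, add3(4))   # [1, 4, 9, 16, 25] [2, 4] 15 7
```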
MapReduce use cases
Processing large amounts of raw data, such as crawled documents and web request logs, to compute various kinds of derived data, for example:
a. inverted indices
b. various representations of the graph structure of web documents
c. summaries of the number of pages crawled per host
d. the set of most frequent queries in a given day
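As a concrete instance from that list, an inverted index falls out of a single map/shuffle/reduce pass: map each document to (word, doc_id) pairs, then collect the set of documents per word. A minimal sketch (data and names are mine):

```python
# Inverted index via one map/shuffle/reduce pass, simulated in Python.
from collections import defaultdict

docs = {1: "mapreduce simplifies big data", 2: "gfs stores big files"}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():      # map: emit (word, doc_id)
        index[word].add(doc_id)    # shuffle + reduce: group doc ids per word

print(dict(index))   # e.g. {'big': {1, 2}, 'mapreduce': {1}, ...}
```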
What is GFS?
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
Why was GFS needed?
Google had to store and scan very large files across thousands of inexpensive commodity machines, where component failures are the norm rather than the exception; existing file systems were not designed for that scale or failure model.
Why Hadoop? What was the need?
It all started in 2002 with the Apache Nutch project: Doug Cutting and Mike Cafarella were building a web search engine that would crawl and index websites. The project proved too expensive to be feasible for indexing billions of webpages, so they went looking for a cheaper solution. They needed a way to store very large files and to process that data efficiently; Google's GFS and MapReduce papers described exactly that, and Hadoop grew out of implementing those ideas for Nutch.