MapReduce Flashcards

1
Q

What data type does the MapReduce programming model operate on?

A

Key-value records

2
Q

What is the signature of the map function in the MapReduce programming model?

A

(K_in, V_in) -> list(K_inter, V_inter)

3
Q

What is the signature of the reduce function in the MapReduce programming model?

A

(K_inter, list(V_inter)) -> list(K_out, V_out)
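
To see how the map and reduce signatures fit together, here is a minimal in-memory sketch in Python. It is only an illustration of the data flow (map, group by intermediate key, reduce), not how a distributed framework is implemented. The names run_mapreduce, max_map and max_reduce, and the sensor data, are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn, group by intermediate key, then apply reduce_fn (all in memory)."""
    groups = defaultdict(list)
    # Map phase: each (k_in, v_in) record yields a list of (k_inter, v_inter) pairs.
    for k_in, v_in in records:
        for k_inter, v_inter in map_fn(k_in, v_in):
            # "Shuffle": collect all values that share an intermediate key.
            groups[k_inter].append(v_inter)
    # Reduce phase: each (k_inter, list(v_inter)) yields a list of (k_out, v_out) pairs.
    output = []
    for k_inter in sorted(groups):
        output.extend(reduce_fn(k_inter, groups[k_inter]))
    return output


# Hypothetical job: maximum reading per sensor from "sensor,reading" lines.
def max_map(line_no, line):
    sensor, reading = line.split(",")
    return [(sensor, float(reading))]


def max_reduce(sensor, readings):
    return [(sensor, max(readings))]


records = [(0, "a,1.5"), (1, "b,2.0"), (2, "a,3.0")]
result = run_mapreduce(records, max_map, max_reduce)
print(result)
```

The input key (here a line number) is ignored by the mapper, which is typical: the intermediate key chosen by the map function decides how values are grouped for the reducer.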

4
Q

Check notes for wordcount example

A

Check notes for wordcount example
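
For reference, word count is commonly sketched along the following lines. This is an in-memory Python illustration and not necessarily the version in the notes; the function names wc_map, wc_reduce and word_count, and the sample documents, are hypothetical.

```python
from collections import defaultdict


def wc_map(doc_id, text):
    """Map: (doc_id, text) -> list of (word, 1)."""
    return [(word.lower(), 1) for word in text.split()]


def wc_reduce(word, counts):
    """Reduce: (word, list of 1s) -> list of (word, total)."""
    return [(word, sum(counts))]


def word_count(documents):
    # Map every document, grouping the emitted 1s by word.
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for word, one in wc_map(doc_id, text):
            groups[word].append(one)
    # Reduce each group to a single total per word.
    totals = {}
    for word, ones in groups.items():
        for out_word, total in wc_reduce(word, ones):
            totals[out_word] = total
    return totals


docs = {"d1": "the quick brown fox", "d2": "the lazy dog the end"}
counts = word_count(docs)
print(counts["the"])
```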

5
Q

What is Apache Hadoop MapReduce?

A

An open-source implementation of Google’s MapReduce framework.
There are two ways to write jobs:
- Java
- Hadoop Streaming (for Python, Perl, etc.)
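
As an illustration of the Hadoop Streaming route, a word count mapper and reducer in Python might look like the sketch below. Hadoop Streaming feeds records to each script on standard input, expects tab-separated key/value lines on standard output, and sorts the mapper output by key before the reducer reads it. Putting both roles in one script selected by a "map" or "reduce" argument is a convention assumed here, not something Hadoop requires.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts per word; relies on the input already being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Assumed convention: first argument picks the role ("map" or "reduce").
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role in ("map", "reduce"):
        step = mapper if role == "map" else reducer
        for out_line in step(sys.stdin):
            print(out_line)
```

The reducer uses groupby rather than a dictionary because consecutive lines with the same key are guaranteed to be adjacent, so it never needs to hold more than one key's values at a time.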

6
Q

Explain Google PageRank.

A

The PageRank algorithm rates linked documents (web pages).
It is the basis of the Google search engine’s ranking of web pages.
Principle: the numerical weight (PageRank) PR_p of a web page p depends on the number and the numerical weights of the web pages that link to p.
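
One common way to write this principle as a formula is shown below. It assumes the usual damping factor d (often set to 0.85), which this card does not mention, along with N as the total number of pages, B_p as the set of pages that link to p, and L_q as the number of outgoing links of page q.

```latex
PR_p = \frac{1 - d}{N} + d \sum_{q \in B_p} \frac{PR_q}{L_q}
```

Each page q divides its own weight equally among its L_q outgoing links, so a link from a highly ranked page with few links counts for more than a link from a low-ranked page with many.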

7
Q

What is the PageRank algorithm?

A

Go to notes
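
For orientation, the iterative form of PageRank is often sketched as below. This is a minimal single-machine Python version and not necessarily the formulation in the notes. The damping factor of 0.85, the fixed iteration count, the even redistribution of weight from pages without outgoing links, and the sample graph are all assumptions. Each iteration maps onto MapReduce naturally: the map step emits a share of a page's rank to every page it links to, and the reduce step sums the shares each page receives.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the baseline (1 - d) / N.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # "Map": p sends an equal share of its rank along each out-link.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share  # "Reduce": sum incoming shares
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```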

8
Q

Parallelisation improves computational efficiency, but what challenges does it bring?

A

Managing multiple servers is difficult because the servers need coordination, and a server failure should not affect job execution.
Achieving parallelisation is difficult because tasks must be made fully independent. It is also harder to reason about irregular program behaviour (debugging).
