Map-Reduce Flashcards

1
Q

What is “big data”?

A

A volume of data so large that it cannot be analysed by typical methods.
The term can also refer to data with high velocity or high variety.

2
Q

What are the phases of Map-reduce?

A

Map phase: partition the problem space and process each part independently, emitting key-value pairs.
Group phase: group the emitted pairs by key (some shared trait).
Reduce phase: combine the contents of each group in some way.
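
The three phases can be sketched with a word count over a few strings. This is a minimal single-machine illustration, not how a real Map-Reduce framework is implemented; the function names (map_phase, group_phase, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: handle each input record independently, emitting (key, value) pairs."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)


def group_phase(pairs):
    """Group: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: combine the values of each group into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Chain the three phases: map, then group, then reduce."""
    return reduce_phase(group_phase(map_phase(documents)))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

In a distributed setting the map and reduce calls would run on many machines in parallel, and the group step would be a shuffle of pairs across the network.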

3
Q

What is HBase?

Give one example of an organisation that uses HBase.

A

A NoSQL key-value (wide-column) database that runs on top of Hadoop's HDFS. It is used by Facebook.

4
Q

What is EMR?

A

Elastic Map-Reduce, Amazon's managed Map-Reduce service. It runs on AWS IaaS.

5
Q

What are the disadvantages of PaaS?

A

Developers get locked into a platform, and may not be able to use some familiar tools because they are not available on that service.

6
Q

What is Map-Reduce not good for?

A

Communication-intensive parallel tasks, of the kind typically written with MPI.

Applications that frequently need to edit or update (write to) existing datasets.

7
Q

What are the origins of Map-Reduce?

A

It was developed at Google, and it borrows the map and reduce operations from functional programming.

8
Q

What does “data locality” mean?

A

Input data is stored on or near the local disks of the machines that process it, so moving the data to the computation uses as little network bandwidth as possible.

9
Q

What is “module availability”?

What is the formula?

A

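A measure of service availability: the fraction of time the module is operational.

Availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to repair.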
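
A worked example may make the formula easier to remember. The sketch below assumes both times are given in the same unit (hours here); the function name availability and the sample figures are illustrative only.

```python
def availability(mttf_hours, mttr_hours):
    """Return module availability as MTTF / (MTTF + MTTR).

    Both arguments must use the same time unit; hours are assumed here.
    """
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("times must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTTF + MTTR must be greater than zero")
    return mttf_hours / total


if __name__ == "__main__":
    # Hypothetical module: fails on average every 999 hours, takes 1 hour to repair.
    print(availability(999, 1))
```

With these assumed figures the module is available 999 out of every 1000 hours. Shortening the repair time raises availability just as lengthening the time to failure does.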

10
Q

What is the primary cause of failures in large-scale big data systems?

A

Human operators.
