Map-Reduce Flashcards
What is “big data”?
A volume of data so large that it cannot be analysed by typical methods.
The term can also refer to data with high velocity or high variety.
What are the phases of Map-Reduce?
Map phase: partition the problem space.
Group phase: make groups by some trait.
Reduce phase: combine contents of each group in some way.
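The three phases can be sketched with a word count in plain Python. This is a minimal single-machine illustration, not how any real Map-Reduce framework is implemented; the function names (map_phase, group_phase, reduce_phase, word_count) and the sample documents are assumptions made for this card.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: turn each input record into (key, value) pairs."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)


def group_phase(pairs):
    """Group: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each group into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Chain the three phases (hypothetical helper for this sketch)."""
    return reduce_phase(group_phase(map_phase(documents)))


# Hypothetical sample input.
docs = ["the cat sat", "the dog sat", "the end"]
counts = word_count(docs)
print(counts)
```

In a real cluster the map and reduce calls would run on many machines in parallel and the group step would move data across the network; here everything runs in one process to keep the idea visible.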
What is HBase?
Give one example of who uses HBase.
A NoSQL key-value database (a column-oriented store that runs on top of HDFS). Used by Facebook.
What is EMR?
Elastic MapReduce, Amazon's hosted Map-Reduce service. Runs on AWS IaaS.
What are the disadvantages of PaaS?
Developers get locked into a platform, and may not be able to use some familiar tools because they are not available on that service.
What is Map-Reduce not good for?
Communication-intensive parallel tasks, such as those usually written with MPI.
Applications that frequently need to edit or update (write to) existing datasets.
Origins of Map-Reduce?
Developed at Google. Inspired by the map and reduce operations of functional programming.
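The functional-programming roots can be seen in Python's built-in map and functools.reduce. This is only a small sketch of the two primitives the name comes from; the input list and the variable names (squares, total) are made up for illustration.

```python
from functools import reduce

# map: apply a function independently to every element.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

# reduce: fold the mapped values into a single result.
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares, total)
```

Because each map call depends only on its own element, the calls can be spread over many machines, which is the property Map-Reduce exploits.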
What does “data locality” mean?
Input data is stored on or near the local disks of the machines that process it, so moving the data to the computation uses as little network bandwidth as possible.
What is "module availability"? What is the formula?
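A measure of the fraction of time a module is available to deliver service.
Availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to repair.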
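A quick worked example of the formula above, as a minimal Python sketch. The function name availability and the figures (1000 hours between failures, 10 hours to repair) are hypothetical values chosen for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Return MTTF / (MTTF + MTTR) as a fraction between 0 and 1."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical module: fails every 1000 hours on average, takes 10 hours to repair.
a = availability(1000, 10)
print(f"{a:.4f}")  # prints 0.9901
```

Shortening the repair time raises availability just as lengthening the time to failure does: with the same hypothetical MTTF, cutting MTTR from 10 hours to 1 hour gives 1000 / 1001.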
What is the primary cause of failures in large big data systems?
Human operators.