Week 1 & 2 Flashcards
What are 2 things that can be done at large scale that cannot be done at small scale?
extract new insights
create new forms of value
What are the four V's of big data?
Volume
Velocity (how fast data is coming in and how fast you have to analyze and use it)
Variety (number and diversity of sources)
Veracity (can you trust the data/source/process? e.g., data cleaning, user-entry errors)
Why was MapReduce made?
to provide an abstraction that allows engineers to perform simple computations while hiding the details of parallelization, data distribution, load balancing, and fault tolerance
what does the mapper do?
maps input key/value pairs to another set of intermediate key/value pairs (one input pair may map to zero or many output pairs)
what does the reducer do?
reduces the set of intermediate values that share a key to a smaller set of values
what are the 3 phases of the reducer?
shuffle, sort, reduce
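The mapper, the shuffle/sort phases, and the reducer can be sketched in plain Python. This is a hypothetical word-count example, not the actual framework code; the function names and the single-process simulation are assumptions for illustration.

```python
from collections import defaultdict

def mapper(_key, line):
    # Emits one (word, 1) pair per word; may emit zero or many pairs per input.
    for word in line.split():
        yield (word, 1)

def shuffle_sort(pairs):
    # Shuffle: group all intermediate values that share a key.
    # Sort: order the grouped keys before handing them to the reducer.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Reduces the values sharing one key to a smaller set (here, one sum).
    return (key, sum(values))

lines = ["big data big scale", "data data"]
pairs = [p for i, line in enumerate(lines) for p in mapper(i, line)]
result = dict(reducer(k, vs) for k, vs in shuffle_sort(pairs))
print(result)  # {'big': 2, 'data': 3, 'scale': 1}
```

In the real framework, map and reduce calls run in parallel across machines; the sequential loop here only shows the data flow.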
what are some challenges that MapReduce solves/still has?
dividing the work into equal-size pieces, being limited by the slowest node, combining results when done
what does the programmer need to specify in map and reduce?
the types of the input and output key/value pairs
map function
reduce function
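The three things the programmer supplies can be sketched as a typed skeleton. The type aliases and function names below are hypothetical, chosen for a word-count job; the point is that the key/value types plus the two functions fully specify the job.

```python
from typing import Iterator, Tuple

# Hypothetical type choices for a word-count job:
InKey, InVal = int, str      # input: line number -> line text
MidKey, MidVal = str, int    # intermediate: word -> partial count
OutKey, OutVal = str, int    # output: word -> total count

def map_fn(key: InKey, value: InVal) -> Iterator[Tuple[MidKey, MidVal]]:
    # The programmer-supplied map function.
    for word in value.split():
        yield (word, 1)

def reduce_fn(key: MidKey, values: Iterator[MidVal]) -> Tuple[OutKey, OutVal]:
    # The programmer-supplied reduce function.
    return (key, sum(values))

print(reduce_fn("data", iter([1, 1, 1])))  # ('data', 3)
```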
what will this be reduced to?