Week 1&2 Flashcards

1
Q

What are two things that can be done at large scale that cannot be done at small scale?

A

extract new insights

create new forms of value

2
Q

What are the four V's of big data?

A

Volume

Velocity (how fast data is coming in and how fast it has to be analyzed and used)

Variety (number of sources)

Veracity (whether the data, source, and process can be trusted; covers data cleaning and user entry errors)

3
Q

Why was MapReduce created?

A

to provide an abstraction that lets engineers perform simple computations while hiding the details of parallelization, data distribution, load balancing, and fault tolerance

4
Q

What does the mapper do?

A

maps input key/value pairs to a set of intermediate key/value pairs (one input pair may map to zero or many output pairs)
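
A minimal sketch of a mapper, using word count as an assumed example (the function name `word_count_mapper` and the line-offset key are hypothetical, not taken from the cards). It shows one input pair producing many output pairs, or none at all:

```python
from typing import Iterator, Tuple


def word_count_mapper(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Map one (line_offset, line_text) pair to zero or more (word, 1) pairs."""
    # The input key (a line offset here) is ignored; only the text matters.
    for word in value.lower().split():
        yield (word, 1)


# One input pair can produce many intermediate pairs ...
print(list(word_count_mapper(0, "the cat saw the dog")))
# ... or none at all, for example on an empty line.
print(list(word_count_mapper(1, "")))
```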

5
Q

What does the reducer do?

A

reduces a set of intermediate values that share a key to a smaller set of values
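
A minimal sketch of a reducer, again assuming a word-count job (`word_count_reducer` is a hypothetical name). All values that share a key arrive together and are collapsed into a single value:

```python
from typing import Iterable, Iterator, Tuple


def word_count_reducer(key: str, values: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Reduce all counts that share a key to one total for that key."""
    yield (key, sum(values))


# Three intermediate values for the key "the" become one output value.
print(list(word_count_reducer("the", [1, 1, 1])))
```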

6
Q

What are the three phases of the reducer?

A

shuffle, sort, reduce
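
The three phases can be sketched in a single process as follows. This is only an illustration under assumptions: the helper names are hypothetical, the reduce step sums counts, and a real framework moves data between machines and may overlap the shuffle and sort phases rather than run them strictly one after another.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def shuffle(mapper_outputs: Iterable[Iterable[Pair]]) -> Dict[str, List[int]]:
    """Shuffle: gather every mapper's pairs so values with the same key meet."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            groups[key].append(value)
    return dict(groups)


def sort_groups(groups: Dict[str, List[int]]) -> List[Tuple[str, List[int]]]:
    """Sort: order the grouped pairs by key."""
    return sorted(groups.items(), key=lambda item: item[0])


def reduce_groups(sorted_groups: List[Tuple[str, List[int]]]) -> List[Pair]:
    """Reduce: collapse each key's values to one value (a sum in this sketch)."""
    return [(key, sum(values)) for key, values in sorted_groups]


# Output of two hypothetical mappers.
mapper_outputs = [
    [("the", 1), ("cat", 1)],
    [("the", 1), ("dog", 1)],
]
result = reduce_groups(sort_groups(shuffle(mapper_outputs)))
print(result)
```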

7
Q

What are some challenges that MapReduce solves or still has?

A

dividing the work into equal-size pieces, being limited by the slowest node, combining results when done
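
Two of these challenges can be illustrated with a rough sketch. The functions `split_evenly` and `job_time` are hypothetical helpers, and the worker times are made-up numbers for illustration only: the first splits work into near-equal pieces, the second shows why the whole job waits for the slowest node.

```python
from typing import List, Sequence


def split_evenly(items: Sequence[int], n_workers: int) -> List[List[int]]:
    """Split items into n_workers chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def job_time(worker_times: Sequence[float]) -> float:
    """Results can only be combined once every worker is done,
    so the job takes as long as its slowest worker."""
    return max(worker_times)


print(split_evenly(list(range(10)), 3))
# Made-up timings: two fast workers and one straggler.
print(job_time([2.0, 2.1, 9.5]))
```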

8
Q

What does the programmer need to specify for map and reduce?

A

the types of the input and output key/value pairs

map function

reduce function
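
A sketch of what this looks like from the programmer's side, assuming a toy in-process driver (`run_job` is hypothetical and stands in for the framework; the station readings are made-up sample data). The programmer supplies only the key/value types, a map function, and a reduce function:

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# 1. The types of the input and output key/value pairs.
InPair = Tuple[str, str]    # (station_id, "year,temperature")
OutPair = Tuple[str, int]   # (year, temperature)
MapFn = Callable[[str, str], Iterator[OutPair]]
ReduceFn = Callable[[str, List[int]], Iterator[OutPair]]


# 2. The map function.
def max_temp_map(station: str, reading: str) -> Iterator[OutPair]:
    year, temp = reading.split(",")
    yield (year, int(temp))


# 3. The reduce function.
def max_temp_reduce(year: str, temps: List[int]) -> Iterator[OutPair]:
    yield (year, max(temps))


def run_job(records: Iterable[InPair], map_fn: MapFn, reduce_fn: ReduceFn) -> List[OutPair]:
    """Toy stand-in for the framework: map, group by key, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    results: List[OutPair] = []
    for out_key in sorted(grouped):
        results.extend(reduce_fn(out_key, grouped[out_key]))
    return results


records = [("s1", "1950,22"), ("s2", "1950,31"), ("s1", "1951,18")]
print(run_job(records, max_temp_map, max_temp_reduce))
```

Everything inside `run_job` (grouping, ordering, and in a real system distribution and fault tolerance) is what the framework hides from the programmer.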

9
Q

What will this be reduced to?

A