20. Massive Distribution for Performance Flashcards

The MapReduce approach: one way distributed systems can tackle big problems.

1
Q

What are the aims of a topologically-regular, closed interconnect?

What does this make more realistic?

A

Minimise distance
Maximise bandwidth
Maximise homogeneity, so that tasks can be distributed more easily
Maximise security

This makes the following more realistic:
Minimal latency
Maximal throughput
Low operational/maintenance cost

2
Q

What are the properties of a topologically-regular closed interconnect?

A

The interconnect is more reliable

The force of the axioms is greatly reduced.

The transparency goals are easier to achieve.

3
Q

What style of problems can computational clusters/data centres with topologically-regular closed interconnects be used to solve?

A

Can be used for both process and data-intensive problems

4
Q

How do we allow two nodes to communicate?

A

By wrapping layers of protocols around them.

Then DME, MME, …

5
Q

How do we generalize message exchange?

A

RPC
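One concrete realisation of this idea is Python's built-in XML-RPC modules; the following is a minimal sketch, where the add function, the single-request server, and the port choice are illustrative assumptions:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# RPC generalises message exchange: the caller invokes what looks like
# a local function, and the runtime marshals the arguments into a
# request message, sends it, and unmarshals the reply.

def add(a, b):
    return a + b

# Port 0 asks the OS for any free port.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]

# Serve exactly one request in a background thread.
t = threading.Thread(target=server.handle_request)
t.start()

proxy = ServerProxy(f"http://localhost:{port}")
result = proxy.add(2, 3)  # looks like a local call; really a network round trip
t.join()
server.server_close()
```

The caller never sees the messages, only the function-call interface.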

6
Q

Why do we use data centres for DS using big data?

A

We cannot use the open internet; we need a more controlled environment. We use data centres because their in-house architecture reduces the force of the axioms.

7
Q

Why do data centres use shared-nothing?

A

It is massively distributed and complicated, so we want to retain some simplicity: keep tasks independent, so that processes/partitions share nothing.

8
Q

What is important during M-W split?

A

A clean split, so that each partition serves as an individual unit of parallel processing. The key decision is how many partitions to generate.

9
Q

What is important during M-W spawn?

A

Spawn parallel processes to work on each partition. Need to decide how many parallel processes to generate.

10
Q

What is important during M-W process and merge?

A

The final result must be the same as if the work had been done without parallelisation. The challenge is how to pick up the pieces and put them back together in the correct order.
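The master-worker cycle (split, spawn, process, merge) can be sketched on one machine, with a thread pool standing in for the workers; the partition count and the summing task are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n):
    # Clean split: strided partitions, each an independent unit of work.
    return [data[i::n] for i in range(n)]

def work(partition):
    return sum(partition)               # per-partition processing

def master(data, n_partitions=4):
    parts = split(data, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:  # spawn
        partials = list(pool.map(work, parts))                  # process
    return sum(partials)                # merge: must equal the serial answer

total = master(list(range(10)))         # same result as sum(range(10))
```

Because summation is associative and commutative, the merge order does not matter here; for order-sensitive results the merge step must reassemble partitions in their original order.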

11
Q

What is a race condition?

A

When multiple processes try to read from/write to the same memory location, so that the execution order matters and the result may vary with it.
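A deterministic simulation of the classic lost-update race: two "processes" each perform a read-modify-write on a shared counter, interleaved so that both read before either writes.

```python
counter = 0

a_read = counter        # process A reads 0
b_read = counter        # process B reads 0 (A has not written yet)
counter = a_read + 1    # A writes 1
counter = b_read + 1    # B writes 1, overwriting A's update

# Two increments happened, but counter == 1: B's write clobbered A's.
# With real concurrency the interleaving is not controlled, so the
# outcome becomes non-deterministic.
```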

12
Q

Describe task independence?

A

When two processes have no shared state, they can be executed in parallel, in any order.
The remaining problem becomes the merge.

13
Q

Describe the map reduce model

A

Make the individual tasks as simple as possible by treating them as pure functions.

Make them side-effect free, with no shared state (or copy any shared data structures).

Simply put: map takes a unary function and a collection and returns a collection, applying the function to each element of the collection independently. Map is therefore parallelisable.
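Python's built-in map captures exactly this contract; square is an illustrative pure function.

```python
# map applies a unary, side-effect-free function to each element
# independently: no element's result depends on another's, so the
# calls could run in any order, or in parallel.
def square(x):
    return x * x

data = [1, 2, 3, 4]
out = list(map(square, data))  # [1, 4, 9, 16]
```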

14
Q

What is a barrier?

A

Used to synchronise the map phase with the reduce phase.

It knows how many processes it should wait for, and holds everything until all of them have arrived.

It presupposes balanced workloads: one slow straggler holds everyone up.
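A minimal sketch with Python threads standing in for the mapper processes; the squaring task and the party count are illustrative assumptions:

```python
import threading

# A barrier between the map and reduce phases: the reducer must not
# start until every mapper has deposited its partial result.
results = []
lock = threading.Lock()
barrier = threading.Barrier(4)          # 3 mappers + 1 reducer

def mapper(value):
    with lock:
        results.append(value * value)   # deposit partial result
    barrier.wait()                      # signal "done", wait for the rest

threads = [threading.Thread(target=mapper, args=(v,)) for v in (1, 2, 3)]
for t in threads:
    t.start()

barrier.wait()          # reducer blocks here until all 3 mappers arrive
total = sum(results)    # safe: every partial result is in place
for t in threads:
    t.join()
```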

15
Q

What do the map and reduce functions take and produce? i.e. word counting

A

Both take key-value pairs.

Map: (in key, in value) -> (out key, intermediate value list). For word counting: in key = line number, in value = the line's text; output = (word, number of instances in that line) pairs.

Reduce: (out key, intermediate value list), i.e. the output of map, -> out value list. For word counting: each word with its per-line counts in, its total count across the whole input out.
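A single-machine sketch of this word-counting pipeline, with the shuffle step between map and reduce spelled out explicitly (the function and variable names are illustrative):

```python
from itertools import groupby
from operator import itemgetter

# Map: (line number, line text) -> list of (word, 1) pairs.
def map_fn(key, value):
    return [(word, 1) for word in value.split()]

# Reduce: (word, list of partial counts) -> (word, total count).
def reduce_fn(key, values):
    return key, sum(values)

lines = {1: "the cat sat", 2: "the cat ran"}

# Map phase: each call is independent, hence parallelisable.
intermediate = []
for k, v in lines.items():
    intermediate.extend(map_fn(k, v))

# Shuffle: group the intermediate pairs by their out key.
intermediate.sort(key=itemgetter(0))
grouped = {k: [v for _, v in g]
           for k, g in groupby(intermediate, key=itemgetter(0))}

# Reduce phase: one call per distinct word.
counts = dict(reduce_fn(k, vs) for k, vs in grouped.items())
```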
