4A_Replication & Partitioning Flashcards

1
Q

What are the reasons for replication?

A

Performance:
- Scale in size; more capacity
- Scale geographically; Closer to client = lower latency
Redundancy:
- Have other copies when a component fails

2
Q

Redundancy vs Availability?

A

Availability also means the ability to make progress: clients can not only read data but also perform writes

3
Q

Definition of CAP?

A

Consistency, Availability and Partition Tolerance

4
Q

What are the different consistency models?

A

Linearisability
Sequential consistency
Causal Consistency
Eventual Consistency

5
Q

What is linearisability?

A

All clients see writes take effect atomically and in the same order, consistent with real time

6
Q

What is sequential consistency?

A

All clients see writes in the same order (each client's own operations stay in program order; the order need not match real time)

7
Q

What is causal consistency?

A

Clients see causally related writes in same order, unrelated writes possibly in different order.

8
Q

What is eventual consistency?

A

Eventually all replicas converge.

Clients may see differences meanwhile.

9
Q

Arguments against linearisability?

A

Global lock on replicas

Not scalable

10
Q

Arguments regarding sequential consistency?

A

Need total order on writes

Hard, but some capacity is possible. Use mutual exclusion (mutex) solutions.

11
Q

Arguments regarding causal consistency?

A

Causal relations via vector clocks.

Hard, need per-process info
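
As an illustration of the per-process bookkeeping, here is a minimal vector clock sketch. The function names (`tick`, `merge`, `happened_before`, `concurrent`) and the dict-based clock are assumptions made for this example, not part of the course material.

```python
from typing import Dict

VectorClock = Dict[str, int]  # process name -> number of events seen from it


def tick(clock: VectorClock, process: str) -> VectorClock:
    """Return a copy of the clock with the process's own counter incremented."""
    updated = dict(clock)
    updated[process] = updated.get(process, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, process: str) -> VectorClock:
    """On receive: element-wise maximum, then count the receive as a local event."""
    names = set(local) | set(received)
    merged = {p: max(local.get(p, 0), received.get(p, 0)) for p in names}
    return tick(merged, process)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a causally precedes b: a <= b in every entry and a < b in at least one."""
    names = set(a) | set(b)
    no_entry_larger = all(a.get(p, 0) <= b.get(p, 0) for p in names)
    some_entry_smaller = any(a.get(p, 0) < b.get(p, 0) for p in names)
    return no_entry_larger and some_entry_smaller


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither clock precedes the other (identical clocks also land here)."""
    return not happened_before(a, b) and not happened_before(b, a)


w1 = tick({}, "p1")          # p1 writes
w2 = merge({}, w1, "p2")     # p2 receives w1, so w2 causally depends on w1
w3 = tick({}, "p3")          # p3 writes independently
```

Here `w1` must be seen before `w2` everywhere, while `w3` is unrelated to both and may be ordered differently per client. Every process has to carry one counter per process, which is the per-process cost the card refers to.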

12
Q

Arguments against consistency in general?

A

Does not scale

Slow and expensive

13
Q

Replication Protocols?

A

Primary / Backup
Active Replication
Quorum Based

14
Q

What is primary / backup replication protocol?

A
Primary is implicit sequencer.
Variations:
- Send write operations to backups
- Send result of write operation to backups
- Send invalidation to backups
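
The three variations can be sketched as follows. This is a toy in-memory model under stated assumptions: the class names `Primary` and `Backup`, the mode strings, and the integer key-value store are all invented for illustration.

```python
class Backup:
    def __init__(self):
        self.data = {}
        self.stale = set()  # keys whose local copy is known to be out of date

    def apply_operation(self, key, fn):
        # Variation 1: the backup re-executes the write operation itself.
        self.data[key] = fn(self.data.get(key, 0))

    def apply_result(self, key, value):
        # Variation 2: the backup only stores the value computed by the primary.
        self.data[key] = value
        self.stale.discard(key)

    def invalidate(self, key):
        # Variation 3: the backup only learns that its copy is stale.
        self.stale.add(key)


class Primary:
    MODES = ("operation", "result", "invalidate")

    def __init__(self, backups, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.data = {}
        self.backups = backups
        self.mode = mode

    def write(self, key, fn):
        # All writes pass through the primary, so it orders them implicitly.
        self.data[key] = fn(self.data.get(key, 0))
        for backup in self.backups:
            if self.mode == "operation":
                backup.apply_operation(key, fn)
            elif self.mode == "result":
                backup.apply_result(key, self.data[key])
            else:
                backup.invalidate(key)
```

Sending the operation is compact but requires deterministic operations; sending the result avoids re-execution; sending an invalidation is cheapest but leaves the backup without a usable copy until it fetches one.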
15
Q

What is active replication protocol?

A

All replicas perform the write operation

Need explicit sequencer / total ordering
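
A minimal sketch of why the sequencer matters, assuming a hypothetical `Sequencer` that stamps each write with a sequence number and a `Replica` that buffers writes until their turn comes. Both classes are illustrative only.

```python
import itertools


class Sequencer:
    """Hands out consecutive sequence numbers, which fixes one total order."""

    def __init__(self):
        self._counter = itertools.count()

    def stamp(self, operation):
        return (next(self._counter), operation)


class Replica:
    def __init__(self):
        self.value = 0
        self.next_seq = 0
        self.pending = {}  # sequence number -> operation that arrived too early

    def deliver(self, stamped):
        seq, operation = stamped
        self.pending[seq] = operation
        # Apply buffered operations strictly in sequence-number order.
        while self.next_seq in self.pending:
            self.value = self.pending.pop(self.next_seq)(self.value)
            self.next_seq += 1


sequencer = Sequencer()
add_two = sequencer.stamp(lambda v: v + 2)
times_ten = sequencer.stamp(lambda v: v * 10)

r1, r2 = Replica(), Replica()
r1.deliver(add_two)
r1.deliver(times_ten)
r2.deliver(times_ten)  # arrives out of order, so it is buffered
r2.deliver(add_two)
```

Both replicas execute the same two writes in the same order and end in the same state, even though the network delivered them differently to `r2`.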

16
Q

What is caching?

A

Temporary replication of data

17
Q

What are scaling techniques?

A
Bigger Machines
Virtualisation
Asynchronous communication
Replication and caching
Partitioning
18
Q

What is partitioning?

A

Split work/responsibilities over a set of servers

Common method is hierarchical

19
Q

What is the second common method of partitioning?

A

Formulaic. Data is divided into parts according to some formula. The parts are assigned to servers via another formula based on the server ID, and a meta server remembers which parts are where.
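
A small sketch of the two formulas and the meta server. The constants `NUM_PARTS` and `SERVERS`, the choice of SHA-256, and the modulo formulas are assumptions chosen for illustration; the card does not prescribe any particular formula.

```python
import hashlib

NUM_PARTS = 8                                   # hypothetical number of parts
SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical server IDs


def part_of(key: str) -> int:
    """First formula: map a data key to a part using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTS


def server_of(part: int) -> str:
    """Second formula: map a part to a server via the server's index."""
    return SERVERS[part % len(SERVERS)]


# The "meta server" is modelled as a lookup table of which part lives where.
META = {part: server_of(part) for part in range(NUM_PARTS)}


def locate(key: str) -> str:
    return META[part_of(key)]
```

A client calls `locate("alice")` to find the server responsible for that key. Keeping the table in `META` means parts can later be moved by editing the table rather than changing the formula.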

20
Q

What falls under service partitioning?

A

Split different things:
Obvious: Web, DB, DNS
Others: signup, login, search, user accounts, inventory

21
Q

What falls under data partitioning?

A

Splitting similar things:

Split by customer ID, last name, location. This is often hierarchical, and splits are often equal in size (hashing)

22
Q

What is Hadoop?

A

Partition Data over many disks

Process data in parallel

23
Q

What is MapReduce data flow?

A

First the data is split. The map function is then run on each split in parallel. The results are merged, a reduce function is applied, and the final results are returned
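
The flow can be sketched with a word count, the usual introductory example. This is plain Python with a thread pool standing in for the cluster, not the Hadoop API; the function names are assumptions made for this sketch.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk):
    """Map: turn one split of the input into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Merge: group all values emitted for the same key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: collapse the grouped values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # The map step runs on each split independently, here via a thread pool.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_words, chunks))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["a b a", "b c"])
```

Because each split is mapped without looking at any other split, the map step parallelises trivially; only the merge step needs data from all of them.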

24
Q

Why is consistent hashing used?

A

A method to divide data over servers such that adding or removing a server moves only a small share of the data
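
A minimal hash ring sketch, assuming SHA-256 for ring positions and a hypothetical `vnodes` parameter that places each server at several points on the ring. The class and function names are invented for this example.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a stable position on the ring."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server appears at `vnodes` points to even out the load.
        self._points = sorted(
            (ring_position(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._points]

    def server_for(self, key: str) -> str:
        # Walk clockwise to the first server point after the key, wrapping around.
        index = bisect.bisect_right(self._positions, ring_position(key))
        return self._points[index % len(self._points)][1]


full = HashRing(["s0", "s1", "s2"])
reduced = HashRing(["s0", "s1"])  # the same ring after "s2" has gone down
keys = [f"key{i}" for i in range(500)]
unaffected = [k for k in keys if full.server_for(k) != "s2"]
```

Every key in `unaffected` maps to the same server in both rings; only the keys that lived on `s2` have to move.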

25
Q

What are the problems of dividing data over servers based on hashing?

A

With plain hashing (e.g. hash modulo the number of servers), if one server goes down, almost all data has to move
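
To make the problem concrete, the sketch below measures how many keys change server when placement is `hash mod N` and N drops from 4 to 3. The modulo scheme and the helper names are assumptions for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def server_index(key: str, num_servers: int) -> int:
    """Naive placement: hash of the key modulo the number of servers."""
    return stable_hash(key) % num_servers


def moved_fraction(keys, before: int, after: int) -> float:
    """Share of keys whose server index changes when the server count changes."""
    moved = sum(1 for k in keys if server_index(k, before) != server_index(k, after))
    return moved / len(keys)


keys = [f"key{i}" for i in range(1000)]
fraction = moved_fraction(keys, before=4, after=3)
```

For uniformly distributed hashes, a key keeps its index only when `h % 4 == h % 3`, which holds for 3 of every 12 hash values, so roughly three quarters of the keys are expected to move even though only one of the four servers failed.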