4A_Replication & Partitioning Flashcards by Deleted Deleted

What are the reasons for replication?

Performance:
- Scale in size; more capacity
- Scale geographically; Closer to client = lower latency
Redundancy:
- Have other copies when a component fails

Redundancy vs Availability?

Redundancy means having spare copies. Availability also means the ability to make progress: clients must be able not only to read data but also to perform writes.

Definition of CAP?

Consistency, Availability and Partition Tolerance. When a network partition occurs, a system has to give up either consistency or availability.

What are the different consistency models?

Linearisability
Sequential consistency
Causal Consistency
Eventual Consistency

What is linearisability?

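Each write appears to take effect atomically at a single instant, and all clients see the writes in the same order, consistent with real time.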

What is sequential consistency?

All clients see the writes in the same order. That order respects each client's own program order but need not match real time.

What is causal consistency?

All clients see causally related writes in the same order; causally unrelated (concurrent) writes may be seen in different orders.

What is eventual consistency?

Eventually all replicas converge.

Clients may see differences meanwhile.

Arguments regarding linearisability?

Effectively requires a global lock across all replicas.

Not scalable.

Arguments regarding sequential consistency?

Needs a total order on writes.

Hard, but some scaling is possible, for example by using mutual-exclusion (mutex) solutions.

Arguments regarding causal consistency?

Causal relations are tracked via vector clocks.

Hard, because it needs per-process information.
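
The vector-clock idea can be sketched as follows. This is a minimal illustration, not a production design: the names `tick`, `merge`, `happened_before` and `concurrent` are hypothetical helpers, and process IDs are assumed to be plain strings.

```python
from typing import Dict

# One counter per process: this is the "per-process info" the card mentions.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, process: str) -> VectorClock:
    """Return a copy of the clock after a local event (e.g. a write) at `process`."""
    updated = dict(clock)
    updated[process] = updated.get(process, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, process: str) -> VectorClock:
    """On receiving a message: element-wise maximum, then count the receive event."""
    processes = local.keys() | received.keys()
    merged = {p: max(local.get(p, 0), received.get(p, 0)) for p in processes}
    return tick(merged, process)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a causally precedes b: a <= b in every entry and a < b in at least one."""
    processes = a.keys() | b.keys()
    no_entry_larger = all(a.get(p, 0) <= b.get(p, 0) for p in processes)
    some_entry_smaller = any(a.get(p, 0) < b.get(p, 0) for p in processes)
    return no_entry_larger and some_entry_smaller


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if the clocks differ and neither causally precedes the other."""
    processes = a.keys() | b.keys()
    differ = any(a.get(p, 0) != b.get(p, 0) for p in processes)
    return differ and not happened_before(a, b) and not happened_before(b, a)


w1 = tick({}, "A")        # A writes
w2 = merge({}, w1, "B")   # B sees A's write, then writes
w3 = tick({}, "C")        # C writes independently
```

Here `w1` happened before `w2`, so every client must see them in that order, while `w2` and `w3` are concurrent and may be seen in either order.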

Arguments against consistency in general?

Does not scale

Slow and expensive

Replication Protocols?

Primary / Backup
Active Replication
Quorum Based

What is primary / backup replication protocol?

The primary acts as an implicit sequencer: all writes go through it, so it fixes their order.
Variations:
- Send write operations to backups
- Send result of write operation to backups
- Send invalidation to backups
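
The three variations can be compared in a small in-memory sketch. The `Primary` and `Backup` classes and the `mode` strings are hypothetical names chosen for illustration; the "write" is assumed to be a simple increment on a single integer.

```python
class Backup:
    def __init__(self):
        self.value = 0
        self.valid = True  # False means the local copy is known to be stale

    def apply_operation(self, delta):
        # Variation 1: the backup re-executes the write itself.
        self.value += delta
        self.valid = True

    def apply_result(self, value):
        # Variation 2: the backup installs the result computed by the primary.
        self.value = value
        self.valid = True

    def invalidate(self):
        # Variation 3: the backup only learns that its copy is out of date.
        self.valid = False


class Primary:
    def __init__(self, backups, mode):
        self.value = 0
        self.backups = backups
        self.mode = mode  # "operation", "result" or "invalidate"

    def write(self, delta):
        # The primary applies writes one by one, so it implicitly orders them.
        self.value += delta
        for backup in self.backups:
            if self.mode == "operation":
                backup.apply_operation(delta)
            elif self.mode == "result":
                backup.apply_result(self.value)
            else:
                backup.invalidate()


def run(mode):
    backups = [Backup(), Backup()]
    primary = Primary(backups, mode)
    primary.write(5)
    primary.write(3)
    return primary, backups
```

With `"operation"` or `"result"` the backups end up holding the same value as the primary; with `"invalidate"` they keep their old value but are marked stale and would have to fetch a fresh copy before serving reads.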

What is active replication protocol?

All replicas perform the write operation

Needs an explicit sequencer or a total ordering of writes, so that every replica applies them in the same order.
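
Why the ordering matters can be shown with a small sketch. It assumes a single central sequencer that stamps each write with a sequence number; `Sequencer` and `Replica` are hypothetical names, and message delivery is simulated by direct method calls.

```python
import itertools


class Sequencer:
    """Hands out consecutive sequence numbers, giving all writes a total order."""

    def __init__(self):
        self._counter = itertools.count(1)

    def stamp(self, operation):
        return next(self._counter), operation


class Replica:
    """Performs every write itself, strictly in sequence-number order."""

    def __init__(self):
        self.value = 0
        self.next_seq = 1
        self.pending = {}  # writes that arrived before their turn

    def deliver(self, stamped):
        seq, operation = stamped
        self.pending[seq] = operation
        # Apply as many buffered writes as are now in order.
        while self.next_seq in self.pending:
            self.value = self.pending.pop(self.next_seq)(self.value)
            self.next_seq += 1


sequencer = Sequencer()
add_ten = sequencer.stamp(lambda v: v + 10)  # sequence number 1
double = sequencer.stamp(lambda v: v * 2)    # sequence number 2

r1, r2 = Replica(), Replica()
for message in (add_ten, double):   # r1 receives the writes in order
    r1.deliver(message)
for message in (double, add_ten):   # r2 receives them out of order
    r2.deliver(message)
```

Both replicas end with the value 20. Without the sequence numbers, `r2` would have doubled first and ended with 10, so the replicas would diverge.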

What is caching?

Temporary replication: a copy of the data is kept for a limited time.

What are scaling techniques?

Bigger Machines
Virtualisation
Asynchronous communication
Replication and caching
Partitioning

What is partitioning?

Split work/responsibilities over a set of servers

A common method is hierarchical partitioning (DNS is a well-known example).

What is the second common partitioning method?

Formulaic. Data is divided into parts according to some formula. The parts are assigned to servers via another formula that uses the server ID, and a meta server remembers which parts are where.
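
A minimal sketch of the two formulas and the meta server's table follows. The formulas themselves are assumptions made for illustration (a hash modulo the number of parts, and the part number modulo the number of servers), as are the names `part_of`, `server_of`, `placement` and `lookup`.

```python
import hashlib

NUM_PARTS = 8
SERVERS = ["server-0", "server-1", "server-2"]  # list index = server ID


def part_of(key: str) -> int:
    """First formula: map a data key to a part number."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTS


def server_of(part: int) -> str:
    """Second formula: map a part number to a server via the server ID."""
    return SERVERS[part % len(SERVERS)]


# The meta server's table: which part lives on which server.
placement = {part: server_of(part) for part in range(NUM_PARTS)}


def lookup(key: str) -> str:
    """Ask the meta server where the data for `key` is stored."""
    return placement[part_of(key)]
```

A client would call `lookup("customer-42")` to find the responsible server. Keeping the table separate from the formulas means individual parts could later be moved by editing `placement` alone.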

What falls under service partitioning?

Split different things:
Obvious: Web, DB, DNS
Others: signup, login, search, user accounts, inventory

What falls under data partitioning?

Splitting similar things:

Split by customer ID, last name, or location. This is often done hierarchically, and the splits are often made equal in size (for example, by hashing).

What is Hadoop?

A framework that partitions data over many disks (HDFS).

It processes that data in parallel (MapReduce).

What is MapReduce data flow?

First the input data is split. A map function is then run on the splits in parallel. The intermediate results are merged (grouped by key), a reduce function is applied to each group, and the final results are returned.
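
The data flow can be illustrated with a word count on a single machine. This is only a sketch of the split, map, merge and reduce steps, not Hadoop's API: the function names are hypothetical, and a thread pool stands in for the cluster that would run the map tasks in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split(lines, n_chunks):
    """Split the input into n_chunks roughly equal pieces."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def merge(mapped_chunks):
    """Merge: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, n_chunks=2):
    chunks = split(lines, n_chunks)
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))  # map tasks run concurrently
    grouped = merge(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


counts = word_count(["a b a", "b c"])
```

For this input `counts` is `{"a": 2, "b": 2, "c": 1}`.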

Why is consistent hashing used?

It is a method to divide data over servers such that adding or removing a server moves only a small share of the data.

What are the problems of dividing data over servers based on hashing?

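With simple hashing (for example, hash(key) mod N), the mapping depends on the number of servers N. If one server goes down, almost all data has to move.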
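
The difference can be measured with a small experiment. This is a sketch under assumptions: `modulo_owner` stands for simple hash-mod-N placement, `Ring` is a hypothetical, minimal consistent-hashing ring with 100 virtual nodes per server, and the key and server names are made up.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


def modulo_owner(key, servers):
    """Simple hashing: the owner depends on the number of servers."""
    return servers[h(key) % len(servers)]


class Ring:
    """Consistent hashing: each server owns the arcs that end at its points."""

    def __init__(self, servers, vnodes=100):
        self._points = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._hashes = [point for point, _ in self._points]

    def owner(self, key):
        # The first point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._hashes, h(key)) % len(self._hashes)
        return self._points[index][1]


def moved_fraction(owner_before, owner_after, keys):
    moved = sum(1 for key in keys if owner_before(key) != owner_after(key))
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
before = ["s0", "s1", "s2", "s3"]
after = ["s0", "s1", "s2"]  # s3 has gone down

mod_moved = moved_fraction(
    lambda k: modulo_owner(k, before), lambda k: modulo_owner(k, after), keys
)
ring_before, ring_after = Ring(before), Ring(after)
ring_moved = moved_fraction(ring_before.owner, ring_after.owner, keys)
```

With four servers shrinking to three, `mod_moved` should come out at roughly three quarters of the keys, while `ring_moved` should be near one quarter, because on the ring only the keys that lived on the failed server are reassigned.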