Concepts Flashcards

1
Q

Scaling up

A

When the need for parallelism arises, a single machine is upgraded to a more powerful one, with more CPU cores, more memory, and more hard disks

2
Q

Scaling out

A

When the need for parallelism arises, the task is divided among a large number of less powerful machines with (relatively) slow CPUs, moderate amounts of memory, and a moderate number of hard disks

3
Q

Pros and Cons of Scaling Up vs Scaling Out

A
  • Scaling up is more expensive than scaling out.
    (Big high-end systems carry much higher prices for a given amount of CPU power, memory, and hard disk space)
  • Scaling out is more challenging for fault tolerance.
    (A large number of loosely coupled systems means more components, and thus more failures in hardware and in networking. Solution: software fault tolerance)
  • Scaling out is more challenging for software development.
    (Due to the larger number of components, the larger number of failures both in the nodes and in the networking connecting them, and increased latencies. Solution: scalable cloud platforms)
4
Q

Cloud computing

A

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

5
Q

Essential characteristics (5)

A
On-demand self-service,
Broad network access, 
Resource pooling, 
Rapid elasticity, 
and Measured Service
6
Q

Service models (3)

A

Cloud Software as a Service (SaaS),
Cloud Platform as a Service (PaaS),
and Cloud Infrastructure as a Service (IaaS)

7
Q

Deployment models (4)

A

Private cloud,
Community cloud,
Public cloud,
and Hybrid cloud

8
Q

Apache Spark

A
  • Distributed programming framework for Big Data processing
  • Based on functional programming

9
Q

Resilient Distributed Datasets (RDDs)

A
  • Resilient Distributed Datasets (RDDs) are Scala
    collection-like entities that are distributed over several
    computers
  • The framework stores the RDDs in partitions, where a
    separate thread can process each partition
  • To implement fault tolerance, the RDD partitions record lineage: A recipe to recompute the RDD partition based on the parent RDDs and the operation used to generate the partition
  • If a server goes down, the lineage information can be used to recompute its partitions on another server
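
A toy sketch (not Spark's actual implementation) of the lineage idea: each partition stores a recipe — a parent partition plus the operation that produced it — so a lost partition can be recomputed instead of replicated. All class and variable names here are hypothetical.

```python
# Toy sketch of lineage-based recovery: a partition records its parent
# and the operation used to generate it, so a lost cached copy can be
# recomputed from the parent instead of being restored from a replica.

class Partition:
    def __init__(self, parent=None, op=None, data=None):
        self.parent = parent      # parent Partition, or None for source data
        self.op = op              # function applied to the parent's records
        self.data = data          # cached records; may be lost on failure

    def compute(self):
        """Return the records, recomputing from lineage if the cache is gone."""
        if self.data is None:                      # simulated server loss
            self.data = [self.op(x) for x in self.parent.compute()]
        return self.data

source = Partition(data=[1, 2, 3, 4])
mapped = Partition(parent=source, op=lambda x: x * 10)

mapped.compute()              # materialize the derived partition
mapped.data = None            # simulate losing the partition on a failed server
recovered = mapped.compute()  # rebuilt from lineage: [10, 20, 30, 40]
```

In real Spark the lineage graph spans many partitions and operations, and recomputation is scheduled on whichever server has the parent data available.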
10
Q

RAID Redundant Array of Independent Disks

A

The most commonly used fault tolerance mechanism at small scale

11
Q

RAID 0

A

Striping

  • Stripes data over a large number of disks to improve sequential and random read and write performance
  • Very bad choice for fault tolerance, only for scratch data that can be regenerated easily
12
Q

RAID 1

A

Mirroring

  • Each data block is written to two hard disks: the first holds the master copy, and a mirror copy is written to the second disk after the master
  • Reads are served by either one of the two hard disks
  • Loses half of the storage space, and write bandwidth/IOPS are half of what the two drives could deliver independently
  • Data is available if either hard disk is available
  • Easy repair: Replace the failed hard disk and copy all of the data over to the replacement drive
13
Q

RAID 5

A

Block-level striping with distributed parity
- Data is stored on n + 1 hard disks, where a parity checksum block is stored for each n blocks of data. The parity blocks are distributed over all disks. Tolerates one hard disk failure
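
The parity block is simply the bitwise XOR of the n data blocks in a stripe, which is why any single lost block can be rebuilt from the survivors. A minimal sketch, assuming byte-sized blocks:

```python
# Minimal sketch of RAID 5 style parity: the parity block is the XOR of
# the n data blocks, so any one lost block can be reconstructed by
# XOR-ing all surviving blocks (data + parity) together.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # n = 3 data blocks in one stripe
parity = xor_blocks(data)            # stored on a fourth disk

# The disk holding data[1] fails; rebuild its block from the rest + parity.
rebuilt = xor_blocks([data[0], data[2], parity])   # equals b"BBBB"
```

This works because XOR-ing a block into the parity twice cancels it out, leaving exactly the missing block.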

14
Q

RAID 5 - Properties (reads, writes, storage)

A
  • Sequential reads and writes are striped over all disks
  • Loses only one hard disk worth of storage to parity
  • Sequential read and write speeds are good, random read IOPS are good
  • Random small write requires reading one data block +
    parity block, and writing back modified data block +
    modified parity block, requiring 4 x IOPS in the worst case (battery backed up caches try to minimize this overhead.)
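
The 4x IOPS figure follows from the parity update identity: the new parity can be computed from the old parity, old data, and new data alone, without touching the other n - 1 data disks. A sketch with hypothetical names:

```python
# Sketch of the RAID 5 small-write path: updating one data block costs
# 2 reads (old data, old parity) + 2 writes (new data, new parity),
# because  new_parity = old_parity XOR old_data XOR new_data.

def update_parity(old_parity, old_data, new_data):
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*blocks))

new_block = b"ZZZZ"
parity = update_parity(parity, blocks[1], new_block)   # 2 reads + 2 writes
blocks[1] = new_block

# The updated parity still reconstructs any single lost block:
rebuilt = bytes(a ^ c ^ p for a, c, p in zip(blocks[0], blocks[2], parity))
```

XOR-ing the old data out of the parity and the new data in gives the same result as recomputing the parity over the whole stripe.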
15
Q

RAID 5 - Properties (rebuild)

A
  • Rebuilding a disk requires reading all of the contents of the other n disks to regenerate the missing disk using parity - this is a slow process
  • Slow during rebuild: When one disk has failed, each
    missing data block read requires reading n blocks of data when rebuild is in progress
  • Vulnerability to a second hard disk failure during array
    rebuild and long rebuild times make most vendors instead recommend RAID 6 (see two slides ahead) with large capacity SATA drives
16
Q

RAID 6

A

Block-level striping with double distributed parity
- Data is stored on n + 2 hard disks, where two parity checksum blocks are stored for each n blocks of data. The parity blocks are distributed over all disks. Tolerates two hard disk failures

17
Q

RAID 6 - Properties (reads, writes, storage)

A
  • Sequential reads and writes are striped over all disks
  • Loses two hard disks' worth of storage to parity
  • Sequential read and write speeds are good, random read IOPS are good
  • Random small write requires reading one data block + two parity blocks, and writing back modified data block + two modified parity blocks, requiring 6 x IOPS in the worst case (battery backed up caches try to minimize this overhead)
  • Slow during rebuild: When 1-2 disks have failed, each
    missing data block read requires reading n blocks of data when rebuild is in progress
18
Q

RAID 10

A

Stripe of Mirrors
- Data is stored on 2n hard disks: each mirror pair consists of two hard disks, and the pairs are striped together into a single logical device

19
Q

RAID 10 - Properties (reads, writes, storage)

A
  • Tolerates only one hard disk failure in the worst case (n - one disk from each mirror pair - in the best case)
  • Loses half of the storage space
  • Sequential reads and writes are striped over all disks
  • Sequential read and write speeds are good, random read and write IOPS are good
  • Each data block is written to two hard disks. Random small writes require 2 x IOPS in the worst case
  • Quicker during rebuild: Missing data is just copied over from the other disk of the mirror. Use of “hot spares” recommended to speed up recovery start
20
Q

RAID Usage Scenarios

A
  • RAID 0: Temporary space for data that can be discarded
  • RAID 1: Small scale installations, server boot disks, etc.
  • RAID 5: Some fault tolerance with minimum storage
    overhead, small random writes can be problematic, RAID 5 Write Hole problem without special hardware (battery backed up cache / UPS)
  • RAID 6: Fairly good fault tolerance with reasonable storage overhead, small random writes can be even more problematic than in RAID 5, RAID 6 Write Hole problem without special hardware (battery backed up cache / UPS)
  • RAID 10: Less fault tolerant than RAID 6 but more fault
    tolerant than RAID 5. Loses half of the storage capacity.
    Good small random write performance makes this often
    the choice for database workloads. Can avoid the “Write hole problem” under write ordering guarantees.
21
Q

Database ACID guarantees

A
  • Atomicity: Database modifications must follow an “all or nothing” rule. Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the database state is left unchanged.
  • Consistency (as defined in databases): Any transaction the database performs will take it from one consistent state to another.
  • Isolation: No transaction should be able to interfere with another transaction at all.
  • Durability: Once a transaction has been committed, it will remain so.
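
Atomicity in particular can be demonstrated with Python's built-in sqlite3 module; this is an illustrative sketch (the table and the simulated failure are invented for the example), not tied to any system discussed above:

```python
# Atomicity sketch using Python's built-in sqlite3: if any statement in
# a transaction fails, rolling back leaves the database state unchanged.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # A money transfer must be all-or-nothing: debit + credit together.
    conn.execute("UPDATE accounts SET balance = balance - 50 "
                 "WHERE name = 'alice'")
    raise RuntimeError("crash before the matching credit")  # simulated failure
    conn.execute("UPDATE accounts SET balance = balance + 50 "
                 "WHERE name = 'bob'")
    conn.commit()
except RuntimeError:
    conn.rollback()   # the debit is undone too; no half-done transfer

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
# balances is unchanged: {'alice': 100, 'bob': 0}
```

Without the rollback, the database would be left in an inconsistent state where money has vanished, which is exactly what atomicity rules out.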
22
Q

CAP properties (in distributed databases)

A
  • Consistency: All nodes have a consistent view of the contents of the (distributed) database
  • Availability: A guarantee that every database request
    eventually receives a response about whether it was
    successful or whether it failed
  • Partition Tolerance: The system continues to operate
    despite arbitrary message loss
23
Q

Brewer’s CAP Theorem

A

It is impossible to create a distributed system that simultaneously satisfies all three CAP properties:

  • Consistency
  • Availability
  • Partition tolerance
24
Q

CA

A
  • A non-distributed (centralized) database system
  • Example: Centralized version control (SVN)

25
Q

CP

A
  • A distributed database that cannot be modified when the network splits into partitions
  • Example: Google Bigtable
26
Q

AP

A
  • A distributed database that can become inconsistent when network splits into partitions
  • Example: Distributed version control system - Git
27
Q

System Tradeoffs - CA Systems

A
  • Limited scalability - use when expected system load is low
  • Easiest to implement
  • Easy to program against
  • High latency if clients are not geographically close
28
Q

System Tradeoffs - CP Systems

A
  • Good scalability
  • Relatively easy to program against
  • Data consistency achieved in a scalable way
  • High latency of updates
  • Quorum algorithms (e.g., Paxos) can make the system
    available if a majority of the system components are
    connected to each other
  • Will eventually result in user-visible unavailability for updates after a network partition
29
Q

System Tradeoffs - AP Systems

A
  • Extremely good scalability
  • Very difficult to program against - there is no single algorithm to merge inconsistent updates. Analogy: in distributed version control systems there will always be some conflicting updates that cannot both be applied, and there is no general method to intelligently choose which of the conflicting updates should be kept, if any.
30
Q

Consistent hashing

A
  • Consistent hashing is a way to distribute the contents of a hash table over a distributed hash table (DHT)
  • The main advantage of the method is its elasticity: if the hash table contains K keys and is distributed over n servers, adding or removing a server only requires the relocation of O(K/n) keys
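
A toy hash ring (without the virtual nodes a production system would use) makes the elasticity concrete: each key is owned by the first server clockwise from its hash, so adding a server only moves the keys that fall on the new server's arc. Names here are illustrative.

```python
# Toy consistent-hash ring: each key goes to the first server clockwise
# from the key's position on the ring. Adding a server relocates only
# the keys between the new server and its predecessor (about K/n keys).

import bisect
import hashlib

def h(value: str) -> int:
    """Map a string to a point on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        self.points = sorted((h(s), s) for s in servers)

    def lookup(self, key):
        hashes = [p for p, _ in self.points]
        i = bisect.bisect_right(hashes, h(key)) % len(self.points)
        return self.points[i][1]

keys = [f"key-{i}" for i in range(1000)]
before = {k: Ring(["s1", "s2", "s3"]).lookup(k) for k in keys}

ring = Ring(["s1", "s2", "s3", "s4"])   # add one server
moved = sum(1 for k in keys if ring.lookup(k) != before[k])
# Only a fraction of the 1000 keys relocate, and every relocated key
# moves onto the new server "s4"; all other keys stay where they were.
```

With a plain `hash(key) % n` scheme, changing n would instead remap almost every key.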
31
Q

FLP Theorem

A

There is no deterministic algorithm that is guaranteed to solve the asynchronous consensus problem if even a single process may fail!

32
Q

Paxos Algorithm

A
  • The Paxos algorithm was designed to be a sound
    algorithm for the asynchronous consensus problem
  • If the Paxos algorithm terminates, then its output is a correct outcome of a consensus algorithm
  • It terminates in practice with very high probability
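
A heavily simplified single-decree sketch (synchronous, in-memory "network", no message loss or competing proposers) can illustrate the two phases and the majority quorums; all names are invented for the example, and real Paxos must also handle lost messages, retries, and duelling proposers:

```python
# Minimal single-decree Paxos sketch: prepare/promise (phase 1) and
# accept (phase 2), each requiring a majority quorum of acceptors.

class Acceptor:
    def __init__(self):
        self.promised = -1      # highest proposal number promised so far
        self.accepted = None    # (number, value) accepted so far, if any

    def prepare(self, n):
        if n > self.promised:
            self.promised = n
            return True, self.accepted   # promise + any prior acceptance
        return False, None

    def accept(self, n, value):
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None."""
    majority = len(acceptors) // 2 + 1
    # Phase 1: collect promises from a majority.
    replies = [a.prepare(n) for a in acceptors]
    granted = [prior for ok, prior in replies if ok]
    if len(granted) < majority:
        return None
    # Safety rule: adopt the value of the highest-numbered prior accept.
    priors = [p for p in granted if p is not None]
    if priors:
        value = max(priors)[1]
    # Phase 2: ask acceptors to accept the (possibly adopted) value.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None

acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="A")    # "A" is chosen
# A later proposer with a higher number must still learn "A", not "B":
second = propose(acceptors, n=2, value="B")   # also returns "A"
```

The adoption rule in phase 1 is what makes the algorithm sound: once a value has been chosen by a majority, every later successful proposal is forced to re-propose that same value.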