Concepts Flashcards
Scaling up
When the need for parallelism arises, a single powerful computer is upgraded with more CPU cores, more memory, and more hard disks
Scaling out
When the need for parallelism arises, the work is divided among a large number of less powerful machines with (relatively) slow CPUs, moderate amounts of memory, and moderate numbers of hard disks
Pros and Cons of Scaling Up vs Scaling Out
- Scaling up is more expensive than scaling out. (Big high-end systems are priced much higher for a given amount of CPU power, memory, and hard disk space.)
- Scaling out is more challenging for fault tolerance. (A large number of loosely coupled systems means more components and thus more failures, both in hardware and in networking. Solution: software fault tolerance.)
- Scaling out is more challenging for software development (due to the larger number of components, the larger number of failures both in the nodes and in the network connecting them, and increased latencies. Solution: scalable cloud platforms.)
Cloud computing
Cloud computing is a model for enabling convenient,
on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.
Essential characteristics (5)
On-demand self-service, Broad network access, Resource pooling, Rapid elasticity, and Measured Service
Service models (3)
Cloud Software as a Service (SaaS),
Cloud Platform as a Service (PaaS),
and Cloud Infrastructure as a Service (IaaS)
Deployment models (4)
Private cloud,
Community cloud,
Public cloud,
and Hybrid cloud
Apache Spark
- Distributed programming framework for Big Data processing
- Based on functional programming
Resilient Distributed Datasets (RDDs)
- Resilient Distributed Datasets (RDDs) are Scala collection-like entities that are distributed over several computers
- The framework stores the RDDs in partitions, where a separate thread can process each partition
- To implement fault tolerance, the RDD partitions record lineage: a recipe to recompute the RDD partition from its parent RDDs and the operation used to generate the partition (see the sketch below)
- If a server goes down, the lineage information can be used to recompute its partitions on another server
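A minimal sketch of these ideas using Spark's Scala API (assuming Spark is on the classpath and a local master; the job itself is illustrative): transformations such as map and filter only record lineage, while an action such as count triggers the distributed computation.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative local Spark job showing RDD partitions and lineage.
object RddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-sketch").setMaster("local[4]"))

    // The RDD is split into 4 partitions; a separate task/thread processes each.
    val numbers = sc.parallelize(1 to 1000000, numSlices = 4)

    // Transformations only record lineage (the "recipe"); nothing is computed yet.
    val squares = numbers.map(x => x.toLong * x)
    val evens   = squares.filter(_ % 2 == 0)

    // An action triggers computation. If a partition is lost, Spark recomputes it
    // from the parent RDD partitions using the recorded lineage.
    println(evens.count())
    println(evens.toDebugString) // prints the lineage graph

    sc.stop()
  }
}
```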
RAID (Redundant Array of Independent Disks)
The most commonly used fault tolerance mechanism at small scale
RAID 0
Striping
- Stripes data over a large number of disks to improve sequential and random read/write performance
- Provides no fault tolerance (a single disk failure loses the whole array); suitable only for scratch data that can be regenerated easily
RAID 1
Mirroring
- Each data block is written to two hard disks: the master copy is written first, followed by a mirror (slave) copy
- Reads are served by either one of the two hard disks
- Loses half of the storage space; write bandwidth/IOPS is half that of using the two drives independently
- Data is available if either hard disk is available
- Easy repair: Replace the failed hard disk and copy all of the data over to the replacement drive
RAID 5
Block-level striping with distributed parity
- Data is stored on n + 1 hard disks, where a parity checksum block is stored for each n blocks of data. The
parity blocks are distributed over all disks. Tolerates one hard disk failure
RAID 5 - Properties (reads, writes, storage)
- Sequential reads and writes are striped over all disks
- Loses only one hard disk's worth of storage to parity
- Sequential read and write speeds are good, random read IOPS are good
- A small random write requires reading one data block + the parity block and writing back the modified data block + the modified parity block, i.e. 4x IOPS in the worst case (battery-backed caches try to minimize this overhead)
RAID 5 - Properties (rebuild)
- Rebuilding a disk requires reading the entire contents of the other n disks to regenerate the missing disk from parity (sketched below); this is a slow process
- Slow during rebuild: when one disk has failed, each read of a missing data block requires reading n blocks of data while the rebuild is in progress
- Vulnerability to a second hard disk failure during the array rebuild, combined with long rebuild times, makes most vendors instead recommend RAID 6 (see below) with large-capacity SATA drives
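A hedged, in-memory sketch in Scala of the parity mechanism RAID 5 relies on (not an actual RAID implementation; blocks are assumed to be equal length): the parity block is the bytewise XOR of the n data blocks, so any single missing block can be rebuilt by XORing the surviving blocks with the parity block.

```scala
// RAID 5-style parity over byte blocks: parity = XOR of all data blocks.
object ParitySketch {
  def xorBlocks(blocks: Seq[Array[Byte]]): Array[Byte] =
    blocks.reduce((a, b) => a.zip(b).map { case (x, y) => (x ^ y).toByte })

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Array[Byte](1, 2, 3, 4),
      Array[Byte](5, 6, 7, 8),
      Array[Byte](9, 10, 11, 12)
    )
    val parity = xorBlocks(data) // stored on the (rotating) parity disk

    // Simulate losing the block at index 1 and rebuilding it from the
    // surviving data blocks plus the parity block.
    val survivors = Seq(data(0), data(2), parity)
    val rebuilt   = xorBlocks(survivors)
    println(rebuilt.sameElements(data(1))) // prints: true
  }
}
```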
RAID 6
Block-level striping with double distributed parity
- Data is stored on n + 2 hard disks, where two parity
checksum blocks are stored for each n
blocks of data. The parity blocks are distributed over all disks. Tolerates two hard disk failures
RAID 6 - Properties (reads, writes, storage)
- Sequential reads and writes are striped over all disks
- Loses two hard disks' worth of storage to parity
- Sequential read and write speeds are good, random read IOPS are good
- A small random write requires reading one data block + two parity blocks and writing back the modified data block + two modified parity blocks, i.e. 6x IOPS in the worst case (battery-backed caches try to minimize this overhead)
- Slow during rebuild: when 1-2 disks have failed, each read of a missing data block requires reading n blocks of data while the rebuild is in progress
RAID 10
Stripe of Mirrors
- Data is stored on 2n hard disks, each mirror pair consists of two hard disks and the pairs are striped together to a single logical device
RAID 10 - Properties (reads, writes, storage)
- Tolerates only one hard disk failure in the worst case (up to n failures in the best case: one disk from each mirror pair)
- Loses half of the storage space
- Sequential reads and writes are striped over all disks
- Sequential read and write speeds are good, random read and write IOPS are good
- Each data block is written to two hard disks. A small random write requires 2x IOPS in the worst case
- Quicker during rebuild: Missing data is just copied over from the other disk of the mirror. Use of “hot spares” recommended to speed up recovery start
RAID Usage Scenarios
- RAID 0: Temporary space for data that can be discarded
- RAID 1: Small-scale installations, server boot disks, etc.
- RAID 5: Some fault tolerance with minimum storage overhead; small random writes can be problematic; suffers from the RAID 5 "write hole" problem without special hardware (battery-backed cache / UPS)
- RAID 6: Fairly good fault tolerance with reasonable storage overhead; small random writes can be even more problematic than in RAID 5; suffers from the RAID 6 "write hole" problem without special hardware (battery-backed cache / UPS)
- RAID 10: Less fault tolerant than RAID 6 but more fault tolerant than RAID 5. Loses half of the storage capacity. Good small random write performance makes it often the choice for database workloads. Can avoid the "write hole" problem with write ordering guarantees.
Database ACID guarantees
- Atomicity: Database modifications must follow an "all or nothing" rule. Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the database state is left unchanged.
- Consistency (as defined in databases): Any transaction the database performs will take it from one consistent state to another.
- Isolation: No transaction should be able to interfere with another transaction at all.
- Durability: Once a transaction has been committed, it will remain so.
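A hedged Scala/JDBC sketch of how atomicity and durability surface in practice (the connection URL and the accounts table are hypothetical, not from the source): both updates either commit together or are rolled back together.

```scala
import java.sql.{Connection, DriverManager}

// Illustrative money transfer inside one explicit transaction.
// Assumption: a reachable database at `url` with an accounts(id, balance) table.
object TransferSketch {
  def transfer(url: String, from: Int, to: Int, amount: Long): Unit = {
    val conn: Connection = DriverManager.getConnection(url)
    conn.setAutoCommit(false) // start an explicit transaction
    try {
      val debit = conn.prepareStatement(
        "UPDATE accounts SET balance = balance - ? WHERE id = ?")
      debit.setLong(1, amount); debit.setInt(2, from); debit.executeUpdate()

      val credit = conn.prepareStatement(
        "UPDATE accounts SET balance = balance + ? WHERE id = ?")
      credit.setLong(1, amount); credit.setInt(2, to); credit.executeUpdate()

      conn.commit()     // durability: once commit returns, the transfer persists
    } catch {
      case e: Exception =>
        conn.rollback() // atomicity: neither update takes effect
        throw e
    } finally {
      conn.close()
    }
  }
}
```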
CAP properties (in distributed databases)
- Consistency: All nodes have a consistent view of the contents of the (distributed) database
- Availability: A guarantee that every database request eventually receives a response about whether it was successful or whether it failed
- Partition tolerance: The system continues to operate despite arbitrary message loss
Brewer’s CAP Theorem
It is impossible to create a distributed system that simultaneously satisfies all three CAP properties:
- Consistency
- Availability
- Partition tolerance
CA
- A non-distributed (centralized) database system
- Example: Centralized version control system - Subversion (SVN)
CP
- A distributed database that cannot be modified when the network splits into partitions
- Example: Google Bigtable
AP
- A distributed database that can become inconsistent when the network splits into partitions
- Example: Distributed version control system - Git
System Tradeoffs - CA Systems
- Limited scalability - use when expected system load is low
- Easiest to implement
- Easy to program against
- High latency if clients are not geographically close
System Tradeoffs - CP Systems
- Good scalability
- Relatively easy to program against
- Data consistency achieved in a scalable way
- High latency of updates
- Quorum algorithms (e.g., Paxos) can make the system available if a majority of the system components are connected to each other
- Will eventually result in user-visible unavailability for updates after a network partition
System Tradeoffs - AP Systems
- Extremely good scalability
- Very difficult to program against: there is no single algorithm to merge inconsistent updates. Analogy: in distributed version control systems there will always be some conflicting updates that cannot both be applied, and there is no single general method to intelligently choose which one of the conflicting updates should be applied, if any.
Consistent hashing
- Consistent hashing is a way to distribute the contents of a hash table over a distributed hash table (DHT)
- The main advantage of the method is its elasticity: if the hash table contains K keys and is distributed over n servers, adding or removing a server only requires the relocation of O(K/n) keys (sketched below)
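A hedged Scala sketch of the idea (illustrative hash function, no virtual nodes, names are my own): servers and keys are hashed onto the same ring, and each key is owned by the first server clockwise from it, so adding or removing a server only moves the keys in one arc of the ring.

```scala
import scala.collection.immutable.SortedMap

// Minimal consistent-hashing ring.
class HashRing(nodes: Seq[String]) {
  // Map strings to non-negative ring positions (illustrative, not cryptographic).
  private def h(s: String): Int = s.hashCode & Int.MaxValue

  private var ring: SortedMap[Int, String] =
    SortedMap(nodes.map(n => h(n) -> n): _*)

  // A key is owned by the first node at or after its hash, wrapping around.
  def lookup(key: String): String = {
    val it = ring.iteratorFrom(h(key))
    if (it.hasNext) it.next()._2 else ring.head._2
  }

  // Adding/removing a node only affects the keys in the arc it owns,
  // i.e. roughly K/n keys relocate.
  def addNode(n: String): Unit    = { ring = ring + (h(n) -> n) }
  def removeNode(n: String): Unit = { ring = ring - h(n) }
}

// Usage sketch:
// val ring = new HashRing(Seq("serverA", "serverB", "serverC"))
// val owner = ring.lookup("user:42")
// ring.addNode("serverD") // only about a quarter of the keys move
```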
FLP Theorem
There is no deterministic algorithm that is guaranteed to solve the asynchronous consensus problem if even a single process can fail!
Paxos Algorithm
- The Paxos algorithm was designed to be a sound algorithm for the asynchronous consensus problem
- If the Paxos algorithm terminates, then its output is a correct outcome of a consensus algorithm
- It terminates in practice with very high probability
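As a hedged illustration rather than the full protocol, a minimal single-decree Paxos acceptor in Scala (class and method names are my own) showing the two rules soundness rests on: an acceptor never breaks a promise made in phase 1, and it reports any value it has already accepted so a newer proposer cannot choose a conflicting one.

```scala
// Single-decree Paxos acceptor sketch: in-memory only, no networking or persistence.
case class Proposal(number: Long, value: String)

class Acceptor {
  private var promisedN: Long = -1
  private var accepted: Option[Proposal] = None

  // Phase 1 (prepare): promise not to accept proposals numbered below n.
  // Returns Some(previously accepted proposal, if any) when the promise is granted,
  // or None when the request is rejected.
  def prepare(n: Long): Option[Option[Proposal]] =
    if (n > promisedN) { promisedN = n; Some(accepted) }
    else None

  // Phase 2 (accept): accept only if no higher-numbered promise has been made.
  def accept(p: Proposal): Boolean =
    if (p.number >= promisedN) { promisedN = p.number; accepted = Some(p); true }
    else false
}
```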