Concepts Flashcards
Scaling up
When the need for parallelism arises, the single computer is upgraded to (or replaced by) a more powerful one with more CPU cores, more memory, and more hard disks
Scaling out
When the need for parallelism arises, the task is divided between a large number of less powerful machines with (relatively) slow CPUs, moderate amounts of memory, and moderate hard disk counts
Pros and Cons of Scaling Up vs Scaling Out
- Scaling up is more expensive than scaling out. (Big high-end systems have much higher pricing for a given amount of CPU power, memory, and hard disk space.)
- Scaling out is more challenging for fault tolerance. (A large number of loosely coupled systems means more components and thus more failures in hardware and in networking. Solution: Software fault tolerance)
- Scaling out is more challenging for software development. (Due to the larger number of components, the larger number of failures both in nodes and in the network connecting them, and increased latencies. Solution: Scalable cloud platforms)
Cloud computing
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.
Essential characteristics (5)
On-demand self-service, Broad network access, Resource pooling, Rapid elasticity, and Measured Service
Service models (3)
Cloud Software as a Service (SaaS),
Cloud Platform as a Service (PaaS),
and Cloud Infrastructure as a Service (IaaS)
Deployment models (4)
Private cloud,
Community cloud,
Public cloud,
and Hybrid cloud
Apache Spark
- Distributed programming framework for Big Data processing
- Based on functional programming
Resilient Distributed Datasets (RDDs)
- Resilient Distributed Datasets (RDDs) are Scala collection-like entities that are distributed over several computers
- The framework stores the RDDs in partitions, where a separate thread can process each partition
- To implement fault tolerance, the RDD partitions record lineage: A recipe to recompute the RDD partition based on the parent RDDs and the operation used to generate the partition
- If a server goes down, the lineage information can be used to recompute its partitions on another server
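The lineage idea can be sketched with a toy model in plain Python. This is not Spark's API: `ToyRDD`, `ToyPartition`, `lose_partition`, and `get_partition` are hypothetical names invented for illustration, and only a one-to-one `map` dependency between partitions is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyPartition:
    """One partition of a toy RDD; `data` is None when the partition is lost."""
    data: Optional[List[int]]


class ToyRDD:
    """Hypothetical stand-in for an RDD: partitions plus a lineage recipe."""

    def __init__(self, partitions, parent=None, op=None):
        self.partitions = [ToyPartition(list(p)) for p in partitions]
        self.parent = parent  # lineage: the parent RDD (None for source data)
        self.op = op          # lineage: per-element operation applied to the parent

    def map(self, fn):
        # Child partition i depends only on parent partition i.
        child_data = [
            [fn(x) for x in self.get_partition(i)]
            for i in range(len(self.partitions))
        ]
        return ToyRDD(child_data, parent=self, op=fn)

    def lose_partition(self, index):
        # Simulate the server holding this partition going down.
        self.partitions[index].data = None

    def get_partition(self, index):
        part = self.partitions[index]
        if part.data is None:
            if self.parent is None:
                raise RuntimeError("source partition lost; nothing to recompute from")
            # Replay the recorded operation on the matching parent partition.
            part.data = [self.op(x) for x in self.parent.get_partition(index)]
        return part.data


source = ToyRDD([[1, 2], [3, 4], [5, 6]])
squared = source.map(lambda x: x * x)

squared.lose_partition(1)              # "server" holding partition 1 fails
recovered = squared.get_partition(1)   # recomputed from lineage
print(recovered)
```

Only the lost partition is recomputed; the other partitions of `squared` are left untouched.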
RAID (Redundant Array of Independent Disks)
The most commonly used fault tolerance mechanism at small scale
RAID 0
Striping
- Stripes data over a large number of disks to improve sequential and random reads and writes
- Very bad choice for fault tolerance, only for scratch data that can be regenerated easily
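A minimal sketch of how striping spreads consecutive blocks over disks, assuming a stripe unit of exactly one block and simple round-robin placement (real arrays typically use larger chunk sizes). The function name `stripe_location` is hypothetical.

```python
from typing import Tuple


def stripe_location(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need num_disks >= 1 and logical_block >= 0")
    return logical_block % num_disks, logical_block // num_disks


# Eight consecutive logical blocks on a 4-disk array.
layout = [stripe_location(block, 4) for block in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive blocks land on different disks, so a sequential transfer keeps all disks busy. There is no redundancy: losing any one disk loses part of every large file.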
RAID 1
Mirroring
- Each data block is written to two hard disks: first the master copy, then a mirror (slave) copy
- Reads are served by either one of the two hard disks
- Loses half the storage space and halves write bandwidth/IOPS compared to using the two drives independently
- Data is available if either hard disk is available
- Easy repair: Replace the failed hard disk and copy all of the data over to the replacement drive
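The mirroring behaviour above can be sketched as a toy in-memory model. `ToyMirror` and its methods are hypothetical names, and disks are modelled as Python lists rather than real devices.

```python
class ToyMirror:
    """Toy RAID 1 pair: every write goes to both disks, reads use any live disk."""

    def __init__(self, num_blocks):
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def write(self, block, value):
        # Master (disk 0) first, then the mirror (disk 1).
        for d in (0, 1):
            if not self.failed[d]:
                self.disks[d][block] = value

    def read(self, block):
        # Either disk can serve the read; take the first one that is alive.
        for d in (0, 1):
            if not self.failed[d]:
                return self.disks[d][block]
        raise IOError("both disks have failed")

    def fail(self, d):
        self.failed[d] = True
        self.disks[d] = [None] * len(self.disks[d])

    def replace(self, d):
        # Repair: copy everything from the surviving disk to the new one.
        survivor = 1 - d
        if self.failed[survivor]:
            raise IOError("no surviving copy to rebuild from")
        self.disks[d] = list(self.disks[survivor])
        self.failed[d] = False


mirror = ToyMirror(num_blocks=4)
mirror.write(0, b"hello")
mirror.fail(0)                    # master disk dies
after_failure = mirror.read(0)    # still served by the mirror
mirror.replace(0)                 # new drive, full copy from the survivor
mirror.fail(1)                    # now the other disk dies
after_repair = mirror.read(0)     # served by the rebuilt disk
```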
RAID 5
Block-level striping with distributed parity
- Data is stored on n + 1 hard disks, where a parity checksum block is stored for each n blocks of data. The parity blocks are distributed over all disks. Tolerates one hard disk failure
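A minimal sketch of the parity mechanism, assuming bytewise XOR parity (the usual choice for RAID 5) over one stripe with n = 3 data blocks. The helper name `xor_blocks` and the sample block contents are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: n = 3 data blocks plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# The disk holding data block 1 fails: XOR the surviving data and the parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Because XOR is its own inverse, any single missing block (data or parity) equals the XOR of the remaining n blocks. Two missing blocks cannot be recovered this way.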
RAID 5 - Properties (reads, writes, storage)
- Sequential reads and writes are striped over all disks
- Loses only one hard disk worth of storage to parity
- Sequential read and write speeds are good, random read IOPS are good
- A small random write requires reading the old data block and the old parity block, then writing back the modified data block and the modified parity block: 4 I/O operations per write in the worst case (battery-backed caches try to minimize this overhead)
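The four I/O operations of a small write can be counted in a short sketch, assuming XOR parity so that new parity = old parity XOR old data XOR new data. `CountingStripe` and `io_count` are hypothetical names; no caching is modelled, so this is the worst case.

```python
from functools import reduce


def xor(a, b):
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class CountingStripe:
    """One RAID 5 stripe that counts disk I/O operations for small writes."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = reduce(xor, self.data)
        self.io_count = 0

    def small_write(self, index, new_data):
        old_data = self.data[index]      # I/O 1: read old data block
        self.io_count += 1
        old_parity = self.parity         # I/O 2: read old parity block
        self.io_count += 1
        # Remove the old data from the parity, then add the new data.
        new_parity = xor(xor(old_parity, old_data), new_data)
        self.data[index] = new_data      # I/O 3: write modified data block
        self.io_count += 1
        self.parity = new_parity         # I/O 4: write modified parity block
        self.io_count += 1


stripe = CountingStripe([b"AAAA", b"BBBB", b"CCCC"])
stripe.small_write(1, b"ZZZZ")
print(stripe.io_count)
```

The other data blocks in the stripe are never read, which is why the cost stays at four operations regardless of n.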
RAID 5 - Properties (rebuild)
- Rebuilding a disk requires reading all of the contents of the other n disks to regenerate the missing disk using parity - this is a slow process
- Slow during rebuild: When one disk has failed, each read of a missing data block requires reading n blocks from the other disks while the rebuild is in progress
- Vulnerability to a second hard disk failure during array rebuild, combined with long rebuild times, makes most vendors recommend RAID 6 instead for large-capacity SATA drives