Lecture 9 Flashcards
Storage devices for VMs
• Instance volumes: Disks/SSD attached to physical server
• Optimized for high IOPS rates.
• Lost when VM is stopped.
• EBS volumes: Service providing volumes (storage area network)
• Can be mounted by only a single VM at a time, so not usable for sharing information between VMs.
• Maximum size 16 TB
• Survive stopping or termination of VM.
• Boot volume (root device) is deleted when the VM is terminated (but you can specify to keep it)
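A minimal boto3 sketch of creating an EBS volume and attaching it to a running VM. The region, availability zone, instance ID, and device name are placeholder assumptions.

```python
import boto3

# Hypothetical region/AZ/instance values; adjust to your own account.
ec2 = boto3.client("ec2", region_name="eu-central-1")

# Create a 100 GB gp3 volume in the same AZ as the target instance.
vol = ec2.create_volume(AvailabilityZone="eu-central-1a", Size=100, VolumeType="gp3")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# Attach it to a VM; unlike instance volumes, it survives stopping the VM.
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-0123456789abcdef0",   # placeholder instance ID
                  Device="/dev/sdf")
```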
Cloud Storage types
• Object store (S3)
• Shared file system (NAS) (EFS)
• Relational database (RDS)
• NoSQL database (DynamoDB)
• Data warehouse (Redshift)
• Timeseries, ledger, graph, … databases
Characteristics of cloud storage systems
Voluminous data
Commodity hardware
Distributed data
Expect failures
Processing by application
Optimization for dominant usage
CAP Theorem
Consistency, availability, and partition-tolerance cannot all be achieved together in a distributed system
Consistency: read returns last written value
Availability: all requests are answered in acceptable time
Partition-tolerance: system continues working even if some nodes are separated
In practice, systems choose AP or CP
AP systems apply eventual consistency: consistency is provided only after a certain time
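A toy Python sketch (not any real system) illustrating the AP choice: writes are replicated asynchronously, so a read from another replica may be stale until an anti-entropy pass has run.

```python
class EventuallyConsistentStore:
    """Toy AP-style store: writes hit one replica and propagate later."""

    def __init__(self, replicas=3):
        self.replicas = [{} for _ in range(replicas)]
        self.pending = []

    def write(self, replica, key, value):
        self.replicas[replica][key] = value
        self.pending.append((key, value))       # replicate later, not synchronously

    def read(self, replica, key):
        return self.replicas[replica].get(key)  # may be stale (no coordination)

    def anti_entropy(self):
        for key, value in self.pending:
            for r in self.replicas:
                r[key] = value                  # eventually all replicas converge
        self.pending.clear()


store = EventuallyConsistentStore()
store.write(0, "x", 1)
print(store.read(1, "x"))   # None -> stale read (availability chosen over consistency)
store.anti_entropy()
print(store.read(1, "x"))   # 1 -> eventual consistency
```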
Object storage: AWS S3
Simple storage service (S3):
Data spread out across at least three data centers in a region
Mostly used for backup
Data management: two level hierarchy of buckets and data objects
Data objects can be searched by name, bucket name, metadata but not content
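A short boto3 sketch of the two-level bucket/object model and searching by key name (prefix), not content. The bucket name and keys are made up, and the bucket is assumed to exist.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"   # hypothetical bucket name (must be globally unique)

# Two-level hierarchy: bucket -> data objects addressed by key.
s3.put_object(Bucket=bucket, Key="backups/2024/db.dump", Body=b"...")

# Objects can be listed/searched by key name (prefix), not by their content.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="backups/2024/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```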
AWS S3
Storage classes:
Standard
Reduced_redundancy
Intelligent_tiering
Glacier
Deep_archive
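The storage class is chosen per object at upload time; a boto3 sketch with a made-up bucket and key.

```python
import boto3

s3 = boto3.client("s3")

# Rarely accessed archive data can go straight to a colder storage class.
s3.put_object(Bucket="my-example-bucket",
              Key="archive/2019/logs.tar.gz",
              Body=b"...",
              StorageClass="GLACIER")  # or STANDARD, INTELLIGENT_TIERING, DEEP_ARCHIVE, ...
```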
AWS S3
Data access: data objects can’t be modified
Versioning: object uploaded -> new version created
Object deleted -> only marked as deleted
Lifecycle: consists of rules that trigger two types of actions
Transition actions: migration of objects to another storage class
Expiration actions: define when objects expire and can be deleted by S3
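A boto3 sketch of a lifecycle rule that combines both action types: a transition to Glacier after 30 days and expiration after 365 days. The rule name, prefix, and time spans are assumptions.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-backups",   # hypothetical rule name
            "Filter": {"Prefix": "backups/"},
            "Status": "Enabled",
            # Transition action: migrate objects to another storage class.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            # Expiration action: objects expire and can be deleted by S3.
            "Expiration": {"Days": 365},
        }]
    },
)
```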
Consistency AWS S3
Create new object: key becomes visible only after all replicas have been written
Updating/deleting: read operations return the latest version of the object
Simultaneous puts: last write wins
Atomic puts to multiple keys not supported
Security AWS S3
Authentication via PKI
Access Control Lists on bucket
Contents can be encrypted
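Server-side encryption can be requested per object; a minimal boto3 sketch (bucket and key are placeholders).

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to encrypt the object at rest with an S3-managed key (SSE-S3).
s3.put_object(Bucket="my-example-bucket",
              Key="confidential/report.pdf",
              Body=b"...",
              ServerSideEncryption="AES256")
```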
Google File System requirements
Survive failures of components
Files are huge
Most writes are appends at the end of files
Optimized to support all common operations
Support for concurrent modifications
Google File System Architecture
Single master server and many chunk servers
Master holds metadata in main memory
Multiple shadow masters to handle client reads
Directory structure is implemented as a lookup table mapping pathnames to metadata; no inodes
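A hypothetical Python sketch (not Google's code) of the master's in-memory metadata: a flat lookup table from pathnames to chunk handles, with no inodes, plus a table of chunk replica locations.

```python
# Flat namespace: full pathname -> list of chunk handles (no inode tree).
namespace = {
    "/logs/web-2024-01.log": ["chunk-0001", "chunk-0002"],
}

# Per-chunk metadata: version number and the chunk servers holding replicas.
chunk_locations = {
    "chunk-0001": {"version": 7, "replicas": ["cs-11", "cs-23", "cs-42"]},
    "chunk-0002": {"version": 3, "replicas": ["cs-07", "cs-23", "cs-31"]},
}

def lookup(path: str, chunk_index: int):
    """Resolve (pathname, chunk index) -> chunk handle + replica servers."""
    handle = namespace[path][chunk_index]
    return handle, chunk_locations[handle]["replicas"]

print(lookup("/logs/web-2024-01.log", 0))
```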
Google File System replication
3 replicas by default, but the number can be adapted
Google File System failure detection
Master exchanges heartbeat messages with the chunk servers
Google File System Data access
Clients first contact the master but then interact directly with the chunk servers
One of the 3 chunk servers is selected as primary and is responsible for updating the replicas
Google File System data integrity
Data integrity: each chunk server keeps a checksum
Consistency: system allows concurrent writes and appends to chunks
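A sketch of checksum-based integrity checking, assuming 64 KB checksum blocks as in the GFS paper; the helper functions are made up.

```python
import zlib

BLOCK = 64 * 1024  # checksum granularity within a chunk (64 KB blocks)

def checksum_blocks(chunk: bytes) -> list[int]:
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

def verify(chunk: bytes, stored: list[int]) -> bool:
    """A chunk server verifies data against stored checksums before serving it;
    a mismatch means corruption, and the replica must be re-fetched elsewhere."""
    return checksum_blocks(chunk) == stored

data = b"x" * (3 * BLOCK)
sums = checksum_blocks(data)
print(verify(data, sums))              # True
print(verify(b"y" + data[1:], sums))   # False -> corrupted block detected
```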
Google File System Consistency
System allows concurrent writes and appends to chunks
Data pushed to all replica servers
Google File System metadata
Master server contains metadata about all chunks
Each chunk server stores metadata and checksums for its chunks
Google File System interactions for writes
- Client asks the master which chunk servers hold the chunk
- Master grants a new lease on the chunk, increases the chunk version number, and tells all replicas to do the same. Replies to the client. Client no longer has to talk to the master
- Client pushes data to all servers, not necessarily to the primary first
- Once the data is acknowledged, the client sends a write request to the primary. The primary decides the serialization order for all incoming modifications and applies them to the chunk
Google File System interactions for writes
- After finishing the modification, primary forwards write request and serialization order to secondaries, so they can apply modifications in same order. (If primary fails, this step is never reached.)
- All secondaries reply back to the primary once they finish the modifications
- Primary replies back to the client, either with success or error
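A toy Python sketch (hypothetical class names, not GFS code) of these later steps: the primary assigns a serialization order, applies the mutation, forwards it to the secondaries, and reports success to the client only if every secondary acknowledges.

```python
class Secondary:
    def __init__(self):
        self.chunk = bytearray(64)   # tiny stand-in for a 64 MB chunk
        self.applied = 0

    def apply(self, serial, offset, data):
        # Secondaries apply mutations strictly in the primary's serialization order.
        if serial != self.applied + 1:
            return False
        self.chunk[offset:offset + len(data)] = data
        self.applied = serial
        return True


class Primary(Secondary):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, offset, data):
        # Primary decides the serialization order and applies the mutation locally ...
        self.next_serial += 1
        self.apply(self.next_serial, offset, data)
        # ... then forwards the request plus order to all secondaries.
        acks = [s.apply(self.next_serial, offset, data) for s in self.secondaries]
        # Reply to the client: success only if every secondary acknowledged.
        return all(acks)


primary = Primary([Secondary(), Secondary()])   # 1 primary + 2 secondaries = 3 replicas
print(primary.write(0, b"hello"))               # True -> applied in the same order everywhere
```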
Google File System interactions for appends
In step 4: primary checks to see if appending a record to current chunk would exceed max size (64MB)
If yes: pads chunk, notifies secondaries to do the same, tells client to retry request on the next chunk
Record append is restricted to 1/4 of the max chunk size (padding is at most 16 MB)
Record append fails at any of the replicas -> client must retry
Successful record append: data must have been written at the same offset on all replicas of the chunk
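A short sketch of the primary's space check during a record append; the helper function is a made-up illustration of the rule above.

```python
CHUNK_SIZE = 64 * 1024 * 1024     # 64 MB chunk
MAX_APPEND = CHUNK_SIZE // 4      # record appends limited to 1/4 of a chunk

def try_record_append(used: int, record_len: int):
    """Return the offset to append at, or None if the chunk must be padded
    and the client has to retry on the next chunk (hypothetical helper)."""
    if record_len > MAX_APPEND:
        raise ValueError("record too large for record append")
    if used + record_len > CHUNK_SIZE:
        # Pad the rest of the chunk (at most 16 MB), secondaries do the same.
        return None
    return used   # same offset on all replicas of the chunk

print(try_record_append(60 * 1024 * 1024, 8 * 1024 * 1024))  # None -> retry on next chunk
print(try_record_append(10 * 1024 * 1024, 1 * 1024 * 1024))  # 10485760
```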
Google File System: limitations
Scalability of single master
Solutions: partitioning of file system and development of distributed master
64MB chunk size
No latency guarantees
AWS Elastic File System
Distributed file system
Capacity: unlimited file system size, individual files up to 48 TB
Throughput scales with file system size
AWS EFS
POSIX-compliant shared file storage
Automatic provisioning of storage capacity
Integrated with lifecycle management
Security: control access through POSIX permissions, Amazon VPC, AWS IAM
Close-to-open consistency
Relational Database
Designed for vertical scaling
ACID properties of transactions
Atomicity: a set of operations either completes successfully or changes nothing
Consistency
Isolation
Durability
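A small sqlite3 example (Python standard library, in-memory database) showing atomicity: if any statement in the transaction fails, none of the changes become visible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 0)])

try:
    with con:   # one transaction: committed on success, rolled back on any exception
        con.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        con.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# Atomicity: neither update is visible, balances are unchanged.
print(con.execute("SELECT id, balance FROM accounts").fetchall())   # [(1, 500), (2, 0)]
```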
AWS Relational Database Service
Provides standard relational databases (PostgreSQL, MySQL)
Configure multi-AZ installation for automatic failover
Configure multiple read replicas
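A boto3 sketch of provisioning a Multi-AZ instance and adding a read replica; identifiers, instance class, and credentials are placeholders.

```python
import boto3

rds = boto3.client("rds")

# Multi-AZ installation: a standby in another availability zone for automatic failover.
rds.create_db_instance(
    DBInstanceIdentifier="shop-db",            # hypothetical identifier
    Engine="postgres",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=100,
    MasterUsername="dbadmin",
    MasterUserPassword="change-me-please",     # placeholder; use a secrets store in practice
    MultiAZ=True,
)

# Additional read replica to scale read traffic.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="shop-db-replica-1",
    SourceDBInstanceIdentifier="shop-db",
)
```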
Amazon Aurora
Relational DB
Replication: 6 copies of data replicated across 3 availability zones
Up to 15 read replicas can be configured
Automatic backups in S3
Automatic storage scaling
Features of NoSQL databases
Schema free
Support for non-relational data
Designed for horizontal scaling: automatic distribution
Auto-replication and caching
Types of NoSQL databases
Key-value database
Document oriented
Column family database
Graph database
Amazon DynamoDB
Key-value database
Optimized for small requests, quick access, high availability
Serverless service
Fault tolerant
Automatic scaling of tables
Support for ACID transactions
Encryption by default
Fine grained access control for tables
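A boto3 sketch of the key-value access pattern, assuming a table "Sessions" with partition key "user_id" (both made up, and the table must already exist).

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Sessions")   # hypothetical table, partition key "user_id"

# Small items, quick access by key.
table.put_item(Item={"user_id": "u-42", "last_login": "2024-05-01T12:00:00Z"})

# Reads are eventually consistent by default; a strongly consistent read can be requested.
resp = table.get_item(Key={"user_id": "u-42"}, ConsistentRead=True)
print(resp.get("Item"))
```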
DynamoDB
Decentralized architecture and eventual consistency semantics
DynamoDB partitions
Tables are stored in partitions
Management of partitions
Mapping keys to partitions
Mapping partitions to nodes
Virtual nodes are assigned to physical nodes
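A toy consistent-hashing sketch (not DynamoDB's actual implementation) showing how keys map to partitions and how virtual nodes spread the key space over physical nodes.

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring with virtual nodes (Dynamo-style partitioning)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node gets several positions (virtual nodes) on the ring.
        self.ring = sorted((h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first virtual node clockwise from its hash.
        idx = bisect.bisect(self.keys, h(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("customer#1001"))
```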
Dynamo: Replication
Replication(N,R,W)
N consecutive nodes
A read is successful if it succeeds on R copies
Same for a write on W copies
Typical configuration (3,2,2)
R+W>N ensures the most recent info is returned
Can be used to configure SLA requirements of the service
N determines durability
R and W determine latency
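A tiny helper illustrating the R + W > N overlap argument and the typical (3,2,2) configuration.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    # If R + W > N, every read quorum intersects every write quorum,
    # so at least one contacted replica holds the most recent write.
    return r + w > n

print(quorum_overlaps(3, 2, 2))  # True  -> typical configuration
print(quorum_overlaps(3, 1, 1))  # False -> reads may miss the latest write
```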
DynamoDB failures
Gossip protocol
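A purely illustrative gossip sketch: each node bumps its own heartbeat counter and periodically merges its table with a random peer, so nodes whose counters stop advancing are eventually suspected as failed.

```python
import random
import time

class Node:
    """Toy gossip-based failure detection (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.heartbeats = {name: (0, time.time())}   # node -> (counter, last update seen)

    def tick(self):
        counter, _ = self.heartbeats[self.name]
        self.heartbeats[self.name] = (counter + 1, time.time())
        # Gossip with one randomly chosen peer per round: push newer counters.
        peer = random.choice(self.peers)
        for node, (cnt, _) in self.heartbeats.items():
            if node not in peer.heartbeats or cnt > peer.heartbeats[node][0]:
                peer.heartbeats[node] = (cnt, time.time())

    def suspected_failed(self, timeout=5.0):
        now = time.time()
        return [n for n, (_, seen) in self.heartbeats.items() if now - seen > timeout]


nodes = [Node(f"n{i}") for i in range(3)]
for n in nodes:
    n.peers = [p for p in nodes if p is not n]
for _ in range(10):
    for n in nodes:
        n.tick()
print(nodes[0].suspected_failed())   # [] -> all counters still advancing
```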