Lecture 9 Flashcards
Storage devices for VMs
• Instance volumes: Disks/SSD attached to physical server
• Optimized for high IOPS rates.
• Lost when VM is stopped.
• EBS volumes: Service providing volumes (storage area network)
• Can be mounted in only a single VM at a time, so they cannot be used to share data between VMs.
• Maximum size 16 TB
• Survive stopping or termination of VM.
• Boot device lost when VM is terminated (but you can specify to keep it as well)
Cloud Storage types
• Object store (S3)
• Shared file system (NAS) (EFS)
• Relational database (RDS)
• NoSQL database (DynamoDB)
• Data warehouse (Redshift)
• Timeseries, ledger, graph, … databases
Characteristics of cloud storage systems
Voluminous data
Commodity hardware
Distributed data
Expect failures
Processing by application
Optimization for dominant usage
CAP Theorem
The following three properties cannot all be achieved together in a distributed system:
Consistency: read returns last written value
Availability: all requests are answered in acceptable time
Partition-tolerance: system continues working even if some nodes are separated
Systems are therefore classified as AP or CP
AP systems apply eventual consistency: consistency is provided only after a certain time
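A minimal sketch of eventual consistency, assuming a toy key-value store with three replicas. The class names and the explicit `sync()` step are hypothetical and only illustrate that an AP system answers reads immediately, possibly with stale data, and converges later.

```python
class Replica:
    """One node holding a copy of the key-value data."""
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy AP store: a write lands on one replica and propagates later."""
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # updates not yet propagated, in write order

    def write(self, key, value, replica=0):
        # The write is accepted at once by a single replica (availability).
        self.replicas[replica].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica):
        # May return a stale value (or None) until sync() has run.
        return self.replicas[replica].data.get(key)

    def sync(self):
        # Propagate all pending updates to every replica.
        for key, value in self.pending:
            for r in self.replicas:
                r.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("x", 1, replica=0)
stale = store.read("x", replica=2)  # replica 2 has not seen the write yet
store.sync()
fresh = store.read("x", replica=2)  # now all replicas agree
```

A CP system would instead refuse or delay the first read until the replicas agree.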
Object storage: AWS S3
Simple storage service (S3):
Data spread out across at least three data centers in a region
Mostly used for backup
Data management: two-level hierarchy of buckets and data objects
Data objects can be searched by name, bucket name, metadata but not content
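The two-level hierarchy and the search rules above can be sketched with an in-memory model. This is an illustration only, not the S3 API: `ObjectStore`, `DataObject` and `find` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    key: str
    body: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy model: buckets at the top level, data objects inside them."""
    def __init__(self):
        self.buckets = {}  # bucket name -> {object key -> DataObject}

    def create_bucket(self, name):
        self.buckets.setdefault(name, {})

    def put(self, bucket, key, body, **metadata):
        self.buckets[bucket][key] = DataObject(key, body, metadata)

    def find(self, bucket, prefix="", **metadata):
        # Matches on object name and metadata only; the body is never read.
        return sorted(
            obj.key
            for obj in self.buckets[bucket].values()
            if obj.key.startswith(prefix)
            and all(obj.metadata.get(k) == v for k, v in metadata.items())
        )


store = ObjectStore()
store.create_bucket("backups")
store.put("backups", "2024/db.dump", b"...", kind="database")
store.put("backups", "2024/logs.tar", b"...", kind="logs")
found = store.find("backups", prefix="2024/", kind="logs")
```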
AWS S3
Storage classes:
Standard
Reduced_redundancy
Intelligent_tiering
Glacier
Deep_archive
AWS S3
Data access: data objects can’t be modified
Versioning: object uploaded -> new version created
Object deleted -> only marked as deleted
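A minimal sketch of the versioning behaviour, assuming a hypothetical `VersionedBucket` class with simple integer version numbers (real S3 version IDs look different). Uploads append a version and deletes only append a marker.

```python
DELETE_MARKER = object()  # sentinel standing in for a delete marker


class VersionedBucket:
    def __init__(self):
        self.versions = {}  # key -> list of versions, oldest first

    def put(self, key, body):
        # Objects are never modified in place; each upload adds a version.
        history = self.versions.setdefault(key, [])
        history.append(body)
        return len(history)  # hypothetical 1-based version number

    def delete(self, key):
        # Deleting only marks the object; older versions stay stored.
        self.versions.setdefault(key, []).append(DELETE_MARKER)

    def get(self, key, version=None):
        history = self.versions.get(key, [])
        if version is not None:
            return history[version - 1]
        if not history or history[-1] is DELETE_MARKER:
            raise KeyError(key)
        return history[-1]


bucket = VersionedBucket()
v1 = bucket.put("report.txt", b"draft")
v2 = bucket.put("report.txt", b"final")
latest = bucket.get("report.txt")
bucket.delete("report.txt")
old = bucket.get("report.txt", version=v1)  # still retrievable by version
```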
Lifecycle: consists of rules that trigger two types of actions
Transition actions: migration of objects to another storage class
Expiration actions: define when objects expire and can be deleted by S3
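The two action types can be sketched as a rule evaluated against an object's age. The `LifecycleRule` fields and the `apply_rule` function are hypothetical and do not mirror the real S3 configuration format; the storage class names are used only as labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LifecycleRule:
    prefix: str
    transition_after_days: Optional[int] = None
    transition_to: Optional[str] = None
    expire_after_days: Optional[int] = None


def apply_rule(rule, key, age_days, storage_class):
    """Return (action, storage_class) for one object."""
    if not key.startswith(rule.prefix):
        return ("keep", storage_class)
    # Expiration action: the object may be deleted.
    if rule.expire_after_days is not None and age_days >= rule.expire_after_days:
        return ("expire", None)
    # Transition action: migrate the object to another storage class.
    if (
        rule.transition_after_days is not None
        and age_days >= rule.transition_after_days
        and storage_class != rule.transition_to
    ):
        return ("transition", rule.transition_to)
    return ("keep", storage_class)


rule = LifecycleRule(
    prefix="logs/",
    transition_after_days=30,
    transition_to="GLACIER",
    expire_after_days=365,
)
young = apply_rule(rule, "logs/app.log", 10, "STANDARD")
aged = apply_rule(rule, "logs/app.log", 45, "STANDARD")
ancient = apply_rule(rule, "logs/app.log", 400, "GLACIER")
```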
Consistency AWS S3
Create new object: key becomes visible only after all replicas were written
Updating/deleting: read operations return the latest version of the object
Simultaneous puts: last write wins
Atomic puts to multiple keys not supported
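A small sketch of "last write wins" for simultaneous puts to the same key, assuming each put carries a timestamp. The class and the explicit `timestamp` parameter are hypothetical and only illustrate the rule.

```python
class LastWriteWinsStore:
    def __init__(self):
        self._data = {}  # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        current = self._data.get(key)
        # Keep the value with the latest timestamp, whatever the arrival order.
        if current is None or timestamp >= current[0]:
            self._data[key] = (timestamp, value)

    def get(self, key):
        return self._data[key][1]


store = LastWriteWinsStore()
store.put("config", "from client A", timestamp=2)
store.put("config", "from client B", timestamp=1)  # arrives later, but is older
winner = store.get("config")
```

Each key is resolved independently, which is why a group of puts to several keys cannot be made atomic in this model.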
Security AWS S3
Authentication via PKI
Access Control Lists on bucket
Contents can be encrypted
Google File System requirements
Survive failures of components
Files are huge
Most writes are appends at the end of a file
Optimized to support all common operations
Support for concurrent modifications
Google File System Architecture
Single master server and many chunk servers
Master holds metadata in main memory
Multiple shadow masters handle client reads
Directory structure is implemented as a lookup table that maps pathnames to metadata. No inodes
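The lookup-table idea can be sketched as a flat dictionary keyed by full pathname. The `Master` class and its methods are hypothetical. The 64 MB chunk size comes from the GFS paper, not from these cards.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described in the GFS paper


class Master:
    def __init__(self):
        # Flat table: full pathname -> metadata. No per-directory inodes.
        self.namespace = {}

    def create(self, path):
        self.namespace[path] = {"chunks": []}

    def add_chunk(self, path, handle, servers):
        self.namespace[path]["chunks"].append(
            {"handle": handle, "servers": list(servers)}
        )

    def lookup(self, path, offset):
        # Translate a byte offset into the chunk that contains it.
        chunk_index = offset // CHUNK_SIZE
        return self.namespace[path]["chunks"][chunk_index]


master = Master()
master.create("/logs/web/2024.log")
master.add_chunk("/logs/web/2024.log", "chunk-1", ["cs1", "cs2", "cs3"])
master.add_chunk("/logs/web/2024.log", "chunk-2", ["cs2", "cs3", "cs4"])
location = master.lookup("/logs/web/2024.log", offset=70 * 1024 * 1024)
```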
Google File System replication
3 replicas but can be adapted
Google File System failure detection
Master exchanges heartbeats with the chunk servers
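A minimal sketch of heartbeat-based failure detection, assuming the master records the time of each server's last heartbeat and uses a fixed timeout. The class name, the timeout value and the explicit `now` arguments are hypothetical; passing the time in keeps the example deterministic.

```python
class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # chunk server name -> time of last heartbeat

    def heartbeat(self, server, now):
        self.last_seen[server] = now

    def dead_servers(self, now):
        # A server is presumed failed if it has been silent for too long.
        return sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=10)
monitor.heartbeat("cs1", now=0)
monitor.heartbeat("cs2", now=0)
monitor.heartbeat("cs1", now=8)  # cs2 stays silent
failed = monitor.dead_servers(now=12)
```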
Google File System Data access
Clients first contact the master but then interact directly with the chunk servers
One of 3 chunk servers is selected as primary and is responsible for updating the replicas
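The role of the primary can be sketched as follows. This is a strong simplification with hypothetical class names: it only shows the primary applying a mutation and forwarding it to the other replicas, and leaves out the other steps of the real protocol.

```python
class ChunkServer:
    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk handle -> bytearray

    def apply(self, handle, data):
        self.chunks.setdefault(handle, bytearray()).extend(data)


class Primary(ChunkServer):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries

    def append(self, handle, data):
        # The primary applies the mutation, then updates the other replicas,
        # so every replica sees mutations in the same order.
        self.apply(handle, data)
        for server in self.secondaries:
            server.apply(handle, data)


secondaries = [ChunkServer("cs2"), ChunkServer("cs3")]
primary = Primary("cs1", secondaries)
primary.append("chunk-42", b"record-1;")
primary.append("chunk-42", b"record-2;")
replicas = [primary, *secondaries]
contents = {bytes(server.chunks["chunk-42"]) for server in replicas}
```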
Google File System data integrity
Data integrity: each chunk server keeps a checksum
Consistency: system allows concurrent writes and appends to chunks
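The checksum idea can be sketched with per-block checksums. The GFS paper describes 64 KB blocks with a 32-bit checksum each; using `zlib.crc32` as the checksum function is an assumption made for this example, and the function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as described in the GFS paper


def block_checksums(data):
    """Compute one 32-bit checksum per block of the chunk data."""
    return [
        zlib.crc32(data[i:i + BLOCK_SIZE])
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def corrupted_blocks(data, checksums):
    """Return the indices of blocks whose checksum no longer matches."""
    return [
        index
        for index, (actual, expected) in enumerate(
            zip(block_checksums(data), checksums)
        )
        if actual != expected
    ]


original = bytes(200_000)
stored_sums = block_checksums(original)

damaged = bytearray(original)
damaged[70_000] ^= 0xFF  # flip bits inside the second block
bad = corrupted_blocks(bytes(damaged), stored_sums)
```

A chunk server that finds a bad block on a read can report it and have the chunk restored from another replica.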