L9 - Cloud Storage Systems Flashcards
4 types of AWS storage
- Amazon Elastic Block Store (EBS)
- Amazon EC2 Instance Store
- Amazon Elastic File System (EFS)
- Amazon Simple Storage Service (S3)
2 types of storage devices for VMs
- Instance volumes
- EBS volumes
Instance volumes
Disks/SSDs attached to physical server
- Optimized for high IOPS
- Lost when VM is stopped
EBS volumes
Service providing volumes (Storage Area Network (SAN))
- can only be mounted to a single VM at a time
- data volumes survive stopping or termination of the VM
- the boot (root) volume is deleted by default when the VM is terminated
Types of storage
- Object store (S3)
- Shared file system (NAS) (EFS)
- Relational Databases (RDS)
- NoSQL databases
- data warehouses
6 characteristics of Cloud Storage Systems
- voluminous data
- commodity hardware (discrepancy btw. processor speed and storage access time)
- distributed data
- expect failures
- processing by applications
- optimization for dominant usage
CAP Theorem
Consistency, availability, and partition-tolerance cannot all be guaranteed at the same time in a distributed system
consistency (C) = a read returns the value of the last write (strict)
availability (A) = all requests are answered in an acceptable time
partition-tolerance = the system continues working even if some nodes are separated
Which of the 3 aspects of the CAP Theorem is essential in large-scale distributed cloud systems?
Partition-tolerance
-> Storage solutions focus on either availability (AP) or consistency (CP)
-> AP systems apply eventual consistency: replicas become consistent only after a certain time
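Eventual consistency can be illustrated with a toy model. This is a minimal sketch, not how any real storage service is implemented: the class names `Replica` and `EventuallyConsistentStore` and the explicit `sync()` step are assumptions made up for illustration. A write lands on one replica, a read from another replica may be stale, and all replicas agree only after synchronization.

```python
class Replica:
    """One copy of the data, tagged with the version of its last write."""
    def __init__(self):
        self.value = None
        self.version = 0


class EventuallyConsistentStore:
    """Toy AP-style store: a write is accepted by a single replica at once."""
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.clock = 0  # logical clock used to order writes

    def write(self, value, replica=0):
        # The write is acknowledged immediately; other replicas are not updated yet.
        self.clock += 1
        target = self.replicas[replica]
        target.value, target.version = value, self.clock

    def read(self, replica):
        # Always answers (availability), but possibly with stale data.
        return self.replicas[replica].value

    def sync(self):
        # Stand-in for background replication: the newest version wins everywhere.
        newest = max(self.replicas, key=lambda r: r.version)
        value, version = newest.value, newest.version
        for r in self.replicas:
            r.value, r.version = value, version


store = EventuallyConsistentStore()
store.write("v1", replica=0)
stale = store.read(2)   # replica 2 has not seen the write yet
store.sync()
fresh = store.read(2)   # after synchronization all replicas agree
```

A CP system would instead delay or reject the read on replica 2 until the write had been replicated.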
What is S3 Object Storage most used for?
- backup
- data spread across >= 3 data centers in a region
Data management in S3
Two level hierarchy of buckets and data objects
- data objects have a name, a blob of data (up to 5 TB), and metadata
- data objects can be searched by name, bucket name, and metadata BUT NOT CONTENT
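The two-level hierarchy can be sketched as a dictionary of buckets that each map object names to objects. This is a toy in-memory model under stated assumptions, not the S3 API: `ToyObjectStore`, `DataObject`, `put`, `get`, and `find` are hypothetical names. Note that `find` looks only at names and metadata and never inspects the data blob.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    data: bytes                       # opaque blob, never searched
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    def __init__(self):
        # Level 1: bucket name -> Level 2: object name -> DataObject
        self.buckets = {}

    def put(self, bucket, name, data, **metadata):
        self.buckets.setdefault(bucket, {})[name] = DataObject(name, data, metadata)

    def get(self, bucket, name):
        return self.buckets[bucket][name].data

    def find(self, bucket, prefix="", **metadata):
        # Match on name prefix and metadata key/value pairs only.
        return sorted(
            obj.name
            for obj in self.buckets.get(bucket, {}).values()
            if obj.name.startswith(prefix)
            and all(obj.metadata.get(k) == v for k, v in metadata.items())
        )


store = ToyObjectStore()
store.put("backups", "2024/db.dump", b"...", kind="database")
store.put("backups", "2024/logs.tar", b"...", kind="logs")
matches = store.find("backups", prefix="2024/", kind="logs")
```

The slash in a name such as `2024/db.dump` is just part of the name; there are no nested directories below the bucket level.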
5 storage classes in AWS S3
- standard
- reduced redundancy (for reproducible data whose loss is acceptable)
- intelligent tiering
- glacier (expedited retrieval 1-5 min)
- deep archive (retrieval within 12 h)
Data access, versioning, and lifecycle in S3
- via Simple Object Access Protocol (SOAP), REST, BitTorrent
- data cannot be modified, only uploaded, deleted, or retrieved
- versioning possible
- lifecycle: rules can be set for transition (migration of objects to another storage class) and expiration (when an object can be deleted)
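How a lifecycle rule acts on an object can be sketched as a function of the object's age. This is a simplified illustration, not the real S3 rule format: `LifecycleRule`, `apply_rule`, and the day thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LifecycleRule:
    transition_after_days: Optional[int] = None  # when to migrate the object
    transition_to: Optional[str] = None          # target storage class
    expire_after_days: Optional[int] = None      # when the object may be deleted


def apply_rule(rule, age_days, current_class):
    """Return the storage class after applying the rule, or None if expired."""
    if rule.expire_after_days is not None and age_days >= rule.expire_after_days:
        return None  # expiration: the object can be deleted
    if rule.transition_after_days is not None and age_days >= rule.transition_after_days:
        return rule.transition_to  # transition: migrate to another class
    return current_class


# Hypothetical rule: move to glacier after 30 days, delete after 365 days.
rule = LifecycleRule(transition_after_days=30, transition_to="glacier",
                     expire_after_days=365)
young = apply_rule(rule, 10, "standard")
aged = apply_rule(rule, 45, "standard")
expired = apply_rule(rule, 400, "glacier")
```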
Consistency in S3
- When creating new objects the key (name) becomes visible only after all replicas were written (read-after-write)
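- eventual consistency for overwrites and deletes (historical model; since December 2020 S3 provides strong read-after-write consistency for all operations)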
Requirements of Google File System (GFS)
- most writes are appending at the end
- optimized for long sequential and short random reads/writes
- bandwidth is more important than latency (batch processing)
- support for concurrent modifications
Is it better to put small files or larger ones on GFS?
Better to put larger ones because:
Single master server and many chunk servers
-> large chunks reduce the metadata kept on the master and how often clients have to contact the master for chunk locations
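The metadata argument can be checked with a little arithmetic. The sketch below assumes a 64 MB chunk size and roughly 64 bytes of master metadata per chunk, figures taken from the published GFS design rather than from these flashcards, and the helper `chunk_count` is a hypothetical name. It compares one 1 GB file with 1024 files of 1 MB each, which hold the same amount of data.

```python
MB = 1024 ** 2
GB = 1024 ** 3

CHUNK_SIZE = 64 * MB          # assumption: GFS chunk size
METADATA_PER_CHUNK = 64       # assumption: approx. bytes of master metadata per chunk


def chunk_count(file_size_bytes, chunk_size_bytes=CHUNK_SIZE):
    """Number of chunks needed for one file (every file needs at least one)."""
    return max(1, -(-file_size_bytes // chunk_size_bytes))  # ceiling division


# Same total data volume (1 GB), stored two different ways.
chunks_one_large = chunk_count(1 * GB)
chunks_many_small = 1024 * chunk_count(1 * MB)

metadata_one_large = chunks_one_large * METADATA_PER_CHUNK
metadata_many_small = chunks_many_small * METADATA_PER_CHUNK

print(chunks_one_large, metadata_one_large)      # chunks and bytes for one large file
print(chunks_many_small, metadata_many_small)    # chunks and bytes for many small files
```

Under these assumptions the small files need 64 times as many chunk entries on the single master, and each file also requires its own lookup, which is why GFS favors large files.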