Lecture 9 Flashcards
Storage devices for VMs
• Instance volumes: Disks/SSD attached to physical server
• Optimized for high IOPS rates.
• Lost when VM is stopped.
• EBS volumes: Service providing volumes (storage area network)
• Can be mounted by only a single VM at a time, so not usable for sharing information between VMs.
• Maximum size 16 TB
• Survive stopping or termination of VM.
• Boot volume (root device) is deleted when the VM is terminated (but you can specify to keep it)
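A minimal boto3 sketch of creating an EBS volume and attaching it to a running VM. The region, availability zone, instance ID, and device name are placeholder assumptions.

```python
import boto3

# Hypothetical region/AZ/instance values; adjust to your own account.
ec2 = boto3.client("ec2", region_name="eu-central-1")

# Create a 100 GB gp3 volume in the same AZ as the target instance.
vol = ec2.create_volume(AvailabilityZone="eu-central-1a", Size=100, VolumeType="gp3")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# Attach it to a VM; unlike instance volumes, it survives stopping the VM.
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-0123456789abcdef0",   # placeholder instance ID
                  Device="/dev/sdf")
```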
Cloud Storage types
• Object store (S3)
• Shared file system (NAS) (EFS)
• Relational database (RDS)
• NoSQL database (DynamoDB)
• Data warehouse (Redshift)
• Timeseries, ledger, graph, … databases
Characteristics of cloud storage systems
Voluminous data
Commodity hardware
Distributed data
Expect failures
Processing by application
Optimization for dominant usage
CAP Theorem
Consistency, availability, and partition-tolerance cannot all be achieved together in a distributed system
Consistency: read returns last written value
Availability: all requests are answered in acceptable time
Partition-tolerance: system continues working even if some nodes are separated
In practice, systems choose AP or CP
AP systems apply eventual consistency: consistency is provided only after a certain time
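A toy Python sketch (not any real system) illustrating the AP choice: writes are replicated asynchronously, so a read from another replica may be stale until an anti-entropy pass has run.

```python
class EventuallyConsistentStore:
    """Toy AP-style store: writes hit one replica and propagate later."""

    def __init__(self, replicas=3):
        self.replicas = [{} for _ in range(replicas)]
        self.pending = []

    def write(self, replica, key, value):
        self.replicas[replica][key] = value
        self.pending.append((key, value))       # replicate later, not synchronously

    def read(self, replica, key):
        return self.replicas[replica].get(key)  # may be stale (no coordination)

    def anti_entropy(self):
        for key, value in self.pending:
            for r in self.replicas:
                r[key] = value                  # eventually all replicas converge
        self.pending.clear()


store = EventuallyConsistentStore()
store.write(0, "x", 1)
print(store.read(1, "x"))   # None -> stale read (availability chosen over consistency)
store.anti_entropy()
print(store.read(1, "x"))   # 1 -> eventual consistency
```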
Object storage: AWS S3
Simple storage service (S3):
Data spread out across at least three data centers in a region
Mostly used for backup
Data management: two level hierarchy of buckets and data objects
Data objects can be searched by name, bucket name, metadata but not content
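A short boto3 sketch of the two-level bucket/object model and searching by key name (prefix), not content. The bucket name and keys are made up, and the bucket is assumed to exist.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"   # hypothetical bucket name (must be globally unique)

# Two-level hierarchy: bucket -> data objects addressed by key.
s3.put_object(Bucket=bucket, Key="backups/2024/db.dump", Body=b"...")

# Objects can be listed/searched by key name (prefix), not by their content.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="backups/2024/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```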
AWS S3
Storage classes:
Standard
Reduced_redundancy
Intelligent_tiering
Glacier
Deep_archive
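The storage class is chosen per object at upload time; a boto3 sketch with a made-up bucket and key.

```python
import boto3

s3 = boto3.client("s3")

# Rarely accessed archive data can go straight to a colder storage class.
s3.put_object(Bucket="my-example-bucket",
              Key="archive/2019/logs.tar.gz",
              Body=b"...",
              StorageClass="GLACIER")  # or STANDARD, INTELLIGENT_TIERING, DEEP_ARCHIVE, ...
```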
AWS S3
Data access: data objects can’t be modified
Versioning: object uploaded -> new version created
Object deleted -> only marked as deleted
Lifecycle: consists of rules that trigger two types of actions
Transition actions: migration of objects to another storage class
Expiration actions: define when objects expire and can be deleted by S3
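A boto3 sketch of a lifecycle rule that combines both action types: a transition to Glacier after 30 days and expiration after 365 days. The rule name, prefix, and time spans are assumptions.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-backups",   # hypothetical rule name
            "Filter": {"Prefix": "backups/"},
            "Status": "Enabled",
            # Transition action: migrate objects to another storage class.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            # Expiration action: objects expire and can be deleted by S3.
            "Expiration": {"Days": 365},
        }]
    },
)
```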
Consistency AWS S3
Create new object: key becomes visible only after all replicas have been written
Updating/deleting: read operations return the latest version of the object
Simultaneous puts: last write wins
Atomic puts to multiple keys not supported
Security AWS S3
Authentication via PKI
Access Control Lists on bucket
Contents can be encrypted
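Server-side encryption can be requested per object; a minimal boto3 sketch (bucket and key are placeholders).

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to encrypt the object at rest with an S3-managed key (SSE-S3).
s3.put_object(Bucket="my-example-bucket",
              Key="confidential/report.pdf",
              Body=b"...",
              ServerSideEncryption="AES256")
```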
Google File System requirements
Survive failures of components
Files are huge
Most writes are appends at the end of files
Optimized to support all common operations
Support for concurrent modifications
Google File System Architecture
Single master server and many chunk servers
Master holds metadata in main memory
Multiple shadow masters to handle client reads
Directory structure is implemented as a lookup table mapping pathnames to metadata; no inodes
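A hypothetical Python sketch (not Google's code) of the master's in-memory metadata: a flat lookup table from pathnames to chunk handles, with no inodes, plus a table of chunk replica locations.

```python
# Flat namespace: full pathname -> list of chunk handles (no inode tree).
namespace = {
    "/logs/web-2024-01.log": ["chunk-0001", "chunk-0002"],
}

# Per-chunk metadata: version number and the chunk servers holding replicas.
chunk_locations = {
    "chunk-0001": {"version": 7, "replicas": ["cs-11", "cs-23", "cs-42"]},
    "chunk-0002": {"version": 3, "replicas": ["cs-07", "cs-23", "cs-31"]},
}

def lookup(path: str, chunk_index: int):
    """Resolve (pathname, chunk index) -> chunk handle + replica servers."""
    handle = namespace[path][chunk_index]
    return handle, chunk_locations[handle]["replicas"]

print(lookup("/logs/web-2024-01.log", 0))
```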
Google File System replication
3 replicas by default, but the number can be adapted
Google File System failure detection
Master exchanges heartbeat messages with the chunk servers
Google File System Data access
Clients first contact the master but then interact directly with the chunk servers
One of the 3 chunk servers is selected as primary and is responsible for updating the replicas
Google File System data integrity
Data integrity: each chunk server keeps a checksum
Consistency: system allows concurrent writes and appends to chunks
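A sketch of checksum-based integrity checking, assuming 64 KB checksum blocks as in the GFS paper; the helper functions are made up.

```python
import zlib

BLOCK = 64 * 1024  # checksum granularity within a chunk (64 KB blocks)

def checksum_blocks(chunk: bytes) -> list[int]:
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

def verify(chunk: bytes, stored: list[int]) -> bool:
    """A chunk server verifies data against stored checksums before serving it;
    a mismatch means corruption, and the replica must be re-fetched elsewhere."""
    return checksum_blocks(chunk) == stored

data = b"x" * (3 * BLOCK)
sums = checksum_blocks(data)
print(verify(data, sums))              # True
print(verify(b"y" + data[1:], sums))   # False -> corrupted block detected
```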
Google File System Consistency
System allows concurrent writes and appends to chunks
Data pushed to all replica servers
Google File System metadata
Master server contains metadata about all chunks
Each chunk server stores metadata and checksums for its chunks
Google File System interactions for writes
- Client asks the master which chunk servers hold the chunk
- Master grants a new lease on the chunk, increases the chunk version number, and tells all replicas to do the same. Replies to the client. Client no longer has to talk to the master
- Client pushes data to all servers, not necessarily to the primary first
- Once the data is acknowledged, the client sends a write request to the primary. The primary decides the serialization order for all incoming modifications and applies them to the chunk
Google File System interactions for writes
- After finishing the modification, primary forwards write request and serialization order to secondaries, so they can apply modifications in same order. (If primary fails, this step is never reached.)
- All secondaries reply back to the primary once they finish the modifications
- Primary replies back to the client, either with success or error
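A toy Python sketch (hypothetical class names, not GFS code) of these later steps: the primary assigns a serialization order, applies the mutation, forwards it to the secondaries, and reports success to the client only if every secondary acknowledges.

```python
class Secondary:
    def __init__(self):
        self.chunk = bytearray(64)   # tiny stand-in for a 64 MB chunk
        self.applied = 0

    def apply(self, serial, offset, data):
        # Secondaries apply mutations strictly in the primary's serialization order.
        if serial != self.applied + 1:
            return False
        self.chunk[offset:offset + len(data)] = data
        self.applied = serial
        return True


class Primary(Secondary):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, offset, data):
        # Primary decides the serialization order and applies the mutation locally ...
        self.next_serial += 1
        self.apply(self.next_serial, offset, data)
        # ... then forwards the request plus order to all secondaries.
        acks = [s.apply(self.next_serial, offset, data) for s in self.secondaries]
        # Reply to the client: success only if every secondary acknowledged.
        return all(acks)


primary = Primary([Secondary(), Secondary()])   # 1 primary + 2 secondaries = 3 replicas
print(primary.write(0, b"hello"))               # True -> applied in the same order everywhere
```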
Google File System interactions for appends
In step 4: primary checks to see if appending a record to current chunk would exceed max size (64MB)
If yes: pads chunk, notifies secondaries to do the same, tells client to retry request on the next chunk
Record append is restricted to 1/4 of the max chunk size (padding is at most 16 MB)
Record append fails at any of the replicas -> client must retry
Successful record append: data must have been written at the same offset on all replicas of the chunk
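A short sketch of the primary's space check during a record append; the helper function is a made-up illustration of the rule above.

```python
CHUNK_SIZE = 64 * 1024 * 1024     # 64 MB chunk
MAX_APPEND = CHUNK_SIZE // 4      # record appends limited to 1/4 of a chunk

def try_record_append(used: int, record_len: int):
    """Return the offset to append at, or None if the chunk must be padded
    and the client has to retry on the next chunk (hypothetical helper)."""
    if record_len > MAX_APPEND:
        raise ValueError("record too large for record append")
    if used + record_len > CHUNK_SIZE:
        # Pad the rest of the chunk (at most 16 MB), secondaries do the same.
        return None
    return used   # same offset on all replicas of the chunk

print(try_record_append(60 * 1024 * 1024, 8 * 1024 * 1024))  # None -> retry on next chunk
print(try_record_append(10 * 1024 * 1024, 1 * 1024 * 1024))  # 10485760
```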
Google File System: limitations
Scalability of single master
Solutions: partitioning of file system and development of distributed master
64MB chunk size
No latency guarantees
AWS Elastic File System
Distributed file system
Capacity: unlimited file system size, individual files up to 48 TB
Throughput scales with file system size
AWS EFS
POSIX-compliant shared file storage
Automatic provisioning of storage capacity
Integrated with lifecycle management
Security: control access through POSIX permissions, Amazon VPC, AWS IAM
Close-to-open consistency
Relational Database
Designed for vertical scaling
ACID properties of transactions
Atomicity: a set of operations either completes successfully or changes nothing
Consistency
Isolation
Durability
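A small sqlite3 example (Python standard library, in-memory database) showing atomicity: if any statement in the transaction fails, none of the changes become visible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 0)])

try:
    with con:   # one transaction: committed on success, rolled back on any exception
        con.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        con.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# Atomicity: neither update is visible, balances are unchanged.
print(con.execute("SELECT id, balance FROM accounts").fetchall())   # [(1, 500), (2, 0)]
```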
AWS Relational Database Service
Provides standard relational databases (PostgreSQL, MySQL)
Configure multi-AZ installation for automatic failover
Configure multiple read replicas
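A boto3 sketch of provisioning a Multi-AZ instance and adding a read replica; identifiers, instance class, and credentials are placeholders.

```python
import boto3

rds = boto3.client("rds")

# Multi-AZ installation: a standby in another availability zone for automatic failover.
rds.create_db_instance(
    DBInstanceIdentifier="shop-db",            # hypothetical identifier
    Engine="postgres",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=100,
    MasterUsername="dbadmin",
    MasterUserPassword="change-me-please",     # placeholder; use a secrets store in practice
    MultiAZ=True,
)

# Additional read replica to scale read traffic.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="shop-db-replica-1",
    SourceDBInstanceIdentifier="shop-db",
)
```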
Amazon Aurora
Relational DB
Replication: 6 copies of data replicated across 3 availability zones
Up to 15 read replicas can be configured
Automatic backups in S3
Automatic storage scaling
Features of NoSQL databases
Schema free
Support for non-relational data
Designed for horizontal scaling: automatic distribution
Auto-replication and caching
Types of NoSQL databases
Key-value database
Document oriented
Column family database
Graph database
Amazon DynamoDB
Key-value database
Optimized for small requests, quick access, high availability
Serverless service
Fault tolerant
Automatic scaling of tables
Support for ACID transactions
Encryption by default
Fine grained access control for tables
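A boto3 sketch of the key-value access pattern, assuming a table "Sessions" with partition key "user_id" (both made up, and the table must already exist).

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Sessions")   # hypothetical table, partition key "user_id"

# Small items, quick access by key.
table.put_item(Item={"user_id": "u-42", "last_login": "2024-05-01T12:00:00Z"})

# Reads are eventually consistent by default; a strongly consistent read can be requested.
resp = table.get_item(Key={"user_id": "u-42"}, ConsistentRead=True)
print(resp.get("Item"))
```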
DynamoDB
Decentralized architecture and eventual consistency semantics
DynamoDB partitions
Tables are stored in partitions
Management of partitions
Mapping keys to partitions
Mapping partitions to nodes
Virtual nodes are assigned to physical nodes
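A toy consistent-hashing sketch (not DynamoDB's actual implementation) showing how keys map to partitions and how virtual nodes spread the key space over physical nodes.

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring with virtual nodes (Dynamo-style partitioning)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node gets several positions (virtual nodes) on the ring.
        self.ring = sorted((h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first virtual node clockwise from its hash.
        idx = bisect.bisect(self.keys, h(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("customer#1001"))
```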
Dynamo: Replication
Replication(N,R,W)
N consecutive nodes
A read is successful if it succeeds on R copies
Same for a write on W copies
Typical configuration (3,2,2)
R+W>N ensures the most recent info is returned
Can be used to configure SLA requirements of the service
N determines durability
R and W determine latency
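A tiny helper illustrating the R + W > N overlap argument and the typical (3,2,2) configuration.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    # If R + W > N, every read quorum intersects every write quorum,
    # so at least one contacted replica holds the most recent write.
    return r + w > n

print(quorum_overlaps(3, 2, 2))  # True  -> typical configuration
print(quorum_overlaps(3, 1, 1))  # False -> reads may miss the latest write
```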
DynamoDB failures
Gossip protocol
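A purely illustrative gossip sketch: each node bumps its own heartbeat counter and periodically merges its table with a random peer, so nodes whose counters stop advancing are eventually suspected as failed.

```python
import random
import time

class Node:
    """Toy gossip-based failure detection (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.heartbeats = {name: (0, time.time())}   # node -> (counter, last update seen)

    def tick(self):
        counter, _ = self.heartbeats[self.name]
        self.heartbeats[self.name] = (counter + 1, time.time())
        # Gossip with one randomly chosen peer per round: push newer counters.
        peer = random.choice(self.peers)
        for node, (cnt, _) in self.heartbeats.items():
            if node not in peer.heartbeats or cnt > peer.heartbeats[node][0]:
                peer.heartbeats[node] = (cnt, time.time())

    def suspected_failed(self, timeout=5.0):
        now = time.time()
        return [n for n, (_, seen) in self.heartbeats.items() if now - seen > timeout]


nodes = [Node(f"n{i}") for i in range(3)]
for n in nodes:
    n.peers = [p for p in nodes if p is not n]
for _ in range(10):
    for n in nodes:
        n.tick()
print(nodes[0].suspected_failed())   # [] -> all counters still advancing
```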