AWS Data Storage Flashcards
AWS DataSync
- Fast and simple way to sync existing file systems into
  - Amazon EFS
  - Amazon S3
  - Amazon FSx for Windows File Server
- Over the internet or Direct Connect
AWS Athena
- Queries data in S3 using SQL
- Can be connected to other data sources using Lambda
AWS Glue
- Fully Managed ETL Service
- Used to prepare data for analytics
Transfer Acceleration
- Used for S3
- Leverages Amazon CloudFront edge locations
- Delivers fast, easy, and secure transfer of files
- Used to accelerate transfers over long distances
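A minimal sketch of how a client might target the accelerate endpoint instead of the regional one. The bucket and key names are hypothetical, and the helper `accelerate_endpoint` is an illustration only, not part of any AWS SDK. It assumes acceleration has already been enabled on the bucket.

```python
from urllib.parse import quote


def accelerate_endpoint(bucket: str, key: str) -> str:
    """Build an HTTPS URL on the S3 Transfer Acceleration endpoint.

    Hypothetical helper for illustration; the bucket must already have
    Transfer Acceleration enabled for requests to this host to succeed.
    """
    # Acceleration requires a DNS-compliant bucket name without periods.
    if "." in bucket:
        raise ValueError("bucket names containing periods cannot use acceleration")
    # quote() keeps "/" so the key's folder-like structure is preserved.
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


print(accelerate_endpoint("example-bucket", "videos/demo.mp4"))
```

Requests sent to this hostname enter the AWS network at a nearby edge location rather than travelling the public internet all the way to the bucket's Region.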
S3
- Object Storage with Unlimited space
- Universal Namespace
- Created in a Region
- Objects from 0 bytes up to 5 TB each
S3 as a Static Website
- HTTP connections only
Kinesis Data Firehose
- Captures, transforms, and loads streaming data
- Data sent to it by producers
- Data is sent to other AWS Services from it
- Data can be transformed by Lambda
- Enables near real time analytics using BI tools
FSx File System Options
- Scratch
  - Temp storage, data not replicated, high burst rate
- Persistent
  - Long term, data replicated
Storage Gateway - File Gateway
- Virtual On-Prem file server
- Stores and retrieves files in S3
- Used with on-prem apps that need file storage in S3
- Used with EC2 apps that need file storage in S3
- SMB or NFS
EFS - Elastic File System
- Fully managed NFS file system
- Mounts in one or many AZs
- Uses VPN or Direct Connect for On-prem Mounts
- Data stored across many AZs
- Scales to PBs
FSx for Lustre
- High performance File System for Fast processing
- ML, HPC, Video, Financial Models
- Linux based
- Integrates natively with S3
- Objects are presented as files in the FS
Kinesis Data Analytics
- Real time SQL processing for streaming data from
  - Kinesis Data Streams
  - Kinesis Data Firehose
- Sends to
  - Kinesis Data Streams
  - Kinesis Data Firehose
  - Lambda
Kinesis Data Streams
- Real time processing of streaming data
- Rapidly moves data off producers
- Stores data for 24 hrs by default; retention can be extended (originally up to 7 days, now up to 365 days)
- Stores data for later processing
Amazon EMR
- Managed service for Hadoop or Spark
- Commonly used for log analysis, financial analysis
- Used to ETL data for big data
Data Lifecycle Manager
- Automates the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs
- enforces regular backup schedule
- creates standardized AMIs
- retains backups for audit and compliance
Storage Gateway - Volume Gateway
- virtual appliance for block based storage
- Cached mode - Stored on S3 with cache of frequent data on site
- Stored mode - Stored on Site with async backup to S3
Storage Gateway
- Virtual appliance / machine on prem
- Enables hybrid storage between onprem and aws
- low latency with data cached on prem
- data stored securely and durably in AWS
- Local storage backed by S3 and Glacier
- Cloud migrations and DR prep
Storage Gateway - Tape Gateway
- Virtual appliance in support of Tape/VTL storage
- Works with backup software such as NetBackup, Backup Exec, Veeam
S3 Multi-part Upload
- Recommended for objects over 100 MB
- Can be used for objects from 5 MB to 5 TB
- improves throughput
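To make the size limits concrete, here is a rough sketch of how an uploader might split an object into parts. `plan_parts` and its default 100 MiB part size are hypothetical choices for illustration. The constants reflect the S3 limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload.

```python
MIB = 1024 * 1024
MIN_PART = 5 * MIB    # minimum size of every part except the last
MAX_PARTS = 10_000    # S3 limit on the number of parts in one upload


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, first_byte, last_byte) for each part.

    Hypothetical planner for illustration; a real upload would send each
    range as one part, possibly from several threads in parallel.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size when needed so the plan stays within 10,000 parts.
    part_size = max(part_size, -(-object_size // MAX_PARTS))
    parts = []
    start = 0
    number = 1
    while start < object_size:
        end = min(start + part_size, object_size) - 1
        parts.append((number, start, end))
        start = end + 1
        number += 1
    return parts


for part in plan_parts(250 * MIB):
    print(part)
```

Because each part is an independent request, parts can be uploaded concurrently and a failed part can be retried on its own, which is where the throughput gain comes from.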
S3 buckets
- private by default
- object ACLs make individual objects public
- Bucket Policies make the entire bucket public
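As an illustration of the bucket-policy route, the sketch below builds a policy document that grants anonymous read access to every object in a bucket. The bucket name and the `public_read_policy` helper are hypothetical. The policy only takes effect if the bucket's Block Public Access settings permit public policies.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Return a JSON bucket policy allowing anyone to GET any object.

    Hypothetical helper for illustration; attach the result to the bucket
    through the console, CLI, or an SDK.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",                         # any caller
                "Action": "s3:GetObject",                 # read objects only
                "Resource": f"arn:aws:s3:::{bucket}/*",   # every key in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("example-bucket"))
```

An object ACL, by contrast, is set per object, so it suits the case where only a few files should be public.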
S3 - Encryption - SSE
- Server side encryption
- AWS encrypts objects for us on our behalf
- AWS manages the keys
- Keys are stored in AWS
S3 - Encryption - SSE - KMS
- Server side encryption with KMS
- Uses KMS system
- Keys are stored in AWS
S3 - Encryption - SSE - C
- Server side encryption with customer-provided keys
- Customer handles the keys
- Not stored in AWS
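Since the key is not stored in AWS, the customer has to send it with every request. A minimal sketch of the request headers involved, assuming a 256-bit key; `sse_c_headers` is a hypothetical helper, and the key is generated here only for demonstration.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict[str, str]:
    """Build the SSE-C headers that accompany a PUT or GET request.

    Hypothetical helper for illustration. S3 uses the key to encrypt or
    decrypt the object and then discards it; the MD5 digest lets S3 check
    that the key arrived intact.
    """
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


customer_key = os.urandom(32)  # demo only; a real key must be stored safely by the customer
for name in sse_c_headers(customer_key):
    print(name)
```

Losing the key means losing the object, because the same key must be presented again to read it back.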
S3 - Encryption - Client side
- Encryption before uploading the object
S3 Performance
- Use multi-part on upload - multiple threads
- Use prefixes (folders) - multiple consumers
- Use Byte Range - Split the file
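The byte-range idea can be sketched as follows: split an object of known size into HTTP `Range` header values, then fetch each range in its own request. `byte_ranges` is a hypothetical helper, and the sizes are made up for illustration.

```python
def byte_ranges(object_size: int, chunk_size: int) -> list[str]:
    """Return HTTP Range header values that together cover the object.

    Hypothetical helper for illustration; each value could be sent as the
    Range header of a separate GET so chunks download in parallel.
    """
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    ranges = []
    for start in range(0, object_size, chunk_size):
        # Range end positions are inclusive, hence the "- 1".
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(10, 4))
```

If one chunk fails, only that range needs to be requested again.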
S3 Replication
- Delete markers are not replicated by default
- existing objects are not replicated
- versioning must be enabled on both the source and destination buckets
Snowball
- 80TBs
Snowball Edge
- 100TBs with compute
Snowmobile
- 100 PBs
Can Fargate connect to FSx for Lustre?
- No… must use EFS in this case.