AWS Data Storage Flashcards
AWS DataSync
- Fast and simple way to sync existing file systems into
- AWS EFS
- AWS S3
- AWS FSx Windows Server
- Over the internet or Direct Connect
AWS Athena
- Queries data in S3 using SQL
- Can be connected to other data sources using Lambda
AWS Glue
- Fully Managed ETL Service
- Used to prepare data for analytics
Transfer Acceleration
- Used for S3
- Leverages Amazon CloudFront edge locations
- Delivers fast, easy, and secure transfer of files
- Used to accelerate transfers over long distances
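As a small illustration of the Transfer Acceleration card, the sketch below builds the accelerate hostname that clients use instead of the regional S3 endpoint. The function name `accelerate_endpoint` is hypothetical, and the name check is a simplified assumption (accelerated buckets need DNS-compliant names without periods); it makes no AWS call.

```python
import re


def accelerate_endpoint(bucket: str) -> str:
    """Return the S3 Transfer Acceleration hostname for a bucket name."""
    # Simplified check: 3-63 chars, lowercase letters, digits, hyphens,
    # and no periods (periods are not allowed for accelerated buckets).
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", bucket):
        raise ValueError(f"bucket name not usable with acceleration: {bucket!r}")
    return f"{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("example-bucket"))
```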
S3
- Object Storage with Unlimited space
- Universal Namespace
- Created in a Region
- Objects from 0 bytes up to 5 TB
S3 as a Static Website
- HTTP connections only
Kinesis Data Firehose
- Captures, Transforms, and loads streaming data
- Data sent to it by producers
- Data is sent to other AWS Services from it
- Data can be transformed by Lambda
- Enables near real time analytics using BI tools
FSx File System Options
- Scratch
- Temp Storage, data not replicated, high burst rate
- Persistent
- Long term, data replicated
Storage Gateway - File Gateway
- Virtual On-Prem file server
- Stores and receives files in S3
- Used with on-prem apps that need file storage in S3
- Used with EC2 apps that need file storage in S3
- SMB or NFS
EFS - Elastic File System
- Fully managed NFS file system
- Mounts in one or many AZs
- Uses VPN or Direct Connect for On-prem Mounts
- Data stored across many AZs
- Scales to PBs
FSx for Lustre
- High performance File System for Fast processing
- ML, HPC, Video, Financial Models
- Unix based
- Integrates natively with S3
- Objects are presented as files in the FS
Kinesis Data Analytics
- Real-time SQL processing for streaming data from
- Kinesis Data Streams
- Kinesis Data Firehose
- Sends to
- Kinesis Data Streams
- Kinesis Data Firehose
- Lambda
Kinesis Data Streams
- Real time processing of streaming data
- Rapidly moves data off producers
- Stores data for 24 hours by default; retention can be extended to 7 days, and up to 365 days with long-term retention
- Stores data for later processing
Amazon EMR
- Managed service for Hadoop or Spark
- Commonly used for log analysis, financial analysis
- Used to ETL data for big data
Data Lifecycle Manager
- Automates the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs
- enforces regular backup schedule
- creates standardized AMIs
- retains backups for audit and compliance
Storage Gateway - Volume Gateway
- virtual appliance for block based storage
- Cached mode - Stored on S3 with cache of frequent data on site
- Stored mode - Stored on Site with async backup to S3
Storage Gateway
- Virtual appliance / machine on prem
- Enables hybrid storage between onprem and aws
- low latency with data cached on prem
- data stored securely and durably in AWS
- Local storage backed by S3 and Glacier
- Cloud migrations and DR prep
Storage Gateway - Tape Gateway
- Virtual appliance in support of Tape/VTL storage
- NetBackup, Backup Exec, Veeam
S3 Multi-part Upload
- Recommended for files over 100 MB
- Can be used for files from 5 MB to 5 TB
- improves throughput
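The multi-part card can be made concrete with a little arithmetic. The sketch below splits an object into parts, assuming the documented S3 limits of a 5 MiB minimum part size (except the last part), at most 10,000 parts, and a 5 TiB maximum object size. The helper name `plan_parts` and the 100 MiB default are illustrative choices, not an AWS API.

```python
import math

MIB = 1024 ** 2
MIN_PART = 5 * MIB           # minimum part size (the last part may be smaller)
MAX_PARTS = 10_000           # maximum number of parts in one upload
MAX_OBJECT = 5 * 1024 ** 4   # 5 TiB maximum object size


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering the whole object."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Never go below the minimum part size.
    part_size = max(part_size, MIN_PART)
    # Grow the part size if the object would otherwise need over 10,000 parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    return [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]


parts = plan_parts(250 * MIB)
print(len(parts), [length // MIB for _, length in parts])
```

Each (offset, length) pair could then be uploaded by a separate thread, which is where the throughput gain comes from.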
S3 buckets
- private by default
- object ACLs make individual objects public
- Bucket Policies make the entire bucket public
S3 - Encryption - SSE
- Server-side encryption
- AWS encrypts the data on our behalf
- AWS manages the keys
- Keys are stored in AWS
S3 - Encryption - SSE - KMS
- Server side encryption with KMS
- Uses KMS system
- Keys are stored in AWS
S3 - Encryption - SSE - C
- Server-side encryption with customer-provided keys
- Customer handles the keys
- Not stored in AWS
S3 - Encryption - Client side
- Encryption before uploading the object
S3 Performance
- Use multi-part on upload - multiple threads
- Use Folders - multiple consumers
- Use Byte Range - Split the file
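To illustrate the byte-range bullet, here is a minimal sketch that splits an object of known size into HTTP `Range` header values, one per parallel GET. The function name `byte_ranges` is hypothetical; only the header format (`bytes=start-end`, inclusive) is standard.

```python
def byte_ranges(size: int, chunk: int) -> list[str]:
    """Return inclusive HTTP Range header values covering `size` bytes."""
    if size <= 0 or chunk <= 0:
        raise ValueError("size and chunk must be positive")
    return [
        # Range ends are inclusive, hence the -1.
        f"bytes={start}-{min(start + chunk, size) - 1}"
        for start in range(0, size, chunk)
    ]


print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```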
S3 Replication
- Delete markers are not replicated by default
- existing objects are not replicated
- versioning must be enabled
Snowball
- 80TBs
Snowball Edge
- 100TBs with compute
Snowmobile
- 100 PBs
Can Fargate connect to FSx for Lustre?
NO… must use EFS in this case.
Can a Linux instance connect to FSx for Windows File Server?
Yes… using the cifs-utils package, Linux can mount an SMB/CIFS share
RAID used for I/O … allows for increased IOPs
RAID 0 - Striped
RAID used for durability
RAID 1 - Mirrored
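The two RAID cards can be compared with idealized arithmetic. This is a rough sketch, assuming identical-behaving members and ignoring instance bandwidth limits and controller overhead; the `Volume` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    size_gib: int
    iops: int


def raid0(volumes: list[Volume]) -> Volume:
    """Striping: capacity and IOPS add up across members (idealized)."""
    n = len(volumes)
    return Volume(
        size_gib=n * min(v.size_gib for v in volumes),
        iops=n * min(v.iops for v in volumes),
    )


def raid1(volumes: list[Volume]) -> Volume:
    """Mirroring: every write goes to all members, so no capacity or write gain."""
    return Volume(
        size_gib=min(v.size_gib for v in volumes),
        iops=min(v.iops for v in volumes),
    )


pair = [Volume(500, 4000), Volume(500, 4000)]
print(raid0(pair))  # Volume(size_gib=1000, iops=8000)
print(raid1(pair))  # Volume(size_gib=500, iops=4000)
```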
What is the magic number to use io1 vs gp2
16000 IOPS
What NAS works with MS Active Directory?
FSx for Windows File Server
Does FSx support multi-AZ
Yes
EFS Storage Classes
Standard - multiple AZs
Standard IA
OneZone - Redundant within a single AZ - about 47% cheaper
OneZone - IA
EFS LifeCycle parameter used
age-off policy for files
Used to move files automatically between EFS storage classes based on access
EFS Intelligent-Tiering
How can you restrict direct access to an s3 bucket so that only a website can access the data.
Use a bucket policy that allows requests only when the Referer header matches the website URL… Do not hardcode the IPs of the EC2 instances running the website
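A bucket policy of the kind this card describes can be sketched as a plain dictionary. The example below assumes the `aws:Referer` condition key with a `StringLike` match; the bucket name, site URL, `Sid`, and the helper `referer_policy` are placeholders. Keep in mind that the Referer header is set by the client, so this is a deterrent rather than strong authentication.

```python
import json


def referer_policy(bucket: str, site_url: str) -> dict:
    """Build a bucket policy allowing GETs only with a matching Referer header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGetFromSiteReferer",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringLike": {"aws:Referer": [f"{site_url.rstrip('/')}/*"]}
                },
            }
        ],
    }


print(json.dumps(referer_policy("example-bucket", "https://www.example.com"), indent=2))
```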
How can you store a backup of an EBS volume on S3
Take a snapshot… snapshots are stored on S3
Encryption is supported on all ebs volume types? True or false
True
Can you have both encrypted and non-encrypted volumes on an instance
Yes
S3 object lock mode where users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.
Governance mode
S3 object lock mode where a protected object version can’t be overwritten or deleted by any user, including the root user in your AWS account.
Compliance mode
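The difference between the two Object Lock modes can be modeled in a few lines. This is a simplified sketch, not the S3 authorization logic: it assumes the "special permissions" in governance mode correspond to `s3:BypassGovernanceRetention`, and the function name and arguments are hypothetical.

```python
from datetime import datetime, timezone

MODES = ("GOVERNANCE", "COMPLIANCE")


def can_delete_version(
    mode: str,
    retain_until: datetime,
    now: datetime,
    has_bypass_permission: bool = False,
) -> bool:
    """Return True if a locked object version could be deleted (simplified)."""
    if mode not in MODES:
        raise ValueError(f"unknown Object Lock mode: {mode!r}")
    if now >= retain_until:
        return True  # retention period is over in either mode
    if mode == "GOVERNANCE":
        # Only callers holding the bypass permission may delete early.
        return has_bypass_permission
    return False  # COMPLIANCE: nobody, including the root user


until = datetime(2030, 1, 1, tzinfo=timezone.utc)
today = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(can_delete_version("GOVERNANCE", until, today, has_bypass_permission=True))  # True
print(can_delete_version("COMPLIANCE", until, today, has_bypass_permission=True))  # False
```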
allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy.
S3 Glacier vault lock
S3 Object Lock is enabled when?
Originally only at bucket creation; AWS now also allows enabling Object Lock on existing buckets that have versioning enabled
Can you use S3 Object Lock and lifecycle policies?
Yes…. S3 Object Lock protection is maintained regardless of which storage class the object resides in and throughout S3 Lifecycle transitions between storage classes.
S3 Glacier Deep Archive minimum storage duration period
180 days
S3 Glacier Flexible Retrieval minimum storage duration period
90 days
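The practical effect of a minimum storage duration is that deleting early is still billed for the full minimum. A minimal sketch of that arithmetic, using the 90-day and 180-day figures from these cards (the dictionary and function names are illustrative):

```python
# Minimum storage duration in days, keyed by S3 storage class name.
MIN_DAYS = {"STANDARD": 0, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage charged, including any early-deletion remainder."""
    return max(days_stored, MIN_DAYS[storage_class])


print(billable_days("GLACIER", 30))        # 90
print(billable_days("DEEP_ARCHIVE", 200))  # 200
```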
S3 Transitions
S3 Standard storage class -> any other storage class.
Any storage class -> S3 Glacier or S3 Glacier Deep Archive storage classes.
S3 Standard-IA -> S3 Intelligent-Tiering or S3 One Zone-IA
S3 Intelligent-Tiering storage class -> S3 One Zone-IA storage class.
S3 Glacier storage class -> S3 Glacier Deep Archive storage class.
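The transition rules above behave like a one-way waterfall: an object can move only to a class further down the list. The sketch below encodes just the classes named on this card, in an assumed order that reproduces those rules; it omits other classes and size or age constraints, and `can_transition` is a hypothetical helper.

```python
# One-way "waterfall" of storage classes, coldest last.
WATERFALL = [
    "STANDARD",
    "STANDARD_IA",
    "INTELLIGENT_TIERING",
    "ONEZONE_IA",
    "GLACIER",
    "DEEP_ARCHIVE",
]


def can_transition(src: str, dst: str) -> bool:
    """A lifecycle rule may only move objects further down the waterfall."""
    return WATERFALL.index(src) < WATERFALL.index(dst)


print(can_transition("STANDARD_IA", "ONEZONE_IA"))  # True
print(can_transition("GLACIER", "STANDARD"))        # False
```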
Object size that requires multi-part upload?
5 GB
S3 Object size multi-part upload recommended
100 MB
Can you use S3 Object lock with Glacier?
Yes. S3 Object Lock is a new feature that prevents data from being deleted during a customer-defined retention period. You can use Object Lock with any S3 storage class, including S3 Glacier.
What can you use for added security for EFS connections?
Add a rule to the mount target security group to allow inbound access from the EC2 security group
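The rule in this card can be written down as data. The sketch below builds an ingress permission for NFS (TCP port 2049) whose source is another security group; the dictionary follows the `IpPermissions` shape accepted by the EC2 AuthorizeSecurityGroupIngress API, but no AWS call is made here, and the group ID and function name are placeholders.

```python
NFS_PORT = 2049  # EFS mount targets accept NFS traffic on this TCP port


def nfs_ingress_rule(client_sg_id: str) -> dict:
    """Ingress permission letting members of `client_sg_id` reach the mount target."""
    return {
        "IpProtocol": "tcp",
        "FromPort": NFS_PORT,
        "ToPort": NFS_PORT,
        # Reference the EC2 instances' security group instead of IP ranges.
        "UserIdGroupPairs": [{"GroupId": client_sg_id}],
    }


print(nfs_ingress_rule("sg-0123456789abcdef0"))
```

Referencing the security group rather than CIDR blocks means new instances in that group gain access without editing the rule.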
Uses simple SQL expressions to query S3 for analysis
Amazon S3 Select
- S3 Select is a lightweight solution designed to let you use SQL to perform simple SELECT clauses on a maximum of one file. Amazon Athena is an analytics workhorse that allows you to perform SQL on extremely large datasets spanning many files with great performance.
Can S3 Transfer Acceleration be used for Downloads (GETs)
Yes… it accelerates transfers both to and from S3, although uploads are the most common use
How to restrict S3 access to folders by folder name
Create IAM policies for folder level permissions
Create Groups and attach the policies
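A folder-level IAM policy usually needs two statements: one limiting what can be listed, and one limiting which objects can be read or written. The sketch below is one common shape, assuming the `s3:prefix` condition key for listing; the bucket, prefix, `Sid` values, and `folder_policy` helper are placeholders.

```python
import json


def folder_policy(bucket: str, prefix: str) -> dict:
    """IAM policy restricting a group to one 'folder' (key prefix) in a bucket."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThisFolder",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is a bucket-level action, so scope it by key prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadWriteObjectsInFolder",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


print(json.dumps(folder_policy("example-bucket", "reports"), indent=2))
```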
a storage solution that can scale as data volumes increase with the LEAST amount of management and configuration for EC2 instances…
EFS… EBS isn’t a managed service.
S3 is strongly consistent for all GET, PUT, and LIST ops
True
EFS mode for high frequency reads and writes
Provisioned throughput mode
EFS mode recommended for most applications
Bursting throughput mode
EFS mode that scales throughput based on spikes
Bursting
Which S3 storage class supports encryption by default for both data at rest and in transit
S3 Glacier
allow you to quickly access your data when occasional urgent requests for a subset of archives are required. For all but the largest archives (250 MB+), data accessed using Expedited retrievals are typically made available within 1–5 minutes. Provisioned Capacity ensures that retrieval capacity for Expedited retrievals is available when you need it.
Glacier Expedited retrievals
allow you to access any of your archives within several hours…..typically complete within 3–5 hours. This is the default option for retrieval requests that do not specify the retrieval option.
Glacier Standard retrievals
are S3 Glacier’s lowest-cost retrieval option, which you can use to retrieve large amounts, even petabytes, of data inexpensively in a day. They typically complete within 5–12 hours.
Glacier Bulk retrievals
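The three retrieval cards suggest a simple rule of thumb: pick the cheapest tier whose typical completion time still meets the deadline. The sketch below uses the upper ends of the typical windows quoted above (5 minutes, 5 hours, 12 hours) as assumptions; these are typical figures, not guarantees, and `cheapest_tier` is a hypothetical helper.

```python
# (tier name, typical worst-case minutes), ordered from cheapest to most expensive.
TIERS = [
    ("Bulk", 12 * 60),
    ("Standard", 5 * 60),
    ("Expedited", 5),
]


def cheapest_tier(deadline_minutes: int):
    """Return the cheapest tier that typically finishes in time, or None."""
    for name, worst_case in TIERS:
        if worst_case <= deadline_minutes:
            return name
    return None


print(cheapest_tier(24 * 60))  # Bulk
print(cheapest_tier(6 * 60))   # Standard
print(cheapest_tier(10))       # Expedited
```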
Can an EFS file system be accessed from another Region
Yes, via inter-Region VPC peering
io2 Block Express up to ____ IOPS
256k
Are there data transfer charges for uploading to S3 from the internet
No
S3 standard has what min duration charge
None
EFS performance mode for highly parallelized workloads
Max I/O
S3 Glacier Flexible Retrieval retrieval times
Minutes to hours
S3 glacier instant retrieval times
Milliseconds
S3 glacier deep archive retrieval times
Hours