Data Flashcards
At what data volume should you choose AWS Snowmobile over a large number of AWS Snowball devices?
10 petabytes (PB)
Do RDS Proxies work with Aurora?
Yes
What is Redshift Spectrum?
A Redshift feature that uses shared Redshift servers in AWS to query and retrieve data directly from S3, without loading that data into dedicated Redshift tables.
What are the cost differences between S3, EFS, and EBS?
At the time this card was written, the following prices applied:
- S3 (default storage class) - $0.023 / GB / month
- EFS - $0.30 / GB / month
- EBS - $0.10 / GB / month
Note - S3 and EFS charge for the data actually stored. EBS storage must be pre-provisioned, so you pay for the provisioned size.
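A rough sketch of how the pricing note above plays out, using only the per-GB prices quoted on this card. The workload sizes (200 GB stored, a 500 GB provisioned EBS volume) and the helper name monthly_cost are made up for illustration, and real bills include other charges such as requests and data transfer.

```python
# Per-GB monthly prices quoted on the card above (these change over time).
PRICE_PER_GB_MONTH = {"s3": 0.023, "efs": 0.30, "ebs": 0.10}


def monthly_cost(service: str, billed_gb: float) -> float:
    """Return the monthly storage cost in USD for the billed gigabytes."""
    return round(PRICE_PER_GB_MONTH[service] * billed_gb, 2)


# Hypothetical workload: 200 GB of data, kept on a 500 GB provisioned EBS volume.
stored_gb = 200
provisioned_ebs_gb = 500

s3_cost = monthly_cost("s3", stored_gb)            # billed on data stored
efs_cost = monthly_cost("efs", stored_gb)          # billed on data stored
ebs_cost = monthly_cost("ebs", provisioned_ebs_gb) # billed on provisioned size

print(s3_cost, efs_cost, ebs_cost)
```

The EBS figure is driven by the provisioned 500 GB rather than the 200 GB in use, which is the point of the note.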
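What is the advantage of using Parquet / ORC vs CSV?
Parquet and ORC are both columnar formats, so a query can read only the columns it needs, which allows fast retrieval. CSV is row-based, so every row must be read in full, and it is therefore slower for analytical queries.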
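A toy illustration of the row-based vs columnar idea, using only the Python standard library. It does not read or write real Parquet or ORC files; the sample data and variable names are invented for the example.

```python
import csv
import io

# Hypothetical sample data in CSV (row-based) form.
csv_text = "id,region,amount\n1,us-east-1,10\n2,eu-west-1,25\n3,us-east-1,5\n"

# Row-based: every field of every row is parsed just to sum one column.
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_from_rows = sum(int(row["amount"]) for row in rows)

# Columnar: values of each column are kept together, so a query that only
# needs "amount" can touch that one list and skip "id" and "region".
columns = {name: [row[name] for row in rows] for name in rows[0]}
total_from_columns = sum(int(value) for value in columns["amount"])

print(total_from_rows, total_from_columns)
```

Both approaches give the same answer; the difference is how much data has to be read to get it.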
What is the minimum storage duration charge for S3-IA?
S3-Infrequent Access has a minimum storage duration charge of 30 days.
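A minimal sketch of what a 30-day minimum storage duration means for billing, assuming the charge works as "you pay for at least 30 days per object". The function name billable_days is hypothetical and no prices are modeled.

```python
MIN_BILLABLE_DAYS = 30  # S3-IA minimum storage duration from the card above


def billable_days(days_stored: int, minimum: int = MIN_BILLABLE_DAYS) -> int:
    """Return the number of storage days charged for one object."""
    return max(days_stored, minimum)


early_delete = billable_days(10)  # object deleted after 10 days
long_lived = billable_days(45)    # object kept past the minimum

print(early_delete, long_lived)
```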
What is Amazon Neptune?
Neptune is a fully managed graph database. It is not an in-memory database.
When are you not charged for S3 Transfer Acceleration?
When S3 Transfer Acceleration does not result in a faster data transfer.
What is S3 Transfer Acceleration?
S3 Transfer Acceleration is a feature that speeds up transfers to / from S3 by routing data through AWS edge locations. It can speed up transfers by roughly 50-500%.
What is the turnaround time for AWS Snowball?
5-7 Days
Can CloudFront help with upload and download speed to S3?
Upload Speed - No
Download Speed - If content is cached in CloudFront, download speed will increase. If content is not cached, there will not be an increase in download speed.
What database solution allows writes to one table in multiple regions?
DynamoDB global tables accept writes to the same table in multiple regions.
* Note - Aurora Global Databases allow reads across multiple regions, but writes go to a single primary region.
What is the total storage capacity of a single AWS Snowball unit?
80 TB
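A quick back-of-the-envelope calculation tying the 80 TB per-unit figure to the 10 PB Snowmobile breakpoint from the first card. It assumes decimal units (1 PB = 1,000 TB) and that each unit is filled to its full 80 TB; the function name snowballs_needed is made up for the example.

```python
import math

SNOWBALL_CAPACITY_TB = 80  # per-unit capacity from the card above
TB_PER_PB = 1000           # assumption: decimal units


def snowballs_needed(data_tb: float) -> int:
    """Return how many Snowball units are needed to hold data_tb terabytes."""
    return math.ceil(data_tb / SNOWBALL_CAPACITY_TB)


units_for_10_pb = snowballs_needed(10 * TB_PER_PB)
print(units_for_10_pb)
```

At the 10 PB mark this works out to well over a hundred devices, which helps explain why a single Snowmobile becomes the better choice around that volume.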
What are the differences between Kinesis Data Streams and Kinesis Data Firehose?
Kinesis Data Streams
- real time
- code not managed (you write the producers and consumers)
- scaling not managed (you provision shards)
- temporarily stores data
Kinesis Data Firehose
- near real time
- fully managed code
- fully managed scaling
- no data storage
What is the ideal AWS service for one-time or periodic data transfers from on-premises storage to AWS?
AWS DataSync