Data Flashcards

1
Q

At what data volume should you choose AWS Snowmobile over a large number of AWS Snowballs?

A

10 PB (petabytes) or more

2
Q

Do RDS Proxies work with Aurora?

A

Yes

3
Q

What is Redshift Spectrum?

A

A Redshift feature that uses shared Redshift servers in AWS to query data directly in S3, without loading it into dedicated Redshift tables.

4
Q

What are the cost differences between S3, EFS, and EBS?

A

At the time this card was written, the prices were:
- S3 (Standard) - $0.023 / GB / month
- EFS - $0.30 / GB / month
- EBS - $0.10 / GB / month
Note - S3 and EFS charge for the data actually stored. EBS storage must be pre-provisioned, so you pay for the provisioned size.
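
A rough sketch of the billing difference in the note above, using only the per-GB prices quoted on this card. The function name `monthly_cost` and its parameters are illustrative assumptions, and real pricing varies by region, storage tier, and date.

```python
from typing import Optional

# Per-GB monthly prices as quoted on this card (assumed, not current pricing).
PRICE_PER_GB_MONTH = {"s3": 0.023, "efs": 0.30, "ebs": 0.10}


def monthly_cost(service: str, stored_gb: float,
                 provisioned_gb: Optional[float] = None) -> float:
    """Estimate the monthly storage cost in USD for one service.

    S3 and EFS bill on the data actually stored. EBS bills on the
    provisioned volume size, regardless of how much of it is used.
    """
    if service == "ebs":
        if provisioned_gb is None:
            raise ValueError("EBS needs a provisioned size")
        billable_gb = provisioned_gb
    else:
        billable_gb = stored_gb
    return round(PRICE_PER_GB_MONTH[service] * billable_gb, 2)


# 200 GB of data, stored on a 500 GB EBS volume in the EBS case.
s3_cost = monthly_cost("s3", 200)
efs_cost = monthly_cost("efs", 200)
ebs_cost = monthly_cost("ebs", 200, provisioned_gb=500)
```

In this example the EBS bill is driven by the 500 GB volume, not the 200 GB of data on it.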

5
Q

What is the advantage of using Parquet / ORC over CSV?

A

Parquet and ORC are both columnar formats, so a query can read only the columns it needs, which makes retrieval fast. CSV is row-based, so every query has to scan whole rows, which is slower.
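
A toy in-memory sketch of why a columnar layout helps. This is not the Parquet or ORC on-disk format, and the sample records and function names are made up for illustration. It only counts how many values each layout has to touch to sum one field.

```python
# Hypothetical sample records, stored row by row (like a CSV).
rows = [
    {"id": 1, "region": "us-east-1", "bytes": 120},
    {"id": 2, "region": "eu-west-1", "bytes": 300},
    {"id": 3, "region": "us-east-1", "bytes": 80},
]


def to_columns(records):
    """Pivot row-oriented records into a column-oriented layout."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(records, field):
    """Row layout: every whole record is visited to read one field."""
    total = 0
    values_touched = 0
    for record in records:
        values_touched += len(record)
        total += record[field]
    return total, values_touched


def sum_column_store(columns, field):
    """Column layout: only the requested column is visited."""
    column = columns[field]
    return sum(column), len(column)


row_result = sum_row_store(rows, "bytes")
col_result = sum_column_store(to_columns(rows), "bytes")
```

Both layouts give the same total, but the row layout touches every value in every record while the column layout touches only the one column.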

6
Q

What is the minimum storage duration charge for S3-IA?

A

S3-Infrequent Access has a minimum storage duration charge of 30 days.
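
As a small worked example of what a minimum storage duration means for billing: an object deleted early is still charged as if it had been stored for the full 30 days. The helper name `billable_days` is an assumption for illustration.

```python
MIN_BILLABLE_DAYS = 30  # minimum storage duration stated on this card


def billable_days(days_stored: int) -> int:
    """Days of storage that are charged for an S3-IA object."""
    return max(days_stored, MIN_BILLABLE_DAYS)


early_delete = billable_days(10)   # deleted after 10 days
long_lived = billable_days(45)     # kept for 45 days
```

Under this rule, an object deleted after 10 days is charged for 30, while one kept for 45 days is charged for 45.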

7
Q

What is Amazon Neptune?

A

Neptune is a fully managed graph database. It is not an in-memory database.

8
Q

When are you not charged for S3 Transfer Acceleration?

A

When S3 Transfer Acceleration does not result in a faster data transfer

9
Q

What is S3 Transfer Acceleration?

A

S3 Transfer Acceleration is a feature that speeds up transfers to / from S3. It can speed up transfers by 50-500%.

10
Q

What is the turnaround time for AWS Snowball?

A

5-7 Days

11
Q

Can CloudFront help with upload and download speed to S3?

A

Upload speed - No
Download speed - Only if the content is cached in CloudFront. If the content is not cached, download speed does not improve.

12
Q

What database solution allows writes to one table in multiple regions?

A

DynamoDB Global Tables accept writes in multiple regions.
* Note - Aurora Global Databases allow reads across multiple regions, but not writes.

13
Q

What is the total storage capacity of a single AWS Snowball unit?

A

80 TB
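
A back-of-the-envelope sketch that combines this capacity with the 10 PB breakpoint from card 1. The function name `plan_transfer` is hypothetical, decimal units (1 PB = 1000 TB) are an assumption, and the sketch treats the full 80 TB as usable.

```python
import math

SNOWBALL_CAPACITY_TB = 80            # per-unit capacity from this card
SNOWMOBILE_THRESHOLD_TB = 10 * 1000  # 10 PB breakpoint from card 1 (decimal units assumed)


def plan_transfer(data_tb: float):
    """Pick a device type and count for a given data volume in TB."""
    if data_tb >= SNOWMOBILE_THRESHOLD_TB:
        return ("snowmobile", 1)
    # Round up: a partly filled Snowball is still a whole device.
    return ("snowball", math.ceil(data_tb / SNOWBALL_CAPACITY_TB))


small_job = plan_transfer(200)     # 200 TB
large_job = plan_transfer(12000)   # 12 PB
```

For 200 TB this suggests three Snowballs (two full and one partly filled), while 12 PB crosses the breakpoint and points to a Snowmobile.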

14
Q

What are the differences between Kinesis Data Streams and Kinesis Data Firehose?

A

Kinesis Data Streams
- real time
- code not managed
- scaling not managed
- temporarily stores data

Kinesis Data Firehose
- near real time
- fully managed code
- fully managed scaling
- no data storage

15
Q

What is the ideal AWS service for one-time or periodic data transfers from on-premises storage to AWS?

A

AWS DataSync

16
Q

At what file size should S3 multipart uploads be considered?

A

100 MB
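
A minimal sketch of applying this threshold when planning an upload. The function name `plan_upload` and the default 16 MiB part size are assumptions, and the card's "100 MB" is treated as 100 MiB here. The 5 MiB minimum part size and 10,000-part maximum are S3 multipart limits.

```python
import math

MIB = 1024 * 1024
MULTIPART_THRESHOLD = 100 * MIB  # threshold from this card (taken as MiB, an assumption)
MIN_PART_SIZE = 5 * MIB          # S3 minimum size for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_upload(size_bytes: int, part_size: int = 16 * MIB) -> dict:
    """Decide between a single PUT and a multipart upload."""
    if size_bytes < MULTIPART_THRESHOLD:
        return {"multipart": False, "parts": 1}
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum")
    parts = math.ceil(size_bytes / part_size)
    if parts > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return {"multipart": True, "parts": parts}


small_file = plan_upload(50 * MIB)
large_file = plan_upload(1024 * MIB)  # 1 GiB
```

With the default part size, a 1 GiB file is split into 64 parts, while a 50 MiB file goes up as a single request.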

17
Q

For Redshift cross-region replication, which region should the KMS key be in? Which region should the Redshift snapshots be in?

A

The KMS key should be in the target region.
The Redshift snapshots should be in the source region.

18
Q

True / False - A Kinesis Firehose delivery stream can handle multiple data sources at the same time.

A

False. Each Firehose delivery stream can only handle a single data source at a time.

19
Q

How does one transfer data from AWS Snow Family to S3 Glacier?

A

AWS Snow Family -> S3 with lifecycle policy -> Glacier
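
A hedged sketch of the lifecycle step in that chain. It only builds the rule as a plain dictionary, in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. The rule ID, the prefix, and the helper name are hypothetical.

```python
def glacier_lifecycle_config(prefix: str = "", days: int = 0) -> dict:
    """Build a lifecycle configuration that moves objects to Glacier.

    prefix limits the rule to keys under that prefix ("" means all objects);
    days is how long objects stay in S3 before the transition.
    """
    return {
        "Rules": [
            {
                "ID": "snow-import-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Hypothetical prefix where the Snow Family import landed.
config = glacier_lifecycle_config(prefix="snow-import/", days=0)
```

Applying it would mean passing `config` to the S3 client for the bucket that received the Snow Family import; that call is left out here because it needs credentials and a real bucket.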

20
Q

What are the four types of logs for Aurora? How are they configured?

A

  1. Error Logs - Enabled by default
  2. Slow Query Logs - Enabled via a parameter in the DB parameter group
  3. General Logs - Enabled via a parameter in the DB parameter group
  4. Audit Logs - Also known as advanced audit logs, enabled in the DB cluster

21
Q

What is Apache Hive Partitioning?

A

A partitioning strategy that splits data into multiple parts / files based on the value of a partition key.
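
A small sketch of the idea, assuming the common Hive-style layout in which each partition is a `key=value` directory. The sample records and function names are made up for illustration.

```python
from collections import defaultdict


def partition_path(record: dict, keys: list) -> str:
    """Build a Hive-style partition path such as 'year=2024/month=1'."""
    return "/".join(f"{key}={record[key]}" for key in keys)


def partition_records(records: list, keys: list) -> dict:
    """Group records by partition path, one group per partition."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_path(record, keys)].append(record)
    return dict(partitions)


# Hypothetical sample data partitioned by year and month.
records = [
    {"year": 2024, "month": 1, "value": 5},
    {"year": 2024, "month": 1, "value": 7},
    {"year": 2024, "month": 2, "value": 9},
]
partitions = partition_records(records, ["year", "month"])
```

A query that filters on `year` and `month` then only needs to read the files under the matching paths instead of the whole dataset.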