Storage Flashcards

1
Q

Your big data application is taking a lot of files from your local on-premise NFS storage and inserting them into S3. As part of the data integrity verification process, the application downloads the files right after they’ve been uploaded. What will happen?

A

The application will receive a 200, as S3 is strongly consistent for PUTs of new objects (read-after-write consistency)

2
Q

You are gathering various files from providers and plan on analyzing them once every month using Athena, which must return the query results immediately. You do not want to run a high risk of losing files and want to minimise costs. Which storage type do you recommend?

A

S3 Infrequent Access

3
Q

As part of your compliance as a bank, you must archive all logs created by all applications and ensure they cannot be modified or deleted for at least 7 years. Which solution should you use?

A

Glacier with a Vault Lock Policy
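
A minimal boto3 sketch of that setup, locking a hypothetical compliance-logs vault with a policy that denies deletes on archives younger than 7 years (2,555 days); the vault name, region, and account ID are placeholders:

```python
import boto3

glacier = boto3.client("glacier")

# Deny deletion of any archive younger than 7 years (2,555 days).
# Vault name, region, and account ID are illustrative placeholders.
lock_policy = """{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "deny-deletes-for-7-years",
    "Principal": "*",
    "Effect": "Deny",
    "Action": "glacier:DeleteArchive",
    "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/compliance-logs",
    "Condition": {"NumericLessThan": {"glacier:ArchiveAgeInDays": "2555"}}
  }]
}"""

# initiate_vault_lock puts the policy in an in-progress state and returns
# a lock ID; completing the lock within 24 hours makes the policy immutable.
resp = glacier.initiate_vault_lock(
    accountId="-",  # "-" means the account that owns the credentials
    vaultName="compliance-logs",
    policy={"Policy": lock_policy},
)
glacier.complete_vault_lock(
    accountId="-",
    vaultName="compliance-logs",
    lockId=resp["lockId"],
)
```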

4
Q

You are generating thumbnails in S3 from images. Images are in the images/ directory while thumbnails are in the thumbnails/ directory. After running some analytics, you realized that images are rarely read and you could optimise your costs by moving them to another S3 storage tier. What do you recommend that requires the least amount of changes?

A

Create a Lifecycle Rule for the images/ prefix
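
A minimal boto3 sketch of such a rule, assuming a hypothetical my-media-bucket and an illustrative 30-day transition to Standard-IA:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under images/ to Standard-IA after 30 days;
# thumbnails/ is untouched, so the site keeps serving thumbnails fast.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-media-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-original-images",
            "Status": "Enabled",
            "Filter": {"Prefix": "images/"},
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
        }]
    },
)
```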

5
Q

In order to perform fast big data analytics, your analysts in Japan have recommended that you continuously copy the data from your S3 bucket in us-east-1 to a bucket in their region. How do you recommend doing this at a minimal cost?

A

Enable Cross Region Replication
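
A boto3 sketch of that configuration, with placeholder bucket names and IAM role ARN; note that versioning must already be enabled on both buckets:

```python
import boto3

s3 = boto3.client("s3")

# Replicate everything from the us-east-1 bucket to a bucket in
# ap-northeast-1 (Tokyo). Names and the role ARN are placeholders.
s3.put_bucket_replication(
    Bucket="analytics-data-us-east-1",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
        "Rules": [{
            "ID": "replicate-to-tokyo",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix = all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::analytics-data-ap-northeast-1"},
        }],
    },
)
```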

6
Q

Your big data application is taking a lot of files from your local on-premise NFS storage and inserting them into S3. As part of the data integrity verification process, you would like to ensure the files have been properly uploaded at minimal cost. How do you proceed?

A

Compute the local ETag for each file and compare them with AWS S3’s ETag
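
A minimal sketch of that check in Python with boto3. One caveat baked into the comments: the ETag equals the plain MD5 only for single-part, non-KMS uploads; multipart uploads produce a composite ETag with a -N suffix:

```python
import hashlib
import boto3

s3 = boto3.client("s3")

def local_etag(path: str) -> str:
    """MD5 hex digest of a file, read in chunks to cope with large files."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            md5.update(chunk)
    return md5.hexdigest()

def verify_upload(path: str, bucket: str, key: str) -> bool:
    # S3 returns the ETag wrapped in double quotes. For single-part,
    # non-KMS uploads it is the object's MD5; multipart ETags end in "-N"
    # and would need a part-by-part comparison instead.
    remote_etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    return remote_etag == local_etag(path)
```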

7
Q

Your application plans to have 15,000 reads and writes per second to S3 from thousands of device IDs. Which naming convention do you recommend?

A

/yyyy-mm-dd/&lt;device-id&gt;/… (S3 supports roughly 3,500 writes and 5,500 reads per second per prefix, so including the device-id in the key gives you many prefixes and parallelizes your reads and writes)

8
Q

You are looking to have your files encrypted in S3 and do not want to manage the encryption yourself. You would like to have control over the encryption keys and ensure they’re securely stored in AWS. What encryption do you recommend?

A

SSE-KMS
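
A minimal boto3 sketch of requesting SSE-KMS at upload time; the bucket, key, and KMS alias are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to encrypt the object with a KMS-managed key on our behalf.
with open("report.csv", "rb") as f:
    s3.put_object(
        Bucket="my-secure-bucket",       # placeholder bucket
        Key="reports/report.csv",
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/my-app-key",  # omit to use the default aws/s3 key
    )
```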

9
Q

Your website is deployed and sources its images from an S3 bucket. Everything works fine on the internet, but when you start the website locally to do some development, the images are not getting loaded. What’s the problem?

A

S3 CORS
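
A minimal boto3 sketch of a CORS rule that would let a local dev server (assumed here to run on http://localhost:3000) fetch the images:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="my-website-assets",  # placeholder bucket
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["http://localhost:3000"],  # assumed dev origin
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight
        }]
    },
)
```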

10
Q

What’s the maximum number of fields that can make a primary key in DynamoDB?

A

2 (partition key + sort key)

11
Q

What’s the maximum size of a row in DynamoDB?

A

400 KB

12
Q

You are writing items of 8 KB in size at a rate of 12 per second. What WCU do you need?

A

96 (1 WCU = one 1 KB write per second, so each 8 KB item needs 8 WCU; 8 × 12 = 96)

13
Q

You are doing strongly consistent read of 10 KB items at the rate of 10 per second. What RCU do you need?

A

30 (10 KB gets rounded up to 12 KB, divided by 4 KB = 3, times 10 per second = 30)

14
Q

You are doing 12 eventually consistent reads per second, and each item has a size of 16 KB. What RCU do you need?

A

24 (1 RCU supports 2 eventually consistent reads per second of items up to 4 KB; a 16 KB read needs 4 ÷ 2 = 2 RCU, times 12 per second = 24)
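
As a sketch, the arithmetic behind the last three cards expressed in Python:

```python
import math

def wcu(item_kb: float, writes_per_sec: int) -> int:
    # 1 WCU = one write per second of an item up to 1 KB.
    return math.ceil(item_kb) * writes_per_sec

def rcu(item_kb: float, reads_per_sec: int, strongly_consistent: bool) -> int:
    # 1 RCU = one strongly consistent read per second of up to 4 KB,
    # or two eventually consistent reads per second.
    units = math.ceil(item_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

print(wcu(8, 12))          # 96  -> card 12
print(rcu(10, 10, True))   # 30  -> card 13
print(rcu(16, 12, False))  # 24  -> card 14
```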

15
Q

We are getting ProvisionedThroughputExceededException errors, but after checking the metrics, we see we haven’t exceeded the total RCU we had provisioned. What happened?

A

We have a hot partition / hot key (remember RCU and WCU are spread across all partitions)

16
Q

You are about to enter the Christmas sale and you know a few items in your website are very popular and will be read often. Last year you had a ProvisionedThroughputExceededException. What should you do this year?

A

Create a DAX cluster

17
Q

You would like to react in real-time to users deactivating their account and send them an email to try to bring them back. The best way of doing it is to…

A

Integrate Lambda with a DynamoDB stream
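
A minimal sketch of such a handler, assuming a hypothetical users table (stream view type NEW_AND_OLD_IMAGES) whose items carry active and email attributes, and SES for the outgoing mail:

```python
import boto3

ses = boto3.client("ses")

def handler(event, context):
    """Invoked by a DynamoDB stream; all attribute names are assumptions."""
    for record in event["Records"]:
        if record["eventName"] != "MODIFY":
            continue
        old = record["dynamodb"].get("OldImage", {})
        new = record["dynamodb"].get("NewImage", {})
        # React only to the active -> inactive transition.
        was_active = old.get("active", {}).get("BOOL", False)
        is_active = new.get("active", {}).get("BOOL", False)
        if was_active and not is_active:
            ses.send_email(
                Source="noreply@example.com",  # placeholder verified sender
                Destination={"ToAddresses": [new["email"]["S"]]},
                Message={
                    "Subject": {"Data": "We miss you already"},
                    "Body": {"Text": {"Data": "Tell us what we could do better."}},
                },
            )
```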

18
Q

You would like to have DynamoDB automatically delete old data for you. What should you use?

A

Use TTL
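
A minimal boto3 sketch, assuming a hypothetical sessions table with an expires_at epoch-seconds attribute:

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL once per table; DynamoDB then deletes expired items at no
# extra cost (typically within a day or two of the expiry timestamp).
dynamodb.update_time_to_live(
    TableName="sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Each item carries its own expiry as epoch seconds.
dynamodb.put_item(
    TableName="sessions",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 86400)},  # ~24h from now
    },
)
```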

19
Q

You are looking to improve the performance of your RDS database by caching some of the most common rows and queries. Which technology do you recommend?

A

ElastiCache

20
Q

Which operation/feature or service would you use to locate all items in a table with a particular sort key value? (Choose 2)

  • GetItem
  • Query with a local secondary index
  • Scan against a table, with filters
  • Query with a global secondary index
  • Query
A
  • Scan against a table, with filters
  • Query with a global secondary index

(Local secondary indexes can’t be used: they only provide an alternative sort key, and a query must still specify a single partition key value plus a single sort-key value or range. A global secondary index lets you build a new index with the sort key as its partition key, so a query will work. A scan will also find the items, but it is very inefficient. GetItem won’t work: it requires a single partition key and sort key.)

21
Q

You have an application based on the Amazon Kinesis Streams API, and you are not using the Kinesis Producer Library as part of your application. While you won’t be taking advantage of all the benefits of the KPL in your application, you still need to ensure that you add data to a stream efficiently. Which API operation allows you to do this?

  • PutItems
  • PutRecord
  • PutItem
  • PutRecords
A

PutRecords

(The PutRecords operation writes multiple data records into an Amazon Kinesis stream in a single call. Use this operation to send data into the stream for data ingestion and processing.)
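
A minimal boto3 sketch of a batched PutRecords call against a hypothetical telemetry stream; a real producer should also inspect FailedRecordCount in the response and retry failed records:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

events = [{"device_id": f"d-{i}", "reading": i * 0.1} for i in range(100)]

# One call carries up to 500 records instead of 100 separate PutRecord calls.
response = kinesis.put_records(
    StreamName="telemetry",  # placeholder stream name
    Records=[
        {
            "Data": json.dumps(e).encode("utf-8"),
            "PartitionKey": e["device_id"],  # spreads load across shards
        }
        for e in events
    ],
)
print(response["FailedRecordCount"])  # non-zero means some records need a retry
```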

22
Q

What are the maximum throughput and storage limits of a single DynamoDB partition?

A

1,000 WCU, 3,000 RCU, 10 GB of data

(DynamoDB is capable of delivering 1,000 WCU, 3,000 RCU and 10 GB of data from a single partition – any more causes additional partitions to be created and data split between them. Further information: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions)

23
Q

Which of the following statements is true?

  • A shard supports up to 1000 transactions per second for reads, and 5 transactions per second for writes.
  • A shard supports up to 5 transactions per second for reads, and 10 records per second for writes.
  • A shard supports up to 5 transactions per second for reads, and 100 records per second for writes.
  • A shard supports up to 5 transactions per second for reads, and 1000 records per second for writes.
A

A shard supports up to 5 transactions per second for reads, and 1000 records per second for writes.

(Each shard can support up to 5 transactions per second for reads, and up to 1,000 records per second for writes.)

24
Q

The Kinesis Connector Library allows you to emit data from a stream to various AWS services. Which of the following services can receive data emitted from such a stream? (Choose 4)

  • DynamoDB
  • S3
  • Elasticsearch
  • Redshift
  • Lambda
  • RDS
A
  • DynamoDB
  • S3
  • Elasticsearch
  • Redshift

(The Kinesis Connector Library includes implementations for use with Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch. If you want to use Lambda with Kinesis Streams, you need to create Lambda functions to automatically read batches of records off your Amazon Kinesis stream and process them if records are detected on the stream. AWS Lambda then polls the stream periodically (once per second) for new records.)

25
Q

Which of the following attribute data types can be table or item keys? (Choose 3)

  • String
  • Blob
  • Binary
  • Map
  • Number
A

String
Binary
Number

String, Number, and Binary data types (scalars) can be table or item keys. Further information: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html

26
Q

True or False: You can add a local secondary index to a DynamoDB table after it has been created.

A

False

(You cannot add a local secondary index to a DynamoDB table after it has been created. Local secondary indexes must be defined at table creation time; global secondary indexes, by contrast, can be added later.)
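
A minimal boto3 sketch showing where the LSI has to be declared, using a hypothetical orders table:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# The LocalSecondaryIndexes block can only appear here, at creation time.
dynamodb.create_table(
    TableName="orders",  # placeholder table and attribute names
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "total", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "order_date", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "by-total",  # same partition key, alternative sort key
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "total", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```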

27
Q

A producer application has been designed to write thousands of events per second to Kinesis Streams by integrating the Kinesis Producer library into the application. The application takes data from logs on EC2 instances and ingests the data into Streams records. Which of the following solutions did the developer use to improve throughput when implementing the KPL with the application?

  • Aggregation
  • De-Aggregation
  • Re-Aggregation
  • Collection
A

Aggregation

(Aggregation refers to the storage of multiple records in a Streams record. Aggregation allows customers to increase the number of records sent per API call, which effectively increases producer throughput. Further information: https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html#w2ab1c11b9c11c23c13)

28
Q

Which of the following must be defined when you create a table? (Choose 4)

  • The Table Name
  • The DCU (Delete/Update Capacity Units)
  • The table capacity, number of GB.
  • The RCU (Read Capacity Units)
  • Partition Key
  • The WCU (Write Capacity Units)
A
  • The Table Name
  • Partition Key
  • The RCU (Read Capacity Units)
  • The WCU (Write Capacity Units)

29
Q

True or False: Kinesis streams are appropriate for persistent storage of your streaming data.

A

False

Kinesis stream data is, by default, only stored for 24 hours. However, this timeframe can be extended to 7 days.

30
Q

Your company has a number of consumer applications to get records from various Kinesis Streams for different use cases. For each consumer application there is a separate DynamoDB table that maintains application state. Out of the many consumer applications, one application is experiencing provisioned throughput exception errors with its particular DynamoDB table. Why is this happening? (Choose 2)

  • The stream does not have enough shards.
  • The stream has too many shards.
  • The application is not checkpointing enough.
  • The application is checkpointing too frequently.
A
  • The stream has too many shards.
  • The application is checkpointing too frequently.

(If your Amazon Kinesis Streams application receives provisioned-throughput exceptions, you should increase the provisioned throughput for the DynamoDB table. The KCL creates the table with a provisioned throughput of 10 reads per second and 10 writes per second, but this might not be sufficient for your application. For example, if your Amazon Kinesis Streams application does frequent checkpointing or operates on a stream that is composed of many shards, you might need more throughput. Further information: http://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html)

31
Q

In terms of data read-rate for data output, what is the capacity of a shard in a Kinesis stream?

A

2 MB/s

Each shard in a Kinesis stream can support a maximum total data read rate of 2 MB per second.

32
Q

True or False: With both local secondary indexes and global secondary indexes, you can define read capacity units and write capacity units on the index itself — so that you don’t have to consume them from the base table.

A

False

(A global secondary index has its own provisioned throughput settings for read and write activity. Queries or scans on a local secondary index consume read capacity units from the base table. When you write to a table, its local secondary indexes are also updated; these updates consume write capacity units from the base table. Further information: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html)

33
Q

In terms of data write-rate for data input, what is the capacity of a shard in a Kinesis stream?

A

1 MB/s

Each shard in a Kinesis stream can support a maximum total data write rate of 1 MB per second.

34
Q

A customer needs to load a large data file into an Amazon Redshift cluster from Amazon S3 using the COPY command, and wants the load to run with the maximum amount of parallelism. Which technique should the customer use?

A

Split the file into 500 smaller files.

(The critical aspect of this question is running the COPY command with the maximum amount of parallelism. The two options that will increase parallelism are B and D. Option D will load one file per node in parallel, which will increase performance, but option B will have a greater effect because it will allow Amazon Redshift to load multiple files per instance in parallel (COPY can process one file per slice on each node). Compressing the files (option A) is a recommended practice and will also increase performance, but not to the same extent as loading multiple files in parallel.)

35
Q

A customer needs to load a 550-GB data file into an Amazon Redshift cluster from Amazon S3, using the COPY command. The input file has both known and unknown issues that will probably cause the load process to fail. The customer needs the most efficient way to detect load errors without performing any cleanup if the load process fails. Which technique should the customer use?

  • Split the input file into 50-GB blocks and load them separately.
  • Use COPY with NOLOAD parameter.
  • Write a script to delete the data from the tables in case of errors.
  • Compress the input file before running COPY.
A

Use COPY with NOLOAD parameter.

(From the AWS documentation for NOLOAD: NOLOAD checks the integrity of all of the data without loading it into the database. The NOLOAD option displays any errors that would occur if you had attempted to load the data. All other options will require subsequent processing on the cluster, which will consume resources.)
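
A minimal sketch of running such a validation pass from Python with psycopg2 (Redshift speaks the PostgreSQL wire protocol); the endpoint, credentials, table, S3 path, and IAM role are all placeholders:

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439,
    dbname="analytics",
    user="loader",
    password="...",  # placeholder credentials
)
conn.autocommit = True

with conn.cursor() as cur:
    # NOLOAD parses and validates every row but writes nothing, so a
    # failed run leaves no partially loaded data to clean up.
    cur.execute("""
        COPY sales
        FROM 's3://my-bucket/sales/input-file.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        CSV
        NOLOAD;
    """)
```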

36
Q

An organization needs a data store to handle the following data types and access patterns:
-Key-value access pattern
-Complex SQL queries and transactions
-Consistent reads
-Fixed schema
Which data store should the organization choose?

  • Amazon S3
  • Amazon Kinesis
  • Amazon DynamoDB
  • Amazon RDS
A

Amazon RDS

(Amazon RDS handles all these requirements, and although Amazon RDS is not typically thought of as optimized for key-value based access, a schema with a good primary key selection can provide this functionality. Amazon S3 provides no fixed schema and does not have consistent read-after-PUT support. Amazon Kinesis supports streaming data that is consistent as of a given sequence number but doesn’t provide key/value access. Finally, although Amazon DynamoDB provides key/value access and consistent reads, it does not support SQL-based queries.)

37
Q

A company logs data from its application in large files and runs regular analytics of these logs to support internal reporting for three months after the logs are generated. After three months, the logs are infrequently accessed for up to a year. The company also has a regulatory control requirement to store application logs for seven years. Which course of action should the company take to achieve these requirements in the most cost-efficient way?

  • Store the files in S3 Glacier with a Deny Delete vault lock policy for archives less than seven years old and a vault access policy that restricts read access to the analytics IAM group and write access to the log writer service role.
  • Store the files in S3 Standard with a lifecycle policy to transition the storage class to Standard-IA after three months. After a year, transition the files to Glacier and add a Deny Delete vault lock policy for archives less than seven years old.
  • Store the files in S3 Standard with lifecycle policies to transition the storage class to Standard-IA after three months and delete them after a year. Simultaneously store the files in Amazon Glacier with a Deny Delete vault lock policy for archives less than seven years old.
  • Store the files in S3 Standard with a lifecycle policy to remove them after a year. Simultaneously store the files in Amazon S3 Glacier with a Deny Delete vault lock policy for archives less than seven years old.
A

Store the files in S3 Standard with lifecycle policies to transition the storage class to Standard-IA after three months and delete them after a year. Simultaneously store the files in Amazon Glacier with a Deny Delete vault lock policy for archives less than seven years old.

(There are two aspects to this question: setting up a lifecycle policy to ensure that objects are stored in the most cost-effective storage, and ensuring that the regulatory control is met. The lifecycle policy will store the objects in S3 Standard during the three months of active use, and then move the objects to S3 Standard-IA when access will be infrequent. That narrows the possible answer set to B and C. The Deny Delete vault lock policy will ensure that the regulatory policy is met, but that policy must be applied over the entire lifecycle of the object, not just after it is moved to Glacier after the first year. Option C has the Deny Delete vault lock applied over the entire lifecycle of the object and is the right answer.)