DA Flashcards
A near-real-time solution is needed that filters the confidential fields out of a sensitive streaming data source and stores the remaining data in durable storage.
Use Amazon Kinesis Data Firehose to ingest the streaming data and enable record transformation with an AWS Lambda function that removes the sensitive fields. Store the processed data in Amazon S3.
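A Firehose transformation Lambda for this pattern might look like the following sketch, assuming the records are JSON objects and that `ssn` and `email` are hypothetical names for the confidential fields:

```python
import base64
import json

# Hypothetical field names for the confidential data to drop.
SENSITIVE_FIELDS = {"ssn", "email"}

def lambda_handler(event, context):
    """Firehose record-transformation handler: decode each record,
    strip the sensitive fields, and return it re-encoded."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        # Keep only the non-confidential fields.
        cleaned = {k: v for k, v in payload.items() if k not in SENSITIVE_FIELDS}
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # tells Firehose the transformation succeeded
            "data": base64.b64encode(json.dumps(cleaned).encode()).decode(),
        })
    return {"records": output}
```

Firehose then delivers only the cleaned payloads to the S3 destination.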
Large files are compressed into a single GZIP file and uploaded to an S3 bucket. You need to speed up the COPY process that loads the data into Amazon Redshift.
Split the GZIP file into smaller files, making sure their number is a multiple of the number of slices in the Redshift cluster, so COPY loads them in parallel across all slices.
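A small helper can pick the file count. The 128 MB target below is illustrative (AWS guidance is roughly 1 MB–1 GB per compressed file after splitting); the point is rounding up to a multiple of the slice count so every slice loads the same number of files:

```python
import math

def split_count(total_size_mb, slices, target_file_mb=128):
    """Choose how many files to split the archive into: roughly
    total_size / target_size files, rounded up to the next multiple
    of the cluster's slice count for even parallel loading."""
    raw = max(1, math.ceil(total_size_mb / target_file_mb))
    return math.ceil(raw / slices) * slices
```

For example, a 5 GB archive on a cluster with 16 slices would be split into 48 files, so each slice loads exactly three.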
An Amazon EMR cluster needs to use a centralized metadata layer that will expose data in Amazon S3 as tables.
Use the AWS Glue Data Catalog.
Ways to fix Amazon Kinesis Data Streams throttling issues on write requests.
Increase the number of shards using the UpdateShardCount API operation.
Use random partition keys to distribute writes evenly across shards.
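A UUID makes an effectively random partition key. This is a minimal sketch; the stream name and the boto3 call in the comment are illustrative:

```python
import uuid

def build_entries(payloads):
    """Build PutRecords entries with random (UUID) partition keys so
    writes spread evenly across all shards instead of hot-spotting one."""
    return [
        {"Data": p, "PartitionKey": str(uuid.uuid4())}
        for p in payloads
    ]

# Usage with boto3 (not executed here):
# kinesis = boto3.client("kinesis")
# kinesis.put_records(StreamName="my-stream", Records=build_entries(batch))
```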
A company needs a cost-effective solution for detecting anomalous data coming from an Amazon Kinesis Data stream.
Create a Kinesis Data Analytics application and use the RANDOM_CUT_FOREST function for anomaly detection.
A company wants a cost-effective solution that will enable them to query a subset of data from a CSV file.
Use Amazon S3 Select
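A sketch of the S3 Select request parameters, assuming a CSV object with a header row and no compression; the bucket, key, and query in the usage comment are illustrative:

```python
def select_params(bucket, key, expression):
    """Build the arguments for s3.select_object_content: run a SQL
    expression server-side so only the matching subset is returned."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # first row holds column names
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }

# Usage with boto3 (not executed here):
# s3 = boto3.client("s3")
# resp = s3.select_object_content(**select_params(
#     "my-bucket", "data.csv",
#     "SELECT s.id, s.total FROM S3Object s WHERE s.total > '100'"))
```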
You need to populate a data catalog using data stored in Amazon S3, Amazon RDS, and Amazon DynamoDB.
Use scheduled AWS Glue crawlers.
A Data Analyst used the COPY command to migrate CSV files into a Redshift cluster. However, no data was imported and no errors were reported after the process finished.
The CSV files use carriage returns as line terminators.
The IGNOREHEADER parameter was included in the COPY command.
What is a cost-effective solution to save Redshift query results to an external storage?
Use the Amazon Redshift UNLOAD command
A company is using Amazon S3 Standard-IA and Amazon S3 Glacier as its data storage.
Some of the data cannot be accessed with Amazon Athena queries. Which best explains this behavior?
Amazon Athena is trying to access data stored in Amazon S3 Glacier.
A company uses an Amazon EMR cluster to process 10 batch jobs every day. Each job takes about 20 minutes to complete. A solution must be implemented to lower the cost of the EMR cluster.
Use transient Amazon EMR clusters
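With boto3, a transient cluster is one launched with `KeepJobFlowAliveWhenNoSteps` disabled, so it terminates itself once the submitted steps finish and you pay only while the batch jobs run. The cluster name, release label, and instance sizes below are illustrative:

```python
def transient_cluster_config(steps):
    """Sketch of run_job_flow arguments for a transient EMR cluster
    that runs the given steps and then terminates itself."""
    return {
        "Name": "nightly-batch",        # hypothetical name
        "ReleaseLabel": "emr-6.10.0",   # hypothetical release
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Key setting: do NOT keep the cluster alive after the steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

# Usage with boto3 (not executed here):
# emr = boto3.client("emr")
# emr.run_job_flow(**transient_cluster_config(my_steps))
```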
An Amazon Kinesis Client Library (KCL) application checkpoints to a DynamoDB table that has provisioned write capacity. The application's latency increases during peak times and must be resolved immediately.
Increase the DynamoDB table's write throughput.
Thousands of files are being loaded into a central fact table hosted on Amazon Redshift. You need to optimize cluster resource utilization when loading data into the fact table.
Use a single COPY command to load data.
A Lambda function processes data from a Kinesis data stream and delivers the results to Amazon ES. During peak hours, processing slows down.
Use multiple Lambda functions to process the data concurrently.
A Data Analyst needs to join data stored in Amazon Redshift and data stored in Amazon S3. The Analyst wants a serverless solution that will reduce the workload of the Redshift cluster.
Create an external table using Amazon Redshift Spectrum for the S3 data and use Redshift SQL queries for join operations.