DA Flashcards

1
Q

A near-real-time solution is needed that only collects non-confidential data from sensitive streaming data and stores it in durable storage.

A

Use Amazon Kinesis Data Firehose to ingest streaming data and enable record transformation to utilize AWS Lambda for excluding sensitive data. Store the processed data in Amazon S3.
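
As a rough illustration of the record-transformation step, here is a minimal sketch of a Lambda handler that drops confidential fields before Firehose delivers the records to S3. It assumes JSON payloads, and the field names in SENSITIVE_FIELDS are hypothetical. The event and response shapes (recordId, result, data) follow the Firehose transformation contract.

```python
import base64
import json

# Hypothetical field names treated as confidential; adjust to the real schema.
SENSITIVE_FIELDS = {"ssn", "credit_card", "email"}


def lambda_handler(event, context):
    """Strip confidential fields from each Firehose record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            cleaned = {k: v for k, v in payload.items() if k not in SENSITIVE_FIELDS}
            # Newline-delimit the records so the S3 objects stay line-oriented.
            data = base64.b64encode((json.dumps(cleaned) + "\n").encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Not valid JSON, or not a JSON object: return it flagged as failed.
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]})
    return {"records": output}
```

Every incoming recordId has to appear in the response, which is why failed records are returned rather than skipped.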

2
Q

Large files are compressed into a single GZIP file and uploaded into an S3 bucket. You have to speed up the COPY process to load data into Amazon Redshift.

A

Split the GZIP file into smaller files, and make the number of files a multiple of the number of slices in the Redshift cluster.
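
The file-count rule can be sketched as a small calculation plus a splitter. This is only an illustration: the 128 MiB target part size is an arbitrary assumption, and part_count and split_lines are hypothetical helper names.

```python
import gzip
import math


def part_count(total_bytes, slices, target_part_bytes=128 * 1024 * 1024):
    """Smallest multiple of `slices` that keeps parts at or under the target size."""
    parts_needed = max(1, math.ceil(total_bytes / target_part_bytes))
    return math.ceil(parts_needed / slices) * slices


def split_lines(lines, parts):
    """Distribute lines round-robin into `parts` gzip-compressed blobs."""
    buckets = [[] for _ in range(parts)]
    for i, line in enumerate(lines):
        buckets[i % parts].append(line)
    return [gzip.compress("".join(bucket).encode("utf-8")) for bucket in buckets]


# A 1280 MiB input on a 4-slice cluster needs 10 parts, rounded up to 12.
parts = part_count(10 * 128 * 1024 * 1024, slices=4)
```

With the file count a multiple of the slice count, each slice can load the same number of files in parallel.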

3
Q

An Amazon EMR cluster needs to use a centralized metadata layer that will expose data in Amazon S3 as tables.

A

Use the AWS Glue Data Catalog.

4
Q

Ways to fix Amazon Kinesis Data Streams throttling issues on write requests.

A

Increase the number of shards using the UpdateShardCount API command.

Use random partition keys so that writes are spread evenly across shards.
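
To see why random partition keys help, here is a small simulation. Kinesis hashes each partition key with MD5 into a 128-bit key space and routes the record to the shard that owns that hash range. The sketch assumes the shards split the space evenly; shard_for and distribution are hypothetical helpers, not AWS APIs.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer hash key


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many records land on each shard."""
    return Counter(shard_for(k, shard_count) for k in keys)


# One fixed key sends every record to the same shard (a hot shard).
hot = distribution(["device-1"] * 1000, shard_count=4)
# Random keys spread the same number of records across the shards.
spread = distribution([str(uuid.uuid4()) for _ in range(1000)], shard_count=4)
```

Adding shards only relieves throttling if the keys actually spread records across them, so the two fixes work together.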

5
Q

A company needs a cost-effective solution for detecting anomalous data coming from an Amazon Kinesis Data stream.

A

Create a Kinesis Data Analytics application and use the RANDOM_CUT_FOREST function for anomaly detection.

6
Q

A company wants a cost-effective solution that will enable them to query a subset of data from a CSV file.

A

Use Amazon S3 Select

7
Q

You need to populate a data catalog using data stored in Amazon S3, Amazon RDS, and Amazon DynamoDB.

A

Run AWS Glue crawlers on a schedule.

8
Q

A Data Analyst used the COPY command to migrate CSV files into a Redshift cluster. However, no data was imported and no errors were found after the process was finished.

A

The CSV files use carriage returns as a line terminator.

The IGNOREHEADER parameter was included in the COPY command.
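
The two causes combine. If the loader only recognizes line feeds, a file terminated by bare carriage returns looks like one long line, and skipping one header line then skips everything. The sketch below is a simplified model of that behavior, not Redshift code; count_loaded_rows and normalize_newlines are hypothetical helpers.

```python
def count_loaded_rows(raw, ignore_header=0):
    """Count rows a newline-delimited loader would keep after skipping headers."""
    rows = [row for row in raw.split("\n") if row]
    return len(rows[ignore_header:])


def normalize_newlines(raw):
    """Convert CRLF and bare CR terminators to LF."""
    return raw.replace("\r\n", "\n").replace("\r", "\n")


cr_file = "id,name\r1,alice\r2,bob\r"

# The whole file is treated as a single header line, so nothing is loaded.
before = count_loaded_rows(cr_file, ignore_header=1)
# After normalizing the terminators, the two data rows survive.
after = count_loaded_rows(normalize_newlines(cr_file), ignore_header=1)
```

Converting the terminators before uploading the files to S3 is one way to avoid the silent empty load.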

9
Q

What is a cost-effective solution to save Redshift query results to external storage?

A

Use the Amazon Redshift UNLOAD command
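
For reference, an UNLOAD statement has roughly the shape assembled below. This is a sketch: the bucket, role ARN, and query are placeholders, and build_unload is a hypothetical helper. Single quotes inside the query are doubled because the query itself is passed as a quoted string.

```python
def build_unload(query, s3_prefix, iam_role_arn, fmt="PARQUET"):
    """Assemble a Redshift UNLOAD statement as text."""
    escaped = query.replace("'", "''")  # quotes inside the query must be doubled
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


statement = build_unload(
    "SELECT * FROM sales WHERE region = 'EU'",          # placeholder query
    "s3://example-bucket/unload/sales_",                 # placeholder prefix
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",  # placeholder role
)
print(statement)
```

The results are written to S3 as one or more files under the given prefix.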

10
Q

A company is using Amazon S3 Standard-IA and Amazon S3 Glacier as its data storage.

Some data cannot be accessed with Amazon Athena queries. Which best explains this event?

A

Amazon Athena is trying to access data stored in Amazon S3 Glacier.

11
Q

A company uses an Amazon EMR cluster to process 10 batch jobs every day. Each job takes about 20 minutes to complete. A solution to lower the cost of the EMR cluster must be implemented.

A

Use transient Amazon EMR clusters
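
The arithmetic behind this answer is worth a quick check. The sketch assumes the jobs do not overlap and ignores cluster startup time.

```python
JOBS_PER_DAY = 10
MINUTES_PER_JOB = 20

# Hours per day the cluster actually does work: 200 minutes, about 3.33 hours.
busy_hours = JOBS_PER_DAY * MINUTES_PER_JOB / 60

# Share of the day a long-running cluster would sit idle: about 0.86.
idle_fraction = 1 - busy_hours / 24

print(f"busy: {busy_hours:.2f} h, idle: {idle_fraction:.0%}")
```

A long-running cluster is billed for the idle hours too, while a transient cluster terminates when its steps finish.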

12
Q

An Amazon Kinesis Client Library (KCL) application is processing data in a DynamoDB table that has provisioned write capacity. The application’s latency increases during peak times and it must be resolved immediately.

A

Increase the DynamoDB table’s write throughput.

13
Q

Thousands of files are being loaded in a central fact table hosted on Amazon Redshift. You need to optimize the cluster resource utilization when loading data into the fact table.

A

Use a single COPY command to load data.
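
One way to cover many files with a single COPY is a manifest, sketched below. A common key prefix would also work. The bucket, table, and role names are placeholders, and build_manifest is a hypothetical helper; the entries/url/mandatory layout is the COPY manifest format.

```python
import json


def build_manifest(s3_urls):
    """Build a COPY manifest listing every file to load."""
    entries = [{"url": url, "mandatory": True} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)


# Placeholder object keys; a real load would list thousands of files.
urls = [f"s3://example-bucket/facts/part-{i:04d}.gz" for i in range(3)]
manifest = build_manifest(urls)

# A single statement then loads every listed file in parallel across the slices.
copy_sql = (
    "COPY fact_sales\n"
    "FROM 's3://example-bucket/facts/load.manifest'\n"
    "IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'\n"
    "GZIP\n"
    "MANIFEST;"
)
```

Running several COPY commands against the same table at once forces Redshift to serialize the loads, which is why one command is preferred.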

14
Q

A Lambda function is used to process data from a Kinesis Data stream. Results are delivered into Amazon ES. During peak hours, the processing time slows down.

A

Use multiple Lambda functions to process data concurrently.

15
Q

A Data Analyst needs to join data stored in Amazon Redshift and data stored in Amazon S3. The Analyst wants a serverless solution that will reduce the workload of the Redshift cluster.

A

Create an external table using Amazon Redshift Spectrum for the S3 data and use Redshift SQL queries for join operations.

16
Q

A company requires an out-of-the-box solution for visualizing complex real-world scenarios and forecasting trends.

A

Use ML-powered forecasting with Amazon QuickSight

17
Q

A Data Analyst needs to use Amazon QuickSight for creating daily reports based on the dataset stored in Amazon S3.

A

Create a daily schedule refresh for the dataset.

18
Q

A company has encountered an import into SPICE error after using Amazon QuickSight to query a new Amazon Athena table that is associated with a new S3 bucket.

A

Configure the correct permissions for the new S3 bucket from the QuickSight Console.

19
Q

A company needs a cost-effective solution for ad-hoc analyses and data visualizations.

A

Use Amazon Athena and Amazon QuickSight.

20
Q

A company needs to visualize and analyze web logs in near-real time.

A

Use Amazon Kinesis Data Firehose to stream logs into Amazon Elasticsearch Service. Visualize logs using Kibana.

21
Q

Root device volume encryption must be enabled on all nodes of an EMR cluster. AWS CloudFormation is required for creating new resources.

A

Create a custom AMI with an encrypted root device volume and place the AMI ID under the CustomAmiId property within the CloudFormation template.
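
As a sketch of where the property goes, the snippet below builds a minimal template as a Python dict and prints it as JSON. The property is spelled CustomAmiId on the AWS::EMR::Cluster resource. The cluster name, release label, instance types, and the EncryptedRootAmiId parameter name are placeholder assumptions.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        # Hypothetical parameter holding the ID of the encrypted custom AMI.
        "EncryptedRootAmiId": {"Type": "AWS::EC2::Image::Id"},
    },
    "Resources": {
        "EmrCluster": {
            "Type": "AWS::EMR::Cluster",
            "Properties": {
                "Name": "example-cluster",
                "ReleaseLabel": "emr-5.36.0",
                # Every node boots from this AMI, so its root volume is encrypted.
                "CustomAmiId": {"Ref": "EncryptedRootAmiId"},
                "JobFlowRole": "EMR_EC2_DefaultRole",
                "ServiceRole": "EMR_DefaultRole",
                "Instances": {
                    "MasterInstanceGroup": {"InstanceCount": 1, "InstanceType": "m5.xlarge"},
                    "CoreInstanceGroup": {"InstanceCount": 2, "InstanceType": "m5.xlarge"},
                },
            },
        }
    },
}

print(json.dumps(template, indent=2))
```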

22
Q

A solution is needed to encrypt data stored in an EBS volume that is attached to an EMR cluster.

A

Use Linux Unified Key Setup (LUKS).

23
Q

A company is having trouble accessing data in a Redshift cluster using Amazon QuickSight.

A

Create a new inbound rule for the cluster’s security group that allows access from the IP address range that Amazon QuickSight uses.

24
Q

A company wants to prevent any user from creating EMR clusters that are accessible from the public Internet.

A

Enable the ‘block public access’ setting in the Amazon EMR Console.

25
Q

A company wants data in a Kinesis Data stream to be encrypted. The company wants to manage the key rotation.

A

Specify a customer managed Customer Master Key (CMK) in AWS KMS when enabling server-side encryption for the Kinesis Data stream.