Flashcard Review
A financial services company runs its flagship web application on AWS. The application serves thousands of users during peak hours. The company needs a scalable near-real-time solution to share hundreds of thousands of financial transactions with multiple internal applications. The solution should also remove sensitive details from the transactions before storing the cleansed transactions in a document database for low-latency retrieval.
Which of the following would you recommend?
A) Batch process the raw transactions data into Amazon S3 flat files. Use S3 events to trigger an AWS Lambda function to remove sensitive data from the raw transactions in the flat file and then store the cleansed transactions in Amazon DynamoDB. Leverage DynamoDB Streams to share the transaction data with the internal applications
B) Persist the raw transactions into Amazon DynamoDB. Configure a rule in Amazon DynamoDB to update the transaction by removing sensitive data whenever any new raw transaction is written. Leverage Amazon DynamoDB Streams to share the transaction data with the internal applications
C) Feed the streaming transactions into Amazon Kinesis Data Streams. Leverage AWS Lambda integration to remove sensitive data from every transaction and then store the cleansed transactions in Amazon DynamoDB. The internal applications can consume the raw transactions off the Amazon Kinesis Data Stream
D) Feed the streaming transactions into Amazon Kinesis Data Firehose. Leverage AWS Lambda integration to remove sensitive data from every transaction and then store the cleansed transactions in Amazon DynamoDB. The internal applications can consume the raw transactions off the Amazon Kinesis Data Firehose
C
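A minimal sketch of the cleansing step from option C, assuming hypothetical table and field names: a Lambda function consuming a Kinesis Data Streams batch, dropping sensitive fields, and persisting the cleansed transactions to DynamoDB.

```python
# Sketch of the Kinesis-triggered cleansing Lambda (table and field
# names are assumptions, not from the question).
import base64
import json
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("CleansedTransactions")  # hypothetical table name

SENSITIVE_FIELDS = {"card_number", "cvv", "account_password"}  # hypothetical fields

def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded inside the event;
        # parse_float=Decimal keeps numbers compatible with DynamoDB.
        payload = json.loads(
            base64.b64decode(record["kinesis"]["data"]), parse_float=Decimal
        )
        cleansed = {k: v for k, v in payload.items() if k not in SENSITIVE_FIELDS}
        table.put_item(Item=cleansed)
```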
A university partners with local hospitals to share anonymized health statistics. The data is stored in Amazon S3 as .csv files, and Amazon Athena is used to run extensive analytics on the data to find correlations between different parameters. The university is facing high costs and performance issues as the volume of data grows rapidly. The data in the S3 bucket is already partitioned by date, and the university does not want to change this partition scheme.
As a data engineer, how can you further improve query performance? (Select two)
A) Transform .csv files to Parquet format by fetching only the data fields required for predicates
B) The S3 bucket should be configured in the same AWS Region where the Athena queries are being run
C) Transform .csv files to JSON format by fetching the required key-value pairs only
D) Remove partitions and perform data bucketing on the S3 bucket
E) The S3 bucket should be configured in the same Availability Zone where the Athena queries are being run
AB
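One way to apply option A is an Athena CTAS query that rewrites the CSV data as Parquet, selecting only the columns used in predicates. This is a sketch; the database, table, column, and bucket names are all assumptions.

```python
# Sketch: convert the CSV dataset to Parquet with an Athena CTAS query,
# keeping the existing date partitioning. All names are hypothetical.
import boto3

athena = boto3.client("athena")

ctas = """
CREATE TABLE health_stats_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://university-health-data/parquet/',
    partitioned_by = ARRAY['dt']
)
AS SELECT patient_age, diagnosis_code, region, dt
FROM health_stats_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "health_db"},
    ResultConfiguration={
        "OutputLocation": "s3://university-health-data/athena-results/"
    },
)
```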
A CRM company has a software as a service (SaaS) application that feeds updates to other in-house and third-party applications. The SaaS application and the in-house applications are being migrated to use AWS services for this inter-application communication.
Which of the following would you suggest to asynchronously decouple the architecture?
A) Use Elastic Load Balancing (ELB) for effective decoupling of system architecture
B) Use Amazon Simple Notification Service (Amazon SNS) to communicate between systems and decouple the architecture
C) Use Amazon Simple Queue Service (Amazon SQS) to decouple the architecture
D) Use Amazon EventBridge to decouple the system architecture
D
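A sketch of the producer side of option D: the SaaS application publishes an update onto a custom EventBridge event bus, and rules on that bus fan the event out to each consuming application. The bus name, source, and event shape are assumptions.

```python
# Sketch: publish a CRM update to a custom EventBridge event bus
# (bus name, source, and detail are hypothetical).
import json

import boto3

events = boto3.client("events")

events.put_events(
    Entries=[
        {
            "EventBusName": "crm-updates",
            "Source": "saas.crm",
            "DetailType": "ContactUpdated",
            "Detail": json.dumps({"contactId": "12345", "status": "active"}),
        }
    ]
)
```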
The data engineering team at a logistics company leverages AWS Cloud to process Internet of Things (IoT) sensor data from the field devices of the company. The team stores the sensor data in Amazon DynamoDB tables. To detect anomalous behaviors and respond quickly, all changes to the items stored in the DynamoDB tables must be logged in near real-time.
As an AWS Certified Data Engineer Associate, which of the following solutions would you suggest to meet these requirements with minimal custom development and infrastructure maintenance?
A) Set up DynamoDB Streams to capture and send updates to a Lambda function that outputs records directly to Kinesis Data Analytics (KDA). Detect and analyze anomalies in KDA and send notifications via SNS
B) Set up CloudTrail to capture all API calls that update the DynamoDB tables. Leverage CloudTrail event filtering to analyze anomalous behaviors and send SNS notifications in case anomalies are detected
C) Set up DynamoDB Streams to capture and send updates to a Lambda function that outputs records to Kinesis Data Analytics (KDA) via Kinesis Data Streams (KDS). Detect and analyze anomalies in KDA and send notifications via SNS
D) Configure event patterns in CloudWatch Events to capture DynamoDB API call events and set up Lambda function as a target to analyze anomalous behavior. Send SNS notifications when anomalous behaviors are detected
C
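A sketch of the forwarding Lambda from option C: each DynamoDB Streams change record is relayed into a Kinesis Data Stream for analysis in Kinesis Data Analytics. The stream name is an assumption.

```python
# Sketch: Lambda triggered by DynamoDB Streams, forwarding change
# records into a Kinesis Data Stream (stream name is hypothetical).
import json

import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    records = [
        {
            "Data": json.dumps(record["dynamodb"]).encode("utf-8"),
            "PartitionKey": record["eventID"],
        }
        for record in event["Records"]
    ]
    if records:
        # put_records accepts up to 500 records per call; larger Lambda
        # batches would need to be chunked.
        kinesis.put_records(StreamName="iot-item-changes", Records=records)
```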
A healthcare company has recently migrated to Amazon Redshift. The technology team at the company is now working on the Disaster Recovery (DR) plans for the Redshift cluster deployed in the eu-west-1 Region. The existing cluster is encrypted via AWS KMS and the team wants to copy the Redshift snapshots to another Region to meet the DR requirements.
Which of the following solutions would you recommend to meet the given requirements?
A) Create a snapshot copy grant in the destination Region for a KMS key in the destination Region. Configure Redshift cross-Region snapshots in the source Region
B) Create an IAM role in the destination Region with access to the KMS key in the source Region. Create a snapshot copy grant in the destination Region for this KMS key in the source Region. Configure Redshift cross-Region snapshots in the source Region
C) Create a snapshot copy grant in the source Region for a KMS key in the source Region. Configure Redshift cross-Region snapshots in the destination Region
D) Create a snapshot copy grant in the destination Region for a KMS key in the destination Region. Configure Redshift cross-Region replication in the source Region
A
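A sketch of option A with boto3: first create the snapshot copy grant in the destination Region for a KMS key in that Region, then enable cross-Region snapshot copy from the source Region. The destination Region, cluster name, and key ARN are assumptions.

```python
# Sketch of the two steps in option A (identifiers are hypothetical).
import boto3

# Step 1: in the destination Region, grant Redshift use of a KMS key there.
redshift_dest = boto3.client("redshift", region_name="eu-central-1")
redshift_dest.create_snapshot_copy_grant(
    SnapshotCopyGrantName="dr-copy-grant",
    KmsKeyId="arn:aws:kms:eu-central-1:123456789012:key/EXAMPLE-KEY-ID",
)

# Step 2: in the source Region, enable cross-Region snapshot copy.
redshift_src = boto3.client("redshift", region_name="eu-west-1")
redshift_src.enable_snapshot_copy(
    ClusterIdentifier="healthcare-dwh",
    DestinationRegion="eu-central-1",
    RetentionPeriod=7,
    SnapshotCopyGrantName="dr-copy-grant",
)
```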
A healthcare startup's application feeds real-time patient health data into an analytics workflow. With a sharp increase in the number of users, the system has become slow and sometimes even unresponsive, as it lacks a retry mechanism. The startup is looking for a scalable solution with minimal implementation overhead.
Which of the following would you recommend as a scalable alternative to the current solution?
A) Use Amazon Simple Notification Service (Amazon SNS) for data ingestion and configure AWS Lambda to trigger logic for downstream processing
B) Use Amazon API Gateway with the existing REST-based interface to create a high-performing architecture
C) Use Amazon Kinesis Data Streams to ingest the data, process it using AWS Lambda, or run analytics using Amazon Kinesis Data Analytics
D) Use Amazon Simple Queue Service (Amazon SQS) for data ingestion and configure AWS Lambda to trigger logic for downstream processing
C
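A sketch of the ingestion side of option C: a producer pushes patient vitals into a Kinesis Data Stream, the SDK retries throttled writes, and Lambda or Kinesis Data Analytics consumes downstream. The stream name and vitals fields are assumptions.

```python
# Sketch: Kinesis Data Streams producer for patient vitals
# (stream name and record fields are hypothetical).
import json

import boto3

kinesis = boto3.client("kinesis")

def publish_vitals(patient_id: str, vitals: dict) -> None:
    kinesis.put_record(
        StreamName="patient-vitals",
        Data=json.dumps({"patientId": patient_id, **vitals}).encode("utf-8"),
        PartitionKey=patient_id,  # keeps each patient's readings ordered per shard
    )

publish_vitals("p-001", {"heartRate": 72, "spo2": 98})
```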
A Silicon Valley-based startup helps its users legally sign highly confidential contracts. To meet compliance guidelines, the startup must ensure that the signed contracts are encrypted using the AES-256 algorithm via an encryption key that is generated and managed internally. The startup is now migrating to AWS Cloud and would like the data to be encrypted on AWS, while continuing to use its existing key generation and key management mechanisms.
What do you recommend?
A) SSE-S3
B) SSE-C
C) SSE-KMS
D) Client-Side Encryption
B
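A sketch of SSE-C (option B): the startup supplies its own AES-256 key on every request, S3 encrypts and decrypts the object with it, but never stores the key. The bucket, object key, and key-storage path are assumptions.

```python
# Sketch: upload and retrieve a contract with SSE-C
# (bucket, key, and key file are hypothetical).
import boto3

s3 = boto3.client("s3")

# 32-byte AES-256 key generated and managed internally by the startup.
with open("/secure/keystore/contract-key.bin", "rb") as f:
    customer_key = f.read()

s3.put_object(
    Bucket="signed-contracts",
    Key="contracts/2024/contract-001.pdf",
    Body=b"...signed contract bytes...",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,  # boto3 derives the required key-MD5 header
)

# The same key must be supplied again to read the object back.
obj = s3.get_object(
    Bucket="signed-contracts",
    Key="contracts/2024/contract-001.pdf",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)
```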
A data engineering team wants to orchestrate multiple Amazon ECS task types running on Amazon EC2 instances that are part of an Amazon ECS cluster. The output and state data for all tasks need to be stored. Each task outputs approximately 20 megabytes of data, and there could be hundreds of tasks running at a time. As old outputs are archived, the total storage size is not expected to exceed 1 terabyte.
Which of the following would you recommend as an optimized solution for high-frequency reading and writing?
A) Use Amazon DynamoDB table that is accessible by all ECS cluster instances
B) Use Amazon EFS with Bursting Throughput mode
C) Use an Amazon EBS volume mounted to the Amazon ECS cluster instances
D) Use Amazon EFS with Provisioned Throughput mode
D
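A sketch of option D: because the file system stays small (about 1 TB), Bursting mode would cap throughput at a low baseline, so the shared EFS file system is created in Provisioned Throughput mode to decouple throughput from storage size. The throughput value and names are assumptions.

```python
# Sketch: create the shared EFS file system in Provisioned Throughput
# mode (throughput figure and names are hypothetical).
import boto3

efs = boto3.client("efs")

efs.create_file_system(
    CreationToken="ecs-task-output",
    PerformanceMode="generalPurpose",
    ThroughputMode="provisioned",
    ProvisionedThroughputInMibps=256.0,  # sized for hundreds of concurrent tasks
    Tags=[{"Key": "Name", "Value": "ecs-task-output"}],
)
```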
A company has created a data warehouse using Amazon Redshift that is used to analyze data from Amazon S3. From the usage patterns, the data engineering team has noticed that after 30 days the data is rarely queried in Redshift and is no longer "hot data". The team would like to preserve SQL querying capability on the data and have query execution start immediately. The team also wants to adopt a pricing model that saves the company the maximum amount of cost on Redshift.
As an AWS Certified Data Engineer Associate, which of the following options would you recommend? (Select two)
A) Migrate the Redshift cluster’s underlying storage class to Standard-IA
B) Move the data to S3 Standard IA after 30 days
C) Move the data to S3 Glacier Deep Archive after 30 days
D) Create a smaller Redshift Cluster with the cold data
E) Analyze the cold data with Athena
BE
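A sketch of the 30-day transition from option B: a lifecycle rule moves objects to S3 Standard-IA, where Athena (option E) can still query them in place with no retrieval delay. The bucket name and prefix are assumptions.

```python
# Sketch: lifecycle rule transitioning cold data to S3 Standard-IA
# after 30 days (bucket and prefix are hypothetical).
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="warehouse-source-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cold-data-to-standard-ia",
                "Status": "Enabled",
                "Filter": {"Prefix": "transactions/"},
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            }
        ]
    },
)
```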