Review Mode 2.2 Flashcards
A start-up company that offers an intuitive financial data analytics service has consulted you about their AWS architecture. They have a fleet of Amazon EC2 worker instances that process financial data and then output reports which are used by their clients. You must store the generated report files in durable storage. The number of files to be stored can grow over time as the start-up company is expanding rapidly overseas, and hence, they also need a way to distribute the reports faster to clients located across the globe.
Which of the following is a cost-efficient and scalable storage option that you should use for this scenario?
- Use Amazon Redshift as the data storage and CloudFront as the CDN.
- Use Amazon S3 Glacier as the data storage and ElastiCache as the CDN.
- Use Amazon S3 as the data storage and CloudFront as the CDN.
- Use multiple EC2 instance stores for data storage and ElastiCache as the CDN.
- Use Amazon S3 as the data storage and CloudFront as the CDN.
A Content Delivery Network (CDN) is a critical component of nearly any modern web application. It used to be that a CDN merely improved the delivery of content by replicating commonly requested files (static content) across a globally distributed set of caching servers. However, CDNs have become much more useful over time.
For caching, a CDN will reduce the load on an application origin and improve the experience of the requestor by delivering a local copy of the content from a nearby cache edge, or Point of Presence (PoP). The application origin is off the hook for opening the connection and delivering the content directly as the CDN takes care of the heavy lifting. The end result is that the application origins don’t need to scale to meet demands for static content.
Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds, all within a developer-friendly environment. CloudFront is integrated with AWS – both physical locations that are directly connected to the AWS global infrastructure, as well as other AWS services.
Amazon S3 offers a highly durable, scalable, and secure destination for backing up and archiving your critical data. This is the correct option as the start-up company is looking for durable storage to store the generated report files. In addition, ElastiCache is only used for caching and not specifically as a Global Content Delivery Network (CDN).
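As a rough sketch of how the two services fit together, the Python (boto3) snippet below uploads a generated report to S3 and returns the CloudFront URL that clients would download from. The bucket name and distribution domain are placeholders, not values from the scenario.

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names used only for illustration.
REPORTS_BUCKET = "tutorialsdojo-financial-reports"
CDN_DOMAIN = "d1234abcdexample.cloudfront.net"  # CloudFront distribution whose origin is the bucket

def publish_report(local_path: str, report_key: str) -> str:
    """Upload a generated report to S3 and return the CloudFront URL for clients."""
    s3.upload_file(local_path, REPORTS_BUCKET, report_key)
    # Clients fetch the file from the nearest CloudFront edge location;
    # the S3 origin is only contacted on cache misses.
    return f"https://{CDN_DOMAIN}/{report_key}"

print(publish_report("/tmp/q1-report.pdf", "reports/2024/q1-report.pdf"))
```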
A company launched a website that accepts high-quality photos and turns them into a downloadable video montage. The website offers a free and a premium account that guarantees faster processing. All requests by both free and premium members go through a single SQS queue and are then processed by a group of EC2 instances that generate the videos. The company needs to ensure that the premium users who paid for the service have higher priority than the free members.
How should the company re-design its architecture to address this requirement?
- Use Amazon S3 to store and process the photos and then generate the video montage afterward.
- Use Amazon Kinesis to process the photos and generate the video montage in real-time.
- Create an SQS queue for free members and another one for premium members. Configure your EC2 instances to consume messages from the premium queue first and if it is empty, poll from the free members’ SQS queue.
- For the requests made by premium members, set a higher priority in the SQS queue so it will be processed first compared to the requests made by free members.
- Create an SQS queue for free members and another one for premium members. Configure your EC2 instances to consume messages from the premium queue first and if it is empty, poll from the free members’ SQS queue.
Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. SQS eliminates the complexity and overhead associated with managing and operating message-oriented middleware and empowers developers to focus on differentiating work. Using SQS, you can send, store, and receive messages between software components at any volume without losing messages or requiring other services to be available.
In this scenario, it is best to create two separate SQS queues, one for each type of member. The EC2 instances can poll the premium members’ queue first and, once it is empty, process the messages from the free members’ queue next.
Hence, the correct answer is: Create an SQS queue for free members and another one for premium members. Configure your EC2 instances to consume messages from the premium queue first and if it is empty, poll from the free members’ SQS queue.
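A minimal worker loop for the two-queue design could look like the sketch below, assuming the two queue URLs are already known. It always drains the premium queue first and only falls back to the free members’ queue when the premium queue returns no messages.

```python
import boto3

sqs = boto3.client("sqs")

# Placeholder queue URLs for illustration.
PREMIUM_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/premium-requests"
FREE_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/free-requests"

def receive_next_message():
    """Poll the premium queue first; poll the free queue only if the premium queue is empty."""
    for queue_url in (PREMIUM_QUEUE_URL, FREE_QUEUE_URL):
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=2,
        )
        messages = response.get("Messages", [])
        if messages:
            return queue_url, messages[0]
    return None, None

while True:
    queue_url, message = receive_next_message()
    if message is None:
        continue
    # generate_video_montage(message["Body"])  # hypothetical processing step
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```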
A company receives semi-structured and structured data from different sources, which are eventually stored in their Amazon S3 data lake. The Solutions Architect plans to use big data processing frameworks to analyze this data and access it using various business intelligence tools and standard SQL queries.
Which of the following provides the MOST high-performing solution that fulfills this requirement?
- Use Amazon Managed Service for Apache Flink Studio and store the processed data in Amazon DynamoDB.
- Create an Amazon EC2 instance and store the processed data in Amazon EBS.
- Create an Amazon EMR cluster and store the processed data in Amazon Redshift.
- Use AWS Glue and store the processed data in Amazon S3.
- Create an Amazon EMR cluster and store the processed data in Amazon Redshift.
Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases.
Amazon Redshift is the most widely used cloud data warehouse. It makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against terabytes to petabytes of structured and semi-structured data, using sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution.
The key phrases in the scenario are “big data processing frameworks” and “various business intelligence tools and standard SQL queries” to analyze the data. To leverage big data processing frameworks, you need to use Amazon EMR. The cluster will perform data transformations (ETL) and load the processed data into Amazon Redshift for analytic and business intelligence applications.
Hence, the correct answer is: Create an Amazon EMR cluster and store the processed data in Amazon Redshift.
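To make the flow concrete, the sketch below launches a transient EMR cluster with Spark and a single ETL step. The release label, instance types, and the S3 script location are assumptions; the step script itself would read from the S3 data lake, transform the data, and load it into Redshift.

```python
import boto3

emr = boto3.client("emr")

# A minimal sketch; values below are placeholders, not scenario requirements.
response = emr.run_job_flow(
    Name="sales-data-processing",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate the cluster when the step finishes
    },
    Steps=[
        {
            "Name": "transform-and-load",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Hypothetical Spark script that reads the S3 data lake and writes to Redshift.
                "Args": ["spark-submit", "s3://tutorialsdojo-datalake/scripts/etl_to_redshift.py"],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```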
A media company has two VPCs, VPC-1 and VPC-2, with a peering connection between them. VPC-1 only contains private subnets while VPC-2 only contains public subnets. The company uses a single AWS Direct Connect connection and a virtual interface to connect their on-premises network with VPC-1.
Which of the following options increase the fault tolerance of the connection to VPC-1? (Select TWO.)
- Use the AWS VPN CloudHub to create a new AWS Direct Connect connection and private virtual interface in the same region as VPC-2.
- Establish a hardware VPN over the Internet between VPC-1 and the on-premises network.
- Establish a hardware VPN over the Internet between VPC-2 and the on-premises network.
- Establish a new AWS Direct Connect connection and private virtual interface in the same region as VPC-2.
- Establish another AWS Direct Connect connection and private virtual interface in the same AWS region as VPC-1.
In this scenario, you have two VPCs that have peering connections with each other. Note that a VPC peering connection does not support edge-to-edge routing. This means that if either VPC in a peering relationship has one of the following connections, you cannot extend the peering relationship to that connection:
– A VPN connection or an AWS Direct Connect connection to a corporate network
– An Internet connection through an Internet gateway
– An Internet connection in a private subnet through a NAT device
– A gateway VPC endpoint to an AWS service; for example, an endpoint to Amazon S3.
– (IPv6) A ClassicLink connection. You can enable IPv4 communication between a linked EC2-Classic instance and instances in a VPC on the other side of a VPC peering connection. However, IPv6 is not supported in EC2-Classic, so you cannot extend this connection for IPv6 communication.
For example, if VPC A and VPC B are peered, and VPC A has any of these connections, then instances in VPC B cannot use the connection to access resources on the other side of the connection. Similarly, resources on the other side of a connection cannot use the connection to access VPC B.
Hence, VPC-2 cannot be used to extend the connectivity between the on-premises network and VPC-1. In other words, traffic originating from the corporate network cannot reach VPC-1 by routing through VPC-2, whether the connection terminating in VPC-2 is a VPN connection or an AWS Direct Connect connection. A VPC peering connection is a one-to-one relationship between the two peered VPCs (VPC-1 and VPC-2), and it does not support transitive routing through an intermediate VPC such as VPC-2, which is why the following options are incorrect:
– Use the AWS VPN CloudHub to create a new AWS Direct Connect connection and private virtual interface in the same region as VPC-2.
– Establish a hardware VPN over the Internet between VPC-2 and the on-premises network.
– Establish a new AWS Direct Connect connection and private virtual interface in the same region as VPC-2.
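For reference, ordering the redundant connection could be automated along the lines of the sketch below, which requests a second Direct Connect connection and a private virtual interface attached to VPC-1's virtual private gateway. The location code, VLAN, ASN, and gateway ID are placeholders.

```python
import boto3

dx = boto3.client("directconnect")

# Order a second, redundant Direct Connect connection in the same region as VPC-1.
connection = dx.create_connection(
    location="EqDC2",            # hypothetical Direct Connect location code
    bandwidth="1Gbps",
    connectionName="dx-secondary-to-vpc1",
)

# Create a private virtual interface terminating at VPC-1's virtual private gateway.
vif = dx.create_private_virtual_interface(
    connectionId=connection["connectionId"],
    newPrivateVirtualInterface={
        "virtualInterfaceName": "vif-secondary-vpc1",
        "vlan": 101,
        "asn": 65000,
        "virtualGatewayId": "vgw-0123456789abcdef0",  # placeholder VGW attached to VPC-1
    },
)
print(vif["virtualInterfaceState"])
```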
A company has a static corporate website hosted in a standard S3 bucket and a new web domain name that was registered using Route 53. You are instructed by your manager to integrate these two services in order to successfully launch their corporate website.
What are the prerequisites when routing traffic using Amazon Route 53 to a website that is hosted in an Amazon S3 Bucket? (Select TWO.)
- The S3 bucket name must be the same as the domain name
- The S3 bucket must be in the same region as the hosted zone
- The Cross-Origin Resource Sharing (CORS) option should be enabled in the S3 bucket
- The record set must be of type “MX”
- A registered domain name
- The S3 bucket name must be the same as the domain name
- A registered domain name
Here are the prerequisites for routing traffic to a website that is hosted in an Amazon S3 Bucket:
– An S3 bucket that is configured to host a static website. The bucket must have the same name as your domain or subdomain. For example, if you want to use the subdomain portal.tutorialsdojo.com, the name of the bucket must be portal.tutorialsdojo.com.
– A registered domain name. You can use Route 53 as your domain registrar, or you can use a different registrar.
– Route 53 as the DNS service for the domain. If you register your domain name by using Route 53, we automatically configure Route 53 as the DNS service for the domain.
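Putting the prerequisites together, the alias record that Route 53 needs could be created along the lines of the sketch below. The hosted zone ID of the domain, the S3 website endpoint, and the S3 website hosted zone ID are placeholders that depend on your account and the bucket's region.

```python
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z1EXAMPLEHOSTEDZONE",  # hosted zone for the domain (placeholder)
    ChangeBatch={
        "Comment": "Alias record pointing the subdomain at the S3 static website endpoint",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "portal.tutorialsdojo.com",  # must match the bucket name exactly
                    "Type": "A",
                    "AliasTarget": {
                        # S3 website endpoint hosted zone ID for the bucket's region (verify for your region).
                        "HostedZoneId": "Z3AQBSTGFYJSTF",
                        "DNSName": "s3-website-us-east-1.amazonaws.com",
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    },
)
```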
A Solutions Architect is building a cloud infrastructure where EC2 instances require access to various AWS services such as S3 and Redshift. The Architect will also need to provide access to system administrators so they can deploy and test their changes.
Which configuration should be used to ensure that the access to the resources is secured and not compromised? (Select TWO.)
- Enable Multi-Factor Authentication.
- Assign an IAM role to the Amazon EC2 instance.
- Store the AWS Access Keys in ACM.
- Store the AWS Access Keys in the EC2 instance.
- Assign an IAM user for each Amazon EC2 Instance.
In this scenario, the correct answers are:
– Enable Multi-Factor Authentication
– Assign an IAM role to the Amazon EC2 instance
Always remember that you should associate IAM roles with EC2 instances, not IAM users, for the purpose of accessing other AWS services. IAM roles are designed so that your applications can securely make API requests from your instances, without requiring you to manage the security credentials that the applications use. Instead of creating and distributing your AWS credentials, you can delegate permission to make API requests using IAM roles.
AWS Multi-Factor Authentication (MFA) is a simple best practice that adds an extra layer of protection on top of your user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password (the first factor—what they know), as well as for an authentication code from their AWS MFA device (the second factor—what they have). Taken together, these multiple factors provide increased security for your AWS account settings and resources. You can enable MFA for your AWS account and for individual IAM users you have created under your account. MFA can also be used to control access to AWS service APIs.
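The practical effect of attaching an IAM role is that application code on the instance never handles long-term access keys; the SDK resolves temporary credentials from the instance metadata service automatically, as in the sketch below. The calls shown succeed only if the role's policies allow them.

```python
import boto3

# No access keys are configured anywhere on the instance or in the code.
# boto3 automatically uses the temporary credentials provided by the
# instance profile (IAM role) attached to the EC2 instance.
s3 = boto3.client("s3")
redshift = boto3.client("redshift")

print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])
print([cluster["ClusterIdentifier"] for cluster in redshift.describe_clusters()["Clusters"]])
```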
A company has a serverless application made up of AWS Amplify, Amazon API Gateway and a Lambda function. The application is connected to an Amazon RDS MySQL database instance inside a private subnet. A Lambda Function URL is also implemented as the dedicated HTTPS endpoint for the function, which has the following value:
https://12june1898pil1pinas.lambda-url.us-west-2.on.aws/
There are times during peak loads when the database throws a “too many connections” error preventing the users from accessing the application.
Which solution could the company take to resolve the issue?
- Increase the concurrency limit of the Lambda function
- Increase the memory allocation of the Lambda function
- Provision an RDS Proxy between the Lambda function and RDS database instance
- Increase the rate limit of API Gateway
- Provision an RDS Proxy between the Lambda function and RDS database instance
If a “Too Many Connections” error happens to a client connecting to a MySQL database, it means all available connections are in use by other clients. Opening a connection consumes resources on the database server. Since Lambda functions can scale to tens of thousands of concurrent executions, your database needs more resources to open and maintain connections instead of executing queries. The maximum number of connections a database can support is largely determined by the amount of memory allocated to it. Upgrading to a database instance with higher memory is a straightforward way of solving the problem. Another approach would be to maintain a connection pool that clients can reuse. This is where RDS Proxy comes in.
RDS Proxy helps you manage a large number of connections from Lambda to an RDS database by establishing a warm connection pool to the database. Your Lambda functions interact with RDS Proxy instead of your database instance. It handles the connection pooling necessary for scaling many simultaneous connections created by concurrent Lambda functions. This allows your Lambda applications to reuse existing connections, rather than creating new connections for every function invocation.
Thus, the correct answer is: Provision an RDS Proxy between the Lambda function and RDS database instance.
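A simplified Lambda handler that goes through the proxy might look like the sketch below, assuming a MySQL driver such as PyMySQL is packaged with the function and the proxy endpoint and credentials are passed in as environment variables (the names are illustrative).

```python
import os
import pymysql  # packaged with the function or provided via a Lambda layer

# Connect to the RDS Proxy endpoint, not the database instance itself.
# The environment variable names below are assumptions for illustration.
connection = pymysql.connect(
    host=os.environ["PROXY_ENDPOINT"],  # e.g. app-proxy.proxy-xxxxxxxx.us-west-2.rds.amazonaws.com
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database=os.environ["DB_NAME"],
    connect_timeout=5,
)

def lambda_handler(event, context):
    # The connection lives outside the handler so warm invocations reuse it, and
    # RDS Proxy multiplexes the many concurrent Lambda connections over a small
    # pool of actual database connections.
    with connection.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM orders")  # hypothetical table
        (count,) = cursor.fetchone()
    return {"orders": count}
```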
A commercial bank has a forex trading application. They created an Auto Scaling group of EC2 instances that allows the bank to cope with the current traffic and achieve cost-efficiency. They want the Auto Scaling group to behave in such a way that it will follow a predefined set of parameters before it scales down the number of EC2 instances, which protects the system from unintended slowdown or unavailability.
Which of the following statements are true regarding the cooldown period? (Select TWO.)
- Its default value is 300 seconds.
- It ensures that the Auto Scaling group launches or terminates additional EC2 instances without any downtime.
- It ensures that before the Auto Scaling group scales out, the EC2 instances have ample time to cooldown.
- Its default value is 600 seconds.
- It ensures that the Auto Scaling group does not launch or terminate additional EC2 instances before the previous scaling activity takes effect.
- Its default value is 300 seconds.
- It ensures that the Auto Scaling group does not launch or terminate additional EC2 instances before the previous scaling activity takes effect.
In Auto Scaling, the following statements are correct regarding the cooldown period:
– It ensures that the Auto Scaling group does not launch or terminate additional EC2 instances before the previous scaling activity takes effect.
– Its default value is 300 seconds.
– It is a configurable setting for your Auto Scaling group.
The following options are incorrect:
– It ensures that before the Auto Scaling group scales out, the EC2 instances have ample time to cooldown.
– It ensures that the Auto Scaling group launches or terminates additional EC2 instances without any downtime.
– Its default value is 600 seconds.
These statements are inaccurate and don’t depict what the word “cooldown” actually means for Auto Scaling. The cooldown period is a configurable setting for your Auto Scaling group that helps to ensure that it doesn’t launch or terminate additional instances before the previous scaling activity takes effect. After the Auto Scaling group dynamically scales using a simple scaling policy, it waits for the cooldown period to complete before resuming scaling activities.
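The cooldown period is configured on the Auto Scaling group itself; a minimal sketch of setting and verifying it is shown below (the group name is a placeholder).

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Adjust the default cooldown on the group (300 seconds is also the default value).
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="forex-trading-asg",  # placeholder group name
    DefaultCooldown=300,
)

# Verify the setting.
group = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=["forex-trading-asg"]
)["AutoScalingGroups"][0]
print(group["DefaultCooldown"])
```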
A DevOps Engineer is required to design a cloud architecture in AWS. The Engineer is planning to develop a highly available and fault-tolerant architecture consisting of an Elastic Load Balancer and an Auto Scaling group of EC2 instances deployed across multiple Availability Zones. This will be used by an online accounting application that requires path-based routing, host-based routing, and bi-directional streaming using Remote Procedure Call (gRPC).
Which configuration will satisfy the given requirement?
- Configure an Application Load Balancer in front of the auto-scaling group. Select gRPC as the protocol version.
- Configure a Network Load Balancer in front of the auto-scaling group. Use a UDP listener for routing.
- Configure a Gateway Load Balancer in front of the auto-scaling group. Ensure that the IP Listener Routing uses the GENEVE protocol on port 6081 to allow gRPC response traffic.
- Configure a Network Load Balancer in front of the auto-scaling group. Create an AWS Global Accelerator accelerator and set the load balancer as an endpoint.
- Configure an Application Load Balancer in front of the auto-scaling group. Select gRPC as the protocol version.
Application Load Balancer operates at the request level (layer 7), routing traffic to targets (EC2 instances, containers, IP addresses, and Lambda functions) based on the content of the request. Ideal for advanced load balancing of HTTP and HTTPS traffic, Application Load Balancer provides advanced request routing targeted at delivery of modern application architectures, including microservices and container-based applications. Application Load Balancer simplifies and improves the security of your application, by ensuring that the latest SSL/TLS ciphers and protocols are used at all times.
If your application is composed of several individual services, an Application Load Balancer can route a request to a service based on the content of the request such as Host field, Path URL, HTTP header, HTTP method, Query string, or Source IP address.
ALBs can also route and load balance gRPC traffic between microservices or between gRPC-enabled clients and services. This will allow customers to seamlessly introduce gRPC traffic management in their architectures without changing any of the underlying infrastructure on their clients or services.
Therefore, the correct answer is: Configure an Application Load Balancer in front of the auto-scaling group. Select gRPC as the protocol version.
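The gRPC choice is made at the target group level. A sketch of such a target group is shown below; the VPC ID, port, and names are placeholders, and in practice the ALB listener in front of it must use HTTPS.

```python
import boto3

elbv2 = boto3.client("elbv2")

response = elbv2.create_target_group(
    Name="accounting-grpc-targets",   # placeholder name
    Protocol="HTTPS",
    ProtocolVersion="GRPC",           # enables gRPC routing and health checks on the ALB
    Port=50051,                       # placeholder gRPC port
    VpcId="vpc-0123456789abcdef0",    # placeholder VPC
    TargetType="instance",
    HealthCheckProtocol="HTTPS",
)
print(response["TargetGroups"][0]["TargetGroupArn"])
```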
A GraphQL API is hosted in an Amazon EKS cluster with the Fargate launch type and deployed using AWS SAM. The API is connected to an Amazon DynamoDB table with an Amazon DynamoDB Accelerator (DAX) as its data store. Both resources are hosted in the us-east-1 region.
The AWS IAM authenticator for Kubernetes is integrated into the EKS cluster for role-based access control (RBAC) and cluster authentication. A solutions architect must improve network security by preventing database calls from traversing the public internet. An automated cross-account backup for the DynamoDB table is also required for long-term retention.
Which of the following should the solutions architect implement to meet the requirement?
- Create a DynamoDB interface endpoint. Associate the endpoint to the appropriate route table. Enable Point-in-Time Recovery (PITR) to restore the DynamoDB table to a particular point in time on the same or a different AWS account.
- Create a DynamoDB interface endpoint. Set up a stateless rule using AWS Network Firewall to control all outbound traffic to only use the dynamodb.us-east-1.amazonaws.com endpoint. Integrate the DynamoDB table with Amazon Timestream to allow point-in-time recovery from a different AWS account.
- Create a DynamoDB gateway endpoint. Associate the endpoint to the appropriate route table. Use AWS Backup to automatically copy the on-demand DynamoDB backups to another AWS account for disaster recovery.
- Create a DynamoDB gateway endpoint. Set up a Network Access Control List (NACL) rule that allows outbound traffic to the dynamodb.us-east-1.amazonaws.com gateway endpoint. Use the built-in on-demand DynamoDB backups for cross-account backup and recovery.
- Create a DynamoDB gateway endpoint. Associate the endpoint to the appropriate route table. Use AWS Backup to automatically copy the on-demand DynamoDB backups to another AWS account for disaster recovery.
Since Amazon DynamoDB is accessed through public endpoints, applications within a VPC rely on an Internet Gateway to route traffic to/from Amazon DynamoDB. You can use a Gateway endpoint if you want to keep the traffic between your VPC and Amazon DynamoDB within the Amazon network. This way, resources residing in your VPC can use their private IP addresses to access DynamoDB with no exposure to the public internet.
When you create a DynamoDB Gateway endpoint, you specify the VPC where it will be deployed as well as the route table that will be associated with the endpoint. The route table will be updated with an Amazon DynamoDB prefix list (list of CIDR blocks) as the destination and the endpoint’s ID as the target.
DynamoDB on-demand backups are available at no additional cost beyond the normal pricing that’s associated with backup storage size. DynamoDB on-demand backups cannot be copied to a different account or Region. To create backup copies across AWS accounts and Regions and for other advanced features, you should use AWS Backup.
With AWS Backup, you can configure backup policies and monitor activity for your AWS resources and on-premises workloads in one place. Using DynamoDB with AWS Backup, you can copy your on-demand backups across AWS accounts and Regions, add cost allocation tags to on-demand backups, and transition on-demand backups to cold storage for lower costs. To use these advanced features, you must opt into AWS Backup. Opt-in choices apply to the specific account and AWS Region, so you might have to opt into multiple Regions using the same account.
Hence, the correct answer is: Create a DynamoDB gateway endpoint. Associate the endpoint to the appropriate route table. Use AWS Backup to automatically copy the on-demand DynamoDB backups to another AWS account for disaster recovery.
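A minimal sketch of the gateway endpoint itself is shown below; the VPC and route table IDs are placeholders for the resources used by the EKS/Fargate workloads.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a gateway endpoint for DynamoDB and attach it to the workload's route table.
endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",                      # placeholder VPC
    ServiceName="com.amazonaws.us-east-1.dynamodb",
    RouteTableIds=["rtb-0123456789abcdef0"],            # placeholder route table
)
print(endpoint["VpcEndpoint"]["VpcEndpointId"])
```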
An online events registration system is hosted in AWS and uses ECS to host its front-end tier and an RDS database configured with Multi-AZ deployments for its database tier.
What are the events that will make Amazon RDS automatically perform a failover to the standby replica? (Select TWO.)
- Storage failure on primary
- Loss of availability in primary Availability Zone
- Compute unit failure on secondary DB instance
- In the event of Read Replica failure
- Storage failure on secondary DB instance
- Storage failure on primary
- Loss of availability in primary Availability Zone
Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments. Amazon RDS uses several different technologies to provide failover support. Multi-AZ deployments for Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon’s failover technology. SQL Server DB instances use SQL Server Database Mirroring (DBM).
In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance and help protect your databases against DB instance failure and Availability Zone disruption.
Amazon RDS detects and automatically recovers from the most common failure scenarios for Multi-AZ deployments so that you can resume database operations as quickly as possible without administrative intervention.
The high-availability feature is not a scaling solution for read-only scenarios; you cannot use a standby replica to serve read traffic. To service read-only traffic, you should use a Read Replica.
Amazon RDS automatically performs a failover in the event of any of the following:
– Loss of availability in primary Availability Zone.
– Loss of network connectivity to primary.
– Compute unit failure on primary.
– Storage failure on primary.
Hence, the correct answers are:
– Loss of availability in primary Availability Zone
– Storage failure on primary
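Although not required by the scenario, one way to exercise this failover behavior in a test environment is a reboot with failover, which forces the standby in the other Availability Zone to be promoted. The DB instance identifier below is a placeholder.

```python
import boto3

rds = boto3.client("rds")

# Simulate a primary failure by rebooting with failover; RDS promotes the standby
# replica in the other Availability Zone.
rds.reboot_db_instance(
    DBInstanceIdentifier="events-registration-db",  # placeholder identifier
    ForceFailover=True,
)
```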
An accounting application uses an RDS database configured with Multi-AZ deployments to improve availability.
What would happen to RDS if the primary database instance fails?
- The canonical name record (CNAME) is switched from the primary to standby instance.
- The primary database instance will reboot.
- A new database instance is created in the standby Availability Zone.
- The IP address of the primary DB instance is switched to the standby DB instance.
- The canonical name record (CNAME) is switched from the primary to standby instance.
In Amazon RDS, failover is automatically handled so that you can resume database operations as quickly as possible without administrative intervention in the event that your primary database instance goes down. When failing over, Amazon RDS simply flips the canonical name record (CNAME) for your DB instance to point at the standby, which is in turn promoted to become the new primary.
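Because the application connects through the DB instance endpoint (a DNS name) rather than an IP address, the failover is transparent once the CNAME points at the promoted standby. The short check below, with a placeholder endpoint, shows the resolution that changes after a failover.

```python
import socket

# Placeholder RDS endpoint; applications should always connect to this DNS name.
endpoint = "accounting-db.abcdefghijkl.us-east-1.rds.amazonaws.com"

hostname, aliases, addresses = socket.gethostbyname_ex(endpoint)
print(hostname, addresses)  # the resolved address changes after Amazon RDS flips the CNAME
```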
An application is hosted in AWS Fargate and uses an RDS database in a Multi-AZ Deployments configuration with several Read Replicas. A Solutions Architect was instructed to ensure that all of their database credentials, API keys, and other secrets are encrypted and rotated on a regular basis to improve data security. The application should also use the latest version of the encrypted credentials when connecting to the RDS database.
Which of the following is the MOST appropriate solution to secure the credentials?
- Store the database credentials, API keys, and other secrets in AWS KMS.
- Store the database credentials, API keys, and other secrets to AWS ACM.
- Store the database credentials, API keys, and other secrets to Systems Manager Parameter Store each with a SecureString data type. The credentials are automatically rotated by default.
- Use AWS Secrets Manager to store and encrypt the database credentials, API keys, and other secrets. Enable automatic rotation for all of the credentials.
- Use AWS Secrets Manager to store and encrypt the database credentials, API keys, and other secrets. Enable automatic rotation for all of the credentials.
AWS Secrets Manager is an AWS service that makes it easier for you to manage secrets. Secrets can be database credentials, passwords, third-party API keys, and even arbitrary text. You can store and control access to these secrets centrally by using the Secrets Manager console, the Secrets Manager command line interface (CLI), or the Secrets Manager API and SDKs.
In the past, when you created a custom application that retrieves information from a database, you typically had to embed the credentials (the secret) for accessing the database directly in the application. When it came time to rotate the credentials, you had to do much more than just create new credentials. You had to invest time in updating the application to use the new credentials. Then you had to distribute the updated application. If you had multiple applications that shared credentials and you missed updating one of them, the application would break. Because of this risk, many customers have chosen not to regularly rotate their credentials, which effectively substitutes one risk for another.
Secrets Manager enables you to replace hardcoded credentials in your code (including passwords), with an API call to Secrets Manager to retrieve the secret programmatically. This helps ensure that the secret can’t be compromised by someone examining your code because the secret simply isn’t there. Also, you can configure Secrets Manager to automatically rotate the secret for you according to the schedule that you specify. This enables you to replace long-term secrets with short-term ones, which helps to significantly reduce the risk of compromise.
Hence, the most appropriate solution for this scenario is: Use AWS Secrets Manager to store and encrypt the database credentials, API keys, and other secrets. Enable automatic rotation for all of the credentials.
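A sketch of how the application would fetch the latest rotated credentials at connection time is shown below; the secret name and the JSON keys follow the common layout of RDS secrets but are assumptions here.

```python
import json
import boto3

secretsmanager = boto3.client("secretsmanager")

def get_db_credentials(secret_id="prod/fargate-app/rds-credentials"):  # placeholder secret name
    """Fetch the current (possibly just-rotated) secret version at connection time."""
    response = secretsmanager.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"], secret["host"], secret["dbname"]

# Because the secret is fetched rather than hardcoded, the application always
# receives the current version after each automatic rotation.
username, password, host, dbname = get_db_credentials()
```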
An e-commerce company is receiving a large volume of sales data files in .csv format from its external partners on a daily basis. These data files are then stored in an Amazon S3 Bucket for processing and reporting purposes.
The company wants to create an automated solution to convert these .csv files into Apache Parquet format and store the output of the processed files in a new S3 bucket called “tutorialsdojo-data-transformed”. This new solution is meant to enhance the company’s data processing and analytics workloads while keeping its operating costs low.
Which of the following options must be implemented to meet these requirements with the LEAST operational overhead?
- Use Amazon S3 event notifications to trigger an AWS Lambda function that converts .csv files to Apache Parquet format using Apache Spark on an Amazon EMR cluster. Save the processed files to the “tutorialsdojo-data-transformed” bucket.
- Utilize an AWS Batch job definition with Bash syntax to convert the .csv files to the Apache Parquet format. Configure the job definition to run automatically whenever a new .csv file is uploaded to the source bucket.
- Integrate Amazon EMR File System (EMRFS) with the source S3 bucket to automatically discover the new data files. Use an Amazon EMR Serverless with Apache Spark to convert the .csv files to the Apache Parquet format and then store the output in the “tutorialsdojo-data-transformed” bucket.
- Use AWS Glue crawler to automatically discover the raw data file in S3 as well as check its corresponding schema. Create a scheduled ETL job in AWS Glue that will convert .csv files to Apache Parquet format and store the output of the processed files in the “tutorialsdojo-data-transformed” bucket.
- Use AWS Glue crawler to automatically discover the raw data file in S3 as well as check its corresponding schema. Create a scheduled ETL job in AWS Glue that will convert .csv files to Apache Parquet format and store the output of the processed files in the “tutorialsdojo-data-transformed” bucket.
AWS Glue is a fully managed extract, transform, and load (ETL) service. AWS Glue makes it cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams.
Apache Parquet is built to support efficient compression and encoding schemes. It can speed up your analytics workloads because it stores data in a columnar fashion. Converting data to Parquet can save you storage space, cost, and time in the longer run.
AWS Glue retrieves data from sources and writes data to targets stored and transported in various data formats. AWS Glue supports using the Parquet format. This format is a performance-oriented, column-based data format. You can use AWS Glue to read Parquet files from Amazon S3 and from streaming sources, as well as write Parquet files to Amazon S3. You can read and write bzip and gzip archives containing Parquet files from S3.
When a crawler runs, it takes the following actions to interrogate a data store:
Classifies data to determine the format, schema, and associated properties of the raw data – You can configure the results of classification by creating a custom classifier.
Groups data into tables or partitions – Data is grouped based on crawler heuristics.
Writes metadata to the Data Catalog – You can configure how the crawler adds, updates, and deletes tables and partitions.
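A scheduled Glue ETL job for this conversion is typically just a short PySpark script; the sketch below reads the table that the crawler cataloged and writes Parquet to the target bucket. The database and table names are assumptions.

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the raw .csv data through the table created by the Glue crawler
# (database and table names are placeholders).
raw_csv = glueContext.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_csv_files"
)

# Write the data to the target bucket in Apache Parquet format.
glueContext.write_dynamic_frame.from_options(
    frame=raw_csv,
    connection_type="s3",
    connection_options={"path": "s3://tutorialsdojo-data-transformed/"},
    format="parquet",
)

job.commit()
```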
A multinational company currently operates multiple AWS accounts to support its operations across various branches and business units. The company needs a more efficient and secure approach to managing its vast AWS infrastructure to avoid costly operational overhead.
To address this, they plan to transition to a consolidated, multi-account architecture while integrating a centralized corporate directory service for authentication purposes.
Which combination of options can be used to meet the above requirements? (Select TWO.)
- Set up a new entity in AWS Organizations and configure its authentication system to utilize AWS Directory Service directly.
- Utilize AWS CloudTrail to enable centralized logging and monitoring across all AWS accounts.
- Implement AWS Organizations to create a multi-account architecture that provides a consolidated view and centralized management of AWS accounts.
- Integrate AWS IAM Identity Center with the corporate directory service for centralized authentication. Configure a service control policy (SCP) to manage the AWS accounts.
- Establish an identity pool through Amazon Cognito and adjust the AWS IAM Identity Center settings to allow Amazon Cognito authentication.
- Implement AWS Organizations to create a multi-account architecture that provides a consolidated view and centralized management of AWS accounts.
- Integrate AWS IAM Identity Center with the corporate directory service for centralized authentication. Configure a service control policy (SCP) to manage the AWS accounts.
AWS Organizations is a service that allows you to manage multiple AWS accounts easily. With this service, you can effectively consolidate billing and manage your resources across multiple accounts. AWS IAM Identity Center can be integrated with your corporate directory service for centralized authentication. This means you can sign in to multiple AWS accounts with just one set of credentials. This integration helps to streamline the authentication process and makes it easier for companies to switch between accounts.
In addition to this, you can also configure a service control policy (SCP) to manage your AWS accounts. SCPs help you enforce policies across your organization and control the services and features accessible to your member accounts. This way, you can ensure that your organization’s resources are used only as intended and prevent unauthorized access. You can provide secure and centralized management of your AWS accounts by setting up AWS Organizations, integrating AWS IAM Identity Center with your corporate directory service, and configuring SCPs. This simplifies your management process and helps you maintain better control over your resources.
Hence, the correct answers are:
– Integrate AWS IAM Identity Center with the corporate directory service for centralized authentication. Configure a service control policy (SCP) to manage the AWS accounts.
– Implement AWS Organizations to create a multi-account architecture that provides a consolidated view and centralized management of AWS accounts.
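As one concrete illustration of the SCP side, the sketch below creates and attaches a policy that restricts member accounts to approved regions. The policy content and the target OU ID are assumptions, and real region-restriction SCPs usually also exempt global services.

```python
import json
import boto3

organizations = boto3.client("organizations")

# Example SCP: deny all actions outside a set of approved regions (illustrative only).
scp_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ["us-east-1", "ap-southeast-1"]}
            },
        }
    ],
}

policy = organizations.create_policy(
    Name="restrict-to-approved-regions",
    Description="Deny actions outside the approved regions",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp_document),
)

organizations.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-abcd-12345678",  # placeholder organizational unit ID
)
```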