Rapid Fire Exam Questions Flashcards
What is the ReplaceUnhealthy process used for in auto-scaling groups?
The ReplaceUnhealthy process is used to terminate/replace EC2 instances which have been marked as unhealthy during a health check performed by a load balancer or the EC2 service.
Processes in auto-scaling groups can be suspended/resumed at any time. This can be useful when performing maintenance on EC2 instances which are part of the ASG without triggering undesired actions.
What happens when an EC2 instance’s status is modified from InService to Standby?
The Standby status is mainly used for updating and troubleshooting EC2 instances which are part of an Auto Scaling group. Instances which are on Standby are still part of the Auto Scaling group, but they do not actively handle load balancer traffic.
When you put an instance on Standby, you can either decrement the desired capacity through this operation, or keep it at the same value. If you choose to decrement the desired capacity of the Auto Scaling group, this prevents the launch of an instance to replace the one on Standby. If you choose not to decrement the desired capacity of the Auto Scaling group, Amazon EC2 Auto Scaling launches an instance to replace the one on Standby.
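The two outcomes described above can be sketched as simple capacity arithmetic. This is a toy model rather than the Auto Scaling API: the function name and return shape are illustrative, and it assumes the group launches instances until the InService count meets the desired capacity. In boto3, the equivalent choice is made with the ShouldDecrementDesiredCapacity parameter of the enter_standby call.

```python
def enter_standby(desired_capacity: int, in_service: int, decrement: bool):
    """Model moving ONE instance from InService to Standby.

    Returns (new_desired_capacity, in_service_final, replacements_launched).
    Hypothetical helper for illustration only; not an AWS API.
    """
    # The Standby instance no longer counts as serving traffic.
    in_service_after = in_service - 1
    # Optionally lower the desired capacity along with the move.
    new_desired = desired_capacity - 1 if decrement else desired_capacity
    # Assumption: the group launches instances to close any gap.
    replacements = max(new_desired - in_service_after, 0)
    return new_desired, in_service_after + replacements, replacements


# Decrementing avoids a replacement launch; not decrementing triggers one.
with_decrement = enter_standby(4, 4, decrement=True)
without_decrement = enter_standby(4, 4, decrement=False)
```

With a group of 4 instances, decrementing leaves 3 in service and launches nothing, while keeping the desired capacity at 4 causes one replacement launch.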
What is Kinesis Data Streams?
Amazon Kinesis Data Streams is a service which enables real-time processing of streaming big data. It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications.
List 4 advantages/applications of Kinesis Data Streams.
1) Routing related records to the same record consumer (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.
2) Ordering of records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.
3) Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently.
4) Ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis Data Streams stores data for up to 365 days, you can run the audit application up to 365 days behind the billing application.
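The first advantage (routing related records to the same consumer) follows from how records are assigned to shards: Kinesis hashes the partition key with MD5 into a 128-bit integer and places the record in the shard whose hash key range contains that value. The sketch below reproduces that idea under the assumption that the hash space is split evenly across shards; real streams can have uneven ranges after resharding, and the function names are illustrative.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to its 128-bit integer hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would own this key.

    Assumption: shards split the hash space into equal-width ranges.
    """
    width = HASH_SPACE // shard_count
    # Clamp so the top of the range falls in the last shard.
    return min(hash_key(partition_key) // width, shard_count - 1)
```

Because the mapping depends only on the key, every record with partition key "user-42" lands in the same shard, which is what preserves per-key ordering and makes per-key aggregation simple.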
What software tools can be used to create or retrieve records from a shard in a Kinesis Data Stream?
The Amazon Kinesis Producer Library (KPL) can be used for creating/delivering records to a particular shard in a data stream.
The Amazon Kinesis Client Library (KCL) can be used for retrieving records stored in a particular shard.
Both the KPL/KCL are high-level libraries built on top of the AWS SDK.
What are the min/max retention periods for records stored in a Kinesis Data Stream?
Between 24 hours (the default) and 365 days.
List 3 different AWS services which can be set as a shard consumer in a Kinesis Data Stream.
1) AWS Lambda
2) Kinesis Data Firehose
3) Kinesis Data Analytics
What is an Amazon S3 event notification? List 4 different AWS services which can be used as target destinations.
The Amazon S3 event notification feature enables AWS services to receive notifications when certain API calls are made and events are triggered in an S3 bucket (Ex: object creation). To enable notifications, you must first add a notification configuration which identifies the events you want Amazon S3 to publish and the destination where you want Amazon S3 to send the notifications. To send S3 event notifications from a single bucket to multiple destinations, a separate event notification must be configured for each destination.
Amazon S3 supports the following event destinations:
SNS Topics (not FIFO)
SQS Queues (not FIFO)
AWS Lambda Functions
Amazon EventBridge
Note that SNS topics, SQS queues, and Lambda functions which receive S3 event notifications must each have a resource policy attached allowing access from the S3 bucket.
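As an illustration of such a resource policy, the sketch below builds an SQS queue policy which lets the S3 service send messages on behalf of one bucket. The bucket and queue ARNs are hypothetical placeholders, and the helper name is illustrative; only the policy shape (service principal, sqs:SendMessage, aws:SourceArn condition) is the point.

```python
import json

BUCKET_ARN = "arn:aws:s3:::example-bucket"  # hypothetical bucket
QUEUE_ARN = "arn:aws:sqs:us-east-1:111122223333:example-queue"  # hypothetical queue


def s3_to_sqs_policy(bucket_arn: str, queue_arn: str) -> dict:
    """Build a queue policy allowing S3 to deliver notifications."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only accept messages originating from this bucket.
                "Condition": {"ArnLike": {"aws:SourceArn": bucket_arn}},
            }
        ],
    }


policy_json = json.dumps(s3_to_sqs_policy(BUCKET_ARN, QUEUE_ARN), indent=2)
```

The resulting JSON string is what would be set as the queue's Policy attribute; SNS topics and Lambda functions use analogous policies with their own actions.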
What is object key name filtering and how is it used when configuring S3 event notifications?
Object key name filtering allows S3 event notifications to be configured which only send event notifications related to objects whose key names (prefix or suffix) match a particular filtering condition. Ex: only sending notifications originating from objects with a particular file extension (*.jpg).
Note that when configuring an S3 bucket to send event notifications to Amazon EventBridge, all events generated will be delivered to EventBridge. It is not possible to limit or filter which events are sent on the S3 side, either by event type (Ex: s3:ObjectCreated:*) or using object key name filtering. Filtering is instead done with EventBridge rules.
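A minimal sketch of object key name filtering, using the same dictionary shape as an S3 notification configuration (Events plus Filter.Key.FilterRules). The Lambda ARN, the prefix, and the key_matches helper are hypothetical; the helper only mimics how prefix and suffix rules combine, and note that S3 matches the values literally (a suffix of ".jpg", not a "*.jpg" wildcard).

```python
NOTIFICATION_CONFIG = {
    "LambdaFunctionConfigurations": [
        {
            # Hypothetical destination function.
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:example-fn",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {"Name": "prefix", "Value": "images/"},
                        {"Name": "suffix", "Value": ".jpg"},
                    ]
                }
            },
        }
    ]
}


def key_matches(key: str, filter_rules: list) -> bool:
    """Return True if the object key satisfies every prefix/suffix rule."""
    for rule in filter_rules:
        name = rule["Name"].lower()
        if name == "prefix" and not key.startswith(rule["Value"]):
            return False
        if name == "suffix" and not key.endswith(rule["Value"]):
            return False
    return True


RULES = NOTIFICATION_CONFIG["LambdaFunctionConfigurations"][0]["Filter"]["Key"]["FilterRules"]
```

With these rules, "images/cat.jpg" would trigger a notification, while "images/cat.png" and "docs/cat.jpg" would not.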
Describe the AWS Glue service.
AWS Glue is a managed service for performing extract, transform, and load (ETL) operations using a serverless architecture and is commonly used to transform data in preparation for data analytics.
Ex: a Glue job could involve loading data from an S3 bucket or RDS DB, transforming it using a Glue ETL script (Apache Spark or Python shell), then loading it into a Redshift data warehouse.
Which AWS service can be used to convert data into the Apache Parquet or ORC file formats and why is this beneficial?
The AWS Glue service can be used to convert file formats (Ex: CSV) into the ORC/Parquet formats. These are both columnar file formats for efficient data storage and retrieval. This is useful when employing AWS services such as Amazon Athena, where columnar formats improve performance and save costs by reducing the amount of data scanned during an SQL query.
Describe an AWS architecture which can be used to automatically trigger a Glue job after uploading a file to an S3 bucket.
One architecture could involve using S3 event notifications triggered on object creation events and attached to either a Lambda function or Amazon EventBridge. This in turn could be used to trigger a Glue job on the S3 object which might transform the file and push it to another destination.
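The Lambda variant of this architecture can be sketched as a handler which reads the bucket and key from the S3 event and starts a Glue job run. The job name and the "--source_bucket" / "--source_key" argument names are hypothetical; start_job_run with JobName and Arguments is the boto3 Glue call assumed here. The client is injectable so the logic can be exercised without AWS access.

```python
from urllib.parse import unquote_plus

JOB_NAME = "example-etl-job"  # hypothetical Glue job name


def handler(event, context, glue_client=None):
    """Start one Glue job run per S3 object reported in the event."""
    if glue_client is None:
        import boto3  # imported lazily; only needed when running in AWS

        glue_client = boto3.client("glue")

    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=JOB_NAME,
            # Hypothetical job arguments read by the ETL script.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return run_ids
```

An EventBridge-based design would replace this function with a rule matching object-created events, at the cost of less control over the arguments passed to the job.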
What are Glue job bookmarks?
AWS Glue tracks data which has already been processed during a previous run of an ETL job by persisting information from the job run, known as a job bookmark. This helps AWS Glue maintain state information and prevent the reprocessing of old data.
With job bookmarks, you can process new data when rerunning on a scheduled interval. Ex: an ETL job might read only new partitions in an Amazon S3 file. AWS Glue tracks which partitions the job has processed successfully to prevent duplicate processing and duplicate data in the job’s target data store.
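The effect of a job bookmark can be illustrated with a toy model which remembers what earlier runs processed. This is a conceptual sketch only and not how Glue stores bookmarks internally; the class name and the partition labels are hypothetical.

```python
class BookmarkedJob:
    """Toy ETL job that skips partitions handled by earlier runs."""

    def __init__(self):
        # Stands in for the state Glue persists between job runs.
        self.processed = set()

    def run(self, available_partitions):
        """Process only partitions not seen before and return them."""
        new = [p for p in available_partitions if p not in self.processed]
        # Record progress only after the new partitions are handled.
        self.processed.update(new)
        return new


job = BookmarkedJob()
first_run = job.run(["dt=2024-01-01", "dt=2024-01-02"])
second_run = job.run(["dt=2024-01-01", "dt=2024-01-02", "dt=2024-01-03"])
```

The second run returns only the partition added since the first run, which is the behavior that prevents duplicate rows in the target data store.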
List 3 types of data sources which can be tracked using a Glue job bookmark.
Glue job bookmarks are implemented for: JDBC data sources, the Relationalize transform, and S3 buckets.
Describe the Amazon SageMaker service.
Amazon SageMaker is a managed service used to simplify the process of building and training machine-learning models for data scientists in a serverless fashion.
SageMaker can automate many common ML tasks, including: data labeling, ML model building, training, and deployment. This is all done using training data provided by the data scientist.
What AWS service should be used when you’d like to analyze data stored in an S3 bucket using serverless SQL?
Amazon Athena.
What are the advantages of launching EC2 instances using dedicated hardware?
Dedicated Hosts and Dedicated Instances are EC2 purchasing options which are useful for companies which have strict regulatory/compliance requirements or software licenses which demand dedicated hardware. This can include legal requirements such as HIPAA which may call for dedicated infrastructure for storing patient information. EC2 instances launched using dedicated hardware do not share their physical resources with any other AWS accounts.
Dedicated purchasing options are also useful for software with complicated licensing models (BYOL - Bring Your Own License).
What are the differences between the dedicated host and dedicated instance options when launching an EC2 instance?
Dedicated Instances are Amazon EC2 instances which run on hardware dedicated to a single customer. Dedicated Instances may share hardware with other instances from the same AWS account that are not Dedicated Instances.
With Dedicated Hosts, the entire physical server is reserved for a single AWS account. It does not change, it’s always the same physical machine for as long as you are paying. As soon as you ‘allocate’ a Dedicated Host, you start paying for the entire host.
A host computer is very large. In fact, it is the size of the largest instance of the selected family, but can be divided up into smaller instances of the same family. (“You can run any number of instances up to the core capacity associated with the host.”)
Any instances that run on that Host are not charged, since you are already being billed for the Host. That is why a Dedicated Host is more expensive than a Dedicated Instance – the charge is for the whole host.
What are the minimum and maximum retention periods for messages stored in an SQS queue? What is the default retention period?
The default retention period for messages stored in an SQS queue is 4 days. The min/max ranges are between 1 min. and 14 days.
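SQS expresses the retention period in seconds through the MessageRetentionPeriod queue attribute, so the limits above translate into the constants below. The helper name is illustrative; it only validates a value and builds the attribute dictionary (attribute values are passed as strings).

```python
MIN_RETENTION_S = 60                      # 1 minute
DEFAULT_RETENTION_S = 4 * 24 * 60 * 60    # 4 days
MAX_RETENTION_S = 14 * 24 * 60 * 60       # 14 days


def retention_attribute(seconds: int = DEFAULT_RETENTION_S) -> dict:
    """Validate a retention period and return the queue attribute for it."""
    if not MIN_RETENTION_S <= seconds <= MAX_RETENTION_S:
        raise ValueError(
            f"retention must be between {MIN_RETENTION_S} and {MAX_RETENTION_S} seconds"
        )
    return {"MessageRetentionPeriod": str(seconds)}
```

The returned dictionary has the shape expected by the Attributes argument when creating a queue or setting queue attributes.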
What are the minimum and maximum sizes (in KB) allowed when submitting messages to an SQS queue?
Between 1 byte and 256 KB.
How many messages can be stored simultaneously in an SQS message queue?
A single SQS message queue can contain an unlimited number of messages. However, there is a limit on the number of in-flight messages allowed for both standard and FIFO queues.
Messages are in flight after they have been received from the queue by a consuming component, but have not yet been deleted from the queue. The limit is 120,000 in-flight messages for a standard queue and 20,000 for a FIFO queue.
An IAM user successfully creates a Route 53 CNAME record for a domain called ‘www.example.com’ but when trying to create a similar record for ‘example.com’, the request failed. Why is this?
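‘example.com’ is the zone apex (the root of the hosted zone), which here is a second-level domain (SLD). DNS does not allow a CNAME record at the zone apex, because a CNAME cannot coexist with the NS and SOA records which must exist there. In Route 53, an Alias record can be used at the apex instead.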
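A minimal sketch of the rule behind this failure, assuming the hosted zone is named after the registered domain: a CNAME is rejected when the record name equals the zone name. The function names are illustrative and this is not a Route 53 API.

```python
def normalize(name: str) -> str:
    """Lower-case a DNS name and drop the optional trailing dot."""
    return name.rstrip(".").lower()


def cname_allowed(record_name: str, zone_name: str) -> bool:
    """Return False when the record sits at the zone apex."""
    return normalize(record_name) != normalize(zone_name)


www_ok = cname_allowed("www.example.com", "example.com")
apex_ok = cname_allowed("example.com.", "example.com")
```

The check passes for "www.example.com" and fails for "example.com", matching the behavior described in the question.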
What is a Route 53 hosted zone? What are the two types of hosted zones available in AWS?
A Route 53 hosted zone is a container for records which define how to route traffic for a particular domain and any of its subdomains. Hosted zones come in two varieties: public and private.
Public hosted zones contain records specifying how to route traffic over the internet (Ex: www.google.com). Public hosted zones connect public domain names (which must be purchased) to public IP addresses.
Private hosted zones instead contain records specifying how to route traffic within one or more VPCs. Private hosted zones connect private domain names to private IP addresses.
What is the default TTL for records returned in Route 53 DNS queries?
300 seconds.
What is an NS record?
NS stands for ‘nameserver,’ and the nameserver record indicates which DNS server is authoritative for that domain (i.e. which server contains the actual DNS records). Basically, NS records tell the Internet where to go to find out a domain’s IP address.
What are the differences between CNAME records and Alias records when resolving DNS queries in Route 53?
CNAME records are used to redirect a hostname to any other hostname (Ex: www.google.com -> google.com). The client will then perform a subsequent DNS query using the value from the CNAME record to obtain the IP address for routing traffic.
Note that a CNAME record cannot be used to route the root domain to a subdomain (Ex: google.com -> www.google.com).
Alias records are unique to the Route 53 service and are used to redirect a hostname to an AWS resource. Unlike CNAME records, alias records can be used to route a root domain to a subdomain. Aliases are also free of charge and have a built-in health check.
List 8 valid AWS resource targets which can be set as the value of an Alias record. Which AWS service/resource is a notable exception?
Valid Alias record targets include:
1) Elastic Load Balancers
2) CloudFront Distributions
3) API Gateway
4) Elastic Beanstalk Environments
5) S3 Static Websites
6) VPC Interface Endpoints
7) Global Accelerator
8) Any other Route 53 record in the same hosted zone.
Note that EC2 instances cannot be set as the value of an Alias record in Route 53.
List 8 different routing policies which can be applied to records in a Route 53 hosted zone.
1) Simple
2) Weighted
3) Latency
4) Failover
5) Geolocation
6) Geoproximity
7) IP-based
8) Multi-Value
Describe the simple routing policy in Route 53. What happens if there are multiple values attached to the same DNS record?
Simple routing is the most common routing policy employed in Route 53 and is used to route traffic to a single destination. Note that simple routing is not compatible with Route 53 health checks.
It is possible for a DNS record to contain multiple values (Ex: an A-record with multiple IP addresses listed). If this is the case, in simple routing the client machine will randomly select one of the values contained in the record to use as a destination for traffic.
Describe the weighted routing policy in Route 53. What are some use cases where it may be employed?
Weighted routing is used to control what % of incoming traffic is routed to a particular destination by assigning relative weights to DNS records with the same record name. The relative weights of Route 53 records with the same name are used to determine which record is returned to clients in a DNS query. Unlike simple routing policies, weighted routing can be associated with Route 53 health checks.
Use cases for weighted routing include:
load balancing traffic between different AWS regions.
testing new application versions by sending a small % of traffic.
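The share of traffic a weighted record receives is its weight divided by the sum of all weights for that record name. The sketch below computes those shares and picks a record accordingly; the function names and record labels are illustrative, and the equal split when every weight is 0 reflects Route 53's documented behavior as understood here.

```python
import random


def traffic_share(weights: dict) -> dict:
    """Return each record's expected fraction of DNS responses."""
    total = sum(weights.values())
    if total == 0:
        # All weights zero: every record is returned with equal probability.
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}


def pick_record(weights: dict, rng=random):
    """Choose one record at random, in proportion to its share."""
    shares = traffic_share(weights)
    names = list(shares)
    return rng.choices(names, weights=[shares[n] for n in names], k=1)[0]


canary = traffic_share({"blue": 90, "green": 10})
```

Here the "green" record would receive roughly 10% of queries, which is the canary-style use case listed above.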
Describe the latency-based routing policy in Route 53.
Latency-based routing policies route traffic to the destination AWS Region which provides the lowest latency for the user. This is useful when latency is a significant factor impacting user experience/performance. Latency is determined from network latency measurements taken over time between users and AWS Regions, not from geographic distance. Ex: users in Germany might have their traffic directed to a different AWS Region than users in the U.S.
Describe the failover routing policy in Route 53.
Failover routing policies associate each destination with a health check defined in Route 53. If the primary destination passes the health check, then Route 53 will return the DNS record associated with the primary destination. If the primary destination fails the health check, then Route 53 will instead return the DNS record for a secondary destination.
Describe the geolocation routing policy in Route 53. With what geographic precision can the routing policy be defined?
Geolocation-based routing policies are used to route users to target destinations based on their geographic location. Geolocation for routing can be specified at the continent, country, or U.S. state level.
With geolocation-based routing, there is typically a default record set which is returned by Route 53 if the user’s geolocation does not match any of the other records.
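When several geolocation records could match, the most specific location wins, with the default record as the last resort. A minimal sketch of that lookup order follows; the record table, its key format, and the endpoint labels are hypothetical.

```python
def resolve_geolocation(records: dict, continent: str, country: str, state: str = None):
    """Return the endpoint for the most specific matching location.

    `records` maps (level, code) tuples to endpoints, for example
    ("country", "DE") or ("default", "*"). Hypothetical structure.
    """
    candidates = []
    if state:
        candidates.append(("state", f"{country}-{state}"))
    candidates += [("country", country), ("continent", continent), ("default", "*")]
    for level_and_code in candidates:
        if level_and_code in records:
            return records[level_and_code]
    return None  # no default record configured: no answer is returned


GEO_RECORDS = {
    ("state", "US-TX"): "us-south",
    ("country", "DE"): "eu-central",
    ("continent", "EU"): "eu-west",
    ("default", "*"): "global",
}
```

A user in Germany resolves to "eu-central", a user in France falls back to the continent record "eu-west", and a user in Japan receives the default record.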
Describe the geoproximity routing policy in Route 53.
Geoproximity routing policies are used to route users based on the geographic distance between users and destination resources specified in their DNS records. This automatically routes users to the closest geographic location by default. However, a bias value can be applied to each DNS record, either positive (1-99) or negative (-1 to -99), in order to change the weight given to a particular resource/DNS record.
Geoproximity routing can be thought of as a combination of geolocation routing and weighted routing policies. Note that the Route 53 Traffic Flow advanced feature must be enabled to use this routing policy.
Describe the IP-based routing policy in Route 53.
IP-based routing routes users to different target destinations based on the IP address of the client. Each Route 53 DNS record can be associated with a CIDR Block defining which client IP addresses should be routed to a particular endpoint/destination.
This can be used to, for example, route end users to a particular endpoint based on their Internet Service Provider (ISP).
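The CIDR matching described above can be sketched with Python's ipaddress module. The CIDR table and endpoint labels are hypothetical, and choosing the longest matching prefix when several blocks contain the client address is an assumption of this sketch rather than a statement of Route 53's exact behavior.

```python
import ipaddress


def resolve_by_ip(client_ip: str, cidr_to_endpoint: dict, default=None):
    """Return the endpoint whose CIDR block contains the client IP."""
    ip = ipaddress.ip_address(client_ip)
    best_net, best_endpoint = None, default
    for cidr, endpoint in cidr_to_endpoint.items():
        net = ipaddress.ip_network(cidr)
        if ip in net:
            # Assumption: prefer the most specific (longest) prefix.
            if best_net is None or net.prefixlen > best_net.prefixlen:
                best_net, best_endpoint = net, endpoint
    return best_endpoint


ISP_TABLE = {
    "203.0.113.0/24": "isp-a",
    "203.0.113.128/25": "isp-a-east",
    "198.51.100.0/24": "isp-b",
}
```

A client at 203.0.113.200 matches both of the first two blocks and is sent to "isp-a-east", while an address outside every block receives the default.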
Describe the Multi-Value routing policy in Route 53.
Multi-Value routing is used to return multiple records to a client during a DNS query. Up to 8 healthy records can be returned for each multi-value query. The client will then randomly determine which destination to send subsequent requests to.
Note that, unlike simple routing, Multi-Value routing policies can be associated with Route 53 health checks and are a better solution when multiple destinations should be returned in a DNS query.
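A toy model of a multi-value answer: unhealthy records are dropped and at most 8 of the remaining records are returned in random order. The function name, the health map, and the sample addresses are hypothetical.

```python
import random

MAX_ANSWERS = 8  # upper bound on records per multi-value response


def multivalue_answer(records: list, healthy: dict, rng=random) -> list:
    """Return up to MAX_ANSWERS healthy records, randomly selected."""
    candidates = [r for r in records if healthy.get(r, False)]
    return rng.sample(candidates, min(MAX_ANSWERS, len(candidates)))


IPS = [f"192.0.2.{i}" for i in range(1, 11)]           # ten example records
HEALTH = {ip: ip != "192.0.2.3" for ip in IPS}          # one record is unhealthy
answer = multivalue_answer(IPS, HEALTH)
```

With ten records and one failing its health check, the response contains eight of the nine healthy addresses and never the unhealthy one.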
Describe the Amazon FSx service in AWS. List 4 different types of resources which can be launched using the service.
Amazon FSx is a fully managed service for launching 3rd party high-performance file systems on AWS. File systems which can be launched on Amazon FSx include:
1) Windows File Server
2) Lustre
3) NetApp ONTAP
4) OpenZFS
Describe the Amazon FSx for Lustre file system.
Lustre == ‘Linux Cluster’.
Lustre is a high-performance, parallel distributed file system for large-scale computing. It is used for workloads such as machine learning, high-performance computing (HPC), video processing, and financial modeling. FSx for Lustre can be accessed from on-premises servers using either VPN or Direct Connect.
The open-source Lustre file system is designed for applications which require fast storage – where you want your storage to keep up with your compute. FSx for Lustre integrates with Amazon S3, making it easy to process data sets with the Lustre file system. When linked to an S3 bucket, an FSx for Lustre file system transparently presents S3 objects as files and allows you to write changed data back to S3.
Describe 3 technical specifications of the Amazon FSx for Lustre file system.
Lustre supports:
throughput of up to hundreds of GB/s.
millions of IOPS.
sub-millisecond latencies for read/write operations.
Describe the Amazon FSx for Windows File Server file system.
FSx for Windows File Server is a fully managed Windows file system shared drive. It supports the SMB protocol and the Windows NTFS file system, and has Microsoft Active Directory integration. FSx for Windows File Server also supports Microsoft’s Distributed File System (DFS) Namespaces.
FSx for Windows File Server can be accessed from on-premises servers using either VPN or Direct Connect. It can also be configured for Multi-AZ, and data is backed up daily to an S3 bucket.
Note that FSx for Windows File Server can be mounted on Linux EC2 instances.
Describe 3 technical specifications of the Amazon FSx for Windows File Server file system.
Windows File Server supports:
throughput of up to tens of GB/s.
millions of IOPS.
hundreds of PB of data.
Describe the AWS Transfer Family service.
AWS Transfer Family is a managed AWS service which is used to read/write files into Amazon S3 or Amazon EFS using the SFTP, FTPS, or FTP protocols rather than the standard AWS APIs for interacting with S3/EFS.
AWS Transfer Family supports integration with authentication systems such as Microsoft Active Directory, LDAP, and Amazon Cognito. Accounts are billed per provisioned endpoint per hour + the amount of data transferred in GB.
Describe the AWS Storage Gateway service. List the 4 different types of the Storage Gateway service.
AWS Storage Gateway is an AWS service which is used to enable hybrid cloud architectures where some infrastructure exists in the cloud and other parts of the infrastructure remain on-premises. This can be useful for applications such as: exposing S3 bucket data (or other AWS cloud-native storage options) on-premises.
Different types of AWS Storage Gateways include:
1) S3 File Gateway
2) FSx File Gateway
3) Volume Gateway
4) Tape Gateway
Describe the Amazon S3 File Gateway service.
Amazon S3 File Gateway is a service which is used to make S3 bucket objects accessible using the SMB and NFS protocols. Files/objects accessed recently will be cached in the gateway application, for subsequent rapid access.
Amazon S3 File Gateway supports storing/retrieving files in any of the S3 storage tiers except Glacier, although files can be transitioned into S3 Glacier using a Lifecycle Policy.
Additionally, S3 Gateways using the SMB protocol can be integrated with Active Directory (AD) for user authentication.