Adrian Cantrill Cards Flashcards
CLB vs ALB
- ALB can route based on Layer 7
- ALB can handle multiple domains and can understand URL paths
- ALB can handle multiple SSL certs, so it permits consolidation onto a single ALB
- CLB cannot understand Layer 7
- ALB supports ECS, Lambda, etc.
When to use site-to-site VPN
- Managed Service
- HA by design
- Connect non-AWS to AWS
- Quick to set up and secure
Reserved instance billing
- Used for constant steady-state usage
- You exchange flexibility for discounts
- Generally a 1- or 3-year commitment, with no upfront, partial upfront, or all upfront payment options.
Lambda key facts
- Billed for execution time: max 15 mins.
- Cold start (running environment created from new)
- Warm start (environment reused)
- Execution role = IAM role providing permissions.
Service control policies
Account permissions boundaries. Limit what all users in an account can do, even the root user. Do not apply to the master account.
Relational databases
- RDS Oracle
- RDS MySQL
- RDS MariaDB
- RDS PostgreSQL
- RDS Aurora
- Redshift (columnar)
VPC
- Isolated network
- network blast isolation
- One or more IPv4 CIDRs (/28->/16)
- Can have IPv6 allocated
- Region resilient
CloudHSM
- Uses industry standard API
- Same architecture as KMS (CMK/DEK)
- FIPS 140-2 level 3
- Exclusive control
CLB end-to-end encryption
- For an unbroken end-to-end encryption connection, pick a TCP listener so that the LB won’t decrypt the connection. The CLB doesn’t need any SSL cert installed on it.
7-Layer OSI model
- Layer 7: Application
- Layer 6: Presentation
- Layer 5: Session
- Layer 4: Transport
- Layer 3: Network
- Layer 2: Data Link
- Layer 1: Physical
Mnemonic (Layer 1 up to Layer 7): Please Do Not Throw Salty Peanuts Away
When not to use Site-to-Site VPN
- Low latency
- Consistent latency
- High Speed
- Non-internet transit
CloudFront geo restriction
- White list/Black list
- Location only (country code)
- Cannot use any other field/aspect of customer sessions
What is an edge location?
- A smaller infrastructure unit. Edge locations are capable of running limited edge computing and are generally used by CloudFront for content distribution. They are located as close to major population centers as possible.
What is an AZ
- An availability zone (an isolated unit of AWS infrastructure).
- A region can have one or more AZs.
- One failing AZ should be isolated from others.
- AZs might be 1 building or more.
- AZs can have many isolated units of compute, storage, and networking.
Internet gateway
- Associated with one VPC and a VPC can have one IGW
- Translates private IP to and from public/Elastic IP
- Needs an RT (Route Table) route
- Highly available by design across all AZs; used for public internet access over IPv4/IPv6
S3 transfer acceleration
- Provides new endpoint
- Uses the AWS global network for transit
- Entry point is a local CF (CloudFront) location, backhauled to bucket location
- MUCH faster than using S3 directly
EMR types
- Master Node (can only have one in an EMR cluster)
- Core Nodes
- Task Nodes
Persistent Data
- Data that exists beyond the lifetime of the thing it’s attached to. An EBS volume continues to exist after a machine is shut down, restarted, or terminated (if that option is selected).
Cross-zone load balancing
- A setting that is default on ALB and optional on CLB. Allows an ELB node to distribute connections to instances/targets outside its AZ for a more even distribution of connections across AZs.
X-Forwarded-For
The ‘X-Forwarded-For’ request header helps you identify the IP address of a client when you use an HTTP or HTTPS load balancer. It carries the source IP of the original front-end viewer (the originating IP address of a client connecting to a web server through an HTTP proxy or a load balancer).
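A minimal sketch of reading the client address from the header on the back-end side. The function name is hypothetical, and it assumes the conventional comma-separated format in which each proxy appends the address it received the request from, so the left-most entry is the original client. That entry is supplied by the client side and can be spoofed, so treat it as informational.

```python
def client_ip_from_xff(header_value):
    """Return the left-most address in an X-Forwarded-For value, or None."""
    if not header_value:
        return None
    # Entries are comma-separated; the first one is the originating client
    # as reported by the proxy chain.
    first = header_value.split(",")[0].strip()
    return first or None


# Example: a client at 203.0.113.7 reaching the instance via a load balancer.
print(client_ip_from_xff("203.0.113.7, 10.0.1.25"))
```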
CloudFormation stack
- Created from a template. Maps logical resources in a template to physical resources in AWS. The lifecycle of a stack is linked to resources. Creations, updates, and deletions to the stack do the same to physical resources.
IAM Group
- Not a principal; cannot be referenced as a principal in policies
- Has IAM users as members
- Can have policies associated (inline or managed)
- Cannot be “logged in to”; has no credentials.
AWS GraphDB service
- Neptune
CloudFormation template
- Used to create a stack
- Parameters, resources, mappings, conditions, and outputs
- Apply a template (YAML/JSON) to create one or more stacks
Proxy Protocol
- If TCP is used for the front and back end, the LB makes a connection to your instance. The ELB’s IP will show as the source. Proxy Protocol includes an additional field with the original source IP address. (TCP only)
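As a rough illustration, the extra field arrives as a single text line at the start of the TCP stream. The sketch below parses the human-readable version 1 line format (`PROXY TCP4 <src> <dst> <srcport> <dstport>`); the function name and the returned dictionary shape are assumptions made for this example, and the `PROXY UNKNOWN` form is deliberately rejected to keep it short.

```python
def parse_proxy_v1(line: bytes) -> dict:
    """Parse a Proxy Protocol v1 header line for TCP4/TCP6 connections."""
    parts = line.rstrip(b"\r\n").decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a Proxy Protocol v1 TCP header")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    return {
        "family": family,
        "src_ip": src_ip,          # original client address
        "dst_ip": dst_ip,          # address the client connected to
        "src_port": int(src_port),
        "dst_port": int(dst_port),
    }


info = parse_proxy_v1(b"PROXY TCP4 198.51.100.22 10.0.0.5 35646 80\r\n")
print(info["src_ip"])
```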
What can you control with CloudFront behaviour?
- Protocol policy
- Path patterns
- HTTP methods
- Query String forwarding
- Cookie forwarding
- Lambda function association
- Object caching
- HTTP caching
- Request header caching
- Object compression
- TTL
- Viewer access restrictions
You want to ensure that you capture authentication activities on your account in CloudTrail. These are not API calls. How can you do this?
- Enable Management events (default) when configuring a Trail.
Note: The CloudTrail Event history feature supports only management events. Not all management events are supported in event history.
What are the global and regional characteristics of S3?
- S3 is a global service with region-specific presence. Bucket names are globally unique, but each bucket and its objects live in a particular region. Each account has a default limit of 100 buckets, but unlimited prefixes.
What AWS services provide data that can be queried through Athena?
- Athena can access logs from CloudTrail, CloudFront, all Load Balancers, and Amazon VPC Flow Logs.
You want to query a cats table based on appointment type. The partition key is catID and the sort key is cat name. How can you efficiently query based on another column?
- To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the base table, but they are organised by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table. It doesn’t even need to have the same key schema as a table.
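A sketch of what the cats table and its index could look like as request parameters for the low-level DynamoDB API. The attribute names (`catName`, `appointmentType`) and the index name are assumptions for illustration; the dictionaries are only built here, not sent, and would be passed to something like `create_table(**params)` and `query(**params)` on a DynamoDB client.

```python
def cats_table_definition():
    """Table keyed on catID + catName, with a GSI keyed on appointmentType."""
    return {
        "TableName": "cats",
        "AttributeDefinitions": [
            {"AttributeName": "catID", "AttributeType": "S"},
            {"AttributeName": "catName", "AttributeType": "S"},
            {"AttributeName": "appointmentType", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "catID", "KeyType": "HASH"},     # partition key
            {"AttributeName": "catName", "KeyType": "RANGE"},  # sort key
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "appointmentType-index",
                # The index has its own key schema, unrelated to the table's.
                "KeySchema": [
                    {"AttributeName": "appointmentType", "KeyType": "HASH"}
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def appointment_query(appointment_type):
    """Query parameters that read from the index instead of scanning the table."""
    return {
        "TableName": "cats",
        "IndexName": "appointmentType-index",
        "KeyConditionExpression": "appointmentType = :t",
        "ExpressionAttributeValues": {":t": {"S": appointment_type}},
    }
```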
How can you restore a database after deleting the Master?
- Database backups enable you to restore a database even after the Master is gone. However, automated backups have a maximum retention period of 35 days, after which they are deleted. (Sean - this is referring to AWS auto-backup).
How can you restrict access to S3 objects within a date range?
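- Use a bucket policy (or IAM policy) with date conditions such as DateGreaterThan and DateLessThan on aws:CurrentTime. S3 ACLs cannot express conditions, so they cannot restrict access to a date range.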
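A sketch of a date-window restriction written as a bucket policy, built as a Python dictionary. The bucket name, principal ARN, and dates are placeholders chosen for this example; the policy uses the standard date condition operators with the `aws:CurrentTime` key.

```python
import json


def date_window_policy(bucket, principal_arn, start_iso, end_iso):
    """Allow s3:GetObject only between start_iso and end_iso (ISO 8601, UTC)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "DateGreaterThan": {"aws:CurrentTime": start_iso},
                    "DateLessThan": {"aws:CurrentTime": end_iso},
                },
            }
        ],
    }


policy = date_window_policy(
    "example-bucket",                             # placeholder bucket
    "arn:aws:iam::111122223333:role/ExampleRole",  # placeholder principal
    "2024-01-01T00:00:00Z",
    "2024-01-31T23:59:59Z",
)
print(json.dumps(policy, indent=2))
```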
What purchasing options can you use to pay for RedShift compute nodes?
- On-demand or
- Reserved instances
Note: Spot Instances are not available for Redshift compute nodes
How do EC2 instances, on-premises VMs and servers become manageable by SSM?
- By installing the Systems Manager Agent, applying appropriate IAM permissions (EC2 only), and using activations for on-premises servers.
What are the four invocations for a Lambda@Edge function?
- Viewer request
- Viewer response
- Origin request
- Origin response
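A minimal sketch of a function attached to the viewer request trigger. It assumes the CloudFront event layout in which the request sits under `Records[0].cf.request` and headers are keyed by lowercase name; the header name `x-example-flag` is made up for this example.

```python
def handler(event, context):
    """Viewer-request trigger: tag the request, then let CloudFront continue."""
    request = event["Records"][0]["cf"]["request"]
    # Header keys are lowercase; each value is a list of {key, value} entries.
    request["headers"]["x-example-flag"] = [
        {"key": "X-Example-Flag", "value": "1"}
    ]
    # Returning the request object tells CloudFront to carry on processing it.
    return request


# Local dry run with a hand-built event (no AWS involved).
sample_event = {
    "Records": [
        {"cf": {"request": {"uri": "/index.html", "method": "GET", "headers": {}}}}
    ]
}
print(handler(sample_event, None)["headers"])
```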
You want to keep CloudTrail events longer than 90-day retention period. What can you do?
- Configure Trails to deliver CloudTrail events to S3. Optionally enable log file encryption and validation to detect tampering.
Note: by default, log files are stored indefinitely
What are the two sides of a CloudFront edge called?
- The Origin, origin protocol, and origin fetch are the side where the cached content originates.
- The Viewer and Viewer Protocol are the client side or the edge.
You create a DynamoDB table with a partition key of CatID. Your CatIDs range from 0001 to 0004. You end up with lots of records for each cat, exceeding 10GB per cat. You notice that performance for reading and writing starts to slow down. What could be the problem?
Dynamo has had to create more partitions for each CatID. But since the partition key is CatID, Dynamo still limits throughput to 3000 RCU / 1000 WCU across all partitions for that CatID. You need to consider a partition key that has more distinct values than 0001-0004.
You’ve configured VPC peering between VPC A and VPC B. When you try to ping an EC2 instance in B from A, you don’t get a response. What would you check?
- Check the default route table for the subnet in A where the pinging instance lives. It needs to have a route to the CIDR range of VPC B, configured as a Peering Connection with the ID of the peering connection. The same is true of VPC B, which needs a default route table entry for VPC A.
- Check NACL for both VPCs to ensure there are no inbound or outbound restrictions.
- Lastly, check the security groups for both sides of the communication to ensure sufficient allowances for inbound and outbound traffic between the two. (Me: e.g. that ICMP is allowed)
How does AWS IoT enable you to communicate with devices or systems from other manufacturers?
- AWS IoT uses MQTT, which is an industry standard protocol that other devices and systems likely use.
Me: MQTT stands for MQ Telemetry Transport. It is a publish/subscribe, extremely simple and lightweight messaging protocol, designed for constrained devices and low-bandwidth, high-latency or unreliable networks.
What service can you use when your data security requirements need FIPS 140-2?
- KMS or CloudHSM comply with FIPS 140-2
Note:
KMS provides only AWS API access. It does not provide industry standard API access.
(CloudHSM provides exclusive control)
What is separation of roles in KMS?
Administrators of KMS, who handle tasks such as key management, may not have permission to use the keys to encrypt or decrypt data. Users of KMS may not have admin privileges, but can ask KMS to encrypt and decrypt data.
What is a CloudFormation Custom Resource?
A custom resource is a resource, either in AWS or from a 3rd party, that a CloudFormation template asks for as part of the stack. It uses messaging (SNS or Lambda) to trigger that resource to create/update/delete.
Me: a custom resource is a resource that is not available as an AWS CloudFormation resource type. Custom resources enable you to write custom provisioning logic in the template that AWS CF runs any time you create, update or delete stacks.
When using S3 transfer acceleration, what is the endpoint that the client uses for a bucket?
- CloudFront local endpoint. Then the upload traverses the AWS backbone
What is CORS?
Cross-origin resource sharing. This is a security measure that a server can use to control which other origins (domains) can access resources on it from a browser.
How can a resource in a private subnet access Amazon S3?
By creating a VPC endpoint, in this case a gateway endpoint for S3. This was my answer. Need to revisit: it is correct, but not complete.
What kind of transformation can you do to a Firehose stream?
Use a Lambda function to manipulate records or change the format to Parquet or ORC with a checkbox
Me: when you enable Kinesis Data Firehose data transformation, Kinesis Data Firehose buffers incoming data up to 3 MB by default (the size can be adjusted via API). Firehose then invokes the specified Lambda function with each buffered batch.
Apache ORC: the smallest, fastest columnar storage for Hadoop workloads
Apache Parquet is a columnar storage format for Hadoop workloads
When configuring instance types for an ElasticSearch cluster, what choices should you consider?
The master node has much lower CPU and memory requirements, so it can be smaller. The data nodes do the work and should be bigger.
How does Redshift ensure durability of your data in the cluster?
Each of the slices on your compute nodes has the advertised amount of storage available to it according to the Management Console. But Redshift also reserves the same amount for replication of slices from other nodes, similar to RAID 5 with a disk array.
What’s a Logical Resource in CloudFormation?
The name the template uses to describe physical resources that CF creates. The physical resources acquire cryptic IDs only at the time of creation. The logical resource makes it easier to identify those resources.
What are some of the constraints of a DX connection?
Each connection has a bandwidth limit. Not encrypted, must have BGP support for routing
Where is the cluster in an Aurora Serverless DB?
It’s in the VPC you specify when creating the cluster. Aurora Serverless allocates ACUs into your cluster from a warm pool it maintains for all customers. The proxy manages connections from your applications to the cluster and movement of the ACUs in and out of your VPC. The proxy also manages migrating cache data from one ACU to another when capacity changes. Keep in mind, Serverless uses the same Cluster Storage Volume tier that Aurora uses, so Serverless is only managing the compute tier, since the storage tier is already serverless and multi-tenant.
What are the various node types of an EMR cluster?
An EMR cluster can have only one Master node. It can also have Core Nodes that run HDFS and manage tasks, and optionally Task Nodes, which execute tasks but have no storage system. You cannot change the Master node instance type after creating the cluster. You can change Core and Task node instance types.
Since RDS doesn’t allow root access to the console, how can admins manage the configuration of the database?
Parameter groups and option groups provide the parameters admins need.
What can you do to improve the performance of a Lambda function?
Declare resource objects as singletons outside of the lambda_handler() (me: it is like a global variable in C, to avoid instantiation on every invocation). They may be available to the next invocation of the function, but declare them as NULL and do NULL checks in the handler or sub-functions.
Me: state dehydration and cold-start library loading slow a Lambda function down. So a Lambda function performs better when it is stateless and written in a programming language that loads fast.
What are the two types of groups EMR can use for managing instances?
Uniform instance groups and instance fleets
How can you introduce new versions of an API without breaking applications that depend on the current version?
Use stages (me: see Lambda function versioning as an example)
How can you configure ElasticSearch for the best HA?
Configure three master nodes: the master and two eligible masters. Also have at least three data nodes across AZs. By having three of each, ES can use a quorum to determine the next election if one fails (me: assuming that only one AZ will fail in a given time).
What are the three minimum parts to an IAM policy statement?
Effect, Action, Resource. And optionally, Condition.
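For illustration, a statement with the three core elements plus an optional Condition, and a tiny helper that reports which core elements are missing. The bucket ARN is a placeholder and the helper is hypothetical; it is deliberately simplified and ignores variants such as NotAction, NotResource, and the Principal element used in resource policies.

```python
REQUIRED_KEYS = {"Effect", "Action", "Resource"}

statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::example-bucket/*"],  # placeholder ARN
    # Optional fourth element: only allow requests made over TLS.
    "Condition": {"Bool": {"aws:SecureTransport": "true"}},
}


def missing_required_keys(stmt):
    """Return the core elements that the statement does not define."""
    return sorted(REQUIRED_KEYS - stmt.keys())


print(missing_required_keys(statement))
print(missing_required_keys({"Effect": "Deny"}))
```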
What’s the fastest way to recover to a previous point of an Aurora DB?
Backtrack instead of restore from a backup.
What is the main network security difference between ALB and NLB?
ALBs terminate the client connection and then establish a new connection to the target, because they must decrypt the traffic to inspect layer 7. NLBs examine only the layer 4 header and simply pass on the layer 5+ portion of the packet, which may or may not be encrypted. This enables an end-to-end pass-through of encrypted layer 7 traffic, improving performance (me: and allowing end-to-end encryption, i.e. better data protection).
You are configuring a Classic Load Balancer but need to allow a few different HTTP response codes for success. How can you do that?
Use an ALB and enumerate the various HTTP response codes for the health check. CLB doesn’t support Layer 7 where HTTP resides.
What are the goals of CloudFormation template portability?
Design it to run in any account or location without modifications or user input.
You are designing a ticket sales system using DynamoDB as the order database. You want to ensure that you always have the capacity you need no matter how many customers arrive at your site. What’s the best way to ensure that capacity on Dynamo?
On-demand capacity will always manage the RCU/WCU and storage requirements for the incoming load. Auto scaling sets upper limits and can be slow. Provisioned capacity also sets a limit.
You are using DynamoDB for ticket sales. When tickets for a particular event go on sale, you want to ensure that when a customer selects a ticket for purchase, you hold that ticket until they complete the purchase. What’s the simplest way to do that?
Define the updates to the ticket availability and the purchase steps as a transaction. That way, DynamoDB commits all of the updates as a single atomic unit.
me: each transaction can include up to 10 unique items (the limit at the time of these notes; AWS has since raised it) or up to 4 MB of data, including conditions.
TransactWriteItems
TransactReadItems
Three read options:
- eventual consistency
- strong consistency
- transactional
Two for write:
- standard
- transactional.
Transactions are enabled for all single-region DynamoDB tables and are disabled on global tables by default.
Items are not locked during a transaction. DynamoDB transactions provide serialisable isolation. If an item is modified outside of a transaction while the transaction is in progress, the transaction is cancelled and an exception is thrown with details about which item or items caused the exception.
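A sketch of the ticket hold expressed as TransactWriteItems request parameters. Table names, attribute names, and status values are assumptions for this example; the dictionary is only built here and would be passed to `transact_write_items(**params)` on a DynamoDB client. The condition on the ticket makes the whole transaction fail if someone else already holds it.

```python
def hold_ticket_and_create_order(ticket_id, order_id, customer_id):
    """Build one atomic write: mark the ticket as held and record the order."""
    return {
        "TransactItems": [
            {
                "Update": {
                    "TableName": "Tickets",
                    "Key": {"ticketID": {"S": ticket_id}},
                    "UpdateExpression": "SET #s = :held, heldBy = :c",
                    # Fails the whole transaction unless the ticket is free.
                    "ConditionExpression": "#s = :available",
                    # 'status' is a reserved word, so alias it.
                    "ExpressionAttributeNames": {"#s": "status"},
                    "ExpressionAttributeValues": {
                        ":held": {"S": "HELD"},
                        ":available": {"S": "AVAILABLE"},
                        ":c": {"S": customer_id},
                    },
                }
            },
            {
                "Put": {
                    "TableName": "Orders",
                    "Item": {
                        "orderID": {"S": order_id},
                        "ticketID": {"S": ticket_id},
                        "customerID": {"S": customer_id},
                    },
                    # Do not overwrite an existing order with the same ID.
                    "ConditionExpression": "attribute_not_exists(orderID)",
                }
            },
        ]
    }


params = hold_ticket_and_create_order("T-100", "O-1", "C-42")
print(len(params["TransactItems"]))
```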
You can connect to VPC from using VPC peering. You can also connect to VPC from VPC A. But VPC B can’t connect to VPC C. What’s likely the problem?
me: The question is not properly written
Answer from LinuxAcademy:
VPC peering is not transitive. VPC B needs to have a VPC peering connection to VPC C.
What are the key qualities of a Serverless architecture?
- Event driven
- Capable of scaling from very low capacity to very high capacity
- Only pay for what the demand requires.
me: the question implies Lambda-style serverless, not e.g. DynamoDB on-demand. DynamoDB is also a serverless database service.
When should you use on-demand?
When the application positively needs the capacity and the demand is not steady, e.g. spiky.
How many security groups can a subnet have?
Subnets don’t have security groups. Security groups are assigned to network interfaces.
You need to implement a solution for [social networking, knowledge graphs, fraud detection, recommendations]. What database solution should you choose?
Neptune is a graph DB that is highly scalable with 15 read replicas across three AZs in a region. It can hold up to 64 TB of data with encryption at rest.
You create a database credential using a secure string in Parameter Store. The client application needs to use this string to connect to a database. What permissions does the application need?
Access to Parameter Store and access to master key (CMK) used to encrypt the credentials in Parameter Store.
You have a web application that has UDP endpoints. What kind of ELB should you use?
NLB. It supports TCP, UDP, and TLS listeners. Classic LB and ALB do not support UDP.
What cryptography standards does Cloud HSM support?
PKCS#11, Java Cryptography Extensions (JCE), Microsoft CryptoNG (CNG). These are APIs. It does not provide any AWS APIs, meaning that it cannot integrate with other AWS services offering encryption.
CloudHSM also supports FIPS 140-2 at Level 3, which is higher than KMS. (Me: KMS is FIPS 140-2 at Level 2)
What are the three things you can do with a CloudFormation update Stack operation?
Create, Update or Delete resources
What are some ways to minimise costs with Athena?
Since customers pay for Athena by the amount of data it scans, reduce the amount of data scanned by partition data, organise it into columnar format to reduce the number of columns Athena sees during a query.
What data format does Athena support?
XML, JSON, CSV, TSV, Apache Avro, Apache ORC and Apache Parquet
me:
Avro is a row-oriented remote procedure call and data serialisation framework developed within Apache’s Hadoop project. It uses JSON for defining data types and protocols, and serialises data in a compact binary format.
What do VPC flow logs record?
What don’t they record?
- Metadata about network traffic into and out of the VPC, including address, port, protocol and more. Logs do not include information about DHCP, AWS DNS, or license activation requests. This is not a network monitor.
What are the various endpoints in Aurora?
The cluster endpoint refers to the master for read and write. The reader endpoint refers to all replicas as a cluster. Readers get directed to any replica, improving scalability of Aurora for read-intensive workloads. Instance endpoints refer to any specific instance. Custom endpoints allow you to configure groups of instances behind an endpoint.
me: Amazon Aurora typically involves a cluster of DB instances instead of a single instance. Each connection is handled by a specific DB instance. When you connect to an Aurora cluster, the host name and port that you specify point to an intermediary handler called an endpoint. Aurora uses the endpoint mechanism to abstract these connections. Thus, you don’t have to hardcode all the host names or write your own logic for load balancing and rerouting connections when some DB instances aren’t available.
For certain Aurora tasks, different instances or groups of instances perform different roles. For example, the primary instance handles all the data definition language (DDL) and data manipulation language (DML) statements. Up to 15 Aurora Replicas handle read-only query traffic.
What is a REST API?
They are uni-directional request/response calling patterns that use HTTP semantics with query string arguments. Behind the API, a service performs a task and returns the result to the caller.
What is the five-step approach to answering an exam question?
- Identify significant points in the question.
- Identify similar answers, but understand the differences.
- Look for disqualifying facts in answers, based on point 1.
- Eliminate any generally bad answers.
- Pick between remaining answers using judgement.
You want to create a CloudTrail that captures global events and performs the same behaviour in all regions without the risk of duplicate trails. You want to be sure that all of the accounts in your organisation do the same. What’s the easiest way to do this?
Configure a Trail with ‘Apply to all regions’ selected and ‘Apply to my Organisation’.
What are the three states a CloudWatch alarm can have?
OK, Alarm, Insufficient Data
How does S3 maintain versions of objects?
Only if versioning is enabled, S3 creates new versions with the same name (me: same name but with unique version IDs; every object stored in S3 has a unique ID whether it is versioned or not). Versions can live in different storage tiers in S3 using lifecycle policies. Delete actions mark objects for deletion.
me: when versioning is enabled, a simple DELETE cannot permanently delete an object. Instead, Amazon S3 inserts a delete marker in the bucket, and that marker becomes the current version of the object with a new ID. When you try to GET an object whose current version is a delete marker, Amazon S3 behaves as though the object has been deleted (even though it has not been erased) and returns a 404 error.
To permanently delete versioned objects, you must use DELETE Object with the versionId.
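The delete-marker behaviour can be modelled with a small in-memory sketch. This is a toy model, not the S3 API: the class, its method names, and the `v1`, `v2` style version IDs are all invented for illustration.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (not the real S3 API)."""

    def __init__(self):
        self._versions = {}            # key -> list of (version_id, body)
        self._ids = itertools.count(1)

    def _append(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key, body):
        return self._append(key, body)

    def delete(self, key, version_id=None):
        if version_id is None:
            # Simple DELETE: nothing is erased, a delete marker
            # (body None) becomes the current version.
            return self._append(key, None)
        # DELETE with a version ID permanently removes that one version.
        self._versions[key] = [
            v for v in self._versions.get(key, []) if v[0] != version_id
        ]
        return version_id

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is None:
            return 404                 # missing, or current version is a marker
        return versions[-1][1]


bucket = VersionedBucket()
bucket.put("cat.jpg", b"whiskers")
marker = bucket.delete("cat.jpg")          # adds a delete marker
print(bucket.get("cat.jpg"))               # behaves as deleted
bucket.delete("cat.jpg", version_id=marker)
print(bucket.get("cat.jpg"))               # earlier version is current again
```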
What are some of the constraints of a DX connection?
Formerly, Public VIFs were limited to a region
Formerly, Private VIFs were attached to a VGW (virtual private gateway), which is associated with a VPC
Now using BGP, the Public VIF advertises all public zone AWS service endpoints in all regions.
*Public zone endpoints do not require or even allow a DX to access the internet
Now, using DX gateway, customers can use a single Private VIF to a DXGW, then connect to any VGW in any region; the DXGW uses BGP to advertise the networks it can access back to the VIF, reducing admin overhead.
*These Private VIFs are not transitive.
What service can you use when your data security requirements need FIPS 140-2?
KMS or CloudHSM comply with FIPS 140-2.
Note: KMS provides only AWS API access. It does not provide industry standard API access. By contrast, CloudHSM provides only the industry standard APIs PKCS#11, JCE (Java Cryptography Extensions) and CNG (Microsoft CryptoNG), and it is not integrated with AWS services to provide crypto services like KMS does.
Describe the seven different instance types and what each best supports.
DR Mc GIFT PX
Dr. Mc Gift PX
DR Mc FIGHT PX
You want to be able to have multiple applications receive messages from a queue. You also want to allow applications to replay messages from an earlier time. Your team wants to use SQS. What do you suggest?
SQS does not support multiple consumers of the same messages on a queue, nor does it support replay. You should recommend Kinesis Data Streams for this requirement.
Your Kinesis Data Analytics (SQL) application requires input data to contain certain fields. What can you do to solve that?
Use a Lambda function pre-processor on the KDA application. The Lambda function can inspect each message and fill any missing fields with default values.
Where do Lambda functions live in the AWS network?
me: AWS Public zone or in your VPC.
By default, they live in a region but outside the customer’s VPC, and thus have access to the internet. Customers can also configure VPC Lambda functions, where all the networking restrictions of the subnet and security group apply. The VPC Lambda function still runs in a sandbox outside of the VPC and exposes itself through an ENI in the VPC. For this reason, cold starts are even slower.
New: RemoteNAT enables multiple VPC Lambda functions to share the same network interface in a VPC, thus speeding up the start time.
What are some of the design goals that a memory cache like ElastiCache can provide?
Reduce latency for database lookups, improved availability if the cache is HA, stateless micro service design support by offloading state to a cache, and reduced costs/impact on databases by offloading cache-able reads to the memory cache.
What are some of the CloudFormation Update behaviours for: EC2, RDS, AutoScaling Group, EBS?
- Update with no interruption
AWS CF updates the resource without disrupting operation of that resource and without changing the resource’s physical ID. For example, if you update any property on an AWS::CloudTrail::Trail resource, AWS CF updates the trail without disruption.
- Update with some interruption
AWS CF updates the resource with some interruption and retains the physical ID. For example, if you update certain properties on an AWS::EC2::Instance resource, the instance might have some interruption while AWS CF and Amazon EC2 reconfigure the instance.
- Replacement
AWS CF recreates the resource during an update, which also generates a new physical ID. AWS CF creates the replacement resource first, changes references from other dependent resources to point to the replacement resource, and then deletes the old resource. For example, if you update the Engine property of an AWS::RDS::DBInstance resource type, AWS CF creates a new resource and replaces the current DB instance resource with the new one.
AutoScaling Group: no interruption
You can use the AutoScalingRollingUpdate policy to control how AWS CF handles rolling updates for an Auto Scaling group. This common approach keeps the same Auto Scaling group, and then replaces the old instances based on the parameters that you set.
EBS: change instance type - replacement
Change size, interruption
EC2:
Resizing - interruption
Change type - replacement
Moving to a different AZ, replacement
What is the role of a standby in Aurora DB?
There is no standby in Aurora. Replicas, up to 15 per region, serve as instances that can be promoted to master.
What is envelope encryption?
KMS facilitates envelope encryption: a CMK encrypts a data encryption key (DEK), which in turn encrypts the data. To decrypt, the encrypted DEK is first decrypted using the CMK, and the plaintext DEK is then used to decrypt the data.
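The flow can be sketched locally with a toy cipher. Everything here is an assumption made for illustration: the XOR keystream built from SHA-256 is NOT secure and is not what KMS uses, and the master key is a local byte string, whereas a real CMK never leaves KMS. The point is only the order of operations.

```python
import hashlib
import secrets


def _toy_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (XOR with a SHA-256 counter keystream). NOT secure."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


def envelope_encrypt(master_key: bytes, plaintext: bytes):
    dek = secrets.token_bytes(32)              # fresh data encryption key
    ciphertext = _toy_xor(dek, plaintext)      # DEK encrypts the data
    encrypted_dek = _toy_xor(master_key, dek)  # master key encrypts the DEK
    # Store both together; the plaintext DEK is discarded.
    return encrypted_dek, ciphertext


def envelope_decrypt(master_key: bytes, encrypted_dek: bytes, ciphertext: bytes):
    dek = _toy_xor(master_key, encrypted_dek)  # step 1: recover the DEK
    return _toy_xor(dek, ciphertext)           # step 2: decrypt the data


master = secrets.token_bytes(32)               # stand-in for the CMK
wrapped_key, blob = envelope_encrypt(master, b"cat medical records")
print(envelope_decrypt(master, wrapped_key, blob))
```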
What’s the ideal scope of a Lambda function?
A small function that does something narrow and well. Accept an input, and produce an output.
me: just like a RESTful function in a micro-service architecture - do one thing, and do it well.
Someone deleted a role to access your RDS database. What tool should you use to see who did that?
CloudTrail events log all API activities in your account. The user who deleted an IAM role will appear there as an event.
What are some options for HA using Direct Connect?
Lower cost option: One DX, one VPN over internet
Better performance option: two DX — separate customer routers, separate links to DX location, separate DX location routers (me: have a physically separated redundant DX link)
You’re designing a voting system and recording the votes in DynamoDB. You expect certain locales to be much busier than others. But you need to use a partition key of postal code to organise your data. What can you do to manage hot spots in your partitions?
DynamoDB enables Adaptive Capacity by default. This allocates RCU/WCU dynamically across partitions while still staying below the total hard limit of the table.
Me: Instant Adaptive Capacity (new 2019)
- Scalability and performance even for imbalanced workload:
* Dynamic partitioning for storage and throughput
* Automatic isolation of frequently accessed items
* Automatic boosting if table is consuming less than provisioned.
Note: max capacity (high water mark) will be automatically increased (on-demand model)
What are the seven states of a Step Function state machine?
Task, pass, choice, fail, succeed, parallel, wait.
What is a web socket API?
A persistent session-oriented protocol/API best suited for data streaming, interactive applications, event management applications.
What makes a subnet public?
The VPC must have an IGW, and the subnet must have a default route to the IGW by associating with a VPC route table that has a route to the IGW.
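The rule above can be written as a small check over a subnet's route table. The route representation (a list of dictionaries with `destination` and `target`) is a hypothetical shape chosen for this sketch; it relies only on the fact that internet gateway IDs start with `igw-`.

```python
def is_public_subnet(routes):
    """True if the route table has a default route pointing at an IGW."""
    default_destinations = ("0.0.0.0/0", "::/0")
    return any(
        r["destination"] in default_destinations and r["target"].startswith("igw-")
        for r in routes
    )


public_rt = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-0abc1234"},   # made-up ID
]
private_rt = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "nat-0def5678"},   # made-up ID
]
print(is_public_subnet(public_rt), is_public_subnet(private_rt))
```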
How can CloudTrail events integrate with CloudWatch?
Configure a Trail to stream to a CloudWatch Log Group
What resources can an account share to other accounts using Resource Access Manager (RAM)?
Subnets, Transit Gateways, resolver rules, license configurations.
Accounts cannot share subnets inside of the default VPC, nor use subnets that are owned by the owner of the resource. Likewise, a share cannot share security groups not owned by the resource owner. The resource owner can remove sharing from resources in use by others and those shares will continue until released.
What are the appropriate instance types for various EMR scenarios?
Scenario: Master node / Core nodes / Task nodes
Long-running clusters and data warehouses: On-Demand / On-Demand or instance-fleet mix / Spot or instance-fleet mix
Cost-driven workloads: Spot / Spot / Spot
Data-critical workloads: On-Demand / On-Demand / Spot or instance-fleet mix
Application testing: Spot / Spot / Spot
What’s the best way to handle the base load of an application?
Use reserved instances in the necessary AZs
How is this snippet from a CloudFormation template not reusable?
“BucketName”: “lapix12345”
It’s specifying a value that needs to be globally unique. Thus, any use of the template after the first run will not be able to create the bucket since it will be a duplicate name.
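One way to make the snippet portable, sketched here as a template built in Python and printed as JSON: leave BucketName out so CloudFormation generates a unique physical name, and expose that name as an output. The logical ID `DataBucket` and the output name are arbitrary choices for this example.

```python
import json


def portable_bucket_template():
    """Template with no hard-coded bucket name, so it can be reused anywhere."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "DataBucket": {
                # No BucketName property: CloudFormation generates a unique
                # physical name for each stack created from this template.
                "Type": "AWS::S3::Bucket"
            }
        },
        "Outputs": {
            # Ref on a bucket resource resolves to the generated bucket name.
            "BucketName": {"Value": {"Ref": "DataBucket"}}
        },
    }


print(json.dumps(portable_bucket_template(), indent=2))
```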
What evaluations does IAM consider when determining a principal’s effective permissions?
Organisational boundaries->user/role boundaries->role policies->effective permissions
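Me: a minimal sketch of that chain, assuming each layer can be modelled as a plain set of allowed actions. Real IAM evaluation also handles explicit denies, resource policies and session policies, which are ignored here. The function and variable names are hypothetical.

```python
def effective_permissions(org_scp, permissions_boundary, identity_policy):
    """Return the actions allowed by every layer (simplified model)."""
    # An action survives only if each layer in the chain allows it.
    return org_scp & permissions_boundary & identity_policy


scp = {"s3:GetObject", "s3:PutObject", "ec2:StartInstances"}
boundary = {"s3:GetObject", "s3:PutObject"}
role_policy = {"s3:GetObject", "ec2:StartInstances"}

print(sorted(effective_permissions(scp, boundary, role_policy)))
```

Only s3:GetObject is allowed by all three layers, so it is the only effective permission in this example.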
What are the four key terms for ECS?
Cluster - the collection of resources ECS can use to run your containers
Service - ECS or Fargate runtime responsible for managing container task
Task definition - a configuration file that tells ECS what containers it should create and how they interact with the outside world;
Container definition - a configuration file which is used in a task definition to describe the different containers that are launched as part of a task
You want to revert to a previous configuration of an environment in Elastic Beanstalk. What’s the best way to do that?
Save all your configurations. Then, you can revert to a saved configuration.
How can CloudTrail events integrate with CloudWatch?
By configuring a Trail to stream to a CloudWatch Log Group
How can you create a private CDN distribution using CloudFront?
Use a Behaviour that restricts viewer access based on signed URLs or signed cookies.
What’s the most reliable way to delete physical resources from a CloudFormation stack?
Use CF to delete the stack completely. It uses the Logical/Physical resource mapping to track all resources it created so that it can delete them using the template.
How can you avoid message fees incurred by using IoT Topics?
Publish messages to $aws/rules/<rule name>, which sends each message directly to an IoT Rule without the pub/sub features of IoT Topics.
Due to CCPA guidelines, you want to ensure that IAM Sales_Managers don’t see customers’ name and address information except zip code and state in your DynamoDB database. What can you do to restrict viewing of these attributes in the Customers table?
Create an IAM policy that Allows the actions on the DynamoDB table that the Sales_Managers (and others who adopt this policy) need. Add a Condition on dynamodb:Attributes that lists only the attributes they are allowed to see (for example zip code and state), and require dynamodb:Select to be SPECIFIC_ATTRIBUTES so that a request cannot ask for all attributes.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem",
        "dynamodb:Query",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
        "dynamodb:BatchWriteItem"
      ],
      "Resource": ["arn:aws:dynamodb:*:*:table/table-name"],
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:Attributes": [
            "column-name-1",
            "column-name-2",
            "column-name-3"
          ]
        },
        "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"}
      }
    }
  ]
}
How does DynamoDB implement Global Tables?
It uses Streams to replicate all changes made to one table to the replica tables in other regions. You configure a Global Table by enabling Streams (Old and New), then adding the Regions where you want replicas. All tables are masters and stream their changes to the other replicas.
What types of identity federation can you employ?
Web Identity Federation, SAML and cross account trust
If your application requires very fast failover in the event of an AWS AZ failure, which Aurora model is best?
Aurora provisioned with read replica instances will promote to Master within 1 minute. Aurora serverless needs to create ACU, Proxy and connections in the new AZ, which will take longer.
ACU: Aurora Capacity Unit
Aurora Serverless and Failover:
If the DB instance for an Aurora Serverless DB cluster becomes unavailable or the Availability Zone (AZ) it is in fails, Aurora recreates the DB instance in a different AZ. We refer to this capability as automatic multi-AZ failover.
This failover mechanism takes longer than for an Aurora Provisioned cluster. The Aurora Serverless failover time is currently undefined because it depends on demand and capacity availability in other AZs within the given AWS Region.
Me: when you work with Aurora without Aurora Serverless (provisioned DB clusters), you can choose your DB instance class size and create Aurora Replicas. This model works well when the database workload is predictable, because you can adjust capacity manually based on the expected workload.
With Aurora Serverless, you can create a database endpoint without specifying the DB instance class size. You set the minimum and maximum capacity. With Aurora Serverless, the database endpoint connects to a proxy fleet that routes the workload to a fleet of resources that are automatically scaled. Because of the proxy fleet, Aurora Serverless manages the connections automatically. Scaling is rapid because it uses a pool of “warm” resources that are always ready to service requests. Storage and processing are separate, so you can scale down to zero processing and pay only for storage.
Aurora Serverless introduces a new serverless DB engine mode for Aurora DB clusters. Non-Serverless DB clusters use the provisioned DB engine mode.
How can a CloudFormation template allow for a range of values for a resource option that users need to choose when launching the template?
Reference a parameter in the resource option. In the Parameters section of the template, enumerate all the valid values for that parameter. Optionally, provide a default so that the user does not have to make a choice.
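Me: one way to express this is a template parameter with AllowedValues and a Default, referenced from the resource with Ref. The sketch below builds such a template as a Python dict and prints it as JSON. The parameter name, logical ID, instance types and AMI ID are placeholders, not values from any real stack.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "InstanceTypeParam": {  # hypothetical parameter name
            "Type": "String",
            # The user may only pick one of these values at launch.
            "AllowedValues": ["t3.micro", "t3.small", "t3.medium"],
            # Used when the user makes no choice.
            "Default": "t3.micro",
        }
    },
    "Resources": {
        "WebServer": {  # hypothetical logical ID
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-00000000000000000",  # placeholder AMI ID
                "InstanceType": {"Ref": "InstanceTypeParam"},
            },
        }
    },
}

print(json.dumps(template, indent=2))
```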
Your customer needs to transfer file archives from their data centre to AWS for Amazon Glacier. You find out that the total file size adds up to 7TB. They want the files in place on AWS in the next three weeks. What transfer method will you recommend?
- Snowball >= 10TB
- SnowEdge >= 10 TB with Edge compute
- Snowmobile >= 100TB, arrange.
File Gateway. Snowball becomes economical only from 10TB or higher and when time is short. In this case, the customer has time and the amount of data is less than 10TB.
Me: this recommendation assumes that the link bandwidth is big enough and the file transfer does not impact all other activities on the link.
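Me: a rough check of what "big enough" means here, assuming decimal terabytes and a link that can be used around the clock for the whole three weeks. The helper name is hypothetical.

```python
def required_mbps(total_bytes, days):
    """Sustained megabits per second needed to move total_bytes in `days`."""
    seconds = days * 24 * 60 * 60
    return total_bytes * 8 / seconds / 1_000_000


seven_tb = 7 * 10**12  # assumes decimal terabytes
print(round(required_mbps(seven_tb, 21), 1))
```

That works out to roughly 31 Mbit/s sustained, before any protocol overhead or competing traffic on the link.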
You want to query a cats table based on appointment type. The partition key is catID and the sort key is cat name. How can you efficiently query based on another column?
To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the base table, but they are organised by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table. It doesn’t even need to have the same key schema as a table. (me: The global secondary index creates a new table that is hidden)
When should a customer use a VPN rather than DX?
- When speed of setup is critical; VPN takes minutes, DX takes days - weeks
- Cost can be lower for spiky or sporadic usage
- When network QoS is not critical. VPN performance depends on customer router CPU due to encryption; the ISP network connection over the internet is not consistent.
What is in a launch configuration?
(Me: launch configuration is immutable after saving)
- AMI,
- instance type,
- purchase options: on-demand, RI, Spot,
- IP addressing,
- user data,
- CW detailed monitoring option,
- amount and type of storage,
- key pair.
To make changes after saving, you must create a completely new launch config, then change the LC association in the AS group.
How can a resource in a private subnet access Amazon S3?
Configure a VPC Endpoint (Gateway type). Configure the endpoint to update the route table for the private subnet needing access. This avoids NAT gateways or egress-only IGWs, leaving the private subnet private.
What sizing options does Aurora offer?
Provisioned, parallel query, or Serverless
What does an Athena data catalog table include?
The table is a set of columns that have the name and data type of the fields in the data files you want to query. It also contains a pointer to the S3 location where these data files sit.
Your customer wants to get out of managing their messaging infrastructure. They use a number of standard APIs and protocols with their applications. They require that all communications with the messaging system be private, not over internet. They also want a highly available solution. What will you recommend?
Amazon MQ offers support for a number of standard protocols and is based on an open source project (Apache ActiveMQ), so the API is likely familiar to the customer. Unlike SQS and SNS, Amazon MQ is deployed with private endpoints in the customer’s VPC with no public access required for applications within the VPC. It’s also available in an active/standby configuration across multiple AZs in a region.
(Me: MQ is not fully integrated with AWS eco-system such as monitoring using CloudWatch, CloudTrails, etc. control plane or the data plane).
What are the four EBS storage types and what are they best suited to do?
- General Purpose gp2: SSD
Default for most workloads
Burst to 3,000 IOPS with credits
1GB - 16TB
- Provisioned IOPS io1: SSD
Mission critical, sustained IOPS
Suited for large databases
4GB - 16TB
Provisioned IOPS to 64,000
- Throughput Optimised st1: HDD
Low cost
Frequently accessed data, streams, media; not a boot volume
500GB - 16TB
500 IOPS
- Cold HDD sc1: HDD
Low cost
Infrequent access; not a boot volume
500GB - 16TB
250 IOPS
What does a container contain?
Applications and the required library versions
What’s the difference between tasks launched in EC2 vs. Fargate mode?
With EC2, you define the instance type and are responsible for the cluster. With Fargate, you don’t. You define tasks and let ECS/Fargate obtain the container hosts to run those tasks.
How can you configure a custom origin for CloudFront distribution over a private WAN connection?
You can’t. Any origin server must offer public access to CloudFront.
What does AWS Shield do beyond WAF?
Shield is a DDoS protection layer in front of WAF. Standard is free and always on (me: network layer DDoS protection). Advanced provides WAF, DDoS mitigations, visibility and reporting, DDoS response team support, and cost protection due to attacks. Shield also protects EIPs.
Where does your Redshift cluster live in the AWS network?
In a VPC you specify. By default, it uses the default VPC, but you can specify a customer VPC in your account. To access S3 data, you need to configure a NAT or internet gateway for the public S3 endpoint, or create a VPC endpoint for S3.
What are some things you can do in a CloudFormation template to ensure portability?
- using default values in parameter lists (avoiding human input),
- use ParameterStore for system and customer values,
- Pseudo Parameters for CF-wide values such as region, partition, account ID, stack ID and more,
- Intrinsic Functions (for AZs in a region, more),
- Don’t specify the PhysicalID of a resource (CF will create a unique one for you).
How does HVM improve performance of virtualised machines?
HVM uses newer generation of CPUs that allows guest OSs to interact with CPU, memory, network, local storage and the motherboard bypassing the hypervisor.
Unlike paravirtualisation, HVM avoids emulation, speeding up performance of guest OSs.
Next, AWS introduced Nitro in 2017, bringing hardware virtualisation to all aspects of the guest OS access to the hardware. This results in near bare metal performance.
What EC2 metrics can CloudWatch not see by default?
Memory, file system, applications. For these, customers need to install the CloudWatch Agent.
What are three common modes of creating an EMR cluster?
Long running - create, continue to run jobs, queries, host databases
Interactive - create, then log on using SSH and work from the console
Transient - create, run a job, terminate.
What is CORS?
Cross origin resource sharing. This is a security measure that a server can use to control what other servers can access resources on it (me: using http)
What are some of the features that Aurora serverless offers microservice/serverless applications that make it an attractive option as an RDBMS?
Very low latency connections and REST APIs for queries. Also it doesn’t involve cluster infrastructure running a customer’s environment.
How does EMR deploy cluster nodes in a VPC?
All nodes are in a single AZ, subnet and security group for minimum latency (me: and data movement cost - cross AZ data traffic will be charged at a price)
Where can you manage access and other configurations such as retention period and event filters, for CloudWatch Logs?
At the Log Group level which is a group of related Log Streams
How can you deliver /var/log log file events to CloudWatch Logs?
Install the CloudWatch Agent on the EC2 instance. Configure it to tail the logs in /var/log. It will pick up each entry in the log files and send them as Log events using a Log Stream.
When using OpsWorks, where does your application code live?
It lives outside of OpsWorks in a repository you specify, such as Git.
You provide the URL and the credentials to OpsWorks in the Apps recipe, and you specify the deployment targets using a deployment recipe.
What is a REST API?
They are uni-directional request/response calling patterns that use HTTP semantics with query string arguments. Behind the API, a service performs a task and returns the result to the caller.
How can applications access Aurora Serverless DB w/o connection strings?
Enable the Data API to expose the REST APIs of the database through the Proxy layer.
You’re designing an application that needs messages from an order entry form. The messages must be in the exact order they occurred, without duplicates. You expect the load to be around 6,000 messages per minute. What messaging option should you use?
SQS FIFO supports up to 300 messages per second with guaranteed ordering and no duplicates.
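Me: the reasoning is a unit conversion, sketched below. The 300 messages per second figure is taken from the card; the helper name is hypothetical, and the check uses the average rate, so it says nothing about short bursts.

```python
FIFO_LIMIT_PER_SECOND = 300  # figure quoted in the card


def fits_fifo(messages_per_minute, limit_per_second=FIFO_LIMIT_PER_SECOND):
    """True when the average per-second rate is within the limit."""
    return messages_per_minute / 60 <= limit_per_second


print(6000 / 60, fits_fifo(6000))
```

6,000 messages per minute is 100 per second on average, well inside the 300 per second limit.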
What network options does ElasticSearch offer?
ElasticSearch clusters run in a dedicated network not part of the customer’s account. For private VPC access, ElasticSearch will expose itself to the customer VPC using interface endpoints. The customer can assign a security group to the cluster. For public access, the cluster is accessible directly from the internet.
What are the various fault domains in AWS?
AZs, regions, global edge services
Your DocumentDB cluster is struggling to keep up with write demand. What are your options for improving performance?
Scale UP the instance type(s) of the cluster to a larger EC2 type. Since there is only one write instance, the only option is scaling UP.
What is the role of a standby in Aurora DB?
There is no standby in Aurora DB. You can have up to 15 read replicas in a region, and all are promotable to master instances.
What are the global and regional characteristics of S3?
S3 is a global service with region-specific presence. Buckets are globally available and unique, but objects live in a particular region. Each account has a limit of 100 buckets, but unlimited prefixes.
How can you protect your CloudFront distributions from malicious activity?
Configure WAF in front of the CF distribution
You select VPC1 for your ALB. You then select AZ1, AZ2, and AZ3. What does the ALB deploy?
The ALB Service is outside of the VPC across the regions. ALB will deploy an ALB node with the appropriate IP address in each AZ you’ve selected.
You’re not able to ping the host name of an instance in another VPC from a bastion in a different VPC. You have VPC peering configured, and all routes, NACLs and security groups are configured correctly. What else could be the problem?
Make sure Requestor and Acceptor DNS resolution is configured for the peering connection. This will ensure that the requestor doesn’t need to go over the Internet for the ping.
Me: (AWS 2016)
You can now enable resolution of public DNS host name to private IP addresses when queried from the peering VPC. This functionality is also supported cross-account so the two VPCs can be in different accounts.
You can enable DNS resolution support for VPC peering using AWS Management Console, AWS CLI, through SDKs.
How can you restore a database after deleting the Master?
Database (auto) backups enable you to restore a database even after the Master is gone. However, they have a retention period of at most 35 days, after which they will be deleted.
How does DocumentDB log its activities?
Using logs it exports to CloudWatch Logs using a service-linked role.
What is the max size of a DB that you can configure for an Aurora Serverless database?
You don’t configure capacity on Aurora Serverless. You configure Aurora Capacity Units (ACUs) which are 1 CPU and 2GB of RAM.
What is separation of roles in KMS?
Administrators of KMS such as key management may not have permissions to decrypt Keys or data. Users of KMS may not have admin privileges, but can ask KMS to encrypt or decrypt data.
How can you improve the chance of keeping the Spot instances you need for a workload?
Spot fleet
How can you have reference data to join in a Kinesis Data Analytics (SQL)?
Put reference data in S3 then define a reference table that enables the SQL query to treat the lookup data as a table.
What can you use to monitor data across your accounts from a single master account?
(Me: the question is not clear)
Guard Duty uses ML to monitor a number of AWS data sources, such as VPC flow logs, R53, CloudTrail, threat intel, CloudWatch events, account activity.
It generates findings into the Guard Duty console. A trusted IP list excludes these IPs from Guard Duty scanning. Threat lists tell GD additionally what to watch across all accounts.
How does OpsWorks fit in to the range of tools like CloudFormation and Elastic Beanstalk?
OpsWorks offers most of the control over deployments that CloudFormation offers, but still provides minimal config options. It offers Chef or Puppet as the deployment frameworks.
How can you ensure you can get the capacity you need when you need it?
On demand capacity reservations. No up-front, but you get the capacity when you need it.
(Me: there is no pricing discount. A billing discount applies to account billing in a region, whereas an On-Demand Capacity Reservation applies to a specific AZ. It can be cancelled at any time.)
Where does an EC2 instance reside?
In a region, an AZ, a subnet, behind a security group.
What platforms does Elastic Beanstalk support?
Docker, multi-container Docker, java, node.js, tomcat, python, ruby, .NET, Go, PHP, Docker Go, Docker GlassFish, Docker Python.
How do applications running on EC2 get their credentials to access other AWS services?
If the instance is running with an IAM role, the IAM role info is available in the instance metadata for the application to use.
Instance profile - if you use the console to create a role for EC2, the console automatically creates an instance profile and gives it the same name as the role.
If you manage via the CLI, you create roles and instance profiles as separate actions, so you must know the names of your instance profiles.
Instance Metadata - data about your instance that you can use to configure or manage the running instance. Instance metadata is divided into categories, for example, host name, events, and security groups.
Although you can only access instance metadata and user data from within the instance itself, the data is not protected by authentication or cryptographic methods. Anyone who has direct access to the instance, and potentially any software running on the instance, can view its metadata. Therefore, you should not store sensitive data, such as passwords or long-lived encryption keys, as user data.
How does a Lambda function get permission to access AWS resources?
Assign it an execution role appropriate for what it needs to access.
You perform a DynamoDB query using the partition key and a filter. You notice that the charges for your queries haven’t gone down in spite of the filter returning very few items. What could be the problem?
Query filters do not reduce the number of items that DynamoDB reads for the query results; the filter is applied after the items are read. Only key conditions, such as a condition on the sort key, do that.
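Me: a toy model of why the bill does not drop. It is not the DynamoDB API; it only assumes that a query reads every item under the partition key and applies the filter afterwards. The attribute names (pk, sk, status) are made up for the example.

```python
def query(items, partition_key, filter_fn):
    """Simplified model: read every item in the partition, then filter."""
    read = [i for i in items if i["pk"] == partition_key]  # this is what is charged
    returned = [i for i in read if filter_fn(i)]           # this is what comes back
    return len(read), returned


table = [{"pk": "cust1", "sk": n, "status": "open" if n == 0 else "closed"}
         for n in range(100)]

items_read, result = query(table, "cust1", lambda i: i["status"] == "open")
print(items_read, len(result))
```

In this model 100 items are read to return a single one, which is why narrowing the key condition is what reduces cost.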
What’s the easiest way to enable traffic between resources in a VPC?
Have them all use the same security group. That security group should have a rule that ALLOWS traffic from that same security group ID.
Protocol type, Protocol number, Ports, Source IP
All Traffic, All, All, ‘The Security Group ID’
When should you use Spot?
When the application can tolerate loss of an instance, such as big data systems, stateless web clusters, or dev/test or experiments where the time to finish is not critical.
What are the key qualities of a serverless architecture?
- Hardware platform or cluster and capacity scaling are fully managed,
- pay only for what you use (compute and resources),
- suitable for unknown usage pattern or spiky workload (non-steady workloads).
- Event driven.
What are the feature differences between ELB types?
Application Load Balancer
- Feature rich, layer 7 load-balanced platform
- Content-based routing allows requests to be routed to different applications behind a single load balancer: path based routing or host-based routing
- support for micro-services (Lambda) and container based applications, including deep integration with Amazon Elastic Container Service (Amazon ECS)
It is a best practice that you upload SSL Certificates to ACM. If you’re using certificate algorithms and key sizes that aren’t currently supported by ACM or the associated AWS resources, then you can also upload an SSL certificate to IAM using CLI.
Application Load Balancer
- (HA) Automatically scales capacity to handle the number of incoming requests
- (Health checks) ALB allows the user to specify a range of HTTP response codes that define instance health
- (Sticky session) ALB only supports the cookies generated by the load balancer
- (VPC support) Yes, but without EC2-classic
- (Dynamic Port Mapping) Yes, ALB supports dynamic port mapping using the EC2 Container Service
- (Supported protocol) HTTP, HTTPS, HTTP/2, WebSockets
- (CloudWatch metrics) Per port and path monitoring, Range HTTP response codes, Connection per hour, Overall traffic volume
- (Access Logs) ALB supports type of request (HTTP, HTTPS, HTTP/2, WebSockets), and the target Amazon Resource Name.
- (Backend Server AuthN) Supported by ALB
- (Deletion Protection) Supported by ALB
- (Path-Based Routing) Supported by ALB
Me: if you are building an API and want to leverage AuthN/Z, request validation, rate limiting, SDK generation, or a direct AWS service backend, use AWS API Gateway. If you want to add Lambda to an existing web app behind an ALB, you can now just add it to the needed route.
API gateway integrates with IAM natively, it has done all the heavy lifting for you.
ALB vs API Gateway:
https://serverless-training.com/articles/api-gateway-vs-application-load-balancer-technical-details/
How many slices do RedShift nodes have?
Two or sixteen for DC2 node type or four or sixteen for RA3 node type. The leader node distributes work to the slices. The Load or Copy commands get data from e.g. S3 and distribute it to the slices. Slices have dedicated storage and CPU capacity. When loading data, the leader node distributes it according to your distribution style configuration: all, even, key or auto.
You have a website that uses Adobe Flash. You want to improve performance by distributing that element from CloudFront. What kind of distribution should you use?
RTMP (Real-Time Messaging Protocol) is the only option that supports Adobe Flash.
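When you allocate 5 RCUs to a table, how much data can you read?
20 KB/sec with strongly consistent reads (one RCU covers one strongly consistent read per second of up to 4 KB). That can be five read operations of 4 KB or less, or fewer operations of more than 4 KB. Also, DynamoDB retains up to 300 seconds of unused capacity so that it is available for spikes.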
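Me: a rough check of the read-capacity arithmetic, assuming the documented rule that one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. The helper names are hypothetical.

```python
import math


def rcus_per_read(item_kb, strongly_consistent=True):
    """RCUs consumed by one read of an item of item_kb kilobytes."""
    units = math.ceil(item_kb / 4)  # reads are billed in 4 KB steps
    return units if strongly_consistent else units / 2


def max_kb_per_second(rcus, strongly_consistent=True):
    """Upper bound on data read per second with the given RCUs."""
    return rcus * 4 * (1 if strongly_consistent else 2)


print(max_kb_per_second(5), max_kb_per_second(5, strongly_consistent=False))
print(rcus_per_read(9))
```

With 5 RCUs this gives 20 KB per second strongly consistent and 40 KB per second eventually consistent; a single 9 KB item costs 3 RCUs to read with strong consistency.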
What’s a regional cache in CloudFront?
It’s a region-based cache of the origin server content. This is the first place a CDN server checks on an origin fetch. If this misses, then the fetch goes to the origin.
If you configure a single node Redshift cluster, where is the leader node? If you configure a two-node cluster, where is it?
If only a single node, the leader and compute nodes are co-located on a single instance. Greater than one, Redshift creates a dedicated leader node free of charge. So the two-node example will include three instances.
What are the three behaviour options for updating a resource in CloudFormation?
Update with no interruption (e.g. updating a CloudTrail property)
Update with some interruption (e.g. change an instance type)
Replacement (e.g. change the engine type of an RDS database)
How can you grant access to public website users to page content hosted in S3 static web hosting?
Create a bucket policy that allows principal:* (me: anonymous access)
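Me: a sketch of such a policy, built as a Python dict and printed as JSON. The bucket name and the Sid are placeholders; the policy grants read-only access to objects and nothing else.

```python
import json

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",  # placeholder statement ID
            "Effect": "Allow",
            "Principal": "*",              # anyone, including anonymous users
            "Action": "s3:GetObject",      # read-only access to objects
            "Resource": "arn:aws:s3:::example-bucket/*",  # hypothetical bucket
        }
    ],
}

print(json.dumps(bucket_policy, indent=2))
```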
What is the difference between HTTP redirect and forward (ALB)
With forward, the ALB forwards the HTTP request and arguments to the destination you’ve configured in the ALB forwarding rule. (Me: server side redirect)
With redirect, the ALB returns a different URL to the client browser that the browser then needs to use to make the request again. This is slightly slower than forwarding. (Me: client side redirect)
What services does Amazon Directory Service Include?
Simple AD, Microsoft Active Directory (AD), AD connector, Amazon Cloud Directory, and Amazon Cognito.
Amazon Cloud Directory enables you to build flexible cloud-native directories for organising hierarchies of data along multiple dimensions. While traditional directory solutions, such as Active Directory and other LDAP-based directories, limit you to a single hierarchy, Cloud Directory offers you the flexibility to create directories with hierarchies that span multiple dimensions. For example, you can create an organisational chart that can be navigated through separate hierarchies for reporting structure, location, and cost centre.
What’s the difference between a VM and a Container
VMs run on top of a hypervisor, abstracting the hardware.
VMs contain the OS and applications.
The isolation boundary is the VM.
Many VMs can run on the same hypervisor
Containers run on top of the OS but further isolate applications.
With a container engine, like Docker, applications and their dependencies can run isolated from each other on the same OS/VM.
Containers don’t have dedicated memory like a VM does, so you can pack more applications on hardware by using containers rather than just using VMs.
But, containers are not isolated from each other regarding security.
Containers can start very quickly compared to VMs: in seconds, sometimes milliseconds.
(Me: things are changing fast in AWS, e.g. Lambda runs inside a VM that is so light it can compete with containers in the resources it consumes and in cold start time, while keeping security isolation at the VM level)
What constraint does a NAT device address?
It provides a single IPV4 address for all devices behind it, saving on scarce IPV4 addresses. IPV6 doesn’t have this constraint, thus NAT isn’t as relevant in IPV6.
What are the four invocation methods for a Lambda@Edge function?
- Viewer Request
- Viewer Response
- Origin Request
- Origin Response
What can you do with object locking in S3
Legal holds and retention policies. This prevents objects from deletion.
Versioning must be enabled.
Object locks must be set at time of creating the bucket
me: To use Amazon S3 object lock, follow these basic steps:
1. Create a new bucket with Amazon S3 object lock enabled.
2. (Optional) Configure a default retention period for objects placed in the bucket.
3. Place the objects that you want to lock in the bucket.
4. Apply a retention period, a legal hold, or both, to the objects that you want to protect.
(Me: With Amazon S3 object lock, you can store objects using a write-once-read-many (WORM) model. You can use it to prevent an object from being deleted or overwritten for a fixed amount of time or indefinitely. Amazon S3 object lock helps you meet regulatory requirements that require WORM storage, or simply add another layer of protection against object changes and deletion.
- A retention period specifies a fixed period of time during which an object remains locked.
- A legal hold provides the same protection as a retention period, but it has no expiration date. Instead, a legal hold remains in place until you explicitly remove it. Legal holds are independent from retention periods.
An object version can have both a retention period and a legal hold, one but not the other, or neither.
Amazon S3 object lock works only in versioned buckets, and retention periods and legal holds apply to individual object versions. When you lock an object version, Amazon S3 stores the lock information in the metadata for that object version. Placing a retention period or legal hold on an object protects only the version specified in the request. It doesn’t prevent new versions of the object from being created. If you put an object into a bucket that has the same key name as an existing, protected object, Amazon S3 creates a new version of that object, stores it in the bucket as requested, and reports the request as completed successfully. The existing, protected version of the object remains locked according to its retention configuration.)
What is CORS?
Cross Origin Resource Sharing is a security measure that a server can use to control what other servers can access resources on it.
What platforms does Elastic Beanstalk support?
- Docker,
- multi-Container Docker,
- Docker Python,
- Docker GlassFish,
- Docker Go,
- tomcat,
- python,
- java,
- node.js,
- .NET,
- Go,
- PHP
- Ruby
What are the various traffic routing types available in R53?
- Simple
- Failover
- Weighted
- Geo location
- Geo-proximity
- Latency
- Multi-value answer.
Me:
geo-proximity - use when you want to route traffic based on the location of your resources and, optionally, shift traffic from resources in one location to resources in another.
multi-value - use when you want Route 53 to respond to DNS queries with up to eight healthy records selected at random.
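Me: the multi-value behaviour can be pictured with a small simulation. This is not Route 53 code; it only assumes the rule stated above (up to eight healthy records, chosen at random). The record layout and the function name are hypothetical, and the addresses come from a documentation range.

```python
import random


def multivalue_answer(records, max_records=8):
    """Return up to max_records healthy records chosen at random."""
    healthy = [r["ip"] for r in records if r["healthy"]]
    return random.sample(healthy, min(max_records, len(healthy)))


# 15 records, every third one marked unhealthy, leaving 10 healthy ones.
records = [{"ip": f"192.0.2.{n}", "healthy": n % 3 != 0} for n in range(1, 16)]
answer = multivalue_answer(records)
print(answer)
```

The answer never contains an unhealthy record and never more than eight entries, even though ten healthy records exist.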
You want to add more nodes to your Redshift cluster for more storage capacity. You want the shortest possible downtime for writes. You’d also like to change the node type to a smaller instance. What option should you use?
Create a new instance and switch over when it is done.
How can you get more capacity on an EBS volume?
Configure the volume to a larger size.
Me:
You can resize an EBS volume without downtime.
1. Login to your AWS console
2. Choose “EC2” from the services list
3. Click on “Volumes” under EBS menu
4. Choose the volume that you want to resize, right click on “Modify Volume”
The above steps will increase the “physical” size of the disk attached to the instance.
You’ll then need to log on to the instance to resize the partition of the disk and resize the file system on that partition.
How to deploy an EMR cluster in a VPC
An EMR cluster is deployed in a subnet of your choosing in your VPC to minimise latency. Security Groups will be added and they are fully managed by Amazon EMR.
When the cluster is launched, Amazon EMR adds security groups based on whether the cluster is launching into VPC private or public subnets.
Security Groups are managed by Amazon EMR.
To manage the cluster on a VPC, Amazon EMR attaches a network device to the master node and manages it through this device. If you modify this device in any way, the cluster will fail.
Create a cluster using the Amazon EMR console
- Open the console
- Choose Create Cluster
- Choose Go to advanced options
- In the Hardware Configuration section, for Network, select the VPC ID
- Select Subnet ID
- Configure a NAT instance and S3 endpoint if you haven’t already done so.
What options can you use to control access to S3 objects?
Resource policies, identity policies, and Access Control Lists (ACLs)
What are the network modes you can define in a container task?
Network Mode valid values:
- none
- bridge
- awsvpc
- host
Bridge: all containers on the host interact through an internal network on that host.
Host: map containers directly to host networking; e.g. map container port 80 to host port 80; you can only run one container on a host with the same host networking requirement.
awsvpc: map a VPC ENI to a container task. This is how Fargate works. If using EC2 mode, this can produce a lot of ENIs. Some EC2 instance types have ENI limits, so ECS may only be able to launch a few containers.
(The above may be outdated. Check out re:Invent 2019 on Fargate: there will only be one ENI in the VPC for Fargate.)
These networking types are for Linux. For Windows, the only networking type is NAT.
The awsvpc network mode gives Amazon ECS tasks the same networking properties as Amazon EC2 instances: when you use the awsvpc network mode in your task definitions, every task that is launched from that task definition gets its own elastic network interface (ENI) and a primary private IP address.
The task networking feature simplifies container networking and gives you more control over how containerised applications communicate with each other and with other services within your VPC.
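As an illustration of where the network mode is set, here is a minimal sketch of a task definition built as a Python dict in the shape accepted by the ECS RegisterTaskDefinition API. The family name, image and CPU/memory sizes are assumptions chosen for the example, not values from these notes.

```python
import json


def awsvpc_task_definition(family, image, container_port):
    """Build a small Fargate-style task definition that uses awsvpc networking."""
    return {
        "family": family,
        "networkMode": "awsvpc",  # one of: none, bridge, awsvpc, host
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "256",      # illustrative sizes only
        "memory": "512",
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                # With awsvpc the task has its own ENI, so only the
                # container port is declared (no host port remapping).
                "portMappings": [
                    {"containerPort": container_port, "protocol": "tcp"}
                ],
            }
        ],
    }


print(json.dumps(awsvpc_task_definition("web", "nginx:latest", 80), indent=2))
```

With boto3 this dict could be passed as `ecs.register_task_definition(**task_definition)`; changing `networkMode` is the only switch between the four modes listed above.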
What are the seven states of a Step Functions State machine?
- Task
- Wait
- Succeed
- Fail
- Pass
- Parallel
- Choice
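To see the seven state types side by side, here is a small sketch of a state machine definition in Amazon States Language, written as a Python dict. The state names, the Lambda ARN and the `$.status` field are hypothetical and exist only to give each state type something to do.

```python
import json

definition = {
    "Comment": "Illustrative only: touches each of the seven state types",
    "StartAt": "Prepare",
    "States": {
        "Prepare": {"Type": "Pass", "Result": {"attempts": 1}, "Next": "DoWork"},
        "DoWork": {
            "Type": "Task",
            # Hypothetical Lambda function ARN.
            "Resource": "arn:aws:lambda:us-east-1:111111111111:function:hypothetical-worker",
            "Next": "CheckResult",
        },
        "CheckResult": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "OK", "Next": "FanOut"},
                {"Variable": "$.status", "StringEquals": "RETRY", "Next": "BackOff"},
            ],
            "Default": "GiveUp",
        },
        "BackOff": {"Type": "Wait", "Seconds": 30, "Next": "DoWork"},
        "FanOut": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "BranchA", "States": {"BranchA": {"Type": "Pass", "End": True}}},
                {"StartAt": "BranchB", "States": {"BranchB": {"Type": "Pass", "End": True}}},
            ],
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
        "GiveUp": {"Type": "Fail", "Error": "WorkFailed", "Cause": "status was neither OK nor RETRY"},
    },
}

state_types = sorted({state["Type"] for state in definition["States"].values()})
print(state_types)
print(json.dumps(definition)[:60] + "...")
```

The JSON form of `definition` is what would be supplied as the state machine definition when creating it.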
How does a Lambda function get permission to access AWS resources?
Assign it an execution role appropriate for what it needs to access.
You want to reduce the number of Classic Load Balancers you’re running. You have several websites. What can you do?
Replace the Classic Load Balancers with a single ALB and configure:
- Forwarding rules for each of the different websites.
- An SSL cert for each website you support.
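The two configuration items above might look like this in code. This is a sketch that only builds the keyword arguments for the ELBv2 `create_rule` and `add_listener_certificates` calls; the helper names and all ARNs and hostnames are hypothetical.

```python
def host_rule(listener_arn, priority, hostname, target_group_arn):
    """Arguments for create_rule: forward one hostname to one target group."""
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [
            {"Field": "host-header", "HostHeaderConfig": {"Values": [hostname]}}
        ],
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


def extra_certificates(listener_arn, certificate_arns):
    """Arguments for add_listener_certificates: one extra cert per website."""
    return {
        "ListenerArn": listener_arn,
        "Certificates": [{"CertificateArn": arn} for arn in certificate_arns],
    }


# Typical use (needs boto3 and credentials; every ARN here is a placeholder):
#   elbv2 = boto3.client("elbv2")
#   elbv2.create_rule(**host_rule(listener, 10, "shop.example.com", shop_tg))
#   elbv2.create_rule(**host_rule(listener, 20, "blog.example.com", blog_tg))
#   elbv2.add_listener_certificates(**extra_certificates(listener, [blog_cert]))
```

Each website gets its own rule priority and target group, while all of them share the one HTTPS listener.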
How can you improve the chance of keeping the Spot Instances you need for a workload?
Provision a Spot Fleet.
How does a document DB achieve relationships as an RDBMS does with separate tables and reference keys?
Documents can contain embedded objects like sub-documents, lists and arrays, similar to how an RDBMS schema would have these things as separate tables with reference keys to join them.
You’re using Kinesis Data Streams to deliver a stream of data to applications. You notice that you’re not getting the capacity you configured across three shards: throughput is well under 3 MB per second. Your producing application is using PutRecord (not the KPL) to write records. What could be the problem?
Your partition keys may not be spreading records evenly across the three shards (for example, too few distinct keys are in rotation). Thus, you’re maxing out one shard, but under-using the others.
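The effect of partition keys on shard usage can be sketched locally. Kinesis maps a partition key to a 128-bit integer with an MD5 hash and routes the record to the shard whose hash key range contains it. The sketch below assumes the three shards split the hash space evenly (as in a freshly created stream that has not been resharded); the function names and key values are made up.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for_key(partition_key, shard_count):
    """Index of the shard that owns this key, assuming evenly split ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def shard_load(partition_keys, shard_count):
    """Count how many records land on each shard."""
    load = [0] * shard_count
    for key in partition_keys:
        load[shard_for_key(key, shard_count)] += 1
    return load


# One repeated key: every record lands on the same shard.
print(shard_load(["device-1"] * 3000, 3))
# Many distinct keys: records spread across all three shards.
print(shard_load([f"device-{i}" for i in range(3000)], 3))
```

A high-cardinality partition key (or an explicit hash key per record) is what lets the stream reach the combined capacity of all its shards.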
Me:
Advantage of Using the KPL
The KPL can be used synchronously or asynchronously; the async mode provides higher performance.
The KPL implements the complex logic so that you don’t have to.
The KCL makes it easy for consumer-side developers (Java).
Producer monitoring using CloudWatch: the KPL emits throughput, error, and other metrics to CloudWatch, and can be configured to monitor at the stream, shard, or producer level.
The KPL is not the same as the AWS SDK. The AWS SDK works directly with the Kinesis Data Streams APIs.
Each shard can support writes of up to 1,000 records per second.
Max size of a data blob (the payload before base64 encoding) within one record is 1 MB.
You are configuring a Classic Load Balancer but need to allow a few different HTTP response codes for success. How can you do that?
Use an ALB and enumerate the various HTTP response codes for the health check. A CLB health check cannot take a list of success codes; for HTTP/HTTPS checks it only treats a 200 response as healthy.
(Me: CLB supports HTTP, HTTPS and TCP)
Before you start using Elastic Load Balancing, you must configure one or more listeners for your Classic Load Balancer. A listener is a process that checks for connection requests. It is configured with a protocol and a port for front-end (client to load balancer) connections, and a protocol and a port for back-end (load balancer to back-end instance) connections.
CLB supports
- HTTP
- HTTPS (secure HTTP)
- TCP
- SSL (secure TCP)
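For the ALB side of this card, the accepted response codes are set on the target group’s health check. The sketch below only builds the keyword arguments for the ELBv2 `modify_target_group` call; the helper name, the path and the ARN are hypothetical.

```python
def health_check_settings(target_group_arn, path, success_codes):
    """Arguments for modify_target_group with several accepted HTTP codes."""
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": path,
        # The matcher takes a comma-separated string such as "200,301,302".
        "Matcher": {"HttpCode": ",".join(str(code) for code in success_codes)},
    }


# Typical use (needs boto3 and credentials; the ARN is a placeholder):
#   elbv2 = boto3.client("elbv2")
#   elbv2.modify_target_group(**health_check_settings(tg_arn, "/health", [200, 301, 302]))
```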
What resources can an ALB target for traffic?
Instances, IP addresses or a Lambda function.
Instance or IP addresses can include EC2 instances, ECS or EKS containers.
How do EC2 instances, on-premise VMs and servers become manageable by SSM?
- By installing Systems Manager Agent on these VMs.
- Applying appropriate IAM permissions (EC2 only)
- Activations for On-premise servers.
What network options does Elasticsearch offer?
Elasticsearch clusters run in a dedicated network that is not part of the customer’s account. For private VPC access, Elasticsearch exposes itself to the customer VPC using interface endpoints. The customer can assign a security group to the cluster. For public access, the cluster is accessible directly from the internet.
Me:
- Cannot be hosted in your VPC (or AWS’ public zone, like S3)
- interface endpoint for private access
- customer can assign security group to the cluster
- internet for public access
What are some advanced features of Amazon MQ that SNS/SQS does not have?
- Reliable ordered messaging
- Message groups
- Composite messaging (Me: queue and topic? Or SNS + SQS)
And more …
Me:
- Easy to migrate existing messaging services
- Supports existing protocols
Amazon MQ (Apache ActiveMQ) has two main concepts: topics and queues. With a queue, you can have multiple consumers and each message will be delivered once; if there are no consumers when the message arrives, it sits in the queue until a consumer arrives. With a topic, you can have multiple consumers and each message will be delivered to each consumer, but if a consumer is offline when a message arrives they miss it.
SQS offers serverless queues - you don’t have to pay for the infrastructure, just the messages you send and receive.
SNS is comparable to serverless topics. It will notify your services when a message arrives, but if you’re offline you can miss it. SNS can feed into SQS, so if you have some service that may be up and down, you can guarantee it gets SNS messages by queueing them in SQS for it to consume on its schedule.
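The queue/topic difference, and why feeding a topic into a queue protects an intermittent consumer, can be shown with a toy in-memory model. This is not how SQS, SNS or Amazon MQ are implemented; the class and variable names are invented for the illustration.

```python
from collections import deque


class Queue:
    """Each message is held until a consumer takes it, and is delivered once."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class Topic:
    """Each message is pushed to every subscriber present at publish time."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


topic = Topic()
online_inbox = []
topic.subscribe(online_inbox.append)

# Fan-out: a queue subscribed to the topic buffers messages for a service
# that is currently offline.
buffer_for_flaky_service = Queue()
topic.subscribe(buffer_for_flaky_service.send)

topic.publish("order-1")

late_inbox = []
topic.subscribe(late_inbox.append)  # subscribed too late to see order-1
topic.publish("order-2")

print(online_inbox)                        # received both messages
print(late_inbox)                          # missed order-1
print(buffer_for_flaky_service.receive())  # order-1 was kept in the queue
```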
What are the three components of Amazon Elasticsearch?
- Elasticsearch (Lucene)
- Logstash or Beats, and
- Kibana for visualisation
Me:
Logstash: collects, parses and transforms logs.
Beats: lightweight data shippers that you install as agents on your servers to send specific types of operational data to Elasticsearch. Logstash has a larger footprint, but provides a broad array of inputs.
Your customer wants to transfer 120TB of data quickly to Amazon S3. What method will you recommend?
Two 80TB Snowballs daisy-chained together.
Each Snowball can hold 50 or 80TB. Daisy-chaining avoids having to partition the source data and instead treats the two Snowballs as a single storage unit of 160TB.
50TB — US region only.
Question: how many snowball devices can be daisy-chained together?
Where can you manage access and other configurations, such as retention period and event filters, for CloudWatch Logs?
At the CloudWatch Log Group level, which is a group of related Log Streams.
You want to deliver all CloudTrail events to CloudWatch Logs in a security account to make sure they’re safe and can’t be deleted. How can you do this?
You can have CloudTrail deliver log files from multiple AWS accounts into a single Amazon S3 bucket. For example, you have four AWS accounts with IDs 111111111111, 222222222222, 333333333333, and 444444444444, and you want to configure CloudTrail to deliver log files from all four of these accounts to a bucket belonging to account 111111111111. Steps are:
- Turn on CloudTrail in the account where the destination bucket will belong (111111111111 in this example). Do not turn on CloudTrail in any other accounts yet.
- Update the bucket policy on your destination bucket to grant cross-account permissions to CloudTrail.
- Turn on CloudTrail in the other accounts you want (222222222222, 333333333333, 444444444444 in this example). Configure CloudTrail in these accounts to use the same bucket belonging to the account that you specified in step 1 (111111111111 in this example).
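The bucket policy from the second step can be sketched as follows. This is a minimal version built as a Python dict, assuming the default `AWSLogs/<account-id>/` key layout with no extra prefix; the function name and bucket name are hypothetical.

```python
import json


def cloudtrail_bucket_policy(bucket, account_ids):
    """Bucket policy letting CloudTrail write logs for several accounts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSCloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": {"Service": "cloudtrail.amazonaws.com"},
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AWSCloudTrailWrite",
                "Effect": "Allow",
                "Principal": {"Service": "cloudtrail.amazonaws.com"},
                "Action": "s3:PutObject",
                # One log folder per account that delivers to this bucket.
                "Resource": [
                    f"arn:aws:s3:::{bucket}/AWSLogs/{account_id}/*"
                    for account_id in account_ids
                ],
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
        ],
    }


policy = cloudtrail_bucket_policy(
    "hypothetical-central-trail-bucket",
    ["111111111111", "222222222222", "333333333333", "444444444444"],
)
print(json.dumps(policy, indent=2))
```

Adding another account later means adding one more `AWSLogs/<account-id>/*` resource before turning on CloudTrail in that account.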
If you have created an organisation in AWS Organizations, you can create a trail that will log all events for all AWS accounts in that organisation. This is sometimes referred to as an organisation trail. You can also choose to edit an existing trail in the master account and apply it to an organisation, making it an organisation trail. Organisation trails log events for the master account and all member accounts in the organisation.
How does a Lambda function get permission to access AWS resources?
Assign it an execution role appropriate for what it needs to access.
When querying data using Athena, the select statement needs to refer to a table. What is that table?
It’s one of the schema definitions that describe the columns and data types of the data. You create them using the Athena interface, API or CLI, or an AWS Glue crawler can create them.
What service can you use when your data security requirements need FIPS 140-2?
1. AWS KMS (AWS API access only)
2. CloudHSM (industry-standard APIs/protocols such as PKCS #11)
What is a CloudWatch Logs Metric Filter?
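It matches log events in a log group against a text pattern and turns the matches into a CloudWatch metric (for example, a count of error lines), which you can then graph or alarm on.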
What is a CMK?
Customer Master Key. This is the core of KMS.
When a data key is encrypted or decrypted, the operation runs inside KMS. The master key is kept in KMS and cannot be exported.
A CMK is used for envelope encryption: the master key encrypts data keys, and the data keys encrypt the data.
Note: Redshift employs more layers of data protection, but the concept is the same.
KMS keys are regional and cannot be shared across regions. E.g. data encrypted with a key in one region cannot be decrypted in another region.
S3 cross-region replication of encrypted objects uses more complex key management to transfer the data safely.
How does S3 charge for usage?
me: Based on the object storage type, size and outbound data transfer volume (no inbound data transfer fee)
Based on the storage tier
1. Standard: storage (billed per GB-month, no minimum duration), requests (PUT/GET), and transfer out.
2. IA: same as Standard, except for a 30-day minimum storage duration, a 128 KB minimum billable object size, and a fee for retrieval.
3. One Zone-IA (reduced resilience, single AZ): same as IA, but cheaper.
4. Intelligent-Tiering: for a fee, it moves your objects to the optimum price tier. You still pay for the tier each object is in.
Lifecycle rules can do this according to your explicit policies. Lifecycle rules can only go in one direction, not back and forth like intelligent Tiering.
What is the difference between updating a stack and using a change set?
The workflow for updating a stack is no different from creating a stack: the user defines the resource updates, then executes the new/modified stack template.
Change Sets allow a junior developer to create the template for a change, then simply save it and notify others of the proposed change, without having permission to run the Change Set. A senior developer can review the Change Set, then run it if approved.
me: Updating a stack directly does not provide any release management control, such as a review-and-approve step before the changes are made.
What are the two components/policies of an IAM role?
Trust Policy, defining who can assume the role.
Permissions policy, defining what the role can do.
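As a concrete pair, here is a sketch of the two policy documents for a role that a Lambda function could use as its execution role. The bucket name is hypothetical and the single `s3:GetObject` permission is only an example of "what the role can do".

```python
import json

# Trust policy: who may assume the role (here, the Lambda service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: what the role is allowed to do once assumed.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::hypothetical-bucket/*",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permissions_policy, indent=2))
```

The trust policy is supplied when the role is created, and the permissions policy is attached to the role afterwards as an inline or managed policy.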
What is a CloudWatch Log Stream?
A sequence of Log Events from the same source.
A log event is a line from CloudWatch containing the CloudWatch Log timestamp and the event message, also containing a timestamp of the actual event time.
How can you authenticate a request using ALB?
ALB supports federated identity such as OAuth and, if configured, it will redirect the request to the IdP (such as Facebook) for authentication before letting the request through to the target group.
Me: OpenID Connect compliant IdP. It works with Amazon Cognito.
What can you do with object locking in S3?
Legal holds and retention policies. These prevent objects from being deleted. Versioning must be enabled. Object lock must be set (me: enabled) at the time the bucket is created.
me:
- S3 bucket must enable versioning and locking during creation.
- Object locking applies to a particular object version.
- It is Write Once Read Many times (WORM) - it is immutable once locked.
- An object version can have all the locking permutations (none, retention, retention + Legal hold, Legal hold)
- Locking can be removed manually.
What are some of the constraints of a DX connection?
- Formerly, Public VIFs were limited to a region
- Formerly, Private VIFs are attached to a VPG which is associated with a VPC
- NOW using BGP, the Public VIF advertises all public zone AWS service endpoints in all regions
* Public Zone endpoints do not require or even allow a DX to access the internet
- NOW using DX Gateway, customers can use a single Private VIF to a DX GW, then connect to any VPGW in any region; DX GW uses BGP to advertise the networks it can access back to the VIF, reducing admin overhead
* These Private VIFs are not transitive.
Me: when ordering a DX, a few things to know (or requirements):
Takes time to establish,
expensive to maintain,
Must choose a speed: < 1 Gbps, 1 Gbps, 10 Gbps
Must establish a single-mode fibre connection between the data location and the DX Partner location.
Must have a router at the data location that supports BGP with MD5 authentication and VLANs (802.1Q), completes LOA/?
Must have a router at DX location
What are the four key terms for ECS?
- Managed Cluster -
- Managed container runtime - e.g. Fargate
- Task definition
- Container definition
What sizing options does Aurora offer?
- Serverless
- Provisioned
- Parallel query
How does ACM supply certificates to an EC2 instance?
ACM only supports certain AWS services that explicitly integrate with it, such as CloudFront, ELB, Elastic Beanstalk, and API Gateway. ACM can also use Route 53 for DNS checks during certificate issuing (to ensure that you own the domain).
- CloudFront
- ELB
- Elastic Beanstalk
- API Gateway
- Route 53 (for domain ownership validation)
Me: ACM does not supply certificates to an EC2 instance
What are the most common services that can provide data for Redshift?
- S3
- Kinesis (Firehose)
- Data Pipeline
Me:
AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.
With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR.
1. S3
2. RDS
3. DynamoDB
4. EMR
How does DocumentDB log its activity?
Using logs that it exports to CloudWatch Logs using a service-linked role.