Cheat Sheet (short v1) Flashcards

Question

Network Load Balancer (NLB)

Answer 1

makes routing decisions at the transport layer, aka Layer 4 (TCP/UDP/TLS). It can handle millions of requests per second with extremely low latency. It doesn't support path-based or host-based routing the way ALB does.

Answer 2

operates using TCP, SSL, HTTP and HTTPS. Not as good at high throughput / low latency as NLB. Also unlike NLB, it does not support load balancing to multiple ports on an instance.

Answer 3

* You can use Direct Connect (DX) to connect an on-prem data centre to one or multiple VPCs
* DX can take > 1 month to set up
* For resilience, add a 2nd DX connection. As this can take time to set up and is costly, in the short term consider also adding an IPSec VPN connection (with the same BGP prefix) for resiliency.
* A hosted virtual interface (hosted VIF) allows another AWS account to access your DX
* Use AWS DataSync to copy large amounts of data from on-prem to S3, EFS, FSx, NFS shares, SMB shares, AWS Snowcone (via Direct Connect). DataSync is for copying data, not databases; use DMS to copy databases.

You must create one of the following virtual interfaces to begin using DX:
* Private virtual interface (private VIF): access a VPC using private IP addresses
* Public virtual interface (public VIF): access all AWS public services using public IP addresses
* Transit virtual interface (transit VIF): access one or more VPC Transit Gateways associated with DX gateways, within a Region.

Answer 4

Central hub connecting on-prem networks and VPCs.
* Reduces operational complexity as you can easily add more VPCs, VPN capacity, Direct Connect gateways, without complex routing tables.
* Provides additional features over-and-above VPC peering
* A transit virtual interface is used to access VPC Transit Gateways
* Pattern for connecting 1 DX to multiple VPCs in the same Region is to associate the DX with a transit gateway
* on-prem -> DX -> DX location -> transit virtual interface -> transit gateway association -> Transit Gateway -> multiple VPCs

Answer 5

* VPN connections go over the internet
* An AWS managed Site-to-Site VPN connection runs between a Customer Gateway on the customer side and a Virtual Private Gateway (VGW, or VPN gateway) that you create at the edge of your VPC.

Answer 6

* provision infrastructure using a text-based template that describes exactly what resources are provisioned and their settings. Can use scripts to automate the creation of member accounts and VPCs.
* manages the template history similar to how code is managed in source control
* AWS SAM (Serverless Application Model) is an extension of CloudFormation for packaging, testing and deploying serverless applications

2 methods of updating a stack:
* direct update - CloudFormation immediately deploys your changes
* change sets - preview your changes first, then decide if you want to deploy
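As a rough illustration of the text-based template idea, here is a minimal sketch that builds a template as a Python dict and serializes it to JSON. The logical ID `NotesBucket` and the choice of a single versioned S3 bucket are assumptions made up for this example; actually deploying it (by direct update or through a change set) is left out.

```python
import json

# Hypothetical minimal template: one S3 bucket with versioning enabled.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative template with a single versioned S3 bucket",
    "Resources": {
        "NotesBucket": {  # logical ID, made up for this example
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"}
            },
        }
    },
}


def render_template(tmpl: dict) -> str:
    """Serialize the template dict to JSON text."""
    return json.dumps(tmpl, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_template(template))
```

Because the template is just text, it can be kept in source control and reviewed like any other code.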

Answer 7

For disaster recovery in a different region, create an AMI from your EC2 instance and copy it into a 2nd region.

DR approaches:
* Backup and restore = lowest cost, just create backups
* Pilot Light = small part of core services that is running and syncing data or documents
* Warm Standby = scaled down version of a fully functional environment that is actively running
* Multi-site = on-prem and in AWS in an active-active configuration

Answer 8

* Block-level storage (on disks that are physically attached to the host computer, not EBS)
* Very high performance and low latency
* Can be cost effective since the cost is included in the instance cost
* Data survives a reboot, but if you stop, hibernate or terminate the instance then you lose everything in the instance store. (Hibernation saves what's in memory to the EBS root volume, not to the instance store.)

Temporary/ephemeral, ideal for
* temp info that changes frequently such as caches, buffers, scratch data
* data that is replicated across a fleet of instances where you can afford to lose a copy once in a while and the data is still replicated across other instances

Answer 9

for low-latency interactive apps, dev&test environments. Can have bursts of IOPS performance but not sustained.

Answer 10

* for sub-millisecond latency, sustained IOPS performance.
* Be sure to distinguish: IOPS solves I/O aka disk wait time, not CPU performance
* IOPS is related to volume size, specifically per GB.
* These are more $

Answer 11

* EBS Cold HDD (sc1) is the lowest cost option for infrequently accessed data and use cases like sequential data access
* EBS Throughput Optimized HDD (st1) is for frequent access and throughput intensive workloads such as MapReduce, Kafka, log processing, data warehouse and ETL workloads. Higher $ than sc1.
* However, note that the HDD volumes have no IOPS SLA.

Answer 12

* EBS can't attach to multiple AZs (there is a new EBS multi-attach feature but it's only single AZ, and only certain SSD volumes such as io1, io2). EBS is considered a "single point of failure".
* To implement a shared storage layer of files, you could replace multiple EBS volumes with a single EFS
* Not fully managed, doesn't auto-scale (as opposed to EFS)
* Use EBS Data Lifecycle Manager (DLM) to manage backup snapshots. Backup snapshots are incremental, but the deletion process is designed so that you only need to retain the most recent snapshot.
* iSCSI is a block protocol, whereas NFS is a file protocol
* EBS supports encryption of data at rest and encryption of data in transit between the instance and the EBS volume.

Answer 13

* can attach to many instances across multiple AZs, whereas EBS cannot (there is a new EBS multi-attach feature but it's only single AZ, and only certain SSD volumes such as io1, io2)
* fully managed, auto-scales (whereas EBS is not)
* Linux only, not Windows!
* Since it is Linux, use POSIX permissions to restrict access to files
* After a period up to 90 days, you can transition unused data to EFS IA
* Protected by EFS Security Groups to control network traffic and act as firewall

Answer 14

* durable (99.999999999%)
* a best practice is to enable versioning and MFA Delete on S3 buckets
* objects have to be in S3 for > 30 days before lifecycle policy can take effect and move to a different storage class.
* Intelligent Tiering automatically moves data to the most cost-effective storage
* Standard-IA is multi-AZ whereas One Zone-IA is not
* A pre-signed URL gives you access to the object identified in the URL (URL is made up of bucket name, object key, HTTP method, expiration timestamp). If you want to provide an outside partner with an object in S3, providing a pre-signed URL is a more secure (and easier) option than creating an AWS account for them and providing the login, which is more work to then manage and error-prone if you didn't lock down the account properly.
* You can't send long-term storage data directly to Glacier, it has to pass through an S3 first
* Accessed via API, if you want to access S3 directly it can require modifying the app to use the API which is extra effort
* Can host a static website but not over HTTPS. For HTTPS use CloudFront+S3 instead.
* Best practice: use IAM policies to grant users fine-grained control to your S3 buckets rather than using bucket ACLs
* Can use multi-part upload to speed up uploads of large files to S3

S3 lifecycle, 2 types of actions:
* transition actions (define when to transition to another storage class)
* expiration actions (objects expire, then S3 deletes them on your behalf)
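The two lifecycle action types can be sketched as a configuration dict. The shape below follows what boto3's `put_bucket_lifecycle_configuration` takes as its `LifecycleConfiguration` argument, as far as I understand it; the rule ID, the `logs/` prefix and the day counts are hypothetical, and the helper `transitions_respect_minimum_age` is made up here to illustrate the 30-day rule from the card.

```python
# Hypothetical lifecycle configuration with one rule covering both action types.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",  # made-up rule name
            "Filter": {"Prefix": "logs/"},      # made-up key prefix
            "Status": "Enabled",
            # Transition action: move objects to another storage class.
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            # Expiration action: S3 deletes the objects on your behalf.
            "Expiration": {"Days": 365},
        }
    ]
}


def transitions_respect_minimum_age(config: dict, minimum_days: int = 30) -> bool:
    """Return True if every transition waits at least `minimum_days` days."""
    return all(
        transition["Days"] >= minimum_days
        for rule in config["Rules"]
        for transition in rule.get("Transitions", [])
    )
```

A quick sanity check like this catches a rule that tries to transition objects too early before the configuration is ever sent to S3.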

Answer 15

slow to retrieve, but you can use Expedited Retrieval to bring it down to just 1-5min.

Answer 16

* to replace Microsoft Windows file server
* can be multi-AZ
* supports DFS (distributed file system) protocol
* integrates with AD
* FSx for Lustre is for high-performance computing (HPC) - does not support Windows

Answer 17

* for globally distributed applications. 1 DB can span multiple regions
* If too much read traffic is clogging up write requests, create an Aurora replica and direct read traffic to the replica. The replica serves as both standby instance and target for read traffic.
* "Amazon Aurora Serverless" is different from "Amazon Aurora" - it automatically scales capacity and is ideal for infrequently used applications.

Answer 18

* Transactional DB (OLTP)
* If too much read traffic is clogging up write requests, create an RDS read replica and direct read traffic to the replica. The read replica is updated asynchronously. Multi-AZ, by contrast, creates a standby instance in another AZ and synchronously replicates to it
* RDS is a managed database, not a data store. Be careful: if a question asks about migrating a data store to AWS, RDS would not be suitable.
* To encrypt an existing RDS database, take a snapshot, encrypt a copy of the snapshot, then restore the encrypted copy to a new RDS instance. Since data may have changed during the snapshot/encrypt/restore operation, use AWS DMS (Database Migration Service) to sync the data.
* RDS can be restored to a point as recent as 5min ago using point-in-time restore (PITR). When you restore, a new instance is created from the DB snapshot and you need to point to the new instance.

Answer 19

* Database cache. Put in front of DBs such as RDS or Redshift, or in front of certain types of DB data in S3, to improve performance
* As a cache, it is an in-memory key/value store database (more OLAP than OLTP)
* Use case: accelerate autocomplete in a web page form

Redis vs. Memcached
* Redis has replication and high availability, whereas Memcached does not. Memcached allows multi-core multi-thread however.
* Redis can be token-protected (i.e. require a password). Set the auth token when you create the Redis cluster; clients then send it with the AUTH command before running other commands.
* For Redis, ElastiCache in-transit encryption is an optional feature to increase security of data in transit as it is being replicated (with performance trade-off)

Answer 20

* Use when the question talks about key/value storage, near-real time performance, millisecond responsiveness, and very high requests per second
* Not compatible with relational data such as what would be stored in a MySQL or RDS DB
* No concept of read replica like in RDS and Aurora. For read-heavy or bursty workloads, use DAX, an in-memory cache, to accelerate performance.
* DynamoDB measures RCUs (read capacity units, basically reads per second) and WCUs (write capacity units)
* DynamoDB auto scaling uses the AWS Application Auto Scaling service to dynamically adjust throughput capacity based on traffic.

Best practices:
* keep item sizes small (<400 KB) otherwise store in S3 and use pointers from DynamoDB
* store more frequently and less frequently accessed data in different tables
* if storing data that will be accessed by timestamp, use separate tables for days, weeks, months
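The RCU/WCU bullet can be made concrete with a little arithmetic. The sizing rules used below (one RCU per strongly consistent read per second of up to 4 KB, half that for eventually consistent reads, one WCU per write per second of up to 1 KB) come from DynamoDB's provisioned capacity model as I understand it, not from the card itself; the function names are made up for this sketch.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed, assuming one unit covers a strongly consistent read of up to 4 KB."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    # Eventually consistent reads are assumed to cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """WCUs needed, assuming one unit covers a write of up to 1 KB."""
    units_per_write = math.ceil(item_size_kb)
    return units_per_write * writes_per_second
```

For example, under these assumptions an 8 KB item read 10 times per second with strong consistency needs 2 units per read, so 20 RCUs in total. This is also one reason small items are cheaper to serve.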

Answer 21

* Replace on-prem without changing workflow
* Types: File Gateway (for NFS and SMB), Volume Gateway, Tape Gateway.
* Stores data in S3 (e.g. for file gateway type, it stores files as objects in S3)
* Provides a cache that can be accessed at low latency, whereas EFS and EBS do not have a cache

Answer 22

* Use AWS Schema Conversion Tool (SCT) to convert a DB schema from one type of DB to another, e.g. from Oracle to Redshift
* Use Database Migration Service (DMS) to copy databases. Sometimes you do SCT convert, then DMS copy.
* Use AWS DataSync to copy large amounts of data from on-prem to S3, EFS, FSx, NFS shares, SMB shares, AWS Snowcone (via Direct Connect). For copying data, not databases.

Answer 23

* Redshift is a columnar data warehouse that you can use for complex querying across petabytes of structured data. It's not serverless, it uses EC2 instances that must be running. Use Amazon Redshift Spectrum to query data from S3 using a Redshift cluster for massive parallelism
* Athena is a serverless (aka inexpensive) solution to do SQL queries on S3 data and write results back. Works natively with client-side and server-side encryption. Not the same as QuickSight which is just a BI dashboard.
* Amazon S3 Select - analyze and process large amounts of data faster with SQL, without moving it to a data warehouse

Answer 24

* ideal for solutions that must be durable and loosely coupled
* pull-based (use SNS for pushing messages, especially broadcasting to multiple services)
* Standard vs. FIFO: FIFO strictly preserves message order whereas Standard ordering is best-effort. The trade-off is that Standard has nearly unlimited throughput of transactions per sec.
* batching adds efficiency
* SQS doesn't prioritize items in the queue. If you need to prioritize, use multiple queues, one for each priority type
* Max message size is 256 KB (otherwise use S3 to log events), and max retention time is 14 days
* When a reader picks a message from the queue, the message stays in the queue but is invisible until the job is processed. If the visibility timeout occurs (job is not processed in time), then the message reappears in the queue for another reader to take.
* To use industry standards with Apache ActiveMQ, use Amazon MQ instead of SQS (this is similar to using EKS instead of ECS, the industry-standard version of containers rather than the Amazon proprietary version)

Short polling vs. Long polling = how long a receive request waits for messages before responding
* Short polling is the default. When you poll the SQS, it doesn't wait for messages to be available in the queue to respond. It checks a subset of servers for messages and may respond that nothing is available yet.
* Long polling waits for a message to be in the queue before responding, so it uses fewer total requests and reduces cost.
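The visibility timeout behaviour is easier to see in a toy model. The sketch below is a small in-memory queue, not SQS itself: the class `VisibilityQueue` and its methods are hypothetical, and the clock is passed in explicitly as `now` so that the behaviour is deterministic.

```python
import itertools


class VisibilityQueue:
    """Toy in-memory queue that mimics visibility-timeout behaviour."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        # message id -> [body, time at which the message becomes visible]
        self._messages = {}

    def send(self, body: str) -> int:
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0.0]  # visible immediately
        return message_id

    def receive(self, now: float):
        """Return (id, body) of the first visible message and hide it, else None."""
        for message_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                # The message stays in the queue but is hidden from other readers.
                entry[1] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id: int) -> None:
        """Called by the reader once the job has been processed."""
        self._messages.pop(message_id, None)
```

With a 30 second timeout, a message received at `now=0` is hidden from a second reader at `now=10`, and reappears at `now=30` unless the first reader has called `delete` in the meantime.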

Answer 25

fully managed messaging service for pushing async notifications, especially used for broadcasting to multiple services

Answer 26

* for use cases that require ingestion of real-time data (e.g. IoT sensor data)
* A Kinesis data stream is made up of shards, which are made up of data records, each of which has a sequence #. You then map devices to partition keys, which group data by shard.
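How a partition key ends up on a shard can be sketched as follows. As I understand it, Kinesis hashes the partition key with MD5 into a 128-bit integer and picks the shard whose hash key range contains that value; the sketch assumes the shards split the key space into equal-sized ranges, which may not hold after shards have been split or merged. The function name and the device IDs in the usage note are hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces 128-bit values


def shard_for_partition_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal-sized hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_KEY_SPACE // shard_count
    # The last shard absorbs any remainder of the key space.
    return min(hash_key // range_size, shard_count - 1)
```

Using a device ID such as `"sensor-0017"` as the partition key means every record from that device lands on the same shard, so its records stay ordered by sequence number within that shard.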

Answer 27

* Throttling limits: you can configure a server-side throttling limit, a per-method throttling limit, a per-client throttling limit, and an account-level throttling limit.
* API caching: enable it for a stage and specify a TTL (time-to-live, 300 seconds by default).
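Throttling limits of this kind are usually described as a token bucket: a steady refill rate plus a burst capacity. The sketch below illustrates that general idea only; it is not API Gateway's implementation, the class `TokenBucket` is hypothetical, and the clock is passed in as `now` to keep the example deterministic.

```python
class TokenBucket:
    """Minimal token bucket: `rate_per_second` refill, `burst` maximum tokens."""

    def __init__(self, rate_per_second: float, burst: int):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last_refill = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.last_refill
        if elapsed > 0:
            # Refill in proportion to elapsed time, capped at the burst size.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would reject the request as throttled
```

A per-client or per-method limit can then be modelled as one bucket per client or method, with a request allowed only if every applicable bucket allows it.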

Answer 28

* CloudFront distributes files from an origin. The origin can be an S3 bucket, EC2 instance, ELB, or an external (custom) HTTP server.
* Lambda@Edge is a feature of CloudFront that lets you run code closer to users of your application, which improves performance and reduces latency
* Field-level encryption is a feature that applies extra encryption at edge locations to ensure sensitive data provided by the user (e.g. PII) is secured end-to-end
* Can be configured to load an error page ("content not found") for operationally simple error handling
* Not just for static content, CloudFront is used for streaming content too
* Geo restriction (whitelist/blacklist access to content by country, e.g. due to copyright restrictions)
* Set the price class to US, Canada, Europe, etc. to determine where the content will be cached
* To only allow specific IP addresses to access content, CloudFront can use signed URLs or signed cookies which include an expiration timestamp, and the range of IP addresses of users who can access the content.

CloudFront+S3
* S3 can host a static website but not over HTTPS. For HTTPS use CloudFront+S3 instead.
* To prevent users accessing S3 content directly, create an origin access identity (OAI) which is a special CloudFront user and change S3 bucket permissions so that only the OAI can access. This is specific to CloudFront+S3.

Answer 29

* increases availability and performance
* can be expensive
* runs over AWS global network
* directs traffic to optimal endpoints across multiple regions
* By default, provides you with 2 static IP addresses that are anycast from the AWS edge network. You can migrate existing IPv4 (/24) IPs rather than creating new.

Answer 30

* request temporary limited-privilege credentials for IAM users, or for users that you authenticate such as federated users from an on-prem directory
* Federation: STS can be used for federation (typically with Azure AD). It uses SAML 2.0 for authentication to grant temporary access based on the AD creds

Single Sign-On: STS can be used to develop a custom identity broker for SSO to a service such as the AWS management console:
* Verify that the user is authenticated on the local IDP (AD)
* Call STS AssumeRole or GetFederationToken API to get temp credentials
* Pass the temp creds to AWS federation endpoint to request a sign-in token
* Construct a URL to the service that includes the token which can be provided to the user
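The last two steps of the identity broker flow (requesting a sign-in token, then constructing the console URL) can be sketched with the standard library alone. The endpoint and parameter names follow AWS's documented custom identity broker flow as I recall it, so treat them as assumptions to verify; the function names, the issuer value and the credential strings are placeholders. Verifying the user against the IdP and calling STS are left out.

```python
import json
from urllib.parse import urlencode

# Assumed AWS federation endpoint for the console sign-in flow.
FEDERATION_ENDPOINT = "https://signin.aws.amazon.com/federation"


def signin_token_request_url(access_key_id: str, secret_access_key: str,
                             session_token: str) -> str:
    """Build the URL that exchanges temporary credentials for a sign-in token."""
    session = json.dumps({
        "sessionId": access_key_id,
        "sessionKey": secret_access_key,
        "sessionToken": session_token,
    })
    query = urlencode({"Action": "getSigninToken", "Session": session})
    return FEDERATION_ENDPOINT + "?" + query


def console_login_url(signin_token: str, issuer: str,
                      destination: str = "https://console.aws.amazon.com/") -> str:
    """Build the final URL that is handed to the user."""
    query = urlencode({
        "Action": "login",
        "Issuer": issuer,            # URL of your own identity broker
        "Destination": destination,  # where the user lands after sign-in
        "SigninToken": signin_token,
    })
    return FEDERATION_ENDPOINT + "?" + query
```

In a real broker, the first URL would be fetched over HTTPS and the sign-in token read from the JSON response before building the second URL.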

Answer 31

* Container management service for Docker containers
* Highly scalable / high performance, lets you run applications on an EC2 cluster
* ECS uses the ECS Service Auto Scaling (aka Application Auto Scaling) service to scale tasks using a scaling policy that you configure.
* ECS is about tasks. You pay for the running time of tasks. For example, you can't add container instances to an IAM group, you associate tasks with IAM roles/groups.

ECS Launch Types
* Fargate Launch Type is serverless, managed by AWS
* EC2 Launch Type gives you direct access to the instances, but you have to manage them

Answer 32

Best practice is to lock away or delete the root user access keys. Never store them in an S3 bucket, even if encrypted.

Data at rest
* Client-side encryption can be done by 1) using a customer master key (CMK) stored in KMS, or 2) using a master key that you store in your application. You can't use S3 managed keys client-side
* Server-side encryption can be done in several ways
  * SSE-C: use customer-provided keys and manage them yourself (on-prem)
  * SSE-S3: Amazon manages the keys
  * SSE-KMS: keys are managed in AWS Key Management Service
* CloudHSM: generate and use your own encryption keys, held in the cloud in Amazon's HSM

Data in motion
* SSL/TLS is for encrypting data in transit, not data at rest.
* HTTPS is HTTP over SSL/TLS. It goes over port 443.

Answer 33

use with CloudWatch+SNS to trigger notifications to services

Answer 34

The permissions boundary for an IAM entity (user or role) sets the max permissions that the entity can have
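One way to picture "max permissions" is as a set intersection: an action is effective only if both the identity-based policy and the boundary allow it. The sketch below is heavily simplified (no wildcards, resources, conditions or explicit denies, and no SCPs), and the function name and action lists are made up for illustration.

```python
def effective_permissions(identity_policy_actions: set, boundary_actions: set) -> set:
    """Actions allowed by both the identity-based policy and the permissions boundary."""
    return identity_policy_actions & boundary_actions


# Hypothetical example: the policy grants EC2 and S3 read, the boundary only S3.
granted = {"s3:GetObject", "ec2:RunInstances"}
boundary = {"s3:GetObject", "s3:PutObject"}
allowed = effective_permissions(granted, boundary)
```

Here `allowed` contains only `s3:GetObject`: the boundary does not grant `s3:PutObject` by itself, and it blocks `ec2:RunInstances` even though the policy allows it.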

Answer 35

* To apply security restrictions across multiple AWS accounts, use Service Control Policies (SCPs). For just a single account, use IAM policies.
* You can migrate an account to another AWS organization, e.g. if you divest a business unit

Answer 36

* Audit trail of API calls
* Logs Data Events (resource operations) aka Data Plane Operations
* Logs Management Events (management operations on resources) aka Control Plane Operations
* Use other tools such as VPC Flow Logs to capture information about network traffic

Answer 37

= for application networking for microservices applications

Answer 38

= share a Transit Gateway connection (only?) with other AWS accounts

Answer 39

is for migrating virtual machines

Answer 40

coordinate multiple AWS services into serverless workflows so you can build and update apps quickly. Includes long-running executions not supported within Lambda execution limits.

Answer 41

is a PaaS service for describing and provisioning resources. Can be used to quickly deploy and manage applications in AWS. Developers upload applications and Beanstalk handles the deployment details. Note that it's not serverless, it relies on EC2 instances.

Answer 42

is for executing tasks. Helps developers build, run, and scale background jobs

Answer 43

quickly develop, build and deploy applications on AWS

(71 cards)