Services Flashcards
IAM
Explain:
- IAM roles
- IAM User
- IAM Credentials Report
- IAM Access Advisor
Identity and access management.
- IAM Role: some services need to perform actions on your behalf. An IAM Role is like a user, but intended to be used by an AWS service.
- IAM User: an identity with long-term credentials (console password and/or access keys), intended for a person or application.
- IAM Credentials Report: lists all users and the state of their credentials
- IAM Access Advisor: shows the service-level permissions granted to a user and when those services were last accessed
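A minimal boto3 sketch of pulling the credentials report (the IAM API calls are real; the polling loop and report handling are just illustrative):

```python
import time
import boto3

iam = boto3.client("iam")

# Kick off generation of the account-wide credentials report,
# then poll until it is ready and download it as CSV.
iam.generate_credential_report()
while True:
    try:
        report = iam.get_credential_report()
        break
    except iam.exceptions.CredentialReportNotReadyException:
        time.sleep(2)

print(report["Content"].decode("utf-8"))  # CSV: one row per IAM user
```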
ENI
Explain:
- What is an ENI bound to?
- What attributes can an ENI have?
Elastic network interface.
- Bound to an AZ
- Can have the following attributes:
○ Each ENI can have one private IPv4, one or more IPv6
○ One Elastic IP (IPv4) per private IPv4
○ One public IPv4
○ One or more security groups
○ A MAC address
AMI
Explain:
- What is an AMI bound to?
Amazon machine image.
Per region - need to copy it to new region if you want to transfer.
ID is region-locked; copying to another region creates a new AMI ID
EBS
- What is an EBS bound to? How can it be moved?
- How many attachments?
- How is it provisioned?
- How is it deleted?
Elastic Block Store
- Typically one EBS volume can be attached to one EC2 instance, but there's a multi-attach feature for some volumes (the provisioned-IOPS SSD ones)
- Bound to a specific AZ
- To move a volume across AZs (or regions) you need to snapshot it. It's not necessary to detach to snapshot, but it's recommended.
- GB and IOPS must be provisioned in advance
- "Delete on termination" attribute deletes a volume when its attached instance is terminated. This is the default for the root volume.
- Lowest latency compared to other options
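A boto3 sketch of the snapshot-based move described above (volume ID, AZs and regions are made up):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Snapshot the source volume (no need to detach, but recommended for consistency).
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
                           Description="move to another AZ")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Create a new volume from the snapshot in the target AZ.
ec2.create_volume(SnapshotId=snap["SnapshotId"], AvailabilityZone="us-east-1b")

# For a cross-region move, copy the snapshot into the destination region first.
ec2_eu = boto3.client("ec2", region_name="eu-west-1")
ec2_eu.copy_snapshot(SourceRegion="us-east-1", SourceSnapshotId=snap["SnapshotId"])
```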
EFS
- What protocol?
- How many attachments?
- Advantages?
- Classes
- HA options?
Elastic File System
Can be mounted on many EC2. Multi-AZ.
Highly available, scalable, expensive (3x cost of gp2 drive), pay per use
Use case: content management, web serving, data sharing, wordpress
Uses NFSv4.1 protocol
Scale:
- 1000s of concurrent NFS clients, 10+ GB/s throughput
- Grow to petabyte-scale network file system, automatically
Advantages: can mount to many instances, scales capacity automatically.
Has storage classes (Standard, Infrequent Access) with lifecycle management.
HA: Either “classic” or “one zone”. Classic is automatically replicated across multiple AZ so there’s more uptime, but more expensive.
____________
Performance mode (set at EFS creation time)
- General purpose (default): latency-sensitive use cases
- Max I/O: higher latency, higher throughput, highly parallel (big data, media processing)
Throughput mode
- Bursting (based on current size)
- Provisioned: set your throughput regardless of storage size
Storage tier (lifecycle management - move after n days)
- Standard: for frequently accessed files
- Infrequent Access: cost to retrieve files, lower price to store
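A minimal boto3 sketch of creating a file system with these settings plus a lifecycle policy (names and values are illustrative):

```python
import boto3

efs = boto3.client("efs")

# Performance mode is fixed at creation time; throughput mode can be changed later.
fs = efs.create_file_system(
    CreationToken="my-efs",              # idempotency token (made-up name)
    PerformanceMode="generalPurpose",    # or "maxIO"
    ThroughputMode="provisioned",        # or "bursting"
    ProvisionedThroughputInMibps=128,
    Encrypted=True,
)

# Lifecycle management: transition files to Infrequent Access after 30 days.
efs.put_lifecycle_configuration(
    FileSystemId=fs["FileSystemId"],
    LifecyclePolicies=[{"TransitionToIA": "AFTER_30_DAYS"}],
)
```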
CLB
Classic Load Balancer
- Fixed hostname
- TCP or HTTP health checks
- No websockets, no http/2, no path-based routing, no multiple ports on a single instance, bunch of other small features
Really the only reasons to use one are TCP/SSL listeners, support for EC2-Classic, and support for sticky sessions using application-generated cookies (in ALB, cookies are generated by the load balancer)
ALB
- Fixed IP?
- How does routing work?
- What happens to instances when scaling in?
- What target groups?
Application Load Balancer
- Layer 7 - load balancing to multiple HTTP applications across machines ("target groups")
- Can also load balance to multiple apps on the same machine (like with containers)
- Support for HTTP/2, websockets, redirects
- Routing tables to different target groups:
○ Based on path in URL
○ Based on hostname in URL
○ Based on query strings or headers
- Great fit for microservices and container-based applications
- Port mapping feature to redirect to a dynamic port in ECS (Elastic Container Service) - the Classic Load Balancer sucks for this; you'd need multiple ones per application
- Target groups:
○ EC2 instances (can be managed by ASG)
○ ECS tasks (managed by ECS itself)
○ Lambda functions - the HTTP request is translated into a JSON event
○ IP addresses (must be private IPs)
- One ALB can route to multiple target groups
- Health checks are at the target group level
- You get a fixed hostname (like classic)
- Applications don't see the IP of the client directly; it's sent in the X-Forwarded-For header
○ Also X-Forwarded-Port and X-Forwarded-Proto (protocol)
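A small boto3 sketch of the path-/host-based routing described above (the ARNs are placeholders):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Hypothetical ARNs: send api.example.com/api/* traffic to a dedicated target group.
listener_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/..."
api_tg_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-tg/..."

elbv2.create_rule(
    ListenerArn=listener_arn,
    Priority=10,
    Conditions=[
        {"Field": "host-header", "HostHeaderConfig": {"Values": ["api.example.com"]}},
        {"Field": "path-pattern", "PathPatternConfig": {"Values": ["/api/*"]}},
    ],
    Actions=[{"Type": "forward", "TargetGroupArn": api_tg_arn}],
)
```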
NLB
- Fixed IP?
- Advantages?
Network Load Balancer
- Layer 4 (TCP / UDP)
- Extremely high performance
- Less latency than ALB (like 100ms vs 400ms)
- Unlike ALB, has one static IP per AZ and supports assigning elastic IP (helpful for whitelisting specific IP)
You want to use a NLB if you’re dealing with TCP/UDP traffic or you want extreme performance
ACM
AWS Certificate Manager
Can create certs through it or upload your own
RDS
Explain:
- Data retention
- Do you need capacity / instance type on creation?
- Read Replicas
- HA
Relational Database Service
Automated Backups are automatically enabled in RDS. Daily full backup of the database (during the maintenance window); transaction logs are backed up every 5 minutes. Can restore to any point in time (oldest backup to 5 minutes ago). 7-day retention by default, configurable up to 35 days.
Manual Snapshots are triggered by the user, and can be kept indefinitely.
Storage Autoscaling: don’t need to manually scale database storage; can be done automatically. Just set a maximum storage threshold, and it will upgrade after crossing that point.
Automatically modify storage if:
- Free storage less than 10% of allocated storage
- Low-storage lasts at least 5 minutes
- 6 hours have passed since last modification
Read Replicas
- Up to 5 read replicas
- Within AZ, Cross AZ or Cross Region
- Replication is ASYNC, so reads are eventually consistent
- Replicas can be promoted to their own DB
- Applications must update the connection string to leverage read replicas
- Use cases:
- Classic use case is to create a read replica of a production database for analytics (this way you don’t add additional load to your production application)
- Obviously, read replicas are SELECT statements only
- Again, there is a network cost when data goes from one AZ to another.
Multi AZ
- Synchronous replication. Your RDS is referenced by a DNS name; if there’s a failure, the DNS will point to the replica instead.
- Doesn’t help with scaling, it’s just for HA.
- Read replicas can be set up as multi AZ for disaster recovery.
- Single-AZ to Multi-AZ:
- Zero downtime operation
- Just click on “modify” for the database to set it to Multi-AZ
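A boto3 sketch of both operations above (identifiers are made up): creating a cross-AZ read replica, then converting the primary to Multi-AZ.

```python
import boto3

rds = boto3.client("rds")

# Cross-AZ read replica (async replication), e.g. for analytics traffic.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="prod-db-analytics",
    SourceDBInstanceIdentifier="prod-db",
    DBInstanceClass="db.r5.large",
    AvailabilityZone="us-east-1b",
)

# Single-AZ to Multi-AZ (synchronous standby) - a zero-downtime modification.
rds.modify_db_instance(
    DBInstanceIdentifier="prod-db",
    MultiAZ=True,
    ApplyImmediately=True,
)
```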
Aurora
Explain:
- Data retention
- Do you need capacity / instance type on creation?
- Read Replicas
- Failover logic
- Scaling / HA options
- Custom Endpoints
- Serverless
- Machine learning
Proprietary, “AWS cloud optimized” SQL database compatible with Postgres and MySQL drivers.
- 3-5x more performant than Postgres or MySQL
- Storage automatically grows in increments of 10GB, up to 64TB
- Can have 15 replicas (MySQL only has 5) and the replication process is faster (sub-10ms replica lag)
- Instantaneous failover. Always HA.
○ If it's a single instance, it will try to recreate in the same AZ as the original instance.
○ If it has replicas in a different AZ, Aurora changes the CNAME to point to the healthy instance
○ If serverless (or its AZ) becomes unavailable, it will attempt to recreate in a different AZ.
- Costs 20% more than RDS, but is more efficient.
- 6 copies of your data across 3 AZ:
○ 4 copies out of 6 needed for writes
○ 3 copies out of 6 for reads
○ Self-healing with peer-to-peer replication
○ Storage is striped across 100s of volumes
- One Aurora instance takes writes (master)
- Automated failover in less than 30s
- Master + up to 15 read replicas serve reads
- Support for cross-region replication
- Writer endpoint points to the master
- Reader endpoint does load balancing at the connection level. When you access the reader endpoint, you're connected to one of the replicas.
- Backtrack: restore data at any point in time without using backups
- Encryption methods and rules are exactly the same as with RDS
Aurora - Advanced Topics
- Auto Scaling: you can enable this to automatically scale up replicas if existing ones have high CPU usage
- Custom Endpoints: define a subset of Aurora instances as a custom endpoint. Useful if you have some replicas that are highly performant instances, and you want to run analytical queries against them.
○ The reader endpoint is generally not used after defining custom endpoints.
- Serverless:
○ Automated DB instantiation and auto-scaling based on actual usage
○ Good for infrequent, intermittent or unpredictable workloads
○ No capacity planning needed; pay per second, can be more cost-effective
- Multi-master:
○ In case you want immediate failover for the write node
○ Every node does R/W instead of promoting a RR as the new master
- Global Database:
○ Simple approach is to set up cross-region read replicas.
○ Recommended approach is to use Aurora Global Database:
§ 1 primary region (read/write)
§ Up to 5 secondary (read-only) regions, replication lag is less than 1 second
§ Up to 16 read replicas per secondary region
§ Helps for decreasing latency
§ Promoting another region (disaster recovery) has an RTO of less than a minute.
- Machine Learning:
○ Enables you to add ML-based predictions to your applications via SQL
○ Simple, optimized, and secure integration between Aurora and AWS ML services
○ Supported services:
§ Amazon SageMaker (use with any ML model)
§ Amazon Comprehend (for sentiment analysis)
○ Don't need to have ML experience
○ Use cases: fraud detection, ads targeting, sentiment analysis, product recommendations
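For the custom-endpoint idea above, a boto3 sketch (cluster and instance names are made up):

```python
import boto3

rds = boto3.client("rds")

# Group only the beefy replicas behind a custom READER endpoint for analytics queries.
rds.create_db_cluster_endpoint(
    DBClusterIdentifier="my-aurora-cluster",
    DBClusterEndpointIdentifier="analytics",
    EndpointType="READER",
    StaticMembers=["my-aurora-replica-3", "my-aurora-replica-4"],
)
# Analytical clients then connect to the returned custom endpoint
# instead of the cluster's general reader endpoint.
```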
ElastiCache
- How is it provisioned?
- Compare services and their HA and backup options
- How does authentication work?
ElastiCache
- specify EC2 instance type on launch
- Managed Redis or Memcached
- Can’t be toggled with a button; requires heavy application code changes
- Should be obvious, but it works like a typical cache: the application hits ElastiCache first; on a cache miss, it retrieves the data from RDS and stores it in ElastiCache
- Need to implement your own cache invalidation strategy (e.g. a TTL)
- Common use case is storing session data
- Redis vs Memcached
- Redis: Multi-AZ with auto-failover. Read replicas to scale reads and have high availability. Data durability using AOF (append only file) persistence. Backup and restore.
- Memcached: multi-node for partitioning (sharding), no HA, no persistence, no backup and restore, multi-threaded architecture
- The caches themselves do not support IAM authentication; IAM policies on ElastiCache are only for AWS API-level security
- Redis AUTH
- Can set a password/token when you create a Redis cluster
- Supports SSL in-flight encryption
- Memcached
- Supports SASL-based authentication
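A boto3 sketch of launching a Redis replication group with AUTH + encryption (the token, names, and sizes are placeholders):

```python
import boto3

ec = boto3.client("elasticache")

# Redis with an AUTH token requires in-flight (TLS) encryption to be enabled.
ec.create_replication_group(
    ReplicationGroupId="session-cache",
    ReplicationGroupDescription="session data cache",
    Engine="redis",
    CacheNodeType="cache.t3.micro",     # EC2-style instance type chosen at launch
    NumCacheClusters=2,                  # primary + 1 replica
    AutomaticFailoverEnabled=True,
    MultiAZEnabled=True,
    TransitEncryptionEnabled=True,
    AtRestEncryptionEnabled=True,
    AuthToken="replace-with-a-long-random-secret",
)
```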
Route53
- CNAME vs Alias
- 7 routing policies
- Simple:
○ Use when you need to redirect to a single resource
○ You can’t attach health checks to simple routing policy
○ If multiple values are returned, a random one is chosen by the client
- Weighted:
○ Control the % of requests that go to a specific endpoint
○ Use case: test 1% of traffic on new app version
○ Helpful to split traffic between two regions
○ Can be associated with health checks
- Latency:
○ One of the most useful routing policies
○ Redirect to the server that has the least latency close to us
○ Super helpful when latency of users is a priority
○ Latency is evaluated in terms of user to designated AWS Region
§ Germany may be directed to the US (if that's the lowest latency)
- Failover:
○ Can only have one primary, and one secondary
○ Primary record must be associated with a health check
- Geolocation:
○ Different from latency based!
○ This is routing based on user location
○ Here we specify: traffic from a given location should go to this specific IP
○ Should create a “default” policy for when there’s no match on location.
- Geoproximity:
○ Route traffic to your resource based on the geographic location of users and resources
○ Ability to shift more traffic to resources based on the defined bias
○ To change the size of the geographic region, specify bias values:
§ To expand (1 to 99) - more traffic to the resource
§ To shrink (-1 to -99) - less traffic to the resource
○ Resources can be:
§ AWS resources (specify AWS region)
§ Non-AWS resources (specify lat/long)
○ You must use Route 53 Traffic Flow (advanced) to use this feature
○ For the exam, just know that it’s useful for shifting traffic from one region to another by changing the bias
- Multi-value:
○ Use when routing traffic to multiple resources
○ Want to associate health checks with records
○ Up to 8 healthy records are returned for each multi value query
○ Multivalue is not a substitute for ELB. Honestly, there seem to be very few reasons to use multi-value instead of a load balancer.
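To make the weighted policy above concrete, a boto3 sketch that splits traffic 99/1 between two record sets (zone ID, name, and IPs are made up):

```python
import boto3

r53 = boto3.client("route53")

# Two weighted A records with the same name; weights control the traffic split.
for set_id, weight, ip in [("current", 99, "203.0.113.10"), ("canary", 1, "203.0.113.20")]:
    r53.change_resource_record_sets(
        HostedZoneId="Z123EXAMPLE",
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com",
                "Type": "A",
                "SetIdentifier": set_id,   # distinguishes records sharing a name
                "Weight": weight,
                "TTL": 60,
                "ResourceRecords": [{"Value": ip}],
            },
        }]},
    )
```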
CloudFront
- What is it?
- What origins does it support?
- CF vs S3 CRR?
- Signed URL vs Signed Cookies vs s3 pre-signed url
- What is OAI?
CloudFront
- CDN
- Improves read performance; content is cached at the edge
- 216+ edge locations
- DDOS protection, integration with shield, AWS Web Application Firewall
- Can expose external HTTPS and can talk to internal HTTPS backends
- Origins:
○ S3 bucket
§ Distributing files and caching them at the edge
§ Enhanced security with CloudFront Origin Access Identity (OAI)
□ Can use this to restrict s3 access to CloudFront (e.g. you only want files to be accessed through CloudFront, not s3 directly)
§ Can be used as an ingress to upload files to s3
○ Custom Origin (HTTP)
§ ALB
§ EC2 instance
§ S3 website (must first enable bucket as static website)
§ Any HTTP backend you want
- Geo Restriction: can whitelist or blacklist countries
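A sketch of the OAI setup mentioned above, assuming boto3 and a hypothetical bucket name; the bucket policy uses the documented legacy OAI principal form:

```python
import json
import boto3

cf = boto3.client("cloudfront")
s3 = boto3.client("s3")

# Create the Origin Access Identity the distribution will use to reach the bucket.
oai = cf.create_cloud_front_origin_access_identity(
    CloudFrontOriginAccessIdentityConfig={
        "CallerReference": "my-site-oai",   # idempotency token
        "Comment": "OAI for my-site-bucket",
    }
)
oai_id = oai["CloudFrontOriginAccessIdentity"]["Id"]

# Bucket policy that only allows reads via the OAI, so direct S3 access can be blocked.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-site-bucket/*",
    }],
}
s3.put_bucket_policy(Bucket="my-site-bucket", Policy=json.dumps(policy))
```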
CloudFront vs S3 Cross Region Replication
CloudFront is a global edge network, where files are cached for a TTL (maybe a day). Ideal for static content that must be available everywhere.
S3 CRR must be set up for every region you want replicated to. Real-time file updating, read only. Great for dynamic content that needs to be available at low-latency in few regions.
Signed URL / Signed Cookies
- Attach a policy with:
○ URL expiration
○ IP ranges to access the data from
○ Trusted signers (which AWS accounts can create signed URLs)
- How long should the URL be valid for?
○ Shared content (movie, music): make it short (a few minutes)
○ Private content (private to the user): you can make it last for years
- Signed URL = access to individual files (one signed URL per file)
- Signed Cookies = access to multiple files (one signed cookie for many files)
It works like so: user does authn/authz with application, application requests a signed url or cookie from aws, application returns url/cookie to user, then user can use that to make requests to aws directly.
CloudFront signed URL vs s3 pre-signed URL
CloudFront URL:
- Allow access to a path, no matter the origin (so it’s useful for any http/https connection, not just s3)
- Account wide key-pair, only root can manage
- Filter by path, IP, date, expiration
- Leverage caching features of CloudFront
S3 pre-signed URL:
- Issue a request as the person who pre-signed the URL
- Because of this, it uses the IAM key of the signing IAM principal
- Limited lifetime
In CloudFront, a signed URL allows access to a path. Therefore, if the user has a valid signature, they can access it, no matter the origin.
In S3, a pre-signed URL issues a request as the signing user. When you sign a request, you provide IAM credentials, so accessing a pre-signed URL has the same effect as if that user had made the request themselves.
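A minimal sketch of the S3 side (bucket and key are made up); CloudFront signed URLs instead go through botocore's CloudFrontSigner helper plus the account key pair, which needs more setup than fits here:

```python
import boto3

s3 = boto3.client("s3")

# The URL executes with the permissions of the IAM principal whose credentials
# signed it, and it expires after ExpiresIn seconds.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-private-bucket", "Key": "reports/2021.pdf"},
    ExpiresIn=3600,  # 1 hour
)
print(url)
```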
CloudFront (part 2)
- pricing
- multiple origin?
- origin groups
- field level encryption
CloudFront Pricing
- Edge locations all over the world
- The cost of data out per edge location varies
- You can reduce the number of edge locations for cost reduction
- Three price classes:
○ All: all regions, best performance
○ 200: most regions, but excludes the most expensive regions
○ 100: only the least expensive regions
Multiple Origin
- To route to different kinds of origins based on the content type
- Based on path pattern:
○ /images/*
○ /api/*
○ /*
Origin Groups
- To increase HA and do failover
- Origin group: one primary and one secondary origin
- If the primary origin fails, the second one is used
- Example use case: s3 buckets with cross-region replication. If one is down, use the secondary one.
Field Level Encryption
- Protect user sensitive information through application stack
- Adds an additional layer of security along with HTTPS
- Sensitive information encrypted at the edge close to the user
- Uses asymmetric encryption
- Usage:
○ Specify set of fields in POST requests that you want to be encrypted (up to 10 fields)
○ Specify the public key to encrypt them
○ The edge location will encrypt the fields before that data is sent to any other AWS service
Global Accelerator
- what is it?
- how does it work?
- What services does it work with?
- Benefits
- Security
- GA vs CloudFront
- Unicast IP: one server holds one IP address
- Anycast IP: all servers hold the same IP address and the client is routed to the nearest one
- Leverage the AWS internal network to route to your application
- 2 anycast IP are created for your application
- The anycast IPs send traffic directly to edge locations
- The edge locations send the traffic to your application
- Works with elastic IP, EC2 instances, ALB, NLB, public or private
- Consistent Performance
○ Intelligent routing to lowest latency and fast regional failover
○ No issue with client cache (because the IP doesn’t change)
○ Internal AWS network
- Health checks:
○ Global Accelerator performs a health check of your applications
○ Helps make your application global (failover less than 1 minute for unhealthy)
○ Great for disaster recovery (thanks to the health checks)
- Security:
○ Only 2 external IP need to be whitelisted
○ Automatically get DDOS protection thanks to AWS Shield
GA vs CloudFront
- Both use the AWS global network and its edge locations around the world
- Both services integrate with AWS shield
- CloudFront
○ Improves performance for both cacheable content (such as images and videos)
○ Dynamic content (such as API acceleration and dynamic site delivery)
○ Content is served at the edge
- Global Accelerator:
○ Improves performance for a wide range of applications over TCP or UDP
○ Proxying packets at the edge to applications running in one or more AWS regions
○ Good fit for non-HTTP use cases such as gaming (UDP), IoT (MQTT), or VoIP
○ Also good for HTTP use cases that require static IP addresses
AWS Snow
- three types (and their subtypes), use cases, storage limits
- what is edge computing?
- OpsHub?
AWS Snow
- Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS
- Data Migration: Snowcone, Snowball edge, snowmobile
- Edge computing: snowcone, snowball edge
- As a rule of thumb, if it takes more than a week to transfer over a network, use Snowball devices!
- Three types:
Snowball Edge
○ Physical data transport solution: move TBs or PBs of data in or out of AWS
○ Alternative to moving data over the network (and paying network fees)
○ Pay per data transfer job
○ Provide block storage and S3-compatible object storage
○ Snowball Edge Storage Optimized
§ 80 GiB of RAM
§ 80 TB of HDD capacity for block volume and s3 compatible object storage
§ Object storage clustering available
○ Snowball Edge Compute Optimized
§ 208 GiB of RAM
§ 42TB of HDD capacity
§ Optional GPU (useful for video processing or ML)
○ Use cases: large data cloud migrations, DC decommission, disaster recovery
Snowcone
○ Small, portable computing, anywhere, rugged and secure, withstands harsh environments (desert, underwater)
○ Light (4.5 lbs)
○ Used for edge computing, storage, and data transfer
○ 8TBs of usable storage
○ Use snowcone where snowball does not fit (space-constrained environment); can even be carried by drone
○ USB-C powered. Must provide own battery/cables.
○ Can be sent back to AWS offline, or connect it to internet and use AWS DataSync to send data
Snowmobile
○ It’s an actual goddamn truck
○ Transfer exabytes of data (1M TB)
○ Each snowmobile has 100 PB of capacity (can use multiple in parallel)
○ High security: temperature controlled, GPS, 24/7 video surveillance
○ Better than snowball if you transfer more than 10 PB
- Edge computing:
- Process data while it’s being created on an edge location (truck on the road, ship on the sea, mining station underground)
- These locations may have limited internet access, limited computing power
- We set up a Snowball edge / snowcone device to do edge computing
- Use cases: preprocess data, machine learning at the edge, transcoding media streams
- Eventually (if need be) we can ship back the device to AWS (for transferring data)
- Both Snowball and Snowcone can run EC2 instances or AWS Lambda functions (using AWS IoT Greengrass)
- Long-term deployment options: 1 and 3 years discounted pricing
- AWS OpsHub
- Historically, to use Snow Family devices, you needed a CLI
- Today, you can use AWS OpsHub (software you install on your computer / laptop) to manage your snow family device
Storage Gateway
- three use cases
- describe the three types
- hardware appliance?
- “Hybrid Cloud”
- Can be due to:
- Long cloud migrations
- Security requirements
- Compliance requirements
- IT strategy
- S3 is a proprietary storage technology (unlike EFS / NFS), so how do you expose the S3 data on-prem? Storage Gateway!
- Use cases: DR, backup and restore, tiered storage
- Three types:
- File Gateway
- Volume Gateway
- Tape Gateway
- File Gateway
- Configured S3 buckets are accessible using the NFS and SMB protocol
- Supports S3 standard, S3 IA, S3 One Zone IA
- Bucket access using IAM roles for each File Gateway
- Most recently used data is cached in the file gateway
- Can be mounted on many servers
- Integrated with Active Directory (AD) for user authentication
- Volume Gateway
- Block storage using iSCSI protocol backed by S3
- Backed by EBS snapshots which can help restore on-premises volumes
- Cached volumes: low latency access to most recent data
- Stored volumes: entire dataset is on premise, scheduled backups to S3
- Useful if you need low-latency access to entire dataset
- Tape Gateway
- Virtual Tape Library (VTL) backed by Amazon S3 and Glacier
- Back up data using existing tape-based processes (and iSCSI interface)
- Works with leading backup software vendors
- Hardware Appliance
- Using storage gateway means you need on-prem virtualization
- Otherwise, you can use a Storage Gateway Hardware Appliance
- Works with FG, VG, TG
- Has the required CPU, memory, network, SSD cache resources
- Helpful for daily NFS backups in small data centers
- For exam:
- On-prem data to the cloud -> think storage gateway
- File access / NFS - user auth with AD -> file gateway (backed by s3)
- Volumes / Block Storage / iSCSI -> volume gateway (backed by s3 with ebs snapshots)
- VTL tape solution / backup with iSCSI -> tape gateway (backed by s3 and glacier)
No on-premises virtualization -> hardware appliance
Amazon FSx for Windows
- authn/authz?
- use case?
- HA?
- Backups?
- EFS cannot be used with Windows
- FSx for Windows is a fully managed Windows file system share drive
- Supports SMB protocol and Windows NTFS
- Active Directory integration, ACLs, user quotas
- Built on SSD, scale up to 10s of GB/s, millions of IOPS, 100s PB of data
- Can be accessed from on-prem infra
- Can be configured as Multi-AZ (high availability)
Data backed-up daily to S3
Amazon FSx for Lustre
- use cases
- Completely unrelated, lol
- Lustre = linux + cluster. Used for large-scale computing.
- Machine learning, High Performance Computing
- Seamless integration with S3
- Can “read s3” as a file system (through FSx)
- Can write the output of the computations back to S3 (through FSx)
Can be used from on-prem servers
AWS Transfer
- use cases
- auth?
- Fully managed service for file transfers into and out of S3 or EFS using the FTP protocol
- Supported Protocols:
- AWS Transfer for FTP
- AWS Transfer for FTPS (FTP over SSL)
- AWS Transfer for SFTP (Secure FTP)
- Managed infra, scalable, reliable, HA
- Pay per provisioned endpoint per hour + data transfers in GB
- Store and manage users’ credentials within the service
Integrate with existing authentication systems (Microsoft Active Directory, LDAP, Okta, Amazon Cognito, custom)
SQS
- What is?
- Retention Period
- Security, access controls
- Message Visibility
- Dead Letter Queue
- Request-Response pattern
- Delay Queue
- FIFO Queue
- Attributes:
○ Unlimited throughput
○ Default retention of messages: 4 days, maximum of 14 days
○ Low latency (<10ms on publish and receive)
○ Limitation of 256KB per message sent
- “At least once delivery”: can occasionally have duplicate messages
- “Best effort ordering”: messages can be out of order sometimes
- Producing Messages:
○ Produced to SQS using the SDK (SendMessage API)
○ The message is persisted in SQS until a consumer deletes it
○ Example: send an order to be processed
- Consuming Messages:
○ Consumers (running on EC2 instances, servers, AWS lambda)
○ Consumer polls SQS for messages (receive up to 10 messages at a time)
○ Process the messages (example: insert the message into an RDS database)
○ Delete the messages using the DeleteMessage API
○ Great way to scale up message processing as needed is by using an ASG that manages EC2 instances which poll for messages
§ Need some sort of metric to determine when to scale out/in - CloudWatch has a metric called ApproximateNumberOfMessages (queue length). Set up a CloudWatch alarm to scale out/in.
- Security:
○ In-flight encryption using HTTPS API
○ At-rest encryption using KMS keys
○ Client-side encryption if the client wants to perform encryption/decryption itself
- Access controls:
○ IAM policies to regulate access to the SQS API
○ SQS Access Policies (similar to S3 bucket policies). Useful for cross-account access, or allowing other services to write to an SQS queue
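A bare-bones produce/consume loop with boto3 (the queue name and the process() helper are hypothetical):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="orders")["QueueUrl"]

# Producer: the message persists in SQS until a consumer deletes it.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"order_id": 42}')

# Consumer: long-poll up to 10 messages, process them, then delete explicitly.
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,       # long polling
    VisibilityTimeout=30,     # how long each message stays invisible to others
)
for msg in resp.get("Messages", []):
    process(msg["Body"])      # hypothetical business logic
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```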
SQS - Message Visibility Timeout
- After a message is polled by a consumer, it becomes invisible to other consumers
- By default, the "message visibility timeout" is 30 seconds
- After that timeout is over, the message is "returned" and can be picked up by other consumers (so it could be processed twice)
- If the consumer is still working on it but needs more time, there is a ChangeMessageVisibility API it can hit for more time
- The tradeoff of a high visibility timeout is that if a consumer crashes, it can take a long time for the message to be picked up again. If it's too low, you can get duplicate processing.
SQS - Dead Letter Queue
- We can set a threshold of how many times a message can go back into the queue (like if something about it is causing consumers to repeatedly fail)
- After the MaximumReceives threshold is exceeded, the message goes into a dead letter queue (DLQ). Useful for debugging a problem.
- Make sure to process the messages in the DLQ before they expire. Good to set a retention of 14 days in the DLQ.
- A DLQ is literally just an SQS queue. Personally, it seems like a good idea to set up CloudWatch metrics for that.
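A sketch of wiring up a DLQ via a redrive policy (queue names are made up):

```python
import json
import boto3

sqs = boto3.client("sqs")

# DLQ with 14-day retention so failed messages stick around long enough to debug.
dlq_url = sqs.create_queue(
    QueueName="orders-dlq",
    Attributes={"MessageRetentionPeriod": str(14 * 24 * 3600)},
)["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# After 5 failed receives, a message moves to the DLQ instead of cycling forever.
sqs.create_queue(
    QueueName="orders",
    Attributes={"RedrivePolicy": json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": "5",
    })},
)
```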
SQS - Request-Response Systems
- Create bidirectional flow between producers and responders. Include a “reply to” field in the request a producer sends. When the responder finishes processing that request, it sends a message to that “reply to” field (another SQS queue). The idea is that we can now scale out requesters as needed, not just scale out responders.
○ This is literally the same thing as backpressure in Elixir GenStage
- Need to know that you should use the SQS Temporary Queue Client to implement this pattern.
○ It leverages virtual queues instead of creating / deleting SQS queues (more cost-effective).
SQS - Delay Queue
- Delay a message (consumers don't see it immediately) up to 15 minutes
- Default is 0 seconds
- Can set a default at the queue level
- Can override the default on send using the DelaySeconds parameter
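A tiny sketch of both the queue-level default and the per-message override (values are arbitrary):

```python
import boto3

sqs = boto3.client("sqs")

# Queue-level default delay of 60 seconds...
queue_url = sqs.create_queue(
    QueueName="delayed-jobs",
    Attributes={"DelaySeconds": "60"},
)["QueueUrl"]

# ...which a single message can override (up to 900 seconds / 15 minutes).
sqs.send_message(QueueUrl=queue_url, MessageBody="run later", DelaySeconds=300)
```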
SQS - FIFO Queue
- Limited throughput: 300 msg/s without batching, 3,000 msg/s with
- Exactly-once send capability (by removing duplicates)
- Messages are processed in order by the consumer
SNS
- What can subscribe to SNS?
- What can publish to SNS?
- Limits?
- Security / access controls?
- FIFO
- Message filtering
- It’s just pub/sub
- The “event producer” only sends message to one SNS topic
- Can have as many “event receivers” (subscriptions) as we want to listen to the SNS topic notifications
- Each subscriber to the topic will get all the messages (note: new feature to filter messages)
- Up to 10M subscriptions per topic
- 100k topics limit
- Subscribers can be:
○ SQS
○ HTTP / HTTPS (with delivery retries)
○ Lambda
○ Emails
○ SMS Messages
○ Mobile Notifications
- SNS integrates with a lot of AWS services because many AWS services can send data directly to SNS for notifications:
○ CloudWatch (for alarms)
○ ASG notifications
○ S3 (on bucket events)
○ CloudFormation (upon state changes => failed to build, etc)
- Topic Publish (using the SDK):
○ Create a topic
○ Create a subscription (or many)
○ Publish to the topic
- Direct Publish (for mobile apps SDK):
○ Create a platform application
○ Create a platform endpoint
○ Publish to the platform endpoint
○ Works with Google GCM, Apple APNS, Amazon ADM
- Security:
○ Same as SQS
- Access Controls:
○ Same as SQS; IAM policies to regulate access to the API, and SNS Access Policies
Fan Out Pattern:
- Application: S3 events to multiple queues
○ For the same combination of event type (e.g. object create) and prefix (e.g. images/) you can only have one S3 event rule, so you need a fan-out pattern to send an event to multiple queues
- SNS can also have FIFO topics:
○ This is useful if you can’t have duplication, or if ordering is important
○ Can only have SQS FIFO queues as subscribers
- Message filtering:
- JSON policy used to filter messages sent to SNS topic’s subscriptions
- If a sub doesn’t have a filter policy, it receives every message
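A sketch of the fan-out + filtering pattern above (ARNs are placeholders; the subscribed SQS queue also needs an access policy that lets SNS deliver to it):

```python
import json
import boto3

sns = boto3.client("sns")
topic_arn = sns.create_topic(Name="orders")["TopicArn"]

# Fan-out: every subscribed queue gets a copy of each message,
# unless a filter policy narrows what it receives.
eu_queue_arn = "arn:aws:sqs:us-east-1:123456789012:eu-orders"   # hypothetical
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=eu_queue_arn,
    Attributes={"FilterPolicy": json.dumps({"region": ["eu"]})},
)

# Publish with a message attribute that the filter policy matches on.
sns.publish(
    TopicArn=topic_arn,
    Message=json.dumps({"order_id": 42}),
    MessageAttributes={"region": {"DataType": "String", "StringValue": "eu"}},
)
```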
Kinesis Data Streams
- retention period
- shards? hot / cold shards?
- producers and consumers?
- talk about data ordering for kinesis vs SQS FIFO
capture, process and store data streams
- A stream is made of shards. The more shards you have, the more throughput
- Like before, you have producers and consumers
- A record consists of a partition key and a data blob
- Billing is per shard provisioned; can have as many shards as you want
- Retention is 1 (default) to 365 days
- Ability to reprocess (replay) data
- Typically get 2MB/s per shard, but can pay extra ("enhanced fan-out") for 2MB/s per shard per consumer
- Once data is inserted into Kinesis, it can't be deleted (immutability)
- Data that shares the same partition key goes to the same shard (ordering)
- Producers: SDK, Kinesis Producer Library (KPL), Kinesis Agent
- Consumers:
○ Write your own: Kinesis Client Library (KCL), AWS SDK
○ Managed: Lambda, Firehose, Kinesis Data Analytics
________________________________
- For SQS standard, there is no ordering
- For SQS FIFO, if you don't use a Group ID, messages are consumed in the order they are sent, with only one consumer
- Say you want to scale the number of consumers, but you want messages to be "grouped" when they are related to each other. You can use a Group ID (similar to a partition key in Kinesis). The more Group IDs we have, the more consumers we can have.
- Let's assume 100 trucks, 5 Kinesis shards, 1 SQS FIFO queue:
○ Kinesis Data Streams: on average you'll have 20 trucks per shard (because of hashing). Trucks will have their data ordered within each shard. The maximum number of consumers in parallel is 5 (because we only have 5 shards). Because it's 1MB/s per shard, you get 5MB/s.
○ SQS FIFO: you will only have one FIFO queue. You will have 100 Group IDs, so you can have up to 100 consumers. Maximum throughput is 300 messages per second (or 3,000 if using batching).
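To show the partition-key/ordering point above, a small put_record sketch (stream name and truck IDs are invented):

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Records sharing a partition key hash to the same shard, so per-truck ordering holds.
for truck_id, position in [("truck-17", (48.10, 11.60)), ("truck-17", (48.12, 11.61))]:
    kinesis.put_record(
        StreamName="truck-positions",
        PartitionKey=truck_id,     # plays the same role as an SQS FIFO Group ID
        Data=json.dumps({"truck": truck_id, "pos": position}).encode(),
    )
```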
Kinesis Data Firehose
- use cases?
- where is the data stored?
- is it real-time?
- failure / backup stuff?
- retention period
- compare streams vs firehose
load data streams into AWS data stores
- Fully managed service, no administration, automatic scaling, serverless
- Destinations:
○ AWS: S3, Redshift, ElasticSearch
○ 3rd party partners (Datadog, New Relic, Splunk, MongoDB)
○ Custom HTTP destination
- Pay for data going through Firehose
- Near real-time; 60 seconds latency minimum for non-full batches, or minimum 32MB of data at a time
- Supports many data formats, conversions, transformations, compression
- Supports custom data transformations using AWS Lambda
- Can send failed or all data to a backup S3 bucket
Streams vs Firehose:
- Streams is a streaming service for ingest at scale. Write custom code, real-time, manage scaling yourself. Data storage for 1-365 days, supports replay.
- Firehose is for loading streaming data into destinations. Fully managed, near real-time, auto-scaling, no data storage, no replay.
Kinesis Data Analytics
- what is
- use cases
analyze data streams with SQL or Apache Flink
- Real-time analytics on Kinesis streams using SQL
- Fully managed, no servers to provision, automatic scaling
- Can create streams out of the real-time queries
- Use cases:
○ Time-series analytics
○ Real-time dashboards
○ Real-time metrics
Kinesis Video Streams
capture, process and store video streams
Amazon MQ
- Managed Apache ActiveMQ
- Runs on dedicated machine, doesn’t scale as well as SQS/SNS, can run in HA with failover
- For exam, if you need to migrate existing infra that uses MQTT/AMQP (or others), then use Amazon MQ
Containers on AWS
- list the three options
- ECS (amazon’s container platform)
- Fargate (amazon’s serverless container platform)
- EKS (managed kubernetes)
ECS
- What is it? Describe some of its features.
- How does IAM work for ECS tasks?
- Data Volumes?
- How does load balancing work?
- How does scaling work?
- How do updates work?
- Launch Docker containers on AWS
- You must provision and maintain the infrastructure (EC2 instances)
- AWS takes care of starting/stopping containers for you
Has integrations with the Application Load Balancer
IAM Roles for ECS tasks:
- ECS Instance profile:
○ Used by the ECS agent
○ Makes API calls to ECS service
○ Send container logs to CloudWatch logs
○ Pull docker image from ECR
○ Reference sensitive data in Secrets Manager or SSM Parameter Store
- ECS Task Role:
○ Allow each task to have a specific role
○ Use different roles for the different ECS Services you run
Task role is defined in the task definition (task roles are a common exam question)
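A boto3 sketch of where those two roles land in a task definition (ARNs, names, and image are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

# executionRoleArn: used by the ECS agent (pull from ECR, ship logs to CloudWatch).
# taskRoleArn: assumed by the application code running inside the container.
ecs.register_task_definition(
    family="orders-api",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
    taskRoleArn="arn:aws:iam::123456789012:role/ordersApiTaskRole",          # placeholder
    containerDefinitions=[{
        "name": "orders-api",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/orders-api:latest",
        "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
    }],
)
```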
ECS Data Volumes - EFS File Systems
- Works for both EC2 tasks and Fargate tasks
- Ability to mount EFS volumes onto tasks
- Tasks launched in any AZ will be able to share the same data in the EFS volume
- Fargate + EFS = serverless + data storage without managing servers
ECS Services + Tasks
- Load balancing for EC2 launch type:
○ The ALB supports finding the right port on your EC2 instances (so you don’t need to specify it)
○ You must allow on the EC2 instance’s security group any port from the ALB security group
- Load balancing for Fargate:
○ Each task has a unique IP
○ You must allow on the ENI’s security group the task port from the ALB security group
- ECS tasks can be invoked by Event Bridge
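The security-group rule described above, sketched with boto3 (both group IDs are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Allow the ALB's security group to reach the task port on the task/instance SG.
ec2.authorize_security_group_ingress(
    GroupId="sg-0task1234567890abc",       # SG on the task ENI (Fargate) or EC2 instance
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 8080,                   # the container/task port
        "ToPort": 8080,
        "UserIdGroupPairs": [{"GroupId": "sg-0alb1234567890abc"}],  # source = ALB SG
    }],
)
```

For the EC2 launch type with dynamic port mapping, you would open the ephemeral port range from the ALB security group instead of a single port.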
ECS Scaling
- Just use CloudWatch alarms like usual
- CloudWatch metric (ECS Service CPU Usage)
- Optionally, scale ECS Capacity Providers (adds more EC2 instances if not using Fargate)
- Could also scale on something like SQS queue length
ECS Rolling Updates
- When updating from v1 to v2, we can control how many tasks can be started and stopped, and in which order
- This is based on min and max %; the min is how many must remain running, and the max is how many over the current number can run while moving over.
○ So if that’s 50/100, it will stop half your v1 tasks and replace them with v2, then swap the remainder to v2.
○ If that’s 100/150, it will keep the existing v1 but add 50% more as v2, before swapping the remainder over and returning to 100% total.
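A boto3 sketch of setting those percentages on a service update (cluster and service names are made up):

```python
import boto3

ecs = boto3.client("ecs")

# minimumHealthyPercent / maximumPercent control the rolling update:
# 100/150 keeps all v1 tasks running while up to 50% extra v2 tasks start.
ecs.update_service(
    cluster="prod",
    service="orders-api",
    taskDefinition="orders-api:2",          # the new revision
    deploymentConfiguration={
        "minimumHealthyPercent": 100,
        "maximumPercent": 150,
    },
)
```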