Decoupling applications: SQS, SNS, Kinesis, Amazon MQ Flashcards
Two patterns of application communication
Synchronous (app to app)
Asynchronous / Event based (app to queue to app)
Synchronous problems
Problematic if sudden spikes of traffic appear
How can you decouple your apps? (to avoid sudden spike of traffic issues)
SQS - Queuing model
SNS - pub/sub model
Kinesis - real time streaming model
What is the benefit of decoupling?
These services (SQS, SNS, Kinesis) let each part of the application scale independently of the others
SQS architecture
The producer sends messages to the SQS queue, then the consumer polls the queue to process the messages. Once processed, the messages are deleted.
What is the purpose of an SQS queue?
Acts as a buffer to decouple between producers (sender) and consumers (processor)
Standard Queue
Fully managed
Unlimited throughput and messages in queue
Retention: default 4 days, max 14 days
Low latency
256KB per message
Can have duplicates
Can have out of order messages
What do producers do
Send to SQS using SDK (SendMessage API)
The message persists in the queue (default 4 days, max 14) until a consumer deletes it
How do consumers work
Consumers run on EC2 instances, servers or AWS Lambda
Polls / Receive SQS messages (up to 10 at a time)
Process message (e.g. insert into RDS DB)
Delete message using DeleteMessage API
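The receive → process → delete loop above can be sketched in boto3 style. The client is passed in as a parameter (so the function can also be exercised against a stub); the queue URL is a placeholder, and a real run would need AWS credentials:

```python
def drain_once(sqs, queue_url):
    """Receive up to 10 messages, process them, then delete each one."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,   # SQS returns at most 10 messages per call
    )
    processed = []
    for msg in resp.get("Messages", []):
        processed.append(msg["Body"])   # e.g. insert into an RDS DB here
        sqs.delete_message(             # delete only after successful processing
            QueueUrl=queue_url,
            ReceiptHandle=msg["ReceiptHandle"],
        )
    return processed
```

In practice you would call `drain_once(boto3.client("sqs"), queue_url)` in a loop.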
Multiple EC2 instances consumers
Can have many consumers in SQS Queue
receive and process messages in parallel
At-least-once delivery (the same message can occasionally be delivered and processed more than once)
Best-effort message ordering
Message deleted after processed
Can scale consumers horizontally to improve throughput of processing
SQS with ASG
EC2 instance consumers in an ASG poll messages from the SQS queue.
A CloudWatch alarm is set on the queue length metric (ApproximateNumberOfMessagesVisible).
If the alarm is triggered, CloudWatch notifies the ASG to scale out.
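The scaling decision behind this pattern reduces to simple arithmetic: target a fixed message backlog per instance and size the ASG from the queue-length metric. A minimal sketch (the per-instance target and the min/max sizes are assumed tuning values, not AWS defaults):

```python
import math

def desired_capacity(queue_length, msgs_per_instance, min_size=1, max_size=10):
    """Desired number of ASG instances for a given SQS backlog."""
    wanted = math.ceil(queue_length / msgs_per_instance)
    # clamp to the ASG's configured bounds
    return max(min_size, min(max_size, wanted))
```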
How to decouple using SQS?
Instead of doing requests & processing on front end web app you can:
Receive the requests in the front-end web app (in an ASG) and send them to an SQS queue.
The back-end processing application (e.g. video processing), also in an ASG, polls the queue and does the processing.
The results are then sent to an S3 bucket.
SQS Security
Encryption in flight using HTTPS API, at rest encryption using KMS keys.
Client-side encryption if the client wants to handle encryption/decryption
Access Controls - IAM policies to regulate access to SQS API
SQS Access Policies (similar to S3 bucket policies)
-useful for cross-account access to SQS queues
- allowing other services (SNS, S3) to write to an SQS queue
Message visibility timeout
When a message is polled by a consumer it becomes invisible to other consumers for the visibility timeout (default: 30 seconds)
After that it becomes visible in the queue again and another consumer can work on it
ChangeMessageVisibility API
If a consumer knows it needs more than the default 30 seconds to process a message, it can call this API to extend the invisibility window.
This helps avoid processing the same message twice.
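The visibility-timeout mechanics can be illustrated with a tiny in-memory queue (a simulation of the behavior, not the SQS implementation): a received message is hidden for the timeout window, then reappears if it was never deleted.

```python
class TinyQueue:
    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.messages = {}          # body -> time the message becomes visible again

    def send(self, body):
        self.messages[body] = 0.0   # visible immediately

    def receive(self, now):
        for body, visible_at in self.messages.items():
            if now >= visible_at:
                # hide the message from other consumers for the timeout window
                self.messages[body] = now + self.visibility_timeout
                return body
        return None

    def change_visibility(self, body, now, extra):
        # ChangeMessageVisibility: extend the invisibility window
        self.messages[body] = now + extra

    def delete(self, body):
        self.messages.pop(body, None)
```

A message received at t=0 is invisible at t=10, but redelivered at t=31 if the consumer never deleted it; calling `change_visibility` before the timeout expires prevents that redelivery.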
Long Polling
The consumer requests messages from the queue and “waits” for messages to arrive if there are none in the queue.
Can reduce latency
Can reduce API calls
Long Polling how it works
The consumer polls and waits for up to 20 seconds; as soon as a message arrives in the SQS queue it is returned to the consumer, which processes it.
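In the SQS API, long polling is just the WaitTimeSeconds parameter on ReceiveMessage (0 = short poll, up to 20 s = long poll). A small helper building the call parameters (the queue URL is a placeholder):

```python
def receive_params(queue_url, long_poll=True):
    """Build ReceiveMessage parameters; long polling waits up to 20 s."""
    return {
        "QueueUrl": queue_url,
        "MaxNumberOfMessages": 10,
        "WaitTimeSeconds": 20 if long_poll else 0,  # 0 = short polling
    }
```

These would be passed as `sqs.receive_message(**receive_params(queue_url))` with a boto3 client.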
Downsides of FIFO
Limited throughput: 300 msg/s without batching, 3,000 msg/s with batching.
Standard (non-FIFO) queues have unlimited throughput.
Benefits of FIFO
Exactly-once send capability (duplicates are removed)
Messages processed in order by consumer
How to use SQS as a Buffer to database writes
Requests arrive at EC2 instances in an ASG, which enqueue them as messages into an SQS queue (effectively infinitely scalable).
A second ASG of dequeueing EC2 instances polls the queue and inserts the messages into the database.
This makes sure nothing is lost: a message is deleted from SQS only once it is in the DB.
Amazon SNS
Send one message to multiple receivers
How to use SNS with Pub / Sub?
Buying service -> SNS topic which has multiple subscribers (services) such as email notifications, fraud service, shipping service, SQS queue and they all receive the messages
SNS how it works
the event producer only sends to ONE SNS topic
can have as many event receivers as we want
Up to 12,500,000 subs per topic, 100,000 topics limit
Where can SNS publish to?
SQS, Lambda, Kinesis Data Firehose, Emails, SMS & Mob notifications, HTTP(S) endpoints
How does SNS publish?
Topic Publish (using SDK)
Create topic, create subs, publish to topic
OR
Direct Publish (mobile apps SDK)
create platform app
create platform endpoint
publish to platform endpoint
works with google GCM, Apple APNS, Amazon ADM
SNS Security
Inflight encryption using HTTPS API
at rest encryption using KMS
client-side encryption
Access controls - IAM policies to regulate access to SNS API
SNS Access policies
useful for cross-account access to SNS topics
useful for allowing other services (S3) to write to an SNS topic
SNS & SQS Fan Out
Buying service sends message to SNS Topic. From there it gets sent to 2 or more SQS Queues. Each queue sends the request to the services associated with them (fraud, shipping service etc)
SNS & SQS Fan Out benefits
Push one in SNS receive in all SQS subs
Fully decoupled, no data loss
SQS allows for data persistence, delayed processing and retries of work
ability to add more SQS subs over time
SQS access policy should allow for SNS to write
Cross-region delivery
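The access policy the fan-out pattern needs must allow the SNS topic to write into each queue. A sketch of such a policy document, built as Python data (the ARNs are placeholders):

```python
import json

def sns_to_sqs_policy(queue_arn, topic_arn):
    """SQS access policy letting one SNS topic send messages to the queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # restrict to this one topic so arbitrary topics can't write
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })
```

With boto3 it would be attached via `sqs.set_queue_attributes(QueueUrl=..., Attributes={"Policy": policy})`.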
How to send S3 events to multiple queues?
Usually you can only have one S3 event notification rule per event type and prefix.
To send the same S3 event to many SQS queues you need to fan out:
S3 object created → S3 event notification → SNS topic → fans out to SQS queues, Lambda functions, etc.
SNS to Amazon S3 through Kinesis Data Firehose
Buying service -> SNS topic -> Kinesis Data Firehose -> S3 and any Supported KDF Destination
SNS Message Filtering
JSON policy to filter messages sent to SNS topics subs
If a sub doesn't have a filter policy, it receives every message
How does SNS Message Filtering work
Buying service sends new transaction with data (order num, product, qty, state) to SNS Topic.
From there it gets sent to several SQS queues. The first SQS queue has a “placed orders” filter policy and only receives placed orders; the next one receives cancelled orders, another declined orders, and a final SQS queue with no filter receives all messages.
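Filter policies are JSON documents matched against message attributes. A simplified matcher covering only exact string matching (real SNS policies also support prefixes, numeric ranges, anything-but, etc.):

```python
def matches(filter_policy, attributes):
    """True if every key in the filter policy matches the message attributes."""
    if not filter_policy:            # no filter policy -> subscriber gets everything
        return True
    return all(attributes.get(key) in allowed
               for key, allowed in filter_policy.items())

orders = [
    {"order": "1036", "state": "Placed"},
    {"order": "1037", "state": "Cancelled"},
]
placed_queue = [o for o in orders if matches({"state": ["Placed"]}, o)]
all_queue = [o for o in orders if matches({}, o)]
```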
Kinesis Overview
Collect, process, analyse streaming data in real-time.
What does Kinesis ingest?
Real time data such as
App logs
Metrics
Website clickstreams
IoT telemetry data
4 types of Kinesis
Data Streams - capture, process, store data streams
Data Firehose - load data streams into AWS data stores
Data Analytics - analyse data streams with SQL or Apache Flink
Video Streams - capture, process, store, video streams
What do Kinesis Data Streams include?
They are composed of shards (1, 2, 3, etc.) which you must provision beforehand.
The shards define the stream capacity in terms of ingestion and consumption rates.
What do producers do for Kinesis Data Streams?
They produce records to Kinesis Data Streams.
They could be
EC2
Client
SDK, KPL
Kinesis Agent
What do records sent by Producers to Kinesis Data Streams include?
2 parts
Partition Key - decides which Shard to go to
Data Blob (the value itself, up to 1 MB)
Can send 1 MB/sec or 1000 msg/sec per shard
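Kinesis routes a record by taking the MD5 hash of its partition key and mapping the resulting 128-bit number onto the shards' hash-key ranges. A simplified sketch assuming evenly split ranges:

```python
import hashlib

def shard_for(partition_key, num_shards):
    """Map a partition key onto one of num_shards evenly split hash-key ranges."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return h * num_shards // 2**128   # which equal slice of the 128-bit space

# the same key always hashes to the same shard, preserving per-key ordering
```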
What are the consumers for Kinesis Data Streams?
Apps (KCL, SDK)
Lambda
Kinesis Data Firehose
Kinesis Data Analytics
What does the record that goes into Consumers include?
Partition Key
Sequence no.
Data Blob
2 MB/sec per shard shared across all consumers (classic) OR 2 MB/sec per shard per consumer (enhanced fan-out)
Kinesis Data Streams Architecture
Producer (apps, clients etc) -> record (partition key, blob) -> kinesis data stream (shards) -> record (partition key, sequence number, blob) -> consumers (apps, lambda, firehose, kinesis data analytics)
Properties of Kinesis Data Streams
1-365 days retention period
ability to reprocess (replay) data
data can't be deleted once inserted into Kinesis (immutability)
data sharing the same partition key goes to the same shard (ordering)
producers
-AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
consumers
- write your own: Kinesis Client Library (KCL), AWS SDK
- managed: AWS Lambda, kinesis Data Firehose, Kinesis Data Analytics
Capacity Modes for Kinesis Data Streams
Provisioned Mode
- choose the number of shards provisioned, scale manually or using the API
- each shard gets 1MB/s in (or 1000 records per sec)
- Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
- pay per shard provisioned per hour
On Demand Mode
- no need to provision or manage capacity
- default capacity provisioned (4MB/s in or 4000 records per sec)
- automatic scaling based on observed throughput peak during last 30 days
- pay per stream per hour & data in/out per GB
Kinesis Data Streams Security
Deployed within a Region
- Control access / auth using IAM policies
- HTTPS endpoints encryption in flight
- at rest using KMS
- implement encryption/decryption of data on client side (harder)
- VPC endpoints available for Kinesis to access within VPC
- Monitor API calls using CloudTrail
Kinesis Data Firehose
Takes data from Producers (Apps, Clients, Kinesis Data Streams, CloudWatch, AWS IoT)
And writes data into destinations (batch writes)
Transformation of data can happen via Lambda functions
Kinesis Data Firehose destinations
AWS Destinations (s3, RedShift, Opensearch)
3rd-party Partner Destinations (splunk, mongoDB)
Custom Destinations (HTTP endpoints)
Kinesis Data Firehose characteristics
Fully managed
redshift, S3, opensearch
3rd party partners
custom - http
pay for data going through firehose
Near real time
- 60 sec latency
- min 1 MB of data at a time
supports many data formats, conversions, transformations, compression
supports custom data transformations using AWS lambda
can send failed or all data to a backup S3 bucket
Kinesis Data Streams vs Firehose
Kinesis Data Streams
- streaming service for ingest at scale
- write custom code (producer / consumer)
- real time (200ms)
- manage scaling (shard splitting / merging)
- data storage 1 to 365 days
- supports replay capability
Kinesis Data Firehose
- load streaming data to S3, Redshift, Opensearch, 3rd Party, Custom HTTP
- fully managed
- near real-time (buffer time min. 60 sec)
- automatic scaling
- no data storage
- doesn't support replay capability
Data ordering in Kinesis
With 3 shards and 3 objects: each record's partition key (e.g. an “object_id”) is hashed to pick a shard, so the first object might map to shard 1 and the second to shard 2.
Every time they send data, the data is sent to the shard their partition key maps to, which preserves ordering per key.
SQS Data Ordering
There is no data ordering in SQS Standard; ordering requires a FIFO queue.
In a FIFO queue you can use a Group ID (like a partition key in Kinesis) to have different groups.
This allows you to have more consumers (one per Group ID); without Group IDs you can only have one consumer.
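In the SQS API, sending to a FIFO queue means adding a MessageGroupId (the ordering unit) and a MessageDeduplicationId to SendMessage. A helper sketching the call parameters (the queue URL is a placeholder):

```python
def fifo_send_params(queue_url, body, group_id, dedup_id):
    """Build SendMessage parameters for a FIFO queue."""
    return {
        "QueueUrl": queue_url,               # FIFO queue names end in .fifo
        "MessageBody": body,
        "MessageGroupId": group_id,          # ordering is guaranteed within this group
        "MessageDeduplicationId": dedup_id,  # enables exactly-once send
    }
```

These would be passed as `sqs.send_message(**fifo_send_params(...))` with a boto3 client.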
SQS vs SNS vs Kinesis
SQS
- consumer pulls data
- data deleted after consumed
- can have many consumers
- no need to provision
- order guarantee only with FIFO
- individual message delay capability
SNS
- push data to subscribers
- up to 12,500,000 subs
- data not persistent (lost if not delivered)
- pub / sub
- up to 100,000 topics
- no need to provision
- integrates with SQS for fan-out
- FIFO capability for SQS FIFO
Kinesis
- Pull data (2 MB per shard)
- Push data enhanced fan-out (2 MB per shard per consumer)
- replay data possibility
- for real-time big data, analytics
- ordering at shard level
- data expiration after x days
- provisioned or on-demand capacity mode
Amazon MQ
Managed Message Broker service for
RabbitMQ
ActiveMQ
Amazon MQ downsides
does not scale as much as SQS/SNS
How to have HA with Amazon MQ?
runs on servers so you can have Multi-AZ with failover
HA architecture for Amazon MQ
1 Region, 1 Active AZ, 1 Standby AZ, 1 Amazon EFS
If the client is talking to the Amazon MQ broker in the active AZ and that broker crashes, the client fails over to the broker in the standby AZ; nothing is lost because both AZs share storage on EFS.
Does SQS scale automatically?
Yes
What should you do in Kinesis when you have 6 shards but traffic spikes and you get a ProvisionedThroughputExceeded exception?
Add more shards (shard splitting)
Is Amazon Kinesis Data Streams a supported subscriber for AWS SNS?
No