03 - Application Integration Flashcards
Amazon SQS – Standard Queue
• Fully managed service, used to decouple applications
- Attributes:
- Unlimited throughput, unlimited number of messages in queue
- Default retention of messages: 4 days, maximum of 14 days
- Low latency (<10 ms on publish and receive)
- Maximum message size of 256 KB
- Can have duplicate messages (at least once delivery, occasionally)
- Can have out of order messages (best effort ordering)
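A standard queue can deliver the same message more than once, so consumers are usually made idempotent. The sketch below is a minimal in-memory version. It assumes duplicates arrive with the same message ID. `IdempotentConsumer` is a hypothetical helper, and its in-memory set stands in for a durable store that all workers would share.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Hypothetical helper: runs the handler at most once per message ID."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self.handler = handler
        # In-memory for the sketch; real workers would share a durable store.
        self.seen: Set[str] = set()

    def consume(self, message_id: str, body: str) -> bool:
        """Handle a message; return False (and do nothing) for a duplicate."""
        if message_id in self.seen:
            return False
        self.handler(body)
        self.seen.add(message_id)  # mark done only after the handler succeeded
        return True


processed = []
consumer = IdempotentConsumer(processed.append)
# "m1" is delivered twice, as at-least-once delivery allows.
for message_id, body in [("m1", "order-42"), ("m2", "order-43"), ("m1", "order-42")]:
    consumer.consume(message_id, body)
print(processed)  # ['order-42', 'order-43']
```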
SQS – Message Visibility Timeout
• After a message is polled by a consumer, it becomes invisible to other consumers
- By default, the “message visibility timeout” is 30 seconds
- That means the message has 30 seconds to be processed
- After the message visibility timeout is over, the message becomes visible again in SQS
- If a message is not processed (and deleted) within the visibility timeout, it may be received and processed twice
- A consumer could call the ChangeMessageVisibility API to get more time
- If visibility timeout is high (hours), and consumer crashes, re-processing will take time
- If visibility timeout is too low (seconds), we may get duplicates
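The visibility timeout rules above can be modeled with a small in-memory queue. This is a toy, not the SQS API. `ToyQueue` and `ToyMessage` are hypothetical names, and time is passed in explicitly so the behavior is deterministic. `change_visibility` only mimics what ChangeMessageVisibility does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare messages by identity, not by field values
class ToyMessage:
    body: str
    invisible_until: float = 0.0
    receive_count: int = 0


class ToyQueue:
    """In-memory model of the visibility timeout (not the SQS API).

    Time is passed in as `now` (seconds) so the behavior is deterministic.
    """

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self.messages: List[ToyMessage] = []

    def send(self, body: str) -> None:
        self.messages.append(ToyMessage(body))

    def receive(self, now: float) -> Optional[ToyMessage]:
        """Return the first visible message and hide it for the timeout."""
        for msg in self.messages:
            if msg.invisible_until <= now:
                msg.invisible_until = now + self.visibility_timeout
                msg.receive_count += 1
                return msg
        return None

    def change_visibility(self, msg: ToyMessage, now: float, timeout: float) -> None:
        """Analogous to ChangeMessageVisibility: hide for `timeout` seconds from now."""
        msg.invisible_until = now + timeout

    def delete(self, msg: ToyMessage) -> None:
        """Processing is only 'done' once the consumer deletes the message."""
        self.messages.remove(msg)


queue = ToyQueue(visibility_timeout=30)
queue.send("job-1")
first = queue.receive(now=0)    # consumer A receives the message
print(queue.receive(now=10))    # None: invisible to consumer B
again = queue.receive(now=31)   # A never deleted it, so it is visible again
print(again.body, again.receive_count)  # job-1 2
```

The receive count grows on every redelivery. The DLQ threshold described in the next card is based on this count.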
Amazon SQS – Dead Letter Queue
- If a consumer fails to process a message within the Visibility Timeout … the message goes back to the queue!
- We can set a threshold of how many times a message can go back to the queue
- After the MaximumReceives threshold is exceeded, the message goes into a dead letter queue (DLQ)
- Useful for debugging!
- Make sure to process the messages in the DLQ before they expire:
- Good to set a retention of 14 days in the DLQ
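A DLQ is attached through the source queue's `RedrivePolicy` attribute. This attribute is a JSON string with `deadLetterTargetArn` and `maxReceiveCount`, which the console labels "Maximum receives". Retention is set in seconds through `MessageRetentionPeriod`. The sketch below only builds the attribute dictionaries and does not call AWS. The ARN and the threshold of 3 are hypothetical.

```python
import json
from typing import Dict

FOURTEEN_DAYS_SECONDS = 14 * 24 * 60 * 60  # 1,209,600 s, the SQS maximum retention


def source_queue_attributes(dlq_arn: str, max_receive_count: int) -> Dict[str, str]:
    """Source queue: move a message to the DLQ once its receive count exceeds the threshold."""
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(redrive_policy)}


def dlq_attributes() -> Dict[str, str]:
    """DLQ itself: keep failed messages for the maximum 14 days."""
    return {"MessageRetentionPeriod": str(FOURTEEN_DAYS_SECONDS)}


# Hypothetical ARN. With boto3, each dict would be passed as the Attributes
# argument of sqs.set_queue_attributes(QueueUrl=..., Attributes=...).
source_attrs = source_queue_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 3)
print(source_attrs["RedrivePolicy"])
print(dlq_attributes())  # {'MessageRetentionPeriod': '1209600'}
```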
Amazon SQS – Delay Queue
- Delay a message (consumers don’t see it immediately) up to 15 minutes
- Default is 0 seconds (message is available right away)
- Can set a default at queue level
- Can override the default on send using the DelaySeconds parameter
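The queue-level default and the per-message override can be sketched as SendMessage arguments. The helper below is hypothetical and only builds a dictionary of the kind a boto3 `send_message` call accepts. It omits `DelaySeconds` when no override is given, so the queue default applies. The queue URL is made up.

```python
from typing import Any, Dict, Optional

MAX_DELAY_SECONDS = 15 * 60  # 900 s, the SQS upper bound


def send_message_kwargs(queue_url: str, body: str,
                        delay_seconds: Optional[int] = None) -> Dict[str, Any]:
    """Arguments for SendMessage; leave delay_seconds as None to use the queue default."""
    kwargs: Dict[str, Any] = {"QueueUrl": queue_url, "MessageBody": body}
    if delay_seconds is not None:
        if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
            raise ValueError(f"DelaySeconds must be 0-{MAX_DELAY_SECONDS}, got {delay_seconds}")
        kwargs["DelaySeconds"] = delay_seconds  # per-message override
    return kwargs


# Queue-level default (queue attribute values are strings).
QUEUE_DEFAULT_DELAY = {"DelaySeconds": "60"}

url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # hypothetical
print(send_message_kwargs(url, "uses queue default"))
print(send_message_kwargs(url, "hidden for 5 minutes", delay_seconds=300))
```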
Amazon SQS - Long Polling
• When a consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
- Long polling decreases the number of API calls made to SQS while increasing the efficiency and reducing the latency of your application.
- The wait time can be between 1 and 20 seconds (20 seconds preferable)
- Long Polling is preferable to Short Polling
- Long polling can be enabled at the queue level or at the API level using WaitTimeSeconds
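The two ways of enabling long polling can be shown side by side: the `ReceiveMessageWaitTimeSeconds` queue attribute, or `WaitTimeSeconds` on each ReceiveMessage call. The polling helper below assumes a client with boto3-style `receive_message` / `delete_message` methods. `FakeSQS` is a hypothetical stand-in so the example runs without AWS.

```python
from typing import Any, Callable, Dict, List

QUEUE_LEVEL_LONG_POLLING = {"ReceiveMessageWaitTimeSeconds": "20"}  # queue attribute


def receive_kwargs(queue_url: str, wait_seconds: int = 20) -> Dict[str, Any]:
    """ReceiveMessage arguments enabling long polling for this call only."""
    if not 1 <= wait_seconds <= 20:
        raise ValueError("long polling WaitTimeSeconds must be 1-20")
    return {"QueueUrl": queue_url, "WaitTimeSeconds": wait_seconds, "MaxNumberOfMessages": 10}


def poll_once(client: Any, queue_url: str, handler: Callable[[str], None]) -> int:
    """One long-poll round trip: handle each message, then delete it.

    `client` only needs boto3-style receive_message and delete_message methods.
    """
    response = client.receive_message(**receive_kwargs(queue_url))
    messages = response.get("Messages", [])  # the key may be absent if nothing arrived
    for msg in messages:
        handler(msg["Body"])
        client.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return len(messages)


class FakeSQS:
    """Stand-in client for the example; a real one would be boto3.client("sqs")."""

    def __init__(self, bodies: List[str]) -> None:
        self.bodies = bodies
        self.deleted: List[str] = []
        self.last_call: Dict[str, Any] = {}

    def receive_message(self, **kwargs: Any) -> Dict[str, Any]:
        self.last_call = kwargs
        batch, self.bodies = self.bodies[:10], self.bodies[10:]
        if not batch:
            return {}
        return {"Messages": [{"Body": b, "ReceiptHandle": f"rh-{b}"} for b in batch]}

    def delete_message(self, QueueUrl: str, ReceiptHandle: str) -> None:
        self.deleted.append(ReceiptHandle)


fake = FakeSQS(["a", "b"])
seen: List[str] = []
print(poll_once(fake, "hypothetical-queue-url", seen.append))  # 2
print(poll_once(fake, "hypothetical-queue-url", seen.append))  # 0
```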
SQS – Request-Response Systems
- To implement this pattern: use the SQS Temporary Queue Client
- It leverages virtual queues instead of creating / deleting SQS queues (cost effective)
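The SQS Temporary Queue Client is a Java library, and the sketch below does not use it. It is a rough in-memory Python model of the idea behind it. Replies for many requesters share one host queue and are routed to per-request virtual reply queues. A correlation ID then confirms that the reply matches the request. The `VirtualQueue` field, `HostQueue`, `responder` and `call` are all hypothetical.

```python
import uuid
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class HostQueue:
    """In-memory stand-in for one physical queue carrying many virtual queues.

    Each message names its virtual queue in a hypothetical "VirtualQueue" field,
    so a requester gets a private reply queue without creating a real one.
    """

    def __init__(self) -> None:
        self._incoming: Deque[dict] = deque()
        self._virtual: Dict[str, Deque[dict]] = defaultdict(deque)

    def send(self, message: dict) -> None:
        self._incoming.append(message)

    def receive(self, virtual_queue: str) -> Optional[dict]:
        # Route everything that has arrived, then read only the requested virtual queue.
        while self._incoming:
            msg = self._incoming.popleft()
            self._virtual[msg["VirtualQueue"]].append(msg)
        pending = self._virtual[virtual_queue]
        return pending.popleft() if pending else None


request_queue: Deque[dict] = deque()  # stand-in for the shared request queue
replies = HostQueue()


def responder() -> None:
    """Server side: answer each request on the virtual queue it names."""
    while request_queue:
        req = request_queue.popleft()
        replies.send({
            "VirtualQueue": req["ReplyTo"],
            "CorrelationId": req["CorrelationId"],
            "Body": req["Body"].upper(),
        })


def call(body: str) -> str:
    """Client side: send a request, then read the reply from a private virtual queue."""
    reply_to = f"reply-{uuid.uuid4()}"  # creating a virtual queue costs nothing
    correlation_id = str(uuid.uuid4())
    request_queue.append({"ReplyTo": reply_to, "CorrelationId": correlation_id, "Body": body})
    responder()  # in a real system a separate consumer process would do this
    reply = replies.receive(reply_to)
    if reply is None or reply["CorrelationId"] != correlation_id:
        raise RuntimeError("no matching reply")
    return reply["Body"]


print(call("ping"))  # PING
```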
Amazon SNS – How to publish
Topic Publish (using the SDK)
• Create a topic
• Create a subscription (or many)
• Publish to the topic
Direct Publish (for mobile apps SDK)
• Create a platform application
• Create a platform endpoint
• Publish to the platform endpoint
• Works with Google GCM (now Firebase Cloud Messaging), Apple APNS, Amazon ADM…
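The three Topic Publish steps can be modeled with a toy in-memory topic. This is not the AWS API. With boto3, the corresponding SNS client calls are `create_topic(Name=...)`, `subscribe(TopicArn=..., Protocol=..., Endpoint=...)` and `publish(TopicArn=..., Message=...)`. The `Topic` class and the inbox lists below are hypothetical.

```python
from typing import Callable, List


class Topic:
    """In-memory model of an SNS topic: publish pushes to every subscriber."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, endpoint: Callable[[str], None]) -> None:
        self.subscribers.append(endpoint)

    def publish(self, message: str) -> int:
        """Deliver to every subscription; return how many deliveries were made."""
        for deliver in self.subscribers:
            deliver(message)
        return len(self.subscribers)


# 1. Create a topic  2. Create subscriptions  3. Publish to the topic
topic = Topic("orders")
email_inbox: List[str] = []
sms_inbox: List[str] = []
topic.subscribe(email_inbox.append)
topic.subscribe(sms_inbox.append)
delivered = topic.publish("order 42 shipped")
print(delivered, email_inbox, sms_inbox)  # 2 ['order 42 shipped'] ['order 42 shipped']
```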
SNS + SQS: Fan Out
- Push once in SNS, receive in all SQS queues that are subscribers
- Fully decoupled, no data loss
- SQS allows for: data persistence, delayed processing and retries of work
- Ability to add more SQS subscribers over time
- Make sure your SQS queue access policy allows for SNS to write
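For the last point, the queue's access policy has to let the SNS service principal call `sqs:SendMessage`. It is usually scoped to one topic with an `aws:SourceArn` condition. The sketch below builds such a policy as a dictionary; the JSON string would then be set as the queue's `Policy` attribute. Both ARNs and the helper name are hypothetical.

```python
import json
from typing import Any, Dict


def sqs_policy_allowing_sns(queue_arn: str, topic_arn: str) -> Dict[str, Any]:
    """Queue access policy letting one SNS topic send messages to this queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only messages coming from this topic are allowed in.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }


# Hypothetical ARNs for one fan-out subscriber queue and the topic.
policy = sqs_policy_allowing_sns(
    "arn:aws:sqs:us-east-1:123456789012:orders-fraud",
    "arn:aws:sns:us-east-1:123456789012:orders",
)
print(json.dumps(policy, indent=2))
```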
SNS – Message Filtering
- JSON policy used to filter messages sent to SNS topic’s subscriptions
- If a subscription doesn’t have a filter policy, it receives every message
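The filtering rule can be sketched as a matcher over message attributes. This is a simplified model that handles exact string values only; real SNS filter policies support more operators, which are not shown here. Keys in the policy are ANDed and the values listed for one key are ORed. The function name and sample attributes are hypothetical.

```python
from typing import Dict, List, Optional


def matches(filter_policy: Optional[Dict[str, List[str]]], attributes: Dict[str, str]) -> bool:
    """Simplified SNS filter-policy check supporting exact string values only.

    Every key in the policy must be present in the message attributes (AND),
    and its value must equal one of that key's listed values (OR).
    """
    if filter_policy is None:  # no filter policy: the subscription gets every message
        return True
    return all(attributes.get(key) in allowed for key, allowed in filter_policy.items())


placed_only = {"state": ["Placed"]}
print(matches(placed_only, {"state": "Placed", "region": "eu"}))  # True
print(matches(placed_only, {"state": "Cancelled"}))  # False
print(matches(placed_only, {}))                       # False: attribute missing
print(matches(None, {"state": "Cancelled"}))          # True: no policy
```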
Kinesis Overview
• Makes it easy to collect, process, and analyze streaming data in real-time
- Ingest real-time data such as: Application logs, Metrics, Website clickstreams, IoT telemetry data…
- Kinesis Data Streams: capture, process, and store data streams
- Kinesis Data Firehose: load data streams into AWS data stores
- Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
- Kinesis Video Streams: capture, process, and store video streams
Kinesis Data Streams
- Billing is per shard provisioned, can have as many shards as you want
- Retention between 1 day (default) and 365 days
- Ability to reprocess (replay) data
- Once data is inserted in Kinesis, it can’t be deleted (immutability)
- Data that shares the same partition key goes to the same shard (ordering)
- Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
- Consumers:
- Write your own: Kinesis Client Library (KCL), AWS SDK
- Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics
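The partition-key rule can be sketched by mapping each key to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash-key space. This sketch assumes the space is split into equal ranges across `shard_count` shards. `shard_for` is a hypothetical helper, and the truck records are made up.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash-key range contains this partition key.

    MD5 of the key, read as a 128-bit integer, assuming the hash-key space
    is split into `shard_count` equal ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // 2**128


records: List[Tuple[str, str]] = [
    ("truck-1", "pos-a"), ("truck-2", "pos-b"), ("truck-1", "pos-c"), ("truck-2", "pos-d"),
]
by_shard: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
for key, payload in records:
    by_shard[shard_for(key, 4)].append((key, payload))

# Within each shard, records keep their arrival order, so "pos-a" precedes "pos-c".
for shard, items in sorted(by_shard.items()):
    print(shard, items)
```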
Kinesis Data Firehose
- Fully Managed Service, no administration, automatic scaling, serverless
- Load data into:
- AWS: Redshift / Amazon S3 / Elasticsearch
- 3rd party partner: Splunk / MongoDB / DataDog / NewRelic / …
- Custom: send to any HTTP endpoint
- Pay for data going through Firehose
- Near Real Time
- Non-full batches are delivered after a buffer interval of at least 60 seconds
- Or as soon as the buffer size is reached (configurable, minimum 1 MB)
- Supports many data formats, conversions, transformations, compression
- Supports custom data transformations using AWS Lambda
- Can send failed or all data to a backup S3 bucket
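The near-real-time behavior comes from buffering: a batch is delivered when it reaches the size threshold or becomes old enough, whichever comes first. The toy model below shows that rule. Its thresholds are plain parameters; real limits depend on the destination. Age here is measured from the first record in the current batch, which is a simplifying assumption. `BufferSketch` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BufferSketch:
    """Toy model of Firehose-style buffering: flush on size OR age, whichever comes first.

    Thresholds are plain parameters; real limits depend on the destination.
    Age is measured from the first record in the current batch.
    """

    max_bytes: int
    max_age_seconds: float
    batches: List[List[bytes]] = field(default_factory=list, init=False)
    _buffer: List[bytes] = field(default_factory=list, init=False)
    _size: int = field(default=0, init=False)
    _opened_at: float = field(default=0.0, init=False)

    def put(self, record: bytes, now: float) -> None:
        if not self._buffer:
            self._opened_at = now
        self._buffer.append(record)
        self._size += len(record)
        if self._size >= self.max_bytes:  # full batch: deliver immediately
            self.flush()

    def tick(self, now: float) -> None:
        """Called periodically: deliver a non-full batch once it is old enough."""
        if self._buffer and now - self._opened_at >= self.max_age_seconds:
            self.flush()

    def flush(self) -> None:
        self.batches.append(self._buffer)
        self._buffer = []
        self._size = 0


buf = BufferSketch(max_bytes=10, max_age_seconds=60)
buf.put(b"12345", now=0)
buf.put(b"67890", now=1)   # 10 bytes reached: size-triggered flush
buf.put(b"abc", now=5)
buf.tick(now=30)           # only 25 s old: stays buffered
buf.tick(now=65)           # 60 s old: time-triggered flush of a non-full batch
print(buf.batches)  # [[b'12345', b'67890'], [b'abc']]
```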
Kinesis Data Streams vs Firehose
Kinesis Data Streams
• Streaming service for ingest at scale
• Write custom code (producer / consumer)
• Real-time (~200 ms)
• Manage scaling (shard splitting / merging)
• Data storage for 1 to 365 days
• Supports replay capability
Kinesis Data Firehose
• Load streaming data into S3 / Redshift / ES / 3rd party / custom HTTP
• Fully managed
• Near real-time (buffer time min. 60 sec)
• Automatic scaling
• No data storage
• Doesn’t support replay capability
Kinesis Data Analytics (SQL application)
- Perform real-time analytics on Kinesis Streams using SQL
- Fully managed, no servers to provision
- Automatic scaling
- Real-time analytics
- Pay for actual consumption rate
- Can create streams out of the real-time queries
- Use cases:
- Time-series analytics
- Real-time dashboards
- Real-time metrics
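A typical time-series query counts events per key over fixed, non-overlapping (tumbling) windows. The plain-Python sketch below shows what such a query computes. It does not show Kinesis Data Analytics SQL syntax. The function name and the sample clickstream are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tumbling_counts(events: List[Tuple[int, str]], window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping time windows.

    Each event is (timestamp_seconds, key); its window starts at the timestamp
    rounded down to a multiple of window_seconds.
    """
    counts: Counter = Counter()
    for ts, key in events:
        window_start = ts - ts % window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


clicks = [(0, "/home"), (12, "/home"), (59, "/cart"), (61, "/home")]
print(tumbling_counts(clicks, 60))
# {(0, '/home'): 2, (0, '/cart'): 1, (60, '/home'): 1}
```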
SQS vs SNS vs Kinesis
SQS
• Consumers pull data
• Data is deleted after being consumed
• Can have as many workers (consumers) as we want
• No need to provision throughput
• Ordering guarantees only on FIFO queues
• Individual message delay capability
SNS
• Push data to many subscribers
• Up to 12,500,000 subscribers per topic
• Data is not persisted (lost if not delivered)
• Pub/Sub
• Up to 100,000 topics
• No need to provision throughput
• Integrates with SQS for fan-out architecture pattern
• FIFO capability (SNS FIFO topics deliver to SQS FIFO queues)
Kinesis
• Standard: pull data
• 2 MB/s per shard (shared by all consumers)
• Enhanced fan-out: push data
• 2 MB/s per shard per consumer
• Possibility to replay data
• Meant for real-time big data, analytics and ETL
• Ordering at the shard level
• Data expires after X days
• Must provision throughput