Messaging Flashcards
Why is synchronous communication between two applications problematic?
Synchronous communication between two applications can be problematic when there is a sudden spike in traffic. For instance, a service that usually encodes 10 videos may suddenly need to encode 1,000.
So it is better to decouple the applications so that they can scale independently.
What are the three asynchronous communication models currently supported by AWS?
The three asynchronous communication models currently supported by AWS are:
1. SQS - Queue model
2. SNS - Pub Sub model
3. Kinesis - Real time streaming model
What are SQS queues and what are some of their attributes?
SQS standard queues are the default queue type offered by AWS. Some of their attributes are:
1. Unlimited throughput, unlimited number of messages in queue
2. Default retention period of 4 days, maximum 14 days
3. Latency < 10 ms for publish and receive
4. Maximum message size of 256 KB
5. Can have duplicate messages (at-least-once delivery; duplicates happen occasionally)
6. Can have out of order messages (best effort ordering)
How do we produce and consume messages using an SQS queue? Which APIs are needed?
Multiple producers send messages to the SQS queue using the SendMessage API. Multiple consumers poll for messages from the queue using the ReceiveMessage API. Each consumer then processes the messages it received and deletes them from the queue using the DeleteMessage API. This ensures no other consumer will see these messages.
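The produce/consume cycle above can be sketched as follows. This is a minimal sketch, assuming sqs is a boto3-style SQS client and queue_url points at an existing queue. FakeSqs, produce, and consume_once are hypothetical names, and FakeSqs is only an in-memory stand-in so the example runs without an AWS account.

```python
import uuid


def produce(sqs, queue_url, bodies):
    """Send each body with the SendMessage API."""
    for body in bodies:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)


def consume_once(sqs, queue_url, handler):
    """Poll once with ReceiveMessage, process each message, then DeleteMessage."""
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    processed = 0
    for msg in resp.get("Messages", []):
        handler(msg["Body"])
        # Deleting by receipt handle stops the message from reappearing later.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        processed += 1
    return processed


class FakeSqs:
    """Hypothetical in-memory stand-in that mimics the call shapes used above."""

    def __init__(self):
        self.messages = {}  # receipt handle -> body

    def send_message(self, QueueUrl, MessageBody):
        self.messages[str(uuid.uuid4())] = MessageBody

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1):
        batch = list(self.messages.items())[:MaxNumberOfMessages]
        return {"Messages": [{"ReceiptHandle": h, "Body": b} for h, b in batch]}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.messages.pop(ReceiptHandle, None)


fake = FakeSqs()
produce(fake, "https://example.invalid/queue", ["video-1", "video-2"])
seen = []
count = consume_once(fake, "https://example.invalid/queue", seen.append)
```

With a real client, the same two functions would be called with the client object and the queue URL in place of the fake.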
How can we scale consumers for an SQS queue?
We can use an Auto Scaling group to scale the consumers horizontally and increase throughput. We can set up a CloudWatch alarm that triggers scaling when a condition is met, such as the queue length rising above a threshold.
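One way to reason about the scaling rule is a backlog-per-consumer calculation. The sketch below is only illustrative arithmetic: the function name, the target backlog, and the size limits are assumptions, and in practice the queue length would come from a CloudWatch metric such as ApproximateNumberOfMessagesVisible.

```python
import math


def desired_consumers(queue_length, backlog_per_consumer, min_size=1, max_size=10):
    """Consumers needed so each handles at most `backlog_per_consumer` messages."""
    needed = math.ceil(queue_length / backlog_per_consumer)
    # Clamp to the Auto Scaling group's assumed minimum and maximum size.
    return max(min_size, min(max_size, needed))


normal = desired_consumers(queue_length=50, backlog_per_consumer=100)
spike = desired_consumers(queue_length=1000, backlog_per_consumer=100)
capped = desired_consumers(queue_length=5000, backlog_per_consumer=100)
```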
Why do we need an SQS queue access policy? Provide a use case.
We use an SQS queue access policy to provide cross-account access. For example, suppose we have an SQS queue in one account and we need to consume its messages from an EC2 instance running in another account.
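A cross-account policy of this kind is a JSON document attached to the queue. The sketch below builds one under stated assumptions: the account IDs, region, and queue name are made up, and the helper name cross_account_policy is hypothetical. The resulting string would be set as the queue's Policy attribute.

```python
import json


def cross_account_policy(queue_arn, consumer_account_id):
    """Allow another account to receive and delete messages on this queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{consumer_account_id}:root"},
            "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
            "Resource": queue_arn,
        }],
    })


# Placeholder ARN and account ID, for illustration only.
policy = cross_account_policy(
    "arn:aws:sqs:us-east-1:111111111111:orders", "222222222222"
)
```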
What is the SQS message visibility timeout?
When a consumer polls a message, that message becomes invisible to all other consumers for a specified amount of time. The default is 30 seconds, and it can be increased (up to 12 hours). During this time the consumer must delete the message from the queue using the DeleteMessage API, or else it becomes visible again to other consumers. A consumer that needs more time can extend the timeout with the ChangeMessageVisibility API.
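The timing behaviour can be illustrated with a toy in-memory model. This is not the AWS SDK: ToyQueue and its methods are hypothetical, and time is passed in explicitly as a number of seconds so the example is deterministic.

```python
class ToyQueue:
    """Toy model of a queue with a visibility timeout (not real SQS)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.visible_at = {}  # message id -> time it becomes visible again

    def send(self, msg_id):
        self.visible_at[msg_id] = 0

    def receive(self, now):
        """Return the first visible message id and hide it, or None."""
        for msg_id, visible_at in self.visible_at.items():
            if now >= visible_at:
                self.visible_at[msg_id] = now + self.visibility_timeout
                return msg_id
        return None

    def delete(self, msg_id):
        self.visible_at.pop(msg_id, None)


q = ToyQueue(visibility_timeout=30)
q.send("m1")
first = q.receive(now=0)    # consumer A receives m1
hidden = q.receive(now=10)  # consumer B sees nothing: m1 is in flight
again = q.receive(now=30)   # A never deleted m1, so it is visible again
q.delete(again)
gone = q.receive(now=100)   # deleted messages never come back
```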
What are SQS Dead Letter Queues and why are they used?
We can configure an SQS queue to move a message to a dead letter queue (DLQ) once consumers have received it more than a set number of times (the maxReceiveCount threshold) without successfully processing and deleting it. The DLQ sets these failed messages aside so that we can debug them.
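The link between a source queue and its dead letter queue is expressed through the RedrivePolicy queue attribute, whose value is a JSON string. Below is a minimal sketch; the helper name, the ARN, and the threshold of 5 are assumptions chosen for illustration.

```python
import json


def dlq_attributes(dlq_arn, max_receive_count=5):
    """Build the queue attributes that route failed messages to a DLQ."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            # Receives allowed before the message is moved to the DLQ.
            "maxReceiveCount": str(max_receive_count),
        })
    }


# Placeholder ARN, for illustration only.
attrs = dlq_attributes("arn:aws:sqs:us-east-1:111111111111:orders-dlq", 3)
```

A dictionary like this would be passed as the attributes of the source queue when creating it or updating its attributes.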
What are SQS delay queues?
We can configure an SQS queue to delay delivery, so that new messages stay invisible to consumers for a set period after they are sent. The default delay is 0 seconds and the maximum is 15 minutes. The queue-level delay can be overridden per message with the DelaySeconds parameter.
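As a small illustration, the per-message delay is just one extra parameter on the send call. The helper below is hypothetical; it only builds and validates the keyword arguments that a SendMessage call would take, without contacting AWS.

```python
MAX_DELAY_SECONDS = 900  # 15 minutes, the maximum delay


def delayed_send_kwargs(queue_url, body, delay_seconds):
    """Build SendMessage arguments with a per-message delay."""
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("DelaySeconds must be between 0 and 900")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "DelaySeconds": delay_seconds,
    }


kwargs = delayed_send_kwargs("https://example.invalid/queue", "hello", 60)
```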
What is long polling?
When a consumer requests messages from a queue, it can optionally wait for messages to arrive if none are available at the moment. Long polling is preferred over short polling because it reduces the number of empty responses and API calls to SQS. The wait time can be between 1 and 20 seconds (preferably 20 seconds).
Long polling can be enabled at the queue level with the ReceiveMessageWaitTimeSeconds attribute, or per request with the WaitTimeSeconds parameter of the ReceiveMessage API.
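The two ways of enabling long polling can be sketched as plain dictionaries. This assumes boto3-style naming, where the queue attribute is ReceiveMessageWaitTimeSeconds and the per-call parameter on ReceiveMessage is WaitTimeSeconds; the helper names are hypothetical and nothing here calls AWS.

```python
def _check_wait(wait_seconds):
    """Long polling wait times range from 1 to 20 seconds."""
    if not 1 <= wait_seconds <= 20:
        raise ValueError("wait time must be between 1 and 20 seconds")


def queue_level_long_polling(wait_seconds=20):
    """Queue attributes that make every receive on the queue a long poll."""
    _check_wait(wait_seconds)
    return {"ReceiveMessageWaitTimeSeconds": str(wait_seconds)}


def call_level_long_polling(queue_url, wait_seconds=20):
    """ReceiveMessage arguments that long poll for this single call."""
    _check_wait(wait_seconds)
    return {
        "QueueUrl": queue_url,
        "MaxNumberOfMessages": 10,
        "WaitTimeSeconds": wait_seconds,
    }


queue_attrs = queue_level_long_polling()
call_kwargs = call_level_long_polling("https://example.invalid/queue", 10)
```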
What is the SQS Extended Client?
It is a Java library for sending messages that are larger than 256 KB. The message payload is stored in an S3 bucket, and only a reference to it travels through the queue.
What are some of the key APIs used with an SQS queue?
- CreateQueue
- DeleteQueue
- PurgeQueue
- SendMessage
- ReceiveMessage
- SetQueueAttributes (for example, to set the ReceiveMessageWaitTimeSeconds attribute for long polling)
- DeleteMessage
- ChangeMessageVisibility
What is SQS FIFO deduplication?
If a duplicate message arrives on an SQS FIFO queue within the deduplication interval (5 minutes), it is not delivered again. SQS detects duplicates either by computing a SHA-256 hash of the message body (content-based deduplication) or through a MessageDeduplicationId supplied with the message.
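The deduplication rule can be mimicked with a toy in-memory model. This is a sketch of the idea, not how SQS is implemented: the class name is hypothetical, time is passed in as seconds, and the body hash stands in for content-based deduplication.

```python
import hashlib

DEDUP_WINDOW_SECONDS = 300  # the 5-minute deduplication interval


class ToyFifoDeduplicator:
    """Toy model: reject messages whose dedup key was seen in the window."""

    def __init__(self):
        self.first_seen = {}  # dedup key -> time the message was accepted

    def accept(self, body, now, dedup_id=None):
        # Use the explicit ID if given, otherwise a hash of the body.
        key = dedup_id or hashlib.sha256(body.encode("utf-8")).hexdigest()
        seen_at = self.first_seen.get(key)
        if seen_at is not None and now - seen_at < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window
        self.first_seen[key] = now
        return True


dedup = ToyFifoDeduplicator()
a1 = dedup.accept("order-42", now=0)                   # accepted
a2 = dedup.accept("order-42", now=100)                 # same body, rejected
a3 = dedup.accept("order-42", now=301)                 # window expired, accepted
b1 = dedup.accept("payload-x", now=0, dedup_id="k1")   # accepted
b2 = dedup.accept("payload-y", now=50, dedup_id="k1")  # same ID, rejected
```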
How can we ensure that all the messages are consumed by the same consumer and in order?
On a FIFO queue, we can achieve this by giving the messages the same MessageGroupId. Messages that share a group ID are delivered in order, and while a message from a group is in flight, no other consumer receives messages from that group.
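To illustrate, the sketch below tags messages with a MessageGroupId and shows that each group keeps its own send order. The helper names and sample data are hypothetical; fifo_send_kwargs only builds the arguments a SendMessage call to a FIFO queue would take.

```python
from collections import defaultdict


def fifo_send_kwargs(queue_url, body, group_id, dedup_id):
    """Build SendMessage arguments for a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,
        "MessageDeduplicationId": dedup_id,
    }


def split_by_group(messages):
    """Bodies per MessageGroupId, in the order they were sent."""
    groups = defaultdict(list)
    for msg in messages:
        groups[msg["MessageGroupId"]].append(msg["MessageBody"])
    return dict(groups)


url = "https://example.invalid/orders.fifo"
sent = [
    fifo_send_kwargs(url, "create", "user-1", "d1"),
    fifo_send_kwargs(url, "create", "user-2", "d2"),
    fifo_send_kwargs(url, "update", "user-1", "d3"),
    fifo_send_kwargs(url, "delete", "user-1", "d4"),
]
groups = split_by_group(sent)
```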
What is Amazon SNS and how does it work?
Amazon SNS implements the pub/sub model in AWS. The event producer sends each message to a single SNS topic, and every subscriber to that topic receives it. A topic can have up to 12,500,000 subscribers, and an account can have up to 100,000 topics.
What is the fan-out pattern and why is it used?
The fan-out pattern is when we publish a message once to an SNS topic and multiple SQS queues subscribe to that topic. Each queue receives its own copy of every message, so the consuming services are decoupled and no data is lost if one of them fails.
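The pattern can be pictured with a toy in-memory model in which plain Python lists stand in for the SQS queues. ToyTopic is a hypothetical class for illustration only and is not the SNS API.

```python
class ToyTopic:
    """Toy model of a topic that copies each message to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        # One publish call, one independent copy per subscribed queue.
        for queue in self.subscribers:
            queue.append(message)


fraud_queue = []
shipping_queue = []
topic = ToyTopic()
topic.subscribe(fraud_queue)
topic.subscribe(shipping_queue)
topic.publish("order-created:42")

# The fraud service consumes its copy; the shipping copy is unaffected.
fraud_queue.pop(0)
```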
What is Kinesis Data Streams?
Kinesis Data Streams is a way to stream big data into our systems. A stream is made up of multiple shards, which in provisioned mode must be provisioned ahead of time. The data is split across all the shards.
How can we ensure ordering of the records in Kinesis Data Streams?
Each record is assigned a partition key, and all records with the same partition key go to the same shard, so they are kept in order within that shard.
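The mapping from partition key to shard can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and picks the shard whose hash key range contains it. The sketch assumes the shards split that range evenly, which is a simplification, and shard_for is a hypothetical helper.

```python
import hashlib


def shard_for(partition_key, shard_count):
    """Index of the shard for a key, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit integer
    return hash_value * shard_count // 2**128


# Records with the same key always land on the same shard.
first = shard_for("truck-17", shard_count=4)
second = shard_for("truck-17", shard_count=4)
```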
What are attributes of Kinesis Data Streams ?
- The records have a retention period from 1 to 365 days
- Ability to reprocess (replay) data
- Once records are inserted in Kinesis, they cannot be deleted (records are immutable)
What are Producers and Consumers of Kinesis Data Streams ?
Producers - AWS SDK, Kinesis Producer Library, Kinesis Agent
Consumers - AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics
What are two Capacity Modes of Kinesis Data Streams ?
Provisioned Mode
1. You choose the number of shards provisioned
2. Each shard gets 1 MB/s in
3. Each shard gets 2 MB/s out
4. Pay per provisioned shard per hour
On-Demand Mode
1. Default capacity provisioned 4 MB/s in or 4000 records per second
2. Scales automatically based on observed throughput peak during the last 30 days
3. Pay per stream per hour and data in/out per GB
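For provisioned mode, the per-shard limits above translate into a simple sizing calculation. The sketch below is illustrative only. Besides the 1 MB/s in and 2 MB/s out figures, it assumes a write limit of 1,000 records per second per shard, and shards_needed is a hypothetical helper.

```python
import math

WRITE_MB_PER_SHARD = 1.0         # 1 MB/s in per shard
WRITE_RECORDS_PER_SHARD = 1000   # assumed records/s in per shard
READ_MB_PER_SHARD = 2.0          # 2 MB/s out per shard


def shards_needed(write_mb_per_s, write_records_per_s, read_mb_per_s):
    """Smallest shard count that satisfies all three throughput limits."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


bandwidth_bound = shards_needed(5, 3000, 8)    # write bandwidth dominates
record_bound = shards_needed(0.5, 2500, 1)     # record rate dominates
```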
How do we maintain security for Kinesis Data Streams ?
- Control access/authorization using IAM policies
- Encryption in flight using HTTPS endpoints
- Encryption at rest using KMS
- Can implement encryption/decryption of data at client side as well
What is Amazon Data Firehose?
Amazon Data Firehose is a fully managed service that takes data from producers and then writes it in batches to multiple destinations.
What are Producers and Consumers for Amazon Data Firehose ?
Producers
1. Applications, Desktop/Mobile Apps, SDK, Kinesis Agents
2. Kinesis Data Streams
3. Amazon CloudWatch Logs and Events
4. AWS IoT
Consumers
1. Third-party partner destinations such as Splunk and MongoDB
2. AWS destinations such as S3, Redshift, and OpenSearch
3. HTTP endpoints
What are the main differences between SQS, SNS & Kinesis ?
SQS
1. Consumers pull data and delete it after consumption.
2. There could be any number of consumers
3. Ordering guarantees only on FIFO queue
4. Possible to delay delivery of messages
SNS
1. Publishers publish data to a topic and many subscribers (up to 12,500,000) receive it
2. Pub/sub model with up to 100,000 topics per account
3. Ordering by pairing an SNS FIFO topic with SQS FIFO queues through the fan-out pattern
Kinesis
1. Meant for real time big data, analytics and ETL
2. Standard consumers pull data at 2 MB/s per shard, shared across all consumers. Enhanced fan-out consumers have data pushed to them at 2 MB/s per shard per consumer
3. Possibility to replay data