SNS, SQS, Kinesis and Integration patterns Flashcards by Scott Stevens

What can consume from an SQS queue (3)? When consuming from a queue, how many messages can be pulled at any one time?

Lambda functions, EC2 instances and on-premises machines can all consume from SQS, and each can pull up to 10 messages per receive call.

In SNS, how many topics can a producer send a message to? How many subscribers can a topic have?

A single publish call targets exactly one SNS topic. A standard topic supports more than 10 million subscriptions (10,000,000 historically; the documented quota is now 12,500,000).

What is the maximum data buffer size in Kinesis Firehose? (hint: it's a magic number) What is the maximum time that data can be buffered?

128 MB and 900 seconds (15 minutes).

For Kinesis, how many KCL instances can you have per shard? What order are records read in and where can KCL be deployed?

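Each shard is read by exactly one KCL instance at a time, so you can have at most one KCL instance per shard (a single instance can read several shards). Records are read in order at the shard level. KCL can be deployed on EC2 instances, on premises or on Elastic Beanstalk.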

We have a queue with a visibility timeout set to 10 seconds. What will happen to a message on that queue if the consumer takes more than 10 seconds to successfully process it? Can you do anything to prevent this behaviour without increasing the visibility timeout?

If the message has not been processed and deleted within the 10-second window, it becomes visible again and another consumer can pick it up and process it. This means that a message can be processed more than once. The ChangeMessageVisibility API call lets your consumer extend the visibility timeout for that specific message if processing takes longer than the time allocated.
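
As a minimal sketch of that call, the helper below extends the timeout for one in-flight message. It assumes `sqs_client` is a boto3 SQS client created elsewhere; the function name, queue URL and receipt handle are illustrative, not part of the original card.

```python
MAX_VISIBILITY_TIMEOUT_SECONDS = 12 * 60 * 60  # SQS maximum: 12 hours


def extend_visibility(sqs_client, queue_url, receipt_handle, new_timeout_seconds):
    """Give a slow consumer more time to finish one in-flight message.

    sqs_client is assumed to be a boto3 SQS client (boto3.client("sqs")).
    The new timeout counts from the moment this call is made.
    """
    if not 0 <= new_timeout_seconds <= MAX_VISIBILITY_TIMEOUT_SECONDS:
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    sqs_client.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=new_timeout_seconds,
    )
```

A consumer would typically call this shortly before the current timeout expires, using the receipt handle returned when it received the message.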

What is the purpose of an SQS access policy?

These are similar to S3 bucket policies. An SQS access policy allows cross-account access to a queue and defines which other AWS services can write to it.

If I have a kinesis stream set up with one shard and one producer - how many consumers can I have?

You can have as many consumers as you like, BUT between them they must stay within the shard's read limit of 2 MB/sec and 5 read transactions per second, because that limit is shared by all consumers of the shard.

If I needed to load data into Redshift, S3, ElasticSearch or Splunk from Kinesis analytics, what would I use? Is it real time, if not what is the latency?

Kinesis Firehose. Firehose is near real time: it buffers data before delivery, so there is a latency of roughly 60 seconds at minimum.

For SNS, what mechanisms are used for encryption inflight, and at rest (3)? What mechanism is used to regulate access control to SNS? What would allow cross account access to SNS or to define which AWS services can write to an SNS topic (hint policies)?

Inflight via HTTPS API
At rest using KMS keys
Or you can encrypt client side

Access control is regulated by IAM policies

SNS access policies define which services can write to SNS and allow for cross account access.

How is Kinesis Firehose billed? Do you need to provision capacity?

Billing is based on the amount of data going through Firehose (and for data conversion). You don’t need to provision capacity as Firehose scales automatically.

I have a situation where I need to trigger jobs based on an S3 object create event type where the object is prefixed with images/. These events will need to be sent to multiple destinations consisting of SQS and Lambda. What pattern would you use and WHY is it appropriate in this instance?

S3 allows only one notification rule for a given event type and prefix combination, so the images/ object create event cannot be sent directly to multiple destinations. In this case we would use SNS fanout: send the notification to an SNS topic, and subscribe the SQS queues and Lambda functions to that topic.

I have a situation where I have multiple messages coming into a FIFO queue for different customers. I need to ensure that each set of messages for each customer is processed IN ORDER, for instance customer A has 4 messages which must be processed in order, and customer B has 2 messages which must be processed in order. How can I do this in SQS and will the messages for customer A be processed before customer B and is there a difference in ordering if I have one SQS consumer versus multiple ?

SQS FIFO has a MessageGroupId setting. We can specify a different value per customer (e.g. GroupA, GroupB), which ensures that messages within each group are processed in order.
Ordering across groups is not guaranteed so customer B might start processing before customer A.

If you only have one consumer then the messages are processed in standard FIFO order. If you have multiple consumers then individual groups are assigned to each.
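
A rough sketch of per-customer grouping follows. It assumes `sqs_client` is a boto3 SQS client and that the queue is a FIFO queue; the function name and the deduplication ID scheme (customer ID plus a sequence number) are hypothetical choices made for illustration.

```python
def send_in_order(sqs_client, queue_url, customer_id, bodies):
    """Send one customer's messages so that SQS FIFO keeps them in order.

    sqs_client is assumed to be a boto3 SQS client for a FIFO queue.
    All messages share the customer's MessageGroupId, so they are
    delivered in the order sent; other customers use other group IDs.
    """
    for sequence, body in enumerate(bodies):
        sqs_client.send_message(
            QueueUrl=queue_url,
            MessageBody=body,
            MessageGroupId=customer_id,
            # Hypothetical scheme: any ID unique per message would do.
            MessageDeduplicationId=f"{customer_id}-{sequence}",
        )
```

Calling it once for customer A's four messages and once for customer B's two messages keeps each customer's sequence intact without ordering A against B.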

How many AZs is Kinesis replicated to?

Three. Kinesis Data Streams synchronously replicates data across three Availability Zones.

We have a system using an SQS queue to integrate with several EC2 consumer instances. We notice that several messages fail processing and are constantly returned to the queue, where they are picked up by other consumers and fail again, and the loop repeats. What would you do to ensure that a message is only sent back to the queue 3 times?

You would set up a dead letter queue and define a maximum receives threshold (maxReceiveCount) of 3. After 3 attempts, the message will be sent to the dead letter queue.
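
The dead letter queue is attached to the source queue through its RedrivePolicy attribute, which is a JSON string. A minimal sketch of building that string is below; the helper name and the ARN are placeholders.

```python
import json


def build_redrive_policy(dlq_arn, max_receive_count=3):
    """Return the JSON string expected by the RedrivePolicy queue attribute."""
    return json.dumps(
        {
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        }
    )


# Placeholder ARN, for illustration only.
policy = build_redrive_policy("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
print(policy)
```

With boto3 this string would be passed as `Attributes={"RedrivePolicy": policy}` to `set_queue_attributes` on the source queue.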

In terms of Kinesis, what are the write and read rates per shard, and how are records ordered? Do you need to provision capacity or does Kinesis scale automatically?
How long is data retained in a shard (default and max in days)?

Writes = 1 MB/sec or 1,000 records/sec per shard.
Reads = 2 MB/sec per shard.
Records are ordered within a shard, and in provisioned mode you need to provision capacity (the number of shards) yourself.
Data is retained for 1 day by default. The classic maximum is 7 days; AWS has since added long-term retention of up to 365 days.

I have an application set up with a front end web tier and some back end applications responsible for generating shipping orders. These are decoupled using SQS. When there is a surge in load, I notice more messages in the queue waiting to be processed. Currently my back end only has 2 EC2 instances. What can I do to elastically scale when I have surges in orders being received?

You would ideally use an Auto Scaling group for the back end. Use a CloudWatch alarm on the queue length, the ApproximateNumberOfMessagesVisible metric (exposed on the queue itself as the ApproximateNumberOfMessages attribute), and trigger a scale-out when it breaches the threshold.

How many shards would I need to support 5 MB/sec writes and 6 MB/sec reads?

5 shards. Writes need 5 shards (1 MB/sec write per shard) and reads need 3 (2 MB/sec read per shard), so you provision the larger of the two.
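
The arithmetic can be written down as a small helper. This is only a sketch of the per-shard limits quoted on these cards (1 MB/sec in, 2 MB/sec out); the function name is made up.

```python
import math

WRITE_MB_PER_SEC_PER_SHARD = 1
READ_MB_PER_SEC_PER_SHARD = 2


def shards_needed(write_mb_per_sec, read_mb_per_sec):
    """Return the shard count that satisfies both the write and read rates."""
    for_writes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    for_reads = math.ceil(read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD)
    # A stream always has at least one shard.
    return max(for_writes, for_reads, 1)


print(shards_needed(5, 6))  # 5: writes are the bottleneck
print(shards_needed(7, 6))  # 7
```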

If I needed to transform click stream data before delivering it to my consuming application, would I use Kinesis Streams or Kinesis Firehose? What underlying technology supports this?

Kinesis Firehose allows you to transform data in the stream. It invokes a Lambda function synchronously against each buffered batch of stream data (up to 3 MB).

What are the impacts if the visibility timeout is set at a high value (hours) or a very low value (a few seconds)?

If the timeout is set very high, messages take a very long time to reappear: for instance, if a consumer crashes, the other consumers won't pick up its message until the timeout expires. If the value is set too low, messages reappear while they are still being processed, so we will likely get a lot of duplicate processing.

What are the min, max and default values for the VISIBILITY timeout in SQS?

0 sec min
12 hrs max
30 sec default

Assume we have a courier business with 100 drivers nationwide. Each van has its own unique ID and uses this to stream in GPS data. For an SQS FIFO queue, how many group IDs could you create and how many consumers could you have?

We would have 1 FIFO queue with 100 group IDs and up to 100 consumers.

For Kinesis, what is used to control access and authorization? How is encryption in transit and at rest handled?

Access and authorization is handled by IAM. Encryption at rest is handled by KMS and encryption in flight is via HTTPS.

I have an order system set up which needs to send an order to multiple destinations, such as order processing, invoicing and the customer loyalty system. More applications might need to be added over time, or some might need to be decommissioned. When designing the app, we need to make sure that it is decoupled from the receiving systems, so SQS seems like a good option. Rather than having my application implement multiple connections to different queues, what would the optimal design pattern be? What access policy do I need for my SQS queues and what SQS queue type should I use?

This is a case of where SNS fanout is ideal. We would have our application publish the order to an SNS topic and then have however many SQS queues we need as subscribers to that topic. This means we are fully decoupled and we can add or remove queues (and applications) as needed. We will need to ensure that the access policy for the SQS queue allows writes from SNS.
With a standard SNS topic, the subscribed queues must be SQS standard queues. Fanning out to SQS FIFO queues requires an SNS FIFO topic.
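
For the access policy, a sketch of the usual shape is shown below as a Python dict: it lets the SNS service send messages to the queue, restricted to one topic. The helper name and both ARNs are placeholders.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Build an SQS access policy that lets one SNS topic write to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only accept messages that come from this specific topic.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


# Placeholder ARNs, for illustration only.
policy = sns_to_sqs_policy(
    "arn:aws:sqs:us-east-1:123456789012:invoicing",
    "arn:aws:sns:us-east-1:123456789012:orders",
)
print(json.dumps(policy, indent=2))
```

Each subscribed queue would carry its own copy of this policy, serialised to JSON and set as the queue's Policy attribute.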

In what encoding is record data returned from Kinesis?

Base64
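
A small illustration of what that means for a consumer reading the raw API response: the record payload has to be Base64-decoded before use (SDKs typically do this decoding for you). The payload below is made up.

```python
import base64

# Made-up payload standing in for the Data field of a record.
original = b'{"van_id": "van-42", "lat": 51.5, "lon": -0.12}'

# What the raw API response would carry: a Base64 text string.
encoded = base64.b64encode(original).decode("ascii")

# What the consumer does to get the original bytes back.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded.decode("utf-8"))
```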

What 3 design patterns can be used to decouple an application and which AWS services do they use?

Queue: SQS
Pub/Sub: SNS
Streaming: Kinesis

You have a Kinesis data stream with several shards. You are storing user data from your website and have used country geo-location data as your partition key. One of the shards is frequently throwing a provisioned throughput exceeded exception. Why? What's the solution?

We need to look at the partition key. In this instance, using country as a key may not be the best idea, as we could have much more data coming from one country than from another, which overloads that country's shard. We would need to look at making the key more granular, implementing exponential backoff on retries and possibly increasing the number of shards.

If you needed the ability to replay a day's worth of data through your application, would you use SQS, SNS or Kinesis?

Kinesis allows you to re-run data.

Is Kinesis Firehose real time or near real time? Why?

Firehose is near real time (roughly 1 to 15 minutes of latency). The reason is that Firehose buffers data before writing it to the destination. The more data that is buffered, the higher the latency.
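
The buffering rule can be modelled as "flush when either limit is hit, whichever comes first". The sketch below is a toy model of that rule, not Firehose's implementation; the function name and the default values (5 MB, 300 seconds) are assumptions chosen for illustration.

```python
def should_flush(buffered_bytes, seconds_since_last_flush,
                 buffer_size_mb=5, buffer_interval_seconds=300):
    """Return True once the size limit or the time limit is reached."""
    size_limit_reached = buffered_bytes >= buffer_size_mb * 1024 * 1024
    interval_elapsed = seconds_since_last_flush >= buffer_interval_seconds
    return size_limit_reached or interval_elapsed


# A quiet stream waits out the interval; a busy stream flushes on size.
print(should_flush(buffered_bytes=200_000, seconds_since_last_flush=30))
print(should_flush(buffered_bytes=6 * 1024 * 1024, seconds_since_last_flush=30))
```

Larger limits mean fewer, bigger deliveries at the cost of higher latency, which is the trade-off the card describes.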

I have an application with a high arrival rate of messages inbound to an SQS queue. This works fine, but I would like to try and save some costs by minimising the amount of calls to the SQS API. What can I do to reduce the number of calls to the API, and which API calls could I apply the solution to?

You can batch up to 10 messages per call using the batch variants of the SendMessage, DeleteMessage and ChangeMessageVisibility APIs (SendMessageBatch, DeleteMessageBatch and ChangeMessageVisibilityBatch), which reduces the number of calls made to the SQS API.
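
Since a batch call carries at most 10 entries, a producer has to split its backlog into groups of 10. A minimal sketch of that splitting follows; the helper names are made up, and the entry shape (an Id plus a MessageBody) follows what SendMessageBatch expects.

```python
MAX_BATCH_SIZE = 10  # SQS batch calls accept at most 10 entries


def chunk(items, size=MAX_BATCH_SIZE):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def to_batch_entries(bodies):
    """Build entries for one batch; Id only has to be unique within the batch."""
    return [{"Id": str(index), "MessageBody": body} for index, body in enumerate(bodies)]


bodies = [f"order-{n}" for n in range(25)]
batches = [to_batch_entries(group) for group in chunk(bodies)]
print(len(batches))               # 3 API calls instead of 25
print([len(b) for b in batches])  # [10, 10, 5]
```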

We have a VPC with a private subnet containing several EC2 instances which need access to Kinesis. We could route traffic to Kinesis out via a NAT to an internet gateway, but we don't want data going out over the public web. What is the solution?

Use a Kinesis VPC endpoint which will keep traffic within the private AWS network

What would you recommend as a highly available and fault-tolerant solution to capture the click-stream events from the source and then provide a simultaneous feed of the data stream to the consumer applications? Kinesis Streams or Firehose, and why?

The key to the answer is that we are providing a simultaneous feed from the stream to several consuming applications. That points to Kinesis Streams: it enables real-time processing of streaming big data, it retains records after they are read, and multiple consumers can read the same stream concurrently. Firehose is the easiest way to load streaming data into data stores and analytics tools, but it delivers to destinations rather than feeding consumer applications.

What are the default, minimum and maximum retention periods for a message in SQS? What is the maximum message size?

4 days default, 1 minute minimum, 14 days maximum. The maximum message size is 256 KB.

You have a Kinesis stream which usually receives 4 MB/sec in and emits 6 MB/sec out. There is an increase in data requiring an inbound rate of 7 MB/sec. You currently have 4 shards. What do you do?

A shard can ingest 1MB/Sec. We have 4 shards handling our inbound so we will need to provision another 3 shards bringing the total up to 7.

Let's say we have an architecture with 10 front end instances and 10 back end instances, with integration between them handled by an SQS queue. Both the front end and back end are in Auto Scaling groups, with the back end triggered to scale based on queue length in SQS. If the back end scales to 15 instances, does the front end need to scale too?

No. The architecture is decoupled via SQS, so there is no dependency between the front end and back end that requires both to scale at the same time. They can scale independently.

In Kinesis streams there are THREE things we need to specify for a PutRecord call. What are they? (hint: think in terms of how a stream works)

1. The stream name
2. The partition key
3. The data blob that you want to put into Kinesis
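
Those three inputs map directly onto the call's parameters. The sketch below assumes `kinesis_client` is a boto3 Kinesis client created elsewhere; the function name, stream name and GPS payload are illustrative.

```python
import json


def put_gps_reading(kinesis_client, stream_name, van_id, reading):
    """Write one record using the three required PutRecord inputs.

    kinesis_client is assumed to be a boto3 Kinesis client.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,                    # 1. the stream name
        PartitionKey=van_id,                       # 2. the partition key
        Data=json.dumps(reading).encode("utf-8"),  # 3. the data blob
    )
```

Using the van ID as the partition key keeps every reading from one van on the same shard, and therefore in order.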

What are the six endpoints supported by SNS? (hint - 4 of these cover the same 2 general types)

HTTP, HTTPS, Email, Email-JSON, SQS, Lambda

Does changing the visibility timeout in SQS mitigate the delivery of a message multiple times?

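No, not fully. A sensible timeout reduces reprocessing, but standard queues provide at-least-once delivery, so a message can still be delivered more than once. Consumers should be idempotent.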

For an SQS standard queue, do we need to provision throughput and is throughput limited? Are there any limits on the number of consumers for the queue and is there any data persistence in either queue type?

No to all. SQS is not intended for data persistence. It will RETAIN messages for 4 to 14 days, but they will be removed after that period expires. The expectation is that the consuming application will delete the message from the queue.

What tool would you use to perform real time data analytics on a Kinesis stream and what language does this tool use? How is the tool billed?

Kinesis data analytics uses SQL to perform real time analytics on Kinesis streams. Billing is based on consumption rate.

What is the difference between the DelaySeconds and the VisibilityTimeout parameters? Why would you use them?

Delay seconds delays the visibility of the message when it ARRIVES on the Queue. Visibility time out hides the message on the Queue at the time of CONSUMPTION. Delay seconds creates a delay queue which is useful if we have a dependency on something with eventual consistency. Visibility timeout prevents multiple consumers from processing the same message.

How is In flight and At rest encryption provided in SQS (3)?

In flight via HTTPS, at rest using KMS. You could also encrypt your data client side before sending it.

In Kinesis Streams can the same message be processed by more than one consumer?

Yes. This is because records are not deleted after they have been consumed unlike SQS.

Should the retention period for a DLQ be the same as your other SQS queues? Why, Why not?

No. The DLQ retention period should be set higher than that of your other SQS queues, generally to the maximum of 14 days, so that there is time to inspect and reprocess failed messages.

Assume we have a courier business with 100 drivers nationwide. Each van has its own unique ID and uses this to stream in GPS data. For Kinesis streams, assume you have 5 shards. How many consumers can we have and on average how many van IDs would you have per shard?

On average we would have 20 vans per shard, with up to five consumers (one per shard).

What is the key difference between SQS and SQS FIFO? What impacts does this have on throughput? How must FIFO queues be named?

Standard SQS queues offer best-effort ordering and at-least-once delivery, so messages may be delivered more than once; throughput is nearly unlimited. SQS FIFO queues are strictly ordered and process each message exactly once, at the cost of throughput: up to 300 messages per second without batching, or up to 3,000 messages per second with batching. FIFO queue names must end with the .fifo suffix.

I have an application that allows a user to upload an image, generate a thumbnail from it and build up an album. The service is aimed at professional photographers and images are usually 50 MB+ in size. The application needs to be decoupled and SQS is the obvious choice; however, I am limited to the 256 KB message size. What can I do to leverage SQS while still allowing for potentially very large message sizes?

The SQS Extended client is intended to allow processing of large SQS payloads. For extended client, the SQS message contains metadata which points to the large payload on S3. The consumer then uses this metadata to consume the data from S3. This has the advantage of being able to use full SQS functionality but also to process very large payloads.

Can PutRecords send records to multiple different partition keys in Kinesis?

Yes, A PutRecords request can include records with different partition keys. The scope of the request is a stream; each request may include any combination of partition keys and records up to the request limits

There are two ways to PUBLISH using SNS each has a different SDK. What are their names and what do they apply to? ___Publish

Topic Publish using the AWS SDK | Direct Publish using the mobile apps SDK

I have a system that requires SQS messages to be delivered only once and in strict order. Which queue type should I use and how does that queue type ensure that a message isn't placed on the queue more than once (2 methods) and over what time frame do these checks occur?

FIFO will ensure strict ordering and exactly-once processing. FIFO deduplication is based on either a SHA-256 hash of the message body (content-based deduplication) or an explicitly set message deduplication ID. These are checked over a 5-minute window: a duplicate that arrives within 5 minutes of the original is accepted by the API but not delivered to consumers.
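
The two methods and the 5-minute window can be illustrated with a toy model. This is a simplified sketch of the behaviour described on the card, not how SQS is implemented; the class and method names are made up, and time is passed in as plain seconds to keep the example deterministic.

```python
import hashlib

DEDUP_WINDOW_SECONDS = 5 * 60


class FifoDeduplicator:
    """Toy model of FIFO deduplication over a 5-minute window."""

    def __init__(self):
        self._first_seen = {}  # deduplication key -> time it was first accepted

    def should_deliver(self, body, now, dedup_id=None):
        # Method 1: explicit deduplication ID. Method 2: SHA-256 of the body.
        if dedup_id is not None:
            key = dedup_id
        else:
            key = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self._first_seen.get(key)
        if first_seen is not None and now - first_seen < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window: not delivered
        self._first_seen[key] = now
        return True


queue = FifoDeduplicator()
print(queue.should_deliver("order-1", now=0))    # True: first time seen
print(queue.should_deliver("order-1", now=100))  # False: same body within 5 minutes
print(queue.should_deliver("order-1", now=301))  # True: window has expired
```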

How does the Kinesis Client Library manage work between consumers?

The KCL uses DynamoDB to checkpoint the progress of workers against Kinesis. It then uses this data to track the workers and share the Kinesis shards amongst them.

For Kinesis Streams - what does the Partition Key in a putRecord request do?

The partition key determines the shard a record is written to: Kinesis hashes the key (MD5) and maps the resulting hash to a shard.
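
A rough sketch of that mapping is below. Kinesis hashes the partition key with MD5 into a 128-bit integer and each shard owns a range of that hash space; the even split into contiguous ranges used here is a simplifying assumption, and the function names are made up.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming the hash space is split evenly."""
    range_size = HASH_SPACE // shard_count
    # min() keeps the top few hash values in the last shard when the
    # hash space does not divide evenly.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


# The same key always lands on the same shard.
print(shard_for("van-17", 5) == shard_for("van-17", 5))  # True
```

Because the mapping depends only on the key, a skewed key such as country sends a skewed share of the traffic to one shard.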

Why would you use an SQS FIFO queue with grouping over a Kinesis stream, and vice versa?

You would use a FIFO queue if you wanted a dynamic number of consumers, as a Kinesis stream read with the KCL has at most one consumer instance per shard. You would use a stream if you had large amounts of data that needed to be ordered per shard and a large number of partition keys (thousands).

You have built a decoupled system using SQS. However, it looks like the latency between a message arriving and it being processed is higher than we would like and that CPU utilisation on our application server is consistently high. On further investigation, you find your consumer application is calling SQS far too often and getting no data. What would you do and why?

Long polling would be ideal. With long polling, the consumer's receive call waits for a message to arrive on the queue and returns as soon as one does, which saves empty API calls and decreases latency. The wait time can be configured between 1 and 20 seconds; the default of 0 means short polling, and 20 seconds is the maximum.
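
A per-request long poll might look like the sketch below. It assumes `sqs_client` is a boto3 SQS client created elsewhere; the function name and default values are illustrative.

```python
def poll_once(sqs_client, queue_url, wait_seconds=20, max_messages=10):
    """Long-poll the queue once and return whatever messages arrived.

    sqs_client is assumed to be a boto3 SQS client. The call blocks for
    up to wait_seconds and returns early as soon as a message is available.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=wait_seconds,      # 1-20 enables long polling
        MaxNumberOfMessages=max_messages,  # up to 10 per call
    )
    # The "Messages" key is absent when the wait ends with nothing to read.
    return response.get("Messages", [])
```

A consumer loop built on this makes at most one call every 20 seconds while the queue is idle, instead of spinning on empty responses.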

In Kinesis, what is the difference between the PutRecord and PutRecords API call? Why would you use one over the other? Which is preferred?

PutRecord writes a single record to kinesis. PutRecords allows you to send multiple records to Kinesis Data Streams in a single request. By using PutRecords, producers can achieve higher throughput when sending data to their Kinesis data stream. The use of PutRecords is preferred over a single PutRecord unless your application specifically needs to always send single records per request, or some other reason PutRecords can't be used.

In Kinesis, how many shards can data associated with a partition key be written to? What are the impacts of this?

One. Each partition key is written to one shard only. If your keys are not well distributed, you get a hot partition, meaning a disproportionate share of the data is written to one shard.

I have a visibility timeout set to 30 seconds in SQS. My consumer picks up the message for processing. During the 30 second window, is the message visible to any other consumers? If the message is not processed in the 30 second window of the visibility timeout, what happens?

The message won't be visible during the visibility timeout period. If the message isn't processed and has not been deleted in the 30 second timeframe, the message will become visible in the queue again.

Where can Long Polling be enabled in SQS (2)?

Long polling can be enabled either at the queue level, using the ReceiveMessageWaitTimeSeconds attribute, or at the API level, using the WaitTimeSeconds parameter of the ReceiveMessage call.

SNS, SQS, Kinesis and Integration patterns Flashcards

(57 cards)