SQS SNS KINESIS Flashcards

1
Q

SQS
Standard Queue overview

default attributes

A

Oldest offering

Unlimited throughput
Short-lived messages: retention DEFAULT 4 days, MAX 14 days
Low latency: ~10 ms response on publish/receive
Less than 256 KB per message

May have duplicate messages: AT-LEAST-ONCE DELIVERY

Can be out of order: BEST-EFFORT ordering

2
Q

SQS messages
Producers
API to send message

A
Producers:
256 KB max per message
Produced to SQS via the SDK
API: *SendMessage*
Retained for a default of 4 days, max 14 days, in the queue
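A minimal sketch of what a SendMessage call carries. The queue URL is a placeholder and the parameter-building helper is our own illustration, not part of the AWS API; the real call would be made with boto3 as shown in the comment:

```python
import json

# SQS enforces a 256 KB limit on the message body.
MAX_BODY_BYTES = 256 * 1024

def build_send_message_params(queue_url, payload, delay_seconds=0):
    """Hypothetical helper: assemble SendMessage parameters."""
    body = json.dumps(payload)
    if len(body.encode("utf-8")) > MAX_BODY_BYTES:
        raise ValueError("SQS message body exceeds 256 KB")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "DelaySeconds": delay_seconds,  # per-message delay, 0-900 seconds
    }

# With boto3 this would be sent as:
#   sqs = boto3.client("sqs")
#   sqs.send_message(**build_send_message_params(url, {"task": "resize"}))
params = build_send_message_params(
    "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
    {"task": "resize", "image": "cat.png"},
)
print(params["MessageBody"])
```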
3
Q
SQS messages
Consumers
types
how it retrieves messages, how many at a time
what happens to messages
API for deletion, receiving also
A

Consumers are applications that run code:

  • On-premises servers
  • AWS Lambda
  • EC2

Polls: requests messages from SQS, can receive up to 10 messages at a time per poll
API: ReceiveMessage

Consumer processes the message, then deletes it from the queue
API: DeleteMessage
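The poll → process → delete cycle above can be sketched as follows. The `FakeSQS` class is an in-memory stand-in (our own invention, so the loop runs without AWS); with boto3, `boto3.client("sqs")` drops into the same slot:

```python
def drain_queue(sqs, queue_url, process, max_batches=10):
    """Poll SQS, process each message, then delete it (up to 10 per poll)."""
    handled = 0
    for _ in range(max_batches):
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
        messages = resp.get("Messages", [])
        if not messages:
            break  # nothing left to read
        for msg in messages:
            process(msg["Body"])
            # Delete only after successful processing; otherwise the message
            # becomes visible again after the visibility timeout.
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
            handled += 1
    return handled

class FakeSQS:
    """In-memory stand-in so the loop can run without AWS."""
    def __init__(self, bodies):
        self.pending = [{"Body": b, "ReceiptHandle": str(i)}
                        for i, b in enumerate(bodies)]
    def receive_message(self, QueueUrl, MaxNumberOfMessages):
        batch = self.pending[:MaxNumberOfMessages]
        self.pending = self.pending[MaxNumberOfMessages:]
        return {"Messages": batch} if batch else {}
    def delete_message(self, QueueUrl, ReceiptHandle):
        pass

print(drain_queue(FakeSQS(["a", "b", "c"]), "queue-url", print))
```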

4
Q

SQS Multiple Consumers

AND

ASG and Sqs
-ASG horizontal scaling via which alarm.

A

Multiple consumers can receive messages in parallel.

At-least-once delivery and best-effort ordering apply because of this parallel consumption.

Horizontal scaling is done by adding EC2 instances to the consumer group.

With the consumers inside an ASG, the EC2 instances scale according to a metric tied to queue length.

Queue length (the CloudWatch metric ApproximateNumberOfMessagesVisible), attached to a CloudWatch alarm, allows EC2 scaling to occur via the ASG to handle the queue length.

5
Q

SQS Decouple Application tiers

Front end and back end decouple

A

A request goes to the front end.

The front end sends a SendMessage to the SQS queue.

The back end polls the SQS queue for messages independently; ReceiveMessage loads the tasks onto the back end, which scales according to the SQS queue length.

The back end sends the final processed files to their destination.

6
Q

SQS security
Encryption flight
Encryption rest
Client side

Access control

SQS access policies
what is it like
what is it used for

A

Encryption in flight: HTTPS API
Encryption at rest: KMS
Client-side: clients must encrypt/decrypt themselves

Access control: IAM policies regulate access to the SQS API

SQS access policies (like S3 bucket policies):
Cross-account access to SQS queues
Allow other services to write to an SQS queue

7
Q
SQS: Message Visibility timeout 
purpose
how long it lasts
what other consumers see/ can do
after it expires

API to extend
consequence of long / short sqs

A

Messages pulled from the queue for processing become INVISIBLE so other consumers cannot pull them.

VISIBILITY TIMEOUT: default is 30 seconds — the consumer has 30 seconds to process the message before it can be processed by another consumer.

After 30 seconds the message becomes VISIBLE again.

*** IF NOT PROCESSED IN TIME, IT MAY BE PROCESSED TWICE!

API: ChangeMessageVisibility allows you to allot more time for the message to be processed after a consumer retrieves it via ReceiveMessage.

Consequences
Long timeout: reprocessing takes time if a consumer crashes
Short timeout: may get duplicates
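The invisible-then-visible-again behavior can be modeled in a few lines. This is a toy in-memory model (not the real service), with an injectable clock so time can be controlled:

```python
class ToyQueue:
    """Toy model of the SQS visibility timeout (not the real service)."""
    def __init__(self, visibility_timeout=30, now=None):
        self.visibility_timeout = visibility_timeout
        self.now = now or (lambda: 0.0)
        self.messages = {}  # msg_id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0.0)

    def receive(self):
        t = self.now()
        for msg_id, (body, invisible_until) in self.messages.items():
            if t >= invisible_until:
                # Receiving hides the message from other consumers
                # until the visibility timeout expires.
                self.messages[msg_id] = (body, t + self.visibility_timeout)
                return msg_id, body
        return None

    def change_message_visibility(self, msg_id, extra_seconds):
        # Mirrors ChangeMessageVisibility: buy more processing time.
        body, _ = self.messages[msg_id]
        self.messages[msg_id] = (body, self.now() + extra_seconds)

    def delete(self, msg_id):
        self.messages.pop(msg_id)
```

The key behavior: if the consumer does not delete the message before the timeout expires, another receive returns the same message again — the duplicate-processing risk the card warns about.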

8
Q

SQS Dead Letter Queue

A

DLQ
A separate queue is created and designated as the dead-letter queue of the source queue.

When a message repeatedly fails to be processed within the visibility timeout, it goes back into the queue again and again.

The MaximumReceives threshold (part of the queue's redrive policy, not a standalone API) sets how many times the message can return to the queue before it is moved to the separate DLQ.

The DLQ is used for debugging and later processing.
These messages expire like in a normal SQS queue — good to set a retention of 14 days on the DLQ.
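The DLQ wiring is a queue attribute rather than an API call. A sketch of building the RedrivePolicy attribute — the DLQ ARN is a placeholder, and the helper is our own; with boto3 the resulting dict would go to `set_queue_attributes`:

```python
import json

def build_redrive_policy(dlq_arn, max_receive_count=3):
    """RedrivePolicy queue attribute: after max_receive_count failed
    receives, SQS moves the message to the dead-letter queue."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })
    }

attrs = build_redrive_policy(
    "arn:aws:sqs:us-east-1:123456789012:my-dlq",  # placeholder ARN
    max_receive_count=3,
)
# boto3: sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=attrs)
print(attrs["RedrivePolicy"])
```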

9
Q
SQS Delay Queue
purpose
Default
max
API: what is it , and what is it used for?
A
Delay messages so consumers cannot see them immediately
Default: 0 seconds
MAX: 15 minutes
^A queue-level delay applies to all messages
API:
DelaySeconds - per-message delay set on the SendMessage call
10
Q
SQS: developer concepts
LONG POLLING
what kind of polling is default
where is it set
what api
A

Not like the Delay Queue — this is on the consumer side.

Long polling doesn't return a response until a message arrives in the queue (or the wait time expires).
Fewer API calls, less expensive.

The wait time can be 1 to 20 seconds.
The read request is held open until this time passes or a message arrives, instead of repeatedly polling an empty queue.

Long polling is preferable over short polling.

Enabled at:
Queue level
API level: WaitTimeSeconds
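At the API level, long polling is just a parameter on the receive call. A sketch of the parameters (the helper is our own illustration; boto3's `receive_message` accepts this dict directly):

```python
def build_receive_params(queue_url, wait_time_seconds=20, max_messages=10):
    """Long-polling receive: WaitTimeSeconds holds the connection open
    until a message arrives (or the wait expires), cutting empty responses.
    0 means short polling; 20 is the maximum wait."""
    if not 0 <= wait_time_seconds <= 20:
        raise ValueError("WaitTimeSeconds must be 0-20")
    return {
        "QueueUrl": queue_url,
        "MaxNumberOfMessages": max_messages,   # up to 10 per poll
        "WaitTimeSeconds": wait_time_seconds,
    }

# boto3: sqs.receive_message(**build_receive_params(url))
```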

11
Q

SQS: developer concepts

Extended Client

A

The message size limit is 256 KB; the Extended Client helps send bigger messages.
USE: SQS Extended Client (Java library)

When a producer wants to send a large message, it instead sends a SMALL METADATA MESSAGE that references a large object in an Amazon S3 bucket.

The consumer reads from the SQS queue, consumes the small metadata message, and follows it to read the data from S3.

ex: Video file processing

You can use the Amazon SQS Extended Client Library for Java to do the following:

Specify whether messages are always stored in Amazon S3 or only when the size of a message exceeds 256 KB

Send a message that references a single message object stored in an S3 bucket

Retrieve the message object from an S3 bucket

Delete the message object from an S3 bucket

12
Q

SQS: developer concepts
API

CreateQueue
(MessageRetentionPeriod)
DeleteQueue

PurgeQueue

SendMessage
(DelaySeconds)
ReceiveMessage
DeleteMessage

ReceiveMessageWaitTimeSeconds

ChangeMessageVisibility

BATCH: SendMessage, DeleteMessage, ChangeMessageVisibility

MaximumReceives


A

CreateQueue - create a queue
(MessageRetentionPeriod) - set how long messages are retained
DeleteQueue - delete the queue itself along with its contents

PurgeQueue - delete all messages in the queue (the queue remains)

SendMessage - as a producer, send a message
(DelaySeconds) - delay for an individual message
ReceiveMessage - consumers use this to receive messages
DeleteMessage - consumer deletes a processed message

ReceiveMessageWaitTimeSeconds: for long polling — how long to wait for messages if the queue is empty

ChangeMessageVisibility: change the visibility timeout to get more time to process

BATCH versions of SendMessage, DeleteMessage, ChangeMessageVisibility help decrease costs — one request per batch.

13
Q

AWS SQS FIFO QUEUE
overview
speed

A

First In, First Out

Exact ordering of messages
Throughput: 300 messages per second unbatched, 3,000 per second batched
Exactly-once send (deduplication)
Processed in order by the consumer

Use when you need to decouple AND maintain order, within the throughput constraint.

14
Q

AWS SQS FIFO QUEUE
Deduplication

interval
2 types of rejection.

A

Deduplication interval: 5 minutes
- Sending the same message twice within 5 minutes causes the 2nd to be rejected
Two methods:

  1. Content-based: the SHA-256 hash of the message body matches an earlier one, so the duplicate is rejected
  2. Message deduplication ID: if the same ID is seen within 5 minutes, the message is dropped
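The content-based method can be illustrated with a small simulation: hash the body with SHA-256 and reject a repeat seen within the 5-minute window. This is a toy model of the behavior, not the service itself:

```python
import hashlib

DEDUP_WINDOW = 5 * 60  # seconds

class FifoDeduplicator:
    """Toy model of FIFO content-based deduplication."""
    def __init__(self):
        self.seen = {}  # body hash -> timestamp of last accepted send

    def accept(self, body, now):
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        last = self.seen.get(digest)
        if last is not None and now - last < DEDUP_WINDOW:
            return False  # duplicate within 5 minutes: rejected
        self.seen[digest] = now
        return True

d = FifoDeduplicator()
print(d.accept("order-42", now=0))    # first send: accepted
print(d.accept("order-42", now=120))  # duplicate inside the window: rejected
print(d.accept("order-42", now=400))  # window has passed: accepted again
```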
15
Q

AWS SQS FIFO QUEUE
Message Grouping
one consumer
groupings

A

MessageGroupID: mandatory parameter. If you specify a single value, all messages go to one consumer.

Grouping a subset of messages:
Specify different values for MessageGroupID
- messages with the same MessageGroupID are grouped together
- each distinct ID can have a separate consumer
- ORDERING IS NOT GUARANTEED BETWEEN GROUPS
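A toy sketch of the grouping idea: each MessageGroupID maps to one consumer "lane", so order is kept within a group but not across groups. The routing function is our own illustration of the concept, not how SQS assigns consumers internally:

```python
from collections import defaultdict

def route_by_group(messages, num_consumers):
    """Toy FIFO grouping: each group ID sticks to one consumer lane;
    order is preserved within a group, not across groups."""
    assignment = {}            # group id -> consumer index
    lanes = defaultdict(list)  # consumer index -> ordered messages
    for group_id, body in messages:
        if group_id not in assignment:
            assignment[group_id] = len(assignment) % num_consumers
        lanes[assignment[group_id]].append((group_id, body))
    return dict(lanes)

lanes = route_by_group(
    [("truck-1", "a"), ("truck-2", "x"), ("truck-1", "b")],
    num_consumers=2,
)
print(lanes)
```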

16
Q
SNS overview
Event Producer:
Event Receiver: 
Subscriptions
Subscribers can be sent messages via
A

One message, many receivers.
Direct integration (producer wired to each receiver) makes one-to-many cumbersome.
Pub/Sub: publish once to a topic; receivers subscribe to that topic, and adding more receivers just means adding more subscriptions.

Event producer: sends messages to an SNS topic
Event receivers: subscriptions listen for SNS topic notifications — VERY highly scalable

Subscribers can be
SQS
HTTP HTTPS
Lambda
Email
SMS 
Mobile
17
Q

SNS Integration with services

examples of some

A
CloudWatch: alarms
Auto Scaling group: notifications of scaling changes
S3: bucket events
CloudFormation: state changes
etc.
18
Q

SNS publishing
TOPIC publish
DIRECT publish

A

TOPIC publish to SNS (use the SDK)

  • create a topic
  • create one or many subscriptions
  • publish to the topic

DIRECT publish (for mobile apps SDK)

  • create a platform application
  • create a platform endpoint
  • publish to the platform endpoint

Works with third-party tools to receive notifications.
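The topic-publish steps above map onto three SDK calls. A sketch with a hedged parameter-builder of our own; the boto3 calls in the comments (`create_topic`, `subscribe`, `publish`) are the real client methods:

```python
def build_publish_params(topic_arn, message, subject=None):
    """Assemble parameters for an SNS Publish call (topic publish)."""
    params = {"TopicArn": topic_arn, "Message": message}
    if subject is not None:
        params["Subject"] = subject  # used e.g. by email subscriptions
    return params

# boto3 sketch of the three steps:
#   sns = boto3.client("sns")
#   topic = sns.create_topic(Name="orders")               # 1. create topic
#   sns.subscribe(TopicArn=topic["TopicArn"],             # 2. subscribe
#                 Protocol="sqs", Endpoint=queue_arn)
#   sns.publish(**build_publish_params(topic["TopicArn"], "hi"))  # 3. publish
print(build_publish_params("arn:aws:sns:us-east-1:123456789012:orders", "hi"))
```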

19
Q

SNS Security
encryption types
access control
SNS ACCESS POLICIES:

A

Similar to SQS:
In flight: HTTPS by default
At rest: KMS
Client-side: encryption/decryption must be done by the client itself

Access control: IAM policies regulate access to the SNS API

SNS ACCESS POLICIES: like S3 bucket policies

  1. good for cross-account access to SNS topics
  2. good for allowing other services (like S3) to write to an SNS topic
20
Q

SNS SQS fanout pattern

process
why is it used
what kind of SQS queues can this NOT work for.

A

Send the same message to many different SQS queues:

-> Push the message to SNS and have the SQS queues subscribe to the topic, so one message reaches multiple parallel SQS queues.

Fully decoupled, NO DATA LOSS

  • Used for: data persistence, delayed processing and retries
  • CAN add more SQS subscriptions over time
  • The SQS queue needs an access policy allowing SNS to write to it

SNS CANNOT SEND MESSAGES TO FIFO QUEUES (true when this was written; SNS FIFO topics later added FIFO-queue delivery)
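The access policy the fanout pattern needs can be sketched as a standard IAM policy document. The ARNs are placeholders and the builder is our own; with boto3 the resulting JSON string would be set via `set_queue_attributes(..., Attributes={"Policy": policy})`:

```python
import json

def build_sns_write_policy(queue_arn, topic_arn):
    """Queue access policy letting an SNS topic send to this queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only messages from this specific topic are allowed in.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })

policy = build_sns_write_policy(
    "arn:aws:sqs:us-east-1:123456789012:fanout-queue",  # placeholder
    "arn:aws:sns:us-east-1:123456789012:fanout-topic",  # placeholder
)
print(policy)
```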

21
Q

KINESIS OVERVIEW
Good for
what kind of data
availability

A

Great for application logs, metrics, IoT, clickstreams
REAL-TIME BIG DATA

Good fit for stream-processing frameworks

Data is AUTOMATICALLY replicated to 3 AZs

22
Q

Kinesis products overview

  1. Kinesis stream
  2. Kinesis Analytics
  3. Kinesis Firehose
A

main focus on 1.

  1. Kinesis Streams: low-latency streaming INGEST
  2. Kinesis Analytics: real-time analytics on streams with SQL
  3. Kinesis Firehose: load streams into S3, Redshift, Elasticsearch, Splunk
23
Q

Kinesis diagram overview

flow of data to storage

A
  1. Data flows into Kinesis Streams
  2. Streams feed into Kinesis Analytics for processing
  3. After processing, the end product is loaded via Kinesis Firehose into the destination data store
24
Q

KINESIS STREAMS

Shards
Default Shard time, MAX time
IMMUTABLE data

A

Producers using Streams load data into a scalable shard system; more shards can be added to handle more data.

*Data retention is 1 day by DEFAULT, 7 days MAX — you should process this data quickly

*Ability to REPROCESS and REPLAY data

*Multiple applications can consume the same stream
SCALABLE CONSUMERS

*IMMUTABLE: once inserted, the data cannot be deleted

25
Q
KINESIS STREAMS SHARD
WRITE SPEED and throughput
READ speed
Shard scaling
ordering
A

One shard supports 1 MB/s or 1,000 messages/s on WRITE
One shard supports 2 MB/s on READ

Billed per shard provisioned; can have as many shards as you want
Messages or calls can be BATCHED
The number of shards can EVOLVE over time (reshard / merge)

RECORDS are ORDERED PER SHARD
With many shards, data is still ordered within each shard

26
Q

AWS Kinesis PutRecord: Streams API

Putrecord api
Message Key
Partition Key
Sequence number
partition key cardinality
A

PutRecord API sends data to Kinesis.

PutRecord API + partition key: the key is hashed to determine the shard ID.

Data is grouped by the partition key — a string you choose — and records with the same key land on the same shard. Each record, as it is sent, gets a sequence number; later records get higher numbers.

Partition key: must be well distributed to AVOID a hot partition. More unique partition keys give better distribution and avoid a HOT shard.

*BATCHING with PutRecords can reduce costs and increase throughput

ProvisionedThroughputExceeded if going over the limits — use retries with exponential backoff

Can use the CLI, AWS SDK, or producer libraries
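The key-to-shard hashing can be sketched with a simplified model: Kinesis hashes the partition key with MD5 to a 128-bit number and routes it to the shard owning that hash-key range. Assuming evenly split ranges (a simplification of the real explicit ranges):

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Simplified model of Kinesis routing: MD5-hash the partition key
    to a 128-bit number and map it onto evenly split shard ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * num_shards // 2 ** 128

# The same key always lands on the same shard, preserving per-key ordering;
# many distinct keys spread the load and avoid a hot shard.
print(shard_for_key("truck-17", 5) == shard_for_key("truck-17", 5))
```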

27
Q

AWS KINESIS API EXCEPTIONS
issues
solutions

A

ProvisionedThroughputExceeded
Happens when:

Sending more data than allowed — exceeding the MB/s or transactions-per-second limit
Make sure there is no HOT SHARD; avoid too much data on one partition key

Solutions:
Retries with exponential backoff
Increase shards via scaling
Ensure the partition key is well distributed (high cardinality)
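The retry-with-backoff solution can be sketched generically. The exception class stands in for the real Kinesis throttling error, and `sleep` is injectable so the sketch runs instantly in tests:

```python
class ProvisionedThroughputExceeded(Exception):
    """Stand-in for the Kinesis throttling error."""

def put_with_backoff(put_record, max_attempts=5, base_delay=0.1, sleep=None):
    """Retry a throttled put, doubling the delay on each failure."""
    sleep = sleep or (lambda seconds: None)  # injectable for testing
    for attempt in range(max_attempts):
        try:
            return put_record()
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
```

In real code `put_record` would wrap the SDK call (e.g. a boto3 `put_record` invocation), and `sleep` would be `time.sleep`; adding random jitter to the delay is a common refinement.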

28
Q

AWS KINESIS API CONSUMERS

A
  1. Normal consumer:
    CLI, SDK

2. KINESIS CLIENT LIBRARY (KCL)
Java, Node, Python, Ruby, .NET
The KCL lets you consume from Kinesis efficiently

What is the Kinesis Client Library? KCL helps you consume and process data from a Kinesis data stream by taking care of many of the complex tasks associated with distributed computing.

29
Q

KINESIS KCL in depth

how it helps shard consumption
the infrastructure and limits
how its coordinated.

A

Kinesis Client Library:
Java library that helps read records
via distributed applications that share the read workload

  • EACH shard is read by only one KCL instance. A KCL instance can read from multiple shards, but a shard cannot be read by two KCL instances
  • Records are read in order at the shard level, but there is no ordering between shards
  • Can run on EC2 / Elastic Beanstalk / on premises
  • Processing progress is checkpointed in DynamoDB — the KCL needs IAM access to write to it

EXAMPLE: 6 shards and 4 KCL instances

shard 0 - KCL 1
shard 1 - KCL 2
shards 2/3 - KCL 3
shards 4/5 - KCL 4

Example: 2 shards and 2 KCL instances
shard 0 - KCL 1
shard 1 - KCL 2
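The invariant in the examples — every shard owned by exactly one worker, workers balanced — can be sketched with a toy round-robin assignment (our own illustration; the real KCL leases shards dynamically via its DynamoDB table):

```python
def assign_shards(num_shards, num_workers):
    """Toy balanced assignment: each shard goes to exactly one KCL worker;
    a worker may own several shards, but a shard never has two workers."""
    return {shard: shard % num_workers for shard in range(num_shards)}

# 6 shards across 4 workers: two workers own 2 shards, two own 1.
print(assign_shards(6, 4))
```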

30
Q

KINESIS SECURITY

A

Control access / authorization: IAM policies
Encryption in flight: HTTPS; at rest: KMS
Client-side encryption is possible, but harder — the client must implement it

PRIVATELINK: VPC endpoints are available to access Kinesis from within a VPC

31
Q

KINESIS DATA ANALYTICS:
managed by
cost
delay

A

Real-time analytics on Kinesis streams using SQL

Automatic scaling
Managed: no servers to provision
Continuous, real time
*No delay to consume / compute metrics

UNLIKE KINESIS STREAMS, NO NEED TO PROVISION THROUGHPUT

*A NEW stream can be generated out of the real-time queries

Pay for consumption rate

32
Q

KINESIS FIRE HOSE
managed by
cost
delay

A

Fully managed, no administration needed
Near real time: ~60 seconds latency
Loads into:
S3, Redshift, Elasticsearch, Splunk

Automatic scaling

Supports many data formats

PAY FOR THE AMOUNT GOING THROUGH FIREHOSE

33
Q
SNS
VS 
SQS
vs
KINESIS
A

SNS:
Pushes data to subscribers
Data is not persistent: lost if not delivered
Fan-out capable

SQS:
Consumers pull data
As many consumers as required
Data is deleted after consumption

KINESIS:
Consumers pull data
Streaming data, real time
REAL-TIME BIG DATA, ANALYTICS, ETL
as many consumer applications as you want, but ONE KCL instance PER SHARD
possible to replay data
must provision throughput (shards)
34
Q

KINESIS DATA ORDERING

partition key

A

Example: trucks with ID numbers

The partition key is used for ordering — in this case it's the TRUCK ID.

*THE SAME KEY ALWAYS GOES TO THE SAME SHARD

The partition key is hashed, and the hash determines which shard the record goes to. Because the hash of a given key never changes, records with the same partition key always land on the same shard over time.

As each truck sends new data tagged with its partition key, that data keeps going to the same shard, preserving per-truck ordering.

35
Q

SQS ordering FIFO

Group ID

A

Normally the order of consumption is the order of arrival: First In, First Out.

The Group ID acts like a partition key: related messages share a group ID. Different groups can be consumed by separate consumers, while ordering is preserved within each group.

36
Q

KINESIS VS SQS ordering

A

EXAMPLE
100 trucks, 5 shards vs. 1 FIFO queue

Kinesis
20 trucks per shard on average
data ordered within each shard
max number of parallel consumers: 5 (one per shard)
read throughput: 2 MB/s per shard, 10 MB/s total

SQS FIFO
one FIFO queue
create 100 group IDs
allows up to 100 consumers, one per group ID
up to 300 messages per second (3,000 if batched)