SQS SNS KINESIS Flashcards

1
Q

SQS
Standard Queue overview

default attributes

A

Oldest offering

Unlimited throughput
Short-lived messages: retention DEFAULT 4 days, MAX 14 days
Low latency: ~10 ms response on publish/receive
Less than 256 KB per message

May have duplicate messages: AT-LEAST-ONCE DELIVERY

Can be out of order: BEST-EFFORT ordering

2
Q

SQS messages
Producers
API to send message

A
Producers:
256 KB max per message
Produced to SQS via the SDK
API: *SendMessage*
Retained for a default of 4 days, max 14 days, in the queue
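A minimal sketch of what a SendMessage call carries. The queue URL is a placeholder and the parameter-building helper is our own illustration, not part of the AWS API; the real call would be made with boto3 as shown in the comment:

```python
import json

# SQS enforces a 256 KB limit on the message body.
MAX_BODY_BYTES = 256 * 1024

def build_send_message_params(queue_url, payload, delay_seconds=0):
    """Hypothetical helper: assemble SendMessage parameters."""
    body = json.dumps(payload)
    if len(body.encode("utf-8")) > MAX_BODY_BYTES:
        raise ValueError("SQS message body exceeds 256 KB")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "DelaySeconds": delay_seconds,  # per-message delay, 0-900 seconds
    }

# With boto3 this would be sent as:
#   sqs = boto3.client("sqs")
#   sqs.send_message(**build_send_message_params(url, {"task": "resize"}))
params = build_send_message_params(
    "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
    {"task": "resize", "image": "cat.png"},
)
print(params["MessageBody"])
```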
3
Q
SQS messages
Consumers
types
how it retrieves messages, how many at a time
what happens to messages
API for deletion, receiving also
A

Consumers are applications that run code:

  • On-premises servers
  • AWS Lambda
  • EC2

Polls: requests messages from SQS, can receive up to 10 messages at a time per poll
API: ReceiveMessage

Consumer processes the message, then deletes it from the queue
API: DeleteMessage
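The poll → process → delete cycle above can be sketched as follows. The `FakeSQS` class is an in-memory stand-in (our own invention, so the loop runs without AWS); with boto3, `boto3.client("sqs")` drops into the same slot:

```python
def drain_queue(sqs, queue_url, process, max_batches=10):
    """Poll SQS, process each message, then delete it (up to 10 per poll)."""
    handled = 0
    for _ in range(max_batches):
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
        messages = resp.get("Messages", [])
        if not messages:
            break  # nothing left to read
        for msg in messages:
            process(msg["Body"])
            # Delete only after successful processing; otherwise the message
            # becomes visible again after the visibility timeout.
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
            handled += 1
    return handled

class FakeSQS:
    """In-memory stand-in so the loop can run without AWS."""
    def __init__(self, bodies):
        self.pending = [{"Body": b, "ReceiptHandle": str(i)}
                        for i, b in enumerate(bodies)]
    def receive_message(self, QueueUrl, MaxNumberOfMessages):
        batch = self.pending[:MaxNumberOfMessages]
        self.pending = self.pending[MaxNumberOfMessages:]
        return {"Messages": batch} if batch else {}
    def delete_message(self, QueueUrl, ReceiptHandle):
        pass

print(drain_queue(FakeSQS(["a", "b", "c"]), "queue-url", print))
```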

4
Q

SQS Multiple Consumers

AND

ASG and Sqs
-ASG horizontal scaling via which alarm.

A

Multiple consumers can receive messages in parallel.

At-least-once delivery and best-effort ordering apply because of this parallel consumption.

Horizontal scaling is done by adding EC2 instances to the consumer group.

With the consumers inside an ASG, the EC2 instances scale according to a metric tied to queue length.

Queue length (the CloudWatch metric ApproximateNumberOfMessagesVisible), attached to a CloudWatch alarm, allows EC2 scaling to occur via the ASG to handle the queue length.

5
Q

SQS Decouple Application tiers

Front end and back end decouple

A

A request goes to the front end.

The front end sends a SendMessage to the SQS queue.

The back end polls the SQS queue for messages independently; ReceiveMessage loads the tasks onto the back end, which scales according to the SQS queue length.

The back end sends the final processed files to their destination.

6
Q

SQS security
Encryption flight
Encryption rest
Client side

Access control

SQS access policies
what is it like
what is it used for

A

Encryption in flight: HTTPS API
Encryption at rest: KMS
Client-side: clients must encrypt/decrypt themselves

Access control: IAM policies regulate access to the SQS API

SQS access policies (like S3 bucket policies):
Cross-account access to SQS queues
Allow other services to write to an SQS queue

7
Q
SQS: Message Visibility timeout 
purpose
how long it lasts
what other consumers see/ can do
after it expires

API to extend
consequence of long / short sqs

A

Messages pulled from the queue for processing become INVISIBLE so other consumers cannot pull them.

VISIBILITY TIMEOUT: default is 30 seconds — the consumer has 30 seconds to process the message before it can be processed by another consumer.

After 30 seconds the message becomes VISIBLE again.

*** IF NOT PROCESSED IN TIME, IT MAY BE PROCESSED TWICE!

API: ChangeMessageVisibility allows you to allot more time for the message to be processed after a consumer retrieves it via ReceiveMessage.

Consequences
Long timeout: reprocessing takes time if a consumer crashes
Short timeout: may get duplicates
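The invisible-then-visible-again behavior can be modeled in a few lines. This is a toy in-memory model (not the real service), with an injectable clock so time can be controlled:

```python
class ToyQueue:
    """Toy model of the SQS visibility timeout (not the real service)."""
    def __init__(self, visibility_timeout=30, now=None):
        self.visibility_timeout = visibility_timeout
        self.now = now or (lambda: 0.0)
        self.messages = {}  # msg_id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0.0)

    def receive(self):
        t = self.now()
        for msg_id, (body, invisible_until) in self.messages.items():
            if t >= invisible_until:
                # Receiving hides the message from other consumers
                # until the visibility timeout expires.
                self.messages[msg_id] = (body, t + self.visibility_timeout)
                return msg_id, body
        return None

    def change_message_visibility(self, msg_id, extra_seconds):
        # Mirrors ChangeMessageVisibility: buy more processing time.
        body, _ = self.messages[msg_id]
        self.messages[msg_id] = (body, self.now() + extra_seconds)

    def delete(self, msg_id):
        self.messages.pop(msg_id)
```

The key behavior: if the consumer does not delete the message before the timeout expires, another receive returns the same message again — the duplicate-processing risk the card warns about.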

8
Q

SQS Dead Letter Queue

A

DLQ
A separate queue is created and designated as the dead-letter queue of the source queue.

When a message repeatedly fails to be processed within the visibility timeout, it goes back into the queue again and again.

The MaximumReceives threshold (part of the queue's redrive policy, not a standalone API) sets how many times the message can return to the queue before it is moved to the separate DLQ.

The DLQ is used for debugging and later processing.
These messages expire like in a normal SQS queue — good to set a retention of 14 days on the DLQ.
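The DLQ wiring is a queue attribute rather than an API call. A sketch of building the RedrivePolicy attribute — the DLQ ARN is a placeholder, and the helper is our own; with boto3 the resulting dict would go to `set_queue_attributes`:

```python
import json

def build_redrive_policy(dlq_arn, max_receive_count=3):
    """RedrivePolicy queue attribute: after max_receive_count failed
    receives, SQS moves the message to the dead-letter queue."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })
    }

attrs = build_redrive_policy(
    "arn:aws:sqs:us-east-1:123456789012:my-dlq",  # placeholder ARN
    max_receive_count=3,
)
# boto3: sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=attrs)
print(attrs["RedrivePolicy"])
```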

9
Q
SQS Delay Queue
purpose
Default
max
API: what is it , and what is it used for?
A
Delay messages so consumers cannot see them immediately
Default: 0 seconds
MAX: 15 minutes
^A queue-level delay applies to all messages
API:
DelaySeconds - per-message delay set on the SendMessage call
10
Q
SQS: developer concepts
LONG POLLING
what kind of polling is default
where is it set
what api
A

Not like the Delay Queue — this is on the consumer side.

Long polling doesn't return a response until a message arrives in the queue (or the wait time expires).
Fewer API calls, less expensive.

The wait time can be 1 to 20 seconds.
The read request is held open until this time passes or a message arrives, instead of repeatedly polling an empty queue.

Long polling is preferable over short polling.

Enabled at:
Queue level
API level: WaitTimeSeconds
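At the API level, long polling is just a parameter on the receive call. A sketch of the parameters (the helper is our own illustration; boto3's `receive_message` accepts this dict directly):

```python
def build_receive_params(queue_url, wait_time_seconds=20, max_messages=10):
    """Long-polling receive: WaitTimeSeconds holds the connection open
    until a message arrives (or the wait expires), cutting empty responses.
    0 means short polling; 20 is the maximum wait."""
    if not 0 <= wait_time_seconds <= 20:
        raise ValueError("WaitTimeSeconds must be 0-20")
    return {
        "QueueUrl": queue_url,
        "MaxNumberOfMessages": max_messages,   # up to 10 per poll
        "WaitTimeSeconds": wait_time_seconds,
    }

# boto3: sqs.receive_message(**build_receive_params(url))
```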

11
Q

SQS: developer concepts

Extended Client

A

The message size limit is 256 KB; the Extended Client helps send bigger messages.
USE: SQS Extended Client (Java library)

When a producer wants to send a large message, it instead sends a SMALL METADATA MESSAGE that references a large object in an Amazon S3 bucket.

The consumer reads from the SQS queue, consumes the small metadata message, and follows it to read the data from S3.

ex: Video file processing

You can use the Amazon SQS Extended Client Library for Java to do the following:

Specify whether messages are always stored in Amazon S3 or only when the size of a message exceeds 256 KB

Send a message that references a single message object stored in an S3 bucket

Retrieve the message object from an S3 bucket

Delete the message object from an S3 bucket

12
Q

SQS: developer concepts
API

CreateQueue
(MessageRetentionPeriod)
DeleteQueue

PurgeQueue

SendMessage
(DelaySeconds)
ReceiveMessage
DeleteMessage

ReceiveMessageWaitTimeSeconds

ChangeMessageVisibility

BATCH: SendMessage, DeleteMessage, ChangeMessageVisibility

MaximumReceives


A

CreateQueue - create a queue
(MessageRetentionPeriod) - set how long messages are retained
DeleteQueue - delete the queue itself along with its contents

PurgeQueue - delete all messages in the queue (the queue remains)

SendMessage - as a producer, send a message
(DelaySeconds) - delay for an individual message
ReceiveMessage - consumers use this to receive messages
DeleteMessage - consumer deletes a processed message

ReceiveMessageWaitTimeSeconds: for long polling — how long to wait for messages if the queue is empty

ChangeMessageVisibility: change the visibility timeout to get more time to process

BATCH versions of SendMessage, DeleteMessage, ChangeMessageVisibility help decrease costs — one request per batch.

13
Q

AWS SQS FIFO QUEUE
overview
speed

A

First In, First Out

Exact ordering of messages
Throughput: 300 messages per second unbatched, 3,000 per second batched
Exactly-once send (deduplication)
Processed in order by the consumer

Use when you need to decouple AND maintain order, within the throughput constraint.

14
Q

AWS SQS FIFO QUEUE
Deduplication

interval
2 types of rejection.

A

Deduplication interval: 5 minutes
- Sending the same message twice within 5 minutes causes the 2nd to be rejected
Two methods:

  1. Content-based: the SHA-256 hash of the message body matches an earlier one, so the duplicate is rejected
  2. Message deduplication ID: if the same ID is seen within 5 minutes, the message is dropped
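The content-based method can be illustrated with a small simulation: hash the body with SHA-256 and reject a repeat seen within the 5-minute window. This is a toy model of the behavior, not the service itself:

```python
import hashlib

DEDUP_WINDOW = 5 * 60  # seconds

class FifoDeduplicator:
    """Toy model of FIFO content-based deduplication."""
    def __init__(self):
        self.seen = {}  # body hash -> timestamp of last accepted send

    def accept(self, body, now):
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        last = self.seen.get(digest)
        if last is not None and now - last < DEDUP_WINDOW:
            return False  # duplicate within 5 minutes: rejected
        self.seen[digest] = now
        return True

d = FifoDeduplicator()
print(d.accept("order-42", now=0))    # first send: accepted
print(d.accept("order-42", now=120))  # duplicate inside the window: rejected
print(d.accept("order-42", now=400))  # window has passed: accepted again
```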
15
Q

AWS SQS FIFO QUEUE
Message Grouping
one consumer
groupings

A

MessageGroupID: mandatory parameter. If you specify a single value, all messages go to one consumer.

Grouping a subset of messages:
Specify different values for MessageGroupID
- messages with the same MessageGroupID are grouped together
- each distinct ID can have a separate consumer
- ORDERING IS NOT GUARANTEED BETWEEN GROUPS
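A toy sketch of the grouping idea: each MessageGroupID maps to one consumer "lane", so order is kept within a group but not across groups. The routing function is our own illustration of the concept, not how SQS assigns consumers internally:

```python
from collections import defaultdict

def route_by_group(messages, num_consumers):
    """Toy FIFO grouping: each group ID sticks to one consumer lane;
    order is preserved within a group, not across groups."""
    assignment = {}            # group id -> consumer index
    lanes = defaultdict(list)  # consumer index -> ordered messages
    for group_id, body in messages:
        if group_id not in assignment:
            assignment[group_id] = len(assignment) % num_consumers
        lanes[assignment[group_id]].append((group_id, body))
    return dict(lanes)

lanes = route_by_group(
    [("truck-1", "a"), ("truck-2", "x"), ("truck-1", "b")],
    num_consumers=2,
)
print(lanes)
```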

16
Q
SNS overview
Event Producer:
Event Receiver: 
Subscriptions
Subscribers can be sent messages via
A

One message, many receivers.
Direct integration (producer wired to each receiver) makes one-to-many cumbersome.
Pub/Sub: publish once to a topic; receivers subscribe to that topic, and adding more receivers just means adding more subscriptions.

Event producer: sends messages to an SNS topic
Event receivers: subscriptions listen for SNS topic notifications — VERY highly scalable

Subscribers can be
SQS
HTTP HTTPS
Lambda
Email
SMS 
Mobile
17
Q

SNS Integration with services

examples of some

A
CloudWatch: alarms
Auto Scaling group: notifications of scaling changes
S3: bucket events
CloudFormation: state changes
etc.
18
Q

SNS publishing
TOPIC publish
DIRECT publish

A

TOPIC publish to SNS (use the SDK)

  • create a topic
  • create one or many subscriptions
  • publish to the topic

DIRECT publish (for mobile apps SDK)

  • create a platform application
  • create a platform endpoint
  • publish to the platform endpoint

Works with third-party tools to receive notifications.
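The topic-publish steps above map onto three SDK calls. A sketch with a hedged parameter-builder of our own; the boto3 calls in the comments (`create_topic`, `subscribe`, `publish`) are the real client methods:

```python
def build_publish_params(topic_arn, message, subject=None):
    """Assemble parameters for an SNS Publish call (topic publish)."""
    params = {"TopicArn": topic_arn, "Message": message}
    if subject is not None:
        params["Subject"] = subject  # used e.g. by email subscriptions
    return params

# boto3 sketch of the three steps:
#   sns = boto3.client("sns")
#   topic = sns.create_topic(Name="orders")               # 1. create topic
#   sns.subscribe(TopicArn=topic["TopicArn"],             # 2. subscribe
#                 Protocol="sqs", Endpoint=queue_arn)
#   sns.publish(**build_publish_params(topic["TopicArn"], "hi"))  # 3. publish
print(build_publish_params("arn:aws:sns:us-east-1:123456789012:orders", "hi"))
```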

19
Q

SNS Security
encryption types
access control
SNS ACCESS POLICIES:

A

Similar to SQS:
In flight: HTTPS by default
At rest: KMS
Client-side: encryption/decryption must be done by the client itself

Access control: IAM policies regulate access to the SNS API

SNS ACCESS POLICIES: like S3 bucket policies

  1. good for cross-account access to SNS topics
  2. good for allowing other services (like S3) to write to an SNS topic
20
Q

SNS SQS fanout pattern

process
why is it used
what kind of SQS queues can this NOT work for.

A

Send the same message to many different SQS queues:

-> Push the message to SNS and have the SQS queues subscribe to the topic, so one message reaches multiple parallel SQS queues.

Fully decoupled, NO DATA LOSS

  • Used for: data persistence, delayed processing and retries
  • CAN add more SQS subscriptions over time
  • The SQS queue needs an access policy allowing SNS to write to it

SNS CANNOT SEND MESSAGES TO FIFO QUEUES (true when this was written; SNS FIFO topics later added FIFO-queue delivery)
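The access policy the fanout pattern needs can be sketched as a standard IAM policy document. The ARNs are placeholders and the builder is our own; with boto3 the resulting JSON string would be set via `set_queue_attributes(..., Attributes={"Policy": policy})`:

```python
import json

def build_sns_write_policy(queue_arn, topic_arn):
    """Queue access policy letting an SNS topic send to this queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only messages from this specific topic are allowed in.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })

policy = build_sns_write_policy(
    "arn:aws:sqs:us-east-1:123456789012:fanout-queue",  # placeholder
    "arn:aws:sns:us-east-1:123456789012:fanout-topic",  # placeholder
)
print(policy)
```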

21
Q

KINESIS OVERVIEW
Good for
what kind of data
availability

A

Great for application logs, metrics, IoT, clickstreams
REAL-TIME BIG DATA

Good fit for stream-processing frameworks

Data is AUTOMATICALLY replicated to 3 AZs

22
Q

Kinesis products overview

  1. Kinesis stream
  2. Kinesis Analytics
  3. Kinesis Firehose
A

main focus on 1.

  1. Kinesis Streams: low-latency streaming INGEST
  2. Kinesis Analytics: real-time analytics on streams with SQL
  3. Kinesis Firehose: load streams into S3, Redshift, Elasticsearch, Splunk
23
Q

Kinesis diagram overview

flow of data to storage

A
  1. Data flows into Kinesis Streams
  2. Streams feed into Kinesis Analytics for processing
  3. After processing, the end product is loaded via Kinesis Firehose into the destination data store
24
Q

KINESIS STREAMS

Shards
Default Shard time, MAX time
IMMUTABLE data

A

Producers using Streams load data into a scalable shard system; more shards can be added to handle more data.

*Data retention is 1 day by DEFAULT, 7 days MAX — you should process this data quickly

*Ability to REPROCESS and REPLAY data

*Multiple applications can consume the same stream
SCALABLE CONSUMERS

*IMMUTABLE: once inserted, the data cannot be deleted

25
Q
KINESIS STREAMS SHARD
WRITE SPEED and throughput
READ speed
Shard scaling
ordering
A

One shard supports 1 MB/s or 1,000 messages/s on WRITE
One shard supports 2 MB/s on READ

Billed per shard provisioned; can have as many shards as you want
Messages or calls can be BATCHED
The number of shards can EVOLVE over time (reshard / merge)

RECORDS are ORDERED PER SHARD
With many shards, data is still ordered within each shard

26
Q

AWS Kinesis PutRecord: Streams API

Putrecord api
Message Key
Partition Key
Sequence number
partition key cardinality
A

PutRecord API sends data to Kinesis.

PutRecord API + partition key: the key is hashed to determine the shard ID.

Data is grouped by the partition key — a string you choose — and records with the same key land on the same shard. Each record, as it is sent, gets a sequence number; later records get higher numbers.

Partition key: must be well distributed to AVOID a hot partition. More unique partition keys give better distribution and avoid a HOT shard.

*BATCHING with PutRecords can reduce costs and increase throughput

ProvisionedThroughputExceeded if going over the limits — use retries with exponential backoff

Can use the CLI, AWS SDK, or producer libraries
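The key-to-shard hashing can be sketched with a simplified model: Kinesis hashes the partition key with MD5 to a 128-bit number and routes it to the shard owning that hash-key range. Assuming evenly split ranges (a simplification of the real explicit ranges):

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Simplified model of Kinesis routing: MD5-hash the partition key
    to a 128-bit number and map it onto evenly split shard ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * num_shards // 2 ** 128

# The same key always lands on the same shard, preserving per-key ordering;
# many distinct keys spread the load and avoid a hot shard.
print(shard_for_key("truck-17", 5) == shard_for_key("truck-17", 5))
```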

27
Q

AWS KINESIS API EXCEPTIONS
issues
solutions

A

ProvisionedThroughputExceeded
Happens when:

Sending more data than allowed — exceeding the MB/s or transactions-per-second limit
Make sure there is no HOT SHARD; avoid too much data on one partition key

Solutions:
Retries with exponential backoff
Increase shards via scaling
Ensure the partition key is well distributed (high cardinality)
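The retry-with-backoff solution can be sketched generically. The exception class stands in for the real Kinesis throttling error, and `sleep` is injectable so the sketch runs instantly in tests:

```python
class ProvisionedThroughputExceeded(Exception):
    """Stand-in for the Kinesis throttling error."""

def put_with_backoff(put_record, max_attempts=5, base_delay=0.1, sleep=None):
    """Retry a throttled put, doubling the delay on each failure."""
    sleep = sleep or (lambda seconds: None)  # injectable for testing
    for attempt in range(max_attempts):
        try:
            return put_record()
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
```

In real code `put_record` would wrap the SDK call (e.g. a boto3 `put_record` invocation), and `sleep` would be `time.sleep`; adding random jitter to the delay is a common refinement.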

28
Q

AWS KINESIS API CONSUMERS

A
  1. Normal consumer:
    CLI, SDK

2. KINESIS CLIENT LIBRARY (KCL)
Java, Node, Python, Ruby, .NET
The KCL lets you consume from Kinesis efficiently

What is the Kinesis Client Library? KCL helps you consume and process data from a Kinesis data stream by taking care of many of the complex tasks associated with distributed computing.

29
Q

KINESIS KCL in depth

how it helps shard consumption
the infrastructure and limits
how its coordinated.

A

Kinesis Client Library:
Java library that helps read records
via distributed applications that share the read workload

  • EACH shard is read by only one KCL instance. A KCL instance can read from multiple shards, but a shard cannot be read by two KCL instances
  • Records are read in order at the shard level, but there is no ordering between shards
  • Can run on EC2 / Elastic Beanstalk / on premises
  • Processing progress is checkpointed in DynamoDB — the KCL needs IAM access to write to it

EXAMPLE: 6 shards and 4 KCL instances

shard 0 - KCL 1
shard 1 - KCL 2
shards 2/3 - KCL 3
shards 4/5 - KCL 4

Example: 2 shards and 2 KCL instances
shard 0 - KCL 1
shard 1 - KCL 2
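The invariant in the examples — every shard owned by exactly one worker, workers balanced — can be sketched with a toy round-robin assignment (our own illustration; the real KCL leases shards dynamically via its DynamoDB table):

```python
def assign_shards(num_shards, num_workers):
    """Toy balanced assignment: each shard goes to exactly one KCL worker;
    a worker may own several shards, but a shard never has two workers."""
    return {shard: shard % num_workers for shard in range(num_shards)}

# 6 shards across 4 workers: two workers own 2 shards, two own 1.
print(assign_shards(6, 4))
```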

30
Q

KINESIS SECURITY

A

Control access / authorization: IAM policies
Encryption in flight: HTTPS; at rest: KMS
Client-side encryption is possible, but harder — the client must implement it

PRIVATELINK: VPC endpoints are available to access Kinesis from within a VPC

31
Q

KINESIS DATA ANALYTICS:
managed by
cost
delay

A

Real-time analytics on Kinesis streams using SQL

Automatic scaling
Managed: no servers to provision
Continuous, real time
*No delay to consume / compute metrics

UNLIKE KINESIS STREAMS, NO NEED TO PROVISION THROUGHPUT

*A NEW stream can be generated out of the real-time queries

Pay for consumption rate

32
Q

KINESIS FIRE HOSE
managed by
cost
delay

A

Fully managed, no administration needed
Near real time: ~60 seconds latency
Loads into:
S3, Redshift, Elasticsearch, Splunk

Automatic scaling

Supports many data formats

PAY FOR THE AMOUNT GOING THROUGH FIREHOSE

33
Q
SNS
VS 
SQS
vs
KINESIS
A

SNS:
Pushes data to subscribers
Data is not persistent: lost if not delivered
Fan-out capable

SQS:
Consumers pull data
As many consumers as required
Data is deleted after consumption

KINESIS:
Consumers pull data
Streaming data, real time
REAL-TIME BIG DATA, ANALYTICS, ETL
as many consumer applications as you want, but ONE KCL instance PER SHARD
possible to replay data
must provision throughput (shards)
34
Q

KINESIS DATA ORDERING

partition key

A

Example: trucks with ID numbers

The partition key is used for ordering — in this case it's the TRUCK ID.

*THE SAME KEY ALWAYS GOES TO THE SAME SHARD

The partition key is hashed, and the hash determines which shard the record goes to. Because the hash of a given key never changes, records with the same partition key always land on the same shard over time.

As each truck sends new data tagged with its partition key, that data keeps going to the same shard, preserving per-truck ordering.

35
Q

SQS ordering FIFO

Group ID

A

Normally the order of consumption is the order of arrival: First In, First Out.

The Group ID acts like a partition key: related messages share a group ID. Different groups can be consumed by separate consumers, while ordering is preserved within each group.

36
Q

KINESIS VS SQS ordering

A

EXAMPLE
100 trucks, 5 shards vs. 1 FIFO queue

Kinesis
20 trucks per shard on average
data ordered within each shard
max number of parallel consumers: 5 (one per shard)
read throughput: 2 MB/s per shard, 10 MB/s total

SQS FIFO
one FIFO queue
create 100 group IDs
allows up to 100 consumers, one per group ID
up to 300 messages per second (3,000 if batched)