Messaging Flashcards
List some features of Amazon SQS
Unlimited Throughput
Unlimited number of messages in the queue
default message retention of 4 days, up to a maximum of 14 days
low latency (< 10 ms on publish and receive)
limit of 256 KB per message sent
can have duplicate messages
can have out-of-order messages
at least once delivery
How does message consumption work?
Consumers are compute entities (EC2 instances, on-premises servers, Lambda functions) that poll messages, up to 10 at a time. After processing a message, the consumer deletes it using the DeleteMessage API, as in the sketch below.
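A minimal consumer sketch using Python's boto3; the queue URL and the process() helper are placeholders:

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Poll up to 10 messages in a single call
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)

for msg in resp.get("Messages", []):
    process(msg["Body"])  # hypothetical processing function
    # Delete the message only after it has been processed successfully
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```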
SQS with ASG (Auto Scaling Group) -
Which metric triggers the scaling?
Queue Length - the ApproximateNumberOfMessagesVisible CloudWatch metric
A CloudWatch alarm on this metric is set up; once the threshold is breached, it triggers scaling of the consumers (a sketch follows)
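A hedged sketch of such an alarm with boto3; the queue name, threshold, and scaling-policy ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on queue depth; fires the ASG scale-out policy when breached
cloudwatch.put_metric_alarm(
    AlarmName="sqs-queue-depth-high",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "my-queue"}],  # placeholder
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1000,  # illustrative threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:..."],  # placeholder scale-out policy ARN
)
```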
SQS and Security
Encryption:
In-flight encryption using HTTPS API
At-rest encryption using KMS Key
Access Controls:
IAM policies to regulate access to the SQS API
SQS Access Policies (similar to S3 bucket policies):
useful for cross-account access
useful for other services (SNS, S3, …) to write to SQS queues
How do Access Policies work with SQS?
Cross-account access:
a consumer in another account polls messages
specify a policy on the queue in the source account that grants access to the consumer in the target account
the policy has to specify the account ID of the target account, the actions to allow, and the ARN of the SQS queue
Let S3 write messages to an SQS queue:
specify in the policy the SQS queue ARN, the action to allow, and, in the condition section, the ARN of the S3 bucket as well as the S3 account ID (a sketch of such a policy follows)
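A sketch of the S3-to-SQS case using boto3; all ARNs and account IDs are placeholders:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/111122223333/my-queue"  # placeholder

# Allow S3 (from one specific bucket and account) to send messages to the queue
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": "arn:aws:sqs:eu-west-1:111122223333:my-queue",
        "Condition": {
            "ArnLike": {"aws:SourceArn": "arn:aws:s3:::my-bucket"},
            "StringEquals": {"aws:SourceAccount": "111122223333"},
        },
    }],
}
sqs.set_queue_attributes(QueueUrl=QUEUE_URL, Attributes={"Policy": json.dumps(policy)})
```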
What is the Message Visibility Timeout in SQS?
After a message is polled by a consumer it is invisible to other consumers for 30 seconds (default)
If message is not deleted during the timeout it returns to the queue
Hence, if a message is not processed and deleted within the timeout, it will be processed twice
The consumer can call the ChangeMessageVisibility API to extend the timeout for a message it is still processing (sketched below)
too high a timeout (hours) => reprocessing after a consumer failure is slow
too low a timeout (seconds) => messages are processed multiple times
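A sketch of extending the timeout with boto3 (placeholder queue URL):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder

resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    # Processing will take longer than the default 30 s, so extend the timeout
    sqs.change_message_visibility(
        QueueUrl=QUEUE_URL,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=120,  # seconds, illustrative value
    )
```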
What are Dead Letter Queues in SQS?
A threshold can be set on how many times a message can be returned to the queue after the visibility timeout expires
After this MaximumReceives threshold is exceeded, the message is moved into the dead letter queue (configured via a redrive policy, sketched below)
useful for debugging
make sure to process the messages before they expire
set retention to 14 days
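A sketch of attaching a redrive policy with boto3; the queue URL and DLQ ARN are placeholders:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder
DLQ_ARN = "arn:aws:sqs:eu-west-1:123456789012:my-dlq"  # DLQ must already exist

sqs.set_queue_attributes(
    QueueUrl=QUEUE_URL,
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": DLQ_ARN,
            "maxReceiveCount": "3",  # the MaximumReceives threshold
        })
    },
)
```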
What is a delay queue in SQS?
Delay a message by up to 15 minutes before consumers can see it
Default is 0 seconds
Can be set at queue level
Can override the default per message on send using the DelaySeconds parameter (sketched below)
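A per-message delay sketch with boto3 (placeholder queue URL):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Override the queue's default delay for this one message (max 900 s = 15 min)
sqs.send_message(QueueUrl=QUEUE_URL, MessageBody="hello", DelaySeconds=300)
```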
What is Long Polling in SQS?
When a consumer requests messages from a queue, it can optionally wait until a message arrives
Reduces the number of API calls made to the SQS queue while decreasing latency and increasing the efficiency of the application
Wait times between 1 sec and 20 sec
Long polling is preferable to short polling and is enabled per request using WaitTimeSeconds (sketched below)
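A long-polling sketch with boto3 (placeholder queue URL):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Long poll: wait up to 20 s for messages instead of returning immediately
resp = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,
)
```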
What is SQS Extended Client?
A Java library to send messages larger than the 256 KB limit, up to 1 GB
A small metadata message is sent to the queue that contains a pointer to the S3 object holding the large payload
Hence, the large message itself is written to and retrieved from S3 (the pattern is sketched below)
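The actual library is Java-only; this is a rough Python/boto3 sketch of the same pointer pattern, with a placeholder bucket and queue URL:

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
BUCKET = "my-large-payload-bucket"  # placeholder
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder

def send_large(payload: bytes) -> None:
    # Store the large payload in S3 ...
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    # ... and send only a small pointer message through SQS
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"s3Bucket": BUCKET, "s3Key": key}),
    )
```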
Useful SQS API calls
CreateQueue (MessageRetentionPeriod), DeleteQueue
PurgeQueue: delete all messages in queue
SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage
MaxNumberOfMessages: default 1, max 10 (for the ReceiveMessage API)
ReceiveMessageWaitTimeSeconds: Long Polling
ChangeMessageVisibility: change the message visibility timeout
Batch APIs for SendMessage, DeleteMessage, and ChangeMessageVisibility help to reduce costs (a batch send is sketched below)
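A batch-send sketch with boto3 (placeholder queue URL):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder

# One API call for up to 10 messages instead of 10 separate SendMessage calls
sqs.send_message_batch(
    QueueUrl=QUEUE_URL,
    Entries=[
        {"Id": str(i), "MessageBody": f"message-{i}"}
        for i in range(10)
    ],
)
```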
What about the FIFO queue in SQS?
Messages are ordered (First In, First Out), unlike in a standard queue
Limited throughput: 300 msg/s without batching, 3,000 msg/s with batching
Exactly-once send capability (duplicates are removed) - a feature of FIFO queues
Messages are processed in order by the consumer
What is Deduplication in a FIFO queue?
If the same message arrives twice within 5 minutes, the second one is rejected
Two methods:
Content-based deduplication using a SHA-256 hash of the message body
Explicitly provide a Message Deduplication ID
What is Message Grouping in a FIFO queue?
If the same value for MessageGroupId is provided, then all of those messages will be processed by the same consumer in the order they were sent
To get a level of ordering for a subset of messages, use different values for MessageGroupId
Each group ID can have a different consumer
Ordering across groups is not guaranteed
Often the total ordering of all messages is not needed, only per-group ordering (a FIFO send is sketched below)
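A FIFO send sketch with boto3; queue URL, group ID, and deduplication ID are placeholders:

```python
import boto3

sqs = boto3.client("sqs")
FIFO_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue.fifo"  # placeholder

# Messages sharing a MessageGroupId are delivered in order to one consumer;
# the MessageDeduplicationId suppresses duplicates within the 5-minute window
sqs.send_message(
    QueueUrl=FIFO_URL,
    MessageBody="order-created",
    MessageGroupId="customer-42",
    MessageDeduplicationId="order-1001",
)
```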
What is AWS Simple Notification Service (SNS)?
Want to send one message to many receivers
uses Pub/Sub
one topic posts to many subscribers
event producer sends messages to only one SNS topic
as many receivers as we want will listen to the SNS topic notifications
each subscriber to the topic will get all the messages (note: a newer feature allows filtering messages)
up to 10 million subscriptions per topic
100,000 topics limit
Subscribers can be:
Lambda, SQS, Emails, HTTP/S, SMS messages, Mobile Notifications
How is SNS integrated into AWS?
Many AWS services can send data directly to SNS
CloudWatch Alarms
AutoScaling Actions
Amazon S3
How to publish a message with SNS?
Topic Publish:
Create a topic
Create at least one subscription
Publish to the topic (sketched after this card)
Direct Publish:
Create a platform app
Create a platform endpoint
Publish to the platform endpoint
Works with Google GCM, Apple APNS, Amazon ADM
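A topic-publish sketch with boto3; the topic name and email address are placeholders:

```python
import boto3

sns = boto3.client("sns")

# Create a topic, add one subscriber, publish
topic_arn = sns.create_topic(Name="my-topic")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="me@example.com")
sns.publish(TopicArn=topic_arn, Subject="hello", Message="first notification")
```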
Security with SNS
Encryption:
In-flight encryption with HTTPS API
At-rest encryption using KMS keys
Client-side encryption if the client wants to perform encryption/decryption itself
Access Controls:
IAM policies to regulate access to the SNS API
SNS Access Policies:
Useful for cross-account access
Useful for other services to write to an SNS topic
What is Fan Out in SNS and SQS?
Push once to SNS, receive in all SQS queues that are subscribers
Fully decoupled, no data loss
SQS allows for: data persistence, delayed processing, and retries of work
Ability to add more SQS subs over time
Make sure your SQS queue access policy allows SNS to write (a subscription call is sketched below)
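A fan-out subscription sketch with boto3 (placeholder ARNs):

```python
import boto3

sns = boto3.client("sns")

# Subscribe an existing SQS queue to the topic; the queue's access policy
# must allow sns.amazonaws.com to SendMessage
sns.subscribe(
    TopicArn="arn:aws:sns:eu-west-1:123456789012:my-topic",
    Protocol="sqs",
    Endpoint="arn:aws:sqs:eu-west-1:123456789012:my-queue",
    Attributes={"RawMessageDelivery": "true"},  # deliver the message body as-is
)
```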
What is AWS Kinesis?
Makes it easy to collect, process, and analyze streaming data in real time
4 Kinesis Services:
Kinesis Data Streams: capture, process, and store data streams
Kinesis Data Firehose: load data streams into AWS data stores
Kinesis Data analytics: analyze data streams with Apache Flink or SQL
Kinesis Video Streams: not in the exam
How does Kinesis Data Streams work?
Producers (apps, clients, SDKs, the Kinesis Agent) write records (partition key, sequence number, data blob) to the shards of a Kinesis Data Stream at up to 1 MB/sec or 1,000 msg/sec per shard
The shards deliver records (partition key, sequence number, data blob) to consumers (apps, Lambda, Kinesis Data Firehose, Kinesis Data Analytics) at either 2 MB/sec per shard shared across all consumers (classic) or 2 MB/sec per shard per consumer (enhanced)
Some more info on Kinesis Data Stream
Billed per shard
Can have as many shards as desired
Retention of data between 1 day (default) and 365 days
Ability to reprocess/replay data
Once data is inserted into Kinesis, it can't be deleted (immutability)
data in the same partition goes to the same shard (ordering)
Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
Consumers:
my own: Kinesis Client Library, SDK
managed: Lambda, Kinesis Firehose, Kinesis Analytics
Kinesis Data Security
Control Access/Authorization using IAM Policies
Encryption in flight with HTTPS endpoints
Encryption at rest using KMS
Encryption on client side can be implemented (harder)
VPC endpoints available to access over VPC network
Monitor API calls using CloudTrail
What do Kinesis Producers do?
Put data records into data streams
A data record consists of: partition key, sequence number, data blob (up to 1 MB)
Producers: AWS SDK; Kinesis Producer Library (KPL), a C++/Java library with batching, compression, and retries; Kinesis Agent (monitors log files)
Write throughput: 1 MB/sec or 1,000 records/sec per shard
PutRecord API
Use batching with PutRecords API to reduce costs and increase throughput
ProvisionedThroughputExceeded: too many records sent in too short a time to one shard; choose the partition key wisely, retry with exponential backoff, or increase the number of shards (producer calls are sketched below)
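Producer sketches with boto3; stream name, data, and partition keys are placeholders:

```python
import boto3

kinesis = boto3.client("kinesis")

# Single record: records with the same partition key land on the same shard
kinesis.put_record(
    StreamName="my-stream",  # placeholder
    Data=b'{"temperature": 21.5}',
    PartitionKey="sensor-1",
)

# Batching with PutRecords reduces cost and increases throughput
kinesis.put_records(
    StreamName="my-stream",
    Records=[
        {"Data": b'{"temperature": 22.0}', "PartitionKey": "sensor-1"},
        {"Data": b'{"temperature": 19.3}', "PartitionKey": "sensor-2"},
    ],
)
```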
What do Kinesis Consumers do?
Get data records from data streams and process them
AWS Lambda, Kinesis Data Analytics, Kinesis Data Firehose
Custom consumer (AWS SDK) - classic or enhanced fan-out
Kinesis Client Library - library to simplify reading from data stream
Kinesis Consumer Types
Shared Fan-out consumer - pull
low number of consuming applications
Read throughput: 2 mb/sec per shard across all consumers
Max: 5 GetRecords API calls per sec per shard
Latency around 200 ms
cheap
Consumers poll data from shards using the GetRecords API call (see the sketch after this card)
Returns up to 10 MB (then throttles for 5 sec) or up to 10,000 records
Enhanced fan-out consumer - push
2 mb/sec per consumer/shard
latency around 70 ms
more expensive
Kinesis pushes data to consumers over HTTP/2 - SubscribeToShard API
Soft limit of 5 consumer applications (KCL) per data stream (default)
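A shared fan-out consumer sketch with boto3, assuming a single-shard stream with a placeholder name:

```python
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM = "my-stream"  # placeholder

# Read the first shard from the beginning (classic, shared fan-out)
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

while iterator:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=1000)
    for record in resp["Records"]:
        print(record["PartitionKey"], record["Data"])
    iterator = resp.get("NextShardIterator")
    time.sleep(0.2)  # stay under the 5 GetRecords calls/sec/shard limit
```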
Lambda as a Kinesis Consumer
Supports classic and enhanced fan-out consumers
read records in batches
can configure batch size and batch window
uses the GetRecords API call under the hood
on error, Lambda retries until the data expires
can process up to 10 batches per shard simultaneously (an event source mapping is sketched below)
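A sketch of wiring Lambda to a stream via an event source mapping in boto3; the stream ARN and function name are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# Lambda then polls the shards and invokes the function with record batches
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:eu-west-1:123456789012:stream/my-stream",
    FunctionName="my-consumer-fn",
    StartingPosition="LATEST",
    BatchSize=100,                     # batch size
    MaximumBatchingWindowInSeconds=5,  # batch window
)
```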
What is the Kinesis Client Library?
A java library that helps distributed apps with the read workload from a Kinesis Data Stream
At most one KCL instance per shard; fewer is okay
Progress is checkpointed to DynamoDB (requires IAM permissions)
Workers track each other and share the work among shards using DynamoDB
KCL can run on EC2, Elastic Beanstalk, and on-premises
Records are read in order at the shard level
Versions: KCL 1.x supports only shared consumers; KCL 2.x also supports enhanced fan-out
Kinesis Shard Splitting
Used to increase stream capacity (e.g. make 2 shards out of 1; each shard adds 1 MB/sec of write capacity)
used to divide a hot shard
old shard is closed and will be deleted once data expires
no automatic scaling
can't split into more than 2 shards in one operation
Kinesis Shard Merging
Decrease stream capacity and reduce costs
can be used to group two shards with low traffic (cold shards)
old shards are closed and will be deleted once data expires
can't merge more than two shards in one operation (both operations are sketched below)
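Sketches of both resharding operations with boto3; stream name, shard IDs, and the hash key are placeholders:

```python
import boto3

kinesis = boto3.client("kinesis")

# Split a hot shard in two at a chosen hash key
kinesis.split_shard(
    StreamName="my-stream",
    ShardToSplit="shardId-000000000000",
    NewStartingHashKey="170141183460469231731687303715884105728",  # 2**127, midpoint of the hash range
)

# Merge two adjacent cold shards back into one
kinesis.merge_shards(
    StreamName="my-stream",
    ShardToMerge="shardId-000000000001",
    AdjacentShardToMerge="shardId-000000000002",
)
```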
Kinesis Data Firehose
Can take data from a Kinesis Data Stream, but also from all producers, CloudWatch, ...
writes data to Destinations:
S3, Redshift (copy through S3), Amazon ElasticSearch
or 3rd-party destinations (e.g. MongoDB), or a custom API endpoint
fully managed service, serverless, autoscaling, no administration
pay for data going through firehose
near real time, not real time
support custom data transformation with lambda
can send failed or all data to a backup S3 bucket (a PutRecord call is sketched below)
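A direct-put sketch with boto3 (placeholder delivery stream name):

```python
import boto3

firehose = boto3.client("firehose")

# Send a record to a delivery stream; Firehose buffers it and writes it
# to the configured destination in near real time
firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": b'{"event": "click"}\n'},
)
```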
Kinesis Data Analytics
Gets data from a Kinesis Data Stream or Data Firehose
enables applying the SQL language to data from these sources
Data Analytics can then send this data to a Stream or Firehose, and thus to all endpoints that these two offer
real-time analytics for stream data using SQL
fully managed, no servers to provision
automatic scaling
real-time analytics
pay for actual consumption rate
can create streams out of the real-time queries
use cases: time series analytics, real-time dashboards, real-time metrics
Ordering Data in Kinesis and SQS FIFO