SQS, SNS and Kinesis Flashcards

1
Q

Why would you use a queue for messages?

A

The messages are asynchronous / event-based, so a queue is needed to pass them between applications.

2
Q

Give a high level description of the model of SQS, SNS and Kinesis.

A

SQS - queue model
SNS - pub/sub model
Kinesis - real-time streaming model

3
Q

What are the generic names for objects that send messages into a queue, and objects that poll the queue for messages?

A

Producer and Consumer

4
Q

Why is SQS useful for application decoupling?

A
  • Unlimited number of messages in queue, unlimited throughput
  • Low latency (<10 ms on publish and receive)
5
Q

What are some caveats of SQS?

A
  • Default retention of messages is 4 days, max is 14
  • Limitation of 256 KB per message
  • Can have duplicate messages (‘at-least-once delivery’), so consumers need idempotency
  • Can have out-of-order messages (‘best effort ordering’)
6
Q

What is the API name for producing messages to the queue?

A

SendMessage API

7
Q

How long does a message last in the queue (and which API can remove them)?

A

A message is persisted until a consumer deletes it using the DeleteMessage API, subject to a default retention period of 4 days and a maximum of 14 days

8
Q

How many messages are returned on a single Poll?

A

Up to 10 per Poll

9
Q

How can you improve throughput of message processing?

A

Scale consumers horizontally using an ASG on the CloudWatch Queue Length metric (ApproximateNumberOfMessages) - set up an alarm that scales the group when the threshold is breached.
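A minimal sketch of the alarm side as boto3-style parameters; the queue name, threshold, and alarm name are placeholder assumptions. (On the CloudWatch side, queue length is published as ApproximateNumberOfMessagesVisible.)

```python
# Hypothetical CloudWatch alarm parameters for scaling consumers on queue
# depth; the queue name, threshold and alarm name are made-up placeholders.
alarm_params = {
    "AlarmName": "my-queue-depth-high",
    "Namespace": "AWS/SQS",
    # CloudWatch publishes queue length as ApproximateNumberOfMessagesVisible
    "MetricName": "ApproximateNumberOfMessagesVisible",
    "Dimensions": [{"Name": "QueueName", "Value": "my-queue"}],
    "Statistic": "Average",
    "Period": 60,
    "EvaluationPeriods": 1,
    "Threshold": 1000,
    "ComparisonOperator": "GreaterThanThreshold",
    # AlarmActions would reference the ASG scaling policy ARN
}
# with boto3, this would be passed as cloudwatch.put_metric_alarm(**alarm_params)
```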

10
Q

How does SQS allow us to optimize our application in terms of EC2 instance type?

A

SQS allows us to decouple the front and back end instance types, and so we can configure the instance types separately to be optimal for the type of workload (e.g., if the backend application is video processing, we can use an instance type with a GPU).

11
Q

What encryption is available for SQS?

A
  • In-flight encryption using HTTPS API
  • At-rest encryption using KMS keys
  • Client-side encryption if the client wants to perform en/decryption itself
12
Q

How can you allow cross-account access or other services access to SQS?

A

SQS Access Policies (similar to S3 bucket policies)

13
Q

What is required in an SQS Access Policy to allow an EC2 instance (for example) to poll for messages?

A

The Principal field of the policy is set to “AWS”: [“{account_id}”], where account_id is the AWS account ID of the account that owns the instance (the principal is the account, not the instance itself).

14
Q

What is required in an SQS access policy to allow an S3 bucket to publish S3 event notifications to an SQS queue?

A

The Condition should include:
“ArnLike”:{“aws:SourceArn”: “{bucket_arn}”},
“StringEquals”:{“aws:SourceAccount”: “{bucket_owner_account_id}”}
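The condition above slots into a full queue policy, sketched here as a Python dict; all ARNs, names and the account ID are made-up placeholders.

```python
# Hypothetical SQS access policy letting S3 publish event notifications;
# the queue ARN, bucket ARN and account ID are placeholder values.
s3_notify_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": "arn:aws:sqs:us-east-1:123456789012:my-queue",
        "Condition": {
            "ArnLike": {"aws:SourceArn": "arn:aws:s3:*:*:my-bucket"},
            "StringEquals": {"aws:SourceAccount": "123456789012"},
        },
    }],
}
```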

15
Q

What is the Message Visibility Timeout?

A

The period of time after a message has been polled by one consumer that the message remains invisible to other consumers (default 30s). After this timeout, the message becomes visible again, unless the message has been deleted during processing.

16
Q

What should be done if a Consumer needs more than the allotted time to process a message before it is made visible again? What considerations should be made for this?

A

Call the ChangeMessageVisibility API to increase the invisibility time.
- If the timeout window is too high and a Consumer crashes during processing, re-processing will take time.
- If the window is too low, we may get duplicate processing.

17
Q

How can you determine when a message should enter the Dead Letter Queue?

A

Set a ‘Maximum receives’ threshold (maxReceiveCount) in the queue’s redrive policy - after a message has been received that many times without being successfully processed and deleted, it is moved to the DLQ.

18
Q

How must an SQS Queue relate to a DLQ?

A

DLQ of a FIFO queue must also be FIFO; DLQ of a Standard queue must also be a Standard queue

19
Q

Why should you set a long retention time in the DLQ?

A

To make sure the messages have time to be investigated and re-processed before they expire.

20
Q

How do you re-process messages in the DLQ if you have fixed the original processing issue?

A

Use Redrive to Source to push the message from the DLQ to the SQS queue

21
Q

How do you prevent consumers from seeing a published message in a queue straight away?

A

Use a ‘Delay’ Queue - set the delay to a given period of time (default is 0 seconds, max is 15 minutes)
- Can override the default on send using the DelaySeconds param
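The per-message override looks like this as a boto3-style parameter sketch; the queue URL and message body are placeholders.

```python
# Placeholder queue URL - not a real resource
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

send_params = {
    "QueueUrl": queue_url,
    "MessageBody": "order-1234",
    "DelaySeconds": 300,  # hide this message from consumers for 5 minutes (0-900s)
}
# with boto3: sqs.send_message(**send_params)
```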

22
Q

What is Long Polling?

A

When a consumer requests messages from a queue, it can ‘wait’ for messages to arrive if there are none in the queue (from 1 to 20 seconds).

23
Q

Why is Long Polling preferable to Short Polling?

A
  • Decreases the number of API calls made to SQS while increasing the efficiency and decreasing the latency of your application
24
Q

How do you enable Long Polling?

A

Either enable at the queue level or at the API level using ReceiveMessageWaitTimeSeconds
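At the API level the wait is set per ReceiveMessage call; a boto3-style parameter sketch (the queue URL is a placeholder):

```python
# Placeholder queue URL - not a real resource
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

receive_params = {
    "QueueUrl": queue_url,
    "MaxNumberOfMessages": 10,  # up to 10 messages per poll (default 1)
    "WaitTimeSeconds": 20,      # long polling: wait up to 20s for messages
}
# with boto3: sqs.receive_message(**receive_params)
# at the queue level, set the ReceiveMessageWaitTimeSeconds queue attribute instead
```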

25
Q

What is the max. standard message size, and how can you send larger than this? What is a use case of this?

A

The max standard message size is 256 KB; to send larger than this, use the SQS Extended Client (a Java library)
- If a Producer wants to send large (say, 1 GB) messages, it sends the large message to an S3 bucket, and a small metadata message to the SQS queue containing a pointer to the object in S3
- The consumer polls for the small metadata message, and retrieves the large message from S3
- Can be used for video processing: the video is uploaded to S3 and a message is sent to the SQS queue

26
Q

List the key API calls for SQS

A
  • CreateQueue (attribute: MessageRetentionPeriod), DeleteQueue (deletes the queue along with all messages in it)
  • PurgeQueue: delete all messages in the queue
  • SendMessage (attribute: DelaySeconds), ReceiveMessage, DeleteMessage
  • MaxNumberOfMessages: default 1, max 10 (for ReceiveMessage API)
  • ReceiveMessageWaitTimeSeconds: Long Polling
  • ChangeMessageVisibility: change the message timeout
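The receive/process/delete cycle from the calls above can be sketched as a small loop; the client is injected, so the sketch works with a real boto3 SQS client or a test stub (the function name and queue URL are assumptions, not AWS APIs).

```python
def drain_once(sqs, queue_url):
    """One ReceiveMessage / process / DeleteMessage cycle.

    `sqs` can be a real boto3 SQS client or any stub exposing
    receive_message/delete_message with the same call shape.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # up to 10 per poll
        WaitTimeSeconds=20,      # long polling
    )
    for msg in resp.get("Messages", []):
        # ... application processing of msg["Body"] happens here ...
        sqs.delete_message(      # delete, or the message becomes visible again
            QueueUrl=queue_url,
            ReceiptHandle=msg["ReceiptHandle"],
        )
```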
27
Q

What batch API calls can you do for SQS?

A

SendMessage, DeleteMessage, ChangeMessageVisibility

28
Q

What are the advantages of a FIFO queue?

A
  • Exactly-once send capability (by removing duplicates)
  • Messages are processed in order by the consumer
29
Q

What are the disadvantages of a FIFO queue?

A

Limited throughput: 300 msg/s without batching, 3000 with

30
Q

What is the naming constraint on FIFO queues?

A

The name must end in ‘.fifo’

31
Q

What is meant by the interval for de-duplication?

A

The time period after a message is received in which duplicate messages will be removed - 5 minutes

32
Q

What are the two de-duplication methods?

A
  • Content-based: will do a SHA-256 hash of the message body
  • Explicitly provide a Message Deduplication ID
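Content-based deduplication can be mimicked in a few lines - a sketch of the idea, not the service internals:

```python
import hashlib

def content_dedup_id(body: str) -> str:
    # SHA-256 of the message body, as FIFO queues compute it when
    # ContentBasedDeduplication is enabled on the queue
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Two sends with identical bodies within the 5-minute interval produce the same ID, so the second is dropped.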
33
Q

How can you order at different levels within a FIFO queue?

A

Specify the MessageGroupId to order messages within their groups.
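A boto3-style parameter sketch for a FIFO send; the queue URL, body and IDs are placeholders.

```python
# All values below are made-up placeholders
fifo_send_params = {
    "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo",
    "MessageBody": "order-1234 shipped",
    "MessageGroupId": "order-1234",        # strict ordering only within this group
    "MessageDeduplicationId": "evt-0001",  # required unless content-based dedup is on
}
# with boto3: sqs.send_message(**fifo_send_params)
```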

34
Q

Give a simple description of a typical pub/sub pattern.

A
  • ‘Event Producer’ (publisher) sends a message to one SNS topic
  • ‘Event Receiver’ (subscriber) subscribes to that topic (meaning multiple subscribers per topic)
  • Each subscriber will get all the messages in that topic (unless a filter policy is applied)
35
Q

What sort of subscribers can there be for an SNS topic?

A
  • Email, SMS, HTTP(S) endpoints
  • SQS, Lambda, Kinesis Data Firehose
36
Q

What sort of encryption does SNS allow?

A
  • In-flight encryption using HTTPS API
  • At-rest encryption using KMS keys
  • Client-side encryption if the client wants to perform en/decryption itself
37
Q

How can you allow cross-account access or other services access to SNS?

A

SNS Access Policies (similar to S3 bucket policies)

38
Q

Describe the Fan Out pattern.

A

SNS & SQS
- Push once in SNS, receive in all SQS queues that are subscribers
- Fully decoupled, no data loss
- SQS allows for data persistence, delayed processing and retries of work
- Ability to add more SQS subscribers over time
- Cross-Region Delivery: works with SQS queues in other regions

39
Q

Why might an SNS not be able to push a message into an SQS queue?

A

The SQS queue does not have an access policy allowing SNS to write to it

40
Q

What is an application of the fan out pattern wrt. S3?

A

You can only have one S3 event rule for the same combination of event type (e.g., object create) and prefix (e.g., images)
- Send the S3 event to an SNS topic
- Subscribers to that topic will receive the event

41
Q

What is an application of the fan out pattern wrt. Kinesis?

A

SNS can send to Kinesis which can then send data to S3 or any supported KDF destination

42
Q

What are some limitations of a FIFO SNS topic?

A
  • Any subscribing queues must be FIFO as well
  • Limited throughput (same as SQS FIFO)
43
Q

What is message filtering? Give a use case.

A

A JSON policy used to filter messages sent to SNS topic’s subscriptions
- A buying service publishes orders to an SNS topic
- Multiple queues are subscribed, one for placed orders and one for cancelled orders
- Use a message filter to only allow ‘placed’ orders to a certain queue, and ‘cancelled’ orders to another
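The effect of a filter policy can be sketched with a tiny matcher. This is a simplification (real policies also support prefixes, numeric ranges, anything-but, etc.), and the attribute names are placeholders.

```python
def matches(filter_policy, message_attributes):
    # simplified exact-match semantics: every policy key must appear in the
    # message attributes with one of the allowed values
    return all(
        message_attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )

# hypothetical filter for the placed-orders queue subscription
placed_orders_filter = {"order_status": ["placed"]}
```

A message published with order_status "placed" reaches the placed-orders queue; a "cancelled" one is filtered out of that subscription.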

44
Q

What subscriptions can you have to an SNS topic?

A

KDF, SQS, Lambda, Email, Email-JSON, HTTP(S), SMS


45
Q

What is the purpose of Kinesis? Give some examples of data.

A

Collect, process and analyze streaming data in real-time
- Application logs, Metrics, Website clickstreams, IoT telemetry data

46
Q

What four parts of Kinesis are there?

A

Data Streams: capture, process and store data streams
Data Firehose: load data streams into AWS data stores
Data Analytics: analyze data streams with SQL or Apache Flink
Video Streams: capture, process and store video streams

47
Q

Describe the basic structure of a Kinesis Data Stream.

A

A Data Stream is made up of N numbered shards, and can be scaled as desired
- The number of shards determines ingestion and consumption rate
- Must be provisioned ahead of time, data is split across shards

48
Q

Describe the structure of a Record entering the Data Stream.

A

A Record is composed of a Partition Key (determines which shard will pick it up) and a Data Blob (up to 1 MB)

49
Q

What is the rate limit of producers pushing records to a Data Stream?

A

Producers can send data at a rate of 1MB/sec or 1000 msg/sec per shard

50
Q

Describe the structure of a Record leaving the Data Stream.

A

Partition key, Sequence no. (unique per partition-key within a stream) and Data blob.

51
Q

What is the consumption rate limit of records leaving the Data Stream?

A

2 MB/sec per shard shared across all consumers (classic) OR 2 MB/sec per shard per consumer (enhanced fan-out)

52
Q

How long is the Data Stream retention period?

A

1 to 365 days

53
Q

Why is data inserted in Kinesis referred to as immutable?

A

Once inserted, it can’t be deleted

54
Q

List some Producers for Kinesis.

A

AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent

55
Q

List some Consumers for Kinesis.

A

Write your own with: Kinesis Client Library (KCL), AWS SDK
Managed: Lambda, Kinesis Data Firehose, Kinesis Data Analytics

56
Q

Describe the Provisioned Capacity Mode for K. Data Stream.

A
  • Choose the number of shards provisioned & scale manually or by using API
  • Each shard gets 1MB/s in (or 1000 msg/sec)
  • Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
  • Pay per shard provisioned per hour
57
Q

Describe the On-demand Capacity Mode for K. Data Stream.

A
  • No need to provision or manage capacity
  • Default capacity provisioned (4 MB/s or 4000 msg/s)
  • Scales automatically based on observed throughput peak during last 30 days
  • Pay per stream per hour & data in/out per GB
58
Q

Can you access Kinesis through a private subnet?

A

Yes, VPC endpoints are available for access within the VPC

59
Q

How can you monitor API calls for Kinesis?

A

Monitor using CloudTrail - API calls are available there

60
Q

How does the partition key determine the shard?

A

The partition key must be specified for the record entered into Kinesis - it is then put through a Hash function and then delivered to a particular shard.
- All data with the same resultant hash is delivered to the same shard.
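With evenly split hash-key ranges, the mapping reduces to a few lines. This is a sketch of the idea only; Kinesis actually assigns each shard an explicit 128-bit hash-key range.

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    # MD5 the partition key into a 128-bit integer, then map it onto
    # evenly sized shard ranges
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return h * num_shards // 2**128

# the same key always lands on the same shard, preserving per-key ordering
```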

61
Q

What consideration should be taken for partition keys?

A

Must use a highly distributed partition key to avoid a ‘hot partition’ (i.e., one shard receiving far more throughput than the others).

62
Q

What is the error when a shard is ‘overloaded’, and how can this be handled?

A

ProvisionedThroughputExceeded
- Use highly distributed partition key
- Retries with exponential backoff
- Increase shards (shard-splitting)
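Retries with exponential backoff can be sketched generically. The AWS SDKs have their own built-in retry logic; this hand-rolled version only shows the shape, and the function name is an assumption.

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=0.1):
    # retry `call` on throughput errors, doubling the wait each attempt
    # and adding jitter so producers don't retry in lockstep
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt) * random.random())
```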

63
Q

Describe the difference between the shared (classic) and enhanced fan-out consumer

A

Shared (classic):
- 2 MB/s per shard across all consumers
- Uses the GetRecords() API call
- If 3 applications need to call GetRecords on a single shard, they all share 2 MB/s of throughput
- Max 5 GetRecords API calls/sec
- Latency ~200 ms, low cost, returns up to 10 MB (or up to 10,000 records)

Enhanced:
- 2 MB/s per consumer per shard
- Uses the SubscribeToShard() API call
- Applications subscribe to the shard, and then data is pushed (over HTTP/2) at 2 MB/s regardless of the number of consumers
- Latency ~70 ms, higher cost
- Soft limit of 5 consumer applications (KCL) per data stream

64
Q

Describe how Kinesis Data Streams can work with AWS lambda

A
  • Supports classic and enhanced fan-out consumers
  • Read records in batches
  • Can configure batch size and batch window
  • Lambda automatically retries on error until success or data expiration
  • Can process up to 10 batches per shard simultaneously
65
Q

How many KCL instances per shard are allowed?

A

1: 4 shards = max. 4 KCL (Kinesis client library) instances for example.

66
Q

What is the KCL?

A

Kinesis Client Library - Java lib. that helps read records from a Data Stream with distributed applications sharing the read workload.

67
Q

What is a ‘hot shard’ and how would you deal with it?

A

When a disproportionate amount of data is sent to a single shard (compared with the others)
-> Split the shard to increase capacity
-> Old shard is closed and will be deleted once the data is expired

68
Q

What are some limitations of shard splitting?

A
  • No automatic scaling (must manually increase/decrease capacity)
  • Can’t split into more than two shards in a single operation
69
Q

Describe shard merging

A

Combine two shards into one
- Can be used to group two shards with low traffic (cold shards)
- Old shards are closed and will be deleted once the data is expired

70
Q

What AWS destinations can Kinesis Data Firehose write to?

A

S3, Redshift (copy through S3) and OpenSearch

71
Q

Give a high level description of Firehose

A

Producers push data to Firehose, which can then optionally transform the data using a Lambda, and batch write to a given destination.
- There is also the option to send either All or only Failed data to a backup S3 bucket.

72
Q

Why is Firehose known as a Near Real Time service?

A
  • 60 second latency minimum for non-full batches
  • Or minimum 1MB of data at a time
73
Q

Compare Kinesis Data Streams and Data Firehose

A

Data Streams:
- Streaming service for ingest at scale
- Write custom code (producer/consumer)
- Real-time (~200ms)
- Manage scaling (shard splitting/merging)
- Data storage for 1 to 365 days
- Supports replay capability

Data Firehose:
- Load streaming data into S3/Redshift/OpenSearch/3rd party/custom HTTP
- Fully managed
- Near real-time
- Auto scaling
- No data storage
- Doesn’t support replay capability

74
Q

Give a high level description of the flow of Kinesis Data Analytics for SQL applications

A
  • Data is read from sources such as Data Streams or Data Firehose
  • Can then apply SQL statements to the data for real-time analytics
  • Can also apply some reference data during this step from S3
  • Data can then be sent either to Data Streams (and on to Lambda/other applications) or to Firehose (and then to S3, Redshift, OpenSearch etc.)
75
Q

What are the use cases of Data Analytics for SQL applications?

A
  • Timeseries analytics
  • Real-time dashboards
  • Real-time metrics
76
Q

Describe Data Analytics for Apache Flink

A

Use Flink (Java, Scala or SQL) to process and analyze streaming data
- Receive data from Data Streams or MSK (Managed Streaming for Apache Kafka)
- Flink is more powerful than just SQL
- Managed service: provision compute resources, parallel computation, automatic scaling
- Application backups (implemented as checkpoints and snapshots)
- Cannot read from Firehose

77
Q

How does SQS FIFO Group ID compare to a Partition Key?

A

Group ID allows messages to be grouped when they are related to each other
- Grouped messages can be consumed by different consumers (similar to distributing Records to different shards based on the partition key)

78
Q

How does SQS FIFO compare to Data Streams with an example of 100 producers, 5 Kinesis shards, 1 SQS FIFO?

A

With an example of 100 producers, 5 Kinesis shards, 1 SQS FIFO
Data Streams:
- 20 producers per shard
- Data is ordered within each shard
- Max number of consumers in parallel is 5 (1 per shard)
- Can receive up to 5 MB/s of data
SQS FIFO
- 100 group IDs
- 100 consumers, 1 group ID per consumer
- 300 messages per second (or 3000 with batching)

79
Q

Contrast SQS, SNS and Kinesis

A

SQS
- Consumers ‘pull data’
- Data is deleted after being consumed
- Can have as many consumers as we want
- No need to provision throughput
- Ordering guarantees only on FIFO queues
- Individual message delay capability
SNS:
- Push data to many subscribers
- 12.5 million subscribers per topic
- Data is not persisted (lost if not delivered)
- Pub/sub
- No need to provision throughput
- Integrates with SQS for fan-out architecture pattern
- FIFO capability
Kinesis:
- Standard: pull data, 2MB/s per shard
- Enhanced fan out: push data, 2MB/s per shard per consumer
- Possibility to replay data
- Meant for real-time big data, analytics, ETL (extract, transform, load)
- Ordering at the shard level
- Data expires after X days
- Provisioned mode or on-demand capacity mode