[DEVELOPER] Advanced Kinesis Flashcards by jake reinhart

What is the retention period for Kinesis Data Streams?

1 to 365 days (the default is 24 hours)

How can you delete data in Kinesis Data Streams without processing it?

You can’t. Data in Kinesis Data Streams is immutable; records leave the stream only when the retention period expires.

How can you ensure record ordering in Kinesis Data Streams?

Use the same partition key. Records that share a partition key go to the same shard, and ordering is preserved within a shard.

What is the I/O performance of Kinesis Data Streams Provisioned Capacity Mode?

Each shard gets:
- 1 MB/s in (or 1000 records per second)
- 2 MB/s out
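
These per-shard limits translate into a simple sizing rule: provision enough shards to satisfy whichever limit is tightest. A minimal sketch, assuming standard (shared) consumers and known peak rates; the function name `shards_needed` and its parameters are hypothetical and not part of any AWS SDK:

```python
import math

# Per-shard limits in provisioned mode, as listed on this card.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_s: float, write_records_s: float, read_mb_s: float) -> int:
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )
```

For example, a workload writing 0.5 MB/s as 3,500 small records per second is bound by the record-count limit, not by bytes.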

What is the pricing model for Kinesis Data Streams Provisioned Capacity Mode?

You pay per shard provisioned per hour

What info does Kinesis Data Streams On-Demand Capacity Mode use to determine how it sets the capacity?

You get automatic scaling based on observed throughput peak during the last 30 days.

The default capacity is 4 MB/s in (or 4,000 records per second).

What is the pricing model for Kinesis Data Streams On-Demand Capacity Mode?

Pay per stream per hour & data in/out per GB

In what use case would you use Kinesis Data Streams Provisioned Mode over On-Demand Mode?

When you know your capacity ahead of time.

Suppose you want to send streaming data from inside a VPC but you don’t want it to go through the internet. How can you accomplish this?

Use a VPC endpoint for Kinesis. VPC endpoints let resources inside the VPC reach Kinesis without traversing the public internet.

How does encryption work for Kinesis Data Streams?

Encryption at rest using KMS
Encryption in flight using HTTPS

What is the API Call for a producer to send a record to Kinesis Data Streams?

PutRecord (or PutRecords to send several records in one call)
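
As an illustration, a producer call might look like the sketch below. It assumes `client` behaves like the boto3 Kinesis client (for example, one created with `boto3.client("kinesis")`); the helper name `send_event` and the JSON payload shape are hypothetical.

```python
import json


def send_event(client, stream_name: str, event: dict, partition_key: str) -> str:
    """Send one record with PutRecord and return the shard ID it landed on."""
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # the payload is sent as bytes
        PartitionKey=partition_key,  # determines which shard receives the record
    )
    return response["ShardId"]
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without touching a real stream.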

You are using Kinesis Data Streams with multiple producers and multiple shards and repeatedly get ProvisionedThroughputExceeded errors on an individual shard. What can you do to address the problem?

- Use highly distributed partition keys (you may have a hot partition receiving too many messages)
- Implement exponential backoff with retries
- Increase the number of shards (shard splitting)
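
The retry option can be sketched as follows. This is a minimal illustration, not an SDK feature: `ThroughputExceeded` is a hypothetical stand-in for the throttling error, and `send` is any zero-argument callable that performs the write.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for a ProvisionedThroughputExceeded error."""


def put_with_backoff(send, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call send() until it succeeds, doubling the wait ceiling after each throttle."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential ceiling capped at max_delay, with random jitter so
            # many producers do not all retry at the same instant.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

Backoff only smooths out short bursts; a shard that is persistently hot still needs better key distribution or more shards.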

You are using Kinesis Data Streams with 4 consumers all reading from the same shard in the Shared (Classic) Fan-out consumer pattern. What is the read throughput of each consumer?

0.5 MB/sec

Classic KDS fan-out provides 2 MB/s per shard, shared across all consumers

You are using Kinesis Data Streams with 4 consumers all reading from the same shard in the Enhanced Fan-out consumer pattern. What is the read throughput of each consumer?

2 MB/sec

Enhanced KDS fan-out provides 2 MB/s per shard per consumer
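
The difference between the two fan-out patterns comes down to whether the 2 MB/s is divided or duplicated. A small sketch of that arithmetic, assuming the consumers share the shard evenly; the function name is hypothetical:

```python
SHARD_READ_MB_S = 2.0  # read limit per shard, from the cards above


def per_consumer_read_mb_s(consumers: int, enhanced: bool) -> float:
    """Read throughput each consumer gets from a single shard."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced:
        return SHARD_READ_MB_S  # every registered consumer gets its own 2 MB/s
    return SHARD_READ_MB_S / consumers  # shared: all consumers split 2 MB/s
```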

How is data transferred from shard to consumer in the Kinesis Data Streams Standard (Classic) Fan-out consumer pattern?

Consumers poll data from Kinesis using GetRecords API call
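
A polling consumer can be sketched as a loop over shard iterators. The sketch below assumes `client` behaves like the boto3 Kinesis client; the function name `poll_shard`, the `handle` callback, and the `max_polls` and `idle_sleep` parameters are hypothetical.

```python
import time


def poll_shard(client, stream_name, shard_id, handle, max_polls=10, idle_sleep=0.2):
    """Poll one shard with GetRecords, passing each record to handle()."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest retained record
    )["ShardIterator"]
    for _ in range(max_polls):
        if iterator is None:
            break  # no further iterator: the shard is closed and fully read
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            handle(record)
        if not response["Records"]:
            time.sleep(idle_sleep)  # pause briefly instead of hammering an empty shard
        iterator = response.get("NextShardIterator")
```

The consumer drives every read itself, which is why all standard consumers on a shard compete for the same 2 MB/s.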

How is data transferred from shard to consumer in the Kinesis Data Streams Enhanced Fan-out consumer pattern?

Consumers use the SubscribeToShard API and Kinesis pushes data to consumers over HTTP/2

When would you prefer the Enhanced Fan-Out Consumer pattern over the Standard Fan-Out Consumer pattern for Kinesis Data Streams?

Enhanced is better for
- Lots of consuming applications for the same shard
- Lower latency (about 70 ms vs. about 200 ms for standard)

What is the default limit for the number of consumer applications for Kinesis Data Streams Enhanced Fan-Out Consumer pattern?

5 consumers per stream by default, but you can raise this with an AWS support ticket.

Does the Standard Fan-Out pattern for Kinesis Data Streams support Lambda consumers?

Yes

Does the Enhanced Fan-Out pattern for Kinesis Data Streams support Lambda consumers?

Yes

Does the Enhanced Fan-Out pattern for Kinesis Data Streams support batch reads for Lambda consumers?

Yes

What does KCL stand for?

Kinesis Client Library

You are using KCL to read from a Kinesis Data Stream with 4 shards into DynamoDB. What is the maximum number of KCL instances you can use?

4 (same as the number of shards)
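
The limit follows from each shard being processed by exactly one worker at a time. KCL coordinates this through leases; the sketch below does not reproduce that mechanism and only illustrates the constraint with a hypothetical round-robin assignment:

```python
def assign_shards(shard_ids, worker_ids):
    """Give each shard to exactly one worker, round-robin."""
    assignment = {worker: [] for worker in worker_ids}
    for i, shard in enumerate(shard_ids):
        assignment[worker_ids[i % len(worker_ids)]].append(shard)
    return assignment
```

With 4 shards and 6 workers, two workers end up with an empty list, so running more instances than shards adds cost without adding throughput.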

In the context of Kinesis Data Streams, what is shard splitting?

A method to increase KDS capacity (and cost) by splitting traffic to a shard into 2 new shards.

What happens to data in the old shard once a shard is split in Kinesis Data Streams?

Consumers can still read from the old shard. It is not deleted until all of its data has expired.

What is the maximum number of new shards you can create from a single shard in one shard-splitting operation?

2 (one split operation turns a shard into 2 new shards)

Suppose you have an application in Kinesis Data Streams that is getting less traffic than expected and you want to decrease capacity and cost. How might you do this?

Use **shard merging** to scale down KDS, merging two shards with low traffic
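
Splitting and merging can both be pictured as operations on the range of hash keys a shard owns. A minimal sketch, assuming a split at the midpoint of the range (a split point can be chosen elsewhere in the range); the function names are hypothetical:

```python
MAX_HASH_KEY = 2**128 - 1  # partition keys hash into a 128-bit key space


def split_range(start: int, end: int):
    """Split one shard's hash-key range into two child ranges at the midpoint."""
    if start >= end:
        raise ValueError("range too small to split")
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)


def merge_ranges(left, right):
    """Merge two adjacent hash-key ranges back into one."""
    if left[1] + 1 != right[0]:
        raise ValueError("only adjacent ranges can be merged")
    return (left[0], right[1])
```

Splitting doubles the capacity available to that range of keys, and merging two adjacent low-traffic ranges gives the capacity (and cost) back.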

What are the allowed AWS Destinations for Kinesis Data Firehose?

- S3
- Redshift (by copying through S3)
- Amazon OpenSearch

What is the maximum size of the data blob of a single record in Kinesis Data Streams?

1 MB

What is the pricing model for Kinesis Data Firehose?

Pay for **data** going through Firehose

Suppose you are handling a lot of data and need to write it to Splunk in near real-time. This data is sensitive so you would also like to have a backup of all data in an S3 bucket. What AWS service would you use for this?

Kinesis Data Firehose

Which Kinesis Service(s) allow(s) you to write your own custom producer and consumer code?

Kinesis Data Streams (Firehose is fully managed)

Which Kinesis Service(s) allow(s) replay capability?

Kinesis Data Streams (Firehose has no data storage and thus no replay)

Which Kinesis Service(s) allow(s) for automatic scaling?

Kinesis Data Firehose (Streams in provisioned mode requires manual shard splitting/merging)

Which Kinesis Service(s) process their data in real time?

Kinesis Data Streams (Firehose is _near real-time_)

In Kinesis Data Streams, is data from the same shard guaranteed to all have the same partition key?

**No.** The partition key is _hashed_ to select a shard, so multiple partition keys can hash to the same shard.
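
The mapping can be sketched with an MD5 hash, which turns a partition key into a 128-bit integer. The sketch assumes every shard owns an equal slice of the hash space, which is a simplification; the function name `shard_for_key` is hypothetical.

```python
import hashlib

HASH_SPACE = 2**128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * shard_count // HASH_SPACE
```

The same key always lands on the same shard, which is what preserves per-key ordering. As soon as there are more distinct keys than shards, at least two keys must share a shard.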

Suppose you have data from multiple sources in a producer/consumer model. Data from each source must be processed in order. There is a lot of data so you want to maximize throughput. What is the best service to use?

Kinesis Data Streams (SQS FIFO with groupIDs will work but may not have optimum performance)

Suppose you have data from multiple sources in a producer/consumer model. Data from each source must be processed in order. You want to have as many consumers as possible to read the data so you can get maximum parallelism, and you want to keep costs low. What is the best service to use?

SQS FIFO (Kinesis Data Streams will work but may be more expensive, and the number of consumers is limited to the number of shards)