Kinesis Flashcards by Jeremy Lim

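What is the default shard limit in Kinesis Data Streams?

500 shards, counted per AWS account per Region rather than per stream (provisioned mode). The exact default depends on the Region and can be raised with a quota increase request.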

What is a Kinesis shard?

A shard is a uniquely identified sequence of data records in a stream. Each record consists of a partition key, a sequence number and a data payload.

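What are the read limits of a Kinesis shard?

Up to 5 read transactions per second and up to 2 MB of data per second, per shard. Both limits apply, and they are shared by all standard consumers of the shard.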

What are the write limits of a Kinesis shard?

Up to 1,000 records per second and up to 1 MB of data per second, per shard. Both limits apply.
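
Worked example: the per-shard limits above can be turned into a rough shard count estimate. This is a minimal sketch, not an official sizing formula; the function name `shards_needed` and the sample workload are made up for illustration, and it assumes standard (shared) consumers.

```python
import math

# Per-shard limits quoted in the cards above.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC)
    by_write_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / READ_MB_PER_SEC)
    # The tightest constraint decides; a stream always has at least one shard.
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


# Hypothetical workload: 3.5 MB/s in, 2,500 records/s, 7 MB/s read in total.
print(shards_needed(write_mb_per_sec=3.5, records_per_sec=2500, read_mb_per_sec=7.0))  # 4
```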

What is the size limit of a data payload in KDS?

1MB

How do you scale a KDS?

You add or subtract shards in a process called resharding.

By default how long is data retained in Kinesis Data Streams?

24 hours

What is the minimum and maximum retention period of data in Kinesis Data Streams?

24 hours min, 365 days maximum

What is a partition key in KDS?

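An attribute of each record that determines which shard the record is sent to. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. Records with the same partition key therefore land on the same shard and are handled by the same record processor.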
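
Worked example: a minimal sketch of how a partition key maps to a shard through its MD5 hash. The helper names (`hash_key`, `even_shard_ranges`, `shard_for`) are hypothetical, and the evenly divided ranges are an assumption; a real stream's ranges depend on how it was created and resharded.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # hash keys are 128-bit integers


def hash_key(partition_key):
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count):
    """Split [0, 2**128 - 1] into contiguous, roughly equal ranges (assumption)."""
    size = 2**128 // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the key space is fully covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Index of the range that contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key outside all ranges")


ranges = even_shard_ranges(4)
# The same key always maps to the same shard.
print(shard_for("user-42", ranges) == shard_for("user-42", ranges))  # True
```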

Why use Kinesis Firehose over KDS?

Firehose scales automatically, is fully managed and integrates directly with AWS services, but it is only near real-time and does not store data for replay (it only retries failed deliveries for up to 24 hours).

KDS is real-time and low latency, suits custom applications, stores data and can replay records, but scaling (resharding) requires custom work.

What are the 4 main benefits of using KCL?

The Kinesis Client Library:
1. Integrates automatically with KPL to de-aggregate records
2. Checkpoints processed records for you
3. Rebalances shard leases across workers when the worker or shard count changes
4. Sends custom metrics to CloudWatch automatically

What languages does KCL support?

KCL is written in Java but allows you to use other runtimes like Python via MultiLangDaemon

What is a record processor in KCL?

The record processor holds the logic for how data is processed. A worker instantiates one record processor per shard.

How many workers are there in KCL?

There is 1 worker per KCL application instance, with 1 or more application instances running in a distributed fashion

How would you resolve issues with throttling on a shard with multiple consumers?

The standard read limit (2 MB per second) applies per shard and is shared by all consumers of that shard. Enable enhanced fan-out so that each registered consumer gets its own dedicated 2 MB per second per shard instead of sharing it.
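
Worked example: the arithmetic behind the throttling. This sketch assumes the shared 2 MB/sec is divided evenly between standard consumers, which is a simplification; the function name is made up for illustration.

```python
SHARD_READ_MB_PER_SEC = 2.0  # per-shard read limit from the cards above


def read_mb_per_consumer(consumer_count, enhanced_fan_out):
    """Approximate read throughput available to each consumer of one shard."""
    if consumer_count < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated pipe.
        return SHARD_READ_MB_PER_SEC
    # Standard consumers share the shard's limit (even split assumed here).
    return SHARD_READ_MB_PER_SEC / consumer_count


print(read_mb_per_consumer(4, enhanced_fan_out=False))  # 0.5
print(read_mb_per_consumer(4, enhanced_fan_out=True))   # 2.0
```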

Is KPL synchronous or async?

KPL supports both, but asynchronous is the default and the recommended mode

How many consumers can there be of a shard?

Multiple consumers can read from a shard

What are the 4 benefits of KPL?

Increases performance by aggregating small records
Provides automatic retry logic if there is record failure
Handles multi-threading, batching, aggregation
Sends metrics to CloudWatch automatically

What is a downside of KPL?

There can be some extra processing delay because KPL buffers records before sending them, up to the RecordMaxBufferedTime

What happens in Firehose if a data producer sends more data than Firehose is able to deliver to S3?

Firehose increases the buffer size dynamically to try to catch up with the incoming data

Does Firehose support KPL de-aggregation from a KDS?

Yes, de-aggregates before delivering to a destination or before Lambda pre-processing

Which Kinesis option supports native S3 Backup integration?

Firehose supports S3 backup of original source data as well as failed data (processing or delivery failure)

What is a common Firehose task?

Converting record formats from JSON to Parquet or ORC, then storing in S3

Format conversion expects JSON input, so if the source data is not JSON (e.g. CSV), use a Firehose Lambda transformation to convert it to JSON first

https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

What are possible data sources for Firehose?

KDS
Kinesis Agent
AWS SDK
CloudWatch (Logs and Events)
AWS IoT

What are the possible output destinations for KDA?

1. KDS 2. Kinesis Firehose 3. Lambda

What are the 2 interfaces to write KDA apps?

SQL Interface or Apache Flink (Java) Interface

Can you write KDA to multiple outputs? What is the limit?

Yes you can write to multiple destinations, up to 3

What are the 3 windowed query types for KDA?

1. Stagger: time-based windows that open when data arrives; reduces problems with late, out-of-order or inconsistently arriving data
2. Tumbling: aggregates over windows that open and close at regular intervals, in a non-overlapping manner
3. Sliding: fixed time or row-count interval, continuous aggregation, overlapping windows
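
Worked example: a minimal Python sketch of tumbling-window aggregation, to make the "regular, non-overlapping intervals" idea concrete. It is not KDA SQL or Flink code; the function name and the sample events are made up for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    events is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = Counter()
    for ts, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "AAPL"), (15, "AAPL"), (59, "MSFT"), (60, "AAPL"), (119, "AAPL")]
print(tumbling_window_counts(events, 60))
# {(0, 'AAPL'): 2, (0, 'MSFT'): 1, (60, 'AAPL'): 2}
```

A sliding window would instead let one event contribute to several overlapping windows, and a stagger window would start counting from the arrival of the first matching event rather than from a fixed clock boundary.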

Why use MSK over Kinesis?

1. MSK has unlimited retention period 2. MSK allows larger messages: the default limit is 1 MB but it can be raised through broker configuration, while a Kinesis record is capped at 1 MB

What are possible data sources for KDA Flink? vs. SQL?

Flink: 1. KDS 2. MSK
SQL: 1. KDS 2. Firehose

What are the downsides of MSK?

1. Cluster provisioning model 2. 3rd party tooling not integrated with AWS natively 3. Scaling is not seamless to clients

What is the Firehose buffer size min/max for S3 and ES?

1. S3 is 1MB to 128 MB 2. ES is 1MB to 100 MB

What is the Firehose buffer interval?

60 to 900 seconds
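
Worked example: Firehose delivers a batch when either the buffer size or the buffer interval is reached, whichever comes first. A minimal sketch of that rule; the function name, parameter names and sample values are hypothetical, chosen within the ranges on these cards.

```python
def should_flush(buffered_bytes, seconds_since_flush, buffer_size_mb, buffer_interval_s):
    """True when either buffering threshold has been reached."""
    size_reached = buffered_bytes >= buffer_size_mb * 1024 * 1024
    time_reached = seconds_since_flush >= buffer_interval_s
    return size_reached or time_reached


# Illustrative settings: 5 MB buffer size, 300 second buffer interval.
print(should_flush(6 * 1024 * 1024, 10, buffer_size_mb=5, buffer_interval_s=300))  # True (size)
print(should_flush(1024, 300, buffer_size_mb=5, buffer_interval_s=300))            # True (time)
print(should_flush(1024, 10, buffer_size_mb=5, buffer_interval_s=300))             # False
```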

What is the payload limit for MSK?

1 MB by default; larger messages are possible by raising the limit in the broker configuration

What is the payload limit for Firehose?

1,000 KiB per record (roughly 1 MB), measured before base64 encoding

What is the process of resharding in KDS?

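Resharding is always pairwise and uses two operations: 1. Merge Shards: combine two adjacent shards into one 2. Split Shards: divide one shard into two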
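
Worked example: resharding works on hash key ranges. This sketch computes the starting hash key of the second child for an even split, and checks whether two ranges are adjacent and so eligible for a merge. Function names are hypothetical; ranges are (start, end) tuples with inclusive bounds.

```python
def split_point(start, end):
    """Starting hash key of the second child when a range is split in half."""
    if end <= start:
        raise ValueError("a range holding a single hash key cannot be split")
    return start + (end - start + 1) // 2


def can_merge(a, b):
    """Two shards can merge only if their ranges are contiguous with no gap."""
    return a[1] + 1 == b[0] or b[1] + 1 == a[0]


full_range = (0, 2**128 - 1)
mid = split_point(*full_range)
children = [(full_range[0], mid - 1), (mid, full_range[1])]
print(mid == 2**127)         # True
print(can_merge(*children))  # True
```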

What are the 5 destinations KDS can write to?

1. Lambda 2. Kinesis Firehose 3. Kinesis Data Analytics 4. KCL 5. Glue Streaming

What are the 3 data sources for KDS?

1. KPL 2. Kinesis Agent 3. PUT to Kinesis API (SDK)

What are the 5 destinations for Kinesis Firehose?

1. Redshift 2. S3 3. OpenSearch aka ES 4. Http Endpoint 5. Vendor Integration

What are the 2 data sources for KDA?

1. KDS 2. Kinesis Firehose

How would you get realtime events from CloudWatch? What are the valid destinations?

1. Use CloudWatch Logs with Subscription Filters 2. Destinations are Lambda, KDS, Firehose

https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html

What streaming service should you use if you have reference data in S3 that needs to be joined/merged?

Kinesis Data Analytics