Kinesis Flashcards
What is the default shard limit per Kinesis stream?
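500 shards is the commonly quoted default, but the quota applies per AWS account per Region rather than per stream; it varies by Region and can be raised on request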
What is a Kinesis shard?
A shard is a uniquely identified sequence of data records in a stream. Each record consists of a partition key, a sequence number and a data payload
What are the read limits of a Kinesis shard?
Up to 5 read transactions per second and up to 2 MB of data per second, whichever is reached first
What are the write limits of a Kinesis shard?
Up to 1,000 records per second and up to 1 MB of data per second, whichever is reached first
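The per-shard limits above can be turned into a rough capacity estimate. The sketch below is only an illustration: `shards_needed` is a hypothetical helper, 1 MB is treated as 10**6 bytes for simplicity, and it assumes every standard consumer reads the whole stream.

```python
import math

# Per-shard limits quoted in the cards above.
WRITE_RECORDS_PER_SEC = 1000
WRITE_BYTES_PER_SEC = 1_000_000  # 1 MB/s, simplified to 10**6 bytes
READ_BYTES_PER_SEC = 2_000_000   # 2 MB/s, shared by all standard consumers


def shards_needed(records_per_sec, avg_record_bytes, consumers=1):
    """Estimate how many shards a workload needs (hypothetical helper)."""
    write_bytes = records_per_sec * avg_record_bytes
    by_record_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_write_bytes = math.ceil(write_bytes / WRITE_BYTES_PER_SEC)
    # Each standard consumer reads every byte, so read demand grows with consumers.
    by_read_bytes = math.ceil(write_bytes * consumers / READ_BYTES_PER_SEC)
    # The tightest limit decides; a stream always has at least one shard.
    return max(1, by_record_count, by_write_bytes, by_read_bytes)
```

For example, 5,000 records per second at 500 bytes each is bound by the record-count limit, while a small stream read by many consumers is bound by the read limit.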
What is the size limit of a data payload in KDS?
1MB
How do you scale a Kinesis data stream?
You change the number of shards in a process called resharding: split a shard to add capacity, or merge two adjacent shards to remove it.
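Adding a shard by splitting an existing one means choosing where to cut the parent shard's hash key range. A minimal sketch, assuming an even split at the midpoint; `split_hash_range` is a hypothetical helper, and with the real API the computed key would be supplied as the `NewStartingHashKey` of a `SplitShard` call.

```python
MAX_HASH_KEY = 2**128 - 1  # partition keys hash into a 128-bit key space


def split_hash_range(start, end):
    """Split the inclusive range [start, end] into two halves (hypothetical helper)."""
    if end <= start:
        raise ValueError("range is too small to split")
    # First hash key owned by the second child shard.
    new_starting_hash_key = (start + end) // 2 + 1
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)


first_child, second_child = split_hash_range(0, MAX_HASH_KEY)
```

An uneven split is just as valid; it can be useful when one part of the key space is hotter than the rest.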
By default how long is data retained in Kinesis Data Streams?
24 hours
What is the minimum and maximum retention period of data in Kinesis Data Streams?
24 hours min, 365 days maximum
What is a partition key in KDS?
The attribute on each record that determines which shard the record is sent to. Records with the same partition key go to the same shard, so they are handled by the same record processor.
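Kinesis maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch below illustrates that lookup; `hash_key`, `shard_for` and the two-shard layout in `ranges` are hypothetical names and values chosen for the example.

```python
import hashlib


def hash_key(partition_key):
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_ranges):
    """Return the id of the shard whose inclusive hash range holds the key."""
    h = hash_key(partition_key)
    for shard_id, (start, end) in shard_ranges.items():
        if start <= h <= end:
            return shard_id
    raise LookupError("no shard covers this hash key")


# Hypothetical stream with two shards that split the key space evenly.
ranges = {
    "shard-0": (0, 2**127 - 1),
    "shard-1": (2**127, 2**128 - 1),
}
```

Because the mapping depends only on the key, a low-cardinality partition key concentrates traffic on a few shards regardless of how many shards the stream has.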
Why use Kinesis Firehose over KDS?
Firehose scales automatically, is fully managed and delivers directly to AWS services, but it is only near real time, holds data for at most 24 hours while it retries delivery, and cannot replay records
KDS is real time with low latency, suits custom applications, stores data for the retention period and can replay records, but scaling (resharding) takes custom work
What are the 4 main benefits of using KCL?
The Kinesis Client Library (KCL):
1. Integrates automatically with the KPL to de-aggregate records
2. Checkpoints processed records for you
3. Rebalances shard leases across workers automatically when worker or shard counts change
4. Sends custom metrics to CloudWatch automatically
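The lease balancing in point 3 can be pictured with a toy round-robin assignment. This is not the KCL's actual algorithm (the real library coordinates through a lease table and lets workers take leases from one another); `balance_leases` is a hypothetical helper that only shows the intended outcome, an even spread of shards over workers.

```python
def balance_leases(shard_ids, worker_ids):
    """Spread shard leases over workers as evenly as possible (toy round-robin)."""
    if not worker_ids:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in worker_ids}
    for i, shard in enumerate(sorted(shard_ids)):
        # Hand shards out in turn so no worker holds more than one extra lease.
        assignment[worker_ids[i % len(worker_ids)]].append(shard)
    return assignment


shards = ["shard-0", "shard-1", "shard-2", "shard-3"]
two_workers = balance_leases(shards, ["worker-a", "worker-b"])
three_workers = balance_leases(shards, ["worker-a", "worker-b", "worker-c"])
```

Re-running the assignment after a worker joins or a shard is split gives the new spread, which is the effect the KCL achieves automatically.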
What languages does KCL support?
KCL is written in Java but supports other languages, such as Python, through the MultiLangDaemon
What is a record processor in KCL?
The logic that defines how data is processed. A worker instantiates one record processor per shard.
How many workers are there in KCL?
There is one worker per KCL application instance, with one or more application instances running in a distributed fashion
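The relationship between workers, shards and record processors can be sketched with two small classes. These are hypothetical stand-ins, not the KCL's real interfaces: the `Worker` here creates one `RecordProcessor` per shard it holds, and each processor remembers the last sequence number it handled as a simplified checkpoint.

```python
class RecordProcessor:
    """Hypothetical stand-in for a KCL record processor; handles one shard."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.checkpoint = None  # sequence number of the last processed record

    def process_records(self, records):
        for sequence_number, data in records:
            # Application-specific processing of `data` would go here.
            self.checkpoint = sequence_number


class Worker:
    """Hypothetical stand-in for a KCL worker; one per application instance."""

    def __init__(self, leased_shards):
        # One record processor is instantiated for each shard this worker holds.
        self.processors = {s: RecordProcessor(s) for s in leased_shards}


worker = Worker(["shard-0", "shard-1"])
worker.processors["shard-0"].process_records([("1", b"a"), ("2", b"b")])
```

Running a second application instance would mean a second `Worker` holding a different subset of the shards.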
How would you resolve issues with throttling on a shard with multiple consumers?
Standard consumers share a shard's read limits, so adding consumers leaves less for each one. Enabling enhanced fan-out gives each registered consumer its own dedicated 2 MB per second per shard instead of a share of the common limit.
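The difference is easy to see as arithmetic. A minimal sketch, assuming shared consumers split the 2 MB per second evenly (real consumers may not); `read_mb_per_consumer` is a hypothetical helper.

```python
SHARD_READ_MB_PER_SEC = 2.0  # per-shard read throughput limit


def read_mb_per_consumer(consumers, enhanced_fan_out=False):
    """Read throughput per consumer on one shard, in MB/s (hypothetical helper)."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated throughput.
        return SHARD_READ_MB_PER_SEC
    # Standard consumers share the single per-shard limit.
    return SHARD_READ_MB_PER_SEC / consumers
```

With four standard consumers each one averages 0.5 MB per second per shard; with enhanced fan-out each keeps the full 2 MB per second.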
Is the KPL synchronous or asynchronous?
It can be used either way, but asynchronous is the default and recommended