AWS Kinesis Flashcards
What is AWS Kinesis
Kinesis is a managed “data streaming” service. It suits application logs, metrics, IoT data, real-time big data, and stream processing frameworks (Spark, NiFi).
Streamed data is automatically replicated synchronously across 3 Availability Zones (AZs).
What are the 3 applications under Kinesis and what are they used for
- Kinesis Streams: low latency streaming ingest at scale
- Kinesis Analytics: perform real-time analytics on streams using SQL
- Kinesis Firehose: load streams into S3, Redshift, Elasticsearch & Splunk
How does Kinesis Streams work
Streams are divided into ordered Shards / Partitions
Producers -> Shards (1,2,3) -> Consumers
Data retention is 24 hours by default, can go up to 365 days
Ability to reprocess / replay data
Multiple applications can consume the same stream
* Real-time processing with scalable throughput
* Once data is inserted in Kinesis, it can’t be deleted (immutability); it only expires when the retention period ends. This is the main difference between SQS and Kinesis: messages in SQS can be deleted
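A toy illustration of the append-only and replay points above. This is a minimal in-memory sketch, not the Kinesis API: the class `ToyStream` and its methods are hypothetical names invented for this card.

```python
class ToyStream:
    """In-memory stand-in for one shard: an append-only list of records."""

    def __init__(self):
        self._records = []

    def put(self, data):
        # Records are only ever appended; there is deliberately no delete.
        self._records.append(data)
        return len(self._records) - 1  # position of the new record

    def read(self, start, limit=10):
        # Reading does not remove anything, so any reader can re-read.
        return self._records[start:start + limit]


stream = ToyStream()
for event in ["login", "click", "logout"]:
    stream.put(event)

# Two independent applications consume the same stream.
analytics_view = stream.read(0)
audit_view = stream.read(0)

# Replay: start again from an earlier position.
replayed = stream.read(1)
```

Contrast with SQS, where a consumer deletes a message after processing it, so a second application would not see it.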
Explain Kinesis Streams Shards
Two modes for capacity:
* On-demand: no capacity planning, Kinesis scales shards automatically
* Provisioned: you manage the shards over time
- Records can be sent in batches or one at a time
- The number of shards can change over time (resharding: split or merge)
- Records are ordered per shard
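Why ordering is per shard: each record carries a partition key, and Kinesis maps the MD5 hash of that key (a 128-bit integer) onto a shard's hash key range. The sketch below assumes the shards split the hash space evenly, which need not hold after resharding; `shard_for_key` is a hypothetical helper, not an AWS function.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key.

    Assumption: shard_count shards with equal, contiguous hash ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the top few hash values still fall in the last shard.
    return min(hash_value // range_size, shard_count - 1)


# Every record with the same key lands in the same shard,
# so the order of one user's events is preserved.
shards = [shard_for_key("user-42", 3) for _ in range(5)]
```

Records with different keys may land in different shards, and no ordering is guaranteed between shards.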
What are Kinesis producers and consumers
Kinesis Producers
* AWS SDK: simple producer
* Kinesis Producer Library (KPL): batch, compression, retries, C++, Java
* Kinesis Agent: monitors log files on EC2 instances and sends them to the stream
Kinesis Consumers
* AWS SDK: simple consumer
* Lambda: through an event source mapping
* Kinesis Client Library (KCL): checkpointing, coordinated reads
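A sketch of what a simple SDK producer with batching and retries might look like. It relies on the `put_records` call of the boto3 Kinesis client (at most 500 records per request, and a response carrying `FailedRecordCount` plus a per-record `ErrorCode`). The helper `send_in_batches`, the stream name, and `FakeKinesisClient` are hypothetical; the fake client only exists so the example runs without AWS credentials.

```python
MAX_BATCH = 500  # PutRecords accepts at most 500 records per request


def send_in_batches(client, stream_name, records, max_attempts=3):
    """Send (partition_key, data_bytes) pairs; return request entries still unsent."""
    unsent = []
    for start in range(0, len(records), MAX_BATCH):
        pending = [{"PartitionKey": key, "Data": data}
                   for key, data in records[start:start + MAX_BATCH]]
        for _ in range(max_attempts):
            response = client.put_records(StreamName=stream_name, Records=pending)
            if response["FailedRecordCount"] == 0:
                pending = []
                break
            # Results come back in request order; keep only the failed entries.
            pending = [entry for entry, result in zip(pending, response["Records"])
                       if "ErrorCode" in result]
        unsent.extend(pending)
    return unsent


class FakeKinesisClient:
    """Hypothetical stand-in for the boto3 client: throttles one record once."""

    def __init__(self):
        self.calls = 0

    def put_records(self, StreamName, Records):
        self.calls += 1
        results = [{"SequenceNumber": str(i), "ShardId": "shardId-000000000000"}
                   for i in range(len(Records))]
        failed = 0
        if self.calls == 1:
            results[0] = {"ErrorCode": "ProvisionedThroughputExceededException",
                          "ErrorMessage": "throttled"}
            failed = 1
        return {"FailedRecordCount": failed, "Records": results}


client = FakeKinesisClient()
records = [(f"device-{i % 3}", b"reading") for i in range(600)]
leftover = send_in_batches(client, "example-stream", records)
```

Against a real stream, `client` would be `boto3.client("kinesis")`, and a production producer would also wait with backoff between attempts. The KPL handles batching and retries of this kind for you.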
What are some limits with Kinesis Data Streams
Producer:
* 1MB/s or 1000 messages/s at write PER SHARD
* “ProvisionedThroughputExceededException” otherwise
Consumer Classic:
* 2MB/s at read PER SHARD across all consumers
* 5 API calls per second PER SHARD across all consumers
Consumer Enhanced Fan-Out:
* 2MB/s at read PER SHARD, PER ENHANCED CONSUMER
* No polling needed: records are pushed to the consumer over HTTP/2 (SubscribeToShard)
Data Retention:
* 24 hours data retention by default
* Can be extended to 365 days
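Worked example: the per-shard limits above can be turned into a rough shard count for provisioned mode. This is a back-of-the-envelope sketch that assumes every classic consumer reads the whole stream; `shards_needed` is a hypothetical helper, not an AWS API.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # producer limit: 1 MB/s per shard
WRITE_RECORDS_PER_SHARD = 1000  # producer limit: 1000 records/s per shard
READ_MB_PER_SHARD = 2.0         # classic consumers share 2 MB/s per shard


def shards_needed(write_mb_per_s, records_per_s, classic_consumers=1):
    """Smallest shard count that satisfies all three per-shard limits."""
    by_write_bytes = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_count = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    # Each classic consumer reads everything that was written,
    # and all of them share the same 2 MB/s per shard.
    by_read = math.ceil(write_mb_per_s * classic_consumers / READ_MB_PER_SHARD)
    return max(1, by_write_bytes, by_write_count, by_read)


print(shards_needed(5, 3000, classic_consumers=1))  # 5 (write bandwidth dominates)
print(shards_needed(5, 3000, classic_consumers=3))  # 8 (shared read bandwidth dominates)
print(shards_needed(0.5, 4000))                     # 4 (record count dominates)
```

With enhanced fan-out, each consumer gets its own 2 MB/s per shard, so the read term stops growing with the number of consumers.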