Decoupling Applications - Kinesis Data Streams / Kinesis Data Firehose Flashcards
What is Amazon Kinesis Data Streams (KDS)?
A fully managed service for real-time streaming of big data, allowing producers to send and consumers to process data at scale.
What are Shards in Kinesis Data Streams?
Units of capacity in a Kinesis Data Stream that determine:
Ingestion: 1 MB/s or 1,000 records/s per shard.
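Consumption: 2 MB/s per shard, shared across all consumers of that shard (with enhanced fan-out, each registered consumer instead gets its own 2 MB/s per shard).
A stream is made up of one or more shards; in Provisioned Mode, you must specify the number of shards yourself.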
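As a worked example of how these per-shard limits translate into a shard count, the Python sketch below takes whichever limit is most constraining. It is a back-of-the-envelope model, not an AWS API: the function name `shards_needed` is hypothetical, and it assumes writes are spread evenly across shards and that every consumer reads the full stream through the shared 2 MB/s per-shard read limit.

```python
import math

# Per-shard limits from the card above (hypothetical constant names).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0  # shared by all standard (non enhanced fan-out) consumers


def shards_needed(write_mb_s: float, write_records_s: float,
                  read_mb_s_per_consumer: float, consumers: int) -> int:
    """Smallest shard count that satisfies every per-shard limit.

    Assumes traffic is spread evenly across shards (well-distributed
    partition keys) and that all consumers share the per-shard read limit.
    """
    by_write_bytes = write_mb_s / WRITE_MB_PER_SHARD
    by_write_records = write_records_s / WRITE_RECORDS_PER_SHARD
    by_read = (read_mb_s_per_consumer * consumers) / READ_MB_PER_SHARD
    return max(1, math.ceil(max(by_write_bytes, by_write_records, by_read)))


# 5 MB/s of small records (8,000 records/s), read in full by 3 consumers.
print(shards_needed(5, 8000, 5, 3))  # → 8
```

In this example the record-count limit (8,000 records/s, so 8 shards) dominates, even though the byte rate alone would need only 5 shards and the combined read load only 7.5.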
What are Producers in KDS?
Entities that send data to the stream. Examples include:
Applications (via the AWS SDK or the Kinesis Producer Library, KPL).
Kinesis Agent for log streaming.
IoT devices or servers.
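Producers attach a partition key to every record, and Kinesis Data Streams hashes that key with MD5 into a 128-bit value to choose the shard whose hash key range contains it. The local Python model below illustrates that mapping; the helper names are hypothetical, and it assumes the 128-bit space is divided evenly among shards and that the digest is read as a big-endian integer, which may not match AWS's exact range boundaries.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def partition_hash(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned big-endian 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def even_hash_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split [0, 2**128 - 1] into contiguous, inclusive ranges, one per shard."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Index of the shard whose hash key range contains the key's hash."""
    h = partition_hash(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash ranges do not cover the key space")


ranges = even_hash_ranges(4)
# The same key always maps to the same shard.
print(shard_for_key("device-42", ranges), shard_for_key("device-42", ranges))
```

Because the mapping is deterministic, records that share a partition key stay on one shard in order; a low-cardinality key (for example, a single constant value) would send all traffic to one shard.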
What are Consumers in KDS?
Applications or services that read data from the stream. Examples include:
Custom Consumers: built with the Kinesis Client Library (KCL) or the AWS SDK.
AWS Managed Services:
AWS Lambda for serverless processing.
Kinesis Data Firehose for storage/analysis.
Kinesis Data Analytics for real-time insights.
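Custom consumers such as KCL applications track how far they have read in each shard so that a restarted worker resumes where the previous one stopped (KCL keeps these checkpoints in a DynamoDB table). The Python sketch below is a simplified, in-memory stand-in for that idea: `Shard` and `CheckpointingConsumer` are hypothetical classes, and list indices play the role of sequence numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """In-memory stand-in for one shard: an ordered, append-only list of records."""
    records: list[bytes] = field(default_factory=list)

    def read_after(self, sequence_number: int, limit: int) -> list[tuple[int, bytes]]:
        """Return up to `limit` (sequence_number, data) pairs after the given position."""
        start = sequence_number + 1
        stop = min(start + limit, len(self.records))
        return [(seq, self.records[seq]) for seq in range(start, stop)]


@dataclass
class CheckpointingConsumer:
    """Hypothetical consumer that resumes from its last checkpointed sequence number."""
    checkpoint: int = -1  # -1 means nothing has been processed yet
    processed: list[bytes] = field(default_factory=list)

    def poll(self, shard: Shard, batch_size: int = 2) -> int:
        """Read one batch, process each record, then advance the checkpoint."""
        batch = shard.read_after(self.checkpoint, batch_size)
        for seq, data in batch:
            self.processed.append(data)  # "process" the record
            self.checkpoint = seq        # checkpoint only after processing
        return len(batch)


shard = Shard([b"click-1", b"click-2", b"click-3"])
first = CheckpointingConsumer()
first.poll(shard)  # handles click-1 and click-2; checkpoint becomes 1

# Simulate a restart: a new worker starts from the saved checkpoint.
second = CheckpointingConsumer(checkpoint=first.checkpoint)
second.poll(shard)
print(second.processed)  # → [b'click-3']
```

Checkpointing after processing means a record is never skipped; if a worker failed between handling a record and saving the checkpoint, that record would simply be processed again.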
What are the Data Retention Limits for KDS?
Configurable from 1 day to 365 days.
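Allows consumers to reprocess or replay data within the retention window.
Data is immutable: once written, a record cannot be modified or individually deleted; it expires only when the retention period ends.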
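The retention rules above can be modeled in a few lines of Python. This is a toy, in-memory sketch (the `RetainedStream` class is hypothetical, and time is a plain integer number of seconds): records are only appended, frozen dataclasses stand in for immutability, and `replay` returns whatever is still inside the retention window.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Record:
    """A stored record; frozen=True means its fields cannot be reassigned."""
    arrival_time: int  # seconds since an arbitrary start time
    data: bytes


class RetainedStream:
    """Toy retention model: records are appended, never edited or deleted,
    and stop being readable once they are older than the retention period."""

    def __init__(self, retention_days: int = 1):
        if not 1 <= retention_days <= 365:
            raise ValueError("retention must be between 1 and 365 days")
        self.retention_seconds = retention_days * SECONDS_PER_DAY
        self._records: list[Record] = []

    def put(self, arrival_time: int, data: bytes) -> None:
        self._records.append(Record(arrival_time, data))

    def replay(self, now: int, from_time: int = 0) -> list[bytes]:
        """Re-read every retained record that arrived at or after from_time."""
        oldest_readable = now - self.retention_seconds
        return [r.data for r in self._records
                if r.arrival_time >= max(from_time, oldest_readable)]


stream = RetainedStream(retention_days=1)
stream.put(0, b"old")
stream.put(50_000, b"recent")
print(stream.replay(now=60_000))  # → [b'old', b'recent']
print(stream.replay(now=90_000))  # → [b'recent'] (b'old' is older than 1 day)
```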
What are some Use Cases for Kinesis Data Streams?
Real-time log or event processing.
Clickstream analysis for websites.
IoT data ingestion and analysis.
Financial transaction processing.
What are the capacity modes in Amazon Kinesis Data Streams?
Provisioned Mode: You manually define the number of shards to handle a predictable workload.
On-Demand Mode: Automatically scales to handle unpredictable workloads without requiring shard management.
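To see why Provisioned Mode suits predictable workloads, the hypothetical Python helpers below compare a load trace against a fixed shard count, using the 1 MB/s per-shard write limit. It is a simplification: it ignores the 1,000 records/s limit and assumes load is spread evenly across shards; on AWS, writes over the limit are rejected with a `ProvisionedThroughputExceededException` until capacity is added.

```python
import math


def throttled_seconds(write_mb_per_second: list[float], shard_count: int,
                      mb_per_shard: float = 1.0) -> int:
    """Count the seconds in a load trace that exceed a fixed provisioned capacity."""
    capacity = shard_count * mb_per_shard
    return sum(1 for load in write_mb_per_second if load > capacity)


def min_shards_for_peak(write_mb_per_second: list[float], mb_per_shard: float = 1.0) -> int:
    """Shard count a provisioned stream would need to absorb the trace's peak."""
    return max(1, math.ceil(max(write_mb_per_second) / mb_per_shard))


steady = [2.0, 2.2, 1.9, 2.1]  # predictable: 3 shards always suffice
spiky = [0.5, 0.4, 9.0, 0.6]   # unpredictable: one burst far above the baseline
print(throttled_seconds(steady, 3), throttled_seconds(spiky, 3))  # → 0 1
print(min_shards_for_peak(spiky))                                 # → 9
```

Sizing the spiky trace for its peak would mean paying for 9 shards that sit mostly idle, which is the situation On-Demand Mode is designed for.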
What is the main difference between Kinesis Data Streams and Kinesis Data Firehose in terms of data processing?
Kinesis Data Streams: Requires custom code for both producers and consumers to process and analyze the data in real time. It is designed for real-time streaming and lets consumers replay data within the retention period.
Kinesis Data Firehose: A fully managed service that automatically handles data ingestion and delivery to destinations such as Amazon S3, Redshift, OpenSearch, or third-party services. It is near real-time, doesn’t require custom processing code for data delivery, and does not store data for later replay.
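Firehose is near real-time because it buffers incoming records and delivers them in batches once either a buffer size or a buffer interval is reached, whichever comes first. The Python sketch below models only that flush rule; `DeliveryBuffer` is hypothetical, and the thresholds in the example are illustrative, not Firehose's actual defaults or limits.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeliveryBuffer:
    """Toy model of Firehose-style buffering: deliver a batch as soon as either
    the size threshold or the time interval is reached, whichever comes first."""
    max_bytes: int
    max_interval_s: float
    delivered: list[list[bytes]] = field(default_factory=list)
    _pending: list[bytes] = field(default_factory=list, init=False)
    _pending_bytes: int = field(default=0, init=False)
    _opened_at: Optional[float] = field(default=None, init=False)

    def add(self, data: bytes, now: float) -> None:
        """Buffer one record, then check whether a flush is due."""
        if self._opened_at is None:
            self._opened_at = now  # the batch's clock starts with its first record
        self._pending.append(data)
        self._pending_bytes += len(data)
        self.tick(now)

    def tick(self, now: float) -> None:
        """Flush the pending batch if either threshold has been reached."""
        if not self._pending:
            return
        size_hit = self._pending_bytes >= self.max_bytes
        time_hit = now - self._opened_at >= self.max_interval_s
        if size_hit or time_hit:
            self.delivered.append(self._pending)
            self._pending, self._pending_bytes, self._opened_at = [], 0, None


buf = DeliveryBuffer(max_bytes=10, max_interval_s=60)
buf.add(b"12345", now=0)
buf.add(b"67890", now=1)  # 10 bytes buffered: size-triggered delivery
buf.add(b"late", now=2)
buf.tick(now=62)          # 60 s since this batch opened: time-triggered delivery
print(buf.delivered)      # → [[b'12345', b'67890'], [b'late']]
```

The trade-off is latency for efficiency: larger buffers mean fewer, bigger writes to the destination but a longer wait before data arrives.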
How do Kinesis Data Streams and Kinesis Data Firehose differ in terms of scaling?
Kinesis Data Streams: In Provisioned Mode, scaling is manual: you split and merge shards (or change the shard count) to meet capacity requirements. On-Demand Mode scales capacity automatically.
Kinesis Data Firehose: Provides automated scaling, adjusting to the throughput demand without the need for manual intervention.
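In Provisioned Mode, each shard owns a contiguous range of 128-bit hash keys, so splitting and merging are operations on those ranges. The Python functions below are a local model, not the AWS API: `split_shard` takes a split point analogous to the `NewStartingHashKey` parameter of the SplitShard operation, and `merge_shards` enforces the rule that only adjacent shards (contiguous ranges) can be merged.

```python
HashRange = tuple[int, int]  # inclusive (start, end) hash keys owned by one shard
MAX_HASH_KEY = 2**128 - 1


def split_shard(parent: HashRange, new_starting_hash_key: int) -> tuple[HashRange, HashRange]:
    """Split one shard's range into two child ranges at new_starting_hash_key."""
    start, end = parent
    if not start < new_starting_hash_key <= end:
        raise ValueError("split point must fall strictly inside the parent range")
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)


def merge_shards(a: HashRange, b: HashRange) -> HashRange:
    """Merge two adjacent shards into one range covering both."""
    first, second = sorted([a, b])
    if first[1] + 1 != second[0]:
        raise ValueError("only adjacent shards (contiguous hash ranges) can be merged")
    return (first[0], second[1])


parent = (0, MAX_HASH_KEY)
left, right = split_shard(parent, 2**127)  # e.g. to add write capacity to a hot shard
print(left == (0, 2**127 - 1), right == (2**127, MAX_HASH_KEY))  # → True True
print(merge_shards(right, left) == parent)                       # → True
```

Splitting a hot shard adds capacity where traffic concentrates, while merging two underused neighbors reduces the number of shards you pay for.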