Decoupling Applications - Kinesis / Data Streams / Firehose Flashcards

1
Q

What is Amazon Kinesis Data Streams (KDS)?

A

A fully managed service for real-time streaming of big data, allowing producers to send and consumers to process data at scale.

2
Q

What are Shards in Kinesis Data Streams?

A

Units of capacity in a Kinesis Data Stream that determine:

Ingestion: 1 MB/s or 1,000 records/s per shard.
Consumption: 2 MB/s per shard, shared across all standard consumers (with enhanced fan-out, each registered consumer gets its own 2 MB/s per shard).

A stream consists of one or more shards; in Provisioned Mode you set the shard count yourself.
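A rough sizing sketch using the per-shard limits above. It assumes traffic is spread evenly across shards (real streams can have hot shards) and that every consumer reads the whole stream through the shared 2 MB/s read limit, not enhanced fan-out. `estimate_shards` is a hypothetical helper, not an AWS API.

```python
import math

# Per-shard limits from this card (Provisioned Mode).
WRITE_MB_PER_SHARD = 1.0        # ingestion: 1 MB/s per shard
WRITE_RECORDS_PER_SHARD = 1000  # ingestion: 1,000 records/s per shard
READ_MB_PER_SHARD = 2.0         # consumption: 2 MB/s per shard, shared by standard consumers


def estimate_shards(write_mb_s: float, records_s: float, consumers: int = 1) -> int:
    """Return the smallest shard count that satisfies every per-shard limit.

    Assumes each standard consumer reads all data, so total read demand
    is write_mb_s * consumers.
    """
    read_mb_s = write_mb_s * consumers
    needed = max(
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )
    return max(needed, 1)  # a stream always has at least one shard


print(estimate_shards(3.5, 2500))               # write MB/s dominates -> 4
print(estimate_shards(0.5, 5000))               # record count dominates -> 5
print(estimate_shards(3.0, 1000, consumers=3))  # 9 MB/s of reads -> 5
```

Whichever limit needs the most shards wins, which is why a low-volume stream of many tiny records can still need several shards.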

3
Q

What are Producers in KDS?

A

Entities that send data to the stream. Examples include:

Applications (via AWS SDK or Kinesis Producer Library - KPL).
Kinesis Agent for log streaming.
IoT devices or servers.
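A minimal producer sketch, assuming an application that uses the AWS SDK for Python (boto3) rather than the KPL. It builds entries in the shape boto3's `put_records` expects (`Data` bytes plus a `PartitionKey`) and sends them in chunks of 500, the per-request record limit for PutRecords. The `key_field` name (for example a device ID) is a hypothetical choice; the partition key determines which shard receives each record.

```python
import json
from typing import Any, Dict, Iterable, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def build_entries(events: Iterable[Dict[str, Any]], key_field: str) -> List[Dict[str, Any]]:
    """Serialize events to JSON bytes and pick a partition key for each.

    `key_field` is a hypothetical event attribute (e.g. a device or user ID).
    """
    return [
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": str(e[key_field])}
        for e in events
    ]


def send_in_batches(client: Any, stream_name: str, entries: List[Dict[str, Any]]) -> int:
    """Send entries in chunks of 500 and return how many were rejected.

    `client` is expected to be a boto3 Kinesis client. PutRecords is not
    all-or-nothing: rejected entries are counted in FailedRecordCount and a
    real producer should retry them.
    """
    failed = 0
    for start in range(0, len(entries), MAX_RECORDS_PER_CALL):
        chunk = entries[start:start + MAX_RECORDS_PER_CALL]
        response = client.put_records(StreamName=stream_name, Records=chunk)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With a real client this would be called as `send_in_batches(boto3.client("kinesis"), "my-stream", build_entries(events, "device_id"))`, where the stream name and field name are placeholders.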

4
Q

What are Consumers in KDS?

A

Applications or services that read data from the stream. Examples include:

Custom Consumers: Using Kinesis Client Library (KCL) or SDK.

AWS Managed Services:
AWS Lambda for serverless processing.
Kinesis Data Firehose for storage/analysis.
Kinesis Data Analytics (now Amazon Managed Service for Apache Flink) for real-time insights.
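A sketch of a Lambda consumer, assuming JSON payloads. Lambda passes a batch of stream records, and each payload arrives base64-encoded under `record["kinesis"]["data"]`. Returning `batchItemFailures` only takes effect if ReportBatchItemFailures is enabled on the event source mapping; `process` is a hypothetical placeholder for real business logic.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and process it."""
    failures = []
    for record in event["Records"]:
        seq = record["kinesis"]["sequenceNumber"]
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process(payload)
        except (ValueError, KeyError, TypeError):
            # Report the failing record so Lambda retries from it
            # (requires ReportBatchItemFailures on the event source mapping).
            failures.append({"itemIdentifier": seq})
    return {"batchItemFailures": failures}


def process(payload: dict) -> None:
    """Hypothetical business logic; replace with real processing."""
    print(payload["id"])
```

Because Lambda retries a Kinesis batch from the lowest reported sequence number, records after a failure may be processed again, so processing should be idempotent.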

5
Q

What are the Data Retention Limits for KDS?

A

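Configurable from 1 day (the default) to 365 days.
Allows reprocessing or replaying data.
Data is immutable: once written, a record cannot be modified or individually deleted; it is removed only when the retention period expires.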
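The API expresses retention in hours (24 to 8,760), so a day-based setting has to be converted. This sketch validates that conversion and checks whether a record is still within the replay window; both helpers are hypothetical, and the boundary behaviour at exactly the retention limit is an assumption.

```python
from datetime import datetime, timedelta

MIN_RETENTION_HOURS = 24    # 1 day (the default)
MAX_RETENTION_HOURS = 8760  # 365 days


def retention_hours(days: int) -> int:
    """Convert a retention period in days to the hours value the API expects."""
    hours = days * 24
    if not MIN_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(f"retention must be 1-365 days, got {days}")
    return hours


def still_replayable(arrival: datetime, now: datetime, hours: int) -> bool:
    """True if a record written at `arrival` is still inside the retention window."""
    return now - arrival < timedelta(hours=hours)
```

With boto3, the value would be applied through `increase_stream_retention_period(StreamName=..., RetentionPeriodHours=retention_hours(7))`, and a replay would start from a shard iterator of type `AT_TIMESTAMP` or `TRIM_HORIZON`.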

6
Q

What are some Use Cases for Kinesis Data Streams?

A

Real-time log or event processing.
Clickstream analysis for websites.
IoT data ingestion and analysis.
Financial transaction processing.

7
Q

What are the capacity modes in Amazon Kinesis Data Streams?

A

Provisioned Mode: You choose the number of shards and scale them yourself; suited to predictable workloads.

On-Demand Mode: Capacity scales automatically with traffic and there are no shards to manage; suited to unpredictable or spiky workloads.

8
Q

What is the main difference between Kinesis Data Streams and Kinesis Data Firehose in terms of data processing?

A

Kinesis Data Streams: You typically write the producer and consumer code (AWS SDK, KPL, KCL) that processes and analyzes data in real time. Records stay in the stream for the retention period, so they can be replayed.

Kinesis Data Firehose: A fully managed delivery service that ingests data and loads it into destinations such as Amazon S3, Redshift, OpenSearch, or third-party services. It buffers data before delivery, so it is near real-time, needs no consumer code, and does not keep data for replay.
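To illustrate why Firehose is "near real-time", here is a toy model of size-or-time buffering in plain Python: data is delivered as one batch when either the size threshold or the time interval is reached, whichever comes first. `BufferSketch` is hypothetical and its thresholds are illustrative, not Firehose defaults.

```python
from typing import List, Optional


class BufferSketch:
    """Toy model of Firehose-style buffering (size or time, whichever first)."""

    def __init__(self, max_bytes: int, max_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.delivered: List[bytes] = []  # each entry = one batch sent to the destination
        self._items: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> None:
        if self._opened_at is None:
            self._opened_at = now  # the time window starts with the first record
        self._items.append(record)
        self._size += len(record)
        self._maybe_flush(now)

    def tick(self, now: float) -> None:
        """Called periodically so a quiet buffer still flushes on time."""
        self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> None:
        if not self._items or self._opened_at is None:
            return
        if self._size >= self.max_bytes or now - self._opened_at >= self.max_seconds:
            # In Firehose this would be, for example, one object written to S3.
            self.delivered.append(b"".join(self._items))
            self._items, self._size, self._opened_at = [], 0, None
```

Records wait in the buffer until a threshold is hit, which is the latency a Kinesis Data Streams consumer reading record by record avoids.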

9
Q

How do Kinesis Data Streams and Kinesis Data Firehose differ in terms of scaling?

A

Kinesis Data Streams: In Provisioned Mode, requires manual scaling: you manage capacity by splitting or merging shards (or calling UpdateShardCount). In On-Demand Mode, capacity scales automatically.
Kinesis Data Firehose: Scales automatically with throughput; there are no shards to manage.
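A sketch of one manual scaling step in Provisioned Mode: splitting a shard's hash-key range in half with boto3's `split_shard`. Kinesis maps each partition key to a 128-bit integer with MD5, and each shard owns a contiguous range of those integers; `hash_key_for` assumes the key is UTF-8 encoded before hashing. The helper names are hypothetical, and `client` is expected to be a boto3 Kinesis client.

```python
import hashlib
from typing import Any, Dict

MAX_HASH_KEY = 2**128 - 1  # hash keys are 128-bit unsigned integers


def hash_key_for(partition_key: str) -> int:
    """Map a partition key to its 128-bit hash key (MD5; UTF-8 encoding assumed)."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def split_midpoint(starting_hash_key: str, ending_hash_key: str) -> str:
    """Pick a NewStartingHashKey that divides the shard's range in half."""
    start, end = int(starting_hash_key), int(ending_hash_key)
    if end <= start:
        raise ValueError("a shard covering a single hash key cannot be split")
    # The lower child keeps [start, mid - 1]; the upper child gets [mid, end].
    return str((start + end + 1) // 2)


def split_in_half(client: Any, stream_name: str, shard: Dict[str, Any]) -> str:
    """Split one shard (as returned by ListShards) into two equal halves."""
    rng = shard["HashKeyRange"]
    new_start = split_midpoint(rng["StartingHashKey"], rng["EndingHashKey"])
    client.split_shard(
        StreamName=stream_name,
        ShardToSplit=shard["ShardId"],
        NewStartingHashKey=new_start,
    )
    return new_start
```

Splitting by hash range is the fine-grained option; `update_shard_count` with `ScalingType="UNIFORM_SCALING"` resizes the whole stream in one call instead.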
