AWS Kinesis Data Firehose Flashcards
What is Kinesis Data Firehose?
Fully managed service: no administration, automatic scaling, serverless. It takes data from a producer, can optionally transform it using Lambda, and then writes it to a consumer (destination).
Supported consumers:
* AWS: Redshift / Amazon S3 / OpenSearch
* 3rd party partner: Splunk / MongoDB / Datadog / New Relic
* Custom: send to any HTTP endpoint
- Pay for data going through Firehose
Near Real Time - (Exam key hint)
* Minimum 60 seconds of latency for non-full batches
* Or wait until a minimum of 1 MB of data has accumulated
- Supports many data formats, conversions, transformations, compression
- Supports custom data transformations using AWS Lambda
- Can send failed or all data to a backup S3 bucket
It automatically scales, doesn’t support replay capability (unlike Kinesis Data Streams), and there is no data storage
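A minimal sketch of the custom data transformation mentioned above, written as a Python Lambda handler. The event/response shape (`recordId`, `result`, base64-encoded `data`) follows the Firehose data-transformation contract; the uppercase transform itself is just an illustrative placeholder, not anything the card prescribes.

```python
import base64


def lambda_handler(event, context):
    """Firehose invokes this with a batch of base64-encoded records.
    Every record must be returned with its original recordId and a
    result of "Ok", "Dropped", or "ProcessingFailed"."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")

        # Illustrative transform only (assumption): uppercase the payload.
        transformed = payload.upper()

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Records returned with `"result": "ProcessingFailed"` are the ones Firehose can route to the backup S3 bucket.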
Explain Firehose Buffer Sizing
Firehose accumulates records in a buffer
The buffer is flushed based on time and size rules:
* Buffer Size (ex: 32MB): if that buffer size is reached, it’s flushed
OR
* Buffer Time (ex: 1 minute): if that time is reached, it’s flushed
- Firehose can automatically increase the buffer size to keep up with higher throughput
- High throughput => Buffer Size will be hit
- Low throughput => Buffer Time will be hit
- If real-time flush from Kinesis Data Streams to S3 is needed, use Lambda
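The size-or-time flush rule above can be sketched as a toy simulation. This is not Firehose's actual implementation, just a model of the behavior: the buffer flushes as soon as either hint is reached, whichever comes first (high throughput hits the size hint, low throughput hits the time hint).

```python
class FirehoseBufferSim:
    """Toy model of Firehose buffering hints: flush when either the
    size limit or the time limit is reached, whichever comes first.
    Default limits mirror the card's examples (32 MB / 60 s)."""

    def __init__(self, size_limit_bytes=32 * 1024 * 1024, time_limit_s=60):
        self.size_limit = size_limit_bytes
        self.time_limit = time_limit_s
        self.buffer = []
        self.buffered_bytes = 0
        self.first_record_at = None  # timestamp of oldest buffered record

    def put(self, record: bytes, now: float):
        """Add a record; return the flushed batch if a limit was hit, else None."""
        if self.first_record_at is None:
            self.first_record_at = now
        self.buffer.append(record)
        self.buffered_bytes += len(record)

        # Flush on Buffer Size OR Buffer Time, whichever comes first.
        if (self.buffered_bytes >= self.size_limit
                or now - self.first_record_at >= self.time_limit):
            batch = self.buffer
            self.buffer, self.buffered_bytes, self.first_record_at = [], 0, None
            return batch
        return None
```

With tiny limits for demonstration: a high-throughput stream flushes on size before the minute is up, while a trickle of records sits in the buffer until the time limit expires.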