AWS Kinesis Data Firehose Flashcards
What is Kinesis Data Firehose?
Fully managed service: no administration, automatic scaling, serverless. It takes data from a producer, can optionally transform it using Lambda, and then writes it to a consumer (destination).
Supported consumers:
* AWS: Redshift / Amazon S3 / OpenSearch
* 3rd party partner: Splunk / MongoDB / Datadog / New Relic
* Custom: send to any HTTP endpoint
- Pay for data going through Firehose
Near Real Time - (Exam key hint)
* Minimum 60 seconds of latency for non-full batches
* Or wait until a minimum of 1 MB of data has accumulated
- Supports many data formats, conversions, transformations, compression
- Supports custom data transformations using AWS Lambda
- Can send failed or all data to a backup S3 bucket
It automatically scales, doesn’t support replay capability (unlike Kinesis Data Streams), and there is no data storage
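A minimal sketch of the custom data transformation mentioned above, written as a Python Lambda handler. The event/response shape (`recordId`, `result`, base64-encoded `data`) follows the Firehose data-transformation contract; the uppercase transform itself is just an illustrative placeholder, not anything the card prescribes.

```python
import base64


def lambda_handler(event, context):
    """Firehose invokes this with a batch of base64-encoded records.
    Every record must be returned with its original recordId and a
    result of "Ok", "Dropped", or "ProcessingFailed"."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")

        # Illustrative transform only (assumption): uppercase the payload.
        transformed = payload.upper()

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Records returned with `"result": "ProcessingFailed"` are the ones Firehose can route to the backup S3 bucket.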
Explain Firehose Buffer Sizing
Firehose accumulates records in a buffer
The buffer is flushed based on time and size rules:
* Buffer Size (ex: 32MB): if that buffer size is reached, it’s flushed
OR
* Buffer Time (ex: 1 minute): if that time is reached, it’s flushed
- Firehose can automatically increase the buffer size to keep up with higher throughput
- High throughput => Buffer Size will be hit
- Low throughput => Buffer Time will be hit
- If real-time flush from Kinesis Data Streams to S3 is needed, use Lambda
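The size-or-time flush rule above can be sketched as a toy simulation. This is not Firehose's actual implementation, just a model of the behavior: the buffer flushes as soon as either hint is reached, whichever comes first (high throughput hits the size hint, low throughput hits the time hint).

```python
class FirehoseBufferSim:
    """Toy model of Firehose buffering hints: flush when either the
    size limit or the time limit is reached, whichever comes first.
    Default limits mirror the card's examples (32 MB / 60 s)."""

    def __init__(self, size_limit_bytes=32 * 1024 * 1024, time_limit_s=60):
        self.size_limit = size_limit_bytes
        self.time_limit = time_limit_s
        self.buffer = []
        self.buffered_bytes = 0
        self.first_record_at = None  # timestamp of oldest buffered record

    def put(self, record: bytes, now: float):
        """Add a record; return the flushed batch if a limit was hit, else None."""
        if self.first_record_at is None:
            self.first_record_at = now
        self.buffer.append(record)
        self.buffered_bytes += len(record)

        # Flush on Buffer Size OR Buffer Time, whichever comes first.
        if (self.buffered_bytes >= self.size_limit
                or now - self.first_record_at >= self.time_limit):
            batch = self.buffer
            self.buffer, self.buffered_bytes, self.first_record_at = [], 0, None
            return batch
        return None
```

With tiny limits for demonstration: a high-throughput stream flushes on size before the minute is up, while a trickle of records sits in the buffer until the time limit expires.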