Amazon Kinesis Data Firehose | Troubleshooting and Managing Amazon Kinesis Data Firehose Flashcards
Can a single delivery stream deliver data to multiple Amazon Elasticsearch Service domains or indexes?
Currently, a single delivery stream can deliver data to only one Amazon Elasticsearch Service domain and one index. If you want data delivered to multiple Amazon Elasticsearch domains or indexes, create multiple delivery streams.
Why do I get throttled when sending data to my Amazon Kinesis Data Firehose delivery stream?
By default, each delivery stream can ingest up to 2,000 transactions/second, 5,000 records/second, and 5 MB/second. You can easily have these limits increased by submitting a service limit increase form.
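Until a limit increase is approved, a common producer-side pattern is to retry only the throttled records with backoff. Below is a minimal sketch using boto3's put_record_batch; the stream name and retry policy are placeholder choices, not a prescribed configuration.

```python
import time
import boto3

firehose = boto3.client("firehose")

def put_with_backoff(stream_name, payloads, max_attempts=5):
    """Send byte payloads to a delivery stream, retrying any failed
    (e.g. throttled) records with exponential backoff."""
    batch = [{"Data": p} for p in payloads]
    for attempt in range(max_attempts):
        resp = firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        if resp["FailedPutCount"] == 0:
            return
        # RequestResponses is positional; keep only records that failed.
        batch = [rec for rec, res in zip(batch, resp["RequestResponses"])
                 if "ErrorCode" in res]
        time.sleep(2 ** attempt)  # back off before retrying
    raise RuntimeError(f"{len(batch)} records still failing after retries")
```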
Why do I see duplicated records in my Amazon S3 bucket, Amazon Redshift table, or Amazon Elasticsearch index?
Amazon Kinesis Data Firehose uses at-least-once semantics for data delivery. In rare circumstances, such as a request timeout during a data delivery attempt, a delivery retry by Firehose can introduce duplicates if the previous request eventually succeeds.
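If duplicates matter downstream, one mitigation is to attach a client-generated ID to each record so a later load or query step can deduplicate. A minimal sketch, assuming JSON payloads and boto3; the event_id field name is an arbitrary choice.

```python
import json
import uuid
import boto3

firehose = boto3.client("firehose")

def put_event(stream_name, payload):
    """Attach a client-generated ID so downstream consumers (for example,
    a dedup step over your S3 objects or Redshift table) can drop
    duplicate deliveries."""
    record = {"event_id": str(uuid.uuid4()), **payload}
    firehose.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )
```

Downstream, you can then drop duplicates by event_id, for example with SELECT DISTINCT or a ROW_NUMBER() window function in Redshift.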
What happens if data delivery to my Amazon S3 bucket fails?
If data delivery to your Amazon S3 bucket fails, Amazon Kinesis Data Firehose retries delivery every 5 seconds for up to a maximum period of 24 hours. If the issue persists beyond the 24-hour maximum retention period, Firehose discards the data.
What happens if data delivery to my Amazon Redshift cluster fails?
If data delivery to your Amazon Redshift cluster fails, Amazon Kinesis Data Firehose retries data delivery every 5 minutes for up to a maximum period of 60 minutes. After 60 minutes, it skips the current batch of S3 objects that are ready for COPY and moves on to the next batch. Information about the skipped objects is delivered to your S3 bucket as a manifest file in the errors folder, which you can use for manual backfill. For information about how to COPY data manually with manifest files, see Using a Manifest to Specify Data Files.
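To backfill manually, you can run a COPY that points at the manifest file Firehose wrote. A sketch using the boto3 Redshift Data API; the table, cluster, role, and bucket names are placeholders, and the COPY options must match your data format.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Placeholder names throughout -- substitute your own table, manifest key,
# IAM role, cluster, and database. JSON 'auto' assumes JSON-formatted data.
COPY_SQL = """
COPY my_table
FROM 's3://my-firehose-bucket/errors/manifest-file'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
MANIFEST
JSON 'auto';
"""

redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=COPY_SQL,
)
```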
What happens if data delivery to my Amazon Elasticsearch domain fails?
For the Amazon Elasticsearch Service destination, you can specify a retry duration between 0 and 7,200 seconds when creating the delivery stream. If data delivery to your Amazon ES domain fails, Amazon Kinesis Data Firehose retries data delivery for the specified duration. After the retry period, Amazon Kinesis Data Firehose skips the current batch of data and moves on to the next batch. Details on skipped documents are delivered to your S3 bucket in the elasticsearch_failed folder, which you can use for manual backfill.
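The retry duration is set through RetryOptions in the Elasticsearch destination configuration. A minimal sketch with boto3; all ARNs and names are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

# Skeleton of a CreateDeliveryStream call -- replace the ARNs, domain,
# index, and bucket with your own resources.
firehose.create_delivery_stream(
    DeliveryStreamName="my-es-stream",
    DeliveryStreamType="DirectPut",
    ElasticsearchDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
        "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/my-domain",
        "IndexName": "my-index",
        "RetryOptions": {"DurationInSeconds": 300},  # allowed range: 0-7200
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
            "BucketARN": "arn:aws:s3:::my-firehose-bucket",
        },
    },
)
```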
What happens if there is a data transformation failure?
There are two types of failure scenarios when Firehose attempts to invoke your Lambda function for data transformation:
The first type is when the function invocation fails for reasons such as a network timeout or hitting Lambda invocation limits. Under these failure scenarios, Firehose retries the invocation three times by default and then skips that particular batch of records. The skipped records are treated as unsuccessfully processed records. You can configure the number of invocation retries between 0 and 300 using the CreateDeliveryStream and UpdateDeliveryStream APIs. For this type of failure, you can also use Firehose’s error logging feature to emit invocation errors to CloudWatch Logs. For more information, see Monitoring with Amazon CloudWatch Logs.
The second type of failure scenario occurs when your Lambda function returns a record with its transformation result set to “ProcessingFailed”. Firehose treats these records as unsuccessfully processed records. For this type of failure, you can use Lambda’s logging feature to emit error logs to CloudWatch Logs. For more information, see Accessing Amazon CloudWatch Logs for AWS Lambda.
For both types of failure scenarios, the unsuccessfully processed records are delivered to your S3 bucket in the processing_failed folder.
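For reference, here is a sketch of a transformation Lambda that produces both outcomes: it marks records it can parse as Ok and everything else as ProcessingFailed, following the record contract Firehose expects from transformation functions (recordId, result, base64 data). The added "processed" field is just an example transformation, not part of the contract.

```python
import base64
import json

def lambda_handler(event, context):
    """Firehose data-transformation handler. Records that cannot be parsed
    are marked ProcessingFailed and end up in the processing_failed folder."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # example transformation
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(
                    (json.dumps(payload) + "\n").encode("utf-8")
                ).decode("utf-8"),
            })
        except (ValueError, KeyError):
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],  # pass the original payload through
            })
    return {"records": output}
```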
Why is the size of delivered S3 objects larger than the buffer size I specified in my delivery stream configuration?
The size of delivered S3 objects usually reflects the specified buffer size when the buffer size condition is met before the buffer interval condition. However, when data delivery to the destination falls behind data writing to the delivery stream, Firehose raises the buffer size dynamically to catch up and ensure that all data is delivered. In these circumstances, the size of delivered S3 objects might be larger than the specified buffer size.
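Both buffering conditions are set through BufferingHints when you configure the destination. A minimal sketch with boto3; the ARNs are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

# Placeholder ARNs; buffering hints control when Firehose flushes to S3.
firehose.create_delivery_stream(
    DeliveryStreamName="my-s3-stream",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
        "BucketARN": "arn:aws:s3:::my-firehose-bucket",
        # Flush when either condition is met first: 5 MB buffered
        # or 300 seconds elapsed.
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
    },
)
```

The dynamic buffer raising described above can still override the size hint when delivery falls behind.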
What is the errors folder in my Amazon S3 bucket?
The errors folder stores manifest files that contain information about S3 objects that failed to load into your Amazon Redshift cluster. You can reload these objects manually with the Redshift COPY command. For information about how to COPY data manually with manifest files, see Using a Manifest to Specify Data Files.
What is the elasticsearch_failed folder in my Amazon S3 bucket?
The elasticsearch_failed folder stores the documents that failed to load into your Amazon Elasticsearch domain. You can re-index these documents manually for backfill.
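One way to backfill is to replay the failed documents through the delivery stream once the domain issue is resolved. The sketch below assumes each line in a failed-documents object is a JSON record with a base64-encoded rawData field; verify this against your own objects before relying on it.

```python
import base64
import json
import boto3

s3 = boto3.client("s3")
firehose = boto3.client("firehose")

def replay_failed_docs(bucket, key, stream_name):
    """Replay documents from an elasticsearch_failed object back through
    the delivery stream. Assumes each line is JSON with a base64 'rawData'
    field -- check your failed records' actual format first."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    for line in body.decode("utf-8").splitlines():
        if not line.strip():
            continue
        failed = json.loads(line)
        raw = base64.b64decode(failed["rawData"])
        firehose.put_record(DeliveryStreamName=stream_name,
                            Record={"Data": raw})
```

The same replay pattern can be applied to records in the processing_failed folder after the Lambda function is fixed.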
What is the processing_failed folder in my Amazon S3 bucket?
The processing_failed folder stores the records that your AWS Lambda function failed to transform. You can re-process these records manually; the replay sketch shown for the elasticsearch_failed folder applies here as well.
How do I monitor the operations and performance of my Amazon Kinesis Data Firehose delivery stream?
The Firehose console displays key operational and performance metrics such as incoming data volume and delivered data volume. Amazon Kinesis Data Firehose also integrates with Amazon CloudWatch Metrics so that you can collect, view, and analyze metrics for your delivery streams. For more information about Amazon Kinesis Data Firehose metrics, see Monitoring with Amazon CloudWatch Metrics in the Amazon Kinesis Data Firehose developer guide.
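For example, you can pull a stream's metrics programmatically from the AWS/Firehose CloudWatch namespace. A minimal sketch with boto3; the stream name is a placeholder.

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

# Pull one hour of incoming-record counts for a stream, in 5-minute buckets.
now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Firehose",
    MetricName="IncomingRecords",
    Dimensions=[{"Name": "DeliveryStreamName", "Value": "my-stream"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```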
How do I monitor data transformation and delivery failures of my Amazon Kinesis Data Firehose delivery stream?
Amazon Kinesis Data Firehose integrates with Amazon CloudWatch Logs so that you can view the specific error logs if data transformation or delivery fails. You can enable error logging when creating your delivery stream. For more information, see Monitoring with Amazon CloudWatch Logs in the Amazon Kinesis Data Firehose developer guide.
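Error logging is controlled by CloudWatchLoggingOptions on the destination. A sketch that enables it on an existing stream's S3 destination via update_destination; the version ID and destination ID (both obtainable from describe_delivery_stream) and all names are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

# Turn on error logging for an existing stream's S3 destination.
firehose.update_destination(
    DeliveryStreamName="my-stream",
    CurrentDeliveryStreamVersionId="1",
    DestinationId="destinationId-000000000001",
    ExtendedS3DestinationUpdate={
        "CloudWatchLoggingOptions": {
            "Enabled": True,
            "LogGroupName": "/aws/kinesisfirehose/my-stream",
            "LogStreamName": "S3Delivery",
        }
    },
)
```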
How do I manage and control access to my Amazon Kinesis Data Firehose delivery stream?
Amazon Kinesis Data Firehose integrates with AWS Identity and Access Management, a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to add data to your Firehose delivery stream. For more information about access management and control of your stream, see Controlling Access with Amazon Kinesis Data Firehose.
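For example, a minimal producer policy might allow only PutRecord and PutRecordBatch on one specific stream. A sketch with boto3; the account ID, region, and names are placeholders.

```python
import json
import boto3

iam = boto3.client("iam")

# Allow a producer to write to one delivery stream and nothing else.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
        "Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-stream",
    }],
}

iam.create_policy(
    PolicyName="firehose-producer-my-stream",
    PolicyDocument=json.dumps(policy_document),
)
```

Attach the resulting policy to the user, group, or role that runs your producer.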