Amazon Kinesis Data Firehose | Troubleshooting and Managing Amazon Kinesis Data Firehose Flashcards

1
Q

Can a single delivery stream deliver data to multiple Amazon Elasticsearch Service domains or indexes?

A

Currently, a single delivery stream can deliver data to only one Amazon Elasticsearch Service domain and one index. If you want data delivered to multiple Amazon Elasticsearch domains or indexes, create multiple delivery streams.

2
Q

Why do I get throttled when sending data to my Amazon Kinesis Data Firehose delivery stream?

A

By default, each delivery stream can ingest up to 2,000 transactions/second, 5,000 records/second, and 5 MB/second. You can have these limits increased easily by submitting a service limit increase form.
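When a producer exceeds these limits, its put calls are rejected and it should back off and retry rather than drop data. A minimal sketch of exponential backoff around a send function — `send_record` here is a stand-in for a real call such as boto3's `put_record`, and the delays are illustrative, not prescribed by Firehose:

```python
import time

def put_with_backoff(send_record, record, max_attempts=5, base_delay=0.1):
    """Retry a throttled send with exponential backoff.

    send_record: a callable that raises RuntimeError when throttled
    (stands in for a real Firehose client call and its throttling
    exception in this sketch).
    """
    for attempt in range(max_attempts):
        try:
            return send_record(record)
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
```

Batching records with `PutRecordBatch` (up to 500 records per call) is another common way to stay under the transactions/second limit.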

3
Q

Why do I see duplicated records in my Amazon S3 bucket, Amazon Redshift table, or Amazon Elasticsearch index?

A

Amazon Kinesis Data Firehose uses at-least-once semantics for data delivery. In rare circumstances, such as a request timeout during a data delivery attempt, a delivery retry by Firehose can introduce duplicates if the previous request eventually goes through.

4
Q

What happens if data delivery to my Amazon S3 bucket fails?

A

If data delivery to your Amazon S3 bucket fails, Amazon Kinesis Data Firehose retries delivery every 5 seconds for up to a maximum period of 24 hours. If the issue persists beyond the 24-hour maximum retention period, Firehose discards the data.

5
Q

What happens if data delivery to my Amazon Redshift cluster fails?

A

If data delivery to your Amazon Redshift cluster fails, Amazon Kinesis Data Firehose retries data delivery every 5 minutes for up to a maximum period of 60 minutes. After 60 minutes, Amazon Kinesis Data Firehose skips the current batch of S3 objects that are ready for COPY and moves on to the next batch. The information about the skipped objects is delivered to your S3 bucket as a manifest file in the errors folder, which you can use for manual backfill. For information about how to COPY data manually with manifest files, see Using a Manifest to Specify Data Files.

6
Q

What happens if data delivery to my Amazon Elasticsearch domain fails?

A

For the Amazon Elasticsearch Service destination, you can specify a retry duration between 0 and 7,200 seconds when creating the delivery stream. If data delivery to your Amazon ES domain fails, Amazon Kinesis Data Firehose retries delivery for the specified duration. After the retry period, Amazon Kinesis Data Firehose skips the current batch of data and moves on to the next batch. Details on skipped documents are delivered to your S3 bucket in the elasticsearch_failed folder, which you can use for manual backfill.
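The retry duration is set through the Elasticsearch destination configuration passed to the CreateDeliveryStream API. A sketch of building that fragment, with the documented 0–7,200 second range validated up front — the ARNs and index name are placeholders:

```python
def es_destination_config(retry_seconds):
    """Build the Elasticsearch destination fragment for
    CreateDeliveryStream, validating the documented 0-7200s range.

    All ARN and name values below are placeholders, not real resources.
    """
    if not 0 <= retry_seconds <= 7200:
        raise ValueError("retry duration must be between 0 and 7200 seconds")
    return {
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",      # placeholder
        "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/my-domain",  # placeholder
        "IndexName": "my-index",                                        # placeholder
        "RetryOptions": {"DurationInSeconds": retry_seconds},
    }
```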

7
Q

What happens if there is a data transformation failure?

A

There are two types of failure scenarios when Firehose attempts to invoke your Lambda function for data transformation:

The first type is when the function invocation fails, for reasons such as a network timeout or hitting Lambda invocation limits. Under these failure scenarios, Firehose retries the invocation three times by default and then skips that particular batch of records. The skipped records are treated as unsuccessfully processed records. You can configure the number of invocation retries between 0 and 300 using the CreateDeliveryStream and UpdateDeliveryStream APIs. For this type of failure, you can also use Firehose's error logging feature to emit invocation errors to CloudWatch Logs. For more information, see Monitoring with Amazon CloudWatch Logs.

The second type of failure scenario occurs when a record’s transformation result is set to “ProcessingFailed” when it is returned from your Lambda function. Firehose treats these records as unsuccessfully processed records. For this type of failure, you can use Lambda’s logging feature to emit error logs to CloudWatch Logs. For more information, see Accessing Amazon CloudWatch Logs for AWS Lambda.

For both types of failure scenarios, the unsuccessfully processed records are delivered to your S3 bucket in the processing_failed folder.
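A transformation Lambda returns each record with a `recordId`, a `result` of "Ok", "Dropped", or "ProcessingFailed", and base64-encoded `data`. A minimal sketch — the upper-casing transform is illustrative — showing how a record gets marked `ProcessingFailed`:

```python
import base64

def handler(event, context):
    """Firehose data-transformation Lambda sketch: upper-cases each
    record's payload and marks records it cannot decode as
    ProcessingFailed (Firehose then writes those to the
    processing_failed folder).
    """
    output = []
    for record in event["records"]:
        try:
            payload = base64.b64decode(record["data"]).decode("utf-8")
            transformed = payload.upper()  # illustrative transform
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, UnicodeDecodeError):
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],  # returned unchanged
            })
    return {"records": output}
```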

8
Q

Why is the size of delivered S3 objects larger than the buffer size I specified in my delivery stream configuration?

A

The size of delivered S3 objects should reflect the specified buffer size most of the time, provided the buffer size condition is satisfied before the buffer interval condition. However, when data delivery to the destination falls behind data writes to the delivery stream, Firehose raises the buffer size dynamically to catch up and make sure that all data is delivered to the destination. In these circumstances, the size of delivered S3 objects might be larger than the specified buffer size.

9
Q

What is the errors folder in my Amazon S3 bucket?

A

The errors folder stores manifest files that contain information about S3 objects that failed to load into your Amazon Redshift cluster. You can reload these objects manually with the Redshift COPY command. For information about how to COPY data manually with manifest files, see Using a Manifest to Specify Data Files.
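A Redshift manifest is a JSON document listing the S3 objects to load (`{"entries": [{"url": "s3://...", "mandatory": true}, ...]}`). A sketch of inspecting a Firehose error manifest and building the replay COPY command — table, bucket, and role names are placeholders:

```python
import json

def skipped_objects(manifest_text):
    """Return the S3 URLs listed in a Redshift manifest file.

    Manifest format: {"entries": [{"url": "s3://...", "mandatory": true}, ...]}
    """
    return [entry["url"] for entry in json.loads(manifest_text)["entries"]]

def copy_statement(manifest_url, table, iam_role):
    """Build the Redshift COPY command that replays a skipped batch
    via the manifest (all argument values are caller-supplied placeholders)."""
    return (f"COPY {table} FROM '{manifest_url}' "
            f"CREDENTIALS 'aws_iam_role={iam_role}' MANIFEST;")
```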

10
Q

What is the elasticsearch_failed folder in my Amazon S3 bucket?

A

The elasticsearch_failed folder stores the documents that failed to load into your Amazon Elasticsearch domain. You can re-index these documents manually for backfill.

11
Q

What is the processing_failed folder in my Amazon S3 bucket?

A

The processing_failed folder stores the records that could not be transformed by your AWS Lambda function. You can reprocess these records manually.

12
Q

How do I monitor the operations and performance of my Amazon Kinesis Data Firehose delivery stream?

A

The Firehose console displays key operational and performance metrics such as incoming data volume and delivered data volume. Amazon Kinesis Data Firehose also integrates with Amazon CloudWatch metrics so that you can collect, view, and analyze metrics for your delivery streams. For more information about Amazon Kinesis Data Firehose metrics, see Monitoring with Amazon CloudWatch Metrics in the Amazon Kinesis Data Firehose developer guide.

13
Q

How do I monitor data transformation and delivery failures of my Amazon Kinesis Data Firehose delivery stream?

A

Amazon Kinesis Data Firehose integrates with Amazon CloudWatch Logs so that you can view the specific error logs if data transformation or delivery fails. You can enable error logging when creating your delivery stream. For more information, see Monitoring with Amazon CloudWatch Logs in the Amazon Kinesis Data Firehose developer guide.

14
Q

How do I manage and control access to my Amazon Kinesis Data Firehose delivery stream?

A

Amazon Kinesis Data Firehose integrates with AWS Identity and Access Management, a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to add data to your Firehose delivery stream. For more information about access management and control of your stream, see Controlling Access with Amazon Kinesis Data Firehose.
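A sketch of such an IAM policy, allowing a user or group only to put records into one delivery stream — the account ID, region, and stream name are placeholders:

```python
import json

# Sketch of an identity-based IAM policy limiting a principal to
# PutRecord/PutRecordBatch on a single delivery stream.
# The account ID, region, and stream name are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
        "Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-stream",
    }],
}
policy_json = json.dumps(policy, indent=2)
```

Attaching this policy to a user or group grants write access to that one stream and nothing else in Firehose.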
