Analytics | Amazon Kinesis Data Firehose Flashcards

1
Q

What is Amazon Kinesis Data Firehose?

General

Amazon Kinesis Data Firehose | Analytics

A

Amazon Kinesis Data Firehose is the easiest way to load streaming data into data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.

2
Q

What does Amazon Kinesis Data Firehose manage on my behalf?

General

A

Amazon Kinesis Data Firehose manages all underlying infrastructure, storage, networking, and configuration needed to capture and load your data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, or Splunk. You do not have to provision, deploy, or maintain the hardware and software, or write any other application to manage this process. Firehose also scales elastically without requiring any intervention or associated developer overhead. Moreover, Amazon Kinesis Data Firehose synchronously replicates data across three facilities in an AWS Region, providing high availability and durability for the data as it is transported to the destinations.

3
Q

How do I use Amazon Kinesis Data Firehose?

General

A

After you sign up for Amazon Web Services, you can start using Amazon Kinesis Data Firehose with the following steps:

Create an Amazon Kinesis Data Firehose delivery stream through the Firehose Console or the CreateDeliveryStream operation. You can optionally configure an AWS Lambda function in your delivery stream to prepare and transform the raw data before loading the data.

Configure your data producers to continuously send data to your delivery stream using the Amazon Kinesis Agent or the Firehose API.

Firehose automatically and continuously loads your data to the destinations you specify.

4
Q

What is a source?

General

A

A source is where your streaming data is continuously generated and captured. For example, a source can be a logging server running on Amazon EC2 instances, an application running on mobile devices, a sensor on an IoT device, or a Kinesis stream.

5
Q

What are the limits of Amazon Kinesis Data Firehose?

Key Amazon Kinesis Data Firehose Concepts

A

For information about limits, see Amazon Kinesis Data Firehose Limits in the developer guide.

6
Q

What is a delivery stream?

Key Amazon Kinesis Data Firehose Concepts

A

A delivery stream is the underlying entity of Amazon Kinesis Data Firehose. You use Firehose by creating a delivery stream and then sending data to it.

7
Q

What is a record?

Key Amazon Kinesis Data Firehose Concepts

A

A record is the data of interest your data producer sends to a delivery stream. The maximum size of a record (before Base64-encoding) is 1000 KB.
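
A minimal sketch of checking a payload against the record size limit before sending it. It assumes "1000 KB" means 1,000 KiB (1,024,000 bytes); the function name `fits_in_record` is hypothetical and not part of any AWS SDK.

```python
import base64

# Assumption: "1000 KB" is interpreted as 1,000 KiB.
MAX_RECORD_BYTES = 1000 * 1024


def fits_in_record(payload: bytes) -> bool:
    """Return True if the raw (pre-Base64) payload is within the record limit."""
    return len(payload) <= MAX_RECORD_BYTES


small = b"x" * 1024
too_large = b"x" * (MAX_RECORD_BYTES + 1)

print(fits_in_record(small))      # True
print(fits_in_record(too_large))  # False

# Base64 encoding grows the payload by roughly a third,
# but the limit is measured on the data before encoding.
print(len(base64.b64encode(small)))  # 1368
```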

8
Q

What is a destination?

Creating Delivery Streams

A

A destination is the data store where your data will be delivered. Amazon Kinesis Data Firehose currently supports Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk as destinations.

9
Q

How do I create a delivery stream?

Creating Delivery Streams

A

You can create an Amazon Kinesis Data Firehose delivery stream through the Firehose Console or the CreateDeliveryStream operation. For more information, see Creating a Delivery Stream.
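
As a rough illustration of the CreateDeliveryStream route, the sketch below only builds a request dictionary for an S3 destination and validates the buffering values against the ranges quoted in this deck. The ARNs are placeholders, the helper name `build_s3_stream_request` is hypothetical, and the field names follow the CreateDeliveryStream request shape as I understand it, so check them against the API reference before use.

```python
def build_s3_stream_request(name, role_arn, bucket_arn,
                            size_mb=5, interval_seconds=300,
                            compression="GZIP"):
    """Build keyword arguments for a CreateDeliveryStream call (S3 destination)."""
    # Ranges taken from this deck: 1-128 MB and 60-900 seconds for S3.
    if not 1 <= size_mb <= 128:
        raise ValueError("buffer size must be between 1 and 128 MB")
    if not 60 <= interval_seconds <= 900:
        raise ValueError("buffer interval must be between 60 and 900 seconds")
    return {
        "DeliveryStreamName": name,
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "BufferingHints": {
                "SizeInMBs": size_mb,
                "IntervalInSeconds": interval_seconds,
            },
            "CompressionFormat": compression,
        },
    }


request = build_s3_stream_request(
    "example-stream",                                  # placeholder name
    "arn:aws:iam::123456789012:role/example-role",     # placeholder ARN
    "arn:aws:s3:::example-bucket",                     # placeholder ARN
)
print(request["ExtendedS3DestinationConfiguration"]["BufferingHints"])
```

With an AWS SDK installed and credentials configured, a dictionary like this would typically be passed to the SDK's create-delivery-stream call; that step is left out here so the sketch runs without network access.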

10
Q

What compression format can I use?

Creating Delivery Streams

A

Amazon Kinesis Data Firehose allows you to compress your data before delivering it to Amazon S3. The service currently supports GZIP, ZIP, and SNAPPY compression formats. Only GZIP is supported if the data is further loaded to Amazon Redshift.

11
Q

How does compression work when I use the CloudWatch Logs subscription feature?

Creating Delivery Streams

A

You can use the CloudWatch Logs subscription feature to stream data from CloudWatch Logs to Kinesis Data Firehose. All log events from CloudWatch Logs are already compressed in gzip format, so you should keep Firehose’s compression configuration set to uncompressed to avoid double compression. For more information about the CloudWatch Logs subscription feature, see Subscription Filters with Amazon Kinesis Data Firehose in the Amazon CloudWatch Logs user guide.
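
Because the log events arrive already gzip-compressed, a consumer has to decompress them once before parsing. The sketch below shows that step on a locally built sample; the JSON shape (`logEvents` with a `message` field) is an illustrative assumption, and `decode_log_record` is a hypothetical helper.

```python
import gzip
import json


def decode_log_record(blob: bytes) -> dict:
    """Decompress a gzip-compressed log payload and parse it as JSON."""
    return json.loads(gzip.decompress(blob))


# Build a sample payload locally; the field names are assumed for illustration.
sample = gzip.compress(
    json.dumps({"logEvents": [{"message": "hello"}]}).encode("utf-8")
)

decoded = decode_log_record(sample)
print(decoded["logEvents"][0]["message"])  # hello
```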

12
Q

What kind of encryption can I use?

Creating Delivery Streams

A

Amazon Kinesis Data Firehose allows you to encrypt your data after it’s delivered to your Amazon S3 bucket. While creating your delivery stream, you can choose to encrypt your data with an AWS Key Management Service (KMS) key that you own. For more information about KMS, see AWS Key Management Service.

13
Q

What is data transformation with Lambda?

Creating Delivery Streams

A

Firehose can invoke an AWS Lambda function to transform incoming data before delivering it to destinations. You can configure a new Lambda function using one of the Lambda blueprints we provide or choose an existing Lambda function.

14
Q

What is source record backup?

Creating Delivery Streams

A

If you use data transformation with Lambda, you can enable source record backup, and Amazon Kinesis Data Firehose will deliver the un-transformed incoming data to a separate S3 bucket. You can specify an extra prefix to be added in front of the “YYYY/MM/DD/HH” UTC time prefix generated by Firehose.

15
Q

What is error logging?

Creating Delivery Streams

A

If you enable data transformation with Lambda, Firehose can log any Lambda invocation and data delivery errors to Amazon CloudWatch Logs so that you can view the specific error logs if Lambda invocation or data delivery fails. For more information, see Monitoring with Amazon CloudWatch Logs.

16
Q

What is buffer size and buffer interval?

Creating Delivery Streams

A

Amazon Kinesis Data Firehose buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations. You can configure buffer size and buffer interval while creating your delivery stream. Buffer size is in MB and ranges from 1 MB to 128 MB for the Amazon S3 destination and from 1 MB to 100 MB for the Amazon Elasticsearch Service destination. Buffer interval is in seconds and ranges from 60 seconds to 900 seconds. Note that when data delivery to the destination is falling behind data writing to the delivery stream, Firehose raises the buffer size dynamically to catch up and make sure that all data is delivered to the destination.
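
The size-or-interval rule can be pictured with a small simulation. This is a toy model under stated assumptions, not how the service is implemented: the `Buffer` class is hypothetical, time is passed in explicitly, and the interval is only checked when a record arrives (a real service would also flush on a timer).

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    size_limit_bytes: int
    interval_seconds: float
    started_at: float = 0.0
    buffered_bytes: int = 0
    records: list = field(default_factory=list)

    def add(self, record: bytes, now: float):
        """Buffer a record; return the flushed batch if a condition is met, else None."""
        if not self.records:
            self.started_at = now  # the interval starts with the first buffered record
        self.records.append(record)
        self.buffered_bytes += len(record)

        size_hit = self.buffered_bytes >= self.size_limit_bytes
        time_hit = now - self.started_at >= self.interval_seconds
        if size_hit or time_hit:  # whichever condition is satisfied first
            batch = self.records
            self.records, self.buffered_bytes = [], 0
            return batch
        return None


buf = Buffer(size_limit_bytes=10, interval_seconds=60)
print(buf.add(b"1234", now=0))   # None: 4 bytes buffered
print(buf.add(b"5678", now=1))   # None: 8 bytes buffered
print(buf.add(b"90", now=2))     # flushed by size: 10 bytes reached
print(buf.add(b"a", now=100))    # None: new interval starts at t=100
print(buf.add(b"b", now=160))    # flushed by interval: 60 seconds elapsed
```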

17
Q

How is buffer size applied if I choose to compress my data?

Creating Delivery Streams

A

Buffer size is applied before compression. As a result, if you choose to compress your data, the size of the objects within your Amazon S3 bucket can be smaller than the buffer size you specify.
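
A quick way to see the effect: fill a 1 MiB buffer with sample log lines, compress it, and compare sizes. The log line is made up for illustration, and real compression ratios depend entirely on the data.

```python
import gzip

buffer_size = 1024 * 1024  # 1 MiB buffer, measured before compression

# Fill the buffer with repetitive sample log lines (illustrative data only).
line = b'{"level":"INFO","msg":"request served"}\n'
buffered = (line * 30000)[:buffer_size]

compressed = gzip.compress(buffered)

print(len(buffered))                     # 1048576
print(len(compressed) < len(buffered))   # True: the delivered object is smaller
```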

18
Q

What is the IAM role that I need to specify while creating a delivery stream?

Creating Delivery Streams

A

Amazon Kinesis Data Firehose assumes the IAM role you specify to access resources such as your Amazon S3 bucket and Amazon Elasticsearch domain. For more information, see Controlling Access with Amazon Kinesis Data Firehose in the Amazon Kinesis Data Firehose developer guide.

19
Q

What privilege is required for the Amazon Redshift user that I need to specify while creating a delivery stream?

Creating Delivery Streams

A

The Amazon Redshift user needs to have the Redshift INSERT privilege for copying data from your Amazon S3 bucket to your Redshift cluster.

20
Q

What do I need to do if my Amazon Redshift cluster is within a VPC?

Creating Delivery Streams

A

If your Amazon Redshift cluster is within a VPC, you need to grant Amazon Kinesis Data Firehose access to your Redshift cluster by unblocking Firehose IP addresses from your VPC. Firehose currently uses one CIDR block for each available AWS Region: 52.70.63.192/27 for US East (N. Virginia), 52.89.255.224/27 for US West (Oregon), and 52.19.239.192/27 for EU (Ireland). For information about how to unblock IPs to your VPC, see Grant Firehose Access to an Amazon Redshift Destination in the Amazon Kinesis Data Firehose developer guide.
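
To sanity-check a firewall or security-group rule, you can test whether an address falls inside one of the CIDR blocks listed above. The sketch uses only Python's standard `ipaddress` module; the region codes are the standard identifiers for the three named Regions, the helper name is hypothetical, and the CIDR values are copied from this card, so confirm them against the current developer guide.

```python
import ipaddress

# CIDR blocks as listed on this card; verify against current documentation.
FIREHOSE_CIDRS = {
    "us-east-1": "52.70.63.192/27",   # US East (N. Virginia)
    "us-west-2": "52.89.255.224/27",  # US West (Oregon)
    "eu-west-1": "52.19.239.192/27",  # EU (Ireland)
}


def is_firehose_address(ip: str, region: str) -> bool:
    """Return True if `ip` lies inside the Firehose CIDR block for `region`."""
    network = ipaddress.ip_network(FIREHOSE_CIDRS[region])
    return ipaddress.ip_address(ip) in network


print(is_firehose_address("52.70.63.200", "us-east-1"))  # True
print(is_firehose_address("52.70.63.224", "us-east-1"))  # False: just past the /27
print(ipaddress.ip_network(FIREHOSE_CIDRS["us-east-1"]).num_addresses)  # 32
```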

21
Q

Why do I need to provide an Amazon S3 bucket while choosing Amazon Redshift as destination?

Creating Delivery Streams

A

For an Amazon Redshift destination, Amazon Kinesis Data Firehose delivers data to your Amazon S3 bucket first and then issues a Redshift COPY command to load the data from your S3 bucket into your Redshift cluster.

22
Q

What is index rotation for Amazon Elasticsearch Service destination?

Creating Delivery Streams

A

Amazon Kinesis Data Firehose can rotate your Amazon Elasticsearch Service index based on a time duration. You can configure this time duration while creating your delivery stream. For more information, see Index Rotation for the Amazon ES Destination in the Amazon Kinesis Data Firehose developer guide.

23
Q

Why do I need to provide an Amazon S3 bucket when choosing Amazon Elasticsearch Service as destination?

Creating Delivery Streams

A

When loading data into Amazon Elasticsearch Service, Amazon Kinesis Data Firehose can back up all of the data or only the data that failed to deliver. To take advantage of this feature and prevent any data loss, you need to provide a backup Amazon S3 bucket.

24
Q

Can I change the configurations of my delivery stream after it’s created?

Preparing and Transforming Data in Amazon Kinesis Data Firehose

A

You can change the configuration of your delivery stream at any time after it’s created. You can do so by using the Firehose Console or the UpdateDestination operation. Your delivery stream remains in ACTIVE state while your configurations are updated and you can continue to send data to your delivery stream. The updated configurations normally take effect within a few minutes.

25
Q

How do I prepare and transform raw data in Amazon Kinesis Data Firehose?

Preparing and Transforming Data in Amazon Kinesis Data Firehose

A

Amazon Kinesis Data Firehose allows you to use an AWS Lambda function to prepare and transform incoming raw data in your delivery stream before loading it to destinations. You can configure an AWS Lambda function for data transformation when you create a new delivery stream or when you edit an existing delivery stream.

26
Q

How do I return prepared and transformed data from my AWS Lambda function back to Amazon Kinesis Data Firehose?

Preparing and Transforming Data in Amazon Kinesis Data Firehose

A

All transformed records from Lambda must be returned to Firehose with the following three parameters; otherwise, Firehose will reject the records and treat them as data transformation failure.

recordId: Firehose passes a recordId along with each record to Lambda during the invocation. Each transformed record should be returned with the exact same recordId. Any mismatch between the original recordId and returned recordId will be treated as data transformation failure.

result: The status of the transformation result of each record. The following values are allowed for this parameter: “Ok” if the record is transformed successfully as expected; “Dropped” if your processing logic intentionally drops the record as expected; “ProcessingFailed” if the record cannot be transformed as expected. Firehose treats returned records with “Ok” and “Dropped” statuses as successfully processed records, and the ones with “ProcessingFailed” status as unsuccessfully processed records when it generates SucceedProcessing.Records and SucceedProcessing.Bytes metrics.

data: The transformed data payload after Base64 encoding.
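
The three parameters above can be sketched as a small Lambda handler. This is an illustrative example rather than one of the provided blueprints: it assumes each incoming record carries Base64-encoded JSON under `data`, and the `level` and `processed` fields are invented for the demonstration.

```python
import base64
import json


def lambda_handler(event, context):
    """Return every record with its original recordId, a result, and Base64 data."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]  # must be echoed back unchanged
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Could not parse the record: report a transformation failure.
            output.append({"recordId": record_id,
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        if payload.get("level") == "DEBUG":  # hypothetical filter rule
            # Intentionally dropped by the processing logic.
            output.append({"recordId": record_id,
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        payload["processed"] = True  # hypothetical transformation
        encoded = base64.b64encode(
            (json.dumps(payload) + "\n").encode("utf-8")
        ).decode("utf-8")
        output.append({"recordId": record_id, "result": "Ok", "data": encoded})

    return {"records": output}
```

Returning the original payload for "Dropped" and "ProcessingFailed" records is a design choice in this sketch; what matters for the contract described above is that the recordId matches and the result value is one of the three allowed strings.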

27
Q

What Lambda blueprints are available for data preparation and transformation?

Preparing and Transforming Data in Amazon Kinesis Data Firehose

A

Firehose provides the following Lambda blueprints that you can use to create your Lambda function for data transformation:

General Firehose Processing: This blueprint contains the data transformation and status model described above. Use this blueprint for any custom transformation logic.

Apache Log to JSON: This blueprint parses and converts Apache log lines into JSON objects, with predefined JSON field names.

Apache Log to CSV: This blueprint parses and converts Apache log lines into CSV format.

Syslog to JSON: This blueprint parses and converts Syslog lines into JSON objects, with predefined JSON field names.

Syslog to CSV: This blueprint parses and converts Syslog lines into CSV format.

28
Q

Can I keep a copy of all the raw data in my S3 bucket?

Adding Data to Delivery Streams

A

Yes, Firehose can back up all un-transformed records to your S3 bucket concurrently while delivering transformed records to destination. Source record backup can be enabled when you create or update your delivery stream.

29
Q

How do I add data to my Amazon Kinesis Data Firehose delivery stream?

Adding Data to Delivery Streams

A

You can add data to an Amazon Kinesis Data Firehose delivery stream through Amazon Kinesis Agent or Firehose’s PutRecord and PutRecordBatch operations. Kinesis Data Firehose is also integrated with other AWS data sources such as Kinesis Data Streams, AWS IoT, Amazon CloudWatch Logs, and Amazon CloudWatch Events.

30
Q

What is Amazon Kinesis Agent?

Adding Data to Delivery Streams

A

Amazon Kinesis Agent is a pre-built Java application that offers an easy way to collect and send data to your delivery stream. You can install the agent on Linux-based server environments such as web servers, log servers, and database servers. The agent monitors certain files and continuously sends data to your delivery stream. For more information, see Writing with Agents.

31
Q

What platforms does Amazon Kinesis Agent support?

Adding Data to Delivery Streams

A

Amazon Kinesis Agent currently supports Amazon Linux and Red Hat Enterprise Linux.

32
Q

Where do I get Amazon Kinesis Agent?

Adding Data to Delivery Streams

A

You can download and install Amazon Kinesis Agent using the following command and link:

On Amazon Linux: sudo yum install -y aws-kinesis-agent

On Red Hat Enterprise Linux: sudo yum install -y https://s3.amazonaws.com/streaming-data-agent/aws-kinesis-agent-latest.amzn1.noarch.rpm

From GitHub: awslabs/amazon-kinesis-agent

33
Q

How do I use Amazon Kinesis Agent?

Adding Data to Delivery Streams

A

After installing Amazon Kinesis Agent on your servers, you can configure it to monitor certain files on the disk and then continuously send new data to your delivery stream. For more information, see Writing with Agents.

34
Q

What is the difference between PutRecord and PutRecordBatch operations?

Adding Data to Delivery Streams

A

The PutRecord operation sends a single data record in an API call, and the PutRecordBatch operation sends multiple data records in an API call. For more information, see PutRecord and PutRecordBatch.
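
When many records are waiting, a producer usually groups them into batches before calling PutRecordBatch. The sketch below shows only the grouping step. The per-call caps used as defaults (500 records and 4 MiB) are my understanding of the PutRecordBatch limits and are not stated in this deck, so treat them as assumptions to verify; `chunk_records` is a hypothetical helper.

```python
def chunk_records(records, max_records=500, max_bytes=4 * 1024 * 1024):
    """Yield lists of records that respect both the count cap and the byte cap."""
    batch, batch_bytes = [], 0
    for record in records:
        would_overflow = (len(batch) >= max_records
                          or batch_bytes + len(record) > max_bytes)
        if batch and would_overflow:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += len(record)
    if batch:
        yield batch


records = [b"0123456789"] * 1201
print([len(b) for b in chunk_records(records)])  # [500, 500, 201]
```

Each yielded batch would then be sent in one PutRecordBatch call, while a single urgent record could go through PutRecord instead.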

35
Q

What programming languages or platforms can I use to access Amazon Kinesis Data Firehose API?

Adding Data to Delivery Streams

A

Amazon Kinesis Data Firehose API is available in Amazon Web Services SDKs. For a list of programming languages or platforms for Amazon Web Services SDKs, see Tools for Amazon Web Services.

36
Q

How do I add data to my Firehose delivery stream from my Kinesis stream?

Adding Data to Delivery Streams

A

When you create or update your delivery stream through AWS console or Firehose APIs, you can configure a Kinesis stream as the source of your delivery stream. Once configured, Firehose will automatically read data from your Kinesis stream and load the data to specified destinations.

37
Q

How often does Kinesis Data Firehose read data from my Kinesis stream?

Adding Data to Delivery Streams

A

Kinesis Data Firehose calls Kinesis Data Streams GetRecords() once every second for each Kinesis shard.

38
Q

From where does Kinesis Data Firehose read data when my Kinesis stream is configured as the source of my delivery stream?

Adding Data to Delivery Streams

A

Kinesis Data Firehose starts reading data from the LATEST position of your Kinesis data stream when it’s configured as the source of a delivery stream. For more information about Kinesis data stream position, see GetShardIterator in the Kinesis Data Streams Service API Reference.

39
Q

Can I configure my Kinesis data stream to be the source of multiple Firehose delivery streams?

Adding Data to Delivery Streams

A

Yes, you can. However, note that the GetRecords() call from Kinesis Data Firehose is counted against the overall throttling limit of your Kinesis shard, so you need to plan your delivery stream along with your other Kinesis applications to make sure you won’t get throttled. For more information, see Kinesis Data Streams Limits in the Kinesis Data Streams developer guide.

40
Q

Can I still add data to delivery stream through Kinesis Agent or Firehose’s PutRecord and PutRecordBatch operations when my Kinesis data stream is configured as source?

Adding Data to Delivery Streams

A

No, you cannot. When a Kinesis data stream is configured as the source of a Firehose delivery stream, Firehose’s PutRecord and PutRecordBatch operations will be disabled. You should add data to your Kinesis data stream through the Kinesis Data Streams PutRecord and PutRecords operations instead.

41
Q

How do I add data to my delivery stream from AWS IoT?

Adding Data to Delivery Streams

A

You add data to your delivery stream from AWS IoT by creating an AWS IoT action that sends events to your delivery stream. For more information, see Writing to Amazon Kinesis Data Firehose Using AWS IoT in the Kinesis Data Firehose developer guide.

42
Q

How do I add data to my delivery stream from CloudWatch Logs?

Adding Data to Delivery Streams

A

You add data to your Firehose delivery stream from CloudWatch Logs by creating a CloudWatch Logs subscription filter that sends events to your delivery stream. For more information, see Using CloudWatch Logs Subscription Filters in the Amazon CloudWatch user guide.

43
Q

How do I add data to my Amazon Kinesis Data Firehose delivery stream from CloudWatch Events?

Data Delivery by Amazon Kinesis Data Firehose

A

You add data to your Firehose delivery stream from CloudWatch Events by creating a CloudWatch Events rule with your delivery stream as target. For more information, see Writing to Amazon Kinesis Data Firehose Using CloudWatch Events in the Kinesis Data Firehose developer guide.

44
Q

How often does Amazon Kinesis Data Firehose deliver data to my Amazon S3 bucket?

Data Delivery by Amazon Kinesis Data Firehose

A

The frequency of data delivery to Amazon S3 is determined by the S3 buffer size and buffer interval value you configured for your delivery stream. Amazon Kinesis Data Firehose buffers incoming data before delivering it to Amazon S3. You can configure the values for S3 buffer size (1 MB to 128 MB) or buffer interval (60 to 900 seconds), and the condition satisfied first triggers data delivery to Amazon S3. Note that in circumstances where data delivery to the destination is falling behind data ingestion into the delivery stream, Amazon Kinesis Data Firehose raises the buffer size automatically to catch up and make sure that all data is delivered to the destination.

45
Q

How often does Amazon Kinesis Data Firehose deliver data to my Amazon Redshift cluster?

Data Delivery by Amazon Kinesis Data Firehose

A

For an Amazon Redshift destination, Amazon Kinesis Data Firehose delivers data to your Amazon S3 bucket first and then issues a Redshift COPY command to load the data from your S3 bucket into your Redshift cluster. The frequency of COPY operations from Amazon S3 to Amazon Redshift is determined by how fast your Redshift cluster can finish the COPY command. If there is still data to copy, Firehose issues a new COPY command as soon as your Redshift cluster successfully finishes the previous one.

46
Q

How often does Amazon Kinesis Data Firehose deliver data to my Amazon Elasticsearch domain?

Data Delivery by Amazon Kinesis Data Firehose

A

The frequency of data delivery to Amazon Elasticsearch Service is determined by the Elasticsearch buffer size and buffer interval values that you configured for your delivery stream. Firehose buffers incoming data before delivering it to Amazon Elasticsearch Service. You can configure the values for Elasticsearch buffer size (1 MB to 100 MB) or buffer interval (60 to 900 seconds), and the condition satisfied first triggers data delivery to Amazon Elasticsearch Service. Note that in circumstances where data delivery to the destination is falling behind data ingestion into the delivery stream, Amazon Kinesis Data Firehose raises the buffer size automatically to catch up and make sure that all data is delivered to the destination.

47
Q

How is data organized in my Amazon S3 bucket?

Data Delivery by Amazon Kinesis Data Firehose

A

Amazon Kinesis Data Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH before putting objects to Amazon S3. The prefix translates into an Amazon S3 folder structure, where each label separated by a forward slash (/) becomes a sub-folder. You can modify this folder structure by adding your own top-level folder with a forward slash (for example, myApp/YYYY/MM/DD/HH) or prepending text to the YYYY top-level folder name (for example, myApp YYYY/MM/DD/HH). This is accomplished by specifying an S3 Prefix when creating your delivery stream.
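
The prefix layout can be reproduced locally, for example to predict where objects for a given hour will land. The helper name `s3_time_prefix` is hypothetical, and the trailing slash is added here only so the result can be used directly as a folder path.

```python
from datetime import datetime, timedelta, timezone


def s3_time_prefix(when: datetime, custom_prefix: str = "") -> str:
    """Build '<custom_prefix>YYYY/MM/DD/HH/' from a timezone-aware datetime, in UTC."""
    utc = when.astimezone(timezone.utc)
    return f"{custom_prefix}{utc:%Y/%m/%d/%H}/"


moment = datetime(2024, 3, 5, 7, 30, tzinfo=timezone.utc)
print(s3_time_prefix(moment))            # 2024/03/05/07/
print(s3_time_prefix(moment, "myApp/"))  # myApp/2024/03/05/07/

# A local time two hours ahead of UTC maps to the previous UTC day here.
local = datetime(2024, 3, 5, 1, 0, tzinfo=timezone(timedelta(hours=2)))
print(s3_time_prefix(local))             # 2024/03/04/23/
```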

48
Q

What is the naming pattern of the Amazon S3 objects delivered by Amazon Kinesis Data Firehose?

Data Delivery by Amazon Kinesis Data Firehose

A

The Amazon S3 object name follows the pattern DeliveryStreamName-DeliveryStreamVersion-YYYY-MM-DD-HH-MM-SS-RandomString, where DeliveryStreamVersion begins with 1 and increases by 1 for every configuration change of the delivery stream. You can change delivery stream configurations (for example, the name of the S3 bucket, buffering hints, compression, and encryption) with the Firehose Console or the UpdateDestination operation.
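
For illustration, the naming pattern can be assembled as follows. The function name and the sample values are hypothetical, and the form of RandomString is not specified on this card, so it is passed in as an opaque string.

```python
from datetime import datetime


def s3_object_name(stream_name: str, version: int,
                   when: datetime, random_string: str) -> str:
    """Format DeliveryStreamName-DeliveryStreamVersion-YYYY-MM-DD-HH-MM-SS-RandomString."""
    return f"{stream_name}-{version}-{when:%Y-%m-%d-%H-%M-%S}-{random_string}"


name = s3_object_name("my-stream", 1, datetime(2024, 3, 5, 7, 30, 15), "abc123")
print(name)  # my-stream-1-2024-03-05-07-30-15-abc123
```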

49
Q

What is the manifests folder in my Amazon S3 bucket?

Data Delivery by Amazon Kinesis Data Firehose

A

For an Amazon Redshift destination, Amazon Kinesis Data Firehose generates manifest files to load Amazon S3 objects into your Redshift cluster in batches. The manifests folder stores the manifest files generated by Firehose.

50
Q

What do backed-up Elasticsearch documents look like in my Amazon S3 bucket?

Data Delivery by Amazon Kinesis Data Firehose

A

If “all documents” mode is used, Amazon Kinesis Data Firehose concatenates multiple incoming records based on buffering configuration of your delivery stream, and then delivers them to your S3 bucket as an S3 object. Regardless of which backup mode is configured, the failed documents are delivered to your S3 bucket using a certain JSON format that provides additional information such as error code and time of delivery attempt. For more information, see Amazon S3 Backup for the Amazon ES Destination in the Amazon Kinesis Data Firehose developer guide.

51
Q

Can a single delivery stream deliver data to multiple Amazon S3 buckets?

Data Delivery by Amazon Kinesis Data Firehose

A

A single delivery stream can only deliver data to one Amazon S3 bucket currently. If you want to have data delivered to multiple S3 buckets, you can create multiple delivery streams.

52
Q

Can a single delivery stream deliver data to multiple Amazon Redshift clusters or tables?

Data Delivery by Amazon Kinesis Data Firehose

A

A single delivery stream can only deliver data to one Amazon Redshift cluster and one table currently. If you want to have data delivered to multiple Redshift clusters or tables, you can create multiple delivery streams.

53
Q

Can a single delivery stream deliver data to multiple Amazon Elasticsearch Service domains or indexes?

Troubleshooting and Managing Amazon Kinesis Data Firehose

A

A single delivery stream can only deliver data to one Amazon Elasticsearch Service domain and one index currently. If you want to have data delivered to multiple Amazon Elasticsearch domains or indexes, you can create multiple delivery streams.

54
Q

Why do I get throttled when sending data to my Amazon Kinesis Data Firehose delivery stream?

Troubleshooting and Managing Amazon Kinesis Data Firehose

A

By default, each delivery stream can take in up to 2,000 transactions/second, 5,000 records/second, and 5 MB/second. You are throttled when your producers exceed any of these limits. You can have the limits increased by submitting a service limit increase form.
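
A small check like the one below can help spot which default limit a workload would hit first. The numbers come from this card; the dictionary keys and the function name are hypothetical.

```python
# Default per-stream limits as quoted on this card (per second).
DEFAULT_LIMITS = {"transactions": 2000, "records": 5000, "megabytes": 5}


def exceeded_limits(per_second: dict) -> list:
    """Return the names of the default limits that the given rates exceed."""
    return [name for name, limit in DEFAULT_LIMITS.items()
            if per_second.get(name, 0) > limit]


workload = {"transactions": 100, "records": 6000, "megabytes": 4.5}
print(exceeded_limits(workload))  # ['records']
```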

55
Q

Why do I see duplicated records in my Amazon S3 bucket, Amazon Redshift table, or Amazon Elasticsearch index?

Troubleshooting and Managing Amazon Kinesis Data Firehose

A

Amazon Kinesis Data Firehose uses at-least-once semantics for data delivery. In rare circumstances, such as a request timeout during a data delivery attempt, a delivery retry by Firehose could introduce duplicates if the previous request eventually goes through.

56
Q

What happens if data delivery to my Amazon S3 bucket fails?

Troubleshooting and Managing Amazon Kinesis Data Firehose

A

If data delivery to your Amazon S3 bucket fails, Amazon Kinesis Data Firehose retries delivery every 5 seconds for up to a maximum period of 24 hours. If the issue continues beyond the 24-hour maximum retention period, Firehose discards the data.

57
Q

What happens if data delivery to my Amazon Redshift cluster fails?

Troubleshooting and Managing Amazon Kinesis Data Firehose

A

If data delivery to your Amazon Redshift cluster fails, Amazon Kinesis Data Firehose retries data delivery every 5 minutes for up to a maximum period of 60 minutes. After 60 minutes, Amazon Kinesis Data Firehose skips the current batch of S3 objects that are ready for COPY and moves on to the next batch. The information about the skipped objects is delivered to your S3 bucket as a manifest file in the errors folder, which you can use for manual backfill. For information about how to COPY data manually with manifest files, see Using a Manifest to Specify Data Files.
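
To put the retry schedules for S3 and Redshift in perspective, the arithmetic below computes an upper bound on the number of retry attempts, assuming retries are evenly spaced at the stated interval for the whole period (an assumption; the card does not say how attempts are scheduled internally).

```python
def max_retry_attempts(interval_seconds: int, max_period_seconds: int) -> int:
    """Upper bound on attempts when retrying at a fixed interval for a fixed period."""
    return max_period_seconds // interval_seconds


# Amazon S3: every 5 seconds for up to 24 hours.
print(max_retry_attempts(5, 24 * 60 * 60))   # 17280

# Amazon Redshift: every 5 minutes for up to 60 minutes.
print(max_retry_attempts(5 * 60, 60 * 60))   # 12
```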

58
Q

What happens if data delivery to my Amazon Elasticsearch domain fails?

Troubleshooting and Managing Amazon Kinesis Data Firehose

A

For an Amazon Elasticsearch Service destination, you can specify a retry duration between 0 and 7200 seconds when creating the delivery stream. If data delivery to your Amazon ES domain fails, Amazon Kinesis Data Firehose retries data delivery for the specified time duration. After the retry period, Amazon Kinesis Data Firehose skips the current batch of data and moves on to the next batch. Details on skipped documents are delivered to your S3 bucket in the elasticsearch_failed folder, which you can use for manual backfill.

59
Q

What happens if there is a data transformation failure?

Troubleshooting and Managing Amazon Kinesis Data Firehose

A

There are two types of failure scenarios when Firehose attempts to invoke your Lambda function for data transformation:

The first type is when the function invocation fails for reasons such as a network timeout or hitting Lambda invocation limits. Under these failure scenarios, Firehose retries the invocation three times by default and then skips that particular batch of records. The skipped records are treated as unsuccessfully processed records. You can configure the number of invocation retries between 0 and 300 using the CreateDeliveryStream and UpdateDestination APIs. For this type of failure, you can also use Firehose’s error logging feature to emit invocation errors to CloudWatch Logs. For more information, see Monitoring with Amazon CloudWatch Logs.

The second type of failure scenario occurs when a record’s transformation result is set to “ProcessingFailed” when it is returned from your Lambda function. Firehose treats these records as unsuccessfully processed records. For this type of failure, you can use Lambda’s logging feature to emit error logs to CloudWatch Logs. For more information, see Accessing Amazon CloudWatch Logs for AWS Lambda.

For both types of failure scenarios, the unsuccessfully processed records are delivered to your S3 bucket in the processing_failed folder.
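The second failure type can be sketched with a minimal transformation function. This is an illustration only: it assumes the incoming records carry base64-encoded JSON objects, and the added "processed" field is a made-up transformation. Records that cannot be parsed are returned with the “ProcessingFailed” result so that Firehose treats them as unsuccessfully processed.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record, or mark it as ProcessingFailed."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical transformation step
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, TypeError):
            # Bad base64, invalid JSON, or a non-object payload:
            # hand the original data back and flag the record as failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Every recordId received must appear in the response, which is why failed records are returned rather than dropped from the list.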

60
Q

Why is the size of delivered S3 objects larger than the buffer size I specified in my delivery stream configuration?

Troubleshooting and Managing Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose | Analytics

A

Most of the time, the size of delivered S3 objects reflects the specified buffer size, provided the buffer size condition is satisfied before the buffer interval condition. However, when data delivery to the destination falls behind data writing to the delivery stream, Firehose raises the buffer size dynamically to catch up and make sure that all data is delivered to the destination. In these circumstances, the size of delivered S3 objects might be larger than the specified buffer size.

61
Q

What is the errors folder in my Amazon S3 bucket?

Troubleshooting and Managing Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose | Analytics

A

The errors folder stores manifest files that contain information about the S3 objects that failed to load to your Amazon Redshift cluster. You can reload these objects manually with the Redshift COPY command. For information about how to COPY data manually with manifest files, see Using a Manifest to Specify Data Files.
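A manual reload might look roughly like the statement assembled below. This is a sketch under assumptions: the helper name, table name, manifest path, and role ARN are all placeholders, and the inputs are not escaped, so it should only be used with trusted values.

```python
def build_copy_statement(table: str, manifest_url: str, iam_role_arn: str) -> str:
    """Assemble a Redshift COPY statement that loads from a manifest file.

    Hypothetical helper: all arguments are placeholders supplied by the caller.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Placeholder values for illustration only.
statement = build_copy_statement(
    "my_table",
    "s3://my-bucket/errors/my-manifest",
    "arn:aws:iam::123456789012:role/my-redshift-role",
)
```

In practice, the statement would likely also need the same format options (for example compression or JSON settings) that the delivery stream uses for its own COPY.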

62
Q

What is the elasticsearch_failed folder in my Amazon S3 bucket?

Troubleshooting and Managing Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose | Analytics

A

The elasticsearch_failed folder stores the documents that failed to load to your Amazon Elasticsearch domain. You can re-index these documents manually for backfill.

63
Q

What is the processing_failed folder in my Amazon S3 bucket?

Troubleshooting and Managing Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose | Analytics

A

The processing_failed folder stores the records that failed to transform in your AWS Lambda function. You can re-process these records manually.

64
Q

How do I monitor the operations and performance of my Amazon Kinesis Data Firehose delivery stream?

Troubleshooting and Managing Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose | Analytics

A

The Firehose Console displays key operational and performance metrics such as incoming data volume and delivered data volume. Amazon Kinesis Data Firehose also integrates with Amazon CloudWatch Metrics so that you can collect, view, and analyze metrics for your delivery streams. For more information about Amazon Kinesis Data Firehose metrics, see Monitoring with Amazon CloudWatch Metrics in the Amazon Kinesis Data Firehose developer guide.

65
Q

How do I monitor data transformation and delivery failures of my Amazon Kinesis Data Firehose delivery stream?

Troubleshooting and Managing Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose | Analytics

A

Amazon Kinesis Data Firehose integrates with Amazon CloudWatch Logs so that you can view the specific error logs if data transformation or delivery fails. You can enable error logging when creating your delivery stream. For more information, see Monitoring with Amazon CloudWatch Logs in the Amazon Kinesis Data Firehose developer guide.
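When a delivery stream is created through the API rather than the console, error logging is switched on with a CloudWatchLoggingOptions block inside the destination configuration. The fragment below is a minimal sketch. The helper name and the log group and stream names are placeholders, and the key names assume the CreateDeliveryStream request shape in the AWS SDK for Python.

```python
def error_logging_options(log_group: str, log_stream: str) -> dict:
    """Build the error-logging fragment of a destination configuration.

    Hypothetical helper: the log group and log stream are assumed to exist.
    """
    return {
        "CloudWatchLoggingOptions": {
            "Enabled": True,
            "LogGroupName": log_group,
            "LogStreamName": log_stream,
        }
    }


# Placeholder names for illustration only.
logging_fragment = error_logging_options("/aws/kinesisfirehose/my-stream", "S3Delivery")
```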

66
Q

How do I manage and control access to my Amazon Kinesis Data Firehose delivery stream?

Troubleshooting and Managing Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose | Analytics

A

Amazon Kinesis Data Firehose integrates with AWS Identity and Access Management, a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to add data to your Firehose delivery stream. For more information about access management and control of your stream, see Controlling Access with Amazon Kinesis Data Firehose.
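The policy mentioned in the example could look something like the following sketch, which allows only the two write operations on a single delivery stream. The helper name, region, account ID, and stream name are placeholders chosen for illustration.

```python
import json


def put_only_policy(region: str, account_id: str, stream_name: str) -> dict:
    """Build an IAM policy document that only allows adding data to one stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
                "Resource": (
                    f"arn:aws:firehose:{region}:{account_id}:"
                    f"deliverystream/{stream_name}"
                ),
            }
        ],
    }


# Placeholder values for illustration only.
policy = put_only_policy("us-east-1", "123456789012", "my-stream")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON can then be attached to the user or group that should be allowed to write to the stream.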

67
Q

How do I log API calls made to my Amazon Kinesis Data Firehose delivery stream for security analysis and operational troubleshooting?

Troubleshooting and Managing Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose | Analytics

A

Amazon Kinesis Data Firehose integrates with AWS CloudTrail, a service that records AWS API calls for your account and delivers log files to you. For more information about API call logging and a list of supported Amazon Kinesis Data Firehose API operations, see Logging Amazon Kinesis Data Firehose API calls Using Amazon CloudTrail.

68
Q

Is Amazon Kinesis Data Firehose available in the AWS Free Tier?

Pricing and Billing

Amazon Kinesis Data Firehose | Analytics

A

No. Amazon Kinesis Data Firehose is not currently available in the AWS Free Tier. AWS Free Tier is a program that offers a free trial for a group of AWS services. For more details, see AWS Free Tier.

69
Q

How much does Amazon Kinesis Data Firehose cost?

Pricing and Billing

Amazon Kinesis Data Firehose | Analytics

A

Amazon Kinesis Data Firehose uses simple pay-as-you-go pricing. There are no upfront costs or minimum fees, and you pay only for the resources you use. Amazon Kinesis Data Firehose pricing is based on the data volume (GB) ingested by Firehose, with each record rounded up to the nearest 5KB. For more information about Amazon Kinesis Data Firehose cost, see Amazon Kinesis Data Firehose Pricing.

70
Q

When I use PutRecordBatch operation to send data to Amazon Kinesis Data Firehose, how is the 5KB roundup calculated?

Pricing and Billing

Amazon Kinesis Data Firehose | Analytics

A

The 5KB roundup is calculated at the record level rather than the API operation level. For example, if your PutRecordBatch call contains two 1KB records, the data volume from that call is metered as 10KB (5KB per record).
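The per-record rounding can be checked with a small calculation. The sketch below assumes that 1KB means 1024 bytes, so the rounding increment is 5120 bytes. The function name is made up for illustration.

```python
import math

INCREMENT_BYTES = 5 * 1024  # assumption: 5KB = 5120 bytes


def metered_bytes(record_sizes):
    """Sum the record sizes after rounding each record up to the next 5KB."""
    return sum(
        math.ceil(size / INCREMENT_BYTES) * INCREMENT_BYTES
        for size in record_sizes
    )


# Two 1KB records in one PutRecordBatch call: each is rounded up separately.
batch_total = metered_bytes([1024, 1024])  # 10240 bytes, that is 10KB
```

Rounding the combined 2KB payload once would give only 5KB, which is the difference between record-level and operation-level metering.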