Kinesis Consumers Flashcards
What are the 7 ways to consume data from a Kinesis stream?
- Kinesis SDK
- Kinesis Client Library (KCL)
- Kinesis Connector Library
- 3rd party libraries
- Kinesis Firehose
- AWS Lambda
- Kinesis Consumer Enhanced Fan Out
What are some examples of 3rd party libraries to consume data from a stream?
- Spark
- Log4J Appenders
- Flume
- Kafka Connect
What is the read limit per shard?
2 MB per second per shard
How much data does the SDK GetRecords return?
Up to 10 MB per call. Because that exceeds the 2 MB/s shard limit, a call that returns the full 10 MB must be followed by a 5-second wait before the next call succeeds
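The 5-second figure is the largest response size divided by the per-shard read rate. A minimal sketch of that arithmetic; the constant and function names are hypothetical, chosen only for illustration:

```python
# Limits as stated in the cards above.
SHARD_READ_LIMIT_MB_PER_S = 2   # read throughput per shard
MAX_GET_RECORDS_MB = 10         # largest single GetRecords response

def seconds_until_next_read(returned_mb, limit_mb_per_s=SHARD_READ_LIMIT_MB_PER_S):
    """Seconds of read budget consumed by a response of the given size."""
    return returned_mb / limit_mb_per_s

print(seconds_until_next_read(MAX_GET_RECORDS_MB))  # 5.0
```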
How many GetRecords API calls can be made against a shard per second?
5 GetRecords API calls per second per shard
What is the GetRecords API latency?
About 200 ms
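One way to stay within 5 calls per second is to pause about 200 ms between GetRecords calls. The sketch below assumes `client` is a boto3 Kinesis client (for example `boto3.client("kinesis")`); `poll_shard`, `handle`, `max_polls`, and `interval_s` are hypothetical names, and retries and error handling are left out.

```python
import time

def poll_shard(client, stream_name, shard_id, handle, max_polls=10, interval_s=0.2):
    """Poll one shard with GetRecords, spacing calls to stay within 5 calls/second.

    `client` is assumed to be a boto3 Kinesis client.
    `handle` is a hypothetical callback invoked once per record.
    Returns the number of GetRecords calls made.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
    )["ShardIterator"]

    polls = 0
    while iterator and polls < max_polls:
        response = client.get_records(ShardIterator=iterator, Limit=1000)
        for record in response["Records"]:
            handle(record)
        # NextShardIterator is absent or None once a closed shard is fully read.
        iterator = response.get("NextShardIterator")
        polls += 1
        time.sleep(interval_s)  # 0.2 s between calls = at most 5 calls per second
    return polls
```

A single consumer polling this way uses the whole per-shard call budget, so a second poller on the same shard would push the pair over the limit.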
Can you increase throughput by adding more consumers to read from the same shard?
No. Consumers reading from the same shard share its 2 MB/s read limit and its 5 GetRecords calls per second, because both limits apply per shard
How would you get around the issue of multiple consumers sharing the read limits of shards?
Use Enhanced Fan-Out, which gives each consumer its own 2 MB/s per shard
What are 6 features of the Kinesis Client Library?
- Exists for multiple languages such as Java, Node.js, Python
- Reads records from the stream that were produced with the KPL (de-aggregation)
- Shares multiple shards among multiple consumers in one group, with shard discovery
- Provides a checkpointing system to resume progress
- Uses DynamoDB for checkpointing
- Record processors process the data
What should you check if your KCL application is not reading fast enough even though your stream has enough throughput?
The DynamoDB table used for checkpointing may not have enough WCU/RCU; increase its capacity so checkpointing does not slow the consumer down
What are 4 features of using Lambda to read from a stream?
- Lambda has a library to de-aggregate records from KPL
- Lambda can be used to run lightweight ETL
- Lambda can be used to trigger notifications or other downstream actions in real time
- Lambda has a configurable batch size
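As an illustration of the lightweight ETL case, here is a minimal handler sketch. It relies on the Kinesis event shape Lambda delivers, where each record's payload is base64-encoded under `record["kinesis"]["data"]`. The upper-casing step is a placeholder transformation, and the sketch assumes UTF-8 text payloads that were not aggregated by the KPL (aggregated records would need de-aggregation first).

```python
import base64

def handler(event, context):
    """Decode each Kinesis record in the batch and apply a trivial transform."""
    payloads = []
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded in the Lambda event.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(raw.decode("utf-8"))  # assumes UTF-8 text payloads
    # Placeholder ETL step: normalise every payload to upper case.
    return [p.upper() for p in payloads]
```

The number of entries in `event["Records"]` per invocation is bounded by the batch size configured on the event source mapping.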
What is the difference in read limits between classic consumers and Enhanced Fan-Out consumers?
Enhanced Fan-Out pushes data to consumers, so each consumer gets 2 MB per second per shard, whereas classic consumers share 2 MB per second per shard
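A small worked comparison of the two models for a single shard. The function names are hypothetical, and the classic figure assumes the shared 2 MB/s happens to be divided evenly among consumers, which the service does not guarantee.

```python
SHARD_LIMIT_MB_PER_S = 2  # per-shard read limit from the cards above

def classic_mb_per_consumer(num_consumers):
    """Classic consumers share one shard's 2 MB/s (assumed evenly split here)."""
    return SHARD_LIMIT_MB_PER_S / num_consumers

def fan_out_mb_per_consumer(num_consumers):
    """Each enhanced fan-out consumer gets its own 2 MB/s per shard."""
    return SHARD_LIMIT_MB_PER_S

for n in (1, 2, 4):
    print(n, classic_mb_per_consumer(n), fan_out_mb_per_consumer(n))
```

With 4 consumers on one shard, each classic consumer averages 0.5 MB/s under this assumption, while each fan-out consumer keeps 2 MB/s.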
What are 3 reasons you would choose Classic Consumers?
- Low number of consuming applications (fewer than 5)
- You can tolerate latency of about 200ms
- Minimize cost
What are 4 reasons you would choose Enhanced Fan Out Consumers?
- Multiple consumers for the same stream
- Low latency of about 70ms
- You can accept higher costs
- Default limit of 20 registered enhanced fan-out consumers per data stream (5 in older documentation)
How would you divide a hot shard?
Use shard splitting
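Splitting a shard means choosing a new starting hash key inside its hash key range. The sketch below splits the range at its midpoint, which is one simple choice and not necessarily the best one for a skewed key distribution. It assumes `client` is a boto3 Kinesis client and `shard` is one entry from the `Shards` list returned by DescribeStream; `midpoint_hash_key` and `split_hot_shard` are hypothetical helper names.

```python
def midpoint_hash_key(starting_hash_key, ending_hash_key):
    """Return the hash key halfway through a shard's range, as a decimal string."""
    start, end = int(starting_hash_key), int(ending_hash_key)
    return str((start + end) // 2)

def split_hot_shard(client, stream_name, shard):
    """Split `shard` at the midpoint of its hash key range into two child shards.

    `client` is assumed to be a boto3 Kinesis client.
    `shard` is assumed to be a shard description dict from DescribeStream.
    Returns the hash key at which the second child shard starts.
    """
    key_range = shard["HashKeyRange"]
    new_start = midpoint_hash_key(
        key_range["StartingHashKey"], key_range["EndingHashKey"]
    )
    client.split_shard(
        StreamName=stream_name,
        ShardToSplit=shard["ShardId"],
        NewStartingHashKey=new_start,
    )
    return new_start
```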