Kinesis Consumers Flashcards
What are the 7 ways to consume data from a Kinesis stream?
- Kinesis SDK
- Kinesis Client Library (KCL)
- Kinesis Connector Library
- 3rd party libraries
- Kinesis Firehose
- AWS Lambda
- Kinesis Consumer Enhanced Fan Out
What are some examples of 3rd party libraries to consume data from a stream?
- Spark
- Log4J Appenders
- Flume
- Kafka Connect
What is the data read limit per shard?
2 MB per second per shard
How much data does the SDK GetRecords return?
Up to 10 MB (or up to 10,000 records) per call. A 10 MB response exceeds the 2 MB/s limit, so you must wait 5 seconds before the next call
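The 5-second wait follows from dividing the response size by the per-shard read limit. A minimal sketch of that arithmetic, assuming 1 MB means 1024 * 1024 bytes (the function name is illustrative only):

```python
# Per-shard read limit for classic (polling) consumers, in bytes per second.
SHARD_READ_LIMIT_BYTES_PER_SEC = 2 * 1024 * 1024


def seconds_to_absorb(response_bytes: int) -> float:
    """Seconds of read budget that a single GetRecords response consumes."""
    return response_bytes / SHARD_READ_LIMIT_BYTES_PER_SEC


# A maximum-size 10 MB response uses up 5 seconds of the 2 MB/s budget.
print(seconds_to_absorb(10 * 1024 * 1024))  # 5.0
```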
How many GetRecords API calls can be made against a shard per second?
5 GetRecords API calls per second per shard
What is the GetRecords API latency?
About 200 ms
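A polling consumer built on the SDK might look roughly like the sketch below. It assumes `client` behaves like `boto3.client("kinesis")`; the function name, `max_polls`, and `interval_sec` are illustrative and not part of any AWS API.

```python
import time


def poll_shard(client, stream_name, shard_id, max_polls=3, interval_sec=0.2):
    """Poll one shard with GetRecords and return the records collected.

    `client` is assumed to behave like boto3.client("kinesis").
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest record
    )["ShardIterator"]

    collected = []
    for _ in range(max_polls):
        response = client.get_records(ShardIterator=iterator, Limit=1000)
        collected.extend(response["Records"])
        iterator = response.get("NextShardIterator")
        if iterator is None:  # the shard was closed and fully read
            break
        # 5 calls per second per shard means at most one call every 200 ms.
        time.sleep(interval_sec)
    return collected
```

Sleeping 200 ms between calls keeps a single consumer within the 5 calls per second limit. With several consumers on the same shard, each one would need a longer interval.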
Can you increase throughput by adding more consumers to read from the same shard?
No. If more consumers read from the same shard, they share the 2 MB/s limit and the 5 API calls per second. These limits are per shard, not per consumer
How would you get around the issue of multiple consumers sharing the read limits of shards?
Use Enhanced Fan-Out, which gives each registered consumer its own 2 MB/s per shard
What are 6 features of the Kinesis Client Library?
- Exists for multiple languages such as Java, Node, Python
- Read records from the stream that were produced with the KPL (decode)
- Share multiple shards with multiple consumers in one group, and shard discovery
- Checkpointing system to resume progress
- Uses DynamoDB for Checkpointing
- Record Processors will process the data
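The checkpointing idea can be illustrated with a toy sketch. This is not the KCL API: the class and function names are hypothetical, a dictionary stands in for the DynamoDB table, and sequence numbers are assumed to be plain integers.

```python
class CheckpointStore:
    """In-memory stand-in for the DynamoDB table that KCL uses for checkpoints."""

    def __init__(self):
        self._last_seq = {}  # shard_id -> last processed sequence number

    def checkpoint(self, shard_id, sequence_number):
        self._last_seq[shard_id] = sequence_number

    def last_checkpoint(self, shard_id):
        return self._last_seq.get(shard_id)


def process_shard(shard_id, records, store, handle):
    """Process records after the last checkpoint, checkpointing each one."""
    last = store.last_checkpoint(shard_id)
    for seq, data in records:
        if last is not None and seq <= last:
            continue  # already processed before a restart
        handle(data)
        store.checkpoint(shard_id, seq)
```

Calling `process_shard` twice with overlapping records handles each record once, which is how a restarted worker resumes progress.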
What should you check if your KCL application is not reading fast enough even though the stream has enough throughput?
Check the DynamoDB checkpoint table. It may not have enough WCU/RCU to checkpoint efficiently, so increase its capacity
What are 4 features of using Lambda to read from a stream?
- Lambda has a library to de-aggregate records from KPL
- Lambda can be used to run lightweight ETL
- Lambda can be used to trigger notifications or other actions in real time
- Lambda has a configurable batch size
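A Lambda function receives a batch of records whose payloads are base64-encoded. The handler below is a minimal sketch of lightweight ETL, assuming each payload is a JSON object with a hypothetical `user` field. It does not de-aggregate KPL records.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and apply a small transform."""
    transformed = []
    for record in event["Records"]:
        # Lambda delivers the Kinesis payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)
        # Hypothetical lightweight ETL step: normalize one field.
        item["user"] = item["user"].strip().lower()
        transformed.append(item)
    return {"processed": len(transformed), "items": transformed}
```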
How do the read limits differ between classic Kinesis consumers and Enhanced Fan-Out?
Enhanced Fan-Out pushes data to consumers, so each consumer gets 2 MB per second per shard, whereas classic consumers share a single 2 MB per second per shard
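The difference can be made concrete with a small calculation. This sketch assumes classic consumers split the shard budget evenly, which is a simplification; the function names are illustrative.

```python
SHARD_READ_MB_PER_SEC = 2.0


def classic_mb_per_consumer(consumers_on_shard: int) -> float:
    """Classic consumers split one 2 MB/s budget per shard."""
    return SHARD_READ_MB_PER_SEC / consumers_on_shard


def fan_out_mb_per_consumer(consumers_on_shard: int) -> float:
    """Each enhanced fan-out consumer gets its own 2 MB/s per shard."""
    return SHARD_READ_MB_PER_SEC


# Columns: consumers on the shard, classic MB/s each, fan-out MB/s each.
for n in (1, 2, 4):
    print(n, classic_mb_per_consumer(n), fan_out_mb_per_consumer(n))
```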
What are 3 reasons you would choose Classic Consumers?
- Low number of consuming applications, fewer than 5
- You can tolerate latency of about 200ms
- Minimize cost
What are 4 reasons you would choose Enhanced Fan Out Consumers?
- Multiple consumers for the same stream
- Low latency of about 70ms
- You can accept the higher cost
- Default limit of 5 consumers using enhanced fan out per data stream
How would you divide a hot shard?
Use shard splitting
What is shard splitting?
It is the process of splitting an existing shard into 2 new shards, which increases the stream's capacity. The old shard is closed to new writes and is deleted once its data expires, while new records go to the 2 new shards
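A split needs a hash key at which the second new shard begins (the `NewStartingHashKey` parameter of the SplitShard API). A minimal sketch of choosing the midpoint for an even split follows; the helper name is hypothetical, and the keys are decimal strings as they appear in a shard's hash key range.

```python
def midpoint_hash_key(starting_hash_key: str, ending_hash_key: str) -> str:
    """Return a NewStartingHashKey that splits the range into two equal halves."""
    start, end = int(starting_hash_key), int(ending_hash_key)
    if end <= start:
        raise ValueError("range too small to split")
    # The second child shard starts here; the first child keeps [start, mid - 1].
    return str(start + (end - start + 1) // 2)


# A single shard covering the full 128-bit hash key space.
full_range_end = str(2**128 - 1)
print(midpoint_hash_key("0", full_range_end))  # 2**127 as a decimal string
```

For a hot shard, a key other than the midpoint may balance traffic better if the hot partition keys cluster in one part of the range.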
What is merging shards?
It is combining 2 shards into 1 to reduce the number of shards, for example to save cost when shards have little traffic
What happens when we merge shards?
The old shards are closed, and they are deleted once their data expires
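The MergeShards API only accepts two shards whose hash key ranges are adjacent. A sketch of that check, assuming shard dictionaries shaped like the entries returned by ListShards (the function name is illustrative):

```python
def are_adjacent(shard_a: dict, shard_b: dict) -> bool:
    """True if the two hash key ranges touch with no gap, in either order."""
    a, b = shard_a["HashKeyRange"], shard_b["HashKeyRange"]
    return (
        int(a["EndingHashKey"]) + 1 == int(b["StartingHashKey"])
        or int(b["EndingHashKey"]) + 1 == int(a["StartingHashKey"])
    )
```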
Is auto scaling possible in Kinesis?
It's possible but not easy. You can use the “UpdateShardCount” API, but there are manual steps
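A scaling helper built on UpdateShardCount might look like the sketch below. It assumes `client` behaves like `boto3.client("kinesis")` and that a single call may not go above double or below half the current shard count. The helper names and the clamping policy are illustrative, and how `desired` is chosen (for example from CloudWatch metrics) is left out.

```python
import math


def clamp_target(current: int, desired: int) -> int:
    """Keep the target between half and double the current shard count."""
    lower = max(1, math.ceil(current / 2))
    upper = current * 2
    return max(lower, min(desired, upper))


def scale_stream(client, stream_name: str, current: int, desired: int) -> int:
    """Request a new shard count; `client` is assumed to be boto3.client("kinesis")."""
    target = clamp_target(current, desired)
    if target != current:
        client.update_shard_count(
            StreamName=stream_name,
            TargetShardCount=target,
            ScalingType="UNIFORM_SCALING",
        )
    return target
```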
What are 3 main limitations of Kinesis scaling?
- Resharding cannot be done in parallel, so plan capacity in advance
- You can perform only 1 resharding operation at a time, and each one takes a few seconds
- For 1000 shards, doubling to 2000 shards takes about 30,000 seconds (roughly 30 seconds per split), which is 8.3 hours
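The 8.3-hour figure follows from splitting every shard once, one operation at a time. A short sketch of the arithmetic, where the 30 seconds per operation is inferred from 30,000 seconds for 1,000 splits rather than taken from a documented constant:

```python
SECONDS_PER_OPERATION = 30  # implied by the card: 30,000 s / 1,000 splits


def doubling_time_hours(shards: int) -> float:
    """Hours needed to split every shard once, one operation at a time."""
    return shards * SECONDS_PER_OPERATION / 3600


print(round(doubling_time_hours(1000), 1))  # 8.3
```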
What are 5 options for Kinesis Security?
- Control access using IAM policies
- Encryption in flight using HTTPS endpoints
- Encryption at rest using KMS
- Manually implemented Client Side encryption
- VPC endpoints to access within a VPC
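As an illustration of the IAM option, a read-only policy for a single stream could be assembled as shown below. The stream ARN, account ID, and the exact set of actions are assumptions for the example; a real consumer may need more (for instance, KCL also needs DynamoDB permissions).

```python
import json

# Hypothetical stream ARN; replace region, account ID, and stream name.
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStreamSummary",
                "kinesis:ListShards",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
            ],
            "Resource": STREAM_ARN,
        }
    ],
}

print(json.dumps(read_only_policy, indent=2))
```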