Kinesis Flashcards
What is Kinesis?
It is an AWS platform for ingesting streaming data. Kinesis makes it easy to load and analyse streaming data, and it lets you build custom applications for your business needs.
Which are the three core Kinesis Services?
- Kinesis Streams
- Kinesis Firehose
- Kinesis Analytics
What is the retention period for Kinesis Streams? What is the default?
- Between 24 hours and 7 days
- Default is 24 hours
What is the retention period for Kinesis Firehose?
There is none. As soon as data arrives in Firehose, it is either analysed using Lambda or sent directly to S3 (or another destination).
In short, how does Kinesis Streams work?
Kinesis Streams consists of shards. Consumers read the data from the shards and analyse or process it. The consumers can then forward the data to services such as DynamoDB, S3, EMR (Elastic MapReduce) or Redshift.
In short, how does Kinesis Firehose differ from Kinesis Streams?
Kinesis Firehose is automated: you don’t have to worry about Shards, data retention or data consumers. Firehose automatically sends the data to S3. You can optionally analyse the data using Lambda.
What is Kinesis Analytics?
Basically, it’s a way of analysing streaming data within Kinesis using an SQL-like language.
Kinesis Analytics allows you to run SQL queries on the data in Firehose or Streams. You can then store the results of those queries in S3, Redshift or an Elasticsearch cluster.
What is resharding?
It is when we change the number of shards in a stream: splitting a shard increases the count, and merging two shards decreases it.
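Resharding can be pictured in terms of the hash key range that each shard owns. The sketch below is a toy model only, not a call to the Kinesis API: the `Shard` class and the `split_shard` and `merge_shards` helpers are hypothetical names made up for illustration, and splitting at the midpoint is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_HASH_KEY = 2**128 - 1  # Kinesis hash keys are 128-bit integers


@dataclass(frozen=True)
class Shard:
    """Toy stand-in for a shard: only the inclusive hash key range it owns."""
    start: int
    end: int


def split_shard(shard: Shard) -> Tuple[Shard, Shard]:
    """Split one shard into two children at the midpoint of its range."""
    if shard.start >= shard.end:
        raise ValueError("range too small to split")
    mid = (shard.start + shard.end) // 2
    return Shard(shard.start, mid), Shard(mid + 1, shard.end)


def merge_shards(left: Shard, right: Shard) -> Shard:
    """Merge two adjacent shards into one shard covering both ranges."""
    if left.end + 1 != right.start:
        raise ValueError("shards must be adjacent to merge")
    return Shard(left.start, right.end)


whole = Shard(0, MAX_HASH_KEY)
low, high = split_shard(whole)     # 1 shard -> 2 shards (more capacity)
merged = merge_shards(low, high)   # 2 shards -> 1 shard (less capacity)
```

Splitting and then merging the same pair gives back the original range, which is why the two operations can be used to scale a stream up and down.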
How do you make sure to use a reasonable number of consumers (EC2 instances) for the number of shards you have?
You should use an Auto Scaling group and base scaling decisions on the CPU load on your consumers.
What is Kinesis Client Library?
- It runs on your consumers and creates a record processor for each shard that is being consumed by your consumers
- If you increase the number of shards, the KCL will add more record processors on your consumers
- KCL balances the record processors evenly across the consumers
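The balancing idea above can be sketched with a simple round-robin assignment. This is only a simulation of the outcome, assuming one record processor per shard; it does not use the KCL and is not how the KCL implements balancing. The `assign_shards` function and the shard and consumer names are hypothetical.

```python
from typing import Dict, List


def assign_shards(shards: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Spread shards round-robin so consumer loads differ by at most one."""
    if not consumers:
        raise ValueError("need at least one consumer")
    assignment: Dict[str, List[str]] = {c: [] for c in consumers}
    for i, shard in enumerate(shards):
        # Each shard gets exactly one record processor on one consumer.
        assignment[consumers[i % len(consumers)]].append(shard)
    return assignment


consumers = ["ec2-a", "ec2-b"]

# Four shards over two consumers: two record processors each.
before = assign_shards([f"shard-{n}" for n in range(4)], consumers)

# After resharding to six shards: three record processors each.
after = assign_shards([f"shard-{n}" for n in range(6)], consumers)
```

Because one consumer ends up holding several shards here, the sketch also shows a single instance processing more than one shard.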
Can a single EC2 instance handle more than one shard?
Yes.
What is the capacity of a single shard?
- Reads: 5 transactions/s, up to a maximum of 2 MB/s
- Writes: 1,000 records/s, up to a maximum of 1 MB/s
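These per-shard limits can be turned into a rough estimate of how many shards a stream needs. The sketch below assumes the limits listed above (1,000 records/s and 1 MB/s for writes, 2 MB/s for reads); the `shards_needed` function and the example workload are hypothetical.

```python
import math

# Per-shard limits, taken from the card above.
WRITE_RECORDS_PER_SEC = 1000
WRITE_MB_PER_SEC = 1.0
READ_MB_PER_SEC = 2.0


def shards_needed(records_per_sec: float,
                  write_mb_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Return the smallest shard count that satisfies every limit."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )


# Hypothetical workload: 3,500 records/s, 2.5 MB/s written, 5 MB/s read.
example = shards_needed(records_per_sec=3500,
                        write_mb_per_sec=2.5,
                        read_mb_per_sec=5.0)
print(example)  # 4, because the record rate is the tightest limit
```

The estimate takes the maximum across the limits because exceeding any single one is enough to get throttled.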