Kinesis Flashcards
What is Kinesis?
A managed alternative to Apache Kafka that makes it easy to collect, process, and analyze video and data streams in real time
What is Kinesis mainly great for?
Real-time big data and stream processing frameworks (Spark, NiFi, etc.)
How is data replicated in Kinesis?
automatically to 3 AZs
What are the Kinesis services?
Kinesis Streams
Kinesis Analytics
Kinesis Data Firehose
What is Kinesis Streams?
It is the core Kinesis service: low-latency streaming ingest at scale
What is Kinesis Analytics?
managed service to perform real-time analytics on streams using SQL
What is Kinesis Data Firehose?
Fully managed service to load streams into S3, Redshift, Elasticsearch, Splunk
What are common streams consumed by Kinesis Streams?
ClickStreams
IoT devices
Metrics and logs
How are Kinesis Streams divided?
Into ordered shards (partitions)
What is the data retention period by default in Kinesis Streams?
1 day by default; originally extendable up to 7 days, and up to 365 days in current AWS documentation
What ability does Kinesis have that SQS does not?
The ability to reprocess / replay data
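Replay is possible because each consumer chooses where in a shard it starts reading. Below is a minimal sketch, assuming a boto3-style Kinesis client is passed in as `client`. The function name `replay_shard` and the `max_calls` cap are illustrative and not part of any library.

```python
def replay_shard(client, stream_name: str, shard_id: str, max_calls: int = 10) -> list:
    """Read a shard from its oldest retained record and return the payloads."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest record still retained
    )["ShardIterator"]

    payloads = []
    for _ in range(max_calls):  # cap the number of calls so the sketch always ends
        response = client.get_records(ShardIterator=iterator, Limit=100)
        payloads.extend(record["Data"] for record in response["Records"])
        iterator = response.get("NextShardIterator")
        # Stop when the shard is closed or the reader has caught up with the tip.
        if iterator is None or response.get("MillisBehindLatest", 0) == 0:
            break
    return payloads
```

With real credentials this could be called as `replay_shard(boto3.client("kinesis"), "my-stream", "shardId-000000000000")`, where the stream name and shard id are placeholders.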
How many consumers can a Kinesis Stream have?
multiple
How does Kinesis scale out?
By adding new shards; with provisioned shards it does not auto scale
What can’t you do to data inserted in Kinesis?
Delete it; records are immutable
What is the writing speed of a Kinesis Stream Shard?
1 MB/s or 1,000 messages per second for writes, PER SHARD
What is the reading speed of a Kinesis Stream Shard?
2 MB/s for reads, PER SHARD
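The per-shard limits above can be turned into a rough capacity estimate. This is a minimal sketch of the arithmetic only. The function name `shards_needed` is illustrative, and the sketch assumes traffic is spread evenly across shards.

```python
import math

# Per-shard limits quoted in the cards above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_s: float, write_records_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies all three limits at once."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )
```

For example, 5 MB/s in, 3,000 records/s in, and 8 MB/s out gives max(5, 3, 4) = 5 shards; the tightest limit decides.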
How is Kinesis Streams billed?
Per shard provisioned
What can happen to the number of shards over time?
It can evolve: shards can be split (resharding) or merged
How are records ordered in Kinesis Streams?
per shard
What does a record sent from a producer to Kinesis contain?
A message key (the partition key) and the data itself
What is the record message key useful for in Kinesis Streams?
The same key goes to the same partition (helps with ordering for a specific key)
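Kinesis maps the record key (the partition key in API terms) to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The sketch below imitates that routing under one loud assumption: the shards split the hash space into equal ranges, which need not hold after resharding. The names `hash_key` and `shard_for` are illustrative.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range contains the key (equal ranges assumed)."""
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs the remainder left by integer division.
    return min(hash_key(partition_key) // range_size, shard_count - 1)
```

Because the mapping is deterministic, every record with key `"user-42"` lands on the same shard, which is what preserves ordering for that key.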
What does a message get when it is sent to a Kinesis shard?
a sequence number
What is a hot partition in Kinesis?
A partition that receives a disproportionate share of the traffic because the key is not well distributed
What can you do to reduce costs and increase throughput in Kinesis?
Use batching
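Batching means grouping records so that one request carries many of them. A minimal sketch of the grouping step, assuming the PutRecords limit of 500 records per request. The helper name `batches` is illustrative, and the sketch does not enforce the separate size limit on the whole request.

```python
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_BATCH = 500  # PutRecords accepts at most 500 records per request


def batches(records: Iterable[bytes], size: int = MAX_RECORDS_PER_BATCH) -> Iterator[List[bytes]]:
    """Yield consecutive groups of at most `size` records."""
    batch: List[bytes] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch
```

Each yielded batch would then be sent in a single request instead of one request per record.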
What is ProvisionedThroughputExceeded in Kinesis?
o Happens when you send more data than a shard allows (exceeding its MB/s or records-per-second limit)
o Make sure you don’t have a hot shard (for example, a poorly chosen partition key that sends too much data to one partition)
What can you use to produce messages to Kinesis?
CLI, SDK, producer libraries from various frameworks
What can you use to consume messages from Kinesis?
CLI, SDK, Kinesis Client Library (KCL) (in many languages)
What security can you use in Kinesis?
Control access / authorization using IAM policies
Can you encrypt data in transit to Kinesis and at rest in Kinesis?
Yes
Can you access Kinesis privately, without going over the public internet?
Yes, using VPC endpoints
Is Kinesis Data Firehose real time?
No, it is near real time, with a latency of around 60 seconds
What formats does Kinesis Data Firehose support?
Many
What features does Kinesis Data Firehose support?
Conversions, transformations, compression
How are you billed in Kinesis Data Firehose?
For the amount of data going through it, plus format conversions
Who sends data to Kinesis Data Firehose?
Not just Kinesis Streams:
SDK, KPL (Kinesis Producer Library), Kinesis Agent, CloudWatch
How can Kinesis Data Firehose transform data?
Using Lambda
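A transformation Lambda receives a batch of base64-encoded records and must return every `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The sketch below follows that contract under the assumption that each record is a JSON object. The `processed` field is a hypothetical enrichment, not something Firehose requires.

```python
import base64
import json


def lambda_handler(event, context):
    """Add a flag to each JSON record and hand the batch back to Firehose."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Return the record untouched and mark it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        payload["processed"] = True  # hypothetical enrichment
        data = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(data).decode("utf-8"),
        })
    return {"records": output}
```

The trailing newline is a design choice so that records delivered to S3 end up one per line.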
Where does Kinesis Data Firehose send the data?
To S3, Redshift, Elasticsearch, Splunk
Where does Kinesis Data Firehose store the data?
It does not store data; it only ingests and delivers it
What can you create from Kinesis Analytics real-time queries?
new streams
If I have 3 shards, 2 of which contain a record, and I receive a new record, where will it go?
It can go to any shard, assuming it has a new partition key
Do you need to set the number of shards in Kinesis?
Yes, up to 200 shards by default (a quota that can be raised on request)
How does sending notifications work in Kinesis?
Kinesis does not have such a feature
What do you need to send messages to Kinesis?
PutRecord API + Partition key that gets hashed
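A minimal sketch of a single PutRecord call, assuming a boto3-style Kinesis client is passed in as `client`. The function name `send_event` and the choice of a user id as partition key are illustrative.

```python
import json


def send_event(client, stream_name: str, user_id: str, event: dict) -> str:
    """Send one event with PutRecord and return its sequence number."""
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # the payload is sent as bytes
        PartitionKey=user_id,  # hashed by Kinesis to choose the shard
    )
    return response["SequenceNumber"]
```

With real credentials the client would come from `boto3.client("kinesis")`. Passing the client in keeps the function easy to test with a stub.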
What are possible solutions to the ProvisionedThroughputExceeded exception in Kinesis?
o Retries with backoff
o Increase shards (scaling)
o Ensure your partition key is a good one
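The first option, retries with backoff, can be sketched as follows. `ThroughputExceeded` is a hypothetical stand-in for the real exception, `send` is any callable that performs the write, and the delays are illustrative defaults rather than recommended values.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceeded."""


def with_backoff(send, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call send(), retrying with exponential backoff and jitter on throttling."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out competing retries
```

The `sleep` parameter exists only so the waiting can be replaced in tests. Backoff helps with short bursts; a persistently hot shard still needs more shards or a better partition key.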
What is used by the Kinesis Client Library to complement its work?
KCL uses DynamoDB to track other workers and share the work amongst shards
What is Kinesis Client Library?
Kinesis Client Library (KCL) is a Java library that helps read records from a Kinesis Stream, with distributed applications sharing the read workload
What is the rule for shards in KCL?
each shard is to be read by only one KCL instance
o Means 4 shards = max 4 KCL instances
o Means 6 shards = max 6 KCL instances
Many-to-one relationship: one KCL instance can read many shards, but each shard has only one reader
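The rule can be illustrated with a toy assignment. This is not how KCL is implemented (it coordinates workers through DynamoDB); it is only a round-robin sketch with illustrative names showing why workers beyond the shard count stay idle.

```python
from typing import Dict, List


def assign_shards(shards: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Give each shard to exactly one worker; surplus workers get nothing."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    for index, shard in enumerate(shards):
        assignment[workers[index % len(workers)]].append(shard)
    return assignment
```

With 4 shards and 6 workers, two workers end up with an empty list. With 4 shards and 2 workers, each worker reads two shards.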
What does KCL need in order to use DynamoDB?
IAM access
Where can you run KCL?
On EC2, Elastic Beanstalk, or in an on-premises application
How are records read by KCLs?
Records are read in order at the shard level, cross shard order is not guaranteed
How is Kinesis secured?
- Control access / authorization using IAM policies
- Encryption in flight using HTTPS endpoints
- Encryption at rest using KMS
- Possibility to encrypt / decrypt data client side (harder)
- VPC endpoints available to access Kinesis from within a VPC