AWS Kinesis Flashcards

1
Q

What is AWS Kinesis?

A

Kinesis is a managed data streaming service. It is well suited to application logs, metrics, IoT data, real-time big data, and feeding stream processing frameworks (Spark, NiFi).

Streamed data is automatically replicated synchronously across 3 Availability Zones (AZs).

2
Q

What are the 3 applications under Kinesis and what are they used for?

A
  • Kinesis Streams: low-latency streaming ingest at scale
  • Kinesis Analytics: perform real-time analytics on streams using SQL
  • Kinesis Firehose: load streams into S3, Redshift, Elasticsearch & Splunk
3
Q

How does Kinesis Streams work?

A

Streams are divided into shards (partitions); records are ordered within each shard

Producers -> Shards (1, 2, 3) -> Consumers

* Data retention is 24 hours by default and can be extended up to 365 days
* Data can be reprocessed / replayed within the retention period
* Multiple applications can consume the same stream
* Real-time processing, with throughput that scales with the number of shards
* Once data is inserted in Kinesis, it can’t be deleted (immutability); it only expires when the retention period ends. This is the main difference between SQS and Kinesis: messages in SQS can be deleted
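
The replay and multi-consumer behaviour can be pictured with a small in-memory model. This is only a toy sketch, not the Kinesis API: `ToyShard`, `put`, and `read_from` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """In-memory stand-in for one shard: an append-only list of records."""
    records: list = field(default_factory=list)

    def put(self, data):
        # Appending is the only write operation; there is no delete.
        self.records.append(data)
        return len(self.records) - 1  # position of the new record

    def read_from(self, position, limit=10):
        # Reading never removes anything, so any consumer can re-read.
        return self.records[position:position + limit]


shard = ToyShard()
for event in ["login", "click", "logout"]:
    shard.put(event)

# Two independent applications read the same data from their own positions.
app_a = shard.read_from(0)
app_b = shard.read_from(1)

# Replay: app A starts again from the beginning and sees the same records.
replay_a = shard.read_from(0)
```

Because each reader keeps its own position and nothing is removed on read, adding a second application does not affect what the first one sees.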

4
Q

Explain Kinesis Streams Shards

A

Two modes for capacity:
* On-demand: no capacity planning, Kinesis scales shards automatically
* Provisioned: you manage the shards over time

  • Records can be sent in batches or one at a time
  • The number of shards can change over time (resharding: split / merge)
  • Records are ordered per shard
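
Per-shard ordering follows from how a record is routed: Kinesis takes the MD5 hash of the record's partition key as a 128-bit integer and sends the record to the shard whose hash key range contains it, so records with the same key land on the same shard. A minimal sketch, assuming the key space is split evenly between shards (real ranges come from the stream description and change on split / merge); `even_hash_ranges` and `shard_for_key` are hypothetical helper names.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # MD5 produces a 128-bit value


def even_hash_ranges(shard_count):
    """Split the 128-bit hash key space into contiguous, near-equal ranges."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the range that contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash ranges do not cover the key space")


ranges = even_hash_ranges(3)
first = shard_for_key("device-42", ranges)
second = shard_for_key("device-42", ranges)  # same key, same shard
```

The practical consequence is that a partition key such as a device or user ID keeps that entity's records in order, while a single very hot key concentrates traffic on one shard.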
5
Q

What are Kinesis producers and consumers?

A

Kinesis Producers
* AWS SDK: simple producer
* Kinesis Producer Library (KPL): batching, compression, retries; C++ and Java
* Kinesis Agent: runs on servers such as EC2 instances and sends their log files to Kinesis

Kinesis Consumers
* AWS SDK: simple consumer
* Lambda: through an event source mapping
* Kinesis Client Library (KCL): checkpointing, coordinated reads
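
As an illustration of the "AWS SDK: simple producer" option, the sketch below batches records for the `PutRecords` API, which accepts at most 500 records per request. It assumes a boto3-style client object is passed in; the helper names (`to_entries`, `chunked`, `send_all`) and the `key_field` parameter are hypothetical.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_entries(events, key_field):
    """Convert dict events into PutRecords entries (Data bytes + PartitionKey)."""
    return [
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": str(e[key_field])}
        for e in events
    ]


def chunked(entries, size=MAX_RECORDS_PER_CALL):
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def send_all(client, stream_name, entries):
    """Send entries in batches; return the entries reported as failed."""
    failed = []
    for batch in chunked(entries):
        response = client.put_records(StreamName=stream_name, Records=batch)
        # Each result lines up with the entry at the same index in the request;
        # a failed entry carries an ErrorCode instead of a SequenceNumber.
        for entry, result in zip(batch, response["Records"]):
            if "ErrorCode" in result:
                failed.append(entry)
    return failed
```

With boto3 installed and credentials configured, `client` would be `boto3.client("kinesis")`, and the entries returned by `send_all` are the ones to retry. A batch can partially fail, which is why the per-record results are checked rather than only the HTTP status.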

6
Q

What are some limits with Kinesis Data Streams?

A

Producer:
* 1 MB/s or 1,000 records/s at write PER SHARD
* “ProvisionedThroughputExceededException” otherwise

Consumer Classic:
* 2 MB/s at read PER SHARD, shared across all consumers
* 5 GetRecords API calls per second PER SHARD, shared across all consumers

Consumer Enhanced Fan-Out:
* 2 MB/s at read PER SHARD, PER ENHANCED CONSUMER
* No polling API calls needed (push model)

Data Retention:
* 24 hours data retention by default
* Can be extended to 365 days
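
The per-shard limits above translate into a rough sizing rule for provisioned mode: take whichever limit demands the most shards. A back-of-the-envelope sketch, assuming traffic is spread evenly across partition keys and consumers are classic (shared read throughput); `shards_needed` and its parameters are hypothetical names.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # 1 MB/s write per shard
WRITE_RECORDS_PER_SHARD = 1000  # 1,000 records/s write per shard
READ_MB_PER_SHARD = 2.0         # 2 MB/s read per shard, shared by classic consumers


def shards_needed(write_mb_s, write_records_s, read_mb_s_shared=0.0):
    """Smallest shard count that satisfies every per-shard limit at once."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s_shared / READ_MB_PER_SHARD),
    )


# 4.5 MB/s in, 2,500 records/s in, 12 MB/s read in total by classic consumers
example = shards_needed(write_mb_s=4.5, write_records_s=2500, read_mb_s_shared=12)
```

In this example the read side is the bottleneck (12 / 2 = 6 shards), which is the kind of case where enhanced fan-out, with its per-consumer read throughput, becomes attractive.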
