Kinesis Flashcards
What are the receive and emit data rates for a Kinesis shard, in terms of MB/s and messages/sec?
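Inbound: 1 MB/s or 1,000 messages/sec
Emit: 2 MB/s, shared by all standard consumers of the shard. Reads are capped at 5 GetRecords calls/sec per shard rather than at a fixed number of messages/sec.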
You have a Kinesis data stream with several shards. You are storing user data from your website and have used country geo-location data as your partition key. One of the shards is frequently throwing a ProvisionedThroughputExceededException. Why?
We need to look at the partition key. Using country as the key is a poor choice here, because one country can send far more data than another, and every record with the same key lands on the same shard. We would need to make the key more granular so that traffic spreads evenly across the shards.
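To make the hot-shard effect concrete, here is a minimal sketch. The traffic figures, the country codes and the `hot_keys` helper are all made up for illustration; the only number taken from the cards is the 1,000 messages/sec write limit per shard. It also assumes that adding a user-id bucket to the key splits each country's traffic evenly.

```python
# Per-shard write limit from the cards above (messages per second).
SHARD_WRITE_LIMIT_RPS = 1000

def hot_keys(records_per_sec_by_key, limit=SHARD_WRITE_LIMIT_RPS):
    """Return the partition keys whose own traffic exceeds one shard's write limit."""
    return sorted(k for k, rps in records_per_sec_by_key.items() if rps > limit)

# Hypothetical traffic, keyed by country only.
by_country = {"US": 3200, "DE": 400, "NZ": 50}
print(hot_keys(by_country))

# Same traffic with a more granular key: country plus a user-id bucket 0..9
# (assumes each country's traffic splits evenly across the 10 buckets).
by_country_and_bucket = {
    f"{country}#{bucket}": rps / 10
    for country, rps in by_country.items()
    for bucket in range(10)
}
print(hot_keys(by_country_and_bucket))
```

With the country-only key, a single key carries more than one shard can accept, so adding shards alone would not help. With the finer key, no single key is over the limit.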
Are records ordered in Kinesis? How long is data retained?
Records are ordered within a shard. Data is retained for 24 hours by default, and retention is configurable up to 365 days (the maximum used to be 7 days).
How many AZs is Kinesis replicated to?
3 AZs, automatically
You have a Kinesis stream which usually receives 4 MB/sec in and emits 6 MB/sec out. There is an increase in data requiring an inbound rate of 7 MB/sec. You currently have 4 shards. What do you do?
A shard can ingest 1 MB/sec. Our 4 shards handle 4 MB/sec inbound, so we will need to provision another 3 shards, bringing the total up to 7. Seven shards also give 14 MB/sec of read capacity, which comfortably covers the 6 MB/sec outbound.
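The same arithmetic as a small sketch. The function name `shards_needed` is just an illustrative helper, and the per-shard limits are the 1 MB/sec in and 2 MB/sec out figures from the first card.

```python
import math

WRITE_MB_PER_SHARD = 1  # inbound limit per shard
READ_MB_PER_SHARD = 2   # outbound limit per shard

def shards_needed(inbound_mb_s, outbound_mb_s):
    """Smallest shard count that satisfies both the write and the read limit."""
    for_writes = math.ceil(inbound_mb_s / WRITE_MB_PER_SHARD)
    for_reads = math.ceil(outbound_mb_s / READ_MB_PER_SHARD)
    return max(for_writes, for_reads)

current = 4
target = shards_needed(inbound_mb_s=7, outbound_mb_s=6)
print(target, target - current)  # prints: 7 3
```

Here the inbound rate is the binding constraint: writes need 7 shards while reads would only need 3.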
Which Kinesis service allows data to be streamed directly into Splunk?
Kinesis Firehose
In Kinesis Streams, are records deleted after they have been consumed?
No, they are retained for the duration of the retention period.
In Kinesis Streams, can the same message be processed by more than one consumer?
Yes. This is because, unlike in SQS, records are not deleted after they have been consumed.
In Kinesis Streams, there are THREE things we need to specify for a PutRecord call. What are they? (Hint: think in terms of how a stream works.)
- The stream name
- The partition key
- The data blob that you want to put into Kinesis
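As a rough sketch, the three items map onto the `StreamName`, `PartitionKey` and `Data` parameters of boto3's `put_record`. The `put_click` helper, the stream name "clickstream" and the payload shape are assumptions made up for this example; the client is passed in so the snippet itself does not need AWS credentials.

```python
import json

def put_click(client, stream_name, user_id, payload):
    """Send one record; `client` is expected to be a boto3 Kinesis client."""
    return client.put_record(
        StreamName=stream_name,                    # 1. the stream name
        PartitionKey=str(user_id),                 # 2. the partition key
        Data=json.dumps(payload).encode("utf-8"),  # 3. the data blob (bytes)
    )

# Usage against AWS (requires boto3 and credentials; names are hypothetical):
#   import boto3
#   put_click(boto3.client("kinesis"), "clickstream", 42, {"page": "/home"})
```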
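For Kinesis Streams - what does the partition key in a PutRecord request do?
The partition key determines the shard the record is written to. Kinesis hashes the key with MD5 and writes the record to the shard whose hash key range contains the result.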
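A minimal sketch of how a partition key could be mapped to a shard. Kinesis uses an MD5 hash of the key as a 128-bit integer and matches it against each shard's hash key range. The sketch assumes the shards split the hash space into equal-width ranges (true for a freshly created stream, not necessarily after resharding) and reads the digest as a big-endian integer; `hash_key` and `shard_index` are illustrative helper names, not part of any AWS API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key

def hash_key(partition_key):
    """128-bit integer hash of the partition key (digest read as big-endian)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def shard_index(partition_key, shard_count):
    """Index of the owning shard, assuming `shard_count` equal-width hash key ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE

for key in ("US", "DE", "user-1234"):
    print(key, shard_index(key, 4))
```

The same key always hashes to the same value, which is why all records sharing a key end up on one shard.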
Do you need to scale your shards for Kinesis Firehose?
No - This is handled automatically by Firehose.
Is Kinesis Firehose real time or near real time? Why?
Firehose is near real time (1-15 minutes latency). The reason is that Firehose buffers data before writing it to the destination, flushing when either the buffer size or the buffer interval is reached, whichever comes first. The more data that is buffered, the higher the latency.
What is the maximum data buffer size in Kinesis Firehose? (Hint: it's a magic number.) What is the maximum time that data can be buffered?
128 MB and 900 sec (15 min)
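The buffering rule can be sketched as a simple check: flush when either limit is hit, whichever comes first. This is only an illustration of the rule, not how Firehose is implemented; `should_flush` is a made-up helper, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
MAX_BUFFER_BYTES = 128 * 1024 * 1024  # 128 MB, the maximum buffer size
MAX_BUFFER_SECONDS = 900              # 15 minutes, the maximum buffer interval

def should_flush(buffered_bytes, seconds_since_last_flush,
                 size_limit=MAX_BUFFER_BYTES, interval_limit=MAX_BUFFER_SECONDS):
    """True once either the size limit or the time limit has been reached."""
    return buffered_bytes >= size_limit or seconds_since_last_flush >= interval_limit

print(should_flush(5 * 1024 * 1024, 900))   # True: interval reached first
print(should_flush(128 * 1024 * 1024, 10))  # True: size reached first
print(should_flush(5 * 1024 * 1024, 10))    # False: keep buffering
```

Lower limits mean more frequent, smaller deliveries and lower latency; the maximum settings give the 15-minute worst case on a quiet stream.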
If I needed to transform click stream data before delivering it to my consuming application, would I use Kinesis Streams or Kinesis Firehose? What underlying technology supports this?
Kinesis Firehose allows you to transform data in the stream. It uses synchronous Lambda invocations to operate on a buffered batch of stream data (up to 3 MB).
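A minimal sketch of such a transformation Lambda. Firehose passes in a batch of records whose `data` field is base64 encoded, and expects each record back with the same `recordId`, a `result` of `Ok`, `Dropped` or `ProcessingFailed`, and the re-encoded `data`. The click payload shape and the lower-casing of `page` are hypothetical, chosen only to have something to transform.

```python
import base64
import json

def lambda_handler(event, context):
    """Transform each buffered record and hand the batch back to Firehose."""
    output = []
    for record in event["records"]:
        click = json.loads(base64.b64decode(record["data"]))
        # Hypothetical transformation: normalise the page path to lower case.
        click["page"] = click.get("page", "").lower()
        data = (json.dumps(click) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],  # must match the incoming record
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(data).decode("utf-8"),
        })
    return {"records": output}
```

The trailing newline is a common convention so that records delivered to S3 end up one per line.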
Where can Kinesis Data Analytics ingest data from and what language does it use for analytics?
We can ingest from either Kinesis Data Streams or Kinesis Firehose. Data Analytics uses SQL to analyse the stream data.