Kinesis Flashcards

1
Q

What is Kinesis?

A

A managed alternative to Apache Kafka to easily collect, process, and analyze video and data streams in real time

2
Q

What is Kinesis mainly good for?

A

“Real-time” big data and stream processing frameworks (Spark, NiFi, etc.)

3
Q

How is data replicated in Kinesis?

A

Automatically, across 3 Availability Zones (AZs)

4
Q

What are the Kinesis services?

A

Kinesis Streams
Kinesis Analytics
Kinesis Data Firehose

5
Q

What is Kinesis Streams?

A

Kinesis itself: low-latency streaming ingestion at scale

6
Q

What is Kinesis Analytics?

A

managed service to perform real-time analytics on streams using SQL

7
Q

What is Kinesis Data Firehose?

A

Fully managed service to load streams into S3, Redshift, Elasticsearch, Splunk

8
Q

What are common data sources ingested by Kinesis Streams?

A

Clickstreams
IoT devices
Metrics and logs

9
Q

How are Kinesis Streams divided?

A

Into ordered shards / partitions

10
Q

What is the data retention period by default in Kinesis Streams?

A

1 day (24 hours) by default, extendable up to 7 days (long-term retention now allows up to 365 days)

11
Q

What ability does Kinesis have that SQS does not?

A

The ability to reprocess / replay data

12
Q

How many consumers can a Kinesis Stream have?

A

Multiple (several applications can consume the same stream)

13
Q

How does Kinesis scale out?

A

By adding new shards; in provisioned mode it does not auto-scale (the later on-demand capacity mode scales automatically)

14
Q

What can’t you do to data inserted in Kinesis?

A

Delete it; it is immutable

15
Q

What is the write throughput of a Kinesis Stream shard?

A

1 MB/s or 1,000 records/s for writes, PER SHARD

16
Q

What is the read throughput of a Kinesis Stream shard?

A

2 MB/s for reads, PER SHARD
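The per-shard limits turn shard sizing into simple arithmetic: take the largest of the three ratios and round up. The sketch below assumes provisioned mode and classic (shared) consumers, where the 2 MB/s read limit is split across every consumer of the shard; the function name is illustrative only.

```python
import math

# Per-shard limits from the cards above (provisioned mode, shared read throughput).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_s: float, write_records_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies every per-shard limit (at least 1)."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )


# 3.5 MB/s in, 2,500 records/s, and two consumers each reading everything (7 MB/s out).
print(shards_needed(3.5, 2500, 7.0))  # 4
```

Here both the write bandwidth and the combined read bandwidth call for 4 shards, while the record rate alone would only need 3.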

17
Q

How is Kinesis Streams billed?

A

Per provisioned shard

18
Q

What can happen to the number of shards over time?

A

It can evolve through resharding: shards can be split or merged
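Each shard owns a contiguous range of the 128-bit hash key space, so splitting and merging are operations on those ranges. The toy model below is a sketch of that idea only; in practice you call the SplitShard and MergeShards APIs, and the `Shard`, `split`, and `merge` names here are hypothetical.

```python
from typing import NamedTuple


class Shard(NamedTuple):
    start: int  # inclusive start of the hash key range
    end: int    # inclusive end of the hash key range


def split(shard: Shard, new_starting_hash_key: int) -> tuple:
    """Split one shard into two children at new_starting_hash_key."""
    if not shard.start < new_starting_hash_key <= shard.end:
        raise ValueError("split point must fall inside the parent's range")
    return (Shard(shard.start, new_starting_hash_key - 1),
            Shard(new_starting_hash_key, shard.end))


def merge(a: Shard, b: Shard) -> Shard:
    """Merge two shards whose hash key ranges are adjacent."""
    lo, hi = sorted([a, b])
    if lo.end + 1 != hi.start:
        raise ValueError("only adjacent shards can be merged")
    return Shard(lo.start, hi.end)


whole = Shard(0, 2 ** 128 - 1)          # a single shard covers the whole key space
left, right = split(whole, 2 ** 127)    # split it down the middle
print(merge(right, left) == whole)      # True
```

Merging only adjacent ranges keeps the key space covered without gaps or overlaps, which is why arbitrary pairs of shards cannot be merged.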

19
Q

How are records ordered in Kinesis Streams?

A

per shard

20
Q

What does a record sent from a producer to Kinesis contain?

A

A message key (partition key) and the data itself

21
Q

What is the record message key useful for in Kinesis Streams?

A

The same key always goes to the same shard / partition (helps with ordering for a specific key)
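Kinesis maps a partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space evenly (as for a freshly created stream) and encodes the key as UTF-8; `hash_key` and `shard_for` are hypothetical helper names.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the evenly split hash range that contains the key's hash."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


# The same key always lands on the same shard, so its records stay in order.
for key in ["user-42", "user-7", "user-42"]:
    print(key, "->", shard_for(key, 3))
```

Because the mapping is deterministic, ordering holds per key; keys with different hashes may land on different shards, and there is no ordering across shards.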

22
Q

What does a message sent to a Kinesis shard get?

A

A sequence number
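Sequence numbers, immutability, multiple consumers, and replay fit together if you picture a shard as an append-only log. This is a toy in-memory model, not the Kinesis API: here a sequence number is just a list position, whereas real sequence numbers are large, increasing, and not contiguous. `ToyShard` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """In-memory stand-in for one shard: an append-only, immutable log."""
    records: list = field(default_factory=list)

    def put(self, data: str) -> int:
        seq = len(self.records)  # toy sequence number: position in the log
        self.records.append(data)
        return seq

    def read_from(self, seq: int, limit: int = 100) -> list:
        """Read up to `limit` records starting at sequence number `seq`."""
        return self.records[seq:seq + limit]


shard = ToyShard()
for event in ["click", "view", "buy"]:
    shard.put(event)

# Two consumers keep their own positions; reading never removes data.
analytics_pos, archiver_pos = 0, 2
print(shard.read_from(analytics_pos))  # ['click', 'view', 'buy']
print(shard.read_from(archiver_pos))   # ['buy']
# Replay: read again from an earlier sequence number.
print(shard.read_from(0, limit=1))     # ['click']
```

Unlike an SQS queue, consuming does not delete anything, so replay is just a read from an older position (within the retention period).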

23
Q

What is a hot partition in Kinesis?

A

A partition (shard) that receives a disproportionate share of the records because the partition key is not well distributed

24
Q

What can you do to reduce costs and increase throughput in Kinesis?

A

Use batching
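Batching means sending many records per request (the PutRecords API) instead of one per PutRecord call. A minimal sketch of the grouping step, assuming the PutRecords limits documented at the time of writing (500 records and 5 MiB per request, counting partition keys); verify them against the current docs. `make_batches` is a hypothetical helper and does not check the separate 1 MiB per-record limit.

```python
MAX_RECORDS_PER_BATCH = 500              # assumed PutRecords record limit
MAX_BYTES_PER_BATCH = 5 * 1024 * 1024    # assumed PutRecords size limit (5 MiB)


def make_batches(records):
    """Group (partition_key, data_bytes) pairs into batches under both limits."""
    batches, current, current_bytes = [], [], 0
    for key, data in records:
        size = len(key.encode("utf-8")) + len(data)  # key bytes count toward the limit
        if current and (len(current) == MAX_RECORDS_PER_BATCH
                        or current_bytes + size > MAX_BYTES_PER_BATCH):
            batches.append(current)              # close the full batch
            current, current_bytes = [], 0
        current.append((key, data))
        current_bytes += size
    if current:
        batches.append(current)
    return batches


records = [(f"sensor-{i % 10}", b"x" * 100) for i in range(1200)]
print([len(b) for b in make_batches(records)])  # [500, 500, 200]
```

1,200 small records need 3 requests instead of 1,200; the size check only starts to matter once records get large.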

25
Q

What is ProvisionedThroughputExceeded in Kinesis?

A

o Happens when sending more data than a shard allows (exceeding MB/s or TPS for any shard)
o Make sure you don’t have a hot shard (e.g. your partition key is bad and too much data goes to that partition)

26
Q

What can you use to produce messages to Kinesis?

A

CLI, SDK, producer libraries from various frameworks

27
Q

What can you use to consume messages from Kinesis?

A

CLI, SDK, Kinesis Client Library (KCL) (in many languages)

28
Q

What security can you use in Kinesis?

A

Control access / authorization using IAM policies

29
Q

Can you encrypt data in flight and at rest in Kinesis?

A

Yes

30
Q

Can you access Kinesis privately from a VPC, without going over the public internet?

A

Yes, using VPC endpoints

31
Q

Is Kinesis Data Firehose real time?

A

No, it is near real time (about 60 seconds of latency)
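The latency comes from buffering: Firehose holds incoming data until a size threshold or a time threshold is reached, whichever comes first, then delivers the batch. A toy model under that assumption; the thresholds are hypothetical, `ToyBuffer` is not a real API, and time is passed in explicitly instead of being checked continuously.

```python
class ToyBuffer:
    """Flushes when either the size or the age threshold is reached."""

    def __init__(self, max_bytes: int, max_age_s: float):
        self.max_bytes, self.max_age_s = max_bytes, max_age_s
        self.items, self.size, self.opened_at = [], 0, None
        self.flushed = []  # batches "delivered" to the destination

    def add(self, data: bytes, now: float):
        if self.opened_at is None:
            self.opened_at = now  # the buffer's age starts with its first record
        self.items.append(data)
        self.size += len(data)
        self.maybe_flush(now)

    def maybe_flush(self, now: float):
        too_big = self.size >= self.max_bytes
        too_old = self.opened_at is not None and now - self.opened_at >= self.max_age_s
        if self.items and (too_big or too_old):
            self.flushed.append(self.items)
            self.items, self.size, self.opened_at = [], 0, None


buf = ToyBuffer(max_bytes=1_000_000, max_age_s=60)  # hypothetical thresholds
buf.add(b'{"event": "click"}', now=0.0)
buf.add(b'{"event": "view"}', now=30.0)
buf.maybe_flush(now=59.0)  # nothing yet: small and younger than 60 s
buf.maybe_flush(now=60.0)  # age threshold reached: one batch delivered
print(len(buf.flushed))    # 1
```

With low traffic the time threshold dominates, which is why the first record in a batch can wait close to the full interval before delivery.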

32
Q

What data formats does Kinesis Data Firehose support?

A

Many

33
Q

What features does Kinesis Data Firehose support?

A

Data format conversions, transformations, compression

34
Q

How are you billed in Kinesis Data Firehose?

A

For the amount of data going through it, plus data format conversions

35
Q

Who sends data to Kinesis Data Firehose?

A

Not just Kinesis Streams:

SDK, KPL (Kinesis Producer Library), Kinesis Agent, CloudWatch

36
Q

How can Kinesis Data Firehose transform data?

A

Using Lambda
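A transformation Lambda receives a batch of base64-encoded records and must return each `recordId` with a `result` ("Ok", "Dropped", or "ProcessingFailed") and the new base64 `data`. A minimal sketch, assuming JSON records with a hypothetical `level` field that gets upper-cased; the sample event is hand-built and contains only the fields this handler reads.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation: upper-case a (hypothetical) JSON field per record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["level"] = str(payload.get("level", "info")).upper()
            transformed = (json.dumps(payload) + "\n").encode("utf-8")
            data, status = base64.b64encode(transformed).decode("ascii"), "Ok"
        except (ValueError, AttributeError):
            # Not valid base64/JSON object: hand the original back as failed.
            data, status = record["data"], "ProcessingFailed"
        output.append({"recordId": record["recordId"], "result": status, "data": data})
    return {"records": output}


sample = {"records": [{"recordId": "1",
                       "data": base64.b64encode(b'{"level": "warn"}').decode("ascii")}]}
result = lambda_handler(sample, None)
print(base64.b64decode(result["records"][0]["data"]))  # b'{"level": "WARN"}\n'
```

Every incoming `recordId` must appear in the response; marking bad records as "ProcessingFailed" rather than raising keeps the rest of the batch flowing.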

37
Q

Where does Kinesis Data Firehose send the data?

A

To S3, Redshift, Elasticsearch, Splunk

38
Q

Where does Kinesis Data Firehose store the data?

A

It doesn't: there is no data storage, it just ingests and delivers

39
Q

What can you create from Kinesis Analytics real-time queries?

A

new streams

40
Q

If I have 3 shards, 2 of which contain a record, and I receive a new record, where will it go?

A

It can go to any shard, assuming it has a new partition key: the shard is chosen by the hash of the partition key

41
Q

Do you need to set the number of shards in kinesis?

A

Yes; by default up to 200 shards per account in many regions (a soft limit that can be raised)

42
Q

How does sending notifications work in Kinesis?

A

Kinesis does not have such a feature

43
Q

What do you need to send messages to Kinesis?

A

The PutRecord API + a partition key that gets hashed

44
Q

What are possible solutions to a ProvisionedThroughputExceeded exception in Kinesis?

A

o Retries with backoff
o Increase shards (scaling)
o Ensure your partition key is a good one
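The first remedy, retrying with backoff, can be sketched as a loop that waits exponentially longer (with random jitter) after each throttled attempt. The exception class and the `send` callable are hypothetical stand-ins: a real SDK raises its own throttling error type and may already retry internally.

```python
import random
import time


class ProvisionedThroughputExceeded(Exception):
    """Hypothetical stand-in for the SDK's throttling error."""


def put_with_backoff(send, record, max_attempts=5, base_delay=0.1,
                     max_delay=2.0, sleep=time.sleep):
    """Call send(record); on throttling, back off exponentially with full jitter."""
    for attempt in range(max_attempts):
        try:
            return send(record)
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out competing retries


# Demo: a fake sender that is throttled twice, then succeeds.
calls = {"n": 0}


def flaky_send(record):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ProvisionedThroughputExceeded()
    return "seq-001"


print(put_with_backoff(flaky_send, b"payload", sleep=lambda s: None))  # seq-001
```

Backoff only smooths short bursts; if throttling persists, the other two remedies (more shards, a better partition key) address the cause.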

45
Q

What does the Kinesis Client Library use to coordinate its work?

A

KCL uses a DynamoDB table to track the other workers and share the work across shards

46
Q

What is Kinesis Client Library?

A

The Kinesis Client Library (KCL) is a Java library that helps read records from a Kinesis Stream, with distributed applications sharing the read workload

47
Q

What is the rule for shards in KCL?

A

Each shard is read by only one KCL instance
o Means 4 shards = max 4 KCL instances
o Means 6 shards = max 6 KCL instances
Many-to-one relationship (one KCL instance can read several shards)
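The rule can be illustrated by handing out shards so each has exactly one owner; any workers beyond the shard count end up idle. This is a toy round-robin sketch, not KCL's actual lease-balancing algorithm (KCL coordinates leases through DynamoDB), and `assign_shards` is a hypothetical name.

```python
def assign_shards(shards, workers):
    """Give each shard exactly one owner, spreading shards round-robin over workers."""
    assignment = {w: [] for w in workers}
    for i, shard in enumerate(shards):
        assignment[workers[i % len(workers)]].append(shard)
    return assignment


shards = ["shard-0", "shard-1", "shard-2", "shard-3"]
print(assign_shards(shards, ["w1", "w2"]))                     # each worker reads 2 shards
print(assign_shards(shards, [f"w{i}" for i in range(1, 7)]))   # w5 and w6 get nothing
```

With 4 shards, the 5th and 6th instances add no read capacity, which is the "4 shards = max 4 KCL instances" rule.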

48
Q

What does KCL need to use DynamoDB?

A

IAM access

49
Q

Where can you run KCL?

A

It can run on EC2, Elastic Beanstalk, or on-premises applications

50
Q

How are records read by KCLs?

A

Records are read in order at the shard level, cross shard order is not guaranteed

51
Q

How is Kinesis secured?

A
  • Control access / authorization using IAM policies
  • Encryption in flight using HTTPS endpoints
  • Encryption at rest using KMS
  • Possibility to encrypt / decrypt data client side (harder)
  • VPC endpoints available to access Kinesis from within a VPC