Domain 1: Collection Flashcards

1
Q

Which Kinesis producer option offers an asynchronous API and high throughput?

A

Kinesis Producer Library

2
Q

Where must compression be implemented in Kinesis?

A

By the end user

3
Q

How many GetRecords API calls are allowed per second by Kinesis streams in Classic mode?

A
  • Maximum of 5 GetRecords API calls per shard per second, which implies about 200 ms latency
  • If 5 consumer applications read from the same shard, each can poll only once per second and receives less than 400 KB/s
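
The arithmetic behind these numbers can be sketched as follows. This is a minimal illustration, assuming the shard's budget of 5 calls per second and 2 MB/s (taken here as 2,000 KB/s) is split evenly among consumers; the function name `classic_share` is hypothetical.

```python
def classic_share(consumers, calls_per_sec=5, shard_kb_per_sec=2000):
    """Evenly split one shard's classic-mode read budget among consumers."""
    polls_per_consumer = calls_per_sec / consumers   # GetRecords calls each consumer may make per second
    kb_per_consumer = shard_kb_per_sec / consumers   # each consumer's share of the shard read limit
    return polls_per_consumer, kb_per_consumer


print(classic_share(5))  # (1.0, 400.0): one poll per second, at most 400 KB/s each
```
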
4
Q

What is the average latency in Kinesis Streams Enhanced Fan-Out mode?

A

70ms

5
Q

What is the throughput of Kinesis Consumer Classic mode?

A

2 MB/s per shard, shared across all consumers

6
Q

What are the 4 services that Kinesis Firehose can write to?

A

S3, Redshift, ElasticSearch, Splunk

7
Q

Describe the key features of the Kinesis Producer Library (KPL)

A
  • Used for building high performance, long-running producers
  • Automated and configurable retry mechanism
  • Synchronous or Asynchronous API (better performance for async)
  • Submits metrics to CloudWatch for monitoring
  • Batching (Collect and Aggregate)
  • Compression must be implemented by the user
  • KPL Records must be de-coded with KCL or special helper library
8
Q

Which protocol is not supported by IoT Device Gateway?

A

FTP

9
Q

What is the minimum latency for Firehose with non-full batches?

A

60 seconds

10
Q

What data conversions are possible using Firehose with S3?

A

JSON to Parquet/ORC

11
Q

What data transformations are possible using Firehose with Lambda?

A

CSV to JSON

12
Q

What compression algorithms are supported by Firehose with S3?

A

GZIP, ZIP, SNAPPY

13
Q

What compression algorithm is supported by Firehose with Redshift?

A

GZIP

14
Q

How are you charged for Firehose?

A

Amount of data going through Firehose

15
Q

Can Spark and KCL read from Firehose?

A

No. They can only read from Kinesis Data Streams

16
Q

What is the minimum buffer time in Firehose?

A

60 seconds

17
Q

Can resharding be done in parallel?

A

No

18
Q

To how many AZs is data replicated in Kinesis Data Streams?

A

3

19
Q

What is the default retention period in Kinesis Data Streams?

A

24 hours

(customizable up to 365 days)

20
Q

Can data be deleted from Kinesis streams?

A

No

21
Q

What is a key best practice with partition keys in Kinesis Streams?

A

Highly distributed keys
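
To see why distribution matters, here is a minimal sketch of how a partition key lands on a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer; the sketch assumes, for simplicity, that the shards split that hash space evenly. The function name `shard_for_key` and the `device-…` keys are hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard whose hash range contains the key.

    Assumes shard_count shards that divide the hash space evenly.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * shard_count // HASH_SPACE


# One constant key sends every record to the same shard (a hot shard) ...
hot = Counter(shard_for_key("same-key", 4) for _ in range(1000))
# ... while many distinct keys spread the load across shards.
spread = Counter(shard_for_key(f"device-{i}", 4) for i in range(1000))
print(len(hot), "shard used with one key;", len(spread), "shards used with distinct keys")
```
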

22
Q

What is the maximum size of data blobs in Kinesis?

A

1MB

23
Q

What are the throughput limits for Kinesis producers?

A

1MB/s or 1000 messages/s at write per shard
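
As a rough sizing aid, the two per-shard write limits can be turned into a shard count. This is a sketch only; the function name `shards_needed` is hypothetical, and the limits are the ones stated on this card.

```python
import math


def shards_needed(mb_per_sec, records_per_sec):
    """Minimum shard count so that neither per-shard write limit is exceeded."""
    by_bytes = math.ceil(mb_per_sec / 1.0)          # 1 MB/s per shard
    by_records = math.ceil(records_per_sec / 1000)  # 1,000 records/s per shard
    return max(1, by_bytes, by_records)


print(shards_needed(3.5, 2500))  # 4: the byte rate is the binding limit
print(shards_needed(0.5, 6000))  # 6: the record rate is the binding limit
```
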

24
Q

What happens if you exceed throughput limits on Kinesis producers?

A

ProvisionedThroughputExceededException

25
Q

What are the throughput limits for Kinesis consumers in Classic mode?

A

2MB/s per shard across all consumers
5 API calls per second per shard across all consumers

26
Q

What are the throughput limits for Kinesis consumers in EFO mode?

A

2MB/s per shard per enhanced consumer
No API calls needed

27
Q

Is Kinesis EFO a push or pull model?

A

Push

28
Q

What are the use cases for Kinesis Producer SDK?

A

Low throughput, higher latency is acceptable, simple API, AWS Lambda

29
Q

Can Kinesis Data Analytics produce back into Kinesis Data Streams?

A

Yes

30
Q

What is the first troubleshooting step if you get a ProvisionedThroughputExceededException?

A

Check for hot shards (bad partition key)

31
Q

How do you remediate a ProvisionedThroughputExceededException?

A
  • Retries with backoff
  • Scale up your shards
  • Improve the partition key so records spread evenly across shards
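
The first remedy, retries with backoff, can be sketched like this. It is a minimal illustration that does not call any AWS SDK: `ThroughputExceeded` is a hypothetical stand-in for the throttling error, and `put` is whatever function the producer uses to send one record.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for the throttling error raised by a producer call."""


def put_with_backoff(put, record, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call put(record), retrying with exponential backoff and jitter on throttling."""
    for attempt in range(max_attempts):
        try:
            return put(record)
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait a random time of up to base_delay * 2^attempt seconds
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

The random jitter keeps many throttled producers from retrying at the same instant. Backoff only buys time; if one shard stays hot, adding shards or fixing the partition key is still needed.
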
32
Q

In which languages is the Kinesis Producer Library available?

A

C++ and Java

33
Q

Which Kinesis producer should be used for asynchronous requirements (or more performant requirements)?

A

Kinesis Producer Library (KPL)

34
Q

Through what methods can Kinesis Producer Library records be de-coded?

A

Kinesis Client Library or special helper library (Lambda)

35
Q

What configuration item can be used to adjust buffer times for KPL batches? What is the default configuration?

A

RecordMaxBufferedTime; 100 ms
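
The trade-off this setting controls (longer buffering gives bigger batches but more latency) can be illustrated with a small time-bounded buffer. This is a sketch of the general idea, not the KPL's actual implementation; the class `TimedBuffer` and its methods are hypothetical, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class TimedBuffer:
    """Collect records and flush once the oldest one has waited max_buffered_ms."""

    def __init__(self, flush, max_buffered_ms=100):
        self.flush = flush                    # callback that receives a list of records
        self.max_buffered_ms = max_buffered_ms
        self.records = []
        self.first_at_ms = None               # arrival time of the oldest buffered record

    def add(self, record, now_ms):
        if not self.records:
            self.first_at_ms = now_ms
        self.records.append(record)
        self.tick(now_ms)

    def tick(self, now_ms):
        """Flush if the oldest buffered record has waited long enough."""
        if self.records and now_ms - self.first_at_ms >= self.max_buffered_ms:
            self.flush(self.records)
            self.records = []


batches = []
buf = TimedBuffer(batches.append, max_buffered_ms=100)
buf.add("a", now_ms=0)
buf.add("b", now_ms=50)    # still within 100 ms of "a": no flush yet
buf.tick(now_ms=100)       # "a" has waited 100 ms: both records go out together
print(batches)             # [['a', 'b']]
```
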

36
Q

Which Kinesis consumer option offers checkpointing using DynamoDB?

A

KCL (Client)

37
Q

On what service must Kinesis Connector Libraries run?

A

EC2

38
Q

What services can Kinesis Connector Libraries write to?

A

S3, DynamoDB, Elasticsearch, Redshift

39
Q

Which two tools have mostly replaced the use case for Kinesis Connector Libraries?

A

Firehose and Lambda

40
Q

What happens to the data in the old shard after it has been split or merged?

A

It stays readable in the old shard until it expires (retention period), then it is deleted

41
Q

Which protocols are supported by IoT Gateway?

A

MQTT, WebSockets, HTTP 1.1

42
Q

What is IoT Message Broker?

A

Pub/Sub messaging tool, used for devices to communicate with each other

43
Q

What is IoT Thing Registry?

A

IAM for IoT, supports metadata, creates X.509 certificates, provides IoT Groups

44
Q

What are the three authentication methods for IoT Things?

A

X.509 certs, AWS SigV4, Custom tokens

45
Q

Where are IoT Rules Engine rules defined?

A

On the MQTT topics

46
Q

What is IoT Greengrass?

A

Allows compute (Lambda functions) to be executed on the IoT Thing itself.

47
Q

What are the 3 types of Collection (frequency)?

A
  1. Real-time (KDS, SQS, IoT)
  2. Near real-time (KDF, DMS)
  3. Batch (Snowball, Data Pipeline)
48
Q

What are the three parts of a Kinesis Stream Record?

A
  1. Data blob: where the data is stored, max 1 MB
  2. Record key (partition key): sent alongside a record, helps to group records into shards. Same key = same shard.
  3. Sequence number: unique identifier for each record put in a shard. Added by Kinesis after ingestion
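
As a memory aid, the three parts can be written down as a small data structure. This is a sketch only: the class `StreamRecord` and its field names are hypothetical, and the 1 MB limit is assumed here to mean 1,048,576 bytes.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BLOB_BYTES = 1024 * 1024  # assumption: "1 MB" taken as 1,048,576 bytes


@dataclass(frozen=True)
class StreamRecord:
    data: bytes                            # data blob: the serialized payload
    partition_key: str                     # record key: same key -> same shard
    sequence_number: Optional[str] = None  # assigned by Kinesis after ingestion

    def __post_init__(self):
        if len(self.data) > MAX_BLOB_BYTES:
            raise ValueError("data blob exceeds 1 MB")


record = StreamRecord(data=b'{"temp": 21}', partition_key="device-42")
print(record.sequence_number)  # None: the producer does not set it
```
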
49
Q

What are the use cases for Kinesis Agent?

A
  • Monitors log files and sends them to Kinesis Data Streams
  • Java-based agent, built on top of KPL
  • Installed on Linux-based servers
50
Q

What consumers are available for Kinesis Streams (Classic)?

A
  • Kinesis SDK
  • Kinesis Client Library (KCL)
  • Kinesis Connector Library
  • 3rd-party libraries: Spark, Log4J Appenders, Flume, Kafka Connect…
  • Kinesis Firehose
  • AWS Lambda
  • (Enhanced Fan-Out consumers are a separate mode, covered on other cards)
51
Q

When consuming data to DynamoDB, using KCL, what should you do if you get an ExpiredIteratorException?

A

KCL raises this exception because DynamoDB is not fast enough to keep up with the writes.

To solve that, increase the WCU (Write Capacity Units) of the DynamoDB table.

52
Q

What are the key differences of Enhanced-Fan Out (EFO) vs Standard Consumers?

A

Standard consumers:

  • Low number of consuming applications (1, 2, 3…)
  • Can tolerate ~200 ms latency
  • Minimize cost

Enhanced Fan Out Consumers:

  • Multiple Consumer applications for the same Stream
  • Low Latency requirements ~70ms
  • Higher costs (see Kinesis pricing page)
  • Default limit of 5 consumers using enhanced fan-out per data stream
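
The throughput side of this comparison can be put in numbers. This is a minimal sketch, assuming standard consumers split one shard's 2 MB/s evenly while each enhanced fan-out consumer gets its own 2 MB/s; the function name `per_consumer_mb_per_sec` is hypothetical.

```python
def per_consumer_mb_per_sec(consumers, enhanced_fan_out, shard_mb_per_sec=2.0):
    """Read throughput available to each consumer of a single shard."""
    if enhanced_fan_out:
        return shard_mb_per_sec          # each enhanced consumer gets its own 2 MB/s
    return shard_mb_per_sec / consumers  # standard consumers share one 2 MB/s budget


for n in (1, 2, 4):
    print(n, per_consumer_mb_per_sec(n, False), per_consumer_mb_per_sec(n, True))
```

With a single consumer the two modes give the same throughput, which is why standard mode is the cheaper choice for a small number of applications.
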
53
Q

What are “out-of-order” records after resharding, and how do you solve the problem?

A
  • If you start reading a child shard before you finish reading its parent, you may read data for a given hash key out of order
  • To avoid this, after a reshard, read the parent shard completely, until it returns no new records, before reading its children
  • Note: The Kinesis Client Library (KCL) has this logic built in, so it handles resharding for you
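
The parent-before-child rule amounts to ordering shards by lineage. The sketch below shows that ordering only; it is not the KCL's implementation. The function name `read_order` and the shard IDs are hypothetical.

```python
def read_order(parents):
    """Order shard IDs so that every parent comes before its children.

    `parents` maps shard ID -> list of parent shard IDs (empty if none).
    """
    ordered, done = [], set()

    def visit(shard):
        if shard in done:
            return
        for parent in parents.get(shard, []):
            visit(parent)  # parents must be drained first
        done.add(shard)
        ordered.append(shard)

    for shard in sorted(parents):
        visit(shard)
    return ordered


# Hypothetical lineage: shard-0 was split into shard-1 and shard-2.
print(read_order({"shard-0": [], "shard-1": ["shard-0"], "shard-2": ["shard-0"]}))
# ['shard-0', 'shard-1', 'shard-2']
```
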
54
Q

How can duplicates created by producers be handled by consumers?

A
  • Producers can embed a unique record ID so that consumers can detect duplicates
  • Consumers can be made idempotent, so that processing the same record twice has no additional effect
  • Or duplicates can be handled at the final destination, for example in a database
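
The first two options combine naturally: the producer embeds an ID and the consumer skips IDs it has already handled. This is a minimal in-memory sketch; the function name `process_once` and the `"id"` field are hypothetical, and a real consumer would keep the seen IDs in durable storage.

```python
def process_once(records, seen_ids, handle):
    """Apply handle() to each record whose ID has not been seen before."""
    for record in records:
        record_id = record["id"]  # unique ID embedded by the producer
        if record_id in seen_ids:
            continue              # duplicate delivery: skip it
        handle(record)
        seen_ids.add(record_id)


seen, out = set(), []
batch = [{"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "a", "v": 1}]
process_once(batch, seen, out.append)
print([r["id"] for r in out])  # ['a', 'b']
```
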
55
Q

What are the key differences between Kinesis Data Streams and Firehose?

A

Streams

  • Requires custom code (producer / consumer)
  • Real time (~200 ms latency for classic, ~70 ms latency for enhanced fan-out)
  • Must manage scaling (shard splitting / merging)
  • Data Storage for 1 to 365 days, replay capability, multi consumers
  • Use with Lambda to insert data in real-time to ElasticSearch (for example)

Firehose

  • Fully managed, send to S3, Splunk, Redshift, ElasticSearch
  • Serverless data transformations with Lambda
  • Near real time (lowest buffer time is 1 minute)
  • Automated Scaling
  • No data storage
56
Q

What are the 3 services that can receive a stream of CloudWatch Logs Subscription Filters?

A
  1. Kinesis Data Firehose, for near real-time delivery (data may be cleaned/enriched with Lambda)
  2. Lambda, for real-time processing
  3. Kinesis Data Streams, if analytics is needed
57
Q

How can you get High Resiliency and Maximum Resiliency with Direct Connect?

A

High Resiliency: one connection at each of multiple locations

Maximum Resiliency: separate connections terminating on separate devices in more than one location

58
Q

What does MSK stand for?

A
  • Managed Streaming for Apache Kafka
  • It is an alternative to Kinesis Data Streams
59
Q

What are the key differences between Kinesis Data Streams and MSK?

A
60
Q

Which of the following does Firehose not write to?

  • S3
  • Redshift
  • DynamoDB
  • ElasticSearch / OpenSearch
  • Splunk
A

DynamoDB