AWS Data Collection Flashcards
Types of AWS data collection
- Real-Time Collection
- Near Real-Time Collection
- Batch
Real-Time Collection Services
Kinesis Data Streams
SQS
IoT
Near Real-Time Collection Services
Kinesis Data Firehose
Data Migration Service
Batch - Historical Analytics Services
Snowball
Data Pipeline
Explain the Kinesis Data Streams service
Managed service that allows you to collect, process and analyze real-time streaming data from various sources such as IoT, mobile devices, server logs, social networks and other real-time data sources.
Kinesis Data Streams producers
Applications, clients, SDK, KPL, Kinesis Agent
Kinesis Data Streams consumers
Apps (KCL, SDK), Lambda, Kinesis Data Firehose and Kinesis Data Analytics
Kinesis Data Streams capacity modes
Provisioned Mode
On-Demand Mode
Write capacity of each shard in Kinesis Data Streams Provisioned Mode
1 MB/s in or 1,000 records per second
Default write capacity provisioned in Kinesis Data Streams On-Demand Mode
4 MB/s in or 4,000 records per second
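Sizing a provisioned stream from the per-shard write limits above is simple arithmetic. This is a minimal sketch that assumes only the 1 MB/s and 1,000 records/s write limits matter. It ignores read throughput and headroom for traffic spikes. `shards_needed` is a hypothetical helper name.

```python
import math

# Per-shard write limits in Provisioned Mode (from the card above)
SHARD_WRITE_MB_PER_S = 1.0
SHARD_WRITE_RECORDS_PER_S = 1000


def shards_needed(write_mb_per_s: float, records_per_s: int) -> int:
    """Hypothetical helper: minimum shard count so neither write limit is exceeded."""
    by_bytes = math.ceil(write_mb_per_s / SHARD_WRITE_MB_PER_S)
    by_records = math.ceil(records_per_s / SHARD_WRITE_RECORDS_PER_S)
    # Whichever limit needs more shards is the binding one; a stream has at least 1 shard
    return max(1, by_bytes, by_records)


if __name__ == "__main__":
    # 3.5 MB/s of mid-sized records: bandwidth is the binding limit
    print(shards_needed(3.5, 2000))  # 4
    # 0.5 MB/s of tiny records: record count is the binding limit
    print(shards_needed(0.5, 4500))  # 5
```

In the second case the byte rate alone would suggest a single shard. The record rate is what forces five.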
Kinesis Data Streams security points
IAM - control access
Encryption in flight using HTTPS endpoints
KMS encryption at rest
Client-side encryption/decryption of data
VPC Endpoints
Monitor API calls using CloudTrail
Explain the Kinesis Producer SDK - PutRecords
APIs used: PutRecord (one record) and PutRecords (many records)
PutRecords uses…
Batching, which increases throughput and means fewer HTTP requests
PutRecords uses batching…
Fewer HTTP requests
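The following sketch shows why PutRecords cuts the number of HTTP requests. It groups records into PutRecords-sized batches, assuming the documented limit of 500 records per PutRecords call, and ignores the per-request byte limit. Each entry uses the `Data` / `PartitionKey` shape of the boto3 `put_records` request. No AWS call is made, and `build_put_records_batches` is a hypothetical helper.

```python
from typing import Dict, List

MAX_RECORDS_PER_PUT_RECORDS = 500  # assumption: PutRecords per-request record limit


def build_put_records_batches(
    payloads: List[bytes], partition_keys: List[str]
) -> List[List[Dict[str, object]]]:
    """Hypothetical helper: split records into PutRecords-sized request bodies."""
    if len(payloads) != len(partition_keys):
        raise ValueError("payloads and partition_keys must have the same length")
    entries = [
        {"Data": data, "PartitionKey": key}
        for data, key in zip(payloads, partition_keys)
    ]
    # Each slice would become one PutRecords call, i.e. one HTTP request
    return [
        entries[i : i + MAX_RECORDS_PER_PUT_RECORDS]
        for i in range(0, len(entries), MAX_RECORDS_PER_PUT_RECORDS)
    ]


if __name__ == "__main__":
    n = 1201
    batches = build_put_records_batches(
        [f"event-{i}".encode() for i in range(n)],
        [f"user-{i % 50}" for i in range(n)],
    )
    print(len(batches), [len(b) for b in batches])  # 3 [500, 500, 201]
    # Sending the same records one by one with PutRecord would take 1201 requests
```

With boto3, each batch would be passed as `Records=batch` to `put_records` together with the stream name. PutRecords can partially fail, so the response's `FailedRecordCount` and per-record error codes should be checked and the failed entries resent.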
Kinesis Producer SDK - what happens if we go over the limits?
We get a ProvisionedThroughputExceeded exception
Managed AWS sources for Kinesis Data Streams
CloudWatch Logs, AWS IoT, Kinesis Data Analytics
To send data to Kinesis through an asynchronous API, we need…
Kinesis Producer Library (KPL)
How does the Kinesis Producer Library submit metrics?
To CloudWatch, for monitoring
KPL batching introduces some delay with…
RecordMaxBufferedTime (default 100 ms)
Features of the Kinesis Agent
Monitors log files and sends them to KDS
Java-based agent
Installs in Linux server environments
Data Collection Services
Amazon Kinesis
AWS IoT Core
AWS Snowball
SQS
DMS
Direct Connect
What is ProvisionedThroughputExceeded?
An exception that can occur in Kinesis when the application reaches the provisioned throughput limit defined for the stream.
Causes of ProvisionedThroughputExceeded exceptions
- Exceeding MB/s or TPS for any shard
- A hot shard: make sure your partition key is not a bad one that sends too much data to one partition
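A bad partition key creates a hot shard because of how records are routed. Kinesis takes an MD5 hash of the partition key, treats it as a 128-bit integer, and sends the record to the shard whose hash key range contains that value. The sketch below reproduces that mapping locally. It assumes the hash key space is split evenly across shards, which is roughly the case for a newly created stream; splits and merges change the ranges. `even_shard_starts`, `shard_for_key`, and `distribution` are hypothetical helpers.

```python
import hashlib
from bisect import bisect_right
from collections import Counter
from typing import Iterable, List

HASH_SPACE = 2 ** 128  # partition keys hash into a 128-bit integer space


def even_shard_starts(num_shards: int) -> List[int]:
    """Starting hash key of each shard, assuming an even split of the key space."""
    return [i * HASH_SPACE // num_shards for i in range(num_shards)]


def shard_for_key(partition_key: str, starts: List[int]) -> int:
    """Index of the shard whose hash key range contains MD5(partition_key)."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    # The owning shard is the last one whose starting hash key is <= h
    return bisect_right(starts, h) - 1


def distribution(keys: Iterable[str], num_shards: int) -> Counter:
    """Count how many records land on each shard."""
    starts = even_shard_starts(num_shards)
    return Counter(shard_for_key(k, starts) for k in keys)


if __name__ == "__main__":
    # One constant key: every record hits the same shard (hot shard)
    print(distribution(["device-1"] * 1000, 4))
    # Many distinct keys: records spread across all 4 shards
    print(distribution([f"device-{i}" for i in range(1000)], 4))
```

Only one shard appears in the first count, however many shards the stream has. That single shard absorbs the whole write rate.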
Solutions for ProvisionedThroughputExceeded exceptions
- Retries with backoff
- Increase shards (scaling)
- Ensure your partition key is a good one
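Here is a minimal sketch of the first solution, retries with exponential backoff. `ProvisionedThroughputExceeded` is a local stand-in exception defined here. With boto3 you would catch the Kinesis client's `ProvisionedThroughputExceededException` instead, and the AWS SDKs also have their own built-in retry settings. `call_with_backoff` and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class ProvisionedThroughputExceeded(Exception):
    """Local stand-in for the SDK's throttling exception."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` when it is throttled, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            if jitter:
                # "Full jitter": wait a random time between 0 and the computed delay
                delay = random.uniform(0, delay)
            sleep(delay)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    failures = iter([True, True, False])  # throttled twice, then succeeds
    waits: List[float] = []

    def flaky_put() -> str:
        if next(failures):
            raise ProvisionedThroughputExceeded("shard write limit hit")
        return "ok"

    print(call_with_backoff(flaky_put, jitter=False, sleep=waits.append), waits)
    # ok [0.1, 0.2]
```

Jitter spreads out producers that were throttled at the same moment, so their retries do not all hit the shard together. Backoff only helps with short bursts, though. Sustained overload still needs more shards or a better partition key.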
How to influence Kinesis Producer Library (KPL) batching efficiency
Introduce some delay with RecordMaxBufferedTime (default 100 ms)
Kinesis Producer Library - when not to use it
- The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user configurable)
- Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance
- Applications that cannot tolerate this additional delay may need to use the AWS SDK directly
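The trade-off above (more buffering time gives better packing but adds latency) shows up even in a toy model. This is not the KPL's implementation. The model assumes a batch is flushed only once its oldest record has waited RecordMaxBufferedTime, and it ignores the size-based flushes and record aggregation the real library also performs. `simulate_buffering` and `worst_case_delay_ms` are hypothetical.

```python
from typing import List


def simulate_buffering(arrival_ms: List[int], max_buffered_ms: int) -> List[List[int]]:
    """Group sorted arrival times into batches flushed max_buffered_ms after their first record."""
    batches: List[List[int]] = []
    current: List[int] = []
    opened_at = 0
    for t in arrival_ms:
        # The open batch's timer fired before (or exactly when) this record arrived
        if current and t - opened_at >= max_buffered_ms:
            batches.append(current)
            current = []
        if not current:
            opened_at = t  # this record starts a new batch
        current.append(t)
    if current:
        batches.append(current)
    return batches


def worst_case_delay_ms(batches: List[List[int]], max_buffered_ms: int) -> int:
    """Longest time any record waited in the buffer (flush time minus arrival time)."""
    return max(
        (batch[0] + max_buffered_ms - t for batch in batches for t in batch),
        default=0,
    )


if __name__ == "__main__":
    arrivals = list(range(0, 1000, 10))  # one record every 10 ms for one second
    for buffered in (100, 500):
        batches = simulate_buffering(arrivals, buffered)
        print(buffered, len(batches), worst_case_delay_ms(batches, buffered))
    # 100 10 100
    # 500 2 500
```

Raising the buffer time from 100 ms to 500 ms cuts the number of sends from 10 to 2. The worst-case delay grows from 100 ms to 500 ms.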
Kinesis Agent functions
- Monitors log files and sends them to Kinesis Data Streams
- Java-based agent, built on top of KPL
- Installs in Linux-based server environments
Kinesis Agent features
- Write from multiple directories to multiple streams
- Routing based on directory / log file
- Pre-process data before sending to streams (single line, CSV to JSON, log to JSON…)
- The agent handles file rotation, checkpointing, and retry upon failures
- Emits metrics to CloudWatch for monitoring
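Most of the agent features above are switched on in the agent's JSON configuration file, typically `/etc/aws-kinesis/agent.json`. The sketch below builds such a configuration in Python and serializes it. It shows two directories routed to two streams, log-to-JSON and CSV-to-JSON pre-processing, and CloudWatch metrics. The file paths, stream names, and CSV field names are hypothetical. The option names follow the agent's documented format, but they are worth checking against the agent documentation before use.

```python
import json
from typing import Any, Dict

# Hypothetical paths and stream names; option names should be verified against the agent docs
AGENT_CONFIG: Dict[str, Any] = {
    "cloudwatch.emitMetrics": True,  # publish agent metrics to CloudWatch
    "flows": [
        {
            # Web server logs from one directory go to one stream
            "filePattern": "/var/log/httpd/access_log*",
            "kinesisStream": "web-logs-stream",
            "partitionKeyOption": "RANDOM",
            "dataProcessingOptions": [
                {"optionName": "LOGTOJSON", "logFormat": "COMMONAPACHELOG"}
            ],
        },
        {
            # CSV files from another directory go to a second stream
            "filePattern": "/var/log/myapp/*.csv",
            "kinesisStream": "app-events-stream",
            "partitionKeyOption": "RANDOM",
            "dataProcessingOptions": [
                {
                    "optionName": "CSVTOJSON",
                    "customFieldNames": ["timestamp", "user_id", "action"],
                    "delimiter": ",",
                }
            ],
        },
    ],
}


def render_agent_config(config: Dict[str, Any]) -> str:
    """Serialize the configuration as the JSON text the agent reads."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_agent_config(AGENT_CONFIG))
```

Routing works per flow. Each `filePattern` selects the files to tail, and the stream named in that flow receives their records.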
Classic Kinesis consumers
- Kinesis SDK
- Kinesis Client Library (KCL)
- Kinesis Connector Library
- 3rd party libraries: Spark, Log4J Appenders, Flume, Kafka Connect…
- Kinesis Firehose
- AWS Lambda
Features of the Kinesis Consumer SDK - GetRecords
- Classic Kinesis: records are polled by consumers from a shard
- Each shard has 2 MB/s total aggregate throughput
- GetRecords returns up to 10 MB of data (then throttles for 5 seconds) or up to 10,000 records
- Maximum of 5 GetRecords API calls per shard per second = 200 ms latency
- If 5 consumer applications consume from the same shard, every consumer can poll once a second and receives less than 400 KB/s
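The last bullet follows from the two shared per-shard limits. The sketch below divides them across N classic consumers. It assumes 1 MB = 1,000 KB and ignores protocol overhead, which is why the real figure is somewhat below the computed one. `shared_consumer_budget` is a hypothetical helper.

```python
from typing import Dict

SHARD_READ_KB_PER_S = 2000  # 2 MB/s per shard, shared by all classic consumers
MAX_GET_RECORDS_CALLS_PER_S = 5  # per shard, across all consumers


def shared_consumer_budget(num_consumers: int) -> Dict[str, float]:
    """Each classic consumer's share of one shard's GetRecords limits."""
    if num_consumers < 1:
        raise ValueError("need at least one consumer")
    calls_per_s = MAX_GET_RECORDS_CALLS_PER_S / num_consumers
    return {
        "get_records_calls_per_s": calls_per_s,
        "min_poll_interval_ms": 1000 / calls_per_s,
        "max_kb_per_s": SHARD_READ_KB_PER_S / num_consumers,
    }


if __name__ == "__main__":
    print(shared_consumer_budget(1))
    # {'get_records_calls_per_s': 5.0, 'min_poll_interval_ms': 200.0, 'max_kb_per_s': 2000.0}
    print(shared_consumer_budget(5))
    # {'get_records_calls_per_s': 1.0, 'min_poll_interval_ms': 1000.0, 'max_kb_per_s': 400.0}
```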
Kinesis Connector Library writes data to:
- Amazon S3
- DynamoDB
- Redshift
- Elasticsearch
Throughput each consumer gets per shard with Kinesis Enhanced Fan-Out
2 MB/s
Consumers and aggregate MB/s per shard with Kinesis Enhanced Fan-Out
20 consumers get 40 MB/s per shard in aggregate
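The difference from classic polling is that Enhanced Fan-Out gives every consumer its own 2 MB/s per shard, while classic consumers share a single 2 MB/s. The sketch below compares the two per shard using only the figures on these cards. `read_throughput_per_shard` is a hypothetical helper.

```python
from typing import Tuple

EFO_MB_PER_S_PER_CONSUMER = 2.0  # dedicated to each Enhanced Fan-Out consumer, per shard
CLASSIC_SHARED_MB_PER_S = 2.0  # one budget shared by all classic consumers, per shard


def read_throughput_per_shard(num_consumers: int, enhanced_fan_out: bool) -> Tuple[float, float]:
    """Return (MB/s each consumer gets, aggregate MB/s read from the shard)."""
    if num_consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        # Each consumer has its own pipe, so the aggregate grows with consumer count
        return EFO_MB_PER_S_PER_CONSUMER, EFO_MB_PER_S_PER_CONSUMER * num_consumers
    # Classic consumers split one fixed budget
    return CLASSIC_SHARED_MB_PER_S / num_consumers, CLASSIC_SHARED_MB_PER_S


if __name__ == "__main__":
    print(read_throughput_per_shard(20, enhanced_fan_out=True))   # (2.0, 40.0)
    print(read_throughput_per_shard(20, enhanced_fan_out=False))  # (0.1, 2.0)
```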
Latency of standard consumers
~200 ms
Latency of Enhanced Fan-Out consumers
~70 ms
Kinesis Firehose destinations
S3, Redshift, Elasticsearch, Splunk