Amazon Kinesis Flashcards
What is Amazon Kinesis?
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.
How is data processed, and what is the ingest rate per second?
Data is processed in “shards”, and each shard can ingest up to 1000 records per second.
What is the default shard limit?
There is a default limit of 500 shards, but you can request an increase to unlimited shards.
What does a record consist of?
A record consists of a partition key, sequence number, and data blob (up to 1 MB).
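As a rough illustration of the three parts of a record, here is a minimal sketch. The `Record` class and `MAX_BLOB_BYTES` constant are hypothetical names for this card, not AWS SDK types, and the sketch assumes "1 MB" means 1,048,576 bytes.

```python
from dataclasses import dataclass

# Assumption: the 1 MB limit is treated as 1,048,576 bytes.
MAX_BLOB_BYTES = 1024 * 1024


@dataclass(frozen=True)
class Record:
    """Hypothetical model of a stream record; not an AWS SDK type."""

    partition_key: str    # groups related records together
    sequence_number: str  # identifies the record within its shard
    data: bytes           # the data blob (payload)

    def __post_init__(self):
        # Reject blobs larger than the per-record limit.
        if len(self.data) > MAX_BLOB_BYTES:
            raise ValueError(
                f"data blob is {len(self.data)} bytes; limit is {MAX_BLOB_BYTES}"
            )
```

For example, `Record("user-1", "1", b"hello")` is accepted, while a blob one byte over the limit raises `ValueError`.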
What is the Kinesis transient data store?
Kinesis is a transient data store: records are retained for 24 hours by default, and retention can be configured for up to 7 days.
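The retention rule can be sketched as a simple time comparison. The function name `is_accessible` is made up for this card, and the 24-hour and 7-day bounds are taken from the answer above rather than from a live AWS quota lookup.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_HOURS = 24   # default retention from the card
MAX_RETENTION_HOURS = 7 * 24   # upper bound from the card (7 days)


def is_accessible(arrival, now, retention_hours=DEFAULT_RETENTION_HOURS):
    """Return True if a record that arrived at `arrival` is still retained at `now`."""
    if not DEFAULT_RETENTION_HOURS <= retention_hours <= MAX_RETENTION_HOURS:
        raise ValueError("retention must be between 24 and 168 hours")
    return now - arrival < timedelta(hours=retention_hours)


# A record that is 30 hours old: expired by default, kept with 7-day retention.
arrival = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = arrival + timedelta(hours=30)
expired_by_default = not is_accessible(arrival, now)
kept_with_extension = is_accessible(arrival, now, retention_hours=MAX_RETENTION_HOURS)
```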
What are the 4 types of Kinesis services?
- Kinesis Video Streams
- Kinesis Data Streams
- Kinesis Data Analytics
- Kinesis Data Firehose
Kinesis Video Streams
Kinesis Video Streams makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing.
Durably stores, encrypts, and indexes video data streams, and allows access to data through easy-to-use APIs.
Producers provide data streams.
Stores data for 24 hours by default, up to 7 days.
Stores data in shards. Each shard supports up to 5 read transactions per second, up to a maximum read rate of 2 MB per second, and up to 1000 records per second for writes, up to a maximum write rate of 1 MB per second.
Consumers receive and process data.
Can have multiple shards in a stream.
Supports encryption at rest with server-side encryption (KMS) with a customer master key.
Kinesis Video Streams does not appear much on AWS exams.
Kinesis Data Streams
Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs.
Kinesis Data Streams enables real-time processing of streaming big data.
What are the common use cases for Kinesis Data Streams?
- Accelerated log and data feed intake.
- Real-time metrics and reporting.
- Real-time data analytics.
- Complex stream processing.
What is the high-level architecture of Kinesis Data Streams?
- Producers continually push data to Kinesis Data Streams.
- Consumers process the data in real time.
- Consumers can store their results using an AWS service such as Amazon DynamoDB, Amazon Redshift, or Amazon S3.
- Kinesis Streams applications are consumers that run on EC2 instances.
- Shards are uniquely identified groups of data records in a stream.
- Records are the data units stored in a Kinesis Stream.
How can producers send data to Kinesis?
- Kinesis Streams API.
- Kinesis Producer Library (KPL).
- Kinesis Agent.
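Of the three options above, the Streams API is the easiest to sketch. The helper names `build_put_record_request` and `send_event` are hypothetical, as are the stream name and event used in the comments. The sketch assumes a client object with a `put_record` method that accepts `StreamName`, `Data`, and `PartitionKey`, which is how the boto3 Kinesis client exposes the PutRecord operation.

```python
import json


def build_put_record_request(stream_name, event, partition_key):
    """Build keyword arguments for a PutRecord call (helper name is hypothetical)."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # the data blob, as bytes
        "PartitionKey": partition_key,
    }


def send_event(client, stream_name, event, partition_key):
    """Send one event through any client that exposes put_record(**kwargs)."""
    request = build_put_record_request(stream_name, event, partition_key)
    return client.put_record(**request)


# Usage against a real stream (needs the third-party boto3 package and AWS
# credentials; "example-stream" is a placeholder name):
#   import boto3
#   send_event(boto3.client("kinesis"), "example-stream", {"temp": 21.5}, "sensor-1")
```

Passing the client in as an argument keeps the sketch testable without network access. The KPL and the Kinesis Agent are separate tools and are not covered here.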
What is a Record in Kinesis?
A record is the unit of data stored in an Amazon Kinesis data stream.
A record is composed of a sequence number, partition key, and data blob.
By default, records of a stream are accessible for up to 24 hours from the time they are added to the stream (can be raised to 7 days by enabling extended data retention).
What is the Data Blob in a Kinesis Stream Record?
A data blob is the data of interest your data producer adds to a data stream.
The maximum size of a data blob (the data payload before Base64-encoding) within one record is 1 megabyte (MB).
What is a Shard?
A shard is the base throughput unit of an Amazon Kinesis data stream.
One shard provides a capacity of 1MB/sec data input and 2MB/sec data output.
Each shard can support up to 1000 PUT records per second.
A stream is composed of one or more shards.
The total capacity of the stream is the sum of the capacities of its shards.
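Because stream capacity is the sum of its shards, sizing a stream is simple arithmetic. The sketch below uses only the per-shard figures from this card (1 MB/sec in, 2 MB/sec out, 1000 PUT records/sec). The function names and the example workload are illustrative assumptions.

```python
import math

# Per-shard limits as stated on the card above.
WRITE_MB_PER_SHARD = 1
READ_MB_PER_SHARD = 2
WRITE_RECORDS_PER_SHARD = 1000


def stream_capacity(shard_count):
    """Total stream capacity is the per-shard capacity times the shard count."""
    return {
        "write_mb_per_sec": shard_count * WRITE_MB_PER_SHARD,
        "read_mb_per_sec": shard_count * READ_MB_PER_SHARD,
        "write_records_per_sec": shard_count * WRITE_RECORDS_PER_SHARD,
    }


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies all three limits at once."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )


# Hypothetical workload: 3.5 MB/sec in, 2500 records/sec, 9 MB/sec out.
# The limits need 4, 3, and 5 shards respectively, so reads decide the answer.
needed = shards_needed(3.5, 2500, 9)
```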
What are the two types of resharding?
- In a shard split, you divide a single shard into two shards.
- In a shard merge, you combine two shards into a single shard.
Splitting increases the number of shards in your stream and therefore increases the data capacity of the stream.
Splitting increases the cost of your stream (you pay per-shard).
Merging reduces the number of shards in your stream and therefore decreases both the data capacity and the cost of the stream.
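Splitting and merging can be pictured as operations on hash key ranges. This goes slightly beyond the card: it relies on the fact that Kinesis maps each partition key to a 128-bit integer with MD5 and assigns the record to the shard whose hash key range contains it. The `Shard` class, the midpoint split, and the big-endian reading of the digest are assumptions made for illustration, not the service's implementation.

```python
import hashlib
from dataclasses import dataclass

MAX_HASH_KEY = 2**128 - 1  # MD5 yields a 128-bit value


@dataclass(frozen=True)
class Shard:
    """Hypothetical shard model: an inclusive range of hash keys."""

    start: int
    end: int


def split(shard):
    """Divide one shard into two at the midpoint (the split point is a choice)."""
    if shard.start >= shard.end:
        raise ValueError("shard range is too small to split")
    mid = (shard.start + shard.end) // 2
    return Shard(shard.start, mid), Shard(mid + 1, shard.end)


def merge(a, b):
    """Combine two shards with adjacent ranges into a single shard."""
    lo, hi = sorted((a, b), key=lambda s: s.start)
    if lo.end + 1 != hi.start:
        raise ValueError("only adjacent shards can be merged")
    return Shard(lo.start, hi.end)


def shard_for_key(shards, partition_key):
    """Pick the shard whose range contains the MD5 hash of the partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")  # assumption: big-endian integer
    return next(s for s in shards if s.start <= hash_key <= s.end)


whole = Shard(0, MAX_HASH_KEY)
left, right = split(whole)       # one shard becomes two: more capacity, more cost
restored = merge(left, right)    # two shards become one: less capacity, less cost
```

After the split, every partition key still lands in exactly one of the two shards, which is why capacity (and the per-shard bill) scales with the shard count.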