AWS ML Eng Assoc - data storage 2 Flashcards
Kinesis Data Streams
A streaming service that ingests large volumes of data and makes it available to consumers for processing in real time
Shard
A unit of throughput capacity in Kinesis Data Streams. Each shard provides 1 MB/s (or 1,000 records/s) of write capacity and 2 MB/s of read capacity
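The per-shard limits above turn into a simple sizing calculation. The following is a minimal sketch, assuming the 1 MB/s write and 2 MB/s read limits from the card plus the 1,000 records/s write limit; the function name `shards_needed` and the workload numbers are hypothetical.

```python
import math

# Per-shard limits (the records/s limit is an addition to the card's figures).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_s, records_s, read_mb_s):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )


# Hypothetical workload: 4.5 MB/s in, 3,000 records/s, 9 MB/s out.
print(shards_needed(write_mb_s=4.5, records_s=3000, read_mb_s=9.0))  # 5
```

Whichever limit is hit first determines the shard count, so a stream of many small records can need more shards than its MB/s figure alone suggests.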
Partition Key
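A key attached to each record that determines which shard receives it. Kinesis takes the MD5 hash of the key and routes the record to the shard whose hash key range contains that value, so records with the same key land on the same shard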
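The mapping from partition key to shard can be sketched in a few lines. This is an illustration rather than the service's actual code: it assumes the shards divide the 128-bit MD5 hash space into equal ranges, which is only approximately true for a real stream, and the helper names `hash_key` and `shard_index` are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


def hash_key(partition_key):
    """Interpret the MD5 digest of the partition key as an unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key, shard_count):
    """Index of the shard whose range contains the key's hash.

    Assumption: shard_count shards split the hash space into equal ranges.
    """
    return hash_key(partition_key) * shard_count // HASH_SPACE


# The same key always maps to the same shard.
print(shard_index("user-42", 4) == shard_index("user-42", 4))  # True
```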
KPL (Kinesis Producer Library)
A producer-side library that batches and aggregates records, retries failed puts, and writes to Kinesis Data Streams asynchronously
KCL (Kinesis Client Library)
A consumer-side library that handles shard discovery, load balancing across workers, and checkpointing so your code can focus on processing records from Kinesis Data Streams
Enhanced Fan-Out
A feature that gives each registered consumer its own dedicated read throughput of 2 MB/s per shard, instead of sharing the shard's 2 MB/s with other consumers
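The difference between shared and dedicated throughput is easiest to see as arithmetic. A minimal sketch, assuming shared consumers split a shard's 2 MB/s evenly (real traffic is rarely this tidy); `per_consumer_read_mb` is a hypothetical helper name.

```python
READ_MB_PER_SHARD = 2.0


def per_consumer_read_mb(consumers, enhanced_fan_out):
    """Approximate read throughput per consumer on one shard, in MB/s."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        # Each registered consumer gets its own 2 MB/s per shard.
        return READ_MB_PER_SHARD
    # Shared consumers divide the shard's 2 MB/s between them.
    return READ_MB_PER_SHARD / consumers


print(per_consumer_read_mb(4, enhanced_fan_out=False))  # 0.5
print(per_consumer_read_mb(4, enhanced_fan_out=True))   # 2.0
```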
Kinesis Data Firehose
A fully managed service that buffers streaming data and delivers it in near real time to destinations such as S3, Redshift, Elasticsearch (now Amazon OpenSearch Service), and Splunk
Kinesis Data Analytics
A service that allows you to process and analyze streaming data using SQL or Apache Flink
MSK (Managed Streaming for Apache Kafka)
A fully managed Apache Kafka service that allows you to build and run applications that use Apache Kafka to process streaming data
Shard Splitting
The process of dividing one shard into two child shards, each covering part of the parent's hash key range, to increase a stream's capacity
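A split needs a hash key at which the second child shard begins (the `NewStartingHashKey` parameter of the SplitShard API). The sketch below computes the midpoint for an even split; the function name `split_point` is hypothetical, and an uneven split point may be a better choice when one part of the range is hotter than the other.

```python
def split_point(start_hash_key, end_hash_key):
    """Starting hash key of the second child shard for an even split.

    The first child covers start_hash_key .. split_point - 1,
    the second covers split_point .. end_hash_key (both inclusive).
    """
    if end_hash_key <= start_hash_key:
        raise ValueError("a range of one hash key cannot be split")
    return (start_hash_key + end_hash_key + 1) // 2


# A single shard covering the whole 128-bit hash space splits at 2**127.
print(split_point(0, 2 ** 128 - 1) == 2 ** 127)  # True
```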
Shard Merging
The process of combining two adjacent shards in a Kinesis stream into one to decrease capacity
Hot Shard
A shard that receives disproportionately more data than the others, usually because of a skewed partition key, which can cause throttling once the shard's limits are exceeded
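One rough way to spot a hot shard is to compare per-shard record counts against the average. This is a minimal sketch on in-memory data, not a monitoring recipe: the function name `hot_shards`, the shard IDs, and the threshold factor of 2 are all assumptions, and shards that received no records at all are not part of the average.

```python
from collections import Counter


def hot_shards(record_shard_ids, factor=2.0):
    """Return shard IDs whose record count exceeds factor times the mean."""
    counts = Counter(record_shard_ids)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(shard for shard, n in counts.items() if n > factor * mean)


# Hypothetical sample: one shard takes 90 of 100 records.
sample = ["shard-0"] * 90 + ["shard-1"] * 5 + ["shard-2"] * 5
print(hot_shards(sample))  # ['shard-0']
```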
Kinesis Agent
A stand-alone Java application that monitors files on a server and sends new data to Kinesis Data Streams or Kinesis Data Firehose
Provisioned Mode
A capacity mode in Kinesis where you specify the number of shards for your stream
On-Demand Mode
A capacity mode in Kinesis where capacity is automatically managed to accommodate your workload