AWS ML Eng Assoc - data storage 2 Flashcards
Kinesis Data Streams
A service for ingesting and processing large volumes of streaming data in real time
Shard
A unit of throughput capacity in Kinesis Data Streams. Each shard provides 1 MB/s (or 1,000 records/s) of write capacity and 2 MB/s of read capacity
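A minimal sketch of the sizing arithmetic this card implies, assuming the per-shard limits of 1 MB/s write, 2 MB/s read, and the 1,000 records/s per-shard write limit. The function name `shards_needed` and its parameters are made up for illustration and are not part of any AWS SDK.

```python
import math

# Per-shard limits (write and read MB/s from the card; records/s is the
# per-shard write record limit).
WRITE_MB_PER_SHARD = 1.0
READ_MB_PER_SHARD = 2.0
WRITE_RECORDS_PER_SHARD = 1000


def shards_needed(write_mb_s, read_mb_s, records_s=0):
    """Smallest shard count that satisfies every limit at once (hypothetical helper)."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
    )


print(shards_needed(write_mb_s=5, read_mb_s=4))                    # write-bound
print(shards_needed(write_mb_s=0.5, read_mb_s=0.5, records_s=3500))  # record-bound
```

The largest of the three ratios wins, so a stream of many tiny records can need more shards than its MB/s figure alone suggests.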
Partition Key
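A key attached to each record that determines which shard the record goes to. Kinesis takes an MD5 hash of the key and maps it to a shard's hash key range, so records with the same key land on the same shard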
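A rough sketch of how a partition key could map to a shard: Kinesis hashes the key with MD5 into a 128-bit integer and picks the shard whose hash key range contains it. The even split of the hash space across shards is an assumption for illustration (real ranges can be uneven after splits and merges), and `hash_key` and `shard_index` are hypothetical helper names.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard owning this key, ASSUMING evenly sized hash ranges."""
    range_size = HASH_SPACE // shard_count
    # Clamp so the remainder at the top of the hash space goes to the last shard.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


for key in ["user-1", "user-2", "user-1"]:
    print(key, "->", shard_index(key, shard_count=4))
```

The same key always produces the same index, which is why a single very active key cannot be spread across shards.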
KPL (Kinesis Producer Library)
A library that helps you easily and reliably put data into Kinesis Data Streams
KCL (Kinesis Client Library)
A library that helps you consume and process data from Kinesis Data Streams
Enhanced Fan-Out
A feature that gives each registered consumer its own dedicated read throughput of 2 MB/s per shard, instead of sharing the shard's 2 MB/s with other consumers
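A small illustrative calculation of the difference, assuming that without enhanced fan-out all consumers share a shard's 2 MB/s of read capacity and that the share is split evenly. The function name `per_consumer_read_mb_s` is hypothetical.

```python
READ_MB_PER_SHARD = 2.0  # per-shard read capacity


def per_consumer_read_mb_s(consumers: int, enhanced_fan_out: bool) -> float:
    """Read throughput per shard available to each consumer (hypothetical helper)."""
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated 2 MB/s per shard.
        return READ_MB_PER_SHARD
    # Shared mode: consumers split the shard's 2 MB/s (even split assumed here).
    return READ_MB_PER_SHARD / consumers


print(per_consumer_read_mb_s(4, enhanced_fan_out=False))
print(per_consumer_read_mb_s(4, enhanced_fan_out=True))
```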
Kinesis Data Firehose
A fully managed service for delivering real-time streaming data to destinations such as S3, Redshift, Elasticsearch (now OpenSearch Service), and Splunk
Kinesis Data Analytics
A service that allows you to process and analyze streaming data using SQL or Apache Flink
MSK (Managed Streaming for Apache Kafka)
A fully managed Apache Kafka service that allows you to build and run applications that use Apache Kafka to process streaming data
Shard Splitting
The process of dividing one shard in a Kinesis stream into two to increase the stream's capacity
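A sketch of what a split does to a shard's hash key range, assuming the common choice of splitting at the midpoint so each child takes roughly half the traffic. `split_hash_range` is a hypothetical helper. In a real split the chosen boundary would be supplied as the starting hash key of the second child.

```python
def split_hash_range(start: int, end: int):
    """Split an inclusive hash key range at its midpoint (hypothetical helper).

    Returns (first_child_range, second_child_range); the second child's
    starting hash key is the split point.
    """
    if end <= start:
        raise ValueError("range must contain at least two hash keys")
    new_start = (start + end) // 2 or start + 1  # split point, never equal to start
    return (start, new_start - 1), (new_start, end)


# A single shard covering the whole 128-bit hash space.
parent = (0, 2 ** 128 - 1)
first, second = split_hash_range(*parent)
print(first)
print(second)
```

The two child ranges are contiguous and together cover exactly the parent's range, so every partition key still maps to exactly one shard.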
Shard Merging
The process of combining two adjacent shards in a Kinesis stream into one to decrease capacity
Hot Shard
A shard that receives far more data than the others, potentially causing throttling when it exceeds its throughput limits
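One way to spot a hot shard, sketched under assumptions: given observed per-shard write rates (the input dictionary here is made-up sample data, not real metrics), flag any shard at or above some fraction of the 1 MB/s per-shard write limit. The 80% threshold and the name `hot_shards` are arbitrary choices for illustration.

```python
WRITE_MB_PER_SHARD = 1.0  # per-shard write limit


def hot_shards(write_mb_s_by_shard, threshold=0.8):
    """Shard IDs whose write rate is at or above threshold * limit (hypothetical helper)."""
    cutoff = WRITE_MB_PER_SHARD * threshold
    return sorted(
        shard_id
        for shard_id, rate in write_mb_s_by_shard.items()
        if rate >= cutoff
    )


# Made-up sample rates in MB/s; one shard takes most of the traffic.
observed = {"shard-0": 0.2, "shard-1": 0.95, "shard-2": 0.1}
print(hot_shards(observed))
```

A typical remedy is a higher-cardinality partition key or splitting the flagged shard.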
Kinesis Agent
A stand-alone Java application that offers an easy way to collect and send data to Kinesis Data Streams
Provisioned Mode
A capacity mode in Kinesis where you specify the number of shards for your stream
On-Demand Mode
A capacity mode in Kinesis where capacity is automatically managed to accommodate your workload
Random Cut Forest
An algorithm used in Kinesis Data Analytics for anomaly detection in streaming data
Use case: Streaming ETL
Using Kinesis Data Analytics or MSK to perform real-time extract, transform, load (ETL) operations on streaming data
Use case: Real-time Analytics
Using Kinesis Data Streams and Kinesis Data Analytics to process and analyze data in real time for insights
Use case: Log and Event Data Processing
Using Kinesis to ingest and process log files and event data from various sources in real time
Mnemonic: KPL puts, KCL gets
Remember that the Kinesis Producer Library (KPL) is used to put data into streams, while the Kinesis Client Library (KCL) is used to get data from streams
Metaphor: Kinesis as a river
Think of Kinesis Data Streams as a river: producers add water (data) upstream, consumers take water out downstream, and shards are the width of the river, determining how much water can flow