Data Processing Services Flashcards
Amazon Kinesis
collect, buffer, process, and analyze real-time, streaming data
How is data processed in Kinesis Data Streams?
Data is processed in “shards”
How much data is able to be processed?
Up to 1,000 records per second written per shard
GBs of data per second from thousands of sources
What does a kinesis record consist of?
a partition key
sequence number
data blob (up to 1 MB)
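A minimal sketch of how a record's partition key decides which shard it lands in. Kinesis maps the MD5 hash of the partition key onto a 128-bit hash key space; the sketch assumes that space is split evenly across shards, which need not hold after resharding. The names shard_for_key and make_record are hypothetical helpers, not part of any AWS SDK, and the sequence number is left out because the service assigns it on write.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value
MAX_BLOB_BYTES = 1024 * 1024  # assumption: "1 MB" read as 1024 * 1024 bytes


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard a partition key would map to.

    Assumes an even split of the hash key space across shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def make_record(partition_key: str, data: bytes) -> dict:
    """Build a record-like dict, rejecting data blobs over the size limit."""
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError("data blob exceeds 1 MB")
    return {"PartitionKey": partition_key, "Data": data}
```

Records that share a partition key always hash to the same shard, which is why ordering is preserved per key.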
Does Kinesis Data Streams Store Data? If so, how? If not, why?
Transient data store – default retention of 24 hours, configurable up to 7 days (extended retention) or up to 365 days (long-term retention).
What are the types of Kinesis services?
Kinesis Video Streams
Kinesis Data Streams
Kinesis Data Firehose
Kinesis Data Analytics
Kinesis Video Streams
Durably stores, encrypts, and indexes video data streams, and allows access to data through APIs
Supports encryption at rest with server-side encryption (KMS) with a customer master key
Kinesis Data Streams - max read rate, max write rate per shard?
Reads: 5 transactions per second, up to a max read rate of 2 MB per second. Writes: 1,000 records per second, up to a max write rate of 1 MB per second.
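Taking the figures above as per-shard limits, a rough capacity estimate can be sketched as follows. The function name shards_needed and its parameters are hypothetical; the result is only a lower bound because it ignores the 5 reads-per-second limit and uneven partition keys.

```python
import math

# Per-shard limits taken from the card above.
WRITE_RECORDS_PER_SEC = 1000
WRITE_MB_PER_SEC = 1
READ_MB_PER_SEC = 2


def shards_needed(records_per_sec: float,
                  write_mb_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Return the smallest shard count that satisfies every limit."""
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_write = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC)
    by_read = math.ceil(read_mb_per_sec / READ_MB_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_records, by_write, by_read)
```

For example, 2,500 records per second at 1.5 MB per second written and 3 MB per second read is bounded by the record rate, so the estimate is 3 shards.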
Kinesis Data Streams
enables real-time processing of streaming data
stores data for later processing by applications
ingest/collect and process large streams of data records in real time at large scales
Amazon Kinesis Data Firehose
fully managed service that loads data streams into AWS data stores, automatically scales to match the throughput of your data, and requires no ongoing administration
What data operations can be performed on data in Amazon Kinesis Firehose
batch, compress, transform, and encrypt the data
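The batch and compress steps can be illustrated with a small local sketch. This is not how Firehose is implemented; batch_records, compress_batch, and the max_batch_bytes parameter are hypothetical names, and the transform and encrypt steps are left out.

```python
import gzip
from typing import Iterable, Iterator, List


def batch_records(records: Iterable[bytes],
                  max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Group records into batches whose total size stays within a limit."""
    batch: List[bytes] = []
    size = 0
    for record in records:
        # Flush the current batch before it would exceed the limit.
        if batch and size + len(record) > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += len(record)
    if batch:
        yield batch


def compress_batch(batch: List[bytes]) -> bytes:
    """Join records with newlines and gzip the result."""
    return gzip.compress(b"\n".join(batch))
```

Batching before compression means fewer, larger objects reach the destination, which is the usual reason to buffer at all.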
Load Targets for Amazon Kinesis Firehose
Amazon S3, Splunk, Amazon Elasticsearch Service, and Amazon Redshift
Primary Use Case for AWS Kinesis Services
Handling Streaming Data
Does Kinesis Data Streams or Firehose provide ability to store streaming data? If so, for how long?
KDS can store streaming data for 1 to 7 days (up to 365 days with long-term retention); Firehose doesn’t provide any facility for storing streaming data
Key Features of Amazon Kinesis Data Streams (KDS)?
Data collected is available in milliseconds
Enables real-time analytics
Provides ordering of records
Read or replay of records in the same order
Transient data store
Amazon SQS
Simple Queue Service is a fully managed queuing service - no need to configure, install, or acquire software/hardware, queues dynamically created and scale automatically, no need to provision capacity
Amazon SQS Features/Impacts
decouple and scale micro services, distributed systems and serverless applications
Buffer messages to smooth out temporary volume spikes or increased latency
Built-in retry mechanism: a message that is not deleted becomes visible again after its visibility timeout
What are the two types of SQS Queues?
FIFO and Standard
Standard
Features of SQS standard queue
maximum throughput
best effort ordering
at least once delivery
Features of SQS FIFO queue
guarantee messages are processed exactly once - message remains until consumer processes and deletes it
no duplicate messages
process messages in the order they are sent
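A toy in-memory model of the FIFO behavior listed above: ordered delivery plus rejection of duplicates by a deduplication ID. ToyFifoQueue and its methods are hypothetical and are not the SQS API; the real service deduplicates only within a limited time interval and keeps a message until the consumer deletes it, both of which this sketch simplifies away.

```python
from collections import deque


class ToyFifoQueue:
    """In-memory model of FIFO ordering plus deduplication by ID."""

    def __init__(self):
        self._messages = deque()
        self._seen_ids = set()

    def send(self, dedup_id: str, body: str) -> bool:
        """Enqueue the message unless its deduplication ID was already seen."""
        if dedup_id in self._seen_ids:
            return False  # duplicate: silently dropped
        self._seen_ids.add(dedup_id)
        self._messages.append(body)
        return True

    def receive(self):
        """Remove and return the oldest message, or None if empty."""
        return self._messages.popleft() if self._messages else None
```

A standard queue makes neither promise, so consumers of standard queues are usually written to tolerate duplicates and out-of-order messages.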
Disadvantage of KDS
It is not a fully managed service; you must manually provision capacity (shards) for it to scale
Difference between SNS and SQS
SNS pushes messages to multiple subscribers
SQS clients poll for messages - SQS distributes messages, and is used to decouple apps
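The push versus poll distinction can be shown with two toy classes. ToyTopic and ToyQueue are hypothetical in-memory stand-ins, not the SNS or SQS APIs: the topic calls every subscriber as soon as a message is published, while the queue holds messages until a consumer asks for one.

```python
class ToyTopic:
    """Push model: every subscriber callback runs on publish."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        # Each subscriber gets its own copy of the message.
        for callback in self._subscribers:
            callback(message)


class ToyQueue:
    """Poll model: messages wait until a consumer asks for one."""

    def __init__(self):
        self._messages = []

    def send(self, message):
        self._messages.append(message)

    def poll(self):
        # Each message goes to exactly one polling consumer.
        return self._messages.pop(0) if self._messages else None
```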
When should you use Amazon MQ over SQS?
If an existing app is being migrated to the cloud, use Amazon MQ because it supports industry-standard APIs and protocols
If starting from scratch, use SQS
How is SQS billed?
Only pay for what you use
billed per request, plus data transfer out of SQS unless transfer is to EC2 or Lambda in the same region
Free tier provides 1 million requests per month at no charge
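A rough sketch of the request-based part of the bill, using only the free-tier figure from the card. The price_per_million parameter is a hypothetical input, since no price is given here, and data transfer charges are ignored.

```python
FREE_TIER_REQUESTS = 1_000_000  # free requests per month, from the card above


def billable_requests(total_requests: int) -> int:
    """Return the number of requests charged after the free tier."""
    return max(0, total_requests - FREE_TIER_REQUESTS)


def monthly_request_cost(total_requests: int, price_per_million: float) -> float:
    """Estimate the monthly request charge.

    price_per_million is a caller-supplied assumption, not an AWS price.
    """
    return billable_requests(total_requests) / 1_000_000 * price_per_million
```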
SQS Visibility Timeout? Default time? Min and max? Is behavior of time-out different for queue types?
a period of time during which Amazon SQS prevents other consumers from receiving and processing a message that an initial consumer has already picked up
The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours.
For standard queues, the visibility timeout isn’t a guarantee against receiving a message twice.
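The visibility timeout can be modeled with a small in-memory queue. ToyVisibilityQueue is a hypothetical class, not the SQS API, and it takes the current time as an argument (in seconds) so the behavior is easy to follow without waiting.

```python
class ToyVisibilityQueue:
    """In-memory model of a queue with a visibility timeout."""

    DEFAULT_TIMEOUT = 30          # seconds
    MAX_TIMEOUT = 12 * 60 * 60    # 12 hours in seconds

    def __init__(self, visibility_timeout: int = DEFAULT_TIMEOUT):
        if not 0 <= visibility_timeout <= self.MAX_TIMEOUT:
            raise ValueError("visibility timeout must be between 0 seconds and 12 hours")
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time it becomes visible]
        self._next_id = 0

    def send(self, body: str) -> int:
        message_id = self._next_id
        self._next_id += 1
        self._messages[message_id] = [body, 0.0]
        return message_id

    def receive(self, now: float):
        """Return (id, body) of a visible message and hide it for the timeout."""
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, entry[0]
        return None

    def delete(self, message_id: int) -> None:
        """Remove a message once the consumer has finished processing it."""
        self._messages.pop(message_id, None)
```

If the consumer never calls delete, the message reappears once the timeout expires, which is how an unprocessed message gets retried.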