Collection Flashcards
KDS
- Retention 1-365 days
- Record = Partition Key + Data Blob (up to 1MB)
- Provisioned
- IN : 1MB per shard per sec
- OUT : 2MB per shard per sec
- On-demand
- 4MB or 4000 records per second
- Scales automatically based on the observed throughput peak during the last 30 days
- replicates to 3 AZ
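The provisioned limits above turn into a simple sizing rule. A rough sketch, assuming a per-shard write limit of 1,000 records per second on top of the 1MB in / 2MB out figures; the function name is made up for illustration:

```python
import math

def shards_needed(in_mb_per_sec, out_mb_per_sec, records_per_sec=0):
    """Estimate the shard count for a provisioned stream.

    Assumes per shard: 1 MB/s in, 2 MB/s out, 1,000 records/s in.
    """
    by_ingress = math.ceil(in_mb_per_sec / 1.0)
    by_egress = math.ceil(out_mb_per_sec / 2.0)
    by_records = math.ceil(records_per_sec / 1000)
    # The tightest constraint decides; a stream has at least one shard.
    return max(1, by_ingress, by_egress, by_records)

# 5 MB/s in needs 5 shards, 14 MB/s out needs 7, so egress decides.
print(shards_needed(in_mb_per_sec=5, out_mb_per_sec=14))
```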
Kinesis Producer SDK
- Use Cases : Support multiple programming languages
- PutRecord vs PutRecords
- PutRecords uses batching and increases throughput
- ProvisionedThroughputExceeded Exception
- Solution : Retries with backoff, increase # shards and choice of partition key
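Retries with backoff can be sketched without any AWS dependency. In this minimal example `ThroughputExceeded` is a hypothetical stand-in for the SDK's ProvisionedThroughputExceeded error, and `put_fn` is whatever callable actually sends the record:

```python
import random
import time

class ThroughputExceeded(Exception):
    """Hypothetical stand-in for the SDK's throughput-exceeded error."""

def put_with_backoff(put_fn, record, max_attempts=5, base_delay=0.1,
                     sleep=time.sleep):
    """Call put_fn(record), retrying with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return put_fn(record)
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error to the caller
            delay = base_delay * (2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
            sleep(delay + random.uniform(0, delay))  # jitter spreads retries
```

Backoff only buys time. If the exception keeps coming back, the stream needs more shards or a partition key that spreads load more evenly.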
Kinesis Producer Library
- Use Cases : High performance and long-running producers
- Synchronous and Asynchronous API
- Batching –> 1MB/s or 1000 records /s
- Compression must be implemented by users
- KPL records must be decoded with KCL or special helper library
- RecordMaxBufferedTime 100ms
Kinesis Agent
- Use Cases : Monitor log files and send them to KDS
- On top of KPL
- Features
- Write from multiple directories to multiple Kinesis streams
- Preprocess data before sending
- Able to handle file rotation, checkpointing and retry
- Emit metrics to CloudWatch for monitoring
Kinesis Consumer SDK
- 2MB per shard per second
- GetRecords returns up to 10MB or up to 10,000 records per call
- Max 5 GetRecords API calls per shard per second
- 200ms latency
Kinesis Client Library
- Read records from Kinesis produced by KPL
- Share multiple shards among multiple consumers in one group
- Checkpointing feature to resume progress
- Leverage DynamoDB for checkpointing
- Make sure to provision enough WCU / RCU
- Or use on-demand mode for DynamoDB; an under-provisioned table will slow down KCL
- ExpiredIteratorException
- Solution : increase WCU
Kinesis Connector Library
- S3
- DynamoDB
- Redshift
- ElasticSearch
Kinesis and Lambda
- Lambda can source records from KDS
- Lambda consumer has a library to de-aggregate records from the KPL
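A minimal sketch of a Lambda handler reading from KDS. It assumes plain (non-aggregated) UTF-8 records; KPL-aggregated records would need the de-aggregation library first, which is not shown here:

```python
import base64

def handler(event, context):
    """Decode each Kinesis record delivered to Lambda in one batch."""
    payloads = []
    for record in event["Records"]:
        # The record payload arrives base64-encoded under kinesis.data.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(raw.decode("utf-8"))
    return payloads
```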
Kinesis Enhanced Fan Out
- 2MB /consumer /sec /shard
- Kinesis pushes data to consumers over HTTP/2
- 70 ms latency
- Default limit of 5 consumers using enhanced fan out per data stream
- Use SubscribeToShard API
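The difference from the standard (polling) consumer is mostly arithmetic. A small sketch using the figures from these cards (2MB per shard per second shared by polling consumers, 5 GetRecords calls per shard per second); the function names are illustrative only:

```python
def per_consumer_mb_per_sec(num_consumers, enhanced_fan_out):
    """Read throughput each consumer gets from a single shard."""
    shard_limit = 2.0  # MB/s
    if enhanced_fan_out:
        return shard_limit  # every registered consumer gets its own 2 MB/s
    return shard_limit / num_consumers  # polling consumers share the limit

def min_poll_interval_ms(num_consumers):
    """Polling consumers share 5 GetRecords calls per shard per second."""
    return num_consumers * 1000 / 5

print(per_consumer_mb_per_sec(4, enhanced_fan_out=False))  # 0.5
print(per_consumer_mb_per_sec(4, enhanced_fan_out=True))   # 2.0
```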
Auto Scaling
- API call to change the number of shards is UpdateShardCount
- We can implement AutoScaling with AWS Lambda
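A sketch of the scaling step such a Lambda might run. `client` is assumed to be a boto3 Kinesis client (or anything exposing `update_shard_count`), and the clamping reflects my understanding that one UpdateShardCount call cannot more than double or less than halve the current count:

```python
import math

def scale_stream(client, stream_name, current_shards, target_shards):
    """Request a new shard count, clamped to what a single call allows."""
    upper = current_shards * 2
    lower = max(1, math.ceil(current_shards / 2))
    clamped = min(max(target_shards, lower), upper)
    if clamped != current_shards:
        client.update_shard_count(
            StreamName=stream_name,
            TargetShardCount=clamped,
            ScalingType="UNIFORM_SCALING",
        )
    return clamped

# Usage (assumed setup): client = boto3.client("kinesis")
# scale_stream(client, "my-stream", current_shards=4, target_shards=20)
```

Reaching a far-away target takes several calls, each starting from the count the previous call produced.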
KDS Security
- EIF : SSL
- EAR : KMS
- VPC
- KCL –> grant read and write access to DynamoDB table
Kinesis Data Firehose
- Fully managed
- Near real time (60 sec latency)
- Auto scaling
- Spark / KCL do not read from KDF
- Destination : S3, Splunk, Redshift, ElasticSearch
- Record Size 1MB
- Replicates records to 3 AZ
- Retention 24 hours
KDF Buffer
- 2 mins
- 32MB
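Firehose flushes a batch as soon as either buffer threshold is reached, whichever comes first. A toy sketch of that rule, using the 2 min / 32MB values from this card as defaults (both are configurable settings, and the function name is made up):

```python
def should_flush(buffered_bytes, seconds_since_flush,
                 size_limit_mb=32, interval_s=120):
    """Return True when either the size or the time threshold is hit."""
    size_hit = buffered_bytes >= size_limit_mb * 1024 * 1024
    time_hit = seconds_since_flush >= interval_s
    return size_hit or time_hit

print(should_flush(40 * 1024 * 1024, 10))  # True: size threshold reached
print(should_flush(1024, 10))              # False: neither reached yet
```

High-throughput streams tend to flush on size, low-throughput streams on time.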
SQS Standard
- Fully managed
- 1-14 days retention
- 10ms latency
- 256KB msg body + metadata
- Horizontal scaling in terms of the number of consumers
- Max 120,000 in-flight messages being processed by consumers
SQS Producing Messages
- Provide delay delivery
- Get back
- msg id
- md5 hash of the body
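The returned MD5 lets the producer check that SQS received the body intact. A minimal sketch, assuming the body is a UTF-8 string and the digest comes from the `MD5OfMessageBody` field of the SendMessage response:

```python
import hashlib

def body_matches(body, md5_of_body):
    """Compare a locally computed MD5 with the one SQS returned."""
    local = hashlib.md5(body.encode("utf-8")).hexdigest()
    return local == md5_of_body

print(body_matches("hello", "5d41402abc4b2a76b9719d911017c592"))  # True
```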
SQS Consuming Messages
- Poll 10 msg at a time
- Process the message within the visibility timeout
- Delete the msg using its receipt handle (returned with the msg id on receive)
- max 120,000 in-flight msg being processed by consumers
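The poll, process, delete cycle can be sketched as below. `sqs` is assumed to be a boto3 SQS client (or a compatible object), `process` is a hypothetical callback, and the 10-second long-poll wait is an arbitrary choice:

```python
def drain_once(sqs, queue_url, process):
    """Receive up to 10 messages, process each one, then delete it."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # the per-call maximum
        WaitTimeSeconds=10,      # long polling
    )
    handled = 0
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        # Deleting needs the receipt handle from this particular receive.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
        handled += 1
    return handled
```

If `process` raises, the message is not deleted and becomes visible again once the visibility timeout expires.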
SQS FIFO Queue
- Name of queue must end in .fifo
- Lower throughput (3,000 msg per second with batching and 300 per second without)
- messages are processed in order by consumer
- msg are sent exactly once
SQS Security
- EIF : HTTPs
- EAR : KMS
- IAM
IoT Device Gateway
- Serves as entry point for IoT devices connecting to AWS
- Supports MQTT, Websocket and HTTP1.1 protocols
- Fully managed
- Scale automatically to support over 1 billion Things
IoT Message Broker
- Pub Sub pattern with low latency
- Msg sent using MQTT, WebSocket and HTTP1.1
- Msg are published into topics
- Msg broker forwards msg to all clients connected to the topic
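The pub/sub pattern itself fits in a few lines. This in-memory toy is only an illustration of topics and fan-out, not the AWS IoT API; the class and topic names are invented:

```python
from collections import defaultdict

class TinyBroker:
    """Toy broker: forwards each message to every subscriber of a topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of clients that received the message

broker = TinyBroker()
inbox_a, inbox_b = [], []
broker.subscribe("home/thermostat", inbox_a.append)
broker.subscribe("home/thermostat", inbox_b.append)
broker.publish("home/thermostat", {"temp": 21})
```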
IoT Authentication
- 3 authentication methods
- X.509 certificates
- AWS SigV4
- Custom tokens with custom authorizers
- For mobile
- Cognito Identities
- Web / Desktop / CLI
- IAM
- Federated Identities
IoT Authorization
- AWS IoT Policies
- Attach to X.509 certificates or Cognito Identities
- Able to revoke any device at any time
- IoT Policies are JSON doc
- Can be attached to groups instead of individual Things
- IAM Policies
- Attached to users, group or roles
- Used for controlling IoT AWS APIs
IoT Device Shadow
- JSON doc representing the state of a connected Thing
- IoT Thing will retrieve the state when online and adapt
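A shadow document keeps `desired` and `reported` sections under `state`; the device adapts by applying whatever differs between them. A simplified sketch that handles flat (non-nested) state only, with an invented helper name:

```python
def shadow_delta(shadow):
    """Return the desired values the device has not reported yet."""
    state = shadow.get("state", {})
    desired = state.get("desired", {})
    reported = state.get("reported", {})
    return {key: value for key, value in desired.items()
            if reported.get(key) != value}

shadow = {
    "state": {
        "desired": {"light": "on", "color": "blue"},
        "reported": {"light": "off", "color": "blue"},
    }
}
print(shadow_delta(shadow))  # {'light': 'on'}
```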
IoT Rules Engine
- Rules are defined on the MQTT topics
- Rule = when it is triggered; action = what it does
- Use Cases
- Augment or filter data received from a device
- Write data received from a device to a DynamoDB database
- Save a file to S3
- Rules need IAM roles to perform their actions
IoT Greengrass
- IoT Greengrass brings the compute layer to the device directly
- We can execute AWS Lambda functions on the devices
- Operate offline
- Deploy functions from the cloud directly to the devices
Database Migration Service (DMS)
- Homogeneous and heterogeneous migrations
- Continuous data replication using Change Data Capture
- Requires an EC2 instance to perform the replication tasks
Database Migration Service Schema Conversion Tool
- Prefer compute-intensive instances to optimize data conversions
Direct Connect (DX)
- Provides a dedicated private connection from a remote network to your VPC
- Requires a Virtual Private Gateway to be set up on your VPC
- Use Cases
- Increase Bandwidth Throughput
- Consistent network experience
- Hybrid Env
- Support IPv4 and IPv6
- If DX is set up to one or more VPCs in different regions, use a Direct Connect Gateway
Direct Connect Connection Types
- Dedicated
- 1 Gbps, 10 Gbps or 100 Gbps
- physical ethernet port dedicated to a customer
- Hosted
- 50 Mbps, 500 Mbps, up to 10 Gbps
- Capacity can be added or removed on demand
- Lead times are often longer than 1 month to establish a new connection
Direct Connect Encryption
- In Transit : not encrypted, but private
- AWS Direct Connect + VPN provides an IPsec-encrypted private connection
Direct Connect Max Resiliency
- Multiple DX connections, terminating on separate devices, in each of more than one location
Snowcone
- Light
- Device for edge computing, storage and data transfer
- 8 TB
- Connect it to internet and use AWS DataSync to send data
Snowball Edge
- Storage Optimized
- 80TB
- Compute Optimized
- 42TB
- Provide block storage and S3-compatible object storage
Snowmobile
- 100PB
- High Security, Temperature Controlled, GPS, 24/7 video surveillance
- Better than Snowball if transferring more than 10PB
AWS OpsHub
- Manage your Snow Family Device
Amazon Managed Streaming for Apache Kafka (MSK)
- Fully managed
- Data Stored in EBS
- Message size : 1MB by default, configurable higher (e.g. 10MB)
- Choose number of AZ
- Choose the VPC and subnet
- Choose the broker instance type
- Choose the size of EBS volume
- Durability & Availability
- Ensure the replication factor (RF) is at least 2 for 2 AZ clusters and at least 3 for 3 AZ clusters
- Set minimum in-sync replicas (minISR) to at most RF-1
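The two durability rules above are easy to check mechanically. A small sketch with an invented helper that returns a list of problems for a given cluster setup:

```python
def durability_issues(num_azs, replication_factor, min_isr):
    """Check RF >= number of AZs and minISR <= RF - 1."""
    issues = []
    if replication_factor < num_azs:
        issues.append(
            f"RF {replication_factor} is below the AZ count {num_azs}")
    if min_isr > replication_factor - 1:
        issues.append(
            f"minISR {min_isr} exceeds RF-1 ({replication_factor - 1})")
    return issues

print(durability_issues(num_azs=3, replication_factor=3, min_isr=2))  # []
```

Keeping minISR at RF-1 or lower means one broker can be offline without blocking producers that wait for all in-sync replicas.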
MSK Security
- EIF : TLS
- EAR : KMS
- AuthN and AuthZ
- Mutual TLS + Kafka ACLs
- SASL / SCRAM + Kafka ACLs
- IAM
MSK Monitoring
- CloudWatch Metrics
- Basic monitoring, enhanced monitoring, topic level monitoring
- Prometheus
- Broker Log delivery
- To CloudWatch, S3, KDS
MSK Connect
- Managed Kafka Connect Workers
- Auto-scaling capabilities
- Deploy any Kafka Connect connectors to MSK as a plugin
KDS > SQS
- Ability for multiple applications to consume the same stream concurrently
- Ability to consume records in the same order a few hours later
KDF Sources
- KDF API
- KDS
- Other AWS Services
- Kinesis Agent
- AWS Lambda
KDF + Lambda Transformation
All transformed records from Lambda must be returned to Firehose with the following 3 parameters
- recordId
- result
- data
Enable source record backup and KDF will deliver the un-transformed incoming data to a separate S3 bucket
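A minimal sketch of a transformation Lambda that returns the three required parameters for every record. The upper-casing step is only a placeholder transformation, and UTF-8 text payloads are assumed:

```python
import base64

def handler(event, context):
    """Transform each Firehose record and hand it back with its recordId."""
    output = []
    for record in event["records"]:
        text = base64.b64decode(record["data"]).decode("utf-8")
        transformed = text.upper()  # placeholder for real transformation logic
        output.append({
            "recordId": record["recordId"],  # must echo the incoming id
            "result": "Ok",  # other values: "Dropped", "ProcessingFailed"
            "data": base64.b64encode(
                transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```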
Kinesis Video Streams
- Fully managed
- Service for media ingestion, storage and processing
- Use Cases
- Smart Home : Stream video and audio from camera-equipped home devices
- Smart City
- Industrial Automation
- Integrates with ML Framework
Kinesis Video Stream Concepts
- Video Stream
- resource that enables you to capture live video and other time-encoded data
- Fragment
- Self-contained sequence of media frames
- Chunk
- KVS stores videos in chunks