Managed Streaming for Apache Kafka Flashcards
What is the default message size in AWS Kafka?
1 MB
Can message size be increased in AWS Kafka?
Yes. You can configure Apache Kafka to send and receive larger messages, for example up to 10 MB.
Kinesis, by contrast, has a hard limit of 1 MB per message.
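Raising the limit usually means changing several settings together, because the broker, the topic, replication, the producer, and the consumer each have their own size setting. A minimal sketch, assuming a 10 MB target: the helper names `large_message_settings` and `to_properties` are hypothetical, while the property names are standard Apache Kafka configuration keys. On MSK the broker-level values would go into a cluster configuration rather than a file you edit on the broker.

```python
def large_message_settings(max_bytes: int = 10 * 1024 * 1024) -> dict:
    """Group the Kafka properties involved in message size, all set to max_bytes."""
    return {
        "broker": {
            # Largest record batch the broker accepts.
            "message.max.bytes": max_bytes,
            # Followers must be able to replicate a batch of that size.
            "replica.fetch.max.bytes": max_bytes,
        },
        # Per-topic override of the broker-wide limit.
        "topic": {"max.message.bytes": max_bytes},
        # Largest request the producer is willing to send.
        "producer": {"max.request.size": max_bytes},
        # Per-partition fetch size on the consumer side.
        "consumer": {"max.partition.fetch.bytes": max_bytes},
    }


def to_properties(section: dict) -> str:
    """Render one group of settings as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in section.items())


settings = large_message_settings()
print(to_properties(settings["broker"]))
```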
What is Kafka in AWS?
Alternative to Kinesis
* Fully managed Apache Kafka on AWS
* Allows you to create, update, delete clusters
* MSK creates & manages Kafka broker nodes & ZooKeeper nodes for you
* Deploy the MSK cluster in your VPC, multi AZ (up to 3 for HA)
* Automatic recovery from common Apache Kafka failures
* Data is stored on EBS volumes
* You can build producers and consumers of data
MSK Configuration?
To set up a cluster, first choose the number of Availability Zones (3 is recommended, 2 is also possible), then select the VPC and subnets. Next, choose the broker instance type (for example, kafka.m5.large) and the number of brokers per AZ; you can add more brokers over time. Each AZ gets one ZooKeeper node and one or more Kafka brokers. For example, three AZs with two brokers per AZ gives three ZooKeeper nodes and six Kafka brokers. Finally, choose the EBS volume size, which can range from 1 GB to 16 TB. Because you control the storage, you can retain data for as long as you need, which is more flexible than Kinesis Data Streams.
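The choices above map fairly directly onto the request you would send when creating a provisioned cluster. A minimal sketch that only builds the request dictionary: the function name `build_cluster_request`, the cluster name, the Kafka version string, and the subnet and security group IDs are placeholders, while the field names follow the MSK `CreateCluster` API as exposed by boto3. MSK instance types carry a `kafka.` prefix, so m5.large is written `kafka.m5.large`.

```python
def build_cluster_request(name, kafka_version, subnet_ids, security_group_ids,
                          brokers_per_az=1, instance_type="kafka.m5.large",
                          ebs_volume_gib=100):
    """Build the parameters for an MSK provisioned cluster (one subnet per AZ)."""
    if len(subnet_ids) not in (2, 3):
        raise ValueError("expected one subnet in each of 2 or 3 AZs")
    if not 1 <= ebs_volume_gib <= 16384:  # 1 GiB up to 16 TiB per broker
        raise ValueError("EBS volume size must be between 1 and 16384 GiB")
    return {
        "ClusterName": name,
        "KafkaVersion": kafka_version,
        # Total brokers = number of AZs x brokers per AZ.
        "NumberOfBrokerNodes": len(subnet_ids) * brokers_per_az,
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": list(subnet_ids),
            "SecurityGroups": list(security_group_ids),
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": ebs_volume_gib}},
        },
    }


# Placeholder identifiers, for illustration only.
request = build_cluster_request(
    name="demo-cluster",
    kafka_version="3.5.1",
    subnet_ids=["subnet-aaa", "subnet-bbb", "subnet-ccc"],
    security_group_ids=["sg-123"],
    brokers_per_az=2,
    ebs_volume_gib=1000,
)
print(request["NumberOfBrokerNodes"])  # three AZs x two brokers per AZ
```

With AWS credentials configured, the dictionary could then be passed as `boto3.client("kafka").create_cluster(**request)`.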
Exam Question
Kafka Security?
Security is crucial in Apache Kafka, and you may be asked about it on the exam.
- In-flight encryption between brokers can be achieved using TLS, which is enabled by default but can be disabled for performance improvements.
- In-flight encryption between clients and brokers also uses TLS. It is optional: enabled by default, but it can be disabled for performance reasons.
- At-rest encryption for EBS volumes can be achieved using KMS, and network security can be enforced by attaching security groups to Kafka clients.
- Authentication and authorization are critical aspects of Kafka security.
- There are three mechanisms available for authentication and authorization: MutualTLS, SASL/SCRAM, and IAM Access Control.
- MutualTLS uses TLS certificates for both encryption and authentication, and Kafka ACLs are used for authorization at the topic level.
- SASL/SCRAM uses username/password authentication, and Kafka ACLs are used for authorization.
- IAM Access Control allows for both authentication and authorization using IAM policies.
- Kafka ACLs for MutualTLS and SASL/SCRAM must be defined from within the Kafka cluster and cannot be managed using IAM policies.
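The pairing of authentication and authorization can be written down as a small lookup table, which is a handy way to memorize it. A sketch only: the class and function names are made up for illustration, and the SASL/SCRAM row follows the last bullet above, which says its ACLs live inside the Kafka cluster rather than in IAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthOption:
    authentication: str   # how the client proves its identity
    authorization: str    # what decides which topics it may use
    managed_in_iam: bool  # True if permissions are written as IAM policies


AUTH_OPTIONS = {
    "mutual_tls": AuthOption("TLS client certificates", "Kafka ACLs", False),
    "sasl_scram": AuthOption("username/password", "Kafka ACLs", False),
    "iam": AuthOption("IAM", "IAM policies", True),
}


def permissions_location(mechanism: str) -> str:
    """Say where access rules for a mechanism have to be maintained."""
    option = AUTH_OPTIONS[mechanism]
    if option.managed_in_iam:
        return "IAM policies"
    return "inside the Kafka cluster (Kafka ACLs)"


for name, option in AUTH_OPTIONS.items():
    print(f"{name}: authN={option.authentication}, authZ={option.authorization}")
```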
MSK - Monitoring?
CloudWatch Metrics
* Basic monitoring (cluster and broker metrics)
* Enhanced monitoring (++enhanced broker metrics)
* Topic level monitoring (++enhanced topic level metrics)
Prometheus (Open Source Monitoring)
* Opens a port on the broker to export cluster, broker and topic level metrics
* Set up the JMX Exporter (Kafka metrics) or the Node Exporter (CPU and disk metrics)
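Once open monitoring is enabled, a Prometheus server needs a list of broker endpoints to scrape. A minimal sketch that writes such a list in Prometheus file-based service discovery format, assuming the ports MSK documents for open monitoring (11001 for the JMX Exporter, 11002 for the Node Exporter). The broker hostnames and the function name `scrape_targets` are placeholders.

```python
import json

JMX_EXPORTER_PORT = 11001   # Kafka (JMX) metrics
NODE_EXPORTER_PORT = 11002  # host-level CPU and disk metrics


def scrape_targets(broker_hosts):
    """Return one target group per exporter, covering every broker."""
    return [
        {
            "labels": {"job": "jmx"},
            "targets": [f"{host}:{JMX_EXPORTER_PORT}" for host in broker_hosts],
        },
        {
            "labels": {"job": "node"},
            "targets": [f"{host}:{NODE_EXPORTER_PORT}" for host in broker_hosts],
        },
    ]


# Placeholder broker DNS names, for illustration only.
targets = scrape_targets(["b-1.example.internal", "b-2.example.internal"])
print(json.dumps(targets, indent=2))
```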
Broker Log Delivery
* Delivery to CloudWatch Logs
* Delivery to Amazon S3
* Delivery to Amazon Data Firehose (formerly Kinesis Data Firehose)
What is the default protocol for in-flight encryption between Kafka brokers?
TLS
Can TLS encryption between Kafka clients and brokers be disabled?
Yes
What is the mechanism for encryption at rest for EBS volumes in Kafka?
KMS
How can network security be enforced for Kafka clients?
By attaching security groups
What are the critical aspects of Kafka security?
Authentication and authorization
What are the three available mechanisms for authentication and authorization in Kafka?
MutualTLS, SASL/SCRAM, IAM Access Control
What is MutualTLS?
TLS certificates used for encryption and authentication
What is used for authorization in MutualTLS?
Kafka ACLs at the topic level
What is SASL/SCRAM?
Username/password authentication mechanism
What can be used for authorization in SASL/SCRAM?
Kafka ACLs
What does IAM Access Control allow for in Kafka security?
Both authentication and authorization using IAM policies
Can Kafka ACLs for MutualTLS and SASL/SCRAM be managed using IAM policies?
No, they must be defined from within the Kafka cluster
What is MSK Connect?
Amazon Managed Streaming for Apache Kafka (MSK) Connect is a fully managed service that makes it easy to set up and run Kafka Connect data import and export jobs. Kafka Connect is a framework for connecting Kafka with external systems, allowing data to be imported and exported from Kafka topics to external systems, such as Amazon S3, Elasticsearch, and RDBMS. MSK Connect eliminates the need for customers to manage and maintain their own Kafka Connect clusters, allowing them to focus on building and running data streaming applications.
- Managed Kafka Connect workers on AWS
- Auto scaling capabilities for workers
- You can deploy any Kafka Connect connectors to MSK Connect as a plugin
- Examples: Amazon S3, Amazon Redshift, Amazon OpenSearch, Debezium, etc.
- Example pricing: Pay $0.11 per worker per hour
MSK Serverless
- Run Apache Kafka on MSK without managing the capacity
- MSK automatically provisions resources and scales compute & storage
- You just define your topics and your partitions and you’re good to go!
- Security: IAM Access Control for all clusters
- Example Pricing:
* $0.75 per cluster per hour = $558 monthly per cluster
* $0.0015 per partition per hour = $1.08 monthly per partition
* $0.10 per GB of storage each month
* $0.10 per GB in
* $0.05 per GB out
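The monthly figures above are simply the hourly rate multiplied by the hours in a month. Note that $558 corresponds to a 744-hour (31-day) month, while $1.08 corresponds to a 720-hour (30-day) month. A small sketch of the arithmetic, using the example rates from this card rather than current AWS pricing; the function name and the sample workload are made up for illustration.

```python
# Example rates copied from the card above (not authoritative pricing).
CLUSTER_PER_HOUR = 0.75
PARTITION_PER_HOUR = 0.0015
STORAGE_PER_GB_MONTH = 0.10
DATA_IN_PER_GB = 0.10
DATA_OUT_PER_GB = 0.05


def monthly_serverless_cost(partitions, storage_gb, gb_in, gb_out, hours=720):
    """Estimate one month of MSK Serverless cost for a single cluster."""
    return (
        CLUSTER_PER_HOUR * hours
        + PARTITION_PER_HOUR * hours * partitions
        + STORAGE_PER_GB_MONTH * storage_gb
        + DATA_IN_PER_GB * gb_in
        + DATA_OUT_PER_GB * gb_out
    )


# Cluster-only cost over a 31-day month (744 hours).
print(round(CLUSTER_PER_HOUR * 744, 2))
# One partition over a 30-day month (720 hours).
print(round(PARTITION_PER_HOUR * 720, 2))
# Hypothetical workload: 10 partitions, 100 GB stored, 500 GB in, 1000 GB out.
print(round(monthly_serverless_cost(10, 100, 500, 1000), 2))
```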
Difference between Kinesis and MSK?
- Kinesis Data Streams has a hard limit of 1 MB per message, while Amazon MSK has a default limit of 1 MB that can be configured higher (for example, 10 MB).
- For exam questions involving large messages, choose Amazon MSK rather than Kinesis Data Streams or Firehose.
- Kinesis Data Streams scales with shards, which can be split and merged, while Amazon MSK scales with partitions, which can only be added to a topic and not removed.
- Both Kinesis Data Streams and Amazon MSK offer KMS at-rest encryption, but Kinesis Data Streams has TLS in-flight encryption enabled by default, while Amazon MSK offers the option for plain text or TLS in-flight encryption.
- For security, Kinesis Data Streams uses IAM policies for both authentication and authorization. Amazon MSK offers three options: mutual TLS with Kafka ACLs, SASL/SCRAM with Kafka ACLs, or IAM Access Control for both authentication and authorization.