Managed Streaming for Apache Kafka Flashcards

1
Q

What is the default maximum message size in Amazon MSK (managed Kafka on AWS)?

A

1 MB

2
Q

Can message size be increased in AWS Kafka?

A

Yes. You can configure Apache Kafka to send and receive larger messages, for example up to 10 MB.

By contrast, Kinesis has a hard limit of 1 MB per message.
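
As a rough sketch of what raising the limit involves, the snippet below groups the standard Apache Kafka size settings that usually have to be raised together on the broker, producer, and consumer. The property names are regular Kafka settings; the helper name `large_message_settings` and the 10 MB default are illustrative assumptions, and you should check that your MSK cluster configuration accepts each broker property.

```python
TEN_MB = 10 * 1024 * 1024  # 10,485,760 bytes


def large_message_settings(max_bytes=TEN_MB):
    """Hypothetical helper: group Kafka size limits by where they are set."""
    return {
        # Broker side: applied through the cluster configuration.
        "broker": {
            "message.max.bytes": max_bytes,
            "replica.fetch.max.bytes": max_bytes,
        },
        # Producer side: largest request the client will send.
        "producer": {"max.request.size": max_bytes},
        # Consumer side: largest chunk fetched per partition.
        "consumer": {"max.partition.fetch.bytes": max_bytes},
    }


if __name__ == "__main__":
    for role, props in large_message_settings().items():
        for key, value in props.items():
            print(f"{role}: {key}={value}")
```

Raising only the broker limit is usually not enough: a producer or consumer that is still on its default size limit can reject or fail to fetch the larger messages.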

3
Q

What is Kafka in AWS?

A

Amazon MSK is an alternative to Kinesis
* Fully managed Apache Kafka on AWS
* Allows you to create, update, and delete clusters
* MSK creates & manages Kafka broker nodes & ZooKeeper nodes for you
* Deploy the MSK cluster in your VPC, multi AZ (up to 3 for HA)
* Automatic recovery from common Apache Kafka failures
* Data is stored on EBS volumes
* You can build producers and consumers of data

4
Q

MSK Configuration?

A

To set up a private cluster, first choose the number of Availability Zones; the recommended choices are two or three. Next, select the VPC and subnets. Then choose the broker instance type (for example, m5.large) and the number of brokers per AZ; you can add more brokers over time. The result is one ZooKeeper node per AZ and one or more Kafka brokers per AZ. For example, with three AZs and two brokers per AZ, you have three ZooKeeper nodes and six Kafka brokers. Finally, choose the EBS volume size, which can range from 1 GB to 16 TB. Because you size the storage yourself, you can retain data for as long as you need, which gives more flexibility than Kinesis Data Streams.
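
The choices above can be sketched as a request payload. This is a minimal illustration, not a definitive setup: the field names follow the MSK CreateCluster API as exposed by boto3's `kafka` client, while the helper name `build_cluster_request`, the subnet IDs, the Kafka version string, and the default volume size are assumptions made for the example. MSK instance types carry a `kafka.` prefix, so m5.large is written `kafka.m5.large`.

```python
def build_cluster_request(cluster_name, client_subnets, brokers_per_az=2,
                          instance_type="kafka.m5.large", volume_size_gib=100,
                          kafka_version="3.5.1"):  # version is an assumption
    """Hypothetical helper: one subnet per AZ, N brokers per AZ."""
    if len(client_subnets) not in (2, 3):
        raise ValueError("expected one subnet for each of 2 or 3 AZs")
    if not 1 <= volume_size_gib <= 16384:
        raise ValueError("EBS volume size must be between 1 GiB and 16 TiB")
    return {
        "ClusterName": cluster_name,
        "KafkaVersion": kafka_version,
        # Total broker count must be a multiple of the number of AZs.
        "NumberOfBrokerNodes": len(client_subnets) * brokers_per_az,
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": list(client_subnets),
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gib}},
        },
    }


if __name__ == "__main__":
    # Placeholder subnet IDs, one per Availability Zone.
    request = build_cluster_request("demo", ["subnet-a", "subnet-b", "subnet-c"])
    print(request["NumberOfBrokerNodes"])  # 3 AZs x 2 brokers per AZ
```

With boto3 installed and credentials configured, a payload like this could be passed as `boto3.client("kafka").create_cluster(**request)`.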

5
Q

Exam Question

Kafka Security?

A

Security is crucial in Apache Kafka, and you may be asked about it on the exam.
- In-flight encryption between brokers can be achieved using TLS, which is enabled by default but can be disabled for performance improvements.
- Optional TLS encryption can also be used for in-flight encryption between clients and brokers, which is also enabled by default but can be disabled for performance reasons.
- At-rest encryption for EBS volumes can be achieved using KMS, and network security can be enforced by attaching security groups to Kafka clients.
- Authentication and authorization are critical aspects of Kafka security.
- There are three mechanisms available for authentication and authorization: MutualTLS, SASL/SCRAM, and IAM Access Control.
- MutualTLS uses TLS certificates for both encryption and authentication, and Kafka ACLs are used for authorization at the topic level.
- SASL/SCRAM uses username/password authentication, and Kafka ACLs are used for authorization.
- IAM Access Control allows for both authentication and authorization using IAM policies.
- Kafka ACLs for MutualTLS and SASL/SCRAM must be defined from within the Kafka cluster and cannot be managed using IAM policies.
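
To make the three mechanisms more concrete, here is a hedged sketch of the Kafka client properties each one typically needs. The property names and the IAM class names come from the Kafka client and the `aws-msk-iam-auth` library; the helper name `client_properties`, the file paths, and the credentials are placeholders, and the listener ports (9094, 9096, 9098) are the usual in-VPC MSK defaults, which you should confirm for your cluster.

```python
# Usual in-VPC listener ports on MSK brokers (verify for your cluster).
BROKER_PORTS = {"mtls": 9094, "scram": 9096, "iam": 9098}


def client_properties(mechanism):
    """Hypothetical helper: Kafka client properties per auth mechanism."""
    if mechanism == "mtls":
        # The client certificate in the keystore authenticates the client.
        return {
            "security.protocol": "SSL",
            "ssl.keystore.location": "/path/to/client.keystore.jks",      # placeholder
            "ssl.truststore.location": "/path/to/client.truststore.jks",  # placeholder
        }
    if mechanism == "scram":
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.jaas.config": (
                "org.apache.kafka.common.security.scram.ScramLoginModule "
                'required username="USER" password="PASSWORD";'  # placeholders
            ),
        }
    if mechanism == "iam":
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        }
    raise ValueError(f"unknown mechanism: {mechanism}")


if __name__ == "__main__":
    for name in BROKER_PORTS:
        print(name, BROKER_PORTS[name], client_properties(name)["security.protocol"])
```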

6
Q

MSK - Monitoring?

A

CloudWatch Metrics
* Basic monitoring (cluster and broker metrics)
* Enhanced monitoring (++enhanced broker metrics)
* Topic level monitoring (++enhanced topic level metrics)

Prometheus (Open Source Monitoring)
* Opens a port on the broker to export cluster, broker and topic level metrics
* Set up the JMX Exporter (metrics) or Node Exporter (CPU and disk metrics)

Broker Log Delivery
* Delivery to CloudWatch Logs
* Delivery to Amazon S3
* Delivery to Kinesis Data Streams
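
For the Prometheus option, a small sketch of how scrape targets could be derived from the broker hostnames. It assumes the ports MSK open monitoring normally uses (11001 for the JMX Exporter, 11002 for the Node Exporter); the helper name `scrape_targets` and the hostnames are made up for illustration.

```python
JMX_EXPORTER_PORT = 11001   # Kafka metrics exposed via the JMX Exporter
NODE_EXPORTER_PORT = 11002  # CPU and disk metrics via the Node Exporter


def scrape_targets(broker_hosts):
    """Hypothetical helper: build host:port targets for a Prometheus config."""
    return {
        "jmx": [f"{host}:{JMX_EXPORTER_PORT}" for host in broker_hosts],
        "node": [f"{host}:{NODE_EXPORTER_PORT}" for host in broker_hosts],
    }


if __name__ == "__main__":
    # Placeholder broker DNS names.
    targets = scrape_targets(["b-1.example.internal", "b-2.example.internal"])
    for job, hosts in targets.items():
        print(job, hosts)
```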

7
Q

What is the default protocol for in-flight encryption between Kafka brokers?

A

TLS

8
Q

Can TLS encryption between Kafka clients and brokers be disabled?

A

Yes

9
Q

What is the mechanism for encryption at rest for EBS volumes in Kafka?

A

KMS

10
Q

How can network security be enforced for Kafka clients?

A

By attaching security groups

11
Q

What are the critical aspects of Kafka security?

A

Authentication and authorization

12
Q

What are the three available mechanisms for authentication and authorization in Kafka?

A

MutualTLS, SASL/SCRAM, IAM Access Control

13
Q

What is MutualTLS?

A

TLS certificates used for encryption and authentication

14
Q

What is used for authorization in MutualTLS?

A

Kafka ACLs at the topic level

15
Q

What is SASL/SCRAM?

A

Username/password authentication mechanism

16
Q

What can be used for authorization in SASL/SCRAM?

A

Kafka ACLs (IAM Access Control is a separate mechanism that handles both authentication and authorization)

17
Q

What does IAM Access Control allow for in Kafka security?

A

Both authentication and authorization using IAM policies
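
As an illustration of authorization through IAM policies, the sketch below builds a policy document that lets a client connect to a cluster and write to one topic. The `kafka-cluster:` action names are the ones MSK IAM access control uses; the helper name `topic_producer_policy` and the ARN strings are placeholders, not real resources.

```python
import json


def topic_producer_policy(cluster_arn, topic_arn):
    """Hypothetical helper: IAM policy allowing a producer to write to a topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kafka-cluster:Connect"],
                "Resource": cluster_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["kafka-cluster:DescribeTopic", "kafka-cluster:WriteData"],
                "Resource": topic_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARNs; substitute the values for your own cluster and topic.
    policy = topic_producer_policy("CLUSTER_ARN_PLACEHOLDER", "TOPIC_ARN_PLACEHOLDER")
    print(json.dumps(policy, indent=2))
```

A consumer policy would follow the same shape, with `kafka-cluster:ReadData` in place of `kafka-cluster:WriteData` plus permissions on the consumer group.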

18
Q

Can Kafka ACLs for MutualTLS and SASL/SCRAM be managed using IAM policies?

A

No, they must be defined from within the Kafka cluster

19
Q

What is MSK Connect?

A

Amazon Managed Streaming for Apache Kafka (MSK) Connect is a fully managed service that makes it easy to set up and run Kafka Connect data import and export jobs. Kafka Connect is a framework for connecting Kafka with external systems, allowing data to be imported and exported from Kafka topics to external systems, such as Amazon S3, Elasticsearch, and RDBMS. MSK Connect eliminates the need for customers to manage and maintain their own Kafka Connect clusters, allowing them to focus on building and running data streaming applications.

  • Managed Kafka Connect workers on AWS
  • Auto scaling capabilities for workers
  • You can deploy any Kafka Connect connectors to MSK Connect as a plugin
    • Amazon S3, Amazon Redshift, Amazon OpenSearch, Debezium, etc…
  • Example pricing: Pay $0.11 per worker per hour
20
Q

MSK Serverless

A
  • Run Apache Kafka on MSK without managing the capacity
  • MSK automatically provisions resources and scales compute & storage
  • You just define your topics and your partitions and you’re good to go!
  • Security: IAM Access Control for all clusters
  • Example Pricing:
    * $0.75 per cluster per hour = $558 monthly per cluster
    * $0.0015 per partition per hour = $1.08 monthly per partition
    * $0.10 per GB of storage each month
    * $0.10 per GB in
    * $0.05 per GB out
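
The monthly figures above are simple hourly-rate arithmetic, which the sketch below reproduces. Note that $558 corresponds to a 744-hour (31-day) month while $1.08 corresponds to a 720-hour (30-day) month. The function names and the workload numbers in the usage example are assumptions, and the rates are the example prices from the card, not current AWS pricing.

```python
def hourly_to_monthly(rate_per_hour, hours_in_month):
    """Convert an hourly rate to a monthly cost for a given month length."""
    return rate_per_hour * hours_in_month


def serverless_estimate(partitions, storage_gb, gb_in, gb_out, hours_in_month=720):
    """Hypothetical estimate for one cluster using the card's example rates."""
    return (
        hourly_to_monthly(0.75, hours_in_month)                   # cluster
        + hourly_to_monthly(0.0015, hours_in_month) * partitions  # partitions
        + 0.10 * storage_gb                                       # storage
        + 0.10 * gb_in                                            # data in
        + 0.05 * gb_out                                           # data out
    )


if __name__ == "__main__":
    print(round(hourly_to_monthly(0.75, 744), 2))    # cluster, 31-day month
    print(round(hourly_to_monthly(0.0015, 720), 2))  # partition, 30-day month
    # Made-up workload: 10 partitions, 100 GB stored, 50 GB in, 50 GB out.
    print(round(serverless_estimate(10, 100, 50, 50), 2))
```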
21
Q

Difference between Kinesis and MSK?

A
  • Kinesis Data Streams has a limit of 1MB per message size, while Amazon MSK has a default limit of 1MB but can be configured up to 10MB.
  • If an exam question involves large messages, the answer is Amazon MSK rather than Kinesis Data Streams or Firehose.
  • Kinesis Data Streams uses shards for scaling, while Amazon MSK uses partitions, which can only be added and not removed.
  • Both Kinesis Data Streams and Amazon MSK offer KMS at-rest encryption, but Kinesis Data Streams has TLS in-flight encryption enabled by default, while Amazon MSK offers the option for plain text or TLS in-flight encryption.
  • For security, Kinesis Data Streams uses IAM policies for authentication and authorization, while Amazon MSK offers mutual TLS with Kafka ACLs or SASL/SCRAM with Kafka ACLs for authentication and authorization, or IAM access control for both within MSK.