Kafka Flashcards

1
Q

What is Apache Kafka?

A

Apache Kafka is an open-source distributed event streaming platform designed for handling real-time data feeds and stream processing.

2
Q

What are the key components of Kafka?

A

Kafka consists of Producers, Consumers, Topics, Brokers, and ZooKeeper for coordination and configuration.

3
Q

What is the role of Producers in Kafka?

A

Producers are client applications responsible for publishing data (messages) to Kafka topics hosted on brokers.

4
Q

What do Consumers do in Kafka?

A

Consumers subscribe to topics and process the published messages from Kafka brokers.

5
Q

Define Kafka Topics.

A

Topics are categories or feeds to which messages are published by Producers and from which Consumers consume data.

6
Q

Explain Kafka Brokers

A

Brokers are Kafka servers responsible for storing and managing published data across distributed clusters.

7
Q

What role does ZooKeeper play in Kafka?

A

ZooKeeper is used for configuration management, synchronization, and coordination tasks within the Kafka cluster.

8
Q

What are the main use cases for Kafka?

A

Kafka is used for real-time data pipelines, streaming analytics, log aggregation, messaging systems, and more.

9
Q

What are the benefits of using Kafka?

A

Kafka offers high throughput, fault tolerance, horizontal scalability, and real-time stream processing capabilities.

10
Q

Which programming languages can interact with Kafka?

A

Kafka provides APIs for interaction in Java, Python, Scala, and other languages, enabling integration across various systems and applications.

11
Q

What sets Apache Kafka apart from other systems?

A

Kafka stands out due to its distributed streaming platform, high fault tolerance, scalability, and support for both streaming and batch processing.

12
Q

What is Kafka’s approach to fault tolerance and data persistence?

A

Kafka ensures fault tolerance and durability by persisting data to disk and replicating it across multiple brokers in the cluster.

13
Q

How does Kafka handle scalability?

A

Kafka scales horizontally: more brokers can be added to the cluster to accommodate increased data throughput and storage needs.

14
Q

Describe Kafka’s messaging model.

A

Kafka follows a publish-subscribe (pub/sub) messaging model, enabling independent interaction between data producers and consumers.

15
Q

What programming languages does Kafka support?

A

Kafka provides APIs for various languages such as Java, Python, and Scala, allowing integration with diverse systems.

16
Q

Name a few components of Kafka’s ecosystem.

A

The Kafka ecosystem includes Kafka Streams for stream processing, Kafka Connect for data integration, and Schema Registry for schema management.

17
Q

What are some common use cases for Apache Kafka?

A

Kafka is used for log aggregation, event sourcing, real-time analytics, IoT telemetry, and more.

18
Q

How does Kafka handle high throughput and low latency?

A

Kafka’s architecture allows it to handle millions of messages per second with low latency, making it suitable for demanding real-time scenarios.

19
Q

What support does Kafka have from its community?

A

As an open-source platform, Kafka benefits from a strong community providing extensive documentation, support, and continuous development.

20
Q

What makes Kafka versatile for diverse applications?

A

Its flexibility, high performance, and support for various data processing paradigms enable Kafka to cater to a wide range of application scenarios.

21
Q

Apache Pulsar

A

An open-source distributed messaging system that offers features like multi-tenancy, geo-replication, and efficient scaling for event streaming and messaging.

22
Q

Amazon Kinesis

A

A managed streaming service provided by AWS for real-time data ingestion, processing, and analysis at scale, offering capabilities like Kinesis Data Streams, Data Firehose, and Data Analytics.

23
Q

Google Cloud Pub/Sub

A

A fully-managed real-time messaging service on Google Cloud Platform that supports event streaming, messaging, and integration across applications and systems.

24
Q

Azure Event Hubs

A

A cloud-based event streaming platform within Microsoft Azure, capable of handling massive volumes of events and telemetry data in real time.

25
Q

RabbitMQ

A

An open-source message broker that supports multiple messaging protocols and provides features for reliable message queuing, routing, and scalability.

26
Q

Redis Streams

A

Redis, an in-memory data structure store, introduced streaming capabilities with Redis Streams, enabling message processing, event sourcing, and real-time analytics.

27
Q

NATS

A

A high-performance, open-source messaging system known for its simplicity and efficiency in message routing, publish-subscribe, and queueing scenarios.

28
Q

IBM Streams

A

A platform for developing and deploying streaming applications on IBM Cloud that enables continuous data ingestion, processing, and analytics.

29
Q

Apache Flink

A

Although primarily a stream processing framework, Flink offers capabilities for distributed event streaming and data pipelining, supporting high-throughput and low-latency processing.

30
Q

Confluent Platform

A

Built on Apache Kafka, Confluent offers a platform with additional tools and components for managing Kafka clusters, stream processing, and enterprise-level features.

31
Q

What is Kafka known for in terms of scalability?

A

Kafka is highly scalable, capable of handling massive throughput and storage requirements by allowing easy addition of brokers to the cluster.

32
Q

How does Kafka ensure fault tolerance?

A

Kafka maintains fault tolerance through data replication and partitioning, ensuring high availability and durability even in the event of node failures.

33
Q

Name some components within Kafka’s ecosystem.

A

Kafka’s ecosystem includes tools like Kafka Connect, Kafka Streams, and Schema Registry, offering a comprehensive suite for stream processing and data integration.

34
Q

What advantage does Kafka gain from its community?

A

With an active open-source community, Kafka receives extensive documentation, support, and continuous development.

35
Q

Why is Kafka praised for its performance?

A

Kafka exhibits high throughput and low latency, making it suitable for demanding real-time data processing and event streaming scenarios.

36
Q

How does Kafka support flexibility and integration?

A

Kafka provides APIs for various languages, facilitating integration with diverse systems and applications.

37
Q

In what use cases is Kafka versatile?

A

Kafka caters to diverse use cases such as log aggregation, event sourcing, real-time analytics, and IoT telemetry due to its flexibility and support for both streaming and batch processing.

38
Q

What are the fundamental components of Kafka’s architecture?

A

Kafka’s architecture comprises Producers, Topics, Partitions, Brokers, and Consumers.

39
Q

Define a Kafka Topic.

A

Topics in Kafka are categories or feeds to which messages are published by Producers and from which Consumers consume data.

40
Q

What is a Partition in Kafka?

A

Partitions are the segments within Kafka Topics, allowing parallelism and data distribution across multiple brokers.
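The way a record's key determines its partition can be sketched in a few lines. This toy version uses CRC32 purely for illustration; the Java client's default partitioner actually uses murmur2 hashing, and keyless records are distributed round-robin (or "sticky" in newer clients).

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Simplified key-to-partition mapping (real clients use murmur2)."""
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition,
# which is what preserves per-key ordering.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2
assert 0 <= p1 < 6
```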

41
Q

Explain the role of Brokers in Kafka.

A

Brokers are individual Kafka servers responsible for storing and managing the partitions and handling Producer and Consumer requests.

42
Q

How does Kafka ensure message durability and replication?

A

Kafka maintains message durability by replicating partitions across multiple Brokers, ensuring fault tolerance.
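Replication can be modeled with a toy simulation: the same partition log lives on several brokers, so losing one broker loses no data. This is only an illustration of the idea; the real protocol involves a leader, in-sync replica (ISR) tracking, and leader epochs, all omitted here.

```python
class Broker:
    def __init__(self, broker_id: int):
        self.broker_id = broker_id
        self.log = []          # this broker's replica of the partition
        self.alive = True

def append(replicas, record):
    """Append a record to every live replica (leader plus followers)."""
    for b in replicas:
        if b.alive:
            b.log.append(record)

brokers = [Broker(i) for i in range(3)]   # replication factor 3
append(brokers, "event-1")
append(brokers, "event-2")

brokers[0].alive = False                  # the leader crashes
survivors = [b for b in brokers if b.alive]
# A surviving follower still holds the full log and can be promoted.
assert survivors[0].log == ["event-1", "event-2"]
```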

43
Q

What is the responsibility of the ZooKeeper in Kafka?

A

ZooKeeper is used for coordination, configuration management, and leadership election tasks within the Kafka cluster.

44
Q

How does Kafka achieve high throughput and low latency?

A

Kafka’s efficient design, log-structured storage, and partitioning allow it to handle high volumes of data with low latency.

45
Q

What mechanism ensures message ordering within a partition?

A

Within a partition, Kafka guarantees message ordering through sequential writes in the commit log.
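A partition behaves like an append-only commit log: each write receives the next sequential offset, and reads return records in write order. A minimal sketch of that model:

```python
class PartitionLog:
    """Toy append-only commit log: offsets are assigned sequentially."""
    def __init__(self):
        self.records = []

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1   # offset of the new record

    def read_from(self, offset: int):
        return self.records[offset:]

log = PartitionLog()
assert log.append("a") == 0
assert log.append("b") == 1
assert log.append("c") == 2
# Reads from any offset return records in the order they were written.
assert log.read_from(1) == ["b", "c"]
```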

46
Q

What is a stream?

A

A stream is an unbounded, continuous flow of real-time records (events or facts).

47
Q

What is a Kafka Group ID?

A

A Kafka Group ID is a string identifier used to label consumers that belong to the same consumer group in Apache Kafka.

48
Q

How is a Kafka Group ID utilized in Kafka?

A

It allows multiple consumers to form a consumer group; the group's members divide the partitions of the subscribed topics among themselves, ensuring that each message is processed by only one consumer within the group.

49
Q

Why is Kafka Group ID important?

A

It helps in load balancing and parallel processing by allowing multiple consumers to work together within a consumer group, ensuring that each message is consumed by only one consumer within the group.

50
Q

Can consumers from different groups consume the same topic?

A

Yes, consumers from different group IDs can independently consume the same topic, allowing for parallel processing of messages.

51
Q

How is Kafka Group ID different from Kafka Consumer ID?

A

Kafka Group ID is used to label a set of consumers forming a consumer group, while Kafka Consumer ID uniquely identifies individual consumers within that group.
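How a group shares work can be sketched as a partition-assignment function: every partition goes to exactly one consumer in the group. This is a simplified round-robin assignment; in real Kafka the group coordinator runs a configurable assignor (range, round-robin, or sticky) and rebalances when membership changes.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment of partitions to the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

parts = list(range(6))
a = assign_partitions(parts, ["c1", "c2", "c3"])
# Every partition is owned by exactly one consumer in the group,
# so each message is processed once within the group.
owned = sorted(p for ps in a.values() for p in ps)
assert owned == parts
assert a["c1"] == [0, 3]
```

A second group running the same function gets its own independent copy of all six partitions, which is why different groups can each consume the full topic.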

52
Q

What is Zookeeper used for in a distributed system?

A

ZooKeeper stores small amounts of coordination data in a strongly consistent form and manages configurations, leader elections, and distributed locks in a distributed system.

53
Q

What are the two types of nodes in Zookeeper?

A

Zookeeper nodes can be either Ephemeral or Persistent. Ephemeral nodes are used for tracking machine status and master election, while Persistent nodes are used for storing configuration variables.

54
Q

How does Zookeeper handle master election in a distributed system?

A

In Zookeeper, machines in a cluster compete to write their IP addresses to an ephemeral node, and only one succeeds, becoming the master. Clients can read this node to determine the current master.
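The election race can be modeled in a few lines: whichever machine creates the ephemeral node first becomes master, and when its session expires the node vanishes and the race reopens. This toy class only illustrates the first-writer-wins semantics, not ZooKeeper's actual API or consensus protocol.

```python
class ToyZooKeeper:
    """Toy model of master election via an ephemeral node:
    the first machine to create the node wins."""
    def __init__(self):
        self.master_node = None   # contents of the ephemeral node

    def try_become_master(self, ip: str) -> bool:
        if self.master_node is None:
            self.master_node = ip   # node created; this machine is master
            return True
        return False                # node already exists; lost the race

    def session_expired(self):
        self.master_node = None     # ephemeral node vanishes with the session

zk = ToyZooKeeper()
assert zk.try_become_master("10.0.0.1") is True
assert zk.try_become_master("10.0.0.2") is False   # lost the race
zk.session_expired()                               # old master died
assert zk.try_become_master("10.0.0.2") is True    # re-election succeeds
```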

55
Q

What is the purpose of setting a watch in Zookeeper?

A

Setting a watch allows clients to subscribe to updates on a specific Zookeeper node. When the data on that node changes, clients are notified, reducing the need for constant queries.
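The watch mechanism is essentially a one-shot observer pattern: register a callback, get notified on the next change, then re-register if you want further updates. A minimal sketch of that behavior (ZooKeeper watches really are one-shot; the real API and event types differ):

```python
class WatchedNode:
    """Toy ZooKeeper-style node: watchers are notified once on change,
    instead of having to poll the node repeatedly."""
    def __init__(self, data=None):
        self.data = data
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def set_data(self, data):
        self.data = data
        pending, self.watchers = self.watchers, []  # watches fire one-shot
        for cb in pending:
            cb(data)

seen = []
node = WatchedNode("old-master")
node.watch(seen.append)
node.set_data("new-master")
assert seen == ["new-master"]
node.set_data("another")        # watch already fired; must re-register
assert seen == ["new-master"]
```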

56
Q

Why is Zookeeper designed to run on a cluster of machines?

A

Zookeeper runs on a cluster of machines to avoid becoming a single point of failure. It uses leader election among its machines to ensure resilience.

57
Q

What is Kafka used for in distributed systems?

A

Kafka is a persistent queue system used for managing and processing high volumes of data in a publish-subscribe model.

58
Q

What is the significance of Kafka topics?

A

Kafka topics categorize events within a queue, allowing consumers to subscribe only to specific types of events, making data processing more efficient.

59
Q

How does Kafka handle event retention?

A

Kafka retains events for a specified period, cleaning up older events. Events are durable, ensuring they are not lost.

60
Q

What is the purpose of partitioning in Kafka?

A

Partitioning in Kafka allows for parallel processing and load distribution. It also helps maintain ordering within partitions.

61
Q

How does Kafka ensure consumers receive new events in a topic?

A

Kafka consumers track offsets, allowing them to fetch only new events in a topic and not reprocess previously consumed events.

62
Q

What are Kafka offsets?

A

Kafka offsets are numerical values that represent the position of a consumer within a specific partition of a Kafka topic.

63
Q

What does an offset indicate in Kafka?

A

An offset indicates the position of the last successfully consumed message within a partition, signifying the point up to which a consumer has read messages.

64
Q

How are offsets assigned to messages in Kafka partitions?

A

Each message in a Kafka partition is assigned a unique offset, which is a sequential integer starting from 0 for the earliest message and incrementing for each subsequent message.

65
Q

What is the purpose of committing offsets in Kafka?

A

Committing offsets is essential for fault tolerance and allows consumers to indicate the last successfully processed message. It helps consumers resume from the last committed offset in case of failures.

66
Q

What are the two types of offset commits in Kafka?

A

Kafka supports two types of offset commits: synchronous and asynchronous. Synchronous commits block until the commit is acknowledged, while asynchronous commits do not.

67
Q

What is the significance of Kafka’s “at-least-once” message delivery semantics in relation to offsets?

A

Kafka’s “at-least-once” semantics guarantee that no message is lost: if a consumer fails, it resumes from the last committed offset. Messages that were processed after that offset but not yet committed may be delivered again, so consumers should be prepared to handle occasional duplicates.
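The at-least-once behavior can be simulated directly: commit after each record, crash between processing and committing, and observe that the restart reprocesses the uncommitted record rather than losing it. This is a toy model of the consume/commit loop, not a real client.

```python
def consume(log, committed_offset, crash_before_commit_at=None):
    """Process records starting at the last committed offset,
    committing after each one. A record processed but not yet
    committed is re-delivered on restart (at-least-once)."""
    processed = []
    for offset in range(committed_offset, len(log)):
        processed.append(log[offset])
        if offset == crash_before_commit_at:
            return processed, committed_offset   # crashed: commit was lost
        committed_offset = offset + 1
    return processed, committed_offset

log = ["a", "b", "c"]
# First run crashes after processing offset 1 but before committing it.
done, committed = consume(log, 0, crash_before_commit_at=1)
assert done == ["a", "b"] and committed == 1
# The restart resumes from the committed offset: "b" is processed again,
# but nothing is lost.
done2, committed2 = consume(log, committed)
assert done2 == ["b", "c"] and committed2 == 3
```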

68
Q

Who is responsible for managing offsets in Kafka?

A

Consumers are responsible for managing offsets in Kafka. They need to store and update offsets for the partitions they are consuming from.

69
Q

In what scenarios might a consumer need to reset its offset in Kafka?

A

A consumer may need to reset its offset if it wants to reprocess messages from an earlier point within a partition, such as reprocessing historical data.

70
Q

How does Kafka handle offset management when multiple consumers are involved?

A

Kafka ensures that each consumer maintains its own offsets for the partitions it consumes from, allowing independent progress tracking.

71
Q

What is the role of offsets in Kafka’s fault tolerance and resilience?

A

Offsets play a crucial role in Kafka’s fault tolerance by enabling consumers to recover from failures and continue processing messages from the last committed offset.