Kafka Flashcards

1
Q

What is Apache Kafka?

A

Apache Kafka is an open-source distributed event streaming platform designed for handling real-time data feeds and stream processing.

2
Q

What are the key components of Kafka?

A

Kafka consists of Producers, Consumers, Topics, Brokers, and ZooKeeper for coordination and configuration.

3
Q

What is the role of Producers in Kafka?

A

Producers are client applications responsible for publishing data (messages) to Kafka topics hosted on brokers.

4
Q

What do Consumers do in Kafka?

A

Consumers subscribe to topics and process the published messages from Kafka brokers.

5
Q

Define Kafka Topics.

A

Topics are categories or feeds to which messages are published by Producers and from which Consumers consume data.

6
Q

Explain Kafka Brokers

A

Brokers are Kafka servers responsible for storing and managing published data across distributed clusters.

7
Q

What role does ZooKeeper play in Kafka?

A

ZooKeeper is used for configuration management, synchronization, and coordination tasks within the Kafka cluster.

8
Q

What are the main use cases for Kafka?

A

Kafka is used for real-time data pipelines, streaming analytics, log aggregation, messaging systems, and more.

9
Q

What are the benefits of using Kafka?

A

Kafka offers high throughput, fault tolerance, horizontal scalability, and real-time stream processing capabilities.

10
Q

Which programming languages can interact with Kafka?

A

Kafka provides APIs for interaction in Java, Python, Scala, and other languages, enabling integration across various systems and applications.

11
Q

What sets Apache Kafka apart from other systems?

A

Kafka stands out due to its distributed streaming platform, high fault tolerance, scalability, and support for both streaming and batch processing.

12
Q

What is Kafka’s approach to fault tolerance and data persistence?

A

Kafka ensures fault tolerance and durability by persisting data to disk and replicating it across multiple brokers in the cluster.

13
Q

How does Kafka handle scalability?

A

Kafka scales horizontally: more brokers can be added to the cluster to accommodate increased data throughput and storage needs.

14
Q

Describe Kafka’s messaging model.

A

Kafka follows a publish-subscribe (pub/sub) messaging model, enabling independent interaction between data producers and consumers.

15
Q

What programming languages does Kafka support?

A

Kafka provides APIs for various languages such as Java, Python, and Scala, allowing integration with diverse systems.

16
Q

Name a few components of Kafka’s ecosystem.

A

The Kafka ecosystem includes Kafka Streams for stream processing, Kafka Connect for data integration, and Schema Registry for schema management.

17
Q

What are some common use cases for Apache Kafka?

A

Kafka is used for log aggregation, event sourcing, real-time analytics, IoT telemetry, and more.

18
Q

How does Kafka handle high throughput and low latency?

A

Kafka’s architecture allows it to handle millions of messages per second with low latency, making it suitable for demanding real-time scenarios.

19
Q

What support does Kafka have from its community?

A

As an open-source platform, Kafka benefits from a strong community providing extensive documentation, support, and continuous development.

20
Q

What makes Kafka versatile for diverse applications?

A

Its flexibility, high performance, and support for various data processing paradigms enable Kafka to cater to a wide range of application scenarios.

21
Q

Apache Pulsar

A

An open-source distributed messaging system that offers features like multi-tenancy, geo-replication, and efficient scaling for event streaming and messaging.

22
Q

Amazon Kinesis

A

A managed streaming service provided by AWS for real-time data ingestion, processing, and analysis at scale, offering capabilities like Kinesis Data Streams, Data Firehose, and Data Analytics.

23
Q

Google Cloud Pub/Sub

A

A fully-managed real-time messaging service on Google Cloud Platform that supports event streaming, messaging, and integration across applications and systems.

24
Q

Azure Event Hubs

A

A cloud-based event streaming platform within Microsoft Azure, capable of handling massive volumes of events and telemetry data in real time.

25
Q

RabbitMQ

A

An open-source message broker that supports multiple messaging protocols and provides features for reliable message queuing, routing, and scalability.

26
Q

Redis Streams

A

Redis, an in-memory data structure store, introduced streaming capabilities with Redis Streams, enabling message processing, event sourcing, and real-time analytics.

27
Q

NATS

A

A high-performance, open-source messaging system known for its simplicity and efficiency in message routing, publish-subscribe, and queueing scenarios.

28
Q

IBM Streams

A

A platform for developing and deploying streaming applications on IBM Cloud that enables continuous data ingestion, processing, and analytics.

29
Q

Apache Flink

A

Although primarily a stream processing framework, Flink offers capabilities for distributed event streaming and data pipelining, supporting high-throughput and low-latency processing.

30
Q

Confluent Platform

A

Built on Apache Kafka, Confluent offers a platform with additional tools and components for managing Kafka clusters, stream processing, and enterprise-level features.

31
Q

What is Kafka known for in terms of scalability?

A

Kafka is highly scalable, capable of handling massive throughput and storage requirements by allowing easy addition of brokers to the cluster.

32
Q

How does Kafka ensure fault tolerance?

A

Kafka maintains fault tolerance through data replication and partitioning, ensuring high availability and durability even in the event of node failures.

33
Q

Name some components within Kafka’s ecosystem.

A

Kafka’s ecosystem includes tools like Kafka Connect, Kafka Streams, and Schema Registry, offering a comprehensive suite for stream processing and data integration.

34
Q

What advantage does Kafka gain from its community?

A

With an active open-source community, Kafka receives extensive documentation, support, and continuous development.

35
Q

Why is Kafka praised for its performance?

A

Kafka exhibits high throughput and low latency, making it suitable for demanding real-time data processing and event streaming scenarios.

36
Q

How does Kafka support flexibility and integration?

A

Kafka provides APIs for various languages, facilitating integration with diverse systems and applications.

37
Q

In what use cases is Kafka versatile?

A

Kafka caters to diverse use cases such as log aggregation, event sourcing, real-time analytics, and IoT telemetry due to its flexibility and support for both streaming and batch processing.

38
Q

What are the fundamental components of Kafka’s architecture?

A

Kafka’s architecture comprises Producers, Topics, Partitions, Brokers, and Consumers.

39
Q

Define a Kafka Topic.

A

Topics in Kafka are categories or feeds to which messages are published by Producers and from which Consumers consume data.

40
Q

What is a Partition in Kafka?

A

Partitions are the segments within Kafka Topics, allowing parallelism and data distribution across multiple brokers.
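The way a record's key determines its partition can be sketched in a few lines. This toy version uses CRC32 purely for illustration; the Java client's default partitioner actually uses murmur2 hashing, and keyless records are distributed round-robin (or "sticky" in newer clients).

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Simplified key-to-partition mapping (real clients use murmur2)."""
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition,
# which is what preserves per-key ordering.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2
assert 0 <= p1 < 6
```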

41
Q

Explain the role of Brokers in Kafka.

A

Brokers are individual Kafka servers responsible for storing and managing the partitions and handling Producer and Consumer requests.

42
Q

How does Kafka ensure message durability and replication?

A

Kafka maintains message durability by replicating partitions across multiple Brokers, ensuring fault tolerance.
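Replication can be modeled with a toy simulation: the same partition log lives on several brokers, so losing one broker loses no data. This is only an illustration of the idea; the real protocol involves a leader, in-sync replica (ISR) tracking, and leader epochs, all omitted here.

```python
class Broker:
    def __init__(self, broker_id: int):
        self.broker_id = broker_id
        self.log = []          # this broker's replica of the partition
        self.alive = True

def append(replicas, record):
    """Append a record to every live replica (leader plus followers)."""
    for b in replicas:
        if b.alive:
            b.log.append(record)

brokers = [Broker(i) for i in range(3)]   # replication factor 3
append(brokers, "event-1")
append(brokers, "event-2")

brokers[0].alive = False                  # the leader crashes
survivors = [b for b in brokers if b.alive]
# A surviving follower still holds the full log and can be promoted.
assert survivors[0].log == ["event-1", "event-2"]
```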

43
Q

What is the responsibility of the ZooKeeper in Kafka?

A

ZooKeeper is used for coordination, configuration management, and leadership election tasks within the Kafka cluster.

44
Q

How does Kafka achieve high throughput and low latency?

A

Kafka’s efficient design, log-structured storage, and partitioning allow it to handle high volumes of data with low latency.

45
Q

What mechanism ensures message ordering within a partition?

A

Within a partition, Kafka guarantees message ordering through sequential writes in the commit log.
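A partition behaves like an append-only commit log: each write receives the next sequential offset, and reads return records in write order. A minimal sketch of that model:

```python
class PartitionLog:
    """Toy append-only commit log: offsets are assigned sequentially."""
    def __init__(self):
        self.records = []

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1   # offset of the new record

    def read_from(self, offset: int):
        return self.records[offset:]

log = PartitionLog()
assert log.append("a") == 0
assert log.append("b") == 1
assert log.append("c") == 2
# Reads from any offset return records in the order they were written.
assert log.read_from(1) == ["b", "c"]
```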

46
Q

What is a stream?

A

A stream is an unbounded, continuous flow of real-time records (events or facts).

47
Q

What is a Kafka Group ID?

A

A Kafka Group ID is a string identifier used to label consumers that belong to the same consumer group in Apache Kafka.

48
Q

How is a Kafka Group ID utilized in Kafka?

A

It allows multiple consumers to form a consumer group; the group's members divide the partitions of the subscribed topics among themselves, ensuring that each message is processed by only one consumer within the group.

49
Q

Why is Kafka Group ID important?

A

It helps in load balancing and parallel processing by allowing multiple consumers to work together within a consumer group, ensuring that each message is consumed by only one consumer within the group.

50
Q

Can consumers from different groups consume the same topic?

A

Yes, consumers from different group IDs can independently consume the same topic, allowing for parallel processing of messages.

51
Q

How is Kafka Group ID different from Kafka Consumer ID?

A

Kafka Group ID is used to label a set of consumers forming a consumer group, while Kafka Consumer ID uniquely identifies individual consumers within that group.
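How a group shares work can be sketched as a partition-assignment function: every partition goes to exactly one consumer in the group. This is a simplified round-robin assignment; in real Kafka the group coordinator runs a configurable assignor (range, round-robin, or sticky) and rebalances when membership changes.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment of partitions to the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

parts = list(range(6))
a = assign_partitions(parts, ["c1", "c2", "c3"])
# Every partition is owned by exactly one consumer in the group,
# so each message is processed once within the group.
owned = sorted(p for ps in a.values() for p in ps)
assert owned == parts
assert a["c1"] == [0, 3]
```

A second group running the same function gets its own independent copy of all six partitions, which is why different groups can each consume the full topic.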

52
Q

What is Zookeeper used for in a distributed system?

A

ZooKeeper stores small amounts of coordination data in a strongly consistent form and manages configurations, leader elections, and distributed locks in a distributed system.

53
Q

What are the two types of nodes in Zookeeper?

A

Zookeeper nodes can be either Ephemeral or Persistent. Ephemeral nodes are used for tracking machine status and master election, while Persistent nodes are used for storing configuration variables.

54
Q

How does Zookeeper handle master election in a distributed system?

A

In Zookeeper, machines in a cluster compete to write their IP addresses to an ephemeral node, and only one succeeds, becoming the master. Clients can read this node to determine the current master.
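The election race can be modeled in a few lines: whichever machine creates the ephemeral node first becomes master, and when its session expires the node vanishes and the race reopens. This toy class only illustrates the first-writer-wins semantics, not ZooKeeper's actual API or consensus protocol.

```python
class ToyZooKeeper:
    """Toy model of master election via an ephemeral node:
    the first machine to create the node wins."""
    def __init__(self):
        self.master_node = None   # contents of the ephemeral node

    def try_become_master(self, ip: str) -> bool:
        if self.master_node is None:
            self.master_node = ip   # node created; this machine is master
            return True
        return False                # node already exists; lost the race

    def session_expired(self):
        self.master_node = None     # ephemeral node vanishes with the session

zk = ToyZooKeeper()
assert zk.try_become_master("10.0.0.1") is True
assert zk.try_become_master("10.0.0.2") is False   # lost the race
zk.session_expired()                               # old master died
assert zk.try_become_master("10.0.0.2") is True    # re-election succeeds
```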

55
Q

What is the purpose of setting a watch in Zookeeper?

A

Setting a watch allows clients to subscribe to updates on a specific Zookeeper node. When the data on that node changes, clients are notified, reducing the need for constant queries.
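The watch mechanism is essentially a one-shot observer pattern: register a callback, get notified on the next change, then re-register if you want further updates. A minimal sketch of that behavior (ZooKeeper watches really are one-shot; the real API and event types differ):

```python
class WatchedNode:
    """Toy ZooKeeper-style node: watchers are notified once on change,
    instead of having to poll the node repeatedly."""
    def __init__(self, data=None):
        self.data = data
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def set_data(self, data):
        self.data = data
        pending, self.watchers = self.watchers, []  # watches fire one-shot
        for cb in pending:
            cb(data)

seen = []
node = WatchedNode("old-master")
node.watch(seen.append)
node.set_data("new-master")
assert seen == ["new-master"]
node.set_data("another")        # watch already fired; must re-register
assert seen == ["new-master"]
```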

56
Q

Why is Zookeeper designed to run on a cluster of machines?

A

Zookeeper runs on a cluster of machines to avoid becoming a single point of failure. It uses leader election among its machines to ensure resilience.

57
Q

What is Kafka used for in distributed systems?

A

Kafka is a persistent queue system used for managing and processing high volumes of data in a publish-subscribe model.

58
Q

What is the significance of Kafka topics?

A

Kafka topics categorize events within a queue, allowing consumers to subscribe only to specific types of events, making data processing more efficient.

59
Q

How does Kafka handle event retention?

A

Kafka retains events for a specified period, cleaning up older events. Events are durable, ensuring they are not lost.

60
Q

What is the purpose of partitioning in Kafka?

A

Partitioning in Kafka allows for parallel processing and load distribution. It also helps maintain ordering within partitions.

61
Q

How does Kafka ensure consumers receive new events in a topic?

A

Kafka consumers track offsets, allowing them to fetch only new events in a topic and not reprocess previously consumed events.

62
Q

What are Kafka offsets?

A

Kafka offsets are numerical values that represent the position of a consumer within a specific partition of a Kafka topic.

63
Q

What does an offset indicate in Kafka?

A

An offset indicates the position of the last successfully consumed message within a partition, signifying the point up to which a consumer has read messages.

64
Q

How are offsets assigned to messages in Kafka partitions?

A

Each message in a Kafka partition is assigned a unique offset, which is a sequential integer starting from 0 for the earliest message and incrementing for each subsequent message.

65
Q

What is the purpose of committing offsets in Kafka?

A

Committing offsets is essential for fault tolerance and allows consumers to indicate the last successfully processed message. It helps consumers resume from the last committed offset in case of failures.

66
Q

What are the two types of offset commits in Kafka?

A

Kafka supports two types of offset commits: synchronous and asynchronous. Synchronous commits block until the commit is acknowledged, while asynchronous commits do not.

67
Q

What is the significance of Kafka’s “at-least-once” message delivery semantics in relation to offsets?

A

Kafka’s “at-least-once” semantics guarantee that no message is lost: if a consumer fails, it resumes from the last committed offset. Messages that were processed after that offset but not yet committed may be delivered again, so consumers should be prepared to handle occasional duplicates.
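The at-least-once behavior can be simulated directly: commit after each record, crash between processing and committing, and observe that the restart reprocesses the uncommitted record rather than losing it. This is a toy model of the consume/commit loop, not a real client.

```python
def consume(log, committed_offset, crash_before_commit_at=None):
    """Process records starting at the last committed offset,
    committing after each one. A record processed but not yet
    committed is re-delivered on restart (at-least-once)."""
    processed = []
    for offset in range(committed_offset, len(log)):
        processed.append(log[offset])
        if offset == crash_before_commit_at:
            return processed, committed_offset   # crashed: commit was lost
        committed_offset = offset + 1
    return processed, committed_offset

log = ["a", "b", "c"]
# First run crashes after processing offset 1 but before committing it.
done, committed = consume(log, 0, crash_before_commit_at=1)
assert done == ["a", "b"] and committed == 1
# The restart resumes from the committed offset: "b" is processed again,
# but nothing is lost.
done2, committed2 = consume(log, committed)
assert done2 == ["b", "c"] and committed2 == 3
```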

68
Q

Who is responsible for managing offsets in Kafka?

A

Consumers are responsible for managing offsets in Kafka. They need to store and update offsets for the partitions they are consuming from.

69
Q

In what scenarios might a consumer need to reset its offset in Kafka?

A

A consumer may need to reset its offset if it wants to reprocess messages from an earlier point within a partition, such as reprocessing historical data.

70
Q

How does Kafka handle offset management when multiple consumers are involved?

A

Kafka ensures that each consumer maintains its own offsets for the partitions it consumes from, allowing independent progress tracking.

71
Q

What is the role of offsets in Kafka’s fault tolerance and resilience?

A

Offsets play a crucial role in Kafka’s fault tolerance by enabling consumers to recover from failures and continue processing messages from the last committed offset.