Distributed Message Queue Flashcards
Interview
What benefits do message queues bring?
Decoupling. Message queues eliminate the tight coupling between components so they can be updated independently.
Improved scalability. We can scale producers and consumers independently based on traffic load. For example, during peak hours, more consumers can be added to handle the increased traffic.
Increased availability. If one part of the system goes offline, the other components can continue to interact with the queue.
Better performance. Message queues make asynchronous communication easy. Producers can add messages to a queue without waiting for the response and consumers consume messages whenever they are available. They don’t need to wait for each other.
Functional requirements
- Producers send messages to a message queue.
- Consumers consume messages from a message queue.
- Messages can be consumed repeatedly or only once.
- Historical data can be truncated.
- Message size is in the kilobyte range.
- Ability to deliver messages to consumers in the order they were added to the queue.
- Data delivery semantics (at-least-once, at-most-once, or exactly-once) can be configured by users.
Non-functional requirements
High throughput or low latency, configurable based on use cases.
Scalable. The system should be distributed in nature. It should be able to support a sudden surge in message volume.
Persistent and durable. Data should be persisted on disk and replicated across multiple nodes.
What is an event streaming platform?
Event streaming is the practice of capturing real-time data from applications, databases and IoT devices and transporting it to various destinations for immediate processing and storage, or for real-time analysis and analytics reporting.
What are the messaging models?
Point-to-point
Publish-subscribe
What is a topic?
Topics are the categories used to organize messages. Each topic has a name that is unique across the entire message queue service. Messages are sent to and read from a specific topic.
What is topic partitioning (sharding)?
We divide a topic into partitions and deliver messages evenly across partitions. Think of a partition as a small subset of the messages for a topic. Partitions are evenly distributed across the servers in the message queue cluster.
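A minimal sketch of key-based partition routing, assuming a hash-the-key scheme (the function name and the use of MD5 are illustrative, not prescribed by the design):

```python
import hashlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to one of the topic's partitions.

    Messages with the same key always land on the same partition, which
    preserves per-key ordering. Keyless messages could instead be spread
    round-robin across partitions (not shown here).
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Example: a topic with 4 partitions.
print(choose_partition(b"user-42", 4))  # always the same partition for this key
```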
What are brokers?
**Brokers** are servers that hold partitions. The distribution of partitions among brokers is the key element to support high scalability. We can scale the topic capacity by expanding the number of partitions.
What is a consumer group?
A consumer group is a set of consumers, working together to consume messages from topics.
Each consumer group can subscribe to multiple topics and maintain its own consuming offsets. For example, we can group consumers by use cases, one group for billing and the other for accounting.
A single partition can only be consumed by one consumer in the same group.
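A minimal sketch of the "one partition, at most one consumer per group" rule, using a simple round-robin assignment (the helper name is hypothetical):

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Each partition is assigned to exactly one consumer in the group, so no
    partition is read by two consumers of the same group. A single consumer
    may own several partitions."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# Two consumers in the billing group sharing the 4 partitions of a topic.
print(assign_partitions([0, 1, 2, 3], ["billing-1", "billing-2"]))
# {'billing-1': [0, 2], 'billing-2': [1, 3]}
```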
Data storage
What is a write-ahead log (WAL)?
A WAL is a plain, append-only file: new entries are only ever added at the end. WALs are used in many systems, such as the redo log in MySQL and the WAL in ZooKeeper.
We recommend persisting messages as WAL files on disk. A WAL has a pure sequential read/write access pattern, and the disk performance of sequential access is very good [4]. Also, rotational disks have large capacity and are pretty affordable.
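A toy append-only log to illustrate the idea, assuming one JSON entry per line (real systems add segment files, indexes, CRCs, and fsync policies on top):

```python
import json

class WriteAheadLog:
    """Toy WAL: every message is serialized and appended to the end of a
    plain file, so all writes are sequential."""

    def __init__(self, path: str):
        self.path = path

    def append(self, message: dict) -> None:
        # Append-only: earlier entries are never rewritten.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")

    def read_from(self, offset: int):
        """Sequentially replay entries starting at a given position."""
        with open(self.path, "r", encoding="utf-8") as f:
            for position, line in enumerate(f):
                if position >= offset:
                    yield json.loads(line)

# wal = WriteAheadLog("partition-0.log")
# wal.append({"key": "user-42", "value": "clicked"})
```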
What is the message data structure?
| Field Name | Data Type |
| --- | --- |
| key | byte[] |
| value | byte[] |
| topic | string |
| partition | integer |
| offset | long |
| timestamp | long |
| size | integer |
| crc | integer |
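The same fields expressed as a record type, a sketch to make the types concrete (the class name is illustrative):

```python
from dataclasses import dataclass

@dataclass
class Message:
    """Mirrors the table above: key/value are raw bytes, topic/partition/offset
    locate the message, and crc guards payload integrity."""
    key: bytes
    value: bytes
    topic: str
    partition: int
    offset: int     # 64-bit position of the message within its partition
    timestamp: int  # when the message was appended, e.g. epoch milliseconds
    size: int       # total message size in bytes
    crc: int        # checksum over the payload
```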
What is batching?
Batching is critical to improving performance because:
It allows the operating system to group messages together in a single network request and amortizes the cost of expensive network round trips.
The broker writes messages to the append logs in large chunks, which leads to larger blocks of sequential writes and larger contiguous blocks of disk cache, maintained by the operating system. Both lead to much greater sequential disk access throughput.
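A minimal sketch of a producer-side batch buffer, assuming a size threshold plus a linger time (all names and defaults are illustrative):

```python
import time

class BatchBuffer:
    """Accumulate messages in memory and flush them as one request once the
    batch is large enough or has waited long enough. Bigger batches mean
    fewer network round trips and larger sequential writes on the broker."""

    def __init__(self, send_fn, max_batch: int = 100, linger_seconds: float = 0.05):
        self.send_fn = send_fn            # sends a whole batch in one network request
        self.max_batch = max_batch
        self.linger_seconds = linger_seconds
        self.buffer = []
        self.first_append = None

    def append(self, message) -> None:
        if not self.buffer:
            self.first_append = time.monotonic()
        self.buffer.append(message)
        full = len(self.buffer) >= self.max_batch
        lingered = time.monotonic() - self.first_append >= self.linger_seconds
        if full or lingered:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send_fn(self.buffer)     # one request for the whole batch
            self.buffer = []

buf = BatchBuffer(send_fn=lambda batch: print(f"sending {len(batch)} messages"))
for i in range(250):
    buf.append({"value": f"event-{i}"})
buf.flush()                               # flush whatever is left at shutdown
```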
What is the producer flow?
If a producer wants to send messages to a partition, which broker should it connect to? The first option is to introduce a routing layer. All messages sent to the routing layer are routed to the “correct” broker. If the brokers are replicated, the “correct” broker is the leader replica.
The routing layer is wrapped into the producer, and a buffer component is added to it. Both are part of the producer client library.
This change brings several benefits:
* Fewer network hops mean lower latency.
* Producers can have their own logic to determine which partition the message should be sent to.
* Batching buffers messages in memory and sends out larger batches in a single request. This increases throughput.
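A sketch of a producer client with the routing layer and buffer built in; the broker addresses, metadata shape, and hash scheme are assumptions for illustration:

```python
import hashlib

class Producer:
    """The client picks the partition itself, looks up the leader broker for
    that partition from cached cluster metadata, and buffers the message so
    it can later be sent to that broker as part of a larger batch."""

    def __init__(self, partition_leaders: dict[int, str]):
        # partition id -> address of its leader broker (refreshed from metadata).
        self.partition_leaders = partition_leaders
        self.num_partitions = len(partition_leaders)
        self.buffers = {broker: [] for broker in set(partition_leaders.values())}

    def send(self, key: bytes, value: bytes) -> None:
        # Routing logic runs inside the producer: same key -> same partition.
        digest = hashlib.md5(key).digest()
        partition = int.from_bytes(digest[:4], "big") % self.num_partitions
        leader = self.partition_leaders[partition]
        # Buffered in memory; a background flush sends the batch to the leader.
        self.buffers[leader].append((partition, key, value))

producer = Producer({0: "broker-1:9092", 1: "broker-2:9092"})
producer.send(b"user-42", b"clicked")
```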
What is the consumer flow?
The consumer specifies its offset in a partition and receives back a chunk of events beginning from that position.
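A minimal sketch of this pull model, with a partition modeled as an in-memory list (function and field names are illustrative):

```python
def fetch(partition_log: list[dict], offset: int, max_messages: int = 100):
    """The consumer says where it wants to start (offset) and gets back a
    chunk of messages from that position plus the next offset to request.
    The broker keeps no per-consumer delivery state."""
    chunk = partition_log[offset:offset + max_messages]
    return chunk, offset + len(chunk)

log = [{"offset": i, "value": f"event-{i}"} for i in range(10)]
chunk, next_offset = fetch(log, offset=4, max_messages=3)
print([m["offset"] for m in chunk])  # [4, 5, 6]
print(next_offset)                   # 7
```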
Push vs Pull
How does consumer rebalancing work?
- Each consumer belongs to a group. It finds the dedicated coordinator by hashing the group name. All consumers from the same group are connected to the same coordinator.
- The coordinator maintains a joined consumer list. When the list changes, the coordinator elects a new leader of the group.
- The new leader of the consumer group generates a new partition dispatch plan and reports it back to the coordinator. The coordinator then broadcasts the plan to the other consumers in the group.
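A sketch of the coordination steps above, assuming a hash-the-group-name rule for coordinator lookup and a round-robin dispatch plan (both are illustrative choices):

```python
import hashlib

def find_coordinator(group_name: str, brokers: list[str]) -> str:
    """Every consumer in the same group hashes the group name the same way,
    so they all reach the same coordinator broker."""
    h = int.from_bytes(hashlib.md5(group_name.encode()).digest()[:4], "big")
    return brokers[h % len(brokers)]

def make_dispatch_plan(partitions: list[int], members: list[str]) -> dict[str, list[int]]:
    """Computed by the elected group leader; the coordinator then broadcasts
    the plan to every member of the group."""
    plan = {m: [] for m in members}
    for i, partition in enumerate(partitions):
        plan[members[i % len(members)]].append(partition)
    return plan

brokers = ["broker-1", "broker-2", "broker-3"]
print(find_coordinator("billing", brokers))
print(make_dispatch_plan([0, 1, 2, 3], ["consumer-a", "consumer-b"]))
```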
What is state storage?
The state storage stores:
- The mapping between partitions and consumers.
- The last consumed offsets of consumer groups for each partition.
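A toy in-memory version of the state storage, keyed by (group, topic, partition); a real deployment would back this with a consistent store such as ZooKeeper or an internal topic:

```python
class StateStorage:
    """Keeps the last committed offset per (consumer group, topic, partition)."""

    def __init__(self):
        self.offsets: dict[tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        self.offsets[(group, topic, partition)] = offset

    def last_offset(self, group: str, topic: str, partition: int) -> int:
        # A brand-new group starts from the beginning of the partition.
        return self.offsets.get((group, topic, partition), 0)

state = StateStorage()
state.commit("billing", "payments", 0, 42)
print(state.last_offset("billing", "payments", 0))  # 42
```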
What are the data access patterns for consumer states?
The data access patterns for consumer states are:
- Frequent read and write operations but the volume is not high.
- Data is updated frequently and is rarely deleted.
- Random read and write operations.
- Data consistency is important.
Lots of storage solutions can be used for storing the consumer state data. Considering the data consistency and fast read/write requirements, a KV store like Zookeeper is a great choice. Kafka has moved the offset storage from Zookeeper to Kafka brokers.
What is metadata storage?
The metadata storage stores the configuration and properties of topics, including the number of partitions, the retention period, and the distribution of replicas.
Metadata does not change frequently and the data volume is small, but it has a high consistency requirement. Zookeeper is a good choice for storing metadata.
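A sketch of what a topic's metadata record might contain (field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class TopicMetadata:
    """Topic configuration kept in metadata storage: it changes rarely and is
    small, but every broker must read a consistent view of it."""
    name: str
    num_partitions: int
    retention_ms: int
    # partition id -> broker ids holding its replicas (the replica distribution plan)
    replica_distribution: dict[int, list[int]] = field(default_factory=dict)

orders = TopicMetadata(
    name="orders",
    num_partitions=2,
    retention_ms=7 * 24 * 3600 * 1000,  # one week
    replica_distribution={0: [1, 2, 3], 1: [2, 3, 4]},
)
```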
What is ZooKeeper?
Zookeeper is an essential service for distributed systems offering a hierarchical key-value store. It is commonly used to provide a distributed configuration service, synchronization service, and naming registry.
How to replicate?
Each partition has 3 replicas, distributed across different broker nodes.
Producers only send messages to the leader replica. The follower replicas keep pulling new messages from the leader. Once messages are synchronized to enough replicas, the leader returns an acknowledgment to the producer.
The distribution of replicas for each partition is called a replica distribution plan.
With the help of the coordination service, one of the broker nodes is elected as the leader. It generates the replica distribution plan and persists the plan in metadata storage. All the brokers now can work according to the plan.
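A simplified sketch of leader-based replication with an acknowledgment threshold; followers actually pull from the leader, but the copy is modeled as a push here for brevity (class names and the threshold are illustrative):

```python
class PartitionReplica:
    def __init__(self):
        self.log = []

class PartitionLeader:
    """Producers write only to the leader; the leader acknowledges once the
    message is present on at least `min_in_sync` replicas (itself included)."""

    def __init__(self, followers: list[PartitionReplica], min_in_sync: int = 2):
        self.log = []
        self.followers = followers
        self.min_in_sync = min_in_sync

    def append(self, message) -> bool:
        self.log.append(message)
        for follower in self.followers:      # in reality followers pull new messages
            follower.log.append(message)
        in_sync = 1 + len(self.followers)    # leader's copy plus synced followers
        return in_sync >= self.min_in_sync   # True -> safe to ack the producer

leader = PartitionLeader([PartitionReplica(), PartitionReplica()], min_in_sync=2)
print(leader.append({"key": "user-42", "value": "clicked"}))  # True -> ack sent
```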
Scalability
How to achieve producer scalability?
The producer is conceptually much simpler than the consumer because it doesn’t need group coordination. The scalability of producers can easily be achieved by adding or removing producer instances.
How to achieve consumer scalability?
Consumer groups are isolated from each other, so it is easy to add or remove a consumer group. Inside a consumer group, the rebalancing mechanism helps to handle the cases where a consumer gets added or removed, or when it crashes. With consumer groups and the rebalance mechanism, the scalability and fault tolerance of consumers can be achieved.
How to achieve broker scalability?
- The initial setup: 3 brokers, 2 partitions, and 3 replicas for each partition.
- New broker 4 is added. Assume the broker controller changes the replica distribution of partition 2 to the broker (2, 3, 4). The new replica in broker 4 starts to copy data from leader broker 2. Now the number of replicas for partition 2 is temporarily more than 3.
- After the replica in broker 4 catches up, the redundant partition in broker 1 is gracefully removed.
What are the delivery semantics?
- At-most-once
- At-least-once
- Exactly-once
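The difference between the first two mostly comes down to when the consumer commits its offset relative to processing; a minimal sketch (function names are illustrative):

```python
def consume_at_most_once(fetch, process, commit):
    """Commit the offset before processing: if processing crashes, the
    messages are never re-read, so each is delivered at most once."""
    messages, next_offset = fetch()
    commit(next_offset)       # offset saved first
    for m in messages:
        process(m)            # a crash here loses these messages

def consume_at_least_once(fetch, process, commit):
    """Process first, commit after: a crash before the commit makes the
    consumer re-read the same messages, so duplicates are possible."""
    messages, next_offset = fetch()
    for m in messages:
        process(m)
    commit(next_offset)       # a crash before this line causes redelivery

# Exactly-once typically layers idempotent processing or transactions on top
# of at-least-once delivery; that machinery is not sketched here.
```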