Chapter 1: Meet Kafka Flashcards
Batches can contain messages from multiple partitions. T/F
F
Batches can contain messages from multiple topics. T/F
F
What is the benefit of a larger batch size?
Increased throughput
What is the benefit of a smaller batch size?
Decreased latency
How does Kafka reduce the bytes in a batch before sending it across the network?
Compression
What are the 3 most common schema types?
JSON, XML, Avro
How do Avro messages achieve a smaller size than JSON or XML messages?
By separating the schema from the message payload
Are messages guaranteed to be ordered within a topic?
No
Are messages guaranteed to be ordered within a partition?
Yes
How do partitions increase scalability and redundancy?
Splitting partitions across servers
A producer is about to produce a message which has no key, and the producer is not using a custom partitioner. How does the producer decide which partition to use?
The producer balances keyless messages evenly across all available partitions of the topic
Describe two different methods of ensuring that two messages will be written to the same partition?
Use a custom partitioner or give both messages the same key
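The "same key" method works because the partition is derived from a hash of the key. A minimal sketch of that idea follows; `choose_partition` is a hypothetical helper, and `zlib.crc32` is only a stand-in for whatever hash a real Kafka client uses.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (illustrative only)."""
    # crc32 is an assumed stand-in hash. What matters is that it is
    # deterministic, so equal keys always give equal partition indexes.
    return zlib.crc32(key) % num_partitions

first = choose_partition(b"customer-42", 6)
second = choose_partition(b"customer-42", 6)
print(first == second)  # same key, same partition
```

The mapping only holds while the number of partitions stays the same; changing it changes the result of the modulo.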
If I were to produce two messages to a topic, how could I ensure that those messages were consumed in the same order they were produced?
Put the messages in the same partition
What is the data type of an offset?
An integer
When a consumer restarts, how does it decide which message it should start reading?
It reads the last committed offset for each of its partitions and resumes from there
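A toy model of resuming from a stored offset, assuming a single partition held in a Python list and a hypothetical in-memory offset store (`committed`). Real clients commit offsets to Kafka or ZooKeeper rather than to a dict.

```python
log = ["m0", "m1", "m2", "m3", "m4"]  # one partition; list index == offset
committed = {}  # hypothetical offset store: group name -> next offset to read

def poll(group: str, max_messages: int) -> list:
    """Read from the stored position, then record the new position."""
    start = committed.get(group, 0)  # assumed default: start at offset 0
    batch = log[start:start + max_messages]
    committed[group] = start + len(batch)
    return batch

before_restart = poll("billing", 2)  # reads m0 and m1
# ... the consumer process stops and starts again here ...
after_restart = poll("billing", 2)   # continues with m2 and m3
```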
What are the two places that the offset could be stored?
Zookeeper or Kafka
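What is the cardinality between the consumers in a group and partitions?
One-to-many: each partition is owned by exactly one consumer in the group, and one consumer can own many partitions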
How could one increase the throughput of a consumer group?
Adding more consumers to the group, up to the number of partitions
Consumer A owns partition B. How does Kafka ensure that partition B will continue to be processed in the event that consumer A dies?
Kafka re-assigns partition B to another consumer from the group of consumer A
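The re-assignment can be pictured with a small simulation. The round-robin rule in `assign` is an assumption chosen for brevity, not the strategy Kafka itself necessarily applies, and the consumer and partition names are made up.

```python
def assign(partitions: list, consumers: list) -> dict:
    """Spread partitions round-robin over the live consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment

partitions = ["p0", "p1", "p2", "p3"]
before = assign(partitions, ["consumer-1", "consumer-2"])
# consumer-1 dies, so its partitions move to the surviving member.
after = assign(partitions, ["consumer-2"])
```

Every partition still has exactly one owner after the change, so processing continues without the failed consumer.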
True or false? A single broker can handle millions of partitions.
False
True or false? A single broker can handle thousands of partitions.
True
True or false? A single broker can handle millions of messages per second.
True
Within a cluster, how many brokers are responsible for assigning partitions to brokers?
One
What is the name of the broker that is responsible for assigning partitions to brokers?
Controller
Fill in the blank: all producers and consumers of a partition can be connected to a single broker, called the ___
Leader
How does Kafka ensure redundancy of messages in a partition?
By replicating the partition in multiple brokers
Describe two simple ways that I could use to make a Kafka topic store messages for 1 month?
Change the broker retention setting or topic retention setting to 1 month
How could I limit the size of data (in bytes) stored in a Kafka topic to 1 GB, without affecting other topics?
Change the topic retention settings
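As a rough illustration of the retention answers above, the values can be worked out like this. `retention.ms` and `retention.bytes` are the topic-level setting names; treating "1 month" as 30 days and "1 GB" as 1024^3 bytes are assumptions.

```python
# Assumption: one month is treated as 30 days.
MONTH_MS = 30 * 24 * 60 * 60 * 1000
# Assumption: 1 GB is taken as 1024**3 bytes.
ONE_GB = 1024 ** 3

# Topic-level overrides; other topics keep the broker defaults.
topic_overrides = {
    "retention.ms": str(MONTH_MS),
    "retention.bytes": str(ONE_GB),
}
print(topic_overrides)
```

Note that `retention.bytes` is enforced per partition, so a topic with several partitions can hold more than this value in total.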
How could I limit a Kafka topic so that only the most recent message for each key is stored?
Configure the topic as log compacted
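A small sketch of the effect of log compaction, assuming messages are (key, value) pairs in a list. `compact` is a hypothetical helper that mimics the end result only, not how brokers actually clean log segments.

```python
def compact(log: list) -> list:
    """Keep only the most recent value per key, in original log order."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # a later message replaces an earlier one
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (offset, value) in survivors]

log = [("user-1", "a"), ("user-2", "b"), ("user-1", "c")]
compacted = compact(log)  # the older ("user-1", "a") entry is dropped
```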
Name 3 ways that one could increase the throughput of a topic?
It depends on the bottleneck. Options include:
- Increase batch size
- Increase number of consumers
- Increase number of brokers
- Increase number of partitions
- Increase computing power of servers
- Make consumer processing code more efficient
- Decrease message size
A topic is experiencing high latency between the producer and the consumer. What could be done to reduce this latency?
Decrease batch size, decrease message size, and address any bottlenecks in the system (e.g., not enough brokers)
How could I enforce the processing order of two messages?
Put the messages in the same topic and partition
A topic was being processed quickly but a spike in message frequency has increased the processing time of messages. The brokers have plenty of computing power to spare so they are not the bottleneck. How could I increase the speed at which messages are processed without altering the code which processes messages?
Increase the number of consumers and/or number of partitions