Kafka Producer Flashcards by Narendra Pathai

What is ProducerRecord?

ProducerRecord is combination of header + payload that the producer uses to send data to Kafka.

How well did you know this?

Not at all

Perfectly

What does ProducerRecord contain?

Producer record contains mandatory and optional parameters.

1) Topic (mandatory)
2) Key (optional)
3) Parition No (optional)
4) Payload (mandatory)

How well did you know this?

Not at all

Perfectly

What is the use of broker-list parameter that is provided to Producer?

Using that broker list, the producer will discover about cluster and connect with all of the brokers in the cluster.

How well did you know this?

Not at all

Perfectly

How many brokers is Producer connected to in cluster?

Producer is connected with all of the brokers in cluster.

How well did you know this?

Not at all

Perfectly

What is the flow of sending message in Kafka?

1) Create ProducerRecord
2) Serialize
3) Partitioner -> partition is identified
4) Adds record to batch of records that will also be sent to same topic and partition.
5) Separate thread is responsible for sending those batch messages to appropriate Kafka Brokers.
6) If successfull then RecordMetadata is returned
7) If failure then either retry is done or exception.

How well did you know this?

Not at all

Perfectly

What does RecordMetadata contain?

Record metadata is returned when a message is successfully published to Kafka. It contains topic, partition and the offset of the record within the partition.

How well did you know this?

Not at all

Perfectly

When a leader of partition goes down, whose responsibility is it to retry to send to newly elected leader?

It is the producer’s responsibility to try to send to newly elected leader for that partition.

How well did you know this?

Not at all

Perfectly

Lets say there are 3 replicas configured and one of the broker is leader for that partition? How is data kept in sync with replicas? Is it push model or pull model?

It is pull model. It’s not the responsibility of leader partition to send data to replicas. Replicas pull information from leader.

How well did you know this?

Not at all

Perfectly

Which are important producer configurations?

Mandatory

1) Bootstrap servers - To which the produer will communicate to find out information about the cluster
2) KeySerializer - Using which the key will be serialized
3) ValueSerializer - Useful for custom class.

How well did you know this?

Not at all

Perfectly

Provide Java template to create a new Producer with properties

Properties props = new Properties();

props. put(“bootstrap.servers”, “192.168.56.101:9101”);
props. put(“key.serializer”, “org.apache.kafka.common.serialization.StringSerializer”);
props. put(“value.serializer”, “org.apache.kafka.common.serialization.StringSerializer”);

Producer producer = new KafkaProducer<>(props);

How well did you know this?

Not at all

Perfectly

What category of errors can be there in Kafka?

1) Retriable errors for example “no leader” which can be resolved by electing a new leader or “connection error” which can be resolved by reestrablishing a connection are retryable errors. KafkaProducer can be configured to retry on those errors automatically.
2) Non-retriable/permanent errors - are errors whch cannot be resolved and are not transient. Like “message size too large” . In this case KafkaProducer will not retry and throw exception immediately.

How well did you know this?

Not at all

Perfectly

Provide Java Template code to send message in Kafka

Producer producer = new KafkaProducer<>(props);
ProducerRecord record = new ProducerRecord<>(“topic”, “key”, “value”);
producer.send(record);

How well did you know this?

Not at all

Perfectly

What is ack property in producer?

Ack property controls how many successful write receipts do we wait for before moving to other record.
0 - Disabled, so we won’t get ack and will move on to other record. Fire and forget. There is no guarantee that message will even reach there.
1 - Receive from .Atleast one broker should reply with successfull write.
2 - Receive from all the brokers. In that case we will wait till we get ack response from n servers, n being replication factor. This will be very slow.

How well did you know this?

Not at all

Perfectly

Which are some exceptions that we can get before message is sent to the broker

SerializationException, BufferExhaustedException, TimeoutException or InterruptException are some of the exceptions.

How well did you know this?

Not at all

Perfectly

Which are the three ways to send message to Kafka

1) Fire and Forget - send message without caring if it was successful or not. Most will if Kafka is fault tolerant.
2) Synchronous send - We send message and we get a future in return on which we can do get to find if send was successful or not.
3) Asynchronous Send - We can attach callback and it will be called as soon as resopnse is received.

How well did you know this?

Not at all

Perfectly

What is the return type of synchronous send?

Study These Flashcards

Future response = producer.send(record);

Provide template to send asynchronous message.

Study These Flashcards

You need to provide Callback interface implementation from Kafka API which contains onCompletion(RecoredMetadata, Exception) method.

producer.send(record, callbackImplentation);

What is buffer.memory configuration in Producer?

Study These Flashcards

It configures the amount of memory, producer will use to buffer messages before sending. If messages are sent faster than they are delivered, then producer may run out of space. Additional send() will either block or throw an exception.

What is compression.type configuration parameter in Producer?

Study These Flashcards

By default messages are sent uncompressed. But compression algorithms can be used to compress message before sending it to brokers.
Examples are gzip, snappy or lz4.

What is retries configuration parameter in Producer?

Study These Flashcards

Controls the number of times producer retries to send message in case of transient error. By default producer waits 100ms before retry. This backoff parameter can be controlled by retry.backoff.ms parameter.

What is client.id configuration parameter in Producer?

Study These Flashcards

Used by brokers to identify messages sent from a client.

Used in logging, metrics and for quotas.

What is receive.buffer.bytes and send.buffer.bytes configuration paramter in Producer?

Study These Flashcards

Controls TCP level send and receive buffer used by sockets.

If -1, OS level configuration is used.

What is max.in.flight.requests.per.connection configuration Paramter in Prodcuer?

Study These Flashcards

Controls how many messages producer will send to the server before receiving the response. Setting this high increases memory usage and improves throughtput. Setting it too high can reduce throughput as batching becomes less efficient.

What guarantee do we get by setting max.in.flight.requests.per.connection parameter = 1?

Study These Flashcards

Will guarantee that messages will be written in order they were sent.

What is request.timeout.ms parameter in Producer?

Controls how long producer will wait for response from server and requesting metadata. If timeout is reached without response, then either it will retry or will throw an error.

What max.block.ms configuration parametr in Producer?

Controls how long the producer will block when send() & when explicitly requesting metadata using partitionFor(). Those methods block when producer's send buffer is full or when metadata is not available. When max.block.ms is reached then timeout excpetion is thrown. Either wait for buffer to be full or set max block time then message will be written.

What is max.request.size coniguration parameter in Kafka?

What is the max message size that you can write. If greater than there will be exception. Caps both the size of the largest message that can be sent and the size of messages in batch that producer can send in one request.

What to do when default serialization provided by Kafka is not enough?

You can use custom serializers. Generic serialization library like Avro, Protobuff or Thrift can be used.

What is Avro?

Avro is a language neutral data serialization format. Schema is defined in JSON. When writing application switches to new schema, the reading applications continue processing messages without requiring any change.

Sample AVRO schema

{ "type" : "record", "namespace" : "Tutorialspoint", "name" : "Employee", "fields" : [ { "name" : "Name" , "type" : "string" }, { "name" : "Age" , "type" : "int" }, {"name": "Email", "type" : ["null", "string"], "default" : "null" ] } Email property is optional, hence we put type as null with string.

What are the two options to send schema file to consumer?

Avro requires entire schema while reading the record. There are two options: 1) Send schema with each record - but it will have huge impact on performace. 2) Use schema registry to store schemas, so that they can be located from there.

Explain flow of Avro serialization with Schema Registry

1) Producer stores schemas used to write data to Kafka in the Schema Registry 2) Producer before serializing the message fetches the schema from schema registry 3) With serialized data, the identifer of schema is also passed with each record 4) When consumer receives the record using the identifier it locates schema from Schema Registry. 5) Deserializes the record using latest schema.

Which serializer class is used for Avro serialization?

KafkaAvroSerializer class is used.

Which property is required for Avro Schema registry serialization?

schema.registry.url property is required.

How to provide schema with each record using Avro and not use Schema registry?

String schemaString = "json"; Schema.Parser parser = new Schema.Parser(); Schema schema = parser.parse(schemaString); GenericRecord student = new GenericRecord(schema); student.put("id", studentId); student.put("name", name); ProducerRecord data = new ProducerRecord<>("TOPIC", student.getName(), student); producer.send(data);

What is the problem with addition of new partitions?

Suppose application is running and there are four partitions. Some of the records might be present in those partitions. When new partition is added, some old data is moved to new partition. New records will get written to a different partition. Now this will cause problem because even the messages with same key will be in different partitions. So if partitioning keys is important, create partitions before hand and never add partitions. This is similar to the concept of logical partitioning in database, you plan ahead and create more than require partitions.

Problem with default partitioning in Kafka?

Imbalanced partitions. More data may get written to same partition, resulting in one partition being twice as large as first partition. This can cause slow in speed and out of space issues.

How can be override the default partitioning strategy provided by Kafka?

We can implement Partitioner interface and override method partition().

Is there a limit to the number of partitions that can be created with Kafka?

No there is no upper limit.

Kafka Producer Flashcards

(39 cards)