Section 2: Producer Flashcards

1
Q

What is the Kafka producer configuration bootstrap.servers?

A

There are three mandatory configuration properties:

  • bootstrap.servers
    • List of host:port pairs of brokers the producer uses for its initial connection to the cluster
    • No need to include all brokers; the producer discovers the rest of the cluster after the first connection
    • Recommended to include at least two, in case one broker goes down
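
A minimal sketch of setting bootstrap.servers when building a producer (the host names and the use of String serializers are illustrative assumptions):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BootstrapServersExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Two host:port pairs so the producer can still connect if one broker is down;
            // the rest of the cluster is discovered after the first connection.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
            // key.serializer and value.serializer are the other two mandatory properties (next cards).
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // the producer is connected to the cluster and ready to send
            }
        }
    }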
2
Q

What is the Kafka producer configuration key.serializer?

A
  • Name of a class that will be used to serialize the keys of the records
  • Brokers expect byte arrays as keys and values of Kafka messages
  • Setting key.serializer is required even if you intend to send only values

NB: To implement your own serializer, implement the org.apache.kafka.common.serialization.Serializer interface. The producer will use this class to serialize the key object to a byte array.
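
A minimal sketch of a custom key serializer (the class name and UTF-8 encoding are illustrative; with recent Kafka clients the interface's configure() and close() methods have default implementations, so only serialize() needs to be overridden):

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.common.serialization.Serializer;

    // Hypothetical example: turns String keys into UTF-8 byte arrays, since brokers
    // expect byte arrays as the keys and values of Kafka messages.
    public class Utf8KeySerializer implements Serializer<String> {

        @Override
        public byte[] serialize(String topic, String data) {
            // A null key is passed through as null.
            return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
        }
    }

The producer is then pointed at it by setting key.serializer to the fully qualified class name.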

3
Q

What is the Kafka producer configuration value.serializer?

A
  • Name of a class that will be used to serialize the values of the records
  • Set value.serializer to a class that serializes the message value object, just as key.serializer names the class that serializes the message key object.
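
A minimal sketch of configuring both serializers, using built-in serializer classes (the broker address is a placeholder):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.IntegerSerializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SerializerConfigExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder host
            // key.serializer and value.serializer can name different classes:
            // String keys and Integer values here, both using built-in serializers.
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class.getName());

            try (KafkaProducer<String, Integer> producer = new KafkaProducer<>(props)) {
                // ready to send ProducerRecord<String, Integer> instances
            }
        }
    }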
4
Q

What are the optional Kafka producer configurations?

A

These parameters affect the memory use, performance, and reliability of the producer (a combined configuration sketch follows the list).

  • acks
    • Controls how many replicas must acknowledge a message before the producer considers it successfully written
    • acks=0: high throughput / less reliable (the producer does not wait for a reply from the broker and assumes the message was sent successfully)
    • acks=1: medium throughput / medium reliability (the producer receives a success response from the broker once the leader replica has received the message; if the message can't be written to the leader, the producer receives an error response and can retry sending the message, avoiding potential loss of data)
    • acks=all: safest mode / high latency (the producer receives a success response from the broker once all in-sync replicas have received the message; the message will survive even in the case of a crash)
  • buffer.memory
    • Sets the amount of memory the producer will use to buffer messages waiting to be sent to brokers.
    • If messages are sent faster than they can be delivered to the brokers, the producer may run out of buffer space.
  • compression.type
    • To compress the data before sending it to the brokers.
    • compression types: Snappy, Gzip or lz4
      • Snappy:
        • Decent compression ratios
        • Low CPU overhead
        • Good performance
        • Use: when both performance and bandwidth are a concern.
      • Gzip:
        • Uses more CPU and time
        • Better compression ratios
        • Use: when network bandwidth is more restricted
      • lz4
        • Lossless data compression algorithm
        • Focused on compression and decompression speed.
  • batch.size
    • Controls the amount of memory in bytes (not messages!) that will be used for each batch.
    • When the batch is full, all the messages in the batch will be sent [Producer will not wait for batches to be filled]
    • Batch size too small - More overhead
    • Batch size too big - More memory consumption but no impact on latency.
  • linger.ms
    • Controls the amount of time to wait for additional messages before sending the current batch.
    • Kafka producer sends a batch of messages either when the current batch is full or when the linger.ms limit is reached.
    • Default value: 0
  • retries
    • Controls how many times the producer will retry sending the message before giving up [Default value: 3]
    • By default, the producer will wait 100ms between retries[can be modified using retry.backoff.ms]
  • client.id
    • Used by the brokers to identify messages sent from the client
    • Used in logging, metrics and for quotas
  • max.in.flight.requests.per.connection
    • Controls how many messages the producer will send to the server without receiving responses.
    • High Value:
      • Increases memory usage
      • Improves throughput [setting it too high can reduce throughput as batching becomes less efficient]
    • Setting this to 1 will guarantee that messages will be written to the broker in the order in which they were sent, even when retries occur.
  • timeout.ms
    • Controls the time the broker will wait for in-sync replicas to acknowledge the message in order to meet the acks configuration
  • request.timeout.ms
    • Controls how long the producer will wait for a reply from the server when sending data
  • metadata.fetch.timeout.ms
    • Controls how long the producer will wait for metadata from the server (e.g., the current leaders of the partitions it is writing to) when it needs it in order to send data
  • max.block.ms
    • Controls how long the producer will block when calling send() and when explicitly requesting the metadata via partitionsFor()
    • When max.block.ms is reached, a timeout exception is thrown.
  • max.request.size
    • Controls the size of a produce request sent by the producer
    • Caps the size of the largest message that can be sent
    • Caps the number of messages that the producer can send in one request
  • receive.buffer.bytes and send.buffer.bytes
    • Sizes of the TCP send and receive buffers used by the sockets when writing and reading data.
    • If these are set to -1, the OS defaults will be used.
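
A minimal sketch that pulls several of these optional settings together; the broker addresses and the specific values are illustrative, not recommendations:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TunedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092"); // placeholder hosts
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            props.put(ProducerConfig.ACKS_CONFIG, "all");               // safest mode: wait for all in-sync replicas
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); // low CPU overhead, decent ratios
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);      // batch size in bytes, not messages
            props.put(ProducerConfig.LINGER_MS_CONFIG, 5);               // wait up to 5 ms for more messages per batch
            props.put(ProducerConfig.RETRIES_CONFIG, 3);                 // retry retriable errors
            props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100);      // wait between retries
            props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1); // preserve ordering across retries

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // use the producer...
            }
        }
    }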
5
Q

How does a Kafka producer send messages?

A

Once the producer has been created, messages can be sent in one of three ways (a sketch of all three follows the list):

  • Fire-and-forget
    • Send the message without worrying about whether it was delivered successfully
    • If configured, the producer will retry sending messages automatically
    • As Kafka is highly available, most messages will arrive successfully, but some may get lost using this method.
    • Use case:
      • Click impression data
      • Monitoring events data
      • Not fit for production applications
  • Synchronous send
    • Send message and wait for acknowledgement
    • Use case:
      • Dealing with critical data
      • Customer orders
      • Banking Transactions
  • Asynchronous send
    • Send the message without waiting for a reply; a callback method is invoked when the broker responds
    • Use Case:
      • Used when dealing with critical data
      • Customer Orders
      • Banking transactions
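
A minimal sketch of the three send styles (broker address, topic, key, and value are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SendStylesExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder host
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("orders", "order-42", "payload"); // placeholder topic/key/value

                // Fire-and-forget: ignore the returned Future; some messages may be lost silently.
                producer.send(record);

                // Synchronous send: block on the Future; throws if the broker reports a failure.
                RecordMetadata metadata = producer.send(record).get();
                System.out.println("Written to partition " + metadata.partition() + " offset " + metadata.offset());

                // Asynchronous send: register a callback that is invoked when the broker responds.
                producer.send(record, (RecordMetadata md, Exception e) -> {
                    if (e != null) {
                        e.printStackTrace(); // handle the failure (log, retry, dead-letter, ...)
                    } else {
                        System.out.println("Acked at offset " + md.offset());
                    }
                });
            }
        }
    }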
6
Q

What are the common Kafka producer errors?

A

Retriable Error:

  • Can be resolved by retrying (i.e. by sending the message again)
  • Example: connection error
  • The Kafka producer can be configured to retry these errors automatically
  • In this case, the application will get an exception only when the number of retries is exhausted

Non-Retriable Error:

  • Cannot be resolved by retrying
  • Example: “message size too large” error
  • The Kafka producer will not attempt to retry and will return the exception immediately
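
A minimal sketch of telling the two classes of error apart in a send callback (broker address, topic, and record contents are placeholders; the instanceof check against RetriableException is one possible way to distinguish them):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.RetriableException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ErrorHandlingExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder host
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.RETRIES_CONFIG, 3); // producer retries retriable errors before surfacing them

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record = new ProducerRecord<>("orders", "k", "v"); // placeholder

                producer.send(record, (metadata, exception) -> {
                    if (exception == null) {
                        return; // success
                    }
                    if (exception instanceof RetriableException) {
                        // Retriable (e.g. connection errors): the producer has already retried
                        // automatically; the application may choose to retry again.
                        System.err.println("Retriable error after exhausting retries: " + exception);
                    } else {
                        // Non-retriable (e.g. RecordTooLargeException): retrying will not help.
                        System.err.println("Non-retriable error: " + exception);
                    }
                });
            }
        }
    }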
7
Q

How are messages assigned to Kafka partitions?

A
  • If the key is null and the default partitioner is used, the record is sent to one of the available partitions of the topic at random
  • A round-robin algorithm is used to balance the messages among the partitions.
  • When a key is present, Kafka by default uses the key’s hash to map the record to a partition:
    • targetPartition = Utils.abs(Utils.murmur2(record.key())) % numPartitions
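
A minimal sketch of how keyed vs. null-key records are partitioned (broker address, topic name, keys, and values are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PartitioningExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder host
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Keyed record: the default partitioner hashes the key, so every record
                // with the key "customer-17" lands in the same partition.
                RecordMetadata keyed = producer
                        .send(new ProducerRecord<>("orders", "customer-17", "order #1"))
                        .get();
                System.out.println("Keyed record -> partition " + keyed.partition());

                // Null-key record: the default partitioner spreads these across the
                // available partitions instead of pinning them to one.
                RecordMetadata unkeyed = producer
                        .send(new ProducerRecord<>("orders", null, "order #2"))
                        .get();
                System.out.println("Unkeyed record -> partition " + unkeyed.partition());
            }
        }
    }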