Quickly understand the usage and principles of the Kafka producer

Author | Chaycao

Edited by | Yang Biyu

Produced by | Chaycao (ID: chaycao)

Cover image | CSDN, from Visual China

This article looks at the usage and principles of the Kafka producer. The kafka-clients version used throughout is 2.6.0. Let's start with an example of how to send messages through the producer API.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Producer {

    public static void main(String[] args) {
        // 1. Configure the parameters
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "localhost:9092");
        properties.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // 2. Create the KafkaProducer instance (the producer) from the parameters
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Create the ProducerRecord instance (the message)
        ProducerRecord<String, String> record = new ProducerRecord<>("topic-demo", "hello kafka");
        // 4. Send the message
        producer.send(record);
        // 5. Close the producer instance
        producer.close();
    }

}

Three required parameters for configuration

First create a Properties instance and set three required parameters:

  • bootstrap.servers: a list of broker addresses;

  • key.serializer: the serializer for message keys;

  • value.serializer: the serializer for message values.

Since the broker expects byte arrays, the key and value of each message must be serialized into byte arrays. Once the parameters are set, we create a KafkaProducer instance from them (the producer used to send messages), create the ProducerRecord instance to be sent (the message), send it with KafkaProducer's send method, and finally close the producer.
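As an aside, the string keys above can be replaced with the constants defined in ProducerConfig, which guards against typos. A minimal sketch of the same configuration:

Properties properties = new Properties();
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());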

Regarding KafkaProducer, let us remember two points:

  • an instance is created from a set of configuration parameters;

  • its send method sends messages.

send method

Regarding configuration, these three required parameters are all we need for now. Let's look at the send method next. There are three ways to send messages:

1. Fire-and-forget

The producer sends the message to Kafka without caring whether it arrives; it is only responsible for handing the message off, so messages can be lost. The example above uses this mode.

2. Synchronous sending (sync)

The send method returns a Future object; calling its get method blocks until Kafka responds. As follows:

Future<RecordMetadata> recordMetadataFuture = producer.send(record);
RecordMetadata recordMetadata = recordMetadataFuture.get();

The RecordMetadata object contains metadata about the message, such as its topic, partition number, offset within the partition, and timestamp.
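Note that get() declares checked exceptions, so a complete synchronous send must handle them. A minimal sketch, reusing producer and record from the opening example (ExecutionException comes from java.util.concurrent):

try {
    RecordMetadata recordMetadata = producer.send(record).get();
    System.out.println("sent to " + recordMetadata.topic()
            + "-" + recordMetadata.partition() + ":" + recordMetadata.offset());
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();  // restore the interrupt flag
} catch (ExecutionException e) {
    e.getCause().printStackTrace();      // the underlying send failure
}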

3. Asynchronous sending (async)

When calling the send method, you can pass a callback function, which is invoked when Kafka returns a response. As follows:

producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (e != null) {
            e.printStackTrace();
        } else {
            System.out.println(recordMetadata.topic() + "-"
                               + recordMetadata.partition() + ":" + recordMetadata.offset());
        }
    }
});

onCompletion has two parameters, of types RecordMetadata and Exception. When the message is sent successfully, recordMetadata is non-null and e is null; when the send fails, the opposite is true.
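Since Callback declares a single method, under Java 8 and later the same callback can also be written as a lambda:

producer.send(record, (recordMetadata, e) -> {
    if (e != null) {
        e.printStackTrace();
    } else {
        System.out.println(recordMetadata.topic() + "-"
                           + recordMetadata.partition() + ":" + recordMetadata.offset());
    }
});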

Message object ProducerRecord

Next, let's get to know the message object, ProducerRecord, which encapsulates the message to be sent. Its definition is as follows:

public class ProducerRecord<K, V> {
    private final String topic;  // topic
    private final Integer partition;  // partition number
    private final Headers headers;  // message headers
    private final K key;  // key
    private final V value;  // value
    private final Long timestamp;  // timestamp
    // ... constructors and other members
}

The topic and value are required; the rest are optional. For example, when a partition number is given, the message goes to that exact partition; when no partition number is given but a key is, the key can be used to compute the partition number. Message headers and timestamps are out of scope here. A few of the overloaded constructors are shown below.
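For illustration (the topic name and key are placeholders):

// Topic and value only: the partitioner chooses the partition
ProducerRecord<String, String> r1 = new ProducerRecord<>("topic-demo", "hello kafka");
// Topic, key and value: the key is hashed to pick the partition
ProducerRecord<String, String> r2 = new ProducerRecord<>("topic-demo", "key-1", "hello kafka");
// Topic, explicit partition number (0), key and value: the partitioner is bypassed
ProducerRecord<String, String> r3 = new ProducerRecord<>("topic-demo", 0, "key-1", "hello kafka");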

Components used when sending messages

Having covered the producer object KafkaProducer and the message object ProducerRecord, let's look at three components involved when the producer sends a message: the producer interceptor, the serializer, and the partitioner:

1. Producer interceptor, the ProducerInterceptor interface, is mainly used for preparatory work before the message is sent, such as filtering messages or modifying their content. It can also run custom logic before the send callback fires, for example gathering statistics. A sketch of a custom interceptor follows this list.

2. Serializer, the Serializer interface, converts data into byte arrays.

3. Partitioner, the Partitioner interface, computes the partition number when none is specified, typically from the key.
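To make the interceptor concrete, here is a minimal sketch of a custom implementation (the class name and prefix are illustrative); it would be registered through the interceptor.classes producer configuration:

import java.util.Map;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class PrefixInterceptor implements ProducerInterceptor<String, String> {

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Called before serialization and partitioning; may return a modified record
        return new ProducerRecord<>(record.topic(), record.partition(), record.timestamp(),
                record.key(), "prefix-" + record.value(), record.headers());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // Called when the broker acknowledges the record or the send fails;
        // a good place for success/failure counters
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}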

Process

Let's walk through the sending process alongside the code to reinforce these ideas.

public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // Interceptors: process the message before it is sent
    ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
    return doSend(interceptedRecord, callback);
}

The above is KafkaProducer's send method: the message is first handed to the interceptors' onSend method, and then the doSend method is entered. The doSend method is fairly long, but its content is not complicated; the main steps are explained below.

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    TopicPartition tp = null;
    try {
        throwIfProducerClosed();
        // 1. Make sure the metadata of the topic the data is sent to is available
        long nowMs = time.milliseconds();
        ClusterAndWaitTime clusterAndWaitTime;
        try {
            clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs);
        } catch (KafkaException e) {
            if (metadata.isClosed())
                throw new KafkaException("Producer closed while send in progress", e);
            throw e;
        }
        nowMs += clusterAndWaitTime.waitedOnMetadataMs;
        long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
        Cluster cluster = clusterAndWaitTime.cluster;
        // 2. Serializer: serialize the key and value of the message
        byte[] serializedKey;
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                                             " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                                             " specified in key.serializer", cce);
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                                             " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                                             " specified in value.serializer", cce);
        }
        // 3. Partitioner: obtain or compute the partition number
        int partition = partition(record, serializedKey, serializedValue, cluster);
        tp = new TopicPartition(record.topic(), partition);

        setReadOnly(record.headers());
        Header[] headers = record.headers().toArray();

        int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(apiVersions.maxUsableProduceMagic(),
                                                                           compressionType, serializedKey, serializedValue, headers);
        ensureValidRecordSize(serializedSize);
        long timestamp = record.timestamp() == null ? nowMs : record.timestamp();
        if (log.isTraceEnabled()) {
            log.trace("Attempting to append record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
        }
        Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);

        if (transactionManager != null && transactionManager.isTransactional()) {
            transactionManager.failIfNotReadyForSend();
        }
        // 4. Record accumulator: buffer the message
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
                                                                         serializedValue, headers, interceptCallback, remainingWaitMs, true, nowMs);

        if (result.abortForNewBatch) {
            int prevPartition = partition;
            partitioner.onNewBatch(record.topic(), cluster, prevPartition);
            partition = partition(record, serializedKey, serializedValue, cluster);
            tp = new TopicPartition(record.topic(), partition);
            if (log.isTraceEnabled()) {
                log.trace("Retrying append due to new batch creation for topic {} partition {}. The old partition was {}", record.topic(), partition, prevPartition);
            }
            // producer callback will make sure to call both 'callback' and interceptor callback
            interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);

            result = accumulator.append(tp, timestamp, serializedKey,
                                        serializedValue, headers, interceptCallback, remainingWaitMs, false, nowMs);
        }

        if (transactionManager != null && transactionManager.isTransactional())
            transactionManager.maybeAddPartitionToTransaction(tp);

        // 5. If the batch is full, or the message did not fit in the remaining space
        // and a new batch was created, wake the sender thread to send the messages
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        return result.future;
    } catch (ApiException e) {
        log.debug("Exception occurred during message send:", e);
        if (callback != null)
            callback.onCompletion(null, e);
        this.errors.record();
        this.interceptors.onSendError(record, tp, e);
        return new FutureFailure(e);
    } catch (InterruptedException e) {
        this.errors.record();
        this.interceptors.onSendError(record, tp, e);
        throw new InterruptException(e);
    } catch (KafkaException e) {
        this.errors.record();
        this.interceptors.onSendError(record, tp, e);
        throw e;
    } catch (Exception e) {
        this.interceptors.onSendError(record, tp, e);
        throw e;
    }
}

doSend method

The doSend method has five main steps:

  • Before sending, confirm that the metadata of the target topic is available (the partition leader is available; if authorization is enabled, the client must also hold the corresponding permissions);

  • Serializer: serialize the key and value of the message;

  • Partitioner: obtain or compute the partition number (see the sketch after this list);

  • Record accumulator: buffer the message;

  • In the record accumulator, messages are placed into batches so they can be sent in bulk. When a batch is full, or a message exceeds the batch's remaining space and a new batch has to be created, the Sender thread is woken up to send the messages.
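On the partitioner step: with the built-in DefaultPartitioner in 2.6.0, a record that carries a key is assigned a partition roughly as follows (a simplified sketch reusing doSend's variables, not the actual source); records without a key instead use a sticky strategy that fills a batch for one partition before switching to another.

// Simplified sketch of keyed partition selection (DefaultPartitioner style):
// murmur2-hash the serialized key and map it onto the topic's partition count.
int numPartitions = cluster.partitionsForTopic(record.topic()).size();
int partition = Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions;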

This article will not go into more detail about metadata, and the serializer and partitioner were introduced in a previous article. Below we focus on the record accumulator.

Message accumulator

The record accumulator buffers messages so that they can be sent in batches. Inside RecordAccumulator, messages are kept in a map variable named batches, of type ConcurrentMap<TopicPartition, Deque<ProducerBatch>>. The key, TopicPartition, encapsulates the topic and partition number; the value is a double-ended queue of ProducerBatch, so messages bound for the same partition are cached together in ProducerBatch objects. When a message is sent, the record is appended at the tail of the queue, that is, added to the last ProducerBatch; if that ProducerBatch has insufficient space, or the queue is empty, a new ProducerBatch is created and appended. When a ProducerBatch fills up or a new one is created, the Sender thread is woken up to take ProducerBatches from the head of the queue and send them. A toy sketch of this structure follows the figure below.

(Figure: the structure of RecordAccumulator)
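To make the structure concrete, here is a toy analogue of the accumulator (illustrative only, not Kafka's implementation; a real ProducerBatch holds the serialized records themselves, not just a byte count):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ToyAccumulator {
    static final int BATCH_SIZE = 16 * 1024; // bytes per batch, like batch.size

    static class Batch {
        int usedBytes;
        boolean hasRoomFor(int size) { return usedBytes + size <= BATCH_SIZE; }
        void append(int size) { usedBytes += size; }
    }

    // One deque of batches per partition; append at the tail, send from the head
    final ConcurrentMap<String, Deque<Batch>> batches = new ConcurrentHashMap<>();

    // Returns true when a new batch was created, which is when the real
    // producer would wake the Sender thread
    boolean append(String topicPartition, int recordSize) {
        Deque<Batch> dq = batches.computeIfAbsent(topicPartition, k -> new ArrayDeque<>());
        synchronized (dq) {
            Batch last = dq.peekLast();
            if (last != null && last.hasRoomFor(recordSize)) {
                last.append(recordSize);
                return false;
            }
            Batch created = new Batch();
            created.append(recordSize);
            dq.addLast(created);
            return true;
        }
    }
}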

In the Sender thread, the ProducerBatches to be sent are regrouped into the form <Integer, List<ProducerBatch>>, keyed by the ID of the Kafka node, so that the ProducerBatches destined for the same node can be sent in a single request.
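A sketch of that regrouping (illustrative; readyBatches and leaderFor are hypothetical stand-ins for what the real Sender derives from cluster metadata):

// Batches keyed by partition become lists keyed by the leader broker's node ID,
// so one request carries all batches bound for that node.
Map<Integer, List<ProducerBatch>> batchesByNode = new HashMap<>();
for (Map.Entry<TopicPartition, ProducerBatch> e : readyBatches.entrySet()) {
    int leaderId = leaderFor(e.getKey()); // hypothetical leader lookup
    batchesByNode.computeIfAbsent(leaderId, k -> new ArrayList<>()).add(e.getValue());
}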

That wraps up this first look at the Kafka producer. Finally, a brief recap of the article in a mind map:

(Figure: mind map recapping this article)

References

  • In-Depth Understanding of Kafka: Core Design and Practical Principles

  • Kafka: The Definitive Guide

  • Kafka source code analysis: the producer send model (1): http://matt33.com/2017/06/25/kafka-producer-send-module/

