Kafka Series: How Kafka Works

How Kafka works

Common concepts

  • Broker: can be understood as one node, i.e. one running Kafka service instance.
  • Topic: can be understood as a table; one topic can have multiple partitions.
  • Partition: solves the problem of one topic's messages being too large. For example, with 1 TB of messages, partitioning lets that 1 TB be split into several parts, e.g. two 512 GB parts stored in two partitions.
  • Replicas: multiple backups created for a partition of a topic and placed on multiple brokers in the Kafka cluster. One replica acts as the leader and the others as followers (see the sketch after this list).
  • Leader: handles the reads and writes for its partition and is responsible for synchronizing data to the followers. When the leader dies, a master-slave election picks a new leader from the followers in the ISR.
  • ISR (in-sync replicas): the set holding the replicas that are able to synchronize and those already synchronized with the leader. Note: if a replica in the ISR performs poorly, it is kicked out of the ISR set.
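A minimal sketch of these concepts with Kafka's Java AdminClient; the broker address, topic name, partition count, and replication factor below are hypothetical values for illustration:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // A topic with 2 partitions, each replicated 3 times:
            // per partition, one replica is the leader and two are followers
            NewTopic topic = new NewTopic("my-topic", 2, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}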

Consumption principles

  • A partition can only be consumed by one consumer within a consumer group, which guarantees ordering within that partition. Total ordering across multiple partitions consumed by multiple consumers is not guaranteed; to achieve total ordering, the topic must have a single partition (consumed by a single consumer in the group).
  • The number of partitions determines the useful number of consumers in a consumer group. The number of consumers in one group should not exceed the number of partitions; otherwise the surplus consumers sit idle with no messages to consume.
  • If a consumer dies, the rebalance mechanism is triggered and its partitions are reassigned to other consumers in the group (a minimal consumer sketch follows this list).
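A minimal consumer-group sketch, assuming a hypothetical topic my-topic and group my-group; starting a second copy of this program with the same group id makes Kafka split the partitions between the two consumers:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // All consumers sharing this group id divide the topic's partitions among themselves
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));
            while (true) {
                // Each poll only returns records from the partitions assigned to this consumer
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}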

Producer acks parameter configuration


When sending messages synchronously, the producer has 3 different options for the acks it waits for from the broker:

  • 1. acks=0: the producer does not wait for any broker confirmation before treating the send as complete, and can immediately send the next message. Highest performance, but messages are lost most easily.
  • 2. acks=1: wait for the leader to successfully write the data to its local log, but do not wait for the followers to write successfully, then continue with the next message. In this case, if the followers have not yet backed up the data and the leader dies at that moment, the message is lost. This balances performance and safety and is the recommended setting.
  • 3. acks=-1 or all: wait until the number of replicas configured by min.insync.replicas (default 1; a value greater than or equal to 2 is recommended) have successfully written the message to their logs. This strategy guarantees no data loss as long as one in-sync backup survives and is the strongest data guarantee. It is generally used only for financial-grade scenarios, i.e. where money is involved.
    If a send fails, the producer retries (retries, RETRIES_CONFIG) 3 times by default, waiting retry.backoff.ms (RETRY_BACKOFF_MS_CONFIG, default 100 ms) between attempts (see the sketch below).
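A sketch of these settings with the Java client's ProducerConfig constants; the broker address and topic name are hypothetical, and the retry values are the defaults just described:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class AckProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // "0": fire and forget; "1": wait for the leader; "all"/"-1": wait for min.insync.replicas
        props.put(ProducerConfig.ACKS_CONFIG, "1");
        // Retry failed sends 3 times, pausing 100 ms between attempts
        props.put(ProducerConfig.RETRIES_CONFIG, "3");
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "100");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Synchronous send: get() blocks until the configured acks are received
            RecordMetadata meta = producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
            System.out.printf("written to partition %d at offset %d%n", meta.partition(), meta.offset());
        }
    }
}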

Producer buffer mechanism


A message to be sent first enters the local buffer, buffer.memory (BUFFER_MEMORY_CONFIG, default 32 MB). Kafka runs a sender thread that takes batch.size (BATCH_SIZE_CONFIG, default 16 KB, i.e. 16384 bytes) of data from the buffer at a time and sends it to the broker. If a batch has not filled to 16 KB within linger.ms milliseconds (LINGER_MS_CONFIG; e.g. 10 ms), it is sent anyway.
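The corresponding batching parameters, sketched as additions to the props of the producer example in the previous section (the 10 ms linger value is the example figure used above, not a default):

// Total memory for buffering records that have not yet been sent (default 32 MB)
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");
// How many bytes to batch per partition before sending (default 16 KB = 16384)
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
// Send a not-yet-full batch anyway after this many milliseconds
props.put(ProducerConfig.LINGER_MS_CONFIG, "10");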

Manual and automatic offset commits by consumers

  1. What is committed
    Whether a consumer commits automatically or manually, it commits the consumer group, the topic being consumed, the partition being consumed, and the consumed offset to the cluster's __consumer_offsets topic.
  2. Automatic commit
// Automatic-commit settings (the default)
// Whether to auto-commit offsets; the default is true
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
// Interval between automatic offset commits, in milliseconds
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");

By default, after a consumer polls messages, it automatically commits the offset for the current topic-partition to the broker's __consumer_offsets topic.
Automatic commit can lose messages: if the consumer auto-commits the offset before it has finished processing the polled messages and then crashes, the next consumer will start consuming from the position after the committed offset, and the messages that were never processed are lost.
  3. Manual commit

// Manual-commit setting
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
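With auto-commit disabled, offsets are committed explicitly after processing; a minimal sketch reusing the consumer from the group example above (process is a hypothetical business-logic helper):

consumer.subscribe(List.of("my-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical business logic
    }
    if (!records.isEmpty()) {
        // Commit only after all polled records are processed: a crash before
        // this line replays the batch (at-least-once) instead of losing it
        consumer.commitSync();
    }
}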


Origin blog.csdn.net/weixin_44988127/article/details/131715232