Kafka Knowledge System - Kafka Design and Principle Analysis - Message Passing Semantics

Message Passing Semantics

Message Delivery Guarantee

This section discusses how Kafka guarantees message delivery between producers and consumers. There are three possible delivery guarantees:

  • At most once: messages may be lost, but are never redelivered
  • At least once: messages are never lost, but may be delivered more than once
  • Exactly once: each message is delivered once and only once

Kafka's delivery guarantee mechanism is fairly intuitive. When a producer sends a message to the broker, once the message is committed it will not be lost, thanks to replication. However, if the producer runs into a network problem after sending the data and the connection is interrupted, it cannot tell whether the message was committed. Kafka cannot know what happened during the network failure either, but the producer can retry multiple times to make sure the message reaches the broker, so what Kafka currently implements is at least once.
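For illustration, here is a minimal sketch of a producer configured for this at-least-once behavior, using the Java producer client; the broker address and topic name are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AtLeastOnceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");   // wait until the message is committed to all in-sync replicas
        props.put("retries", "3");  // retry on transient failures; retries may duplicate => at least once

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```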

After a consumer reads a message from the broker, it can choose to commit; the commit saves, in ZooKeeper, the offset that the consumer has read within the partition, and the next read of that partition starts from the following message. If the consumer does not commit, the next read starts from the same position as after the previous commit. The consumer can also be set to autocommit, which commits automatically as soon as the data is read. Looking at this read path alone, Kafka provides exactly once; but if a message was duplicated between the producer and the broker for some reason, the end-to-end guarantee is at least once.
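The paragraph above describes the older ZooKeeper-based offset storage; the newer Java consumer stores offsets in Kafka itself, but the autocommit choice is the same. A minimal auto-commit sketch, with broker address, group, and topic as placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AutoCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("group.id", "demo-group");               // placeholder consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "true");           // commit offsets automatically
        props.put("auto.commit.interval.ms", "5000");      // auto-commit roughly every 5 seconds

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```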

Consider the case where the consumer commits first and then processes the message. In this mode, if the consumer crashes after the commit but before it has processed the message, it will not be able to re-read that committed-but-unprocessed message after restarting, which corresponds to at most once.

If the consumer instead processes the message first and then commits, and it crashes after processing but before committing, the uncommitted message will be processed again after the next restart even though it has already been handled, which corresponds to at least once.
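The two orderings described above can be sketched as follows, assuming manual commits (enable.auto.commit=false) and a placeholder process() method standing in for business logic:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

public class CommitOrdering {

    // At most once: commit the offset first, then process.
    // A crash after the commit but before processing loses the message.
    static void commitThenProcess(Consumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        consumer.commitSync();                        // offset saved before processing
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
    }

    // At least once: process first, then commit the offset.
    // A crash before the commit causes the message to be processed again after restart.
    static void processThenCommit(Consumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
        consumer.commitSync();                        // offset saved only after processing
    }

    static void process(ConsumerRecord<String, String> record) {
        // business logic goes here (placeholder)
    }
}
```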

To achieve exactly once, a message deduplication mechanism needs to be introduced.

Message Deduplication

As mentioned in the previous section, messages can be duplicated on both the producer side and the consumer side in Kafka, which calls for deduplication.

The Kafka documentation mentions the concept of a GUID (Globally Unique Identifier): a unique ID is generated for each message by a client-side algorithm, and it can be mapped to the address where the message is stored on the broker, so the message content can be queried and retrieved through the GUID; this also makes it easy to guarantee the idempotency of the sender. However, it requires a deduplication module on the broker, which the current version does not provide.

For GUIDs, if deduplication is performed on the client side instead, a centralized cache has to be introduced, which inevitably adds dependency complexity, and the required cache size is hard to bound.

Not only Kafka: commercial-grade middleware such as RabbitMQ and RocketMQ also only guarantees at least once and cannot deduplicate messages by itself. We therefore recommend that the business side deduplicate according to its own characteristics, for example by making the business message itself idempotent, or by deduplicating with the help of another product such as Redis.
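As one possible shape of Redis-based deduplication, here is a sketch assuming the Jedis client and a business-provided unique message ID; the Redis address, key prefix, and TTL are arbitrary choices for illustration:

```java
import redis.clients.jedis.Jedis;

public class RedisDeduplicator {
    private final Jedis jedis = new Jedis("localhost", 6379);   // assumed Redis address
    private static final int TTL_SECONDS = 24 * 3600;           // keep dedup keys for one day

    // Returns true if the message with this id has not been seen before.
    // SETNX is atomic, so concurrent consumers cannot both claim the same id.
    public boolean firstTimeSeen(String messageId) {
        String key = "dedup:" + messageId;
        long created = jedis.setnx(key, "1");
        if (created == 1) {
            jedis.expire(key, TTL_SECONDS);  // TTL set separately in this simple sketch
            return true;
        }
        return false;
    }
}
```

A consumer would call firstTimeSeen(id) before handling each message and skip processing when it returns false.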

High Availability Configuration

Kafka offers a lot of flexibility in data redundancy. For scenarios that require high data reliability, we can increase the number of replicas (replication.factor), raise the minimum number of in-sync replicas required for a write (min.insync.replicas), and so on, but this hurts performance. Conversely, relaxing these settings improves performance at the cost of reliability; users need to make trade-offs based on their own business characteristics.

To ensure that data written to Kafka is safe and highly reliable, the following configurations are required:

  • Topic configuration: replication.factor >= 3, that is, at least 3 replicas; 2 <= min.insync.replicas <= replication.factor
  • Broker configuration: disable unclean leader election: unclean.leader.election.enable=false
  • Producer configuration: request.required.acks=-1 (all), producer.type=sync (see the configuration sketch after this list)
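Putting the list together, a minimal producer-side configuration sketch follows; broker- and topic-level settings appear as comments because they are set in server.properties and at topic creation, not on the producer. Note that request.required.acks and producer.type belong to the old Scala producer; the newer Java producer uses acks=all instead.

```java
import java.util.Properties;

public class ReliableProducerConfig {
    public static Properties highReliabilityProducerProps() {
        Properties props = new Properties();
        // Old-style producer settings named in the list above:
        props.put("request.required.acks", "-1");   // wait for all in-sync replicas (acks=all on the newer client)
        props.put("producer.type", "sync");         // synchronous send
        // Broker side (server.properties): unclean.leader.election.enable=false
        // Topic side: replication.factor >= 3 and 2 <= min.insync.replicas <= replication.factor
        return props;
    }
}
```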
