The difference between AR, ISR, OSR, HW, and LEO in Kafka

The relationship between AR, ISR, OSR, HW, and LEO in Kafka

Kafka introduces a multi-replica mechanism for partitions; increasing the number of replicas improves disaster recovery capability. The same message is stored in every replica of a partition, although at any given moment the replicas are not necessarily identical. The replicas follow a "one leader, many followers" model: the leader replica handles read and write requests, while the follower replicas only synchronize messages from the leader. The replicas live on different brokers. When the leader replica fails, a new leader is elected from the follower replicas to continue serving clients. Through this multi-replica mechanism Kafka achieves automatic failover, so the service remains available when a broker in the cluster fails.

AR, ISR, OSR

In Kafka, producers and consumers interact only with the leader replica, and the follower replicas are only responsible for message synchronization. In many cases, the messages in a follower replica lag behind those in the leader replica. Depending on how well they are synchronized, Kafka divides the replicas into the following sets:

  • AR (Assigned Replicas): all replicas of a partition are collectively referred to as the AR.

  • ISR (In-Sync Replicas): all replicas that keep up with the leader replica to a certain extent, including the leader replica itself, form the ISR.

  • OSR (Out-of-Sync Replicas): replicas that lag too far behind the leader replica form the OSR. The leader replica is never part of it.

AR = ISR + OSR. By default, when the leader replica fails, only replicas in the ISR set are eligible to be elected as the new leader (this behavior can be changed through the unclean.leader.election.enable setting).
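
The set relationship can be made concrete with a small model. The sketch below is illustrative only: `PartitionReplicas` and `elect_leader` are hypothetical names, not Kafka APIs, the broker ids are made up, and choosing the first surviving ISR member in assignment order is a simplification of what the Kafka controller does.

```python
from dataclasses import dataclass


@dataclass
class PartitionReplicas:
    """Toy model of one partition's replica sets (broker ids are made up)."""
    ar: list      # Assigned Replicas, in assignment order
    isr: set      # In-Sync Replicas, always includes the leader
    leader: int

    @property
    def osr(self) -> set:
        # OSR is whatever is assigned but not in sync: AR = ISR + OSR
        return set(self.ar) - self.isr

    def elect_leader(self, failed_broker: int) -> int:
        """Pick a new leader from the ISR after the given broker fails."""
        self.isr.discard(failed_broker)
        # Only replicas still in the ISR are eligible (the default behavior).
        candidates = [b for b in self.ar if b in self.isr]
        if not candidates:
            raise RuntimeError("no in-sync replica available to become leader")
        self.leader = candidates[0]
        return self.leader


p = PartitionReplicas(ar=[1, 2, 3], isr={1, 2}, leader=1)
print(p.osr)               # {3}: replica 3 is lagging
print(p.elect_leader(1))   # 2: the only remaining in-sync replica
```

Replica 3 is never considered in this model because it is in the OSR, which mirrors the default election rule described above.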

Under normal circumstances, all follower replicas should stay in sync with the leader replica to a certain extent, i.e. AR = ISR and the OSR set is empty.

ISR and OSR are not fixed:

  • The leader replica is responsible for maintaining and tracking the lag of all follower replicas in the ISR set. When a follower replica falls too far behind or fails, the leader replica removes it from the ISR set.
  • If a follower replica in the OSR set "catches up" with the leader replica, the leader replica moves it from the OSR set to the ISR set.
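
A minimal sketch of this shrink-and-expand behavior follows. It assumes a time-based lag check, in the spirit of Kafka's `replica.lag.time.max.ms` broker setting. The function `update_isr`, the data layout, and the threshold value are hypothetical, and the real broker logic is more involved.

```python
MAX_LAG_MS = 10_000  # assumed threshold, chosen only for illustration


def update_isr(leader, followers_last_caught_up_ms, now_ms, max_lag_ms=MAX_LAG_MS):
    """Split replicas into (isr, osr) by how recently each follower caught up.

    followers_last_caught_up_ms maps follower id -> timestamp (ms) at which
    that follower was last fully caught up with the leader.
    """
    isr, osr = {leader}, set()   # the leader is always in the ISR
    for follower, caught_up_ms in followers_last_caught_up_ms.items():
        if now_ms - caught_up_ms <= max_lag_ms:
            isr.add(follower)
        else:
            osr.add(follower)
    return isr, osr


# Follower 3 last caught up 25 s ago, so it drops out of the ISR.
isr, osr = update_isr(1, {2: 99_000, 3: 75_000}, now_ms=100_000)
print(isr, osr)

# Once follower 3 catches up again, it moves back into the ISR.
isr, osr = update_isr(1, {2: 109_000, 3: 108_000}, now_ms=110_000)
print(isr, osr)
```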

HW, LEO

In addition, the ISR is closely related to HW and LEO.

  • HW: HW is short for High Watermark. It identifies a specific message offset, and consumers can only fetch messages before this offset.

    As shown in the figure below, the HW is 6, so consumers can only consume messages at offsets 0 to 5.

[Figure: a log file holding messages at offsets 0 to 8, with HW = 6 and LEO = 9]

  • LEO: LEO is short for Log End Offset. It identifies the offset of the next message to be written to the current log file. In the log shown above, messages at offsets 0 to 8 have been written, so offset 9 is the LEO of the current log file.
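
The two offsets can be illustrated on a plain list. This is a sketch under simplifying assumptions: the log starts at offset 0, a list index stands in for a message offset, and the helper names `log_end_offset` and `consumable` are hypothetical rather than part of any Kafka client.

```python
def log_end_offset(log):
    """LEO: the offset at which the next message will be written."""
    return len(log)


def consumable(log, high_watermark):
    """Messages a consumer may fetch: only offsets strictly below the HW."""
    return log[:high_watermark]


log = [f"msg-{i}" for i in range(9)]   # offsets 0..8 have been written
hw = 6

print(log_end_offset(log))             # 9
print(consumable(log, hw))             # msg-0 through msg-5
```

Messages at offsets 6 to 8 exist in the log but stay invisible to consumers until the HW moves past them.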

Balancing data reliability and performance through ISR and HW

Let's walk through an example to see how the ISR set, HW, and LEO work together:

Assume that the ISR of a partition contains 3 replicas (the leader and two followers) and that both HW and LEO are 3, so only messages before offset 3 can be consumed. The producer now sends messages 3 and 4, and they are stored in the leader replica.

After the messages are written to the leader, the follower replicas pull them to synchronize.

At some point, follower1 may have synchronized both messages while follower2 has synchronized message 3 but not message 4. At this point the HW is 4 and the leader's LEO is 5, so consumers can consume messages at offsets 0 to 3.

After all replicas have synchronized successfully, both HW and LEO become 5, and consumers can also consume the message at offset 4.
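
The walk-through can be replayed in a few lines. The sketch below assumes the simplified rule that the partition's HW is the smallest LEO among the replicas in the ISR. The replica names and the helper `high_watermark` are made up for illustration and do not correspond to Kafka APIs.

```python
def high_watermark(leos, isr):
    """HW of the partition: the smallest LEO among the in-sync replicas."""
    return min(leos[r] for r in isr)


isr = ["leader", "follower1", "follower2"]
leos = {"leader": 3, "follower1": 3, "follower2": 3}
history = [high_watermark(leos, isr)]        # initial state

leos["leader"] = 5                           # producer appends messages 3 and 4
history.append(high_watermark(leos, isr))

leos["follower1"] = 5                        # follower1 fetched both messages
leos["follower2"] = 4                        # follower2 fetched only message 3
history.append(high_watermark(leos, isr))

leos["follower2"] = 5                        # follower2 catches up
history.append(high_watermark(leos, isr))

print(history)                               # [3, 3, 4, 5]
```

In this model a slow follower holds the HW back only while it remains in the ISR. Dropping follower2 from `isr` would let the HW reach 5 without waiting for it, which is the trade-off between reliability and performance discussed here.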

Summary: through the ISR mechanism, Kafka effectively balances data reliability against performance.

Reference for this article: "In-depth understanding of Kafka core design and practice principles"

Origin blog.csdn.net/qq_53578500/article/details/126333814