Kafka time wheel algorithm

Preface

There are a lot of delayed operations in Kafka.

  1. Sending a message: the request timeout plus the delay of the retry mechanism.
  2. The delay of the acks acknowledgment mechanism.

Kafka does not schedule each delayed task directly with the Timer or DelayQueue that ships with the JDK. Instead, it implements its own timer (SystemTimer) on top of a time wheel.

Insertion and deletion in the JDK's Timer and DelayQueue take O(log n) time on average, which cannot meet Kafka's high-performance requirements. With a time wheel, the time complexity of both insertion and deletion can be reduced to O(1).

The application of time wheels is not unique to Kafka. There are many other application scenarios. Time wheels are found in components such as Netty, Akka, Quartz, Zookeeper, etc.

Java task scheduling

Assume that there are 1,000 tasks, all executed at different times, and the time is accurate to seconds. How do you schedule all tasks?

The first idea is to start a thread, traverse all tasks every second, find the execution time that matches the current time, and execute it. If the number of tasks is too large, it will be time-consuming to traverse and compare all tasks.
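
The first idea can be sketched as follows. This is only an illustration: the class NaiveScanScheduler, its Task holder, and the logical clock passed to tick() are made up for the example. It shows that every tick has to look at every pending task.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of the "scan every task each second" idea.
class NaiveScanScheduler {
    // Minimal task holder; the fields are made up for this example.
    static final class Task {
        final long dueSecond;
        final String name;

        Task(long dueSecond, String name) {
            this.dueSecond = dueSecond;
            this.name = name;
        }
    }

    private final List<Task> tasks = new ArrayList<>();

    void add(long dueSecond, String name) {
        tasks.add(new Task(dueSecond, name));
    }

    // Called once per second. Every pending task is compared with the
    // current time, so a single tick costs O(n) even if nothing is due.
    List<String> tick(long nowSecond) {
        List<String> fired = new ArrayList<>();
        Iterator<Task> it = tasks.iterator();
        while (it.hasNext()) {
            Task t = it.next();
            if (t.dueSecond == nowSecond) {
                fired.add(t.name);
                it.remove();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        NaiveScanScheduler scheduler = new NaiveScanScheduler();
        scheduler.add(3, "a");
        scheduler.add(5, "b");
        for (long second = 1; second <= 5; second++) {
            System.out.println(second + " -> " + scheduler.tick(second));
        }
    }
}
```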

The second idea is to sort the tasks so that those with the nearest execution time (triggered first) come first. This involves a lot of element movement: whenever a task is added, or a task is executed and deleted, the list has to be reordered.

Timer

The JDK ships with a Timer utility class (in the java.util package), which can run delayed tasks (for example, triggering after 30 minutes) or periodic tasks (for example, triggering every hour).
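
A minimal usage sketch of java.util.Timer, with millisecond delays instead of minutes and hours so that it finishes quickly. The class name JdkTimerDemo and the latches used to wait for the tasks are assumptions made for the example.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class JdkTimerDemo {
    // Schedules one delayed task and one periodic task, then waits for them.
    // Returns true if the delayed task ran once and the periodic task ran 3 times.
    static boolean runDemo() {
        Timer timer = new Timer("demo-timer", true); // daemon TimerThread
        CountDownLatch oneShot = new CountDownLatch(1);
        CountDownLatch periodic = new CountDownLatch(3);
        try {
            // Delayed task: runs once, 50 ms from now.
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    oneShot.countDown();
                }
            }, 50);

            // Periodic task: first run after 10 ms, then every 20 ms.
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    periodic.countDown();
                }
            }, 10, 20);

            return oneShot.await(2, TimeUnit.SECONDS)
                    && periodic.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel(); // stops the timer thread and discards pending tasks
        }
    }

    public static void main(String[] args) {
        System.out.println("both tasks ran: " + runDemo());
    }
}
```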

Its essence is a priority queue (TaskQueue) and a thread (TimerThread) that executes tasks.

An ordinary queue is a first-in-first-out data structure: elements are added at the tail and removed from the head. In a priority queue, each element has a priority, and the element with the highest priority is removed first. A priority queue therefore behaves as "first in, largest out". It is usually implemented with a heap.

image.png

image.png

In this priority queue, the task that must run first sits at the head. TimerThread repeatedly compares the execution time of the head task with the current time. When the time is up, it first checks whether the task is periodic. If so, it changes the task's time to the next execution time and reorders the queue; if not, it removes the task from the priority queue. Finally, it runs the task.
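
The loop described above can be sketched with java.util.PriorityQueue. This is a simplified model rather than the JDK's TimerThread source: the class TimerLoopSketch is hypothetical, time is a logical tick counter instead of the system clock, and "running" a task just records its name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified, single-threaded model of a timer loop over a priority queue.
class TimerLoopSketch {
    static final class Task {
        final String name;
        final long period; // 0 means one-shot
        long nextRun;      // next execution time, in logical ticks

        Task(String name, long nextRun, long period) {
            this.name = name;
            this.nextRun = nextRun;
            this.period = period;
        }
    }

    // The task with the earliest execution time is always at the head.
    private final PriorityQueue<Task> queue =
            new PriorityQueue<>(Comparator.comparingLong((Task t) -> t.nextRun));

    void schedule(String name, long firstRun, long period) {
        queue.add(new Task(name, firstRun, period));
    }

    // Runs every task that is due at or before `now`, in execution-time order.
    List<String> runDue(long now) {
        List<String> ran = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().nextRun <= now) {
            Task head = queue.poll();
            if (head.period > 0) {
                // Periodic: move to the next execution time and re-queue.
                head.nextRun += head.period;
                queue.add(head);
            }
            // One-shot tasks simply stay removed.
            ran.add(head.name);
        }
        return ran;
    }
}
```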

However, Timer is single-threaded and cannot meet business needs in many scenarios.

JDK 1.5 introduced ScheduledThreadPoolExecutor, a task scheduling tool that supports multiple threads, as a replacement for Timer. It is one of the commonly used thread pools. Its work queue, DelayedWorkQueue, is a delay queue backed by a priority queue.
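
A minimal usage sketch of ScheduledThreadPoolExecutor, again with short millisecond delays. The class name ScheduledPoolDemo, the pool size of 2, and the latch are assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ScheduledPoolDemo {
    // Returns true if the delayed job returned its value and the
    // periodic job ran at least 3 times.
    static boolean runDemo() {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(2);
        CountDownLatch periodic = new CountDownLatch(3);
        try {
            // Delayed task: runs once after 50 ms and returns a value.
            Callable<String> job = () -> "done";
            ScheduledFuture<String> delayed = pool.schedule(job, 50, TimeUnit.MILLISECONDS);

            // Periodic task: first run after 10 ms, then every 20 ms.
            pool.scheduleAtFixedRate(periodic::countDown, 10, 20, TimeUnit.MILLISECONDS);

            return "done".equals(delayed.get(2, TimeUnit.SECONDS))
                    && periodic.await(2, TimeUnit.SECONDS);
        } catch (Exception e) {
            return false;
        } finally {
            pool.shutdownNow(); // also cancels the periodic task
        }
    }

    public static void main(String[] args) {
        System.out.println("both tasks ran: " + runDemo());
    }
}
```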

image.png

Minimum heap implementation of DelayedWorkQueue

The priority queue uses a minimum heap implementation.

A min-heap is a complete binary tree in which the value of every parent node is less than or equal to the values of its left and right child nodes.

For example, insert the following data [1,2,3,7,17,19,25,36,100]

The minimum heap looks like this.

image.png

Insertion and deletion in the priority queue take O(log n) time. With a large amount of data, frequently adding elements to the heap and removing them from it performs poorly.

For example, to insert 0, the process is as follows:

  1. Append 0 as the last element.
    image.png

  2. 0 is smaller than 19, so move it up and swap.
    image.png

  3. 0 is smaller than 2, so move it up and swap.
    image.png

  4. 0 is smaller than 1 (the root), so move it up and swap. 0 is now the root.
    image.png
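
The insertion steps can be reproduced with an array-backed heap. This is a sketch under one assumption: the tree is stored level by level as [1, 2, 3, 17, 19, 36, 7, 25, 100], a layout chosen so that the new element's first parent is 19, as in step 2. The class MinHeapInsertDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MinHeapInsertDemo {
    // Appends `value` as the last element, then swaps it with its parent
    // while it is smaller than the parent. Returns the number of swaps.
    static int insert(List<Integer> heap, int value) {
        heap.add(value);
        int i = heap.size() - 1;
        int swaps = 0;
        while (i > 0) {
            int parent = (i - 1) / 2; // parent index in a level-order array
            if (heap.get(i) >= heap.get(parent)) {
                break; // heap property holds again
            }
            Collections.swap(heap, i, parent);
            i = parent;
            swaps++;
        }
        return swaps;
    }

    public static void main(String[] args) {
        // Assumed level-order layout of the example heap.
        List<Integer> heap = new ArrayList<>(Arrays.asList(1, 2, 3, 17, 19, 36, 7, 25, 100));
        int swaps = insert(heap, 0);
        System.out.println(heap + " after " + swaps + " swaps");
        // prints [0, 1, 3, 17, 2, 36, 7, 25, 100, 19] after 3 swaps
    }
}
```

With this layout, 0 rises past 19, then 2, then the root 1, which is three swaps for a heap of ten elements.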

Algorithmic complexity

A min-heap of N elements has about log2(N) levels, so in the worst case a newly inserted element has to be swapped upward about log2(N) times.

Time wheel

The time wheel starts by grouping the tasks, putting tasks with the same execution time together. For example, in the picture below, each subscript of the array represents 1 second. The result is a data structure made of an array plus linked lists. After grouping, less time is spent on traversal and comparison.

image.png

But there is still a problem. If the number of tasks is very large and their times all differ, or some tasks are due very far in the future, does the array need to be very long? For example, a task due in 2 months would, counted in seconds from now, land at subscript 5253120.

So the length cannot be unbounded; it has to be fixed. For example, with a fixed length of 8 where each grid (also called a bucket or slot) represents 1 second, one full circle represents 8 seconds. The traversing thread only needs to fetch the tasks of one grid at a time and execute them.

How can a fixed-length array represent times beyond its maximum length? Use a circular array.

For example, a circular array with a length of 8 can represent 8 seconds. Where do tasks that run 8 or more seconds from now go? Divide the time by 8 and use the remainder as the grid index. For example, 10 % 8 = 2, so a task due at the 10th second is placed in grid 2. This introduces the concept of rounds: the pointer passes grid 2 once in the first round (at second 2), and the task due at the 10th second runs only in the second round.
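
A minimal sketch of such a single-layer wheel with a round counter per task. The class SimpleTimeWheel, its field names, and the tick() driver are hypothetical; a real timer would advance the wheel from a clock thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical single-layer time wheel: a circular array of linked lists.
class SimpleTimeWheel {
    private static final class Entry {
        final String name;
        int remainingRounds; // full circles to wait before the task fires

        Entry(String name, int remainingRounds) {
            this.name = name;
            this.remainingRounds = remainingRounds;
        }
    }

    private final List<List<Entry>> grids = new ArrayList<>();
    private final int size;
    private long currentSecond = 0; // seconds elapsed since the wheel started

    SimpleTimeWheel(int size) {
        this.size = size;
        for (int i = 0; i < size; i++) {
            grids.add(new LinkedList<>());
        }
    }

    // Adds a task that fires delaySeconds from now (delaySeconds >= 1 is assumed).
    void add(String name, long delaySeconds) {
        long due = currentSecond + delaySeconds;
        int index = (int) (due % size);              // e.g. 10 % 8 = 2
        int rounds = (int) ((delaySeconds - 1) / size);
        grids.get(index).add(new Entry(name, rounds));
    }

    // Moves the pointer one grid forward and returns the tasks that fire.
    List<String> tick() {
        currentSecond++;
        List<String> fired = new ArrayList<>();
        Iterator<Entry> it = grids.get((int) (currentSecond % size)).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.remainingRounds == 0) {
                fired.add(e.name);
                it.remove();
            } else {
                e.remainingRounds--; // not this round; wait for the next pass
            }
        }
        return fired;
    }
}
```

Adding a task is O(1), but each tick still walks the whole list of its grid, including tasks that belong to later rounds.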

image.png

This is the basic concept of the time wheel.

If there are too many tasks, or many tasks map to the same grid, the linked list becomes very long. To address this, the time wheel can be extended into a multi-layer time wheel.

For example, the innermost layer has 8 grids of 1 second each, so one full circle covers 8 seconds. The outer layer also has 8 grids, each covering 8 seconds (one full circle of the inner layer), for a total of 8*8=64 seconds. Each time the innermost layer completes a full circle, the outer layer moves one grid. The time wheel now looks more like a clock. As time passes, tasks are demoted: tasks in the outer layer gradually move to the inner layer.
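
The layering and demotion can be sketched as below. This is a simplified, tick-driven model, not Kafka's code: LayeredWheel and LayeredWheelTimer are hypothetical names, time is a logical second counter, and tick() is assumed to be called once for every second without gaps. Each outer layer is created on demand, and one of its grids spans a full circle of the layer below it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; class and method names are hypothetical.
class LayeredWheel {
    static final class Task {
        final String name;
        final long expiration; // absolute time in seconds

        Task(String name, long expiration) {
            this.name = name;
            this.expiration = expiration;
        }
    }

    private final long tick;     // time span of one grid
    private final int size;      // number of grids in this layer
    private final long interval; // span of the whole layer = tick * size
    private final List<List<Task>> grids = new ArrayList<>();
    private long currentTime;    // always a multiple of tick
    private LayeredWheel outer;  // next layer, created on demand

    LayeredWheel(long tick, int size, long startTime) {
        this.tick = tick;
        this.size = size;
        this.interval = tick * size;
        this.currentTime = startTime - (startTime % tick);
        for (int i = 0; i < size; i++) {
            grids.add(new ArrayList<>());
        }
    }

    // Returns false when the task is already due and should run now.
    boolean add(Task task) {
        if (task.expiration < currentTime + tick) {
            return false;
        }
        if (task.expiration < currentTime + interval) {
            grids.get((int) ((task.expiration / tick) % size)).add(task);
            return true;
        }
        if (outer == null) {
            // One grid of the outer layer spans one full circle of this layer.
            outer = new LayeredWheel(interval, size, currentTime);
        }
        return outer.add(task);
    }

    // Moves the pointer to `time` and empties the grid it lands on, in every layer.
    void advance(long time, List<Task> flushed) {
        if (time >= currentTime + tick) {
            currentTime = time - (time % tick);
            List<Task> grid = grids.get((int) ((currentTime / tick) % size));
            flushed.addAll(grid);
            grid.clear();
            if (outer != null) {
                outer.advance(currentTime, flushed);
            }
        }
    }
}

class LayeredWheelTimer {
    private final LayeredWheel inner = new LayeredWheel(1, 8, 0);
    private long now = 0;

    // delaySeconds is assumed to be at least 1.
    void schedule(String name, long delaySeconds) {
        inner.add(new LayeredWheel.Task(name, now + delaySeconds));
    }

    // Advances one second and returns the names of the tasks that fire.
    List<String> tick() {
        now++;
        List<LayeredWheel.Task> flushed = new ArrayList<>();
        inner.advance(now, flushed);
        List<String> fired = new ArrayList<>();
        for (LayeredWheel.Task t : flushed) {
            // Re-adding either fires the task or demotes it to an inner layer.
            if (!inner.add(t)) {
                fired.add(t.name);
            }
        }
        return fired;
    }
}
```

For instance, a task due at second 10 first lands in the outer layer's grid covering seconds 8 to 15. When the pointer reaches second 8 it is re-added, falls into grid 2 of the inner layer, and fires at second 10.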

image.png

The time complexity of time wheel task insertion and deletion is O(1). It has a very wide range of applications and is more suitable for delay scenarios with a large number of tasks. It is implemented in Dubbo, Netty, and Kafka.

Time wheel implementation in Kafka

TimingWheel data structure in Kafka

image.png

Kafka starts a thread that drives the pointer of the time wheel. Rather than stepping through every slot, the thread takes the TimerTaskList of the next slot to expire out of a queue through queue.poll().
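
The polling idea can be sketched with the JDK's DelayQueue. This is an illustration under assumptions, not Kafka's source: it assumes the queue behind queue.poll() is a java.util.concurrent.DelayQueue whose elements are buckets, and Bucket is a hypothetical stand-in for TimerTaskList that only holds task names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class BucketQueueSketch {
    // Hypothetical stand-in for a time wheel slot and its task list.
    static final class Bucket implements Delayed {
        final long expirationMs; // absolute wall-clock expiry of the slot
        final List<String> tasks = new ArrayList<>();

        Bucket(long expirationMs) {
            this.expirationMs = expirationMs;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            long remaining = expirationMs - System.currentTimeMillis();
            return unit.convert(remaining, TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(expirationMs, ((Bucket) other).expirationMs);
        }
    }

    // Takes up to `count` buckets in expiry order. poll() blocks until the
    // earliest bucket expires, so empty slots cost no wake-ups at all.
    static List<String> drain(DelayQueue<Bucket> queue, int count, long timeoutMs) {
        List<String> order = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                Bucket bucket = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (bucket == null) {
                    break; // nothing expired within the timeout
                }
                order.addAll(bucket.tasks);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }
}
```

Only slots that actually contain tasks are queued, so the driving thread sleeps until the next non-empty slot is due instead of waking up for every grid.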

image.png

image.png

Add new deferred task

image.png

Add new tasks to the time wheel

image.png

The advancement of the time wheel hand

image.png

The code for creating the second layer of time wheel is as follows

image.png

Origin blog.csdn.net/qq_28314431/article/details/133075920