An analysis of the principles of the Linux TC (Traffic Control) framework (reposted)

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

I have been using TC for a while for flow control, packet loss, and similar functions, and it works really well.

However, its cumbersome command line and underlying principles are still hard to get comfortable with. It is powerful, though, and this article describes it well. I hope it helps you.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------


My recent work has been more or less related to Linux flow control. I learned of TC a few years ago and understood roughly how it works, but I have not touched it since, because I do not like the tc command: it is too cumbersome. The iptables command line is cumbersome as well, but it is more intuitive than the tc command line, which is too technical. Perhaps that is because I do not understand the TC framework as deeply as I understand the Netfilter framework. In any case, iptables is to Netfilter what tc is to TC.
       The Linux kernel has a built-in Traffic Control framework that can implement rate limiting, traffic shaping, and policy application (dropping, NAT, etc.). Does this framework remind you of anything else? Perhaps not yet, so let me say it briefly up front: the Netfilter framework resembles the TC framework, yet the two are quite different.
       Once you are proficient with the Netfilter framework, the TC framework is much easier to grasp. In particular, if you have run into Netfilter's limitations, approach the design of the TC framework with those questions in mind, and you may find that TC makes up for Netfilter's shortcomings in some respects. Before going into the details, let me introduce the similarities between the two and the large design differences that follow from their different original goals.
       Let us talk about Netfilter first. This framework is designed to filter packets on the kernel path of the network protocol stack. Like checkpoints on a road, Netfilter places five HOOK points on the path along which the protocol stack processes packets. A packet passes through these checkpoints as it is processed, and each check yields one of several results: accept, drop, queue, divert to another path, and so on. The framework only needs to obtain one result per packet. Which services are provided internally is not something the Netfilter framework prescribes.
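
The "one result per packet" model can be sketched in a few lines. This is only an illustration in Python, not kernel code: the names Verdict and run_hook, the rule format, and the default policy are all assumptions made for the example.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    DROP = "drop"
    QUEUE = "queue"


def run_hook(rules, packet, policy=Verdict.ACCEPT):
    """Check a packet at one hook point and return a single verdict.

    rules is a list of (match, verdict) pairs; the first match decides.
    The caller only sees the verdict, not how it was reached.
    """
    for match, verdict in rules:
        if match(packet):
            return verdict
    return policy  # assumed default when nothing matches


# Hypothetical rule set for one hook point.
rules = [
    (lambda p: p["dport"] == 23, Verdict.DROP),
    (lambda p: p["proto"] == "udp", Verdict.QUEUE),
]

telnet = {"proto": "tcp", "dport": 23}
web = {"proto": "tcp", "dport": 80}
print(run_hook(rules, telnet).name)  # DROP
print(run_hook(rules, web).name)     # ACCEPT
```
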
       Now look at TC. It aims to provide a service for packets or flows, such as rate limiting or shaping. This is not something a single Netfilter-style result can express. Providing these services requires a series of actions, so how to "plan and organize the execution of these actions" is the key to the design of the TC framework. In other words, the TC framework focuses on how to execute, rather than merely on which action should be executed. Put differently, Netfilter is about what to do, while TC is about how to do it. (I have written a lot of code and articles about Netfilter, so I won't go into details here.)
       There are many theories about rate limiting and traffic shaping. The most common approach is the token bucket, but this article is about the implementation of the Linux TC framework, not about the token bucket algorithm. Nor is it possible, in a short article, to trace the history from flow control theory to the implementations in the various operating systems. What we do know is that most implementations use queues in practice, so the question is how the Linux TC framework organizes its queues. Before discussing queue organization in detail, I will compare Netfilter and TC one last time.
       If you know the difference between UNIX character devices and block devices, the difference between the Netfilter framework and the TC framework is easier to understand. A Netfilter HOOK point resembles a pipe-like character device, and the skbs are the unidirectional character stream in that device: they generally flow in at one end and flow out at the other in the order they entered, each carrying a result such as ACCEPT or DROP. The TC framework resembles a block device, which stores and accesses its content randomly. That is, the order in which skbs enter is not necessarily the order in which they leave, and this is exactly what traffic shaping requires. In other words, the TC framework must implement a randomly accessible packet buffer and perform flow control within that buffer. As we already know, this is implemented with queues.
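
The contrast between the two buffers can be shown with a toy example. This is a sketch under my own assumptions: packets are (kind, arrival number) pairs, and the release policy "voice before bulk" is invented purely to show reordering.

```python
import heapq
from collections import deque

# (kind, arrival number); the kinds are made up for this illustration.
packets = [("bulk", 1), ("voice", 2), ("bulk", 3), ("voice", 4)]


def pipe_like(pkts):
    """Hook-point style: packets leave in exactly the order they arrived."""
    q = deque(pkts)
    return [q.popleft() for _ in range(len(q))]


def shaper_like(pkts):
    """TC style: packets are stored first, then released by a policy.

    Assumed policy: voice before bulk, arrival order as the tie-breaker.
    """
    rank = {"voice": 0, "bulk": 1}
    heap = [(rank[kind], seq, kind) for kind, seq in pkts]
    heapq.heapify(heap)
    out = []
    while heap:
        _, seq, kind = heapq.heappop(heap)
        out.append((kind, seq))
    return out


print(pipe_like(packets))    # same order as the input
print(shaper_like(packets))  # [('voice', 2), ('voice', 4), ('bulk', 1), ('bulk', 3)]
```
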
       Of course, nothing is absolute. A Netfilter HOOK point can also hold a storage buffer or perform a series of actions. Typical examples are fragment reassembly and the NAT function in conntrack. For fragment reassembly at the PREROUTING HOOK point, fragments that enter the hook are held there temporarily; only when all fragments have arrived and been reassembled successfully does the packet flow out of the hook in one piece. For NAT, the result of Netfilter processing is clearly "a series of actions has been executed", not just ACCEPT. I have also written some modules that use Netfilter to implement flow control. Conversely, the TC framework can implement the functions of Netfilter. In short, once you understand the design principles and the nature of these frameworks, you can use and extend them with ease.
       I personally feel that, compared with a single Netfilter HOOK point, the TC framework is a superset: it is more flexible in implementation and, of course, more complicated. Netfilter's own charm lies elsewhere, in how the locations of its HOOK points are defined.
       Well, now we will officially introduce the design of the TC framework.
       Almost without exception, the material you can find on the Internet introduces TC as being composed of "queueing disciplines, classes, and filters". Most of it is vague, and I would venture that it all derives from a single article, document, or book. Few people try to understand the design of the TC framework from another angle, which is in itself a more challenging thing, and I personally like that kind of challenge. Before introducing how TC organizes its queues, let me first introduce what I call recursive control. Recursive control means hierarchical control in which the control method is the same at every level. Those who are familiar with CFS scheduling know that the same scheduling method is used for group scheduling and task scheduling, even though groups and tasks obviously belong to different levels. I drew the following picture to briefly describe this situation:

This applies not only to the organization of control logic: Linux also uses this tree-shaped recursive logic in its implementation of the UNIX process model, where each level is a two-level tree. The following figure shows this model:

It can be seen that recursive control is fractal. It would be better still if it could be shown as a three-dimensional graph. In the figure above, every node except the leaf nodes is an independent small tree, and whether the tree is big or small, its control logic or organizational logic has exactly the same nature.
       Recursive control makes it easy to stack control logic arbitrarily. We have seen this in the design of protocol stacks, in the form X over Y, or XoY for short: PPPoE, IP over UDP (OpenVPN in tun mode), TCP over IP (the native TCP/IP stack), and so on. For TC, consider the following requirements:
1. Divide the total bandwidth between TCP and UDP in a ratio of 2:3;
2. Within TCP traffic, assign different priorities according to the source IP address range;
3. Within the same priority queue, allocate bandwidth to HTTP and to other applications in a ratio of 2:8;
4....

From the above requirements, it can be seen that this is a recursive control requirement. Requirements 1 and 3 both use proportional bandwidth allocation, but they obviously belong to different levels. The entire architecture should look like this:
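
The same hierarchy can also be written down as a nested data structure. The sketch below is only illustrative: the node names (prio-high, prio-low), the "policy" and "weight" keys, and the two helper functions are assumptions of mine, not anything defined by TC.

```python
# Hypothetical description of the three requirements as a tree.
tree = {
    "name": "root", "policy": "share",           # requirement 1: TCP:UDP = 2:3
    "children": [
        {"name": "tcp", "weight": 2, "policy": "priority",  # requirement 2
         "children": [
             {"name": "prio-high", "policy": "share",       # requirement 3
              "children": [
                  {"name": "http", "weight": 2},
                  {"name": "other", "weight": 8},
              ]},
             {"name": "prio-low"},
         ]},
        {"name": "udp", "weight": 3},
    ],
}


def leaves(node, path=()):
    """Yield the path of every leaf; each inner node is itself a small tree."""
    path = path + (node["name"],)
    children = node.get("children")
    if not children:
        yield "/".join(path)
        return
    for child in children:
        yield from leaves(child, path)


def levels_using(node, policy, depth=0):
    """Yield (depth, name) for every node that applies the given policy."""
    if node.get("policy") == policy:
        yield depth, node["name"]
    for child in node.get("children", []):
        yield from levels_using(child, policy, depth + 1)


print(list(leaves(tree)))
print(list(levels_using(tree, "share")))  # [(0, 'root'), (2, 'prio-high')]
```

The second line of output shows the point made above: proportional sharing appears twice, but at different depths of the tree.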

But things are far from being as simple as imagined. Although the picture above gives a first glimpse of the TC framework, it does not help in implementing it. There are a few typical questions. How are packets sorted into different queues? What data structure should represent the non-leaf nodes in the graph? Since they are not real queues but behave like queues, how should they be expressed? ...
       When Linux implements TC, it abstracts the "queue". Basically, it maintains two callback function pointers, one for the enqueue operation and one for the dequeue operation. Neither enqueue nor dequeue necessarily puts a packet into a queue or takes one out; each merely "performs a series of operations". This "series of operations" can be:
1. For a leaf node, actually put the packet into a real queue, or pull a packet out of a real queue;
2. Recursively call the enqueue/dequeue of other abstract queues.
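
The two cases above can be sketched as follows. This is a Python illustration, not the kernel's data structures: the class names Qdisc, Fifo and Limit are chosen for the example, and the length check performed by Limit is just one possible "series of operations" that I picked.

```python
from collections import deque


class Qdisc:
    """Abstract queue: only the two operations are fixed."""

    def enqueue(self, pkt):
        raise NotImplementedError

    def dequeue(self):
        raise NotImplementedError


class Fifo(Qdisc):
    """Case 1, a leaf node: packets really are stored in a queue."""

    def __init__(self):
        self.q = deque()

    def enqueue(self, pkt):
        self.q.append(pkt)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None


class Limit(Qdisc):
    """Case 2, an inner node: stores nothing itself.

    It performs an operation (a length check, assumed for illustration)
    and then calls the enqueue/dequeue of another abstract queue.
    """

    def __init__(self, child, limit):
        self.child, self.limit, self.count = child, limit, 0

    def enqueue(self, pkt):
        if self.count >= self.limit:
            return False  # packet is refused at this level
        ok = self.child.enqueue(pkt)
        if ok:
            self.count += 1
        return ok

    def dequeue(self):
        pkt = self.child.dequeue()
        if pkt is not None:
            self.count -= 1
        return pkt


# Inner nodes can wrap other inner nodes; the caller only sees the root.
root = Limit(Limit(Fifo(), limit=3), limit=2)
print([root.enqueue(p) for p in "abc"])  # [True, True, False]
print(root.dequeue())                    # a
```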

Note that point 2 above mentions "other abstract queues", so how is such an abstract queue located? This requires a choice, that is, a selector, which places a packet into an abstract queue according to the packet's characteristics. At this point, the design of TC can be expressed by the following block diagram:
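
A selector sitting in front of several abstract queues might look like the sketch below. Again this is only an illustration: the class name Classful, the packet fields, and the "serve children in listed order" dequeue policy are my assumptions.

```python
from collections import deque


class Fifo:
    """Leaf: a real queue."""

    def __init__(self):
        self.q = deque()

    def enqueue(self, pkt):
        self.q.append(pkt)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None


class Classful:
    """Inner node: a selector chooses which child receives the packet."""

    def __init__(self, selector, children):
        self.selector = selector
        self.children = children  # dict: key -> child, in service order

    def enqueue(self, pkt):
        child = self.children.get(self.selector(pkt))
        if child is None:
            return False  # no abstract queue selected
        return child.enqueue(pkt)

    def dequeue(self):
        # Assumed policy: serve children strictly in the order listed.
        for child in self.children.values():
            pkt = child.dequeue()
            if pkt is not None:
                return pkt
        return None


# Two levels, each with its own selector and the same interface.
tcp = Classful(lambda p: "http" if p["dport"] == 80 else "other",
               {"http": Fifo(), "other": Fifo()})
root = Classful(lambda p: p["proto"], {"tcp": tcp, "udp": Fifo()})

for pkt in ({"id": 1, "proto": "udp", "dport": 53},
            {"id": 2, "proto": "tcp", "dport": 22},
            {"id": 3, "proto": "tcp", "dport": 80}):
    root.enqueue(pkt)

order = []
while (pkt := root.dequeue()) is not None:
    order.append(pkt["id"])
print(order)  # [3, 2, 1]
```

The packets come out in a different order than they went in, which is the block-device-like behavior described earlier.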

As you can see, I did not use the classic "queueing discipline, class, filter" triple to define the TC framework, but explained it in terms of recursive control. If you map the classic triple onto this picture, it looks like the following. Note that I deleted unnecessary text so that the picture would not be too cluttered. If you need the text, please refer to the picture above:

As you can see, however the presentation varies, the essence stays the same; or, put another way, great minds think alike.
       Okay, let me go a little off topic now. It is still related to Netfilter, but it is not a comparison with TC, just some thoughts of my own. I once admired Cisco ACLs very much because they are applied to the network interface, whereas Netfilter intercepts the processing path rather than the processing device. For Netfilter, the device is just another match with no special status: relevant or not, every packet has to pass through the Netfilter HOOK points, where at the very least it must be tested against -i ethX. I wanted to hang a filter_list on net_device, so I wrote some code, found that it worked reasonably well, and was ready to adopt it. I am a person who often reinvents the wheel. When I later saw the implementation of TC, I found that the TC framework was exactly what I had been looking for, so I concluded that whatever can be achieved with Netfilter can also be achieved with TC. Moreover, TC is built on the queueing discipline (that is how the kernel data structure is named: Qdisc, short for queueing discipline, the same term used in the classic triple), the abstract enqueue/dequeue does not prescribe how it is implemented, and the queueing discipline is bound to the network card (more precisely, to a queue of the network card, if the card supports multiple queues) rather than intercepting packets on the processing path. So I have two choices:
1. Implement a new Qdisc with a simple built-in FIFO queue. Its enqueue operation runs the matches/targets ported from Netfilter, and every packet that is ACCEPTed is placed into the FIFO;
2. Work on the classifier: whether a packet is classified into a class depends not only on the packet's characteristics but also on an additional action callback function. Only if that function returns 0 does the classification succeed. Since it is a callback, it can perform any action (drop, NAT, etc.) entirely behind closed doors.

Of points 1 and 2 above, point 2 has already been implemented. Point 1 is easy to implement: you only need to implement a queueing discipline, or add an action to each queueing discipline, which looks like the following figure:
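
Point 1 can be sketched as a FIFO whose enqueue first consults Netfilter-style rules. This is a Python illustration only: the class name FilteringFifo, the rule format, and the default policy are assumptions, and nothing here is ported from real Netfilter code.

```python
from collections import deque

ACCEPT, DROP = "ACCEPT", "DROP"


class FilteringFifo:
    """A queue whose enqueue runs (match, target) rules before storing."""

    def __init__(self, rules, policy=ACCEPT):
        self.rules = rules      # list of (match function, target) pairs
        self.policy = policy    # assumed default when no rule matches
        self.q = deque()

    def enqueue(self, pkt):
        verdict = self.policy
        for match, target in self.rules:
            if match(pkt):
                verdict = target
                break
        if verdict == ACCEPT:
            self.q.append(pkt)  # only ACCEPTed packets reach the FIFO
            return True
        return False

    def dequeue(self):
        return self.q.popleft() if self.q else None


# Hypothetical rule: refuse telnet, let everything else through.
qdisc = FilteringFifo([(lambda p: p["dport"] == 23, DROP)])
print(qdisc.enqueue({"dport": 23}))  # False
print(qdisc.enqueue({"dport": 80}))  # True
print(qdisc.dequeue())               # {'dport': 80}
```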

The second point is relatively simple. Its essence is to do the extra work inside that diamond. Enlarged, the diamond looks like the figure below:
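
The "match, then action callback, and only a return value of 0 counts as success" idea can be sketched like this. The function names, the class labels such as "1:20", and the rule that the first matching filter decides are all assumptions made for the illustration.

```python
def snat(pkt):
    """Example action: rewrite the source address, then report success."""
    pkt["src"] = "203.0.113.1"  # address from the documentation range
    return 0


def drop(pkt):
    """Example action: mark the packet and report failure (non-zero)."""
    pkt["dropped"] = True
    return 1


def classify(filters, pkt):
    """Return the class chosen for pkt, or None if it is not classified.

    filters is a list of (match, action, class_id). The first filter whose
    match succeeds decides: the packet is classified only if the action
    callback is absent or returns 0.
    """
    for match, action, class_id in filters:
        if not match(pkt):
            continue
        if action is None or action(pkt) == 0:
            return class_id
        return None
    return None


# Hypothetical filter list; the class labels are placeholders.
filters = [
    (lambda p: p["dport"] == 23, drop, "1:10"),
    (lambda p: p["proto"] == "tcp", snat, "1:20"),
]

telnet = {"proto": "tcp", "dport": 23, "src": "10.0.0.5"}
web = {"proto": "tcp", "dport": 80, "src": "10.0.0.5"}
print(classify(filters, telnet))       # None
print(classify(filters, web), web["src"])
```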

In this way, the firewall function and the NAT function are realized with the TC framework, which has long been a wish of mine. In fact, I have known this for a long time, but I do not like the tc command, because its configuration is too technical and extremely hard to maintain, even harder than iptables rules. Maintenance is extremely important, even more important than working out how to write the rules, because writing is a matter of a moment. With enough accumulated experience, you can solve a problem in an instant, and when you are stuck, inspiration also arrives in a moment, after a drink, for example. Maintenance, by contrast, is a long-term matter, and the person who maintains the rules is not necessarily you. You have to consider others, because a technological society is an altruistic society.
       Okay, I believe I have now said everything that should be said. It is all framework, with no details. Although I do not like the TC command line, I still want to use one picture to show the relationship between each tc command and the kernel data structures. It is still not detailed, and the commands are not complete; the matches are omitted, because I know those are not important:

Reading my articles, you may find it hard to get anything you can use directly by copying and pasting. The code is omitted, and the commands are omitted. Even I, when I reread what I wrote years ago, wish there were something I could run quickly, but there is nothing of the kind. Still, I feel that thinking matters more than implementation. If you understand the essence behind an implementation, you will be able to work with it comfortably.


Origin blog.csdn.net/freeabc/article/details/109072763