Printing a tree structure in the terminal with 100 lines of C

Before getting into the routine, let’s answer three questions first.

Why print a tree structure

Trees are among the most common data structures in algorithms, with many variants ranging from binary trees to multiway trees. Much algorithmic work requires programmers to implement a tree by hand, but the structure is complex enough that getting it right is hard, so a debugging tool is needed to verify correctness. The usual debugging methods are adding print statements, setting breakpoints in GDB, writing test cases, and so on. Such local, external information does little to convey the overall state of the data structure, and inexperienced programmers can lose their bearings in a sea of debugging output. Understanding an algorithm is one thing; implementing it yourself is another. The gap has to do with how we understand algorithms: our perception of data structures is visual. A picture forms in the mind automatically, showing how each node changes during insertion and deletion, how a subtree rotates during rebalancing, and so on. None of this is hard for a person to imagine. A machine, however, sees only a sequence of state-based instructions. Translating visual thinking into a state machine is hard in itself, because we struggle to perceive and remember so many states. Tools are needed to help, and the best help is to draw the whole structure so that we can see at a glance where we went wrong.

Linux has a tree command that prints a directory listing as a tree, showing all the files and subdirectories in a directory at a glance. This article achieves the same effect and gives the source code.

Why use depth-first traversal

Mainly because it makes output easy. Terminal output runs left to right and top to bottom. For a tree, the former naturally expresses the path from the root to a leaf, and the latter naturally expresses adjacent branches, so depth-first traversal matches the output order.

In fact, breadth-first traversal is simpler to implement: set up a linked-list head at the left end of each level, link the nodes of the same level horizontally, and walk the array of list heads from top to bottom. But consider the following:

  • Screens are not as wide as a tree, and we tend to scroll vertically;
  • Hierarchical relationships are hard to express, and alignment is troublesome;
  • Each node must maintain an extra next pointer; if it is not already a member of the data structure, it costs additional storage.

This also points to the second advantage of depth-first traversal: its implementation is non-intrusive.
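For comparison, the breadth-first scheme described above could be sketched roughly as follows. This is only an illustration under assumed names: struct lnode, its next member, and link_levels are hypothetical and not part of the article's source code. It shows why the node type has to carry an extra pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical node: the `next` member exists only to support printing. */
struct lnode {
    const char *name;
    int child_cnt;
    struct lnode *child[MAX_CHILDREN];
    struct lnode *next;   /* right-hand neighbour on the same level */
};

/*
 * Chain the nodes of each level through `next` and store the leftmost
 * node of level d in heads[d]. Returns the number of levels linked.
 */
int link_levels(struct lnode *root, struct lnode *heads[], int max_levels) {
    int depth = 0;

    assert(max_levels > 0);
    if (root == NULL)
        return 0;

    root->next = NULL;
    heads[0] = root;
    while (depth + 1 < max_levels) {
        struct lnode *head = NULL, *tail = NULL;
        /* Walk the current level and append every child to the next level. */
        for (struct lnode *n = heads[depth]; n != NULL; n = n->next) {
            for (int i = 0; i < n->child_cnt; i++) {
                struct lnode *c = n->child[i];
                c->next = NULL;
                if (tail != NULL)
                    tail->next = c;
                else
                    head = c;
                tail = c;
            }
        }
        if (head == NULL)
            break;          /* no deeper level exists */
        heads[++depth] = head;
    }
    return depth + 1;
}
```

Printing would then walk heads[0], heads[1], and so on, one level per line, which is exactly where the width and alignment problems listed above appear.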

Why use non-recursive traversal

Actually, this is a matter of opinion. Recursive and non-recursive are simply two forms of traversal; neither is absolutely better, and in general they complement each other. I chose the non-recursive form for a few reasons:

  • It avoids overflowing the call stack when the tree is very deep;
  • It avoids C function call overhead;
  • All state is visible and under control.

Of course, none of these factors is critical; use whichever form you like.

Everything follows one routine

Since this article is all about the routine, here it is in pseudocode:

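/* backlog object */
typedef struct node_backlog {
    node pointer;
    backtrack point position (index);
};

/* Dump */
void dump(tree) {
    start iterating from the root node;
    initialize the backlog stack;
    for (; ;) {
        if (node pointer is not null) {
            get the backtrack point position from the backlog object;
            if (no next backtrack point exists, or it is invalid) {
                push a null node pointer;
            } else {
                push the current node pointer and record the next backtrack point position;
            }
            if (backtrack position index is 0) {
                output the indentation, draw the path, print the node content;
            }
            go down to the next level;
        } else {
            if (backlog stack is empty) return;
            pop a backlog object and take the most recently recorded node pointer;
        }
    }
}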

Simple, right? I would even say this routine applies to every tree structure, as long as it can be traversed depth-first.
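To make the routine concrete before the real examples, here is a minimal self-contained sketch for a generic multiway tree. The struct tnode type, the tree_dump name, the four-character column width, and the returned node count are assumptions made for this illustration; they are not taken from the article's source code.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CHILDREN 4
#define MAX_DEPTH    64

/* Hypothetical multiway node, used only for this illustration. */
struct tnode {
    const char *name;
    int child_cnt;
    struct tnode *child[MAX_CHILDREN];
};

/* Backlog object: node to come back to and the child to visit next (0 = invalid). */
struct backlog {
    struct tnode *node;
    int next_idx;
};

/* Print the tree to `out`; returns the number of nodes printed. */
int tree_dump(struct tnode *root, FILE *out) {
    struct backlog stack[MAX_DEPTH], *p = NULL;
    struct tnode *node = root;
    int top = 0;        /* entries on the stack, which is also the current level */
    int printed = 0;

    for (;;) {
        if (node != NULL) {
            /* Resume at the recorded branch, or start at branch 0. */
            int idx = p != NULL ? p->next_idx : 0;
            p = NULL;

            assert(top < MAX_DEPTH);
            if (idx + 1 >= node->child_cnt) {
                stack[top].node = NULL;       /* nothing left to come back to */
                stack[top].next_idx = 0;
            } else {
                stack[top].node = node;       /* come back for branch idx + 1 */
                stack[top].next_idx = idx + 1;
            }
            top++;

            if (idx == 0) {                   /* first visit: print the node */
                for (int i = 1; i < top; i++) {
                    if (i == top - 1)
                        fputs("+---", out);
                    else
                        fputs(stack[i - 1].node != NULL ? "|   " : "    ", out);
                }
                fprintf(out, "%s\n", node->name);
                printed++;
            }

            /* Go down one level. */
            node = idx < node->child_cnt ? node->child[idx] : NULL;
        } else {
            if (top == 0)
                return printed;               /* stack empty: traversal is over */
            p = &stack[--top];
            node = p->node;
        }
    }
}

/*
 * For a root with children a (which has children a1 and a2) and b,
 * tree_dump(&root, stdout) prints:
 *
 * root
 * +---a
 * |   +---a1
 * |   +---a2
 * +---b
 */
```

Using a plain integer top instead of the top and bottom pointers is a simplification for this sketch; the examples that follow keep the pointer-based stack.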

If you don't believe me, I will give three practical examples.

Directory tree or dictionary tree

The code is in the gist. This is a MIB tree for managing network nodes (devices). Briefly, it has two properties:

  • The hierarchical nesting of its nodes makes it a directory hierarchy;
  • Node keys share common prefixes, which also makes it similar to (or usable as) a dictionary structure.

We don't need to care about its CRUD implementation. Assume a ready-made directory tree or dictionary tree exists; the question is how to print its shape in the terminal.

#define OID_MAX_LEN  64

struct node_backlog {
    /* node to be backlogged */
    struct mib_node *node;
    /* the backtrack point, next to the original sub-index of the node, valid when >= 1, invalid == 0 */
    int next_sub_idx;
};

static inline void
nbl_push(struct node_backlog *nbl, struct node_backlog **top, struct node_backlog **bottom) {
    if (*top - *bottom < OID_MAX_LEN) {
        (*(*top)++) = *nbl;
    }
}

static inline struct node_backlog *
nbl_pop(struct node_backlog **top, struct node_backlog **bottom) {
    return *top > *bottom ? --*top : NULL;
}

void mib_tree_dump(void) {
    int level = 0;
    oid_t id = 0;
    struct mib_node *node = &dummy_root; /* dummy_root is the virtual root, defined elsewhere */
    struct node_backlog nbl, *p_nbl = NULL;
    struct node_backlog *top, *bottom, nbl_stack[OID_MAX_LEN];

    top = bottom = nbl_stack;

    for (; ;) {
        if (node != NULL) {
            /* Fetch the pop-up backlogged node's sub-id. If not backlogged, set 0. */
            int sub_idx = p_nbl != NULL ? p_nbl->next_sub_idx : 0;
            /* Reset backlog for the node has gone deep down */
            p_nbl = NULL;

            /* Backlog the node */
            if (is_leaf(node) || sub_idx + 1 >= node->sub_id_cnt) {
                nbl.node = NULL;
                nbl.next_sub_idx = 0;
            } else {
                nbl.node = node;
                nbl.next_sub_idx = sub_idx + 1;
            }
            nbl_push(&nbl, &top, &bottom);
            level++;
      
            /* Draw lines as long as sub_idx is the first one */
            if (sub_idx == 0) {
                int i;
                for (i = 1; i < level; i++) {
                    if (i == level - 1) {
                        printf("%-8s", "+-------");
                    } else {
                        if (nbl_stack[i - 1].node != NULL) {
                            printf("%-8s", "|");
                        } else {
                            printf("%-8s", " ");
                        }
                    }
                }
                printf("%s(%d)\n", node->name, id);
            }

            /* Go deep down */
            id = node->sub_id[sub_idx];
            node = node->sub_ptr[sub_idx];
        } else {
            p_nbl = nbl_pop(&top, &bottom);
            if (p_nbl == NULL) {
                /* End of traversal */
                break;
            }
            node = p_nbl->node;
            level--;
        }
    }
}

The code is not complicated, but a few points deserve explanation.

Depth-first traversal relies on backtrack points: after reaching the end of a branch, we go back to a position we passed earlier and continue along another branch. If we backtrack all the way to the root with no branch left, the traversal is over. The backtrack point therefore has to be recorded; the question is where. In a binary tree, the right subtree is traversed after the left subtree, so the backtrack point is the right child. In a multiway tree, branch N+1 is traversed after branch N, so the backtrack point is N+1. After the last branch has been traversed, we must keep backtracking to find an earlier backtrack point. We therefore record the backtrack point as sub_idx + 1. The same value doubles as a validity flag: a value greater than or equal to 1 is a valid backtrack point, and 0 is an invalid one.
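The encoding described above fits in a one-line helper. This is only an illustrative sketch: next_backtrack and its parameter names are hypothetical and do not appear in the original code, which computes the same value inline.

```c
#include <assert.h>

/*
 * Given a node with child_cnt branches that is about to descend into
 * branch idx, return the backtrack point to record: idx + 1 when another
 * branch remains, or 0 (invalid) for a leaf or the last branch.
 */
int next_backtrack(int child_cnt, int idx) {
    assert(idx >= 0);
    return idx + 1 < child_cnt ? idx + 1 : 0;
}
```

For a binary tree (child_cnt of 2) the only valid result is 1, the right child; for a leaf (child_cnt of 0) the result is always 0.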

For the backlog stack, the pointer-to-pointer trick is used. The stack is small, so it can live in a local variable of the function, which has the advantage of not exposing the data to the outside world. The push and pop helpers must then receive a pointer to the stack pointer in order to modify it. Take the push operation:

(*(*top)++) = *nbl;

This copies the backlog object to the position top points to, then moves top up by one. The difference between top and bottom is the number of elements on the stack. Since top is a pointer to a pointer, the assignment target is **top and the pointer increment is (*top)++. Now the pop operation:

return --*top;

This first moves top down by one element, then returns a pointer to the backlog object there, which is the new value of *top.
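The same pointer-to-pointer technique can be tried in isolation on a stack of plain integers. The names push, pop, and STACK_MAX below are assumptions made for this sketch; only the two expressions discussed above are taken from the article.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 4

/* Push v if there is room; the caller's top pointer is advanced through *top. */
void push(int v, int **top, int **bottom) {
    assert(top != NULL && bottom != NULL);
    if (*top - *bottom < STACK_MAX)
        *(*top)++ = v;      /* store at the old *top, then increment *top */
}

/* Pop the most recent element, or return NULL when the stack is empty. */
int *pop(int **top, int **bottom) {
    assert(top != NULL && bottom != NULL);
    return *top > *bottom ? --*top : NULL;  /* decrement *top, return its new value */
}
```

A caller declares int stack[STACK_MAX], *top, *bottom; sets top = bottom = stack; and then calls push(10, &top, &bottom) and pop(&top, &bottom). The array stays local to the calling function, which is the point of passing the addresses of top and bottom.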

Now for the routine itself. First, the root is a dummy, that is, a virtual node. This coding technique guarantees that there is exactly one node at the top level; for example, the directory tree printed by the tree command always starts from the current directory ".". The first time the loop is entered, the backlog stack is empty and there is no backtrack point, so we set the backtrack position index to 0. This has two meanings: first, the backtrack point is invalid or does not exist; second, since we are not backtracking, traversal starts from the first branch of the current node.

Next we push the node being traversed onto the stack, and here there is a distinction. **If the current node is a leaf, or it has no branches left after the one about to be visited, later backtracking must continue past it to find a backtrack point**, so we mark the backtrack point as invalid and push that. Otherwise we record the current node as the backtrack point, together with its position index, and push that.

The line-drawing part is discussed later. We descend to the next level using the index sub_idx obtained earlier, until we hit the bottom. Then we pop from the backlog stack, and there are three cases. Because the first entry pushed was the root, an empty stack means we have backtracked to the origin, which marks the end of the whole traversal, and we exit the loop. Otherwise we check whether the popped node is NULL: if it is, we keep backtracking; if it is a valid backtrack point, we take its position index and continue with the next round of the loop.

Finally, terminal output. As mentioned earlier, each line's left-to-right output corresponds to walking down the levels of the tree, which is really a walk over the backlog stack; each new line corresponds to moving to another branch, which is one round of the loop. The output consists of three kinds of symbols: indentation, branch, and node content. The strategy is as follows:

  • Indentation: when a backtrack point in the stack is invalid, there is no branch; print spaces, padded to eight characters;
  • Branch: when a backtrack point in the stack is valid, there is a branch; print "|" followed by spaces, padded to eight characters;
  • Node: at the last stack position before the node itself, print "+-------" (eight characters), followed by the node content.
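The three rules above can be isolated into a small prefix-drawing function. This is a sketch under assumptions: draw_prefix and the pending array are hypothetical names, with pending[k] standing in for the test nbl_stack[k].node != NULL in the dump code.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Print the prefix for a node at the given level (the root is level 1).
 * pending[k] is nonzero when the ancestor at depth k (0 = root) still has
 * unvisited branches, i.e. its backlog entry holds a valid backtrack point.
 * Returns the number of columns written.
 */
int draw_prefix(const int *pending, int level, FILE *out) {
    int cols = 0;

    assert(level >= 1);
    for (int i = 1; i < level; i++) {
        if (i == level - 1)
            fprintf(out, "%-8s", "+-------");               /* node connector */
        else
            fprintf(out, "%-8s", pending[i - 1] ? "|" : " "); /* branch or blank */
        cols += 8;
    }
    return cols;
}
```

With pending = {1, 0} and level 3, the call writes "|" padded to eight columns followed by "+-------"; at level 1 it writes nothing, so the root starts in the first column.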

Of course, you can customize the printing strategy to make the output prettier. That is enough talk; run the program and the effect is clear at a glance.

<center>![MIB tree](https://static.oschina.net/uploads/img/201702/06211345_ILBk.png "MIB tree")</center>

B+ tree

The code is here. The B+ tree is the underlying data structure commonly used in relational databases, and it is quite daunting to implement. Fortunately, this article does not cover that. Here the B+ tree serves as a multiway tree to demonstrate printing, especially when leaf and non-leaf nodes are defined differently. The output implementation shows that the backlog object records only a node pointer and a backtrack position, and is independent of the data node itself. We can reuse the code above almost unchanged, and the result looks like this:

<center>![B+ tree](https://static.oschina.net/uploads/img/201702/06211805_esIo.png "B+ tree")</center>

From the shape, it can be seen that the real data of the B+ tree is stored in the leaf nodes, and the whole tree is balanced.

Red-Black Tree (Binary Tree)

The code is here. Once you understand the multiway tree implementation, a binary tree is just a simplified special case. This article takes the red-black tree as the representative; rather than write one myself, I took the Nginx source code directly.

Note that in a binary tree the only possible backtrack point is the right branch, so the backtrack position index has a single value, 1. This allows a simplification: the left branch index is 0, which also means an invalid backtrack position, and the right branch index is 1, meaning a valid one. The code can be written as follows:

#define RBTREE_MAX_LEVEL   64
#define RBTREE_LEFT_INDEX  0
#define RBTREE_RIGHT_INDEX 1

void rbtree_dump(struct rbtree *tree)
{
    int level = 0;
    struct rbnode *node = tree->root, *sentinel = tree->sentinel;
    struct node_backlog nbl, *p_nbl = NULL;
    struct node_backlog *top, *bottom, nbl_stack[RBTREE_MAX_LEVEL];

    top = bottom = nbl_stack;

    for (; ;) {
        if (node != sentinel) {
            /* Fetch the pop-up backlogged node's sub-id. If not backlogged, set 0. */
            int sub_index = p_nbl != NULL ? p_nbl->next_sub_idx : RBTREE_LEFT_INDEX;
            /* backlog should be reset since node has gone deep down */
            p_nbl = NULL;

            /* Backlog the node */
            if (is_leaf(node, sentinel) || sub_index == RBTREE_RIGHT_INDEX) {
                nbl.node = sentinel;
                nbl.next_sub_idx = RBTREE_LEFT_INDEX;
            } else {
                nbl.node = node;
                nbl.next_sub_idx = RBTREE_RIGHT_INDEX;
            }
            nbl_push(&nbl, &top, &bottom);
            level++;

            /* Draw lines as long as sub_idx is the first one */
            if (sub_index == RBTREE_LEFT_INDEX) {
                /* Print indent, branch and node content... */
            }

            /* Move down according to sub_idx */
            node = sub_index == RBTREE_LEFT_INDEX ? node->left : node->right;
        } else {
            /* Pop up the node backlog... */
        }
    }
}

Let's look at the output... wait, for a binary tree the right child is printed on the line after the left child, which looks a little odd, doesn't it? Fortunately, I thoughtfully swapped LEFT_INDEX and RIGHT_INDEX so that the right child is output before the left child. Now you can tilt your head and see the binary tree intuitively (laughs), and we also learn how easy it is to "flip" a binary tree (laughs).
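For readers who want something runnable without the Nginx sources, here is a minimal sketch of the same idea on a plain binary tree that uses NULL instead of a sentinel. The struct bnode type, btree_dump, the FIRST and SECOND names, and the four-character columns are assumptions for this illustration. One deliberate difference from the code above: a node is backlogged only when it actually has a left child, so no dangling "|" is drawn.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LEVEL 64
#define FIRST  0   /* right child, visited first; also means "no backtrack point" */
#define SECOND 1   /* left child, the only valid backtrack point */

/* Hypothetical binary node, used only for this illustration. */
struct bnode {
    int key;
    struct bnode *left, *right;
};

struct backlog {
    struct bnode *node;
    int next_idx;
};

/* Print the tree to `out`, right child first; returns the number of nodes printed. */
int btree_dump(struct bnode *root, FILE *out) {
    struct backlog stack[MAX_LEVEL], *p = NULL;
    struct bnode *node = root;
    int top = 0, printed = 0;

    for (;;) {
        if (node != NULL) {
            int idx = p != NULL ? p->next_idx : FIRST;
            p = NULL;

            assert(top < MAX_LEVEL);
            if (idx == SECOND || node->left == NULL) {
                stack[top].node = NULL;       /* nothing to come back for */
                stack[top].next_idx = FIRST;
            } else {
                stack[top].node = node;       /* come back for the left child */
                stack[top].next_idx = SECOND;
            }
            top++;

            if (idx == FIRST) {               /* first visit: print the node */
                for (int i = 1; i < top; i++) {
                    if (i == top - 1)
                        fputs("+---", out);
                    else
                        fputs(stack[i - 1].node != NULL ? "|   " : "    ", out);
                }
                fprintf(out, "%d\n", node->key);
                printed++;
            }

            node = idx == FIRST ? node->right : node->left;
        } else {
            if (top == 0)
                return printed;               /* stack empty: traversal is over */
            p = &stack[--top];
            node = p->node;
        }
    }
}

/*
 * For the tree 4 -> (2 -> (1, 3), 6 -> (5, 7)), btree_dump(&n4, stdout) prints:
 *
 * 4
 * +---6
 * |   +---7
 * |   +---5
 * +---2
 *     +---3
 *     +---1
 */
```

Tilting your head to the left turns this output into the usual picture, with larger keys on the right.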

<center>![Red-Black Tree](https://static.oschina.net/uploads/img/201702/06212006_glT5.png "Red-Black Tree")</center>

As the saying goes, a craftsman who wants to do good work must first sharpen his tools. With a tree-printing tool in hand, there may be data structures you cannot write, but none you cannot get right. Finally, an exercise: how would you print a tree structure recursively? (Hint: pass state through parameters.)

Reference source code

Directory tree, B+ tree, red-black tree


Origin http://43.154.161.224:23101/article/api/json?id=324642634&siteId=291194637