Papers

Adaptive Radix Tree: https://db.in.tum.de/~leis/papers/ART.pdf

Persistent Adaptive Radix Tree: https://ankurdave.com/dl/part-tr.pdf

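Data structures

As the figure shows, the tree consists of a root node, ordinary inner nodes, and leaf nodes. A lookup starts at the root and descends to a leaf; for example, the first leaf is reached along the path A-N-D, ending at the leaf AND. A leaf can also point to a value, which gives a key-value structure that is convenient to search. The Persistent Adaptive Radix Tree (PART) is essentially a modification of the radix tree. Radix trees are covered by many articles online, so this post does not explain them again.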

Node4

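Node4 is the smallest node layout in PART. It holds up to 4 child pointers and is used for inner nodes with 1 to 4 children. It consists of a key array and a child pointer array whose entries correspond by position. To find a child, search the key array sequentially or by binary search; the index of the matching key is also the index of the child in the child pointer array: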

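Node findChild(int value){
    // scan only the occupied slots; numChildren is the node's current child count
    for(int i=0;i<numChildren;i++){
        if(key[i] == value) return childPointer[i];
    }
    return null; // no child for this key byte
}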

Node16

Node16 stores 5 to 16 children. Its layout is essentially the same as Node4, so the details are not repeated here.
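Because the key array is kept sorted, a Node16 lookup can use binary search. The following is a minimal sketch of that idea, not the PART source: the class `Node16Sketch` and its field names are made up for illustration, and children are plain `Object` references. Keys are compared as unsigned bytes, since Java's `byte` is signed.

```java
// Hypothetical illustration of a 16-slot node with sorted keys (not PART's ArtNode16).
final class Node16Sketch {
    final byte[] keys = new byte[16];         // sorted by unsigned value
    final Object[] children = new Object[16]; // children[i] belongs to keys[i]
    int numChildren = 0;

    // Inserts while keeping keys sorted. Assumes the node is not full
    // and the key byte is not already present.
    void addChild(byte c, Object child) {
        int idx = 0;
        while (idx < numChildren && (keys[idx] & 0xFF) < (c & 0xFF)) idx++;
        System.arraycopy(keys, idx, keys, idx + 1, numChildren - idx);
        System.arraycopy(children, idx, children, idx + 1, numChildren - idx);
        keys[idx] = c;
        children[idx] = child;
        numChildren++;
    }

    // Binary search over the occupied part of the key array.
    Object findChild(byte c) {
        int lo = 0, hi = numChildren - 1;
        int target = c & 0xFF;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int k = keys[mid] & 0xFF;
            if (k == target) return children[mid];
            if (k < target) lo = mid + 1; else hi = mid - 1;
        }
        return null; // no child for this key byte
    }
}
```

With at most 16 entries a linear scan is also cheap, so the choice between the two is mostly a matter of taste in a Java implementation.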

Node48

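Node48 is where the layout changes. It stores 17 to 48 children, using a child index array of length 256 and a child pointer array of length 48. With this many children a linear search becomes inefficient, so the node uses direct mapping instead. For example, if the byte being looked up is 123, childIndex[123] gives the position in the child pointer array, and childPointer[childIndex[123]] is the corresponding child: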


Node findChild(int value){
    return childPointer[childIndex[value]];
}

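Because the child pointer array has only 48 slots, 6 bits are enough for each child index entry (in practice 1 byte is often used for convenience). Compared with using 256 child pointers directly (256 × 8 = 2048 bytes), the combined layout (256 × 1 + 48 × 8 = 640 bytes) saves space.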
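The pseudocode above does not show how a missing key is represented. One way to handle it is a sentinel value in the index array; the sketch below assumes that approach. `Node48Sketch`, `EMPTY`, and the size helpers are hypothetical names, and the size figures assume 8-byte pointers as in the calculation above.

```java
import java.util.Arrays;

// Hypothetical illustration of the two-level Node48 layout (not PART's ArtNode48).
final class Node48Sketch {
    static final byte EMPTY = -1;              // sentinel: no child for this key byte
    final byte[] childIndex = new byte[256];   // one 1-byte slot per possible key byte
    final Object[] children = new Object[48];  // the actual child pointers
    int numChildren = 0;

    Node48Sketch() {
        Arrays.fill(childIndex, EMPTY);
    }

    // Assumes fewer than 48 children and that the key byte is not yet present.
    void addChild(byte c, Object child) {
        childIndex[c & 0xFF] = (byte) numChildren; // slots 0..47 fit in a byte
        children[numChildren++] = child;
    }

    Object findChild(byte c) {
        byte slot = childIndex[c & 0xFF];
        return slot == EMPTY ? null : children[slot];
    }

    // Rough size estimates in bytes, assuming 8-byte pointers.
    static int node48Bytes()  { return 256 * 1 + 48 * 8; }
    static int node256Bytes() { return 256 * 8; }
}
```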

Node256

Node256 stores 49 to 256 children and maps the key byte directly to a child pointer:

Node findChild(int value){
    return childPointer[value];
}
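To tie the four layouts together, the child-count ranges given above can be written as a small selection function. This is only a summary sketch; `NodeLayoutSketch` and `layoutFor` are hypothetical names that do not appear in PART.

```java
// Hypothetical helper that maps a child count to the layout described in this section.
final class NodeLayoutSketch {
    static String layoutFor(int numChildren) {
        if (numChildren < 1 || numChildren > 256) {
            throw new IllegalArgumentException("child count must be in 1..256");
        }
        if (numChildren <= 4) return "Node4";    // 1 to 4 children
        if (numChildren <= 16) return "Node16";  // 5 to 16 children
        if (numChildren <= 48) return "Node48";  // 17 to 48 children
        return "Node256";                        // 49 to 256 children
    }
}
```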

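Path Compression and Lazy Expansion

Lazy expansion means that an inner node is created only when it is needed to distinguish two paths that lead to different leaves (informally, when keys share a prefix). As the figure shows, when only one path leads to FOO, there is a single leaf FOO and the two intermediate O nodes are not created; when another leaf starting with F is inserted, the inner nodes have to be expanded so the two leaves can be told apart.

Path compression removes every inner node that has only a single child (usually by merging it into that child). Merging leaves the child with a prefix that must be accounted for during lookup, and there are two ways to handle it:

Pessimistic approach: each inner node stores a variable-length (possibly empty) vector of partial key bytes, containing the keys of all the single-child nodes that were removed above it. During lookup, this vector is compared with the search key before descending to the next child.
Optimistic approach: only the number of removed single-child nodes is stored (equal to the length of the vector in the pessimistic approach). Lookup skips that many bytes without comparing them, so when it reaches a leaf it must compare the leaf's key with the search key to make sure no "wrong turn" was taken.

The PART implementation combines the two: each node stores at most 8 bytes of prefix, and the descent switches between the two approaches dynamically depending on the prefix length.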
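The combined approach can be sketched as follows. This is a simplified illustration under stated assumptions, not PART's lookup code: `PrefixSketch`, `descend`, and `leafMatches` are hypothetical names, and the node is reduced to its stored prefix bytes plus the full prefix length. Up to 8 bytes are compared (pessimistic), any further bytes are skipped unchecked (optimistic), and the final leaf comparison catches wrong turns.

```java
import java.util.Arrays;

// Hypothetical illustration of the hybrid prefix check.
final class PrefixSketch {
    static final int MAX_PREFIX_LEN = 8;

    // stored holds the first min(prefixLen, 8) bytes of the compressed prefix.
    // Returns the depth after the prefix, or -1 on a definite mismatch.
    static int descend(byte[] stored, int prefixLen, byte[] key, int depth) {
        if (key.length - depth < prefixLen) return -1; // key too short for this prefix
        int cmp = Math.min(MAX_PREFIX_LEN, prefixLen);
        for (int i = 0; i < cmp; i++) {
            if (stored[i] != key[depth + i]) return -1; // pessimistic part: compare
        }
        return depth + prefixLen; // optimistic part: bytes beyond cmp are skipped
    }

    // Final check at the leaf, needed because skipped bytes were never compared.
    static boolean leafMatches(byte[] leafKey, byte[] key) {
        return Arrays.equals(leafKey, key);
    }
}
```

For a 10-byte prefix, a key that differs only in byte 9 passes `descend` and is rejected by `leafMatches`, which is exactly the "wrong turn" case described above.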

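Other features

For persistence, PART uses path copying: an update copies the path between the root and the node being updated and returns a new node that reflects the change. Nodes that are not shared with another version can instead be updated in place, which is faster; see Figure 5.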
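Path copying is easier to see on a smaller structure. The sketch below deliberately uses an immutable binary search tree instead of a radix tree, so it shows only the technique and not PART's implementation; `PathCopySketch` and its members are hypothetical. Each `put` copies the nodes on the path to the key and shares everything else with the previous version.

```java
// Hypothetical illustration of path copying on an immutable binary search tree.
final class PathCopySketch {
    static final class N {
        final int key;
        final String value;
        final N left, right;
        N(int key, String value, N left, N right) {
            this.key = key; this.value = value; this.left = left; this.right = right;
        }
    }

    // Returns the root of a new version; the old version stays valid.
    static N put(N node, int key, String value) {
        if (node == null) return new N(key, value, null, null);
        if (key < node.key) {
            return new N(node.key, node.value, put(node.left, key, value), node.right);
        }
        if (key > node.key) {
            return new N(node.key, node.value, node.left, put(node.right, key, value));
        }
        return new N(key, value, node.left, node.right); // replace the value
    }

    static String get(N node, int key) {
        while (node != null) {
            if (key == node.key) return node.value;
            node = key < node.key ? node.left : node.right;
        }
        return null;
    }
}
```

After updating key 3 in a tree that also holds 5 and 8, the old and new roots differ, but both point to the same untouched subtree for 8. In PART, the reference count on a node is what tells an update whether a node is shared like this (copy it) or exclusively owned (update it in place).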

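To address fragmentation and nodes being scattered too widely in memory, the paper also proposes pooled allocation and compaction after deletion; see Figure 6.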

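The paper uses incremental checkpointing for fault recovery. If a subtree is identical to its state at the previous checkpoint, the new checkpoint simply refers to the previously written copy instead of writing it out again. Concretely: take a snapshot of the current state, partition it into subtrees, and store each subtree in a separate file, using file identifiers as the pointers between them. The root points to the file identifier of each subtree, and every in-place update deletes the file identifier of the affected subtree. This way each changed subtree is written out in its latest version, and unchanged subtrees are not written again by the next checkpoint.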
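The bookkeeping behind this can be sketched in a few lines. The following is a toy model under strong assumptions, not the mechanism from the paper or the repository: subtrees are identified by integers, "writing a file" is simulated by generating a name, and `CheckpointSketch`, `markUpdated`, and `checkpoint` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of incremental checkpointing bookkeeping.
final class CheckpointSketch {
    // subtree id -> identifier of its last written file (absent means "must be written")
    final Map<Integer, String> fileOf = new HashMap<>();
    int filesWritten = 0;

    // An in-place update invalidates the subtree's file identifier.
    void markUpdated(int subtreeId) {
        fileOf.remove(subtreeId);
    }

    // Writes only subtrees without a valid file identifier and returns
    // the root's table of subtree id -> file identifier.
    Map<Integer, String> checkpoint(int[] subtreeIds, int epoch) {
        Map<Integer, String> root = new HashMap<>();
        for (int id : subtreeIds) {
            String file = fileOf.get(id);
            if (file == null) {
                file = "subtree-" + id + "-ckpt-" + epoch; // stand-in for real file I/O
                fileOf.put(id, file);
                filesWritten++;
            }
            root.put(id, file);
        }
        return root;
    }
}
```

In this model, a second checkpoint after a single update writes one file and reuses the identifiers of all other subtrees.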

Source code walkthrough

Project structure

Class diagram

Read the following alongside the code at https://github.com/ankurdave/part

Node.java

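Member variables

static final int MAX_PREFIX_LEN = 8; // maximum number of prefix bytes stored per node for path compression, as mentioned above
int refcount; // number of references to this node, comparable to a reference counter in garbage collection

Methods

This is an abstract class that defines what every node must provide; see the concrete node classes for the implementations.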

Leaf.java

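Member variables

public static int count; // number of leaf nodes
Object value; // the value
final byte[] key; // the complete key of the original (k, v) pair, i.e. all key bytes consumed during the descent

Methods

prefix_matches(final byte[] prefix) checks whether the given prefix matches the start of this leaf's key. A prefix that is longer than the key cannot match, so the method returns false in that case.

public boolean prefix_matches(final byte[] prefix) { // prefix is the prefix left over after path compression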
    if (this.key.length < prefix.length) return false;
    for (int i = 0; i < prefix.length; i++) {
        if (this.key[i] != prefix[i]) {
            return false;
        }
    }
    return true;
}

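longest_common_prefix(Leaf other, int depth) computes the length of the longest common prefix of two leaf keys. The comparison does not start at position 0 of the key but at depth (the depth of the current node). insert uses it to obtain longest_prefix when splitting the current leaf into an ArtNode4, so only the part of the key from the current depth onward matters.

public int longest_common_prefix(Leaf other, int depth) { // longest common prefix counted from depth, the current descent depth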
    int max_cmp = Math.min(key.length, other.key.length) - depth;
    int idx;
    for (idx = 0; idx < max_cmp; idx++) {
        if (key[depth + idx] != other.key[depth + idx]) {
            return idx;
        }
    }
    return idx;
}
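A worked example makes the role of depth concrete. The sketch below restates the same loop as a static method on plain byte arrays so it can run on its own; `LcpSketch` and `longestCommonPrefix` are hypothetical names, and the keys are made up for illustration.

```java
// Hypothetical standalone version of the loop above, operating on raw keys.
final class LcpSketch {
    static int longestCommonPrefix(byte[] a, byte[] b, int depth) {
        int maxCmp = Math.min(a.length, b.length) - depth;
        int idx;
        for (idx = 0; idx < maxCmp; idx++) {
            if (a[depth + idx] != b[depth + idx]) {
                return idx; // first differing byte, relative to depth
            }
        }
        return idx; // one key ran out; 0 if depth is already past the shorter key
    }
}
```

For the keys XFOO and XFOA at depth 1, the method compares FOO with FOA and returns 2. The new ArtNode4 would then get the 2-byte prefix FO, and the two leaves would be stored under the bytes at position depth + 2, namely O and A.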

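@Override public boolean insert(): the insert operation

As the figure shows, inserting (...FOA,2) into the tree produces the result on the right. There are two cases:

The key already exists: update the leaf.
The key does not exist. Note that this is the insert of the Leaf class, so the descent from the root has ended at a leaf. Recall lazy expansion from earlier: we take the common prefix of the new leaf and the current leaf, put an inner ArtNode4 in place of the current leaf, and make that ArtNode4 point to both the current leaf and the new leaf, as in the figure: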

@Override public boolean insert(ChildPtr ref, final byte[] key, Object value,
                                int depth, boolean force_clone) throws UnsupportedOperationException {
    boolean clone = force_clone || this.refcount > 1; // path copy vs. in-place update, as in the paper
    if (matches(key)) { // the key matches this leaf: update the existing entry
        if (clone) { // path copy
            // Updating an existing value, but need to create a new leaf to
            // reflect the change
            ref.change(new Leaf(key, value));
        } else {// in-place update
            // Updating an existing value, and safe to make the change in
            // place
            this.value = value;
        }
        return false;
    } else { // insert a new leaf
        // New value
        // Create a new leaf
        Leaf l2 = new Leaf(key, value);
        // Determine longest prefix
        int longest_prefix = longest_common_prefix(l2, depth);
        if (depth + longest_prefix >= this.key.length ||
            depth + longest_prefix >= key.length) {
            throw new UnsupportedOperationException("keys cannot be prefixes of other keys");
        }
        // Split the current leaf into a node4
        ArtNode4 result = new ArtNode4();
        result.partial_len = longest_prefix;
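        Node ref_old = ref.get(); // the node ref pointed to before the change, i.e. this leaf
        ref.change_no_decrement(result); // point ref at the new node without decrementing the old refcount yet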
        System.arraycopy(key, depth,
                         result.partial, 0,
                         Math.min(Node.MAX_PREFIX_LEN, longest_prefix));
        // Add the leafs to the new node4
        result.add_child(ref, this.key[depth + longest_prefix], this);
        result.add_child(ref, l2.key[depth + longest_prefix], l2);
        ref_old.decrement_refcount(); // ref now points to the new inner node instead of the old leaf, so release that reference
        // TODO: avoid the increment to self immediately followed by decrement
        return true;
    }
}
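The UnsupportedOperationException above is raised when one key is a prefix of another, because the split needs a byte at position depth + longest_prefix in both keys. A common way around this restriction in radix tree implementations is to append a terminator byte to every key. The sketch below assumes that approach and also assumes keys never contain a 0 byte; `KeyTerminatorSketch` and its methods are hypothetical and are not part of PART.

```java
import java.util.Arrays;

// Hypothetical illustration: terminated keys can never be prefixes of each other.
final class KeyTerminatorSketch {
    // Appends a 0 byte. Assumes the original key contains no 0 byte.
    static byte[] terminate(byte[] key) {
        return Arrays.copyOf(key, key.length + 1); // the added trailing byte is 0
    }

    // True if a is a prefix of b (including a equal to b).
    static boolean isPrefix(byte[] a, byte[] b) {
        if (a.length > b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

FO is a prefix of FOO, but once both are terminated the third byte differs (0 versus O), so the split always finds a distinguishing byte.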

ArtNode.java

Member variables

int num_children = 0;
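int partial_len = 0; // length of the prefix produced by path compression
final byte[] partial = new byte[Node.MAX_PREFIX_LEN]; // the first bytes of that prefix (at most MAX_PREFIX_LEN)

Methods

prefix_mismatch() finds the first position at which key and the current ArtNode's prefix differ. Because path compression uses the combined optimistic and pessimistic approach, only the first 8 bytes of a prefix are stored in the node. When the prefix is longer than that limit, the remaining bytes are not stored in the node and have to be read from a leaf below it, which the code obtains with minimum().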

public int prefix_mismatch(final byte[] key, int depth) {
    int max_cmp = Math.min(Math.min(Node.MAX_PREFIX_LEN, partial_len), key.length - depth);
    int idx;
    for (idx = 0; idx < max_cmp; idx++) {
        if (partial[idx] != key[depth + idx])
            return idx;
    }
    // If the prefix is short we can avoid finding a leaf
    if (partial_len > Node.MAX_PREFIX_LEN) {
        // Prefix is longer than what we've checked, find a leaf
        final Leaf l = this.minimum();
        max_cmp = Math.min(l.key.length, key.length) - depth;
        for (; idx < max_cmp; idx++) {
            if (l.key[idx + depth] != key[depth + idx])
                return idx;
        }
    }
    return idx;
}

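insert() handles the case where the descent ends at an ArtNode and a new leaf has to be inserted.

If this ArtNode has a prefix, i.e. path compression has been applied:
- If the mismatch lies beyond the prefix (the whole prefix matched), increase depth by partial_len and continue toward the leaf.
- Otherwise, split the current node: create a new node whose prefix is the common prefix, use the byte after the common prefix to distinguish the two keys, and add both the new leaf and the old node (with the common prefix cut off) to the new node.
If there is no prefix, or the whole prefix matched: if a child exists for the next key byte, insert into that child; otherwise add the new leaf to this node.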

@Override public boolean insert(ChildPtr ref, final byte[] key, Object value,
                                int depth, boolean force_clone) {
    boolean do_clone = force_clone || this.refcount > 1;
    // Check if given node has a prefix
    if (partial_len > 0) {
        // Determine if the prefixes differ, since we need to split
        int prefix_diff = prefix_mismatch(key, depth);
        if (prefix_diff >= partial_len) {
            depth += partial_len; // the whole prefix matched, so skip partial_len bytes and continue toward the leaf
        } else {
            // Create a new node
            ArtNode4 result = new ArtNode4();
            Node ref_old = ref.get();
            // ref now points to the new node result
            ref.change_no_decrement(result); // don't decrement yet, because doing so might destroy self
            result.partial_len = prefix_diff;
            System.arraycopy(partial, 0,
                             result.partial, 0,
                             Math.min(Node.MAX_PREFIX_LEN, prefix_diff));
            // Adjust the prefix of the old node
            ArtNode this_writable = do_clone ? (ArtNode)this.n_clone() : this;
            if (partial_len <= Node.MAX_PREFIX_LEN) {
                result.add_child(ref, this_writable.partial[prefix_diff], this_writable);
                this_writable.partial_len -= (prefix_diff + 1);
                System.arraycopy(this_writable.partial, prefix_diff + 1,
                                 this_writable.partial, 0,
                                 Math.min(Node.MAX_PREFIX_LEN, this_writable.partial_len));
            } else {
                this_writable.partial_len -= (prefix_diff+1);
                final Leaf l = this.minimum();
                result.add_child(ref, l.key[depth + prefix_diff], this_writable);
                System.arraycopy(l.key, depth + prefix_diff + 1,
                                 this_writable.partial, 0,
                                 Math.min(Node.MAX_PREFIX_LEN, this_writable.partial_len));
            }
            // Insert the new leaf
            Leaf l = new Leaf(key, value);
            result.add_child(ref, key[depth + prefix_diff], l);
            ref_old.decrement_refcount();
            return true;
        }
    }
    // The rest of the method is not included in this excerpt. As described above, it inserts
    // into the matching child if one exists, and otherwise adds the new leaf to this node.
}

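delete(): the delete operation

If the node has a prefix and key does not match it, the key is not in the tree, so return.
- Otherwise, increase depth by the prefix length.
Look up the child for the next key byte.
- If there is none, return false.
- Otherwise recurse into the child; if that child is the leaf being deleted, remove it from this node with remove_child.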

@Override public boolean delete(ChildPtr ref, final byte[] key, int depth,
                                boolean force_clone) {
    // Bail if the prefix does not match
    if (partial_len > 0) {
        int prefix_len = check_prefix(key, depth);
        if (prefix_len != Math.min(MAX_PREFIX_LEN, partial_len)) {
            return false;
        }
        depth += partial_len;
    }
    boolean do_clone = force_clone || this.refcount > 1;
    // Clone self if necessary. Note: this allocation will be wasted if the
    // key does not exist in the child's subtree
    ArtNode this_writable = do_clone ? (ArtNode)this.n_clone() : this;
    // Find child node
    ChildPtr child = this_writable.find_child(key[depth]);
    if (child == null) return false; // when translating to C++, make sure to delete this_writable
    if (do_clone) {
        ref.change(this_writable);
    }
    boolean child_is_leaf = child.get() instanceof Leaf;
    boolean do_delete = child.get().delete(child, key, depth + 1, do_clone);
    if (do_delete && child_is_leaf) {
        // The leaf to delete is our child, so we must remove it
        this_writable.remove_child(ref, key[depth]);
    }
    return do_delete;
}

ArtNode4.java

Member variables

public static int count; // number of ArtNode4 instances
byte[] keys = new byte[4];
Node[] children = new Node[4];

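Methods

add_child() adds a child

If the node has fewer than 4 children, find the insertion position for the key (keys are kept in ascending order) and insert it there.
Otherwise, convert the node to an ArtNode16 and add the child to that.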

@Override public void add_child(ChildPtr ref, byte c, Node child) {
    assert(refcount <= 1);
    if (this.num_children < 4) {
        int idx;
        for (idx = 0; idx < this.num_children; idx++) {
            if (to_uint(c) < to_uint(keys[idx])) break;
        }
        // Shift to make room
        System.arraycopy(this.keys, idx, this.keys, idx + 1, this.num_children - idx);
        System.arraycopy(this.children, idx, this.children, idx + 1, this.num_children - idx);
        // Insert element
        this.keys[idx] = c;
        this.children[idx] = child;
        child.refcount++;
        this.num_children++;
    } else {
        // Copy the node4 into a new node16
        ArtNode16 result = new ArtNode16(this);
        // Update the parent pointer to the node16
        ref.change(result);
        // Insert the element into the node16 instead
        result.add_child(ref, c, child);
    }
}
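The to_uint helper used above is not shown in this excerpt. Java's byte is signed, so key bytes of 0x80 and above would sort before ASCII characters if compared directly; the helper presumably maps a byte to the range 0 to 255. The sketch below assumes that behavior. `UnsignedByteSketch`, `toUint`, and `insertPosition` are hypothetical names.

```java
// Hypothetical illustration of unsigned byte ordering for sorted key arrays.
final class UnsignedByteSketch {
    // Interprets a signed Java byte as an unsigned value in 0..255.
    static int toUint(byte b) {
        return b & 0xFF;
    }

    // Index at which c must be inserted so that keys[0..n) stays sorted
    // by unsigned value (same scan as in add_child above).
    static int insertPosition(byte[] keys, int n, byte c) {
        int idx = 0;
        while (idx < n && toUint(keys[idx]) <= toUint(c)) idx++;
        return idx;
    }
}
```

With keys 0x10 and 0x7F already present, the byte 0x80 belongs at index 2; a signed comparison would have placed it at index 0.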

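remove_child() removes a child

Note that if only one child remains after the removal and it is not a leaf, path compression takes place: the node's single remaining key byte is appended to partial, and the result is merged into the child.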

@Override public void remove_child(ChildPtr ref, byte c) {
    assert(refcount <= 1);
    int idx;
    for (idx = 0; idx < this.num_children; idx++) {
        if (c == keys[idx]) break;
    }
    if (idx == this.num_children) return;
    assert(children[idx] instanceof Leaf);
    children[idx].decrement_refcount();
    // Shift to fill the hole
    System.arraycopy(this.keys, idx + 1, this.keys, idx, this.num_children - idx - 1);
    System.arraycopy(this.children, idx + 1, this.children, idx, this.num_children - idx - 1);
    this.num_children--;
    // Remove nodes with only a single child
    if (num_children == 1) {
        Node child = children[0];
        if (!(child instanceof Leaf)) {
            if (((ArtNode)child).refcount > 1) {
                child = child.n_clone();
            }
            ArtNode an_child = (ArtNode)child;
            // Concatenate the prefixes
            int prefix = partial_len;
            if (prefix < MAX_PREFIX_LEN) {
                partial[prefix] = keys[0];
                prefix++;
            }
            if (prefix < MAX_PREFIX_LEN) {
                int sub_prefix = Math.min(an_child.partial_len, MAX_PREFIX_LEN - prefix);
                System.arraycopy(an_child.partial, 0, partial, prefix, sub_prefix);
                prefix += sub_prefix;
            }
            // Store the prefix in the child
            System.arraycopy(partial, 0, an_child.partial, 0, Math.min(prefix, MAX_PREFIX_LEN));
            an_child.partial_len += partial_len + 1;
        }
        ref.change(child);
    }
}
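The prefix concatenation in the middle of remove_child can be followed with a small standalone example. The sketch below mirrors the same steps on plain arrays under the 8-byte cap; `PrefixMergeSketch`, `mergeStored`, and `mergedLength` are hypothetical names, and the input arrays are assumed to hold at least the stored part of each prefix.

```java
import java.util.Arrays;

// Hypothetical illustration of merging parent prefix + edge byte + child prefix.
final class PrefixMergeSketch {
    static final int MAX_PREFIX_LEN = 8;

    // Returns the bytes that would be stored for the merged prefix (at most 8).
    static byte[] mergeStored(byte[] parent, int parentLen, byte edge,
                              byte[] child, int childLen) {
        byte[] out = new byte[MAX_PREFIX_LEN];
        int n = Math.min(parentLen, MAX_PREFIX_LEN);
        System.arraycopy(parent, 0, out, 0, n);      // parent's stored prefix
        if (n < MAX_PREFIX_LEN) out[n++] = edge;     // the single remaining key byte
        if (n < MAX_PREFIX_LEN) {
            int sub = Math.min(childLen, MAX_PREFIX_LEN - n);
            System.arraycopy(child, 0, out, n, sub); // as much child prefix as fits
            n += sub;
        }
        return Arrays.copyOf(out, n);
    }

    // The logical length is not capped: it counts skipped bytes as well.
    static int mergedLength(int parentLen, int childLen) {
        return parentLen + 1 + childLen;
    }
}
```

Merging parent prefix AB, edge byte C, and child prefix DE stores ABCDE with length 5. With parent prefix ABCDEF, edge byte G, and child prefix HIJ, only ABCDEFGH is stored while the logical length is 10, so the last two bytes fall under the optimistic scheme.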

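ArtNode16.java, ArtNode48.java, and ArtNode256.java are implemented along the same lines as ArtNode4.java; the differences follow from the data structures section above. The remaining files contain supporting code such as iterators, which is easy to follow and is not covered here.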
