Netty Memory Pool: PoolArena

After all the groundwork laid in the previous posts, this post finally gives the full picture of how Netty manages its memory pool.

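PoolArena is an abstract class. Its subclasses HeapArena and DirectArena correspond to heap memory (heap buffer) and off-heap direct memory (direct buffer); apart from the kind of memory they operate on (byte[] versus ByteBuffer), the two classes are essentially identical. PoolArena manages the whole series of classes covered earlier, and this post walks through its implementation details. The interface it implements, PoolArenaMetric, only exposes statistics, so we will ignore it for now.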

Let's start with the fields of PoolArena:

    static final int numTinySubpagePools = 512 >>> 4;

    final PooledByteBufAllocator parent;

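    private final int maxOrder;// height of the chunk's full binary tree
    final int pageSize;// size of a single page
    final int pageShifts;// log2(pageSize), used in index calculations
    final int chunkSize;// size of a chunk
    final int subpageOverflowMask; // mask used to test whether a request is Tiny/Small
    final int numSmallSubpagePools;// number of doubly linked list heads for small requests
    final int directMemoryCacheAlignment;// alignment base
    final int directMemoryCacheAlignmentMask;// mask used to align memory
    private final PoolSubpage<T>[] tinySubpagePools;// heads of the tiny subpage doubly linked lists
    private final PoolSubpage<T>[] smallSubpagePools;// heads of the small subpage doubly linked lists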

There are also several PoolChunkList fields, which are the nodes of a linked list:

    private final PoolChunkList<T> q050;
    private final PoolChunkList<T> q025;
    private final PoolChunkList<T> q000;
    private final PoolChunkList<T> qInit;
    private final PoolChunkList<T> q075;
    private final PoolChunkList<T> q100;

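The terms tiny and small that appear above deserve an explanation. A picture is worth a thousand words, so here is a diagram I found online:

[Figure: memory block size classes Tiny / Small / Normal / Huge]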

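In other words, memory blocks of different sizes go by different names. Blocks allocated in whole pages from a chunk are Normal. One page is exactly 8 KB by default, so blocks smaller than 8 KB are Tiny or Small, and among those the ones smaller than 512 B are Tiny. By the same logic, blocks too large to fit in a chunk are Huge.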

    enum SizeClass {
        Tiny,
        Small,
        Normal
        // any request outside these classes is Huge
    }
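The size classes can be made concrete with a small sketch. The class SizeClassSketch and its classify method are hypothetical (they are not part of Netty), and the 8 KiB page and 16 MiB chunk are assumed defaults rather than values taken from the code above.

```java
// Hypothetical helper, not Netty code: classifies a normalized capacity
// with plain comparisons, assuming an 8 KiB page and a 16 MiB chunk.
final class SizeClassSketch {
    static final int PAGE_SIZE = 8192;             // assumed default pageSize
    static final int CHUNK_SIZE = PAGE_SIZE << 11; // assumed maxOrder = 11, i.e. 16 MiB

    static String classify(int normCapacity) {
        if (normCapacity < 512) {
            return "Tiny";   // served from a subpage, below 512 B
        }
        if (normCapacity < PAGE_SIZE) {
            return "Small";  // served from a subpage, below one page
        }
        if (normCapacity <= CHUNK_SIZE) {
            return "Normal"; // served as whole pages of a chunk
        }
        return "Huge";       // does not fit in a chunk, allocated unpooled
    }

    public static void main(String[] args) {
        int[] samples = {16, 496, 512, 4096, 8192, CHUNK_SIZE, CHUNK_SIZE + 1};
        for (int size : samples) {
            System.out.println(size + " -> " + classify(size));
        }
    }
}
```

Note that the enum itself has no Huge constant; Huge requests are simply the ones that fall through all three cases.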

Next comes the PoolArena constructor. It does quite a lot, so we will go through it piece by piece.

        this.parent = parent;
        this.pageSize = pageSize;
        this.maxOrder = maxOrder;
        this.pageShifts = pageShifts;
        this.chunkSize = chunkSize;
        directMemoryCacheAlignment = cacheAlignment;
        directMemoryCacheAlignmentMask = cacheAlignment - 1;
        subpageOverflowMask = ~(pageSize - 1);

The lines above simply assign initial values to the fields.

        tinySubpagePools = newSubpagePoolArray(numTinySubpagePools);
        for (int i = 0; i < tinySubpagePools.length; i ++) {
            tinySubpagePools[i] = newSubpagePoolHead(pageSize);
        }

        numSmallSubpagePools = pageShifts - 9;
        smallSubpagePools = newSubpagePoolArray(numSmallSubpagePools);
        for (int i = 0; i < smallSubpagePools.length; i ++) {
            smallSubpagePools[i] = newSubpagePoolHead(pageSize);
        }

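This initializes tinySubpagePools and smallSubpagePools with one list head per size, which ties back to the subpage mechanism analyzed in the earlier posts.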
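To see why the arrays have 512 >>> 4 = 32 and pageShifts - 9 slots, it helps to look at how a normalized size maps to a slot. The sketch below is written from memory of the tinyIdx and smallIdx helpers in Netty 4.1, so treat the method bodies as an assumption; pageShifts = 13 (an 8192-byte page) is also assumed.

```java
// Sketch of the slot lookup for the two subpage pool arrays.
// The method bodies are reconstructed from memory, not copied from this post.
final class SubpageIndexSketch {
    // Tiny sizes are multiples of 16 below 512, so dividing by 16 gives the slot.
    static int tinyIdx(int normCapacity) {
        return normCapacity >>> 4;
    }

    // Small sizes are powers of two starting at 512: 512 -> 0, 1024 -> 1, and so on.
    static int smallIdx(int normCapacity) {
        int tableIdx = 0;
        int i = normCapacity >>> 10;
        while (i != 0) {
            i >>>= 1;
            tableIdx++;
        }
        return tableIdx;
    }

    public static void main(String[] args) {
        int pageShifts = 13; // assumed: log2 of an 8192-byte page
        System.out.println("tiny slots  = " + (512 >>> 4));
        System.out.println("small slots = " + (pageShifts - 9));
        System.out.println("tinyIdx(16)    = " + tinyIdx(16));
        System.out.println("tinyIdx(496)   = " + tinyIdx(496));
        System.out.println("smallIdx(512)  = " + smallIdx(512));
        System.out.println("smallIdx(4096) = " + smallIdx(4096));
    }
}
```

Under these assumptions slot 0 of the tiny array is never hit by a non-zero size, because the smallest tiny size, 16, already maps to slot 1.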

        q100 = new PoolChunkList<T>(this, null, 100, Integer.MAX_VALUE, chunkSize);
        q075 = new PoolChunkList<T>(this, q100, 75, 100, chunkSize);
        q050 = new PoolChunkList<T>(this, q075, 50, 100, chunkSize);
        q025 = new PoolChunkList<T>(this, q050, 25, 75, chunkSize);
        q000 = new PoolChunkList<T>(this, q025, 1, 50, chunkSize);
        qInit = new PoolChunkList<T>(this, q000, Integer.MIN_VALUE, 25, chunkSize);

        q100.prevList(q075);
        q075.prevList(q050);
        q050.prevList(q025);
        q025.prevList(q000);
        q000.prevList(null);
        qInit.prevList(qInit);

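The names of these PoolChunkLists are meaningful: they reflect memory usage. A freshly created chunk first enters qInit. As its usage grows it moves step by step from q000 toward q100, and as memory is released and its usage drops it moves back from q100 toward q000. The table below lists the bounds: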

State	Min usage (%)	Max usage (%)
QINIT	Integer.MIN_VALUE	25
Q0	1	50
Q25	25	75
Q50	50	100
Q75	75	100
Q100	100	Integer.MAX_VALUE

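In other words, each PoolChunkList is configured with the corresponding pair of bounds. Every chunk it holds has a usage inside that range, and a chunk whose usage leaves the range is moved automatically to the appropriate list.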

Here is the method used to link the PoolChunkLists together:

    void prevList(PoolChunkList<T> prevList) {
        assert this.prevList == null;
        this.prevList = prevList;
    }

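From the constructor above we can work out the shape of the chain: the next pointers run qInit -> q000 -> q025 -> q050 -> q075 -> q100, and the prev pointers run in the opposite direction, except that q000 has no predecessor and qInit points to itself.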

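So if a chunk sits in Q25 and its usage drops below 25%, it moves to Q0. If its usage then drops to 0, it is no longer kept and its memory is reclaimed completely, because Q0 has no prev pointer. QInit behaves differently: a chunk there is not released even when it is completely free, so it always stays in memory (QInit's prev pointer refers to itself). Later allocations can then reuse it without creating a new chunk, which saves allocation time.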
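The migration rules can be simulated with a small model. ChunkListSketch, buildChain, afterAllocate and afterFree are hypothetical names, and the model assumes that a chunk moves forward when its usage reaches the list's maximum and backward when it falls below the list's minimum; only the bounds and the links come from the constructor shown earlier.

```java
// Simplified model of the PoolChunkList chain: only bounds and links, no real memory.
final class ChunkListSketch {
    final String name;
    final int minUsage;
    final int maxUsage;
    final ChunkListSketch next;
    ChunkListSketch prev;

    ChunkListSketch(String name, ChunkListSketch next, int minUsage, int maxUsage) {
        this.name = name;
        this.next = next;
        this.minUsage = minUsage;
        this.maxUsage = maxUsage;
    }

    // Builds the chain with the same bounds and links as the PoolArena constructor.
    static ChunkListSketch[] buildChain() {
        ChunkListSketch q100 = new ChunkListSketch("q100", null, 100, Integer.MAX_VALUE);
        ChunkListSketch q075 = new ChunkListSketch("q075", q100, 75, 100);
        ChunkListSketch q050 = new ChunkListSketch("q050", q075, 50, 100);
        ChunkListSketch q025 = new ChunkListSketch("q025", q050, 25, 75);
        ChunkListSketch q000 = new ChunkListSketch("q000", q025, 1, 50);
        ChunkListSketch qInit = new ChunkListSketch("qInit", q000, Integer.MIN_VALUE, 25);
        q100.prev = q075;
        q075.prev = q050;
        q050.prev = q025;
        q025.prev = q000;
        q000.prev = null;   // a chunk leaving q000 backward is destroyed
        qInit.prev = qInit; // never followed, because qInit's minimum is Integer.MIN_VALUE
        return new ChunkListSketch[] {qInit, q000, q025, q050, q075, q100};
    }

    // List that holds the chunk after an allocation raised its usage (assumed rule).
    ChunkListSketch afterAllocate(int usage) {
        ChunkListSketch list = this;
        while (list.next != null && usage >= list.maxUsage) {
            list = list.next;
        }
        return list;
    }

    // List that holds the chunk after a free lowered its usage; null means destroyed.
    ChunkListSketch afterFree(int usage) {
        ChunkListSketch list = this;
        while (usage < list.minUsage) {
            if (list.prev == null) {
                return null;
            }
            list = list.prev;
        }
        return list;
    }

    public static void main(String[] args) {
        ChunkListSketch[] chain = buildChain();
        System.out.println("qInit, usage rises to 30: " + chain[0].afterAllocate(30).name);
        System.out.println("q025, usage falls to 10:  " + chain[2].afterFree(10).name);
        System.out.println("q000, usage falls to 0:   " + chain[1].afterFree(0));
        System.out.println("qInit, usage falls to 0:  " + chain[0].afterFree(0).name);
    }
}
```

The overlapping ranges (for example 25 to 75 for q025 and 50 to 100 for q050) mean a chunk hovering around a boundary does not bounce between two lists on every allocation and free.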

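When allocating, the search starts with the chunk lists of low to medium usage (q050 first, q075 last), where an allocation is more likely to succeed, so the average search time is shorter: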

    private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
        if (q050.allocate(buf, reqCapacity, normCapacity) || q025.allocate(buf, reqCapacity, normCapacity) ||
            q000.allocate(buf, reqCapacity, normCapacity) || qInit.allocate(buf, reqCapacity, normCapacity) ||
            q075.allocate(buf, reqCapacity, normCapacity)) {
            return;
        }

        // Add a new chunk.
        PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
        long handle = c.allocate(normCapacity);
        assert handle > 0;
        c.initBuf(buf, handle, reqCapacity);
        qInit.add(c);
    }

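The source comments point out that this method must be called while holding the arena lock (its callers wrap it in synchronized (this)). Access to a chunk is not thread-safe by itself, so the actual allocation has to be serialized to keep the same memory block from being handed to more than one buffer. The method also shows the search order: lower-usage lists are tried before q075, but why not start from q000? (The answer below was found online, and the analysis is spot on.)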

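From the analysis of PoolChunkList we know that as a chunk's memory keeps being released, the chunk keeps moving to the prev list of its current chunk list, until it is completely free and gets reclaimed. If allocation started from q000, it might be faster (because the chance of success is higher), but a chunk with usage under 25% would be much more likely to receive new allocations, so the chance of a chunk ever being reclaimed would drop sharply. That leads to a problem: a real application has traffic peaks during which memory consumption is several times the usual level, so several times as many chunks are allocated. Once the peak has passed, chunks would be reclaimed very slowly (a chunk that is not completely free cannot be reclaimed), and a lot of memory would be wasted.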

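Why start from q050, then? q050 holds chunks with 50% to 100% usage. The guess is that this raises the memory utilization of the whole application: most allocations are served from q050, so when memory demand is moderate, chunks with low usage (below 50%) gradually drain and are eventually reclaimed. But why not start from qInit, whose chunks have low usage yet are never reclaimed, which looks like waste? That question is left open here. q075 is tried last because its chunks are heavily used and an allocation there is less likely to succeed. (The quoted analysis also asks why q100, whose chunks are 100% used, would be tried at all; in the version of allocateNormal shown above, q100 is in fact not tried.)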

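Further down, if none of the lists can satisfy the request, a new chunk is created, the space is allocated from it, and the chunk is added to qInit.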

While we are here, let's also look at how Huge requests are allocated:

    private void allocateHuge(PooledByteBuf<T> buf, int reqCapacity) {
        PoolChunk<T> chunk = newUnpooledChunk(reqCapacity);
        activeBytesHuge.add(chunk.chunkSize());
        buf.initUnpooled(chunk, reqCapacity);
        allocationsHuge.increment();
    }

This goes straight to buf.initUnpooled(chunk, reqCapacity) with no pooling optimization at all, probably because Huge requests are rare.

Now for the part we have been waiting for: the complete allocation path.

    private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
        final int normCapacity = normalizeCapacity(reqCapacity);
        if (isTinyOrSmall(normCapacity)) { // capacity < pageSize
            int tableIdx;
            PoolSubpage<T>[] table;
            boolean tiny = isTiny(normCapacity);
            if (tiny) { // < 512
                if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                    // was able to allocate out of the cache so move on
                    return;
                }
                tableIdx = tinyIdx(normCapacity);
                table = tinySubpagePools;
            } else {
                if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                    // was able to allocate out of the cache so move on
                    return;
                }
                tableIdx = smallIdx(normCapacity);
                table = smallSubpagePools;
            }

            final PoolSubpage<T> head = table[tableIdx];

            /**
             * Synchronize on the head. This is needed as {@link PoolChunk#allocateSubpage(int)} and
             * {@link PoolChunk#free(long)} may modify the doubly linked list as well.
             */
            synchronized (head) {
                final PoolSubpage<T> s = head.next;
                if (s != head) {
                    assert s.doNotDestroy && s.elemSize == normCapacity;
                    long handle = s.allocate();
                    assert handle >= 0;
                    s.chunk.initBufWithSubpage(buf, handle, reqCapacity);
                    incTinySmallAllocation(tiny);
                    return;
                }
            }
            synchronized (this) {
                allocateNormal(buf, reqCapacity, normCapacity);
            }

            incTinySmallAllocation(tiny);
            return;
        }
        if (normCapacity <= chunkSize) {
            if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
                // was able to allocate out of the cache so move on
                return;
            }
            synchronized (this) {
                allocateNormal(buf, reqCapacity, normCapacity);
                ++allocationsNormal;
            }
        } else {
            // Huge allocations are never served via the cache so just call allocateHuge
            allocateHuge(buf, reqCapacity);
        }
    }

With the whole method in front of us, let's go through it step by step. Requests are first separated by size:

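1. For Tiny/Small requests, table is set to tinySubpagePools or smallSubpagePools. The thread cache is tried first. If that fails, the request goes to the matching PoolSubpage list (for example, size 16 is served from tinySubpagePools[1], since tinyIdx is normCapacity >>> 4, and size 512 is served from smallSubpagePools[0], and so on), which requires locking the list head. If the doubly linked list is still empty (only the head is present), allocateNormal is used to take one page from a chunk; the page is split into blocks of the requested size, the first block is handed out, and the subpage is added to the doubly linked list (call order: arena -> chunkList -> chunk -> subpage).

2. For Normal requests, the thread cache is tried first; if it has nothing suitable, allocateNormal allocates a run of contiguous pages that satisfies the request.

3. For Huge requests, the memory is allocated directly as an unpooled chunk.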
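The first line of allocate calls normalizeCapacity, which is not shown in this post. The sketch below is a simplified reconstruction from memory of Netty 4.1 and should be read as an assumption: it ignores direct-memory alignment, assumes a 16 MiB chunk, and the class name NormalizeSketch is hypothetical.

```java
// Simplified sketch of capacity normalization (alignment handling omitted).
final class NormalizeSketch {
    static final int CHUNK_SIZE = 16 * 1024 * 1024; // assumed default chunk size

    static int normalizeCapacity(int reqCapacity) {
        if (reqCapacity < 0) {
            throw new IllegalArgumentException("reqCapacity: " + reqCapacity);
        }
        if (reqCapacity >= CHUNK_SIZE) {
            return reqCapacity; // chunk-sized or larger: left as requested
        }
        if (reqCapacity >= 512) {
            // Round up to the next power of two by smearing the highest set bit downward.
            int n = reqCapacity - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return n + 1;
        }
        // Below 512: round up to the next multiple of 16.
        if ((reqCapacity & 15) == 0) {
            return reqCapacity;
        }
        return (reqCapacity & ~15) + 16;
    }

    public static void main(String[] args) {
        int[] samples = {1, 16, 17, 500, 512, 513, 9000};
        for (int size : samples) {
            System.out.println(size + " -> " + normalizeCapacity(size));
        }
    }
}
```

Because every normalized size is either a multiple of 16 or a power of two, each one maps to exactly one slot in the subpage pool arrays or to a whole number of pages.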

The size-class checks rely on some neat bit operations that are worth a look:

    // capacity < pageSize
    boolean isTinyOrSmall(int normCapacity) {
        return (normCapacity & subpageOverflowMask) == 0;
    }

    // normCapacity < 512
    static boolean isTiny(int normCapacity) {
        return (normCapacity & 0xFFFFFE00) == 0;
    }
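A quick way to convince yourself that the masks behave like the comparisons in the comments is to check them against each other. The sketch below assumes pageSize = 8192; the class name MaskSketch is hypothetical and the two methods are standalone copies of the checks above with pageSize passed in explicitly.

```java
// Checks that the bit-mask tests agree with plain comparisons for non-negative sizes.
final class MaskSketch {
    static boolean isTinyOrSmall(int normCapacity, int pageSize) {
        int subpageOverflowMask = ~(pageSize - 1); // all bits at or above log2(pageSize)
        return (normCapacity & subpageOverflowMask) == 0;
    }

    static boolean isTiny(int normCapacity) {
        return (normCapacity & 0xFFFFFE00) == 0;   // 0xFFFFFE00 == ~(512 - 1)
    }

    public static void main(String[] args) {
        int pageSize = 8192; // assumed default
        System.out.println("mask = 0x" + Integer.toHexString(~(pageSize - 1)));
        boolean allMatch = true;
        for (int c = 0; c <= 2 * pageSize; c++) {
            allMatch &= isTinyOrSmall(c, pageSize) == (c < pageSize);
            allMatch &= isTiny(c) == (c < 512);
        }
        System.out.println("masks agree with comparisons: " + allMatch);
    }
}
```

The trick works only because pageSize and 512 are powers of two: a value is below a power of two exactly when none of the bits at or above that power are set.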

Next, the complete deallocation path:

    void free(PoolChunk<T> chunk, long handle, int normCapacity, PoolThreadCache cache) {
        if (chunk.unpooled) {
            int size = chunk.chunkSize();
            destroyChunk(chunk);
            activeBytesHuge.add(-size);
            deallocationsHuge.increment();
        } else {
            SizeClass sizeClass = sizeClass(normCapacity);
            if (cache != null && cache.add(this, chunk, handle, normCapacity, sizeClass)) {
                // cached so not free it.
                return;
            }

            freeChunk(chunk, handle, sizeClass);
        }
    }

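If the memory is Huge (an unpooled chunk), the chunk is destroyed right away (through an abstract method implemented by the subclass) and the statistics are updated. Otherwise the size class is determined; if the block can be put into the thread cache it is cached, and if not it is released through freeChunk.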

    void freeChunk(PoolChunk<T> chunk, long handle, SizeClass sizeClass) {
        final boolean destroyChunk;
        synchronized (this) {
            switch (sizeClass) {
            case Normal:
                ++deallocationsNormal;
                break;
            case Small:
                ++deallocationsSmall;
                break;
            case Tiny:
                ++deallocationsTiny;
                break;
            default:
                throw new Error();
            }
            destroyChunk = !chunk.parent.free(chunk, handle);
        }
        if (destroyChunk) {
            // destroyChunk not need to be called while holding the synchronized lock.
            destroyChunk(chunk);
        }
    }

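Here parent is the chunk's PoolChunkList. Its free method first releases the space identified by handle, then moves the chunk from its current qXXX list back toward q000 as its usage drops. If the chunk ends up completely free and drops off the front of the chain, free returns false and the abstract method destroyChunk(chunk) is called (implemented by the concrete subclass).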

Note that this class overrides Object's finalize() method, which may be called before the object is garbage collected:

    @Override
    protected final void finalize() throws Throwable {
        try {
            super.finalize();
        } finally {
            destroyPoolSubPages(smallSubpagePools);
            destroyPoolSubPages(tinySubpagePools);
            destroyPoolChunkLists(qInit, q000, q025, q050, q075, q100);
        }
    }

    private static void destroyPoolSubPages(PoolSubpage<?>[] pages) {
        for (PoolSubpage<?> page : pages) {
            page.destroy();
        }
    }

    private void destroyPoolChunkLists(PoolChunkList<T>... chunkLists) {
        for (PoolChunkList<T> chunkList: chunkLists) {
            chunkList.destroy(this);
        }
    }

One more method in this class is worth a look: reallocate, which resizes a buffer.

    void reallocate(PooledByteBuf<T> buf, int newCapacity, boolean freeOldMemory) {
        if (newCapacity < 0 || newCapacity > buf.maxCapacity()) {
            throw new IllegalArgumentException("newCapacity: " + newCapacity);
        }

        int oldCapacity = buf.length;
        if (oldCapacity == newCapacity) {
            return;
        }

        PoolChunk<T> oldChunk = buf.chunk;
        long oldHandle = buf.handle;
        T oldMemory = buf.memory;
        int oldOffset = buf.offset;
        int oldMaxLength = buf.maxLength;
        int readerIndex = buf.readerIndex();
        int writerIndex = buf.writerIndex();

        allocate(parent.threadCache(), buf, newCapacity);
        if (newCapacity > oldCapacity) {
            memoryCopy(
                    oldMemory, oldOffset,
                    buf.memory, buf.offset, oldCapacity);
        } else if (newCapacity < oldCapacity) {
            if (readerIndex < newCapacity) {
                if (writerIndex > newCapacity) {
                    writerIndex = newCapacity;
                }
                memoryCopy(
                        oldMemory, oldOffset + readerIndex,
                        buf.memory, buf.offset + readerIndex, writerIndex - readerIndex);
            } else {
                readerIndex = writerIndex = newCapacity;
            }
        }

        buf.setIndex(readerIndex, writerIndex);

        if (freeOldMemory) {
            free(oldChunk, oldHandle, oldMaxLength, buf.cache);
        }
    }

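Step by step: if the new capacity equals the old one, the method returns immediately. Otherwise it first allocates a new region of the requested size. If the old capacity is smaller than the new one, the whole content of the old memory is copied into the new memory. If the buffer shrinks, only the readable part of the old data is copied, with writerIndex clamped to the new capacity; and if even readerIndex lies beyond the new capacity, nothing readable is left, so both indices are set to newCapacity. Finally readerIndex and writerIndex are set on the buffer, and the old memory is freed if freeOldMemory is true.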
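The copy rules are easier to follow on a plain byte[]. The sketch below is a hypothetical stand-in (ReallocateSketch is not a Netty class): it replaces the pooled allocation with new byte[...] and drops the explicit free step, keeping only the index and copy logic described above.

```java
import java.util.Arrays;

// Stand-in for a pooled buffer: a byte[] plus reader and writer indices.
final class ReallocateSketch {
    byte[] memory;
    int readerIndex;
    int writerIndex;

    ReallocateSketch(byte[] memory, int readerIndex, int writerIndex) {
        this.memory = memory;
        this.readerIndex = readerIndex;
        this.writerIndex = writerIndex;
    }

    void reallocate(int newCapacity) {
        int oldCapacity = memory.length;
        if (oldCapacity == newCapacity) {
            return;
        }
        byte[] oldMemory = memory;
        memory = new byte[newCapacity]; // stands in for allocate(...)
        if (newCapacity > oldCapacity) {
            // Growing: keep everything that was in the old region.
            System.arraycopy(oldMemory, 0, memory, 0, oldCapacity);
        } else if (readerIndex < newCapacity) {
            // Shrinking: keep only readable bytes that still fit.
            if (writerIndex > newCapacity) {
                writerIndex = newCapacity;
            }
            System.arraycopy(oldMemory, readerIndex, memory, readerIndex, writerIndex - readerIndex);
        } else {
            // Shrinking below readerIndex: nothing readable survives.
            readerIndex = writerIndex = newCapacity;
        }
    }

    public static void main(String[] args) {
        ReallocateSketch buf = new ReallocateSketch(new byte[] {1, 2, 3, 4, 5, 6, 0, 0}, 2, 6);
        buf.reallocate(4);
        System.out.println(Arrays.toString(buf.memory)
                + " reader=" + buf.readerIndex + " writer=" + buf.writerIndex);
    }
}
```

Bytes before readerIndex are not copied when shrinking, since they have already been consumed and their content no longer matters.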

By now the overall structure of the memory pool should be mostly clear: Arena -> chunkList -> chunk, and Arena -> subpage.

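What we have not yet covered are the details inside the cache and, moving outward, the details of PooledByteBuf. (Notice that we have already touched the buffer itself and are getting close to the usage layer; readerIndex and writerIndex should look familiar.)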

Finally, here are the remaining abstract methods:

    // Tells whether the subclass is backed by heap or direct memory
    abstract boolean isDirect();
    // Creates a new chunk; called when serving Tiny/Small and Normal requests
    protected abstract PoolChunk<T> newChunk(int pageSize, int maxOrder, int pageShifts, int chunkSize);
    // Creates a new unpooled chunk; called when serving Huge requests
    protected abstract PoolChunk<T> newUnpooledChunk(int capacity);
    // Creates a new PooledByteBuf of the matching kind
    protected abstract PooledByteBuf<T> newByteBuf(int maxCapacity);
    // Copies memory; called when a ByteBuf's capacity changes
    protected abstract void memoryCopy(T src, int srcOffset, T dst, int dstOffset, int length);
    // Destroys a chunk; called when its memory is released
    protected abstract void destroyChunk(PoolChunk<T> chunk);
