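A Study of STL Allocator Source Code

Overview

This article walks through several allocator implementations: a simple wrapper around operator new and operator delete, the implementation in VS2015, the implementation in STLport, and a memory pool modeled on STLport.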

1. References

http://www.cplusplus.com/reference/memory/allocator/
Hou Jie, 《STL源码剖析》 (The Annotated STL Sources)
C++ Primer, 5th Edition
Generic Programming and the STL
MSDN

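2. Introduction

std::allocator is the allocator that STL containers use by default. It is also the only general-purpose allocator predefined by the standard library (before C++17 added std::pmr::polymorphic_allocator).

3. Implementation 1: the simplest implementation

3.1 Code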
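To make the allocator's role concrete, here is a minimal sketch of what a container does with it internally: allocation and construction are separate steps, as are destruction and deallocation. The function name roundTrip is a hypothetical helper introduced only for this illustration, and the sketch ignores exception safety.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Allocate raw storage for one std::string, construct it, read it, then
// destroy it and release the storage, all through the allocator interface.
inline std::string roundTrip(const char* text)
{
    std::allocator<std::string> alloc;
    using Traits = std::allocator_traits<std::allocator<std::string>>;

    std::string* p = Traits::allocate(alloc, 1);  // raw, uninitialized memory
    Traits::construct(alloc, p, text);            // construct the object in place
    std::string copy = *p;                        // use it like any other object
    Traits::destroy(alloc, p);                    // run the destructor only
    Traits::deallocate(alloc, p, 1);              // hand the memory back
    return copy;
}
```

Going through std::allocator_traits rather than calling the members directly is the form that also works with allocators that define only the minimal interface.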

template<class T>
class allocator
{
public:
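    // Q1: Why are the following members needed, and what are they for?
    typedef T          value_type;
    typedef T*         pointer;
    typedef const T*   const_pointer;
    typedef T&         reference;
    typedef const T&   const_reference;
    typedef size_t     size_type;       // size_t is an unsigned integer type
    // ptrdiff_t is a signed integer type: the result type of subtracting two pointers
    typedef ptrdiff_t  difference_type;

    // Q2: What is this for, and why is the template parameter U rather than the allocator's T?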
    template<class U>
    struct rebind
    {
        typedef allocator<U> other;
    };

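    // Default constructor: does nothing
    allocator() noexcept
    {
    }

    // Converting constructor: also does nothing
    // Q3: Why is this converting constructor needed? Is it sensible to copy between allocators of different types?
    template<class U>
    allocator(const allocator<U>&) noexcept
    {
    }

    // Destructor: does nothing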
    ~allocator() noexcept
    {
    }

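    // Return the address of an object
    pointer address(reference val) const noexcept
    {
        // The non-const overload calls the const overload; see Effective C++, Item 3.
        // address() returns a pointer, so the cast target must be pointer, not reference.
        return const_cast<pointer>(address(static_cast<const_reference>(val)));
    }

    // Return the address of a const object
    const_pointer address(const_reference val) const noexcept
    {
        return &val;
    }

    // Allocate memory; count is a number of objects, not a number of bytes.
    // Q4: What is hint for?
    // hint is declared as const void* because the allocator<void> specialization
    // is only declared further below and cannot be named here portably.
    pointer allocate(size_type count, const void* hint = nullptr)
    {
        (void)hint; // this simple implementation ignores the hint
        return static_cast<pointer>(::operator new(count * sizeof(value_type)));
    }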

    // Release memory
    void deallocate(pointer ptr, size_type count)
    {
        ::operator delete(ptr);
    }

    // Maximum number of objects (not bytes) that can be allocated
    size_type max_size() const noexcept
    {
        return (static_cast<size_type>(-1) / sizeof(value_type));
    }

    // Construct an object; Args is a template parameter pack, see C++ Primer, 5th Edition, section 16.4
    template <class U, class... Args>
    void construct(U* p, Args&&... args)
    {
        ::new ((void *)p) U(::std::forward<Args>(args)...);
    }

    // Destroy an object
    template <class U>
    void destroy(U* p)
    {
        p->~U(); // a destructor can be invoked through the template parameter's name
    }
    }
};

// Q5: Why is there a specialization for void?
template<>
class allocator<void>
{
public:
    typedef void value_type;
    typedef void *pointer;
    typedef const void *const_pointer;
    template <class U> struct rebind
    {
        typedef allocator<U> other;
    };
};

3.2 Answers to the questions

1. The STL specification requires them, and these types are also used by iterators and the traits technique.

2. From MSDN: A structure that enables an allocator for objects of one type to allocate storage for objects of another type.
This structure is useful for allocating memory for type that differs from the element type of the container being implemented.
The member template class defines the type other. Its sole purpose is to provide the type name allocator<_Other>, given the type name allocator<Type>.

For example, given an allocator object al of type A, you can allocate an object of type _Other with the expression:
A::rebind<Other>::other(al).allocate(1, (Other *)0)
Or, you can name its pointer type by writing the type:
A::rebind<Other>::other::pointer

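A concrete example: take list<int>, a list of int. What the list stores is not the bare int but an internal structure that holds the int plus pointers to the previous and next elements. How, then, does list<int, allocator<int>> allocate this internal structure, given that allocator<int> only knows how to allocate space for int? This is the problem rebind solves: allocator<int>::rebind<_Node>::other names the allocator type that allocates space for _Node.

3. allocator has a single template parameter, the element type. If an allocator encapsulates only the allocation strategy, independent of the element type, a converting copy constructor is reasonable. It is also needed in practice: without it, a container could not build the rebound allocator from the allocator it was given. construct and destroy are member templates as well, so the requirements for using an allocator are quite loose.

4. hint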
Either 0 or a value previously obtained by another call to allocate and not yet freed with deallocate.
When it is not 0, this value may be used as a hint to improve performance by allocating the new block near the one specified. The address of an adjacent element is often a good choice.

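5. There are void* objects but no objects of type void, and void& is ill-formed. Members of the primary template such as reference, const_reference and sizeof(value_type) therefore cannot compile for T = void, so the specialization provides only the members that still make sense.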
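The rebind mechanism from answer 2 and the converting constructor from answer 3 can be sketched with the standard std::allocator_traits interface. Node and storeInNode are hypothetical names used only for this illustration; a real list implementation has its own internal node type.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical list node, standing in for a container's internal node type.
struct Node
{
    int   value;
    Node* prev;
    Node* next;
};

// The allocator type the user passed to the container.
using IntAlloc = std::allocator<int>;
// The allocator type the container actually uses for its nodes.
using NodeAlloc = std::allocator_traits<IntAlloc>::rebind_alloc<Node>;

static_assert(std::is_same<NodeAlloc, std::allocator<Node>>::value,
              "rebinding allocator<int> to Node yields allocator<Node>");

// Build a node allocator from the user's allocator (this is what the
// converting constructor is for) and allocate one node with it.
inline int storeInNode(int value)
{
    IntAlloc  userAlloc;
    NodeAlloc nodeAlloc(userAlloc);
    using Traits = std::allocator_traits<NodeAlloc>;

    Node* n = Traits::allocate(nodeAlloc, 1);
    Traits::construct(nodeAlloc, n, Node{value, nullptr, nullptr});
    const int stored = n->value;
    Traits::destroy(nodeAlloc, n);
    Traits::deallocate(nodeAlloc, n, 1);
    return stored;
}
```

The user never sees NodeAlloc: the container derives it from the allocator type it was given, which is why every allocator must be convertible across element types.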

4. Implementation 2: the VS2015 implementation

4.1 Code (parts similar to implementation 1 are omitted)

template<class _Ty>
	class allocator
	{	// generic allocator for objects of class _Ty
public:
	static_assert(!is_const<_Ty>::value,
		"The C++ Standard forbids containers of const elements "
		"because allocator<const T> is ill-formed.");

……

	template<class _Other>
		allocator<_Ty>& operator=(const allocator<_Other>&)
		{	// assign from a related allocator (do nothing)
		return (*this);
		}

	void deallocate(pointer _Ptr, size_type _Count)
		{	// deallocate object at _Ptr
		_Deallocate(_Ptr, _Count, sizeof (_Ty));
		}

	_DECLSPEC_ALLOCATOR pointer allocate(size_type _Count)
		{	// allocate array of _Count elements
		return (static_cast<pointer>(_Allocate(_Count, sizeof (_Ty))));
		}

	_DECLSPEC_ALLOCATOR pointer allocate(size_type _Count, const void *)
		{	// allocate array of _Count elements, ignore hint
		return (allocate(_Count));
		}

……
	};

4.2 Notes

1. static_assert: Tests a software assertion at compile time. If the specified constant expression is false, the compiler displays the specified message and the compilation fails with error C2338; otherwise, the declaration has no effect.

2. _DECLSPEC_ALLOCATOR: a macro that expands to the Microsoft-specific __declspec(allocator), which marks a function as a memory allocation function so that profiling tools can track the allocations it makes.

3. This implementation is also fairly simple: it just wraps the allocation function (_Allocate) and the deallocation function (_Deallocate).

4.3 Implementation of _Allocate and _Deallocate

The implementation, copied verbatim, is as follows

inline
	_DECLSPEC_ALLOCATOR void *_Allocate(size_t _Count, size_t _Sz,
		bool _Try_aligned_allocation = true)
	{	// allocate storage for _Count elements of size _Sz
	void *_Ptr = 0;

	if (_Count == 0)
		return (_Ptr);

	// check overflow of multiply
	if ((size_t)(-1) / _Sz < _Count)
		_Xbad_alloc();	// report no memory
	const size_t _User_size = _Count * _Sz;

 #if defined(_M_IX86) || defined(_M_X64)
	if (_Try_aligned_allocation
		&& _BIG_ALLOCATION_THRESHOLD <= _User_size)
		{	// allocate large block
		static_assert(sizeof (void *) < _BIG_ALLOCATION_ALIGNMENT,
			"Big allocations should at least match vector register size");
		const size_t _Block_size = _NON_USER_SIZE + _User_size;
		if (_Block_size <= _User_size)
			_Xbad_alloc();	// report no memory
		const uintptr_t _Ptr_container =
			reinterpret_cast<uintptr_t>(::operator new(_Block_size));
		_SCL_SECURE_ALWAYS_VALIDATE(_Ptr_container != 0);
		_Ptr = reinterpret_cast<void *>((_Ptr_container + _NON_USER_SIZE)
			& ~(_BIG_ALLOCATION_ALIGNMENT - 1));
		static_cast<uintptr_t *>(_Ptr)[-1] = _Ptr_container;

 #ifdef _DEBUG
		static_cast<uintptr_t *>(_Ptr)[-2] = _BIG_ALLOCATION_SENTINEL;
 #endif /* _DEBUG */
		}
	else
 #endif /* defined(_M_IX86) || defined(_M_X64) */

		{	// allocate normal block
		_Ptr = ::operator new(_User_size);
		_SCL_SECURE_ALWAYS_VALIDATE(_Ptr != 0);
		}
	return (_Ptr);
	}

		// FUNCTION _Deallocate
inline
	void _Deallocate(void * _Ptr, size_t _Count, size_t _Sz)
	{	// deallocate storage for _Count elements of size _Sz
 #if defined(_M_IX86) || defined(_M_X64)
	_SCL_SECURE_ALWAYS_VALIDATE(_Count <= (size_t)(-1) / _Sz);
	const size_t _User_size = _Count * _Sz;
	if (_BIG_ALLOCATION_THRESHOLD <= _User_size)
		{	// deallocate large block
		const uintptr_t _Ptr_user = reinterpret_cast<uintptr_t>(_Ptr);
		_SCL_SECURE_ALWAYS_VALIDATE(
			(_Ptr_user & (_BIG_ALLOCATION_ALIGNMENT - 1)) == 0);
		const uintptr_t _Ptr_ptr = _Ptr_user - sizeof(void *);
		const uintptr_t _Ptr_container =
			*reinterpret_cast<uintptr_t *>(_Ptr_ptr);

 #ifdef _DEBUG
		// If the following asserts, it likely means that we are performing
		// an aligned delete on memory coming from an unaligned allocation.
		_SCL_SECURE_ALWAYS_VALIDATE(
			reinterpret_cast<uintptr_t *>(_Ptr_ptr)[-1] ==
				_BIG_ALLOCATION_SENTINEL);
 #endif /* _DEBUG */

		// Extra paranoia on aligned allocation/deallocation
		_SCL_SECURE_ALWAYS_VALIDATE(_Ptr_container < _Ptr_user);

 #ifdef _DEBUG
		_SCL_SECURE_ALWAYS_VALIDATE(2 * sizeof(void *)
			<= _Ptr_user - _Ptr_container);

 #else /* _DEBUG */
		_SCL_SECURE_ALWAYS_VALIDATE(sizeof(void *)
			<= _Ptr_user - _Ptr_container);
 #endif /* _DEBUG */

		_SCL_SECURE_ALWAYS_VALIDATE(_Ptr_user - _Ptr_container
			<= _NON_USER_SIZE);

		_Ptr = reinterpret_cast<void *>(_Ptr_container);
		}
 #endif /* defined(_M_IX86) || defined(_M_X64) */

	::operator delete(_Ptr);
	}

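The code above is full of typedefs and macros, plus various checks and asserts, which makes it look complicated. Below is a simplified version that keeps only the key parts.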

void *_Allocate(size_t _Count, size_t _Sz, bool _Try_aligned_allocation = true)
{	// allocate storage for _Count elements of size _Sz
    void *_Ptr = 0;

    // Number of bytes needed
    const size_t _User_size = _Count * _Sz;

    // _BIG_ALLOCATION_THRESHOLD is 4096; blocks of at least this size are aligned.
    // Q1: Why is 4096 the threshold?
    if (_Try_aligned_allocation && _BIG_ALLOCATION_THRESHOLD <= _User_size)
    {	// allocate a large block
        // _BIG_ALLOCATION_ALIGNMENT, the alignment for large blocks, is 32
        // _NON_USER_SIZE is (2 * sizeof(void *) + _BIG_ALLOCATION_ALIGNMENT - 1), i.e. two pointers plus 31 bytes
        const size_t _Block_size = _NON_USER_SIZE + _User_size;
        // The address is converted to an integer so that bit operations can be applied;
        // uintptr_t may be unsigned int (32-bit) or unsigned long long (64-bit)
        const uintptr_t _Ptr_container =
            reinterpret_cast<uintptr_t>(::operator new(_Block_size));
        // Compute the aligned address by clearing the low 5 bits.
        // Q2: Why align to 32 bytes?
        _Ptr = reinterpret_cast<void *>((_Ptr_container + _NON_USER_SIZE)
            & ~(_BIG_ALLOCATION_ALIGNMENT - 1));
        // Store the real start address of the block in the _NON_USER_SIZE area
        static_cast<uintptr_t *>(_Ptr)[-1] = _Ptr_container;
    }
    else
    {	// allocate a normal block
        _Ptr = ::operator new(_User_size);
    }
    return (_Ptr);
}

void _Deallocate(void * _Ptr, size_t _Count, size_t _Sz)
{	// deallocate storage for _Count elements of size _Sz
    const size_t _User_size = _Count * _Sz;
    if (_BIG_ALLOCATION_THRESHOLD <= _User_size)
    {	// deallocate a large block
        // Convert the address to an integer so that it can be subtracted from
        const uintptr_t _Ptr_user = reinterpret_cast<uintptr_t>(_Ptr);
        const uintptr_t _Ptr_ptr = _Ptr_user - sizeof(void *);
        // Fetch the real start address of the block stored in the _NON_USER_SIZE area
        const uintptr_t _Ptr_container =
            *reinterpret_cast<uintptr_t *>(_Ptr_ptr);
        _Ptr = reinterpret_cast<void *>(_Ptr_container);
    }

    // Actually release the memory
    ::operator delete(_Ptr);
}

Answers to the questions

1.

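2. The static_assert message in the original source ("Big allocations should at least match vector register size") points to the likely reason: 32 bytes is the width of a 256-bit vector register, so large blocks are aligned for vectorized operations.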
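The over-allocate-and-align technique used by the simplified _Allocate and _Deallocate can be tried in isolation. The sketch below is not the VS2015 code: the names alignedAllocate, alignedDeallocate, roundTripIsAligned, kAlignment and kNonUserSize are assumptions made for this illustration, with the constants taken from the values quoted in the comments above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Assumed values for this sketch, mirroring the comments in the text.
constexpr std::size_t kAlignment = 32;
constexpr std::size_t kNonUserSize = 2 * sizeof(void*) + kAlignment - 1;

// Over-allocate, round down to a 32-byte boundary inside the block, and
// store the real block address in the slot just before the user pointer.
inline void* alignedAllocate(std::size_t userSize)
{
    const std::uintptr_t container =
        reinterpret_cast<std::uintptr_t>(::operator new(kNonUserSize + userSize));
    void* user = reinterpret_cast<void*>(
        (container + kNonUserSize) & ~(static_cast<std::uintptr_t>(kAlignment) - 1));
    static_cast<std::uintptr_t*>(user)[-1] = container;
    return user;
}

// Recover the real block address from the slot before the user pointer.
inline void alignedDeallocate(void* user)
{
    const std::uintptr_t container = static_cast<std::uintptr_t*>(user)[-1];
    ::operator delete(reinterpret_cast<void*>(container));
}

// Allocate, check the alignment of the returned pointer, and release.
inline bool roundTripIsAligned(std::size_t userSize)
{
    void* p = alignedAllocate(userSize);
    const bool aligned = reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
    alignedDeallocate(p);
    return aligned;
}
```

Rounding down after adding kNonUserSize leaves at least 2 * sizeof(void*) bytes between the real block start and the user pointer, which is why the slot at index -1 always lies inside the block.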

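5. Implementation 3: the STLport implementation

Implementation 1 is trivial: it merely wraps operator new and operator delete. Implementation 2 is also fairly simple; Microsoft appears to implement the allocation strategy at a lower level. For a finer-grained allocation strategy (a memory pool), see the STLport implementation.

For an analysis of the STLport implementation, see Hou Jie, 《STL源码剖析》, sections 2.2.6 to 2.2.10.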

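6. Implementation 4: an implementation with a memory pool

6.1 The memory pool

Following the approach of 《STL源码剖析》, sections 2.2.6 to 2.2.10, here is a memory pool of my own. It does not consider thread safety.

When the requested size is greater than 128 bytes, the pool calls operator new directly; otherwise it allocates from a free list. This avoids the fragmentation caused by many small blocks and the poor memory utilization caused by per-block bookkeeping overhead.

First, the data structures that manage the pool. The pool maintains 16 free lists. To simplify allocation and reclamation, each list manages blocks whose size is a multiple of 8: 8, 16, 24, ..., 128. A request whose size is not a multiple of 8 is rounded up to the next multiple of 8 and served from the corresponding free list. To link a list together, the first sizeof(void *) bytes of each free block store the address of the next free block. A block is removed from its free list when it is allocated, so this does not affect the user, and no extra memory is needed for the next pointer.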

class MemoryPool
{
private:
    // Every small block size is a multiple of ms_align
    constexpr static size_t ms_align = 8;
    // Upper bound on the size of a small block
    constexpr static size_t ms_maxBytes = 128;
    // Number of free lists
    constexpr static size_t ms_nFreeLists = ms_maxBytes / ms_align;
private:
    // Free lists: m_freeLists[0] links blocks of size 8, m_freeLists[1] blocks of size 16, ...
    void *m_freeLists[ms_nFreeLists];
private:
    // The not yet carved-up region of the pool
    char *m_pFreeStart;
    char *m_pFreeEnd;
};
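The size-class arithmetic and the intrusive free list described above can be sketched on their own. The free functions below are assumptions for illustration (in the article's class the first two are private member functions), and lifoDemo uses a local buffer instead of pool memory.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kAlign = 8;

// Round a request up to the next multiple of kAlign.
constexpr std::size_t roundUp(std::size_t nBytes)
{
    return (nBytes + kAlign - 1) & ~(kAlign - 1);
}

// Index of the free list serving nBytes (sizes 1..128 map to 0..15).
constexpr std::size_t freeListIndex(std::size_t nBytes)
{
    return (nBytes + kAlign - 1) / kAlign - 1;
}

// Push a free block onto a list: its first bytes hold the old head.
inline void pushFree(void*& head, void* block)
{
    *static_cast<void**>(block) = head;
    head = block;
}

// Pop the head block, or return nullptr when the list is empty.
inline void* popFree(void*& head)
{
    void* block = head;
    if (block != nullptr)
        head = *static_cast<void**>(block);
    return block;
}

// Two 16-byte blocks carved from one buffer come back in LIFO order.
inline bool lifoDemo()
{
    alignas(void*) unsigned char buffer[32];
    void* head = nullptr;
    pushFree(head, buffer);
    pushFree(head, buffer + 16);
    return popFree(head) == buffer + 16
        && popFree(head) == buffer
        && popFree(head) == nullptr;
}
```

A 13-byte request, for example, is rounded up to 16 bytes and served from list index 1. Because the link lives inside the free block itself, the smallest block size must be at least sizeof(void *), which the 8-byte granularity guarantees on common 32-bit and 64-bit platforms.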

The complete definition and implementation of the memory pool:

class MemoryPool
{
public:
    MemoryPool() :m_freeLists{ nullptr }, m_pFreeStart(nullptr), m_pFreeEnd(nullptr), m_heapSize(0) {}
    // Allocate memory
    void *allocate(size_t nBytes);
    // Reclaim memory
    void deallocate(void *ptr, size_t nBytes);
private:
    // Every small block size is a multiple of ms_align
    constexpr static size_t ms_align = 8;
    // Upper bound on the size of a small block
    constexpr static size_t ms_maxBytes = 128;
    // Number of free lists
    constexpr static size_t ms_nFreeLists = ms_maxBytes / ms_align;
private:
    // Free lists: m_freeLists[0] links blocks of size 8, m_freeLists[1] blocks of size 16, ...
    void *m_freeLists[ms_nFreeLists];
private:
    // The not yet carved-up region of the pool
    char *m_pFreeStart;
    char *m_pFreeEnd;
    // Total bytes obtained from the system so far; used as a factor when
    // computing how much extra to request on the next refill of the pool
    size_t m_heapSize;
private:
    // Round nBytes up to a multiple of ms_align
    size_t roundUp(size_t nBytes)
    {
        return (nBytes + ms_align - 1)&~(ms_align - 1);
    }

    // Index (starting from 0) of the free list that serves blocks of nBytes
    size_t freeListIndex(size_t nBytes)
    {
        return (nBytes + ms_align - 1) / ms_align - 1;
    }
private:
    // Refill the free list that serves nBytes; nBytes is a multiple of ms_align
    void *refill(size_t nBytes);
    // Take memory from the pool; nBytes is a multiple of ms_align
    void *chunkAlloc(size_t nBytes, int &nObjs);
};

void * MemoryPool::allocate(size_t nBytes)
{
    if (nBytes > ms_maxBytes)
    {   // Large block: allocate directly from the system
        return ::operator new(nBytes);
    }

    // Find the free list that serves blocks of nBytes
    void **pFreeList = m_freeLists + freeListIndex(nBytes);
    void *result = *pFreeList;

    if (result == nullptr)
    {   // No free block in the list: refill it
        return refill(roundUp(nBytes));
    }

    // A free block is available: unlink it from the free list
    *pFreeList = *reinterpret_cast<void **>(result);

    return result;
}

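// As this function shows, reclaimed memory is never returned to the system: the free lists only grow.
// Since the blocks involved are small, the idle memory stays modest outside extreme cases,
// so returning it to the system is not attempted.
void MemoryPool::deallocate(void *ptr, size_t nBytes)
{
    if (nBytes > ms_maxBytes)
    {   // Mirrors allocate: hand the block straight back to the system
        ::operator delete(ptr);
        return; // must not fall through and link a freed block into a free list
    }

    // Find the free list that serves blocks of nBytes
    void **pFreeList = m_freeLists + freeListIndex(nBytes);

    // Push the block back onto the free list
    *reinterpret_cast<void **>(ptr) = *pFreeList;
    *pFreeList = ptr;
}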

void * MemoryPool::refill(size_t nBytes)
{
    int nObjs = 20;
    // Try to take nObjs blocks from the pool; fewer than nObjs may be returned
    void *pChunk = chunkAlloc(nBytes, nObjs);

    // More than one block: all but the first go onto the free list
    if (nObjs > 1)
    {
        // Find the free list that serves blocks of nBytes
        void **pFreeList = m_freeLists + freeListIndex(nBytes);

        void *pNext = reinterpret_cast<char *>(pChunk) + nBytes;
        // The second block becomes the head of the list
        *pFreeList = pNext;
        for (int i = 2;i < nObjs;++i)
        {
            void **pCurrent = reinterpret_cast<void **>(pNext);
            // Advance from the current block, not from the start of the chunk
            pNext = reinterpret_cast<char *>(pNext) + nBytes;
            // Link pNext into the free list
            *pCurrent = pNext;
        }
        // The next pointer of the last block is null
        *reinterpret_cast<void **>(pNext) = nullptr;
    }
    // Return the first block to the caller
    return pChunk;
}

void * MemoryPool::chunkAlloc(size_t nBytes, int &nObjs)
{
    size_t nNeedBytes = nBytes * nObjs;
    size_t nFreeBytes = m_pFreeEnd - m_pFreeStart;
    void *result;
    if (nNeedBytes <= nFreeBytes)
    {   // The pool still has enough memory for the whole request
        result = m_pFreeStart;
        m_pFreeStart += nNeedBytes;
    }
    else if (nBytes <= nFreeBytes)
    {   // The pool has room for at least one block
        nObjs = static_cast<int>(nFreeBytes / nBytes);
        result = m_pFreeStart;
        m_pFreeStart += nBytes * nObjs;
    }
    else
    {   // The pool cannot supply even a single block
        if (nFreeBytes > 0)
        {   // Put the leftover bytes on the matching free list.
            // This works because every size is rounded up to a multiple of ms_align,
            // so the leftover is itself a valid block size.
            void **pFreeList = m_freeLists + freeListIndex(nFreeBytes);
            *reinterpret_cast<void **>(m_pFreeStart) = *pFreeList;
            *pFreeList = m_pFreeStart;
            // Keeps m_pFreeStart valid if operator new below throws
            m_pFreeStart += nFreeBytes;
        }

        // Bytes to request from the system: twice the need plus an extra increment.
        // STLport computes it this way, presumably for efficiency.
        size_t nBytesToGet = 2 * nNeedBytes + roundUp(m_heapSize >> 4);
        m_pFreeStart = reinterpret_cast<char *>(operator new(nBytesToGet));
        m_heapSize += nBytesToGet;
        m_pFreeEnd = m_pFreeStart + nBytesToGet;
        // Recurse: the pool now holds enough memory, so the first branch is taken
        return chunkAlloc(nBytes, nObjs);
    }
    return result;
}
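As a worked example of the growth formula in chunkAlloc, the sketch below evaluates nBytesToGet for successive refills. The free function bytesToGet and its helper roundUp8 are hypothetical names that restate the formula outside the class; the numbers assume 32-byte blocks requested 20 at a time.

```cpp
#include <cassert>
#include <cstddef>

// Round up to a multiple of 8, as MemoryPool::roundUp does.
constexpr std::size_t roundUp8(std::size_t n)
{
    return (n + 7) & ~static_cast<std::size_t>(7);
}

// Bytes requested from the system for one refill of the pool:
// twice the immediate need plus 1/16 of everything obtained so far.
constexpr std::size_t bytesToGet(std::size_t nBytes, std::size_t nObjs,
                                 std::size_t heapSize)
{
    return 2 * nBytes * nObjs + roundUp8(heapSize >> 4);
}

// First refill:  heap = 0,    request = 2 * 640 + 0   = 1280
// Second refill: heap = 1280, request = 2 * 640 + 80  = 1360
// Third refill:  heap = 2640, request = 2 * 640 + 168 = 1448
static_assert(bytesToGet(32, 20, 0) == 1280, "first refill");
static_assert(bytesToGet(32, 20, 1280) == 1360, "second refill");
static_assert(bytesToGet(32, 20, 2640) == 1448, "third refill");
```

Each request covers the 20 blocks needed now and leaves at least as much again in the pool, and the extra term grows with the total heap size, so a pool that has already grown large asks the system for memory less often.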

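6.2 An allocator backed by the memory pool

The parts identical to implementation 1 are omitted; only the operations that allocate and release memory are shown:

template<class T>
class allocator
{
private:
    static MemoryPool pool;
public:
…… // same as implementation 1
    // hint is ignored
    pointer allocate(size_type count, const void* hint = nullptr)
    {
        return static_cast<pointer>(pool.allocate(count * sizeof(value_type)));
    }

    // Release memory; the pool expects a size in bytes, not an element count
    void deallocate(pointer ptr, size_type count)
    {
        pool.deallocate(ptr, count * sizeof(value_type));
    }
…… // same as implementation 1
};

// Definition of the static pool; each allocator<T> instantiation gets its own
template<class T>
MemoryPool allocator<T>::pool;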
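To see how a standard container drives a user-supplied allocator, here is a minimal sketch of a C++11-style allocator plugged into std::list. It is not the pool allocator above: CountingAllocator, outstandingBytes and sumThroughList are hypothetical names for this illustration, and the allocator simply forwards to operator new while tracking how many bytes are outstanding.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>

// Bytes currently allocated through any CountingAllocator.
inline std::size_t& outstandingBytes()
{
    static std::size_t bytes = 0;
    return bytes;
}

// Minimal allocator: std::allocator_traits supplies rebind, construct,
// destroy and the remaining members from this reduced interface.
template <class T>
struct CountingAllocator
{
    using value_type = T;

    CountingAllocator() noexcept {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t count)
    {
        T* p = static_cast<T*>(::operator new(count * sizeof(T)));
        outstandingBytes() += count * sizeof(T);
        return p;
    }

    void deallocate(T* ptr, std::size_t count) noexcept
    {
        outstandingBytes() -= count * sizeof(T);
        ::operator delete(ptr);
    }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept
{
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept
{
    return false;
}

// Fill a list through the allocator, sum it, and let the list die.
inline int sumThroughList()
{
    int sum = 0;
    {
        std::list<int, CountingAllocator<int>> values;
        for (int i = 1; i <= 4; ++i)
            values.push_back(i);
        for (int v : values)
            sum += v;
    }
    return sum;
}
```

The list never allocates an int directly; it rebinds the allocator to its node type, which is why deallocate receives the same element count that allocate was given and the byte counter returns to zero once the list is destroyed.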