简单的可变大小内存块分配器

EgorYusov

4.82/5 (10投票s)

2017 年 4 月 3 日

CPOL

5分钟阅读

21975

本文介绍了一个轻量级可变大小内存块分配器的实现，该分配器用于 Diligent Engine 2.0。

免责声明：本文是从 Diligent Engine 网站上的原创文章转载的。

引言

本文描述了一个轻量级类，它负责管理空闲内存块以满足可变大小的分配请求。该管理器有一个非常具体的目标，即不是替换或超越标准的new/delete。相反，该类由 Diligent Engine 的描述符堆管理系统使用，以处理描述符堆分配（该系统已在这篇博文中进行了详细讨论）。描述符堆是由Direct3D12系统分配的一块内存。Direct3D12仅提供原始空间的虚拟地址，应用程序负责管理该空间。该类在需要管理非结构化空间的类似场景中可能有用。例如，它可以作为固定大小块分配器的基础分配器。

该类跟踪空闲内存块，并通过查找提供足够空间的最小块来促进分配。释放的内存块会被合并到现有的空闲块中。对于分配和释放操作，运行时的复杂度与块的数量成对数关系。

分配器描述

分配管理器仅跟踪空闲块，而不记录分配大小。该类使用两个有序映射来促进操作。第一个映射按偏移量对块进行排序。第二个多重映射按大小对块进行排序。两个映射的元素相互引用，这使得高效的块插入、删除和合并成为可能。请注意，由于我们需要有序访问，所有对映射的操作（搜索/插入/删除）都需要与块数量成对数的时间。两个映射定义如下：

typedef size_t OffsetType;
struct FreeBlockInfo;
// Type of the map that keeps memory blocks sorted by their offsets
using TFreeBlocksByOffsetMap = std::map<OffsetType, FreeBlockInfo>;

// Type of the map that keeps memory blocks sorted by their sizes
using TFreeBlocksBySizeMap = std::multimap<OffsetType, TFreeBlocksByOffsetMap::iterator>;

struct FreeBlockInfo
{
    // Block size (no reserved space for the size of allocation)
    OffsetType Size;

    // Iterator referencing this block in the multimap sorted by the block size
    TFreeBlocksBySizeMap::iterator OrderBySizeIt;

    FreeBlockInfo(OffsetType _Size) : Size(_Size){}
};

TFreeBlocksByOffsetMap m_FreeBlocksByOffset;
TFreeBlocksBySizeMap m_FreeBlocksBySize;

请注意，第二个映射是多重映射，因为可能存在多个相同大小的块。相反，在特定偏移量上只能有一个块。下图中给出了四块空闲块的示例。

请注意，FreeBlockInfo结构中的Size成员是多余的，因为它等于OrderBySizeIt->first。然而，它使得实现更具可读性和清晰性。

分配

要分配新块，我们使用第二个映射来查找足够大以包含所请求大小的最小块。multimap::lower_bound标准函数正是我们所需要的。

auto SmallestBlockItIt = m_FreeBlocksBySize.lower_bound(Size);
if(SmallestBlockItIt == m_FreeBlocksBySize.end())
    return InvalidOffset;
auto SmallestBlockIt = SmallestBlockItIt->second;

此操作需要与容器大小成对数的时间。请求分配开始的偏移量将由该块的开头给出。

auto Offset = SmallestBlockIt->first;

新空闲块的偏移量将通过将请求的大小添加到原始偏移量来给出，而新大小将通过从原始块大小中减去请求的内存大小来给出。

auto NewOffset = Offset + Size;
auto NewSize = SmallestBlockIt->second.Size - Size;

最后一步是更新映射。由于块大小和偏移量都已更改，我们必须先从容器中移除原始元素，如果新块不为空，则将新元素插入到映射中。

m_FreeBlocksBySize.erase(SmallestBlockItIt);
m_FreeBlocksByOffset.erase(SmallestBlockIt);
if (NewSize > 0)
     AddNewBlock(NewOffset, NewSize);

AddNewBlock()是一个辅助函数，它在指定偏移量处插入给定大小的新块。

void AddNewBlock(OffsetType Offset, OffsetType Size)
{
    auto NewBlockIt = m_FreeBlocksByOffset.emplace(Offset, Size);
    auto OrderIt = m_FreeBlocksBySize.emplace(Size, NewBlockIt.first);
    NewBlockIt.first->second.OrderBySizeIt = OrderIt;
}

插入和删除有序映射中的元素都需要对数时间，因此总方法复杂度为 O(log n)。

例如，如果我们请求一个大小为 20 字节的块，系统将返回偏移量 32，并且分配后的块结构将如下图所示。

完整的函数源代码如下。

OffsetType Allocate(OffsetType Size)
{
    if(m_FreeSize < Size)
        return InvalidOffset;

    // Get the first block that is large enough to encompass Size bytes
    auto SmallestBlockItIt = m_FreeBlocksBySize.lower_bound(Size);
    if(SmallestBlockItIt == m_FreeBlocksBySize.end())
        return InvalidOffset;

    auto SmallestBlockIt = SmallestBlockItIt->second;
    auto Offset = SmallestBlockIt->first;
    auto NewOffset = Offset + Size;
    auto NewSize = SmallestBlockIt->second.Size - Size;
    m_FreeBlocksBySize.erase(SmallestBlockItIt);
    m_FreeBlocksByOffset.erase(SmallestBlockIt);
    if (NewSize > 0)
        AddNewBlock(NewOffset, NewSize);

    m_FreeSize -= Size;
    return Offset;
}

释放

释放稍微复杂一些，因为有更多情况需要处理。第一步是确定新块应该插入在哪里。为此，我们可以使用第一个按偏移量对内存块进行排序的映射。由于我们知道要释放块的偏移量，我们可以使用map::upper_bound来查找紧跟在其后面的块。使用获得的迭代器，我们可以找到前面的空闲块。

// Find the first element whose offset is greater than the specified offset
auto NextBlockIt = m_FreeBlocksByOffset.upper_bound(Offset);
auto PrevBlockIt = NextBlockIt;
if(PrevBlockIt != m_FreeBlocksByOffset.begin())
    --PrevBlockIt;
else
    PrevBlockIt = m_FreeBlocksByOffset.end();

接下来，我们需要处理四种可能的情况。

正在释放的块不与任何其他空闲块相邻。在这种情况下，我们只需将该块添加为新的空闲块。
该块与前面的块相邻。这两个块需要合并。
该块与后面的块相邻。这两个块需要合并。
该块同时与前面的和后面的块相邻。这三个块需要合并。

完整的函数列表如下。

void Free(OffsetType Offset, OffsetType Size)
{
    // Find the first element whose offset is greater than the specified offset
    auto NextBlockIt = m_FreeBlocksByOffset.upper_bound(Offset);
    auto PrevBlockIt = NextBlockIt;
    if(PrevBlockIt != m_FreeBlocksByOffset.begin())
        --PrevBlockIt;
    else
        PrevBlockIt = m_FreeBlocksByOffset.end();

    OffsetType NewSize, NewOffset;
    if(PrevBlockIt != m_FreeBlocksByOffset.end() && 
          Offset == PrevBlockIt->first + PrevBlockIt->second.Size)
    {
        // PrevBlock.Offset           Offset
        // |                          |
        // |<-----PrevBlock.Size----->|<------Size-------->|
        //
        NewSize = PrevBlockIt->second.Size + Size;
        NewOffset = PrevBlockIt->first;

        if (NextBlockIt != m_FreeBlocksByOffset.end() && Offset + Size == NextBlockIt->first)
        {
            // PrevBlock.Offset           Offset               NextBlock.Offset
            // |                          |                    |
            // |<-----PrevBlock.Size----->|<------Size-------->|<-----NextBlock.Size----->|
            //
            NewSize += NextBlockIt->second.Size;
            m_FreeBlocksBySize.erase(PrevBlockIt->second.OrderBySizeIt);
            m_FreeBlocksBySize.erase(NextBlockIt->second.OrderBySizeIt);
            // Delete the range of two blocks
            ++NextBlockIt;
            m_FreeBlocks.erase(PrevBlockIt, NextBlockIt);
        }
        else
        {
            // PrevBlock.Offset           Offset                       NextBlock.Offset
            // |                          |                            |
            // |<-----PrevBlock.Size----->|<------Size-------->| ~ ~ ~ |<-----NextBlock.Size----->|
            //
            m_FreeBlocksBySize.erase(PrevBlockIt->second.OrderBySizeIt);
            m_FreeBlocksByOffset.erase(PrevBlockIt);
        }
    }
    else if (NextBlockIt != m_FreeBlocksByOffset.end() && Offset + Size == NextBlockIt->first)
    {
        // PrevBlock.Offset                   Offset               NextBlock.Offset
        // |                                  |                    |
        // |<-----PrevBlock.Size----->| ~ ~ ~ |<------Size-------->|<-----NextBlock.Size----->|
        //
        NewSize = Size + NextBlockIt->second.Size;
        NewOffset = Offset;
        m_FreeBlocksBySize.erase(NextBlockIt->second.OrderBySizeIt);
        m_FreeBlocksByOffset.erase(NextBlockIt);
    }
    else
    {
        // PrevBlock.Offset                   Offset                       NextBlock.Offset
        // |                                  |                            |
        // |<-----PrevBlock.Size----->| ~ ~ ~ |<------Size-------->| ~ ~ ~ |<-----NextBlock.Size----->|
        //
        NewSize = Size;
        NewOffset = Offset;
    }

    AddNewBlock(NewOffset, NewSize);

    m_FreeSize += Size;
}

释放例程的运行时间复杂度与空闲块的数量成对数关系。下图显示了一个释放偏移量为 56、大小为 8 的块的示例，该操作导致与前面和后面的空闲块合并。

可变大小的 GPU 分配管理器

在管理 GPU 内存时，只有在 GPU 不再访问内存时才能执行释放。对基本分配器的简单扩展允许将块释放推迟到安全执行时。该类提供了新的Free()方法，该方法将帧号与块偏移量和大小一起传递。该方法将块属性添加到删除队列中，并在每帧结束时一次性释放所有过时的分配。

class VariableSizeGPUAllocationsManager : public VariableSizeAllocationsManager
{
private:
     struct FreedAllocationInfo
     {
        OffsetType Offset;
        OffsetType Size;
        Uint64 FrameNumber;
    };

public:
    VariableSizeGPUAllocationsManager(OffsetType MaxSize, IMemoryAllocator &Allocator);
    ~VariableSizeGPUAllocationsManager();
    VariableSizeGPUAllocationsManager(VariableSizeGPUAllocationsManager&& rhs);

    VariableSizeGPUAllocationsManager& operator = (VariableSizeGPUAllocationsManager&& rhs) = default;
    VariableSizeGPUAllocationsManager(const VariableSizeGPUAllocationsManager&) = delete;
    VariableSizeGPUAllocationsManager& operator = (const VariableSizeGPUAllocationsManager&) = delete;

    void Free(OffsetType Offset, OffsetType Size, Uint64 FrameNumber)
    {
        // Do not release the block immediately, but add
        // it to the queue instead
        m_StaleAllocations.emplace_back(Offset, Size, FrameNumber);
    }

    void ReleaseCompletedFrames(Uint64 NumCompletedFrames)
    {
        // Free all allocations from the beginning of the queue that belong to completed frames
        while(!m_StaleAllocations.empty() && m_StaleAllocations.front().FrameNumber < NumCompletedFrames)
        {
            auto &OldestAllocation = m_StaleAllocations.front();
            VariableSizeAllocationsManager::Free(OldestAllocation.Offset, OldestAllocation.Size);
            m_StaleAllocations.pop_front();
        }
    }

 private:
    std::deque< FreedAllocationInfo > m_StaleAllocations;
 };

讨论

该实现可以轻松扩展，以便释放例程不需要分配大小。为此，需要扩展分配的块以保存大小。该分配器提供了很大的灵活性，因为它可以处理任何（可能非常大）大小的分配。其运行时间复杂度与空闲块的数量成对数关系，而固定大小块分配器的复杂度是恒定的。通过该类进行的绝大多数分配都发生在初始化时，因此它不在引擎的热路径中使用，性能是可以接受的。