Managing Descriptor Heaps in Direct3D12

EgorYusov

April 6, 2017

CPOL


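Descriptors and descriptor heaps are key components of the new resource binding paradigm introduced in Direct3D12. This article describes an efficient system for managing descriptor heaps.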

Disclaimer: This article is a repost of original content from this page on the Diligent Engine website.

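Background

This article is not an introduction to D3D12 descriptor heaps. Although we briefly describe what descriptor heaps are, the reader is assumed to understand the basic D3D12 concepts. The system described below uses a simple variable-size memory block allocator and is related to the resource binding model presented in a separate blog post.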

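Introduction

Resource descriptors and descriptor heaps are key concepts of the new resource binding model introduced in Direct3D12. A descriptor is a small block of data that fully describes an object to the GPU in a GPU-specific opaque format. A descriptor heap is essentially an array of descriptors. Every pipeline state incorporates a root signature that defines how shader registers are mapped to descriptors in the bound descriptor heaps. Resource binding is a two-stage process: first, shader registers are mapped to descriptors in a descriptor heap as defined by the root signature. Then the descriptor (which may be an SRV, UAV, CBV, or sampler) references the resource in GPU memory. The following figure illustrates a simplified view of the D3D12 resource binding model.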

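There are four descriptor heap types in D3D12:

Constant buffer/shader resource/unordered access views (D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV)
Samplers (D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER)
Render target views (D3D12_DESCRIPTOR_HEAP_TYPE_RTV)
Depth-stencil views (D3D12_DESCRIPTOR_HEAP_TYPE_DSV)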

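For the GPU to be able to access descriptors in a heap, the heap must be shader-visible. Only the first two heap types (CBV_SRV_UAV and SAMPLER) can be shader-visible. RTV and DSV heaps are CPU-visible only. The size of a CPU-only heap is limited only by available CPU memory. Shader-visible heaps have stricter limits. While a CBV_SRV_UAV heap may hold 1,000,000 descriptors or more, the maximum number of samplers in a shader-visible heap is only 2048 (see D3D12 Hardware Tiers on MSDN). As a result, not all descriptor handles can be stored in shader-visible descriptor heaps, and it is the responsibility of the D3D12 application to make sure that all descriptor handles required for rendering are in GPU-visible heaps. This article describes the descriptor heap management system implemented in Diligent Engine 2.0.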

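Overview

The descriptor heap management system in Diligent Engine consists of five main classes:

DescriptorHeapAllocation is a helper class that represents a descriptor heap allocation, which is simply a range of descriptors
DescriptorHeapAllocationManager is the main workhorse class that uses a variable-size GPU allocations manager to manage allocations within a D3D12 descriptor heap
CPUDescriptorHeap implements a CPU-only descriptor heap that is used as storage for resource view descriptor handles
GPUDescriptorHeap implements a shader-visible descriptor heap that holds the descriptor handles used by GPU commands
DynamicSuballocationsManager allocates short-lived dynamic descriptor handles that are used in the current frame only

Each class and the interactions between them are described in detail below.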

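Descriptor Heap Allocation

DescriptorHeapAllocation is the first class used by the Diligent Engine descriptor heap management system. It represents a descriptor heap allocation and can be initialized either as a single descriptor or as a contiguous range of descriptors in the specified heap.

Note that a descriptor heap allocation only references a range in the heap. It contains the first CPU handle in CPU virtual address space and, if the heap is shader-visible, the first GPU handle in GPU virtual address space. The class cannot be copied; ownership can only be transferred through move semantics. The class definition is given below.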

class DescriptorHeapAllocation
{
public:
    // Creates null allocation
    DescriptorHeapAllocation();

    // Initializes non-null allocation
    DescriptorHeapAllocation( IDescriptorAllocator *pAllocator,
                              ID3D12DescriptorHeap *pHeap,
                              D3D12_CPU_DESCRIPTOR_HANDLE CpuHandle,
                              D3D12_GPU_DESCRIPTOR_HANDLE GpuHandle,
                              Uint32 NHandles,
                              Uint16 AllocationManagerId );

    // Move constructor (copy is not allowed)
    DescriptorHeapAllocation(DescriptorHeapAllocation &&Allocation);

    // Move assignment (copy assignment is not allowed)
    DescriptorHeapAllocation& operator = (DescriptorHeapAllocation &&Allocation);

    // Destructor automatically releases this allocation through the allocator
    ~DescriptorHeapAllocation()
    {
        if(!IsNull() && m_pAllocator)
            m_pAllocator->Free(std::move(*this));
    }

    // Returns CPU descriptor handle at the specified offset
    D3D12_CPU_DESCRIPTOR_HANDLE GetCpuHandle(Uint32 Offset = 0) const
    {
        D3D12_CPU_DESCRIPTOR_HANDLE CPUHandle = m_FirstCpuHandle;
        if (Offset != 0)
            CPUHandle.ptr += m_DescriptorSize * Offset;
        return CPUHandle;
    }

    // Returns GPU descriptor handle at the specified offset
    D3D12_GPU_DESCRIPTOR_HANDLE GetGpuHandle(Uint32 Offset = 0) const
    {
        D3D12_GPU_DESCRIPTOR_HANDLE GPUHandle = m_FirstGpuHandle;
        if (Offset != 0)
            GPUHandle.ptr += m_DescriptorSize * Offset;
        return GPUHandle;
    }

    // Returns pointer to the descriptor heap that contains this allocation
    ID3D12DescriptorHeap *GetDescriptorHeap(){return m_pDescriptorHeap;}

    size_t GetNumHandles()const{return m_NumHandles;}

    bool IsNull() const { return m_FirstCpuHandle.ptr == 0; }
    bool IsShaderVisible() const { return m_FirstGpuHandle.ptr != 0; }
    size_t GetAllocationManagerId(){return m_AllocationManagerId;}
    UINT GetDescriptorSize()const{return m_DescriptorSize;}

private:
    // No copies, only moves are allowed
    DescriptorHeapAllocation(const DescriptorHeapAllocation&) = delete;
    DescriptorHeapAllocation& operator= (const DescriptorHeapAllocation&) = delete;

    // First CPU descriptor handle in this allocation
    D3D12_CPU_DESCRIPTOR_HANDLE m_FirstCpuHandle = {0};
   
    // First GPU descriptor handle in this allocation
    D3D12_GPU_DESCRIPTOR_HANDLE m_FirstGpuHandle = {0};

    // Pointer to the descriptor heap allocator that created this allocation
    IDescriptorAllocator* m_pAllocator = nullptr;

    // Pointer to the D3D12 descriptor heap that contains descriptors in this allocation
    ID3D12DescriptorHeap* m_pDescriptorHeap = nullptr;
   
    // Number of descriptors in the allocation
    Uint32 m_NumHandles = 0;

    // Allocation manager ID
    Uint16 m_AllocationManagerId = static_cast<Uint16>(-1);
   
    // Descriptor size
    Uint16 m_DescriptorSize = 0;
};

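The m_AllocationManagerId field requires some explanation. As we will discuss later, a single descriptor heap object may contain several allocation managers. This field identifies the manager within the descriptor heap that was used to create this allocation.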
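
The ownership rules of the allocation class can be illustrated without any D3D12 types. The following is a minimal sketch, not the engine's implementation: MockAllocator and RangeAllocation are hypothetical names, descriptors are represented by plain indices instead of CPU/GPU handles, and a null allocation is detected through the allocator pointer rather than through the CPU handle.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for IDescriptorAllocator: records which ranges were freed.
struct MockAllocator
{
    std::vector<std::pair<uint32_t, uint32_t>> Freed; // (first descriptor, count)
    void Free(uint32_t First, uint32_t Count) { Freed.emplace_back(First, Count); }
};

// Move-only owner of the descriptor range [First, First + Count).
class RangeAllocation
{
public:
    RangeAllocation() = default; // null allocation
    RangeAllocation(MockAllocator* pAllocator, uint32_t First, uint32_t Count)
        : m_pAllocator(pAllocator), m_First(First), m_Count(Count) {}

    // Copies are forbidden so that a range is never released twice
    RangeAllocation(const RangeAllocation&) = delete;
    RangeAllocation& operator=(const RangeAllocation&) = delete;

    RangeAllocation(RangeAllocation&& rhs) noexcept { MoveFrom(rhs); }
    RangeAllocation& operator=(RangeAllocation&& rhs) noexcept
    {
        if (this != &rhs)
        {
            Release();     // give back the range currently owned, if any
            MoveFrom(rhs);
        }
        return *this;
    }

    // The destructor returns the range to the allocator that created it
    ~RangeAllocation() { Release(); }

    bool IsNull() const { return m_pAllocator == nullptr; }

    // Index of the descriptor at the given offset within the range
    uint32_t GetDescriptor(uint32_t Offset = 0) const { return m_First + Offset; }

private:
    void Release()
    {
        if (m_pAllocator != nullptr)
            m_pAllocator->Free(m_First, m_Count);
        m_pAllocator = nullptr;
    }

    void MoveFrom(RangeAllocation& rhs)
    {
        m_pAllocator = rhs.m_pAllocator;
        m_First      = rhs.m_First;
        m_Count      = rhs.m_Count;
        rhs.m_pAllocator = nullptr; // the source becomes a null allocation
    }

    MockAllocator* m_pAllocator = nullptr;
    uint32_t       m_First      = 0;
    uint32_t       m_Count      = 0;
};
```

Because the moved-from object becomes null, exactly one object ever releases a given range, which is the property the real class relies on when allocations are stored in containers or returned from functions.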

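Descriptor Heap Allocation Manager

The second class of the descriptor heap management system is DescriptorHeapAllocationManager. It uses a variable-size GPU allocations manager to handle allocations within a descriptor heap.

Every allocation created by the class is represented by an instance of the DescriptorHeapAllocation class. The list of free descriptors is managed by the m_FreeBlockManager member. The class declaration is given below.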

class DescriptorHeapAllocationManager
{
public:
    // Creates a new D3D12 descriptor heap
    DescriptorHeapAllocationManager(IMemoryAllocator &Allocator,
                                    RenderDeviceD3D12Impl *pDeviceD3D12Impl,
                                    IDescriptorAllocator *pParentAllocator,
                                    size_t ThisManagerId,
                                    const D3D12_DESCRIPTOR_HEAP_DESC &HeapDesc);

    // Uses subrange of descriptors in the existing D3D12 descriptor heap
    // that starts at offset FirstDescriptor and uses NumDescriptors descriptors
    DescriptorHeapAllocationManager(IMemoryAllocator &Allocator,
                                    RenderDeviceD3D12Impl *pDeviceD3D12Impl,
                                    IDescriptorAllocator *pParentAllocator,
                                    size_t ThisManagerId,
                                    ID3D12DescriptorHeap *pd3d12DescriptorHeap,
                                    Uint32 FirstDescriptor,
                                    Uint32 NumDescriptors);

    // Move constructor
    DescriptorHeapAllocationManager(DescriptorHeapAllocationManager&& rhs);

    // No copies or move-assignments
    DescriptorHeapAllocationManager& operator = (DescriptorHeapAllocationManager&& rhs) = delete;
    DescriptorHeapAllocationManager(const DescriptorHeapAllocationManager&) = delete;
    DescriptorHeapAllocationManager& operator = (const DescriptorHeapAllocationManager&) = delete;

    ~DescriptorHeapAllocationManager();

    // Allocates Count descriptors
    DescriptorHeapAllocation Allocate( uint32_t Count );
   
    // Releases descriptor heap allocation. Note
    // that the allocation is not released immediately, but
    // added to the release queue in the allocations manager
    void Free(DescriptorHeapAllocation&& Allocation);
   
    // Releases all stale allocations
    void ReleaseStaleAllocations(Uint64 NumCompletedFrames);

    size_t GetNumAvailableDescriptors()const{return m_FreeBlockManager.GetFreeSize();}

private:
    // Allocations manager used to handle descriptor allocations within the heap
    VariableSizeGPUAllocationsManager m_FreeBlockManager;
   
    // Heap description
    D3D12_DESCRIPTOR_HEAP_DESC m_HeapDesc;

    // Strong reference to D3D12 descriptor heap object
    CComPtr<ID3D12DescriptorHeap> m_pd3d12DescriptorHeap;
   
    // First CPU descriptor handle in the available descriptor range
    D3D12_CPU_DESCRIPTOR_HANDLE m_FirstCPUHandle = {0};
   
    // First GPU descriptor handle in the available descriptor range
    D3D12_GPU_DESCRIPTOR_HANDLE m_FirstGPUHandle = {0};

    UINT m_DescriptorSize = 0;

    // Number of descriptors in the allocation.
    // If this manager was initialized as a subrange in the existing heap,
    // this value may be different from m_HeapDesc.NumDescriptors
    Uint32 m_NumDescriptorsInAllocation = 0;

    std::mutex m_AllocationMutex;
    RenderDeviceD3D12Impl *m_pDeviceD3D12Impl = nullptr;
    IDescriptorAllocator *m_pParentAllocator = nullptr;
   
    // External ID assigned to this descriptor allocations manager
    size_t m_ThisManagerId = static_cast<size_t>(-1);
};

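The class provides two constructors. The first one creates a new D3D12 descriptor heap and manages the entire space available in it. The second one uses a subrange of descriptors in an existing D3D12 heap. This allows several allocation managers to share the same D3D12 descriptor heap, which is essential for GPU-visible heaps.

The allocation routine, DescriptorHeapAllocationManager::Allocate(), allocates the requested number of descriptors in the heap and returns a DescriptorHeapAllocation object that represents the allocation.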

DescriptorHeapAllocation DescriptorHeapAllocationManager::Allocate(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);

    // Use variable-size GPU allocations manager to allocate the requested number of descriptors
    auto DescriptorHandleOffset = m_FreeBlockManager.Allocate(Count);
    if (DescriptorHandleOffset == VariableSizeGPUAllocationsManager::InvalidOffset)
        return DescriptorHeapAllocation();

    // Compute the first CPU and GPU descriptor handles in the allocation by
    // offsetting the first CPU and GPU descriptor handle in the range
    auto CPUHandle = m_FirstCPUHandle;
    CPUHandle.ptr += DescriptorHandleOffset * m_DescriptorSize;

    auto GPUHandle = m_FirstGPUHandle;
    if(m_HeapDesc.Flags & D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE)
        GPUHandle.ptr += DescriptorHandleOffset * m_DescriptorSize;

    return DescriptorHeapAllocation( m_pParentAllocator, m_pd3d12DescriptorHeap, 
                                     CPUHandle, GPUHandle, Count, 
                                     static_cast<Uint16>(m_ThisManagerId) );
}
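
The Allocate() method above delegates the search for free space to the variable-size allocations manager, whose implementation is not shown in this article. As a rough illustration of what such a manager does, here is a minimal sketch under simplifying assumptions: FreeBlockList is a hypothetical class, it uses a first-fit search over a map of free blocks ordered by offset, and it releases ranges immediately rather than through a frame-based queue.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical, simplified stand-in for VariableSizeGPUAllocationsManager.
class FreeBlockList
{
public:
    static constexpr size_t InvalidOffset = static_cast<size_t>(-1);

    explicit FreeBlockList(size_t Capacity) : m_FreeSize(Capacity)
    {
        if (Capacity > 0)
            m_FreeBlocks.emplace(0, Capacity); // one free block covers the whole range
    }

    // First-fit search: returns the offset of the allocated range or InvalidOffset
    size_t Allocate(size_t Count)
    {
        if (Count == 0)
            return InvalidOffset;
        for (auto it = m_FreeBlocks.begin(); it != m_FreeBlocks.end(); ++it)
        {
            if (it->second < Count)
                continue; // block is too small, try the next one
            const size_t Offset    = it->first;
            const size_t Remaining = it->second - Count;
            m_FreeBlocks.erase(it);
            if (Remaining > 0)
                m_FreeBlocks.emplace(Offset + Count, Remaining);
            m_FreeSize -= Count;
            return Offset;
        }
        return InvalidOffset;
    }

    // Returns a range to the list and merges it with adjacent free blocks
    void Free(size_t Offset, size_t Count)
    {
        m_FreeSize += Count;
        auto Next = m_FreeBlocks.lower_bound(Offset);
        if (Next != m_FreeBlocks.begin())
        {
            auto Prev = std::prev(Next);
            if (Prev->first + Prev->second == Offset)
            {
                // The previous block ends where the freed range starts
                Offset = Prev->first;
                Count += Prev->second;
                m_FreeBlocks.erase(Prev);
            }
        }
        if (Next != m_FreeBlocks.end() && Offset + Count == Next->first)
        {
            // The freed range ends where the next block starts
            Count += Next->second;
            m_FreeBlocks.erase(Next);
        }
        m_FreeBlocks.emplace(Offset, Count);
    }

    size_t GetFreeSize() const { return m_FreeSize; }
    size_t GetNumFreeBlocks() const { return m_FreeBlocks.size(); }

private:
    std::map<size_t, size_t> m_FreeBlocks; // offset -> size, ordered by offset
    size_t                   m_FreeSize;
};
```

The sketch also shows why the total free size is not enough to decide whether a request can be served: after freeing two non-adjacent ranges of four descriptors each, eight descriptors are free but a request for eight contiguous descriptors still fails.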

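Similarly, the deallocation routine, DescriptorHeapAllocationManager::Free(), takes a DescriptorHeapAllocation object and releases the allocation. Note that because GPU commands execute asynchronously, the allocation cannot be released immediately. Instead, the manager adds it to a queue together with the current frame number and releases all stale allocations later, once the GPU has finished the frame (which is detected by checking the signaled fence).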

void DescriptorHeapAllocationManager::Free(DescriptorHeapAllocation&& Allocation)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    auto DescriptorOffset = (Allocation.GetCpuHandle().ptr - m_FirstCPUHandle.ptr) / m_DescriptorSize;
    // Note that the allocation is not released immediately, but added to the 
    // release queue in the allocations manager
    m_FreeBlockManager.Free(DescriptorOffset, Allocation.GetNumHandles(), 
                            m_pDeviceD3D12Impl->GetCurrentFrame());
    // Clear the allocation
    Allocation = DescriptorHeapAllocation();
}

The ReleaseStaleAllocations() method must be called at the end of every frame to actually release all stale allocations from previous frames.

void DescriptorHeapAllocationManager::ReleaseStaleAllocations(Uint64 NumCompletedFrames)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    m_FreeBlockManager.ReleaseCompletedFrames(NumCompletedFrames);
}
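
The deferred release scheme can be sketched as a queue of ranges tagged with the frame in which they were freed. The code below is an illustration only: DeferredReleaseQueue and StaleRange are hypothetical names, and it assumes that NumCompletedFrames = N means the GPU has finished frames 0 through N-1, which the article does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// A freed descriptor range together with the frame in which it was freed.
struct StaleRange
{
    uint32_t Offset;
    uint32_t Count;
    uint64_t FrameNumber;
};

class DeferredReleaseQueue
{
public:
    // Called when an allocation is destroyed on the CPU side.
    // The range is only queued here; the GPU may still be using it.
    void Free(uint32_t Offset, uint32_t Count, uint64_t CurrentFrame)
    {
        m_Stale.push_back({Offset, Count, CurrentFrame});
    }

    // Removes and returns all ranges that were freed in completed frames.
    // Assumption: frames 0 .. NumCompletedFrames-1 are finished by the GPU.
    std::vector<StaleRange> ReleaseCompletedFrames(uint64_t NumCompletedFrames)
    {
        std::vector<StaleRange> Released;
        // Entries are queued in frame order, so the oldest ones are at the front
        while (!m_Stale.empty() && m_Stale.front().FrameNumber < NumCompletedFrames)
        {
            Released.push_back(m_Stale.front());
            m_Stale.pop_front();
        }
        return Released;
    }

    size_t GetNumPending() const { return m_Stale.size(); }

private:
    std::deque<StaleRange> m_Stale;
};
```

In a complete system, the ranges returned by ReleaseCompletedFrames() would be handed back to the free block list so that they become available for new allocations.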

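CPU Descriptor Heap

The next part of the descriptor heap management system is the CPU descriptor heap. The engine uses CPU descriptor heaps to store resource views when a new resource is created. Since there are four descriptor heap types in total, the system maintains four CPUDescriptorHeap instances (the heaps are part of the render device). Every CPU descriptor heap keeps a pool of descriptor heap allocation managers and a list of the managers that have unused descriptors.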

// Pool of descriptor heap managers
std::vector<DescriptorHeapAllocationManager> m_HeapPool;
// Indices of available descriptor heap managers
std::set<size_t> m_AvailableHeaps;

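The figure below shows an example of the contents of a CPU descriptor heap object.

When allocating new descriptors, the CPUDescriptorHeap class goes through the list of managers that have available descriptors and tries to serve the request with each of them. If there are no available managers, or no manager is able to handle the request, the function creates a new descriptor heap manager and lets it handle the request. The source code of the allocation function is given below.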

DescriptorHeapAllocation CPUDescriptorHeap::Allocate( uint32_t Count )
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    DescriptorHeapAllocation Allocation;
    // Go through all descriptor heap managers that have free descriptors
    for (auto AvailableHeapIt = m_AvailableHeaps.begin(); AvailableHeapIt != m_AvailableHeaps.end(); )
    {
        // Try to allocate descriptors using the current descriptor heap manager
        const auto HeapInd = *AvailableHeapIt;
        Allocation = m_HeapPool[HeapInd].Allocate(Count);
        // Remove the manager from the list of available managers if it has no more
        // free descriptors. std::set::erase() returns the iterator to the next element,
        // so the loop never advances an invalidated iterator
        if(m_HeapPool[HeapInd].GetNumAvailableDescriptors() == 0)
            AvailableHeapIt = m_AvailableHeaps.erase(AvailableHeapIt);
        else
            ++AvailableHeapIt;

        // Terminate the loop if descriptor was successfully allocated, otherwise
        // go to the next manager
        if(Allocation.GetCpuHandle().ptr != 0)
            break;
    }

    // If there were no available descriptor heap managers or no manager was able
    // to satisfy the allocation request, create a new manager
    if(Allocation.GetCpuHandle().ptr == 0)
    {
        // Make sure the heap is large enough to accommodate the requested number of descriptors
        m_HeapDesc.NumDescriptors = std::max(m_HeapDesc.NumDescriptors, static_cast<UINT>(Count));
        // Create a new descriptor heap manager. Note that this constructor creates a new D3D12
        // descriptor heap and references the entire heap. Pool index is used as manager ID
        m_HeapPool.emplace_back( m_MemAllocator, m_pDeviceD3D12Impl, this, 
                                 m_HeapPool.size(), m_HeapDesc );
        auto NewHeapIt = m_AvailableHeaps.insert(m_HeapPool.size()-1);

        // Use the new manager to allocate descriptor handles
        Allocation = m_HeapPool[*NewHeapIt.first].Allocate(Count);
    }

    m_CurrentSize += (Allocation.GetCpuHandle().ptr != 0) ? Count : 0;
    m_MaxHeapSize = std::max(m_MaxHeapSize, m_CurrentSize);

    return Allocation;
}

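For example, if we request a new allocation of five descriptors, the function will first ask manager [1] to handle the request, which will fail because that manager has at most two contiguous descriptors. The function will then ask manager [2], which will be able to handle the request.

After that, if we request an allocation of three descriptors, no manager will be able to handle the request, and the function will add a new manager to the pool and use it to serve the request.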
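
The pool logic can be tried out in isolation with a simplified model. In the sketch below, ManagerPool and LinearManager are hypothetical names; each manager hands out descriptors linearly and never reuses them, which is much simpler than the real allocation manager but is enough to show how requests move from one manager to the next and when a new manager is created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical fixed-capacity manager that hands out descriptors linearly.
struct LinearManager
{
    uint32_t Capacity = 0;
    uint32_t Used     = 0;

    bool TryAllocate(uint32_t Count)
    {
        if (Used + Count > Capacity)
            return false;
        Used += Count;
        return true;
    }
    uint32_t GetNumAvailable() const { return Capacity - Used; }
};

class ManagerPool
{
public:
    explicit ManagerPool(uint32_t DefaultCapacity) : m_DefaultCapacity(DefaultCapacity) {}

    // Returns the index of the manager that served the request
    size_t Allocate(uint32_t Count)
    {
        for (auto it = m_Available.begin(); it != m_Available.end();)
        {
            const size_t Index = *it;
            const bool   Ok    = m_Pool[Index].TryAllocate(Count);
            // Drop exhausted managers from the list of available ones;
            // erase() returns the next valid iterator
            if (m_Pool[Index].GetNumAvailable() == 0)
                it = m_Available.erase(it);
            else
                ++it;
            if (Ok)
                return Index;
        }

        // No manager could serve the request: add one that is large enough
        LinearManager NewManager;
        NewManager.Capacity = Count > m_DefaultCapacity ? Count : m_DefaultCapacity;
        m_Pool.push_back(NewManager);
        const size_t Index = m_Pool.size() - 1;
        m_Pool[Index].TryAllocate(Count);
        if (m_Pool[Index].GetNumAvailable() > 0)
            m_Available.insert(Index);
        return Index;
    }

    size_t GetNumManagers() const { return m_Pool.size(); }

private:
    uint32_t                   m_DefaultCapacity;
    std::vector<LinearManager> m_Pool;      // all managers ever created
    std::set<size_t>           m_Available; // indices of managers with free space
};
```

With a default capacity of eight, requests for 6, 5, 2, 3, and 1 descriptors are served by managers 0, 1, 0, 1, and 2 respectively: the second request skips manager 0 because only two descriptors are left in it, and the last request forces a third manager because the first two are full.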

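The deallocation routine calls the Free() method of the appropriate allocation manager. Recall that CPUDescriptorHeap::Free() is called from the destructor of DescriptorHeapAllocation. Note that the function uses GetAllocationManagerId() to retrieve the index of the manager that created the allocation.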

void CPUDescriptorHeap::Free(DescriptorHeapAllocation&& Allocation)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    auto ManagerId = Allocation.GetAllocationManagerId();
    m_CurrentSize -= static_cast<Uint32>(Allocation.GetNumHandles());
    m_HeapPool[ManagerId].Free(std::move(Allocation));
}

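Finally, there is the usual method that must be called at the end of the frame to release all stale allocations when it is safe to do so. Note that this is the method that returns managers to the list of available managers. This can only be done safely after the descriptors have actually been released.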

void CPUDescriptorHeap::ReleaseStaleAllocations(Uint64 NumCompletedFrames)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    for (size_t HeapManagerInd = 0; HeapManagerInd < m_HeapPool.size(); ++HeapManagerInd)
    {
        m_HeapPool[HeapManagerInd].ReleaseStaleAllocations(NumCompletedFrames);
        // Return the manager to the pool of available managers if it has available descriptors
        if(m_HeapPool[HeapManagerInd].GetNumAvailableDescriptors() > 0)
            m_AvailableHeaps.insert(HeapManagerInd);
    }
}

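GPU Descriptor Heap

The main purpose of the CPU descriptor heaps is to provide storage for resource view descriptors. For the GPU to be able to access descriptors, they must reside in a shader-visible descriptor heap. At most one CBV_SRV_UAV heap and one SAMPLER heap can be bound to the GPU at a time. The source descriptors may be scattered across several CPU-only descriptor heaps, but they must be consolidated in the same CBV_SRV_UAV or SAMPLER heap before a draw command can be executed. For this reason, a GPUDescriptorHeap object contains only a single D3D12 descriptor heap. The space is split into two parts: the first part holds descriptor handles that change rarely (those that correspond to static and mutable variables). The second part holds dynamic descriptor handles, which are temporary handles that live only in the current frame.

While the first part is shared between all threads, organizing the second part in the same way would be very inefficient. Allocating dynamic descriptor handles can be a very frequent operation, and if several threads record commands simultaneously, allocating dynamic handles from the same pool would become a bottleneck. To avoid this problem, dynamic descriptor handle allocation is a two-stage process. In the first stage, every command context that records commands allocates a chunk of descriptors from the shared dynamic part of the GPU descriptor heap. This operation requires exclusive access to the GPU heap but does not happen often. The second stage is suballocation from that chunk. This part is lock-free and can be done by every thread in parallel. The structure of the GPU heap can then be depicted as follows.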

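Two classes implement the strategy described above. GPUDescriptorHeap manages the two parts of the heap, while DynamicSuballocationsManager handles suballocations within the dynamic part. As mentioned above, the GPUDescriptorHeap class contains two descriptor heap allocation managers, one for static allocations and one for dynamic allocations.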

DescriptorHeapAllocationManager m_HeapAllocationManager;
DescriptorHeapAllocationManager m_DynamicAllocationsManager;

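Note that both allocation managers are initialized to suballocate from the same D3D12 descriptor heap. The first manager is assigned id 0 and the second one is assigned id 1. The class provides two methods to allocate from the static and dynamic parts of the heap.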

DescriptorHeapAllocation GPUDescriptorHeap::Allocate(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocMutex);
    DescriptorHeapAllocation Allocation = m_HeapAllocationManager.Allocate(Count);
    return Allocation;
}

DescriptorHeapAllocation GPUDescriptorHeap::AllocateDynamic(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_DynAllocMutex);
    DescriptorHeapAllocation Allocation = m_DynamicAllocationsManager.Allocate(Count);
    return Allocation;
}

There is only one Free() method, because the manager id can be used to tell whether an allocation belongs to the static or the dynamic part.

void GPUDescriptorHeap::Free(DescriptorHeapAllocation&& Allocation)
{
    auto MgrId = Allocation.GetAllocationManagerId();
    if(MgrId == 0)
    {
        std::lock_guard<std::mutex> LockGuard(m_AllocMutex);
        m_HeapAllocationManager.Free(std::move(Allocation));
    }
    else
    {
        std::lock_guard<std::mutex> LockGuard(m_DynAllocMutex);
        m_DynamicAllocationsManager.Free(std::move(Allocation));
    }
}
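
To make the two-part layout concrete, the sketch below computes the subranges that the two managers would be given and selects a part by manager id, in the same way Free() does above. It is an illustration under assumptions: HeapSubrange, GpuHeapLayout, MakeLayout, and SelectPart are hypothetical names, and the sizes of the two parts are arbitrary values chosen by the caller rather than the engine's defaults.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the subrange handled by one allocation manager.
struct HeapSubrange
{
    uint32_t FirstDescriptor;
    uint32_t NumDescriptors;
    uint16_t ManagerId;

    bool Contains(uint32_t Index) const
    {
        return Index >= FirstDescriptor && Index - FirstDescriptor < NumDescriptors;
    }
};

struct GpuHeapLayout
{
    HeapSubrange Static;  // rarely changing handles, manager id 0
    HeapSubrange Dynamic; // per-frame handles, manager id 1
};

// Splits a heap of TotalDescriptors so that the last NumDynamic descriptors
// form the dynamic part and everything before them forms the static part.
inline GpuHeapLayout MakeLayout(uint32_t TotalDescriptors, uint32_t NumDynamic)
{
    assert(NumDynamic <= TotalDescriptors);
    const uint32_t NumStatic = TotalDescriptors - NumDynamic;
    return GpuHeapLayout{HeapSubrange{0, NumStatic, 0},
                         HeapSubrange{NumStatic, NumDynamic, 1}};
}

// The manager id stored in an allocation is enough to find the part it came from
inline const HeapSubrange& SelectPart(const GpuHeapLayout& Layout, uint16_t ManagerId)
{
    return ManagerId == 0 ? Layout.Static : Layout.Dynamic;
}
```

Both subranges live in the same D3D12 heap, so descriptors from either part can be referenced by the same bound heap; only the bookkeeping is separate.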

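Note that all methods lock a mutex to get exclusive access to the allocation manager. The AllocateDynamic() method is only used by the DynamicSuballocationsManager class to allocate chunks of the heap for suballocation. That class maintains a list of chunks allocated from the main GPU descriptor heap and the offset within the current chunk.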

std::vector<DescriptorHeapAllocation> m_Suballocations;
Uint32 m_CurrentSuballocationOffset = 0;

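Within every frame, allocations are performed linearly. The allocation method first checks whether the current chunk has enough space for the requested number of descriptors. If it does not, the method requests a new chunk from the main GPU descriptor heap. The suballocation is then made from the new chunk.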

DescriptorHeapAllocation DynamicSuballocationsManager::Allocate(Uint32 Count)
{
    // Check if there are no chunks or the last chunk does not have enough space
    if( m_Suballocations.empty() ||
        m_CurrentSuballocationOffset + Count > m_Suballocations.back().GetNumHandles() )
    {
        // Request new chunk from the GPU descriptor heap
        auto SuballocationSize = std::max(m_DynamicChunkSize, Count);
        auto NewDynamicSubAllocation = m_ParentGPUHeap.AllocateDynamic(SuballocationSize);
        m_Suballocations.emplace_back(std::move(NewDynamicSubAllocation));
        m_CurrentSuballocationOffset = 0;
    }

    // Perform suballocation from the last chunk
    auto &CurrentSuballocation = m_Suballocations.back();
   
    auto ManagerId = CurrentSuballocation.GetAllocationManagerId();
    DescriptorHeapAllocation Allocation( 
        this,
        CurrentSuballocation.GetDescriptorHeap(),
        CurrentSuballocation.GetCpuHandle(m_CurrentSuballocationOffset),
        CurrentSuballocation.GetGpuHandle(m_CurrentSuballocationOffset),
        Count,
        static_cast<Uint16>(ManagerId) );
    m_CurrentSuballocationOffset += Count;

    return Allocation;
}

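Note that this method is lock-free, because every context has its own suballocation manager. A thread may only be blocked when it requests a new chunk from the main GPU descriptor heap, which happens infrequently.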
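
The linear suballocation scheme can be reduced to a few lines when descriptors are modeled as indices. The following sketch is not the engine code: ParentHeap and ChunkSuballocator are hypothetical names, the parent heap simply hands out consecutive ranges, and chunks are plain structs rather than DescriptorHeapAllocation objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parent heap: hands out chunks linearly and counts the requests.
struct ParentHeap
{
    uint32_t NextFree         = 0;
    uint32_t NumChunkRequests = 0;

    // In a real system this is the only step that needs a lock
    uint32_t AllocateChunk(uint32_t Size)
    {
        const uint32_t First = NextFree;
        NextFree += Size;
        ++NumChunkRequests;
        return First;
    }
};

class ChunkSuballocator
{
public:
    ChunkSuballocator(ParentHeap& Parent, uint32_t ChunkSize)
        : m_Parent(Parent), m_ChunkSize(ChunkSize) {}

    // Returns the index of the first descriptor in the suballocation
    uint32_t Allocate(uint32_t Count)
    {
        // Request a new chunk if there is none or the last one is too full
        if (m_Chunks.empty() || m_CurrentOffset + Count > m_Chunks.back().Size)
        {
            // An oversized request gets a chunk of exactly the requested size
            const uint32_t Size = std::max(m_ChunkSize, Count);
            m_Chunks.push_back({m_Parent.AllocateChunk(Size), Size});
            m_CurrentOffset = 0;
        }
        const uint32_t First = m_Chunks.back().First + m_CurrentOffset;
        m_CurrentOffset += Count;
        return First;
    }

    // Drops all chunks; individual suballocations are never freed one by one
    void DiscardAllocations()
    {
        m_Chunks.clear();
        m_CurrentOffset = 0;
    }

    size_t GetNumChunks() const { return m_Chunks.size(); }

private:
    struct Chunk
    {
        uint32_t First;
        uint32_t Size;
    };

    ParentHeap&        m_Parent;
    uint32_t           m_ChunkSize;
    std::vector<Chunk> m_Chunks;
    uint32_t           m_CurrentOffset = 0;
};
```

Note that the unused tail of a chunk is simply abandoned when a request does not fit, which trades a small amount of descriptor space for a very cheap allocation path.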

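Suballocations are not released individually, so the DynamicSuballocationsManager::Free() method does nothing. Instead, all allocations are discarded when the command list from the context is recorded and executed by the render device.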

void DynamicSuballocationsManager::DiscardAllocations(Uint64 FrameNumber)
{
    m_Suballocations.clear();
}

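Clearing the vector destroys all DescriptorHeapAllocation objects stored in it, which invokes their destructors. Each destructor calls the GPUDescriptorHeap::Free() method of the parent GPU descriptor heap, which adds the allocation to the release queue. The allocation is actually released a few frames later.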

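The Big Picture

Now that we have covered every individual component, we can describe how the components interact with each other and with the rest of the system. There are four shared CPU-only descriptor heaps (CBV_SRV_UAV, SAMPLER, RTV, and DSV), implemented by the CPUDescriptorHeap class, and two shader-visible (GPU) descriptor heaps (CBV_SRV_UAV and SAMPLER), implemented by the GPUDescriptorHeap class. Every device context used to record commands contains two dynamic suballocation managers (one for each shader-visible descriptor heap type), represented by the DynamicSuballocationsManager class. CPU descriptor heaps are used when a new resource view is created. GPU descriptor heaps are used by the shader resource binding system to allocate storage for shader-visible descriptors. They are also used to allocate dynamic descriptors.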

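Usage Scenarios

Let us now discuss several scenarios that involve descriptor heaps.

Creating a Resource View

Let us take the creation of a shader resource view (SRV) for a texture as an example of how resource views are created. The process is as follows.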

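An allocation of a single descriptor handle is requested from the CBV_SRV_UAV CPU-only descriptor heap. As described above, the descriptor heap allocation goes through the following steps:
- The CPUDescriptorHeap::Allocate() method acquires exclusive access to the CPU descriptor heap object.
- The method goes through the descriptor heap managers that have available descriptor handles and requests a single-descriptor allocation.
  - Since only one descriptor handle is requested, the first manager will be able to handle the request.
- If there are no available managers, a new one (along with a new D3D12 descriptor heap) is created to handle the request.
The D3D12 render device is used to initialize the shader resource view in the allocated descriptor (see ID3D12Device::CreateShaderResourceView on MSDN).
The descriptor heap allocation object is kept as part of the resource view object and is destroyed when the resource view object is released. At that point:
- The destructor of the DescriptorHeapAllocation object calls CPUDescriptorHeap::Free(), which locks the heap and calls the DescriptorHeapAllocationManager::Free() method of the allocation manager that created the allocation.
- The manager inserts the allocation attributes (offset and size) into the deletion queue along with the frame number.
- A few frames later, when the frame-completion fence is signaled, the allocation is actually released by the CPUDescriptorHeap::ReleaseStaleAllocations() method.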

All types of texture views (SRV, RTV, DSV, and UAV) as well as all types of buffer views are created in the same way.

Allocating Dynamic Descriptors

Let us now review how dynamic descriptors are allocated.

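A context that needs dynamic descriptors uses one of its two dynamic suballocation managers (CBV_SRV_UAV or SAMPLER) to request descriptor handles of the required type.
- The suballocation manager checks whether the last chunk has enough space to satisfy the allocation request. In most cases it does, and the descriptor handles are suballocated from that chunk.
- If there is not enough space, the suballocation manager asks the main GPU descriptor heap to allocate a new chunk of descriptor handles. The suballocation is then made from the new chunk.
At the end of the frame, the suballocation manager disposes of all its chunks, which are returned to the GPU descriptor heap.
- The GPU descriptor heap inserts all chunks into the release queue along with the frame number.
- A few frames later, when the frame-completion fence is signaled, the chunks are actually released and the space becomes available for new allocations.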

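Shader Resource Binding

Diligent Engine uses a shader resource binding model that comprises three types of shader resources, classified by the expected frequency of change (static, mutable, and dynamic), as well as shader resource binding objects. When a new shader resource binding object is created, it allocates space in the GPU descriptor heap for its mutable and static resources. The allocation is kept by the shader resource binding object and is released when the owning object is destroyed. This topic will be discussed in detail in a separate post.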

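Multithreading and GPU Safety

The descriptor heap management system is designed to be correct, safe, and efficient in a multithreaded environment. All three types of allocations (CPU descriptors, static/mutable GPU descriptors, and dynamic GPU descriptors) go through thread-safe paths. The allocation functions for CPU and static/mutable descriptors (CPUDescriptorHeap::Allocate() and GPUDescriptorHeap::Allocate()) acquire exclusive access to the descriptor heap object and may block other threads. However, descriptor allocation is fast and accounts for only a small fraction of the work involved in creating a resource, so this is not a problem. Dynamic descriptor allocation (DynamicSuballocationsManager::Allocate()) does not block other threads and can be called by multiple threads in parallel at no performance cost, provided that the same context is not used by different threads simultaneously. The only blocking function is GPUDescriptorHeap::AllocateDynamic(), but it is called only occasionally.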
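
The claim that per-context suballocation keeps threads out of each other's way can be checked with a small experiment. The sketch below is a model under assumptions rather than the engine's code: SharedDynamicHeap, ContextAllocator, RunWorkers, and AllUnique are hypothetical names, each worker thread owns one context, and descriptors are plain indices. Only the chunk request takes a lock; the per-descriptor path touches nothing that is shared.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Shared heap: allocating a chunk is the only operation that takes a lock.
class SharedDynamicHeap
{
public:
    uint32_t AllocateChunk(uint32_t Size)
    {
        std::lock_guard<std::mutex> Lock(m_Mutex);
        const uint32_t First = m_NextFree;
        m_NextFree += Size;
        return First;
    }

private:
    std::mutex m_Mutex;
    uint32_t   m_NextFree = 0;
};

// Per-context allocator: used by one thread at a time, so it needs no lock.
class ContextAllocator
{
public:
    ContextAllocator(SharedDynamicHeap& Heap, uint32_t ChunkSize)
        : m_Heap(Heap), m_ChunkSize(ChunkSize) {}

    // Allocates a single descriptor and returns its index
    uint32_t Allocate()
    {
        if (m_Remaining == 0)
        {
            m_Next      = m_Heap.AllocateChunk(m_ChunkSize); // infrequent, locked
            m_Remaining = m_ChunkSize;
        }
        --m_Remaining;
        return m_Next++;
    }

private:
    SharedDynamicHeap& m_Heap;
    uint32_t           m_ChunkSize;
    uint32_t           m_Next      = 0;
    uint32_t           m_Remaining = 0;
};

// Runs NumThreads workers; each allocates PerThread descriptors from its own context
inline std::vector<uint32_t> RunWorkers(uint32_t NumThreads, uint32_t PerThread, uint32_t ChunkSize)
{
    SharedDynamicHeap                  Heap;
    std::vector<std::vector<uint32_t>> Results(NumThreads);
    std::vector<std::thread>           Threads;
    for (uint32_t t = 0; t < NumThreads; ++t)
    {
        Threads.emplace_back([&Heap, &Results, t, PerThread, ChunkSize] {
            ContextAllocator Ctx(Heap, ChunkSize);
            for (uint32_t i = 0; i < PerThread; ++i)
                Results[t].push_back(Ctx.Allocate()); // each thread writes its own slot
        });
    }
    for (auto& Thread : Threads)
        Thread.join();

    std::vector<uint32_t> All;
    for (const auto& R : Results)
        All.insert(All.end(), R.begin(), R.end());
    return All;
}

// True if no descriptor index was handed out twice
inline bool AllUnique(const std::vector<uint32_t>& Indices)
{
    return std::set<uint32_t>(Indices.begin(), Indices.end()).size() == Indices.size();
}
```

With four threads, 100 descriptors per thread, and a chunk size of 16, each thread takes the lock only seven times while performing 100 allocations, and no descriptor is handed out to two contexts.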

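Deallocation is more involved, because in addition to CPU-side safety the system must make sure that the descriptors are no longer used by the GPU. CPU-side safety is achieved by protecting the deallocation methods (CPUDescriptorHeap::Free() and GPUDescriptorHeap::Free()) with mutexes. GPU-side safety is guaranteed by recording the command list number at the time the allocation is destroyed. For CPU and static/mutable GPU descriptors, it does not matter which thread releases the allocation. Once there are no more references to it, the allocation will never be used by any new GPU command, but it may still be referenced by commands that are pending execution on the GPU. Therefore, when an allocation is released, the deleting thread adds it to the deletion queue together with the current command list number. The deletion queue is processed by the render device once at the end of every frame. The device knows how many command lists the GPU has completed and can release all allocations that were referenced by completed commands.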

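For dynamic descriptors, deallocation happens when the command list from the context is closed and executed. It does not matter which thread recorded the list. Once the list has been sent to the command queue for execution (from any thread), all dynamic descriptors are stale and can be discarded. The context therefore returns all its chunks to the GPU descriptor heap object, which adds them to the release queue. For a deferred context, this means that the space occupied by its dynamic descriptors cannot be reused by other contexts until the context's command list has been executed.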

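Discussion

In the current implementation, all threads use the same CPU descriptor heap object to allocate resource view descriptor handles. We have not found this to be a problem, because descriptor heap allocation and deallocation are very fast unless a new CPU descriptor heap has to be created. That should not be an issue either, because the size of the descriptor heap managers can be specified at initialization time to suit the needs of the application. The system provides methods to query the maximum size that each heap reached while the application was running.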

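The attentive reader may have noticed that the GPUDescriptorHeap class uses the general-purpose DescriptorHeapAllocationManager to allocate dynamic chunks that all have the same size. The only case in which the chunk size differs is when the requested number of descriptors exceeds the default chunk size. That case is very atypical, so a more efficient fixed-size block allocator could be used instead of the variable-size allocations manager.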
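
A fixed-size block allocator of the kind suggested here could look like the sketch below. It is one possible design, not something Diligent Engine implements: FixedSizeChunkAllocator is a hypothetical name, the free list is a simple stack of chunk indices, and oversized requests (which would still need the variable-size path) are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class FixedSizeChunkAllocator
{
public:
    static constexpr uint32_t InvalidChunk = 0xFFFFFFFFu;

    FixedSizeChunkAllocator(uint32_t NumChunks, uint32_t ChunkSize)
        : m_ChunkSize(ChunkSize)
    {
        // Push indices in reverse order so that chunk 0 is handed out first
        m_FreeChunks.reserve(NumChunks);
        for (uint32_t i = NumChunks; i > 0; --i)
            m_FreeChunks.push_back(i - 1);
    }

    // O(1): returns the index of the first descriptor of a free chunk,
    // or InvalidChunk if all chunks are in use
    uint32_t Allocate()
    {
        if (m_FreeChunks.empty())
            return InvalidChunk;
        const uint32_t Chunk = m_FreeChunks.back();
        m_FreeChunks.pop_back();
        return Chunk * m_ChunkSize;
    }

    // O(1): all chunks have the same size, so there is nothing to search or merge
    void Free(uint32_t FirstDescriptor)
    {
        assert(FirstDescriptor % m_ChunkSize == 0);
        m_FreeChunks.push_back(FirstDescriptor / m_ChunkSize);
    }

    size_t GetNumFreeChunks() const { return m_FreeChunks.size(); }

private:
    uint32_t              m_ChunkSize;
    std::vector<uint32_t> m_FreeChunks; // stack of free chunk indices
};
```

Compared with a variable-size manager, both operations run in constant time and fragmentation cannot occur, at the cost of having to route requests larger than the chunk size elsewhere.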

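Diligent Engine currently supports only a single GPU descriptor heap of each type (CBV_SRV_UAV and SAMPLER). While the first heap can hold a large number of descriptor handles (1,000,000+), the sampler heap is limited to 2048 descriptors, which may lead to heap exhaustion. In most cases, however, the sampler types used by a shader are known in advance and never change. D3D12 introduced the concept of static samplers to handle this case, and Diligent Engine exposes it as well. Static samplers should be used whenever possible, because they do not occupy slots in the sampler descriptor heap. The sampler descriptor heap is then only needed for descriptor handles of samplers that change at run time, which is a less typical situation.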