集合性能 XML .NET C#

B 树排序字典

Gerardo Recinto

4.96/5 (47投票s)

2010 年 7 月 24 日

MIT

8分钟阅读

192143

4871

内存中的 B 树排序字典

下载源代码 - 64.2 KB

引言

B-Tree 是一种平衡的多路树。维基百科的讨论是介绍 B-Tree 数据结构的特性和节点布局的良好材料：http://en.wikipedia.org/wiki/B-tree。

本文讨论了内存中 B-Tree 的实现。尽管 B-Tree 通常用于文件/数据库系统的索引（参见磁盘上的开源 B-Tree @: https://github.com/SharedCode/sop），但在 .NET Framework 或任何语言/平台中实现基于 B-Tree 的集合或字典具有重要的价值。

内存中 B-Tree 的优势

当今典型的内存中排序字典数据结构基于二叉树算法，这与 B-Tree 不同。二叉树的每个节点可以包含一个单独的项，而 B-Tree 可以包含用户定义的数量的项，并且其节点保持平衡。这是一个非常重要的区别。由于每个节点只有一个项，因此在二叉树中存储大量项会生成一个高而窄的节点图，其中包含许多节点。

在相应的 B-Tree 实现中，图形将倾向于更短更宽，节点少得多。这是因为 B-Tree 中的节点通常配置为包含大量项；例如，一个包含 12 个项的 B-Tree 字典将需要一个节点来包含这些项，而一个二叉树将需要 12 个节点，这些节点可能分散在堆（内存）中。将我们的项采样增加到数千、数十万或数百万，我们谈论的是相应的基于 B-Tree 的排序字典可以提供的显著区别和显著优化。一个包含一百万个单项节点的二叉树，与大约八万三千个节点（对于 12 个项的节点设置）的 B-Tree 相比，如果用户指定的每个节点项数比上面提到的要多，那么 B-Tree 的节点数量会更少。

有了这种特性，很容易想象二叉树会占用更多内存，并且会比使用 B-Tree 算法的相应字典产生更多的内存碎片。在生产环境中，我们无法承受系统容易出现内存碎片，因为随着时间的推移，应用程序的性能会下降，并且由于缺乏连续的可分配空间而可能导致内存不足。

实现

在本文及相关的代码实现中，我选择了 C# 和 .NET Framework 作为实现语言和平台。由于 B-Tree 是一种成熟的算法，我将重点讨论高级设计/实现问题，包括接口约定。完整的源代码可供下载。

IBTreeDictionary：扩展泛型 IDictionary

泛型是强制执行编码和编译时数据类型检查的绝佳功能。因此，它是支持此 B-Tree 排序字典实现的一项很棒的功能（注意：也支持非泛型实现）。排序的 Dictionary 实现泛型的 IDictionary 接口，以便提供上述功能。

为了公开 B-Tree 的特定功能，例如允许重复键，从而需要一个显式的项迭代器来提供对项遍历的精细控制，我正在扩展 IDictionary 接口以添加所述功能的相应方法，从而创建了 IBTreeDictionary 接口。该接口如下所示：

public interface IBaseCollection<T> : ICloneable
{
  bool MoveNext();
  bool MovePrevious();
  bool MoveFirst();
  bool MoveLast();
  bool Search(T Value);
}

public interface IBTreeDictionary<TKey, TValue> : 
       System.Collections.Generic.IDictionary<TKey, 
       TValue>, IBaseCollection<TKey>
{
  Sop.Collections.BTree.SortOrderType SortOrder{ get;set; }
  BTreeItem<TKey, TValue> CurrentEntry{ get; }
  TKey CurrentKey{ get; }
  TValue CurrentValue{ get; set; }
  void Remove();
}

为了消耗具有重复键的项，该字典需要一种功能，能够将项指针设置到所需位置，然后遍历一个范围，同时消耗该范围内的项值。这就是为什么需要 Movexxx 方法以及 CurrentKey/CurrentValue 项指针。

此外，B-Tree 提供了一种功能，允许用户以升序或降序遍历项，因此提供了 SortOrder 属性。

管理和搜索功能

以下是精选的 B-Tree 管理和搜索功能，以说明一些行为，这些行为可以表征并让用户了解这种非常有用的 B-Tree 算法的口味实现。

Add 方法

向 B-Tree 添加项需要确定要添加项的适当节点。这是必需的，因为树中的项在插入或删除时会保持排序。遍历树以查找节点可以通过递归或通过循环结构（如 while 循环）完成。后者比递归更可取，因为在这种情况下，它消除了堆栈使用的要求，即完成后，没有隐式的堆栈（弹出）操作，因为最内层函数会展开/返回到主调用函数。

此块说明了一个 while 循环，用于遍历树以查找目标节点。它对树的每个节点使用 BinarySearch 来进一步优化每个节点的搜索。这是可能的，因为节点中的每个项都是排序的，因此 BinarySearch 允许最佳搜索和最小的键比较。

TreeNode CurrentNode = this;
int Index = 0;
while (true)
{
    Index = 0;
    byte NoOfOccupiedSlots = CurrentNode.count;
    if (NoOfOccupiedSlots > 1)
    {
        if (ParentBTree.SlotsComparer != null)
            Index = Array.BinarySearch(CurrentNode.Slots, 0, 
                    NoOfOccupiedSlots, Item, ParentBTree.SlotsComparer);
        else
            Index = Array.BinarySearch(CurrentNode.Slots, 0, 
                                       NoOfOccupiedSlots, Item);
        if (Index < 0)
            Index = ~Index;
    }
    else if (NoOfOccupiedSlots == 1)
    {
        if (ParentBTree.Comparer != null)
        {
            if (ParentBTree.Comparer.Compare(CurrentNode.Slots[0].Key, Item.Key) < 0)
                Index = 1;
        }
        else
        {
            try
            {
                if (System.Collections.Generic.Comparer<TKey>.Default.Compare(
                                 CurrentNode.Slots[0].Key, Item.Key) < 0)
                    Index = 1;
            }
            catch (Exception)
            {
                if (CurrentNode.Slots[0].Key.ToString().CompareTo(Item.Key.ToString()) < 0)
                    Index = 1;
            }
        }
    }
    if (CurrentNode.Children != null)
    // if not an outermost node, let next lower level node do the 'Add'.
        CurrentNode = CurrentNode.Children[Index];
    else
        break;
}

此时，已找到添加项的正确节点。现在，调用实际的 Add 函数。

CurrentNode.Add(ParentBTree, Item, Index);
CurrentNode = null;

这是执行实际项插入到找到的节点的 Add 函数。它保持节点项的排序，并管理节点已满时的节点拆分，以及其他必需的 B-Tree 维护操作，例如已拆分节点的父节点提升等。

void Add(BTreeAlgorithm<TKey, TValue> ParentBTree,
                BTreeItem<TKey, TValue> Item, int Index)
{
    byte NoOfOccupiedSlots = count;
    // Add. check if node is not yet full..
    if (NoOfOccupiedSlots < ParentBTree.SlotLength)
    {
        ShiftSlots(Slots, (byte)Index, (byte)NoOfOccupiedSlots);
        Slots[Index] = Item;
        count++;
        return;
    }
    else
    {
        Slots.CopyTo(ParentBTree.TempSlots, 0);
        byte SlotsHalf = (byte)(ParentBTree.SlotLength >> 1);
        if (Parent != null)
        {
            bool bIsUnBalanced = false;
            int iIsThereVacantSlot = 0;
            if (IsThereVacantSlotInLeft(ParentBTree, ref bIsUnBalanced))
                iIsThereVacantSlot = 1;
            else if (IsThereVacantSlotInRight(ParentBTree, ref bIsUnBalanced))
                iIsThereVacantSlot = 2;
            if (iIsThereVacantSlot > 0)
            {
                // copy temp buffer contents to the actual slots.
                byte b = (byte)(iIsThereVacantSlot == 1 ? 0 : 1);
                CopyArrayElements(ParentBTree.TempSlots, b, 
                                  Slots, 0, ParentBTree.SlotLength);
                if (iIsThereVacantSlot == 1)
                    // Vacant in left, "skud over"
                    // the leftmost node's item to parent and left.
                    DistributeToLeft(ParentBTree, 
                       ParentBTree.TempSlots[ParentBTree.SlotLength]);
                else if (iIsThereVacantSlot == 2)
                    // Vacant in right, move the rightmost
                    // node item into the vacant slot in right.
                    DistributeToRight(ParentBTree, ParentBTree.TempSlots[0]);
                return;
            }
            else if (bIsUnBalanced)
            { // if this branch is unbalanced..
                // _BreakNode
                // Description :
                // -copy the left half of the slots
                // -copy the right half of the slots
                // -zero out the current slot.
                // -copy the middle slot
                // -allocate memory for children node *s
                // -assign the new children nodes.
                TreeNode LeftNode;
                TreeNode RightNode;
                try
                {
                    RightNode = ParentBTree.GetRecycleNode(this);
                    LeftNode = ParentBTree.GetRecycleNode(this);
                    CopyArrayElements(ParentBTree.TempSlots, 0, 
                                      LeftNode.Slots, 0, SlotsHalf);
                    LeftNode.count = SlotsHalf;
                    CopyArrayElements(ParentBTree.TempSlots, 
                       (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                    RightNode.count = SlotsHalf;
                    ResetArray(Slots, null);
                    count = 1;
                    Slots[0] = ParentBTree.TempSlots[SlotsHalf];
                    Children = new TreeNode[ParentBTree.SlotLength + 1];
                    ResetArray(Children, null);
                    Children[(int)Sop.Collections.BTree.ChildNodes.LeftChild] = LeftNode;
                    Children[(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                    //SUCCESSFUL!
                    return;
                }
                catch (Exception)
                {
                    Children = null;
                    LeftNode = null;
                    RightNode = null;
                    throw;
                }
            }
            else
            {
                // All slots are occupied in this and other siblings' nodes..
                TreeNode RightNode;
                try
                {
                    // prepare this and the right node sibling
                    // and promote the temporary parent node(pTempSlot).
                    RightNode = ParentBTree.GetRecycleNode(Parent);
                    // zero out the current slot.
                    ResetArray(Slots, null);
                    // copy the left half of the slots to left sibling
                    CopyArrayElements(ParentBTree.TempSlots, 0, Slots, 0, SlotsHalf);
                    count = SlotsHalf;
                    // copy the right half of the slots to right sibling
                    CopyArrayElements(ParentBTree.TempSlots, 
                       (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                    RightNode.count = SlotsHalf;
                    // copy the middle slot to temp parent slot.
                    ParentBTree.TempParent = ParentBTree.TempSlots[SlotsHalf];
                    // assign the new children nodes.
                    ParentBTree.TempParentChildren[
                      (int)Sop.Collections.BTree.ChildNodes.LeftChild] = this;
                    ParentBTree.TempParentChildren[
                      (int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                    ParentBTree.PromoteParent = (TreeNode)Parent;
                    ParentBTree.PromoteIndexOfNode = GetIndexOfNode(ParentBTree);
                    //TreeNode o = (TreeNode)Parent;
                    //o.Promote(ParentBTree, GetIndexOfNode(ParentBTree));
                    //SUCCESSFUL!
                    return;
                }
                catch (Exception)
                {
                    RightNode = null;
                    throw;
                }
            }
        }
        else
        {
            // _BreakNode
            // Description :
            // -copy the left half of the temp slots
            // -copy the right half of the temp slots
            // -zero out the current slot.
            // -copy the middle of temp slot to 1st elem of current slot
            // -allocate memory for children node *s
            // -assign the new children nodes.
            TreeNode LeftNode;
            TreeNode RightNode;
            try
            {
                RightNode = ParentBTree.GetRecycleNode(this);
                LeftNode = ParentBTree.GetRecycleNode(this);
                CopyArrayElements(ParentBTree.TempSlots, 0, 
                                  LeftNode.Slots, 0, SlotsHalf);
                LeftNode.count = SlotsHalf;
                CopyArrayElements(ParentBTree.TempSlots, 
                   (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                RightNode.count = SlotsHalf;
                ResetArray(Slots, null);
                Slots[0] = ParentBTree.TempSlots[SlotsHalf];
                count = 1;
                Children = new TreeNode[ParentBTree.SlotLength + 1];
                Children[(int)Sop.Collections.BTree.ChildNodes.LeftChild] = LeftNode;
                Children[(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                return; // successful
            }
            catch (Exception)
            {
                LeftNode = null;
                RightNode = null;
                RightNode = null;
                throw;
            }
        }
    }
}

删除方法

B-Tree 中有两种 Remove 版本：一种根据键删除项，另一种删除树中当前选定的项。前者将导致 B-Tree 搜索项（给定一个键），并将项指针定位到树中具有匹配键的项。添加块中列出的相同节点确定方法用于查找要删除的节点和项。在将项指针定位到正确的项后，该项实际上会从树中删除。这是说明项删除的代码。

当要删除的项不在最外层节点时，树会将最外层节点的下一个项移到覆盖当前选定项的槽，从而从树中删除该项。

internal protected bool Remove(BTreeAlgorithm<TKey, TValue> ParentBTree)
{
    if (Children != null)
    {
        byte byIndex = ParentBTree.CurrentItem.NodeItemIndex;
        MoveNext(ParentBTree);
        Slots[byIndex] = 
          ParentBTree.CurrentItem.Node.Slots[ParentBTree.CurrentItem.NodeItemIndex];
    }
    return true;
}

然后，它负责管理最外层节点，要么将移动项留下的槽设置为 null，要么通过从左侧或右侧兄弟节点拉取项来维护树的平衡。

internal void FixTheVacatedSlot(BTreeAlgorithm<TKey, TValue> ParentBTree)
{
    sbyte c;
    c = (sbyte)count;
    if (c > 1) // if there are more than 1 items in slot then..
    { //***** We don't fix the children since there are no children at this scenario.
        if (ParentBTree.CurrentItem.NodeItemIndex < c - 1)
            MoveArrayElements(Slots,
                (ushort)(ParentBTree.CurrentItem.NodeItemIndex + 1),
                ParentBTree.CurrentItem.NodeItemIndex,
                (ushort)(c - 1 - ParentBTree.CurrentItem.NodeItemIndex));
        count--;
        Slots[count] = null; // nullify the last slot.
    }
    else
    { // only 1 item in slot
        if (Parent != null)
        {
            byte ucIndex;
            // if there is a pullable item from sibling nodes.
            if (SearchForPullableItem(ParentBTree, out ucIndex))
            {
                if (ucIndex < GetIndexOfNode(ParentBTree))
                    PullFromLeft(ParentBTree); // pull an item from left
                else
                    PullFromRight(ParentBTree); // pull an item from right
            }
            else
            { // Parent has only 2 children nodes..
                if (Parent.Children[0] == this)
                { // this is left node
                    TreeNode RightSibling = GetRightSibling(ParentBTree);
                    Parent.Slots[1] = RightSibling.Slots[0];
                    Parent.count = 2;
                    ParentBTree.AddRecycleNode(RightSibling);
                    RightSibling = null;
                }
                else
                { // this is right node
                    Parent.Slots[1] = Parent.Slots[0];
                    TreeNode LeftSibling = GetLeftSibling(ParentBTree);
                    Parent.Slots[0] = LeftSibling.Slots[0];
                    Parent.count = 2;
                    ParentBTree.AddRecycleNode(LeftSibling);
                    LeftSibling = null;
                }
                // nullify Parent's children will cause this
                // tree node instance to be garbage collected
                // as this is child of parent!
                Parent.Children[0] = null;
                Parent.Children[1] = null;
                Parent.Children = null;
                Clear();
            }
        }
        else
        { // only 1 item in root node !
            Slots[0] = null; // just nullIFY the slot.
            count = 0;
            ParentBTree.SetCurrentItemAddress(null, 0);
            // Point the current item pointer to end of tree
        }
    }
}

Search 方法

Search 方法包含与上面列出的 Add 方法的第一部分相同的逻辑。它从根节点向下遍历到子节点，直到使用迭代方法（while 循环）找到目标节点和项。但是，Search 不是添加项，而是在找到项时，仅将当前项指针定位到节点中找到的节点和项。

拥有一个项指针，结合 Search 和 Movexxx 方法，是一个强大的功能。例如，在此方法组合下进行范围查询非常容易。将项指针定位到您想要查询的（开始）项，然后使用 MoveNext 或 MovePrevious 方法顺序遍历附近的项，直到您读取完范围内所有感兴趣的项。

Using the Code

下载源代码 zip 文件，将其解压缩到您想要的目录，然后在 VS 2008 中打开B-Tree.sln 解决方案。浏览 SampleUsage 项目的源代码，了解如何使用 BTreeSortedDictionary，运行它，尝试使用其他可用方法，并重用您使用泛型 SortedDictionary 的经验。祝您使用愉快！

比较结果

1 Million Inserts on .Net SortedDictionary: 12 sec 308 MB peak 286 MB end
1 Million Inserts on BTreeDictionary:       9 sec 300 MB peak 277 MB end
1 Million Search on .Net SortedDictionary:  7 sec 309 MB peak 286 MB end
1 Million Search on BTreeDictionary:        7 sec 309 MB peak 277 MB end

结论：此 B-Tree 排序字典在内存利用率和操作速度方面均优于 .NET Framework 的 SortedDictionary。在插入 100 万个项时，此 BTree 的性能比 .NET 的高 3 秒，并且使用了少 ~9MB 的内存。

在搜索方面，两者速度相同，但在内存利用率方面，BTree 字典的内存使用量比 .NET 的 SortedDictionary 少 9MB，因此性能更优。将负载增加 10 倍并在生产环境中使用两者，这是真正的考验，我预计您会对这个 BTree 字典非常满意，这得益于我在本文顶部讨论的架构原因。

这是使用的压力测试程序

static void Main(string[] args)
{
    SortedDictionary<string, string> l1 = 
              new SortedDictionary<string, string>();
    StressAdd(l1, 1000000);
    StressSearch(l1, 1000000);
    //** uncomment to run the BTreeDictionary test
    //   and comment out the above block to turn off SortedDictionary test
    //BTreeDictionary<string, string> l2 = 
    //            new BTreeDictionary<string, string>();
    //StressAdd(l2, 1000000);
    //StressSearch(l2, 1000000);
    Console.ReadLine();
    return;
}

static void StressAdd(IDictionary<string, string> l, int Max)
{
    Console.WriteLine("Add: {0}", DateTime.Now);
    for (int i = 0; i < Max; i++)
    {
        l.Add(string.Format("{0} key kds vidsbnvibsnv siuv " + 
              "siuvsiubvisubviusvifsudjcnjdnc", i),
              "value kds vidsbnvibsnv siuv " + 
              "siuvsiubvisubviusvifsudjcnjdnccscs");
    }
    Console.WriteLine("Add: {0}", DateTime.Now);
}

static void StressSearch(IDictionary<string, string> l, int Max)
{
    Console.WriteLine("Search: {0}", DateTime.Now);
    for (int i = 0; i < Max; i++)
    {
        string v = l[string.Format("{0} key kds vidsbnvibsnv siuv " + 
                     "siuvsiubvisubviusvifsudjcnjdnc", i)];
    }
    Console.WriteLine("Search: {0}", DateTime.Now);
}

关注点

我从这次 B-Tree 实现的编写中学到了很多，也享受其中，我早在 90 年代中期就完成了 C++ 版本；事实上，我非常喜欢它，以至于它促使我继续研究磁盘版本，该版本现在已作为开源软件在 GitHub 上提供（https://github.com/SharedCode/sop）。请随时加入、参与并使用新的“磁盘”版本。

你好，“sop”项目已迁移到 GitHub：https://github.com/SharedCode/sop
并且，当前版本 V2 实现了所有宣传的“磁盘”功能，并且是用 Golang 实现的。B-Tree 也得到了优化，在“删除”操作中，节点负载均衡被关闭。它变成了一个非常酷的、支持 ACID 事务的对象持久化启用程序/适配器，运行在 Cassandra & Redis 之上。快来看看吧，加入这个项目，它等待着您的“创造性才能”！

历史

2010 年 7 月 23 日 - 初始提交
2010 年 7 月 24 日 - 添加了与 .NET Framework 的 SortedDictionary 的性能比较
2012 年 8 月 14 日 - 修改了关注点部分，仅提及开源 B-Tree Gold (v4.5) 产品
2024 年 1 月 25 日 - 添加了 Golang 端口的链接：https://github.com/SharedCode/sop