优化从数据库构建树

NielsHoldijk

4.50/5 (12投票s)

2005年1月20日

4分钟阅读

93654

665

不同的看待问题的方式可以带来更好的性能。

下载源代码 - 24.9 Kb

引言

不久前，我需要编写一些代码，从 DataView 创建一个对象树。这是一项相当直接的任务，代码可以正常工作，我暂时没有再纠结于它。

最近，我开始使用 ANTS Profiler，发现树的构建实际上相当慢。我重新检查了我之前的代码，并找到了一个显著加快它的方法……

本文将解释如何利用 DataView 的递归特性来获得巨大的性能提升。

注意：我提出的方法不依赖于 DataView 的增量排序。反向排序甚至混乱的顺序都可以。如果树数据来自外部、不可控的源，这是一个非常受欢迎的特性。

ZIP 文件包含什么？

ZIP 文件包含树构建库代码和测试应用程序代码。还包括 XML 格式的树 DataSet。该树包含 745 个节点，深度约五级。该测试应用程序是一个简单的控制台应用程序，它使用慢速和快速两种方式创建 TreeNode，并显示加载时间（秒）。它这样做五次，以便更好地了解性能提升。

从树的角度看

在我原来的代码中，我从树的角度处理问题。树总有一个根节点，所以我从根节点开始，在 DataView 中查找它的子节点。对于每个子节点，我也查找它们的各自的子节点，依此类推。

DataSet 由一组记录组成，包含三列：NodeID、ParentNodeID 和 NodeText。数据是递归的：ParentNodeID 指向父节点的 NodeID。数据库确保一个节点不能附加到一个不存在的父节点上。由于这个规则，根节点是通过让节点指向自身来创建的。

  public class TreeNode
  {
    // Declarations
    private int id;               // The ID of the node
    private TreeNode parentNode;  // The parent node of the node
    private string text;          // The payload of the node
    private ArrayList childNodes; // Contains the childnodes of the node

    // Constructor
    public TreeNode(int id, TreeNode parentNode)
    {
      // Initialize the FastTreeNode
      this.id = id;
      this.parentNode = parentNode;
      this.text = "";
      childNodes = new ArrayList();

      // Check if a parent was supplied
      if (parentNode != null)
      {
        //Yes, then let the parentnode know it has a new child
        parentNode.ChildNodes.Add(this);
      }
    }

    // Properties
    public int ID
    {
      get {return id;}
    }

    public TreeNode ParentNode
    {
      get {return parentNode;}
      set
      {
        parentNode = value;
        // Let the parent node know it has a new child,
        // if possible, and if necessary
        if (parentNode != null && !parentNode.ChildNodes.Contains(this))
          parentNode.ChildNodes.Add(this);
      }
    }

    public string Text
    {
      get {return text;}
      set {text = value;}
    }

    public ArrayList ChildNodes
    {
      get {return childNodes;}
    }
  }

TreeNode 类基本上反映了数据库结构。它有一个 ID，并且引用一个父节点。根节点有一个 null 父节点。作为简单的有效负载，节点有一个 Text 属性。

public TreeNode LoadTree(DataView dataTree)
    {
      // Prepare the resulting rootnode
      TreeNode rootNode = null;

      // Loop through the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the current node is the rootnode
        if (nodeID == parentNodeID)
        {
          // Yes, so create the node, and start
          // searching for it's child nodes
          rootNode = new TreeNode(nodeID, null);
          rootNode.Text = text;
          LoadChildNodes(rootNode, dataTree);
        }
      }

      // Return the result
      return rootNode;
    }

    private void LoadChildNodes(TreeNode parentNode, DataView dataTree)
    {
      // Loop throught the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the current node is not the root node,
        // and if the current node
        // is a child node of the supplied parent node
        if (nodeID != parentNodeID && parentNodeID == parentNode.ID)
        {
          // Yes, create the node
          // and start searching for it's child nodes.
          TreeNode node = new TreeNode(nodeID, parentNode);
          node.Text = text;
          LoadChildNodes (node, dataTree);
        }
      }
    }

LoadTree 方法调用递归的 LoadChildNodes 方法。这使得代码相当紧凑，但是，它有一个大问题：对于每个节点，它都会扫描整个 DataView，包括已处理的节点。所以，对于测试 DataSet，将读取 555770 条记录。加上 DataView 读取速度不快的事实，放置所有节点所需的总时间是相当可观的。

此外，如果您尝试加载一个非常深的树，可能会导致堆栈溢出问题。

从 DataView 的角度看

DataView 具有线性特征：它从记录 0 开始，到记录 x 结束。起初，似乎不可能通过简单地从 0 到 x 遍历记录来创建每个节点：如果它的父节点没有被提前创建，那么就不可能创建当前节点。但事实并非如此，是吗？

诀窍在于放弃“每个创建的对象都应该是完整的”这种想法。我不知道父节点的信息，但我知道它将有的 ID。所以，我可以创建一个只有父节点 ID、没有父节点的新节点，并将其存储在 Hashtable 中。如果我稍后在 DataSet 中找到空节点的 ID，我不会创建它，而是从 Hashtable 中获取存储的节点，然后填充空属性。

public TreeNode LoadTree(DataView dataTree)
    {
      // Loop through all records in the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in some typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the node was already created
        if (nodeList.Contains(nodeID))
        {
          // Yes, fill in the missing properties
          TreeNode node = (TreeNode)nodeList[nodeID];
          node.Text = text;
          // If the node is not a root node,
          // and there's no parent yet, look up or create
          if (nodeID != parentNodeID && node.ParentNode == null)
          {
            TreeNode parentNode = null;
            // Check if the parentnode was already created
            if (nodeList.Contains(parentNodeID))
            {
              // Yes, so use that one
              parentNode = (TreeNode)nodeList[parentNodeID];
            }
            else
            {
              // The parentnode doesn't exist yet, so create a partial parentnode.
              parentNode = new TreeNode(parentNodeID, null);
              nodeList.Add(parentNodeID, parentNode);
            }
            node.ParentNode = parentNode;
          }
        }
        else
        {
          // New node, check if it's a root node
          if (nodeID == parentNodeID)
          {
            // Yes, so no need for looking up or creating a parentnode
            TreeNode node = new TreeNode(nodeID, null);
            node.Text = text;
            nodeList.Add(nodeID, node);
          }
          else
          {
            // New child node
            TreeNode parentNode = null;
            // Check if the parentnode was already created
            if (nodeList.Contains(parentNodeID))
            {
              // Yes, so use that one
              parentNode = (TreeNode)nodeList[parentNodeID];
            }
            else
            {
              // No, so create a partial parentnode.
              parentNode = new TreeNode(parentNodeID, null);
              nodeList.Add(parentNodeID, parentNode);
            }

            // Create the new node
            TreeNode node = new TreeNode(nodeID, parentNode);
            node.Text = text;
            nodeList.Add(nodeID, node);
          }
        }
      } 
     
      // Find the rootnode
      IDictionaryEnumerator nodeEnumerator = nodeList.GetEnumerator();
      while (nodeEnumerator.MoveNext())
      {
        TreeNode node = (TreeNode)nodeEnumerator.Value;
        if (node.ParentNode == null)
          return node;
      }

      // Return nothing if no rootnode was found
      return null;
    }

这里的 LoadTree 方法稍长一些，但它每条记录只读取一次。不是在 DataView 中搜索节点，而是通过速度快得多的 Hashtable 进行搜索。最后需要做的唯一额外工作是定位根节点，而 SlowTree 版本已经准备好了：它是起点。然而，由于树构建的速度非常快，这段额外代码所需的时间实际上并不显著。

结果

在我 1.8GHz 的笔记本电脑上，测试应用程序使用慢速方法获取树只需要略高于两秒钟。使用快速方法，创建相同的树仅需约 0.004 秒。这快了约 500 倍！

注意：当然，测试树是一个非常简单的树。在实际应用程序中，每个树节点可能比文本有更大的有效负载。这可能会稍微减慢速度，但仍然比原来的慢树快得多。我在我的应用程序中看到过提高到 400 倍的速度。

结论

应用程序可能性能不佳，因为

它非常频繁地调用（快速）方法，或者
它调用非常慢的方法，或者，最糟糕的是，
它非常频繁地调用非常慢的方法。

本文提供的解决方案通过将对最慢方法的调用（从 DataView 读取记录）降至绝对最低，从而避免了这一点。诚然，这仅得益于子对象和父对象之间的紧密关系，以及对我们拥有的少量对象信息的巧妙运用。

致谢

我想感谢

Daniel Strigl，感谢他易于使用的 HighPerformanceTimer 类[^]，我用它来进行测量。
我的同事 Jan，他最初提出了所描述的这个想法。