LargeListDictionary 实现

Michael Micco

4.88/5 (10投票s)

2004年3月5日

MIT

6分钟阅读

84166

1536

具有类似 HashTable 性能的、可通过键访问的列表的实现。

Sample Image - LargeListDictionary_testapp.jpg

引言

本文介绍了 LargeListDictionary 类的实现，该类像 ListDictionary 一样保持项目的顺序，但随着列表变大，其性能更接近 Hashtable。

背景

我不知道从何说起。我非常喜欢 Jason Smith 的文章：《为 .NET 添加“Set”集合支持》（Iesi.Collections 命名空间）。我正在寻找集合的实现，他的实现非常通用且优雅。不幸的是，我需要使用他的 Iesi.Collections.ListSet 类来保留我的 Iesi.Collections.Set 中项目的顺序，但当集合变大时，性能变得难以忍受，因为它底层使用了 ListDictionary，这对于从 Iesi.Collections.Set 中添加和删除项目来说性能非常糟糕。（这里的一个关键点是，集合在添加和删除项目时使用键查找，因此键查找是必需的。）

然后我开始寻找实现 IDictionary 的有序列表，以便可以在 Iesi.Collections.DictionarySet 中使用（例如：Iesi.Collections.ListSet 继承 Iesi.Collections.DictionarySet 并使用 ListDictionary）。我需要一个性能优于 .NET 的 ListDictionary 的类。我在 Code Project 上找到了以下两种实现

Hashlist - Hashtable 遇上 ArrayList，作者 Mike McPhail。
KeyedList 实现，作者 Marc Clifton。

这两种实现基本上都内部使用了 Hashtable 和 ArrayList，Hashtable 用于查找性能，ArrayList 用于保持项目的顺序。然而，它们有两个缺点。由于 ArrayList，当列表变大时，从列表中删除项目非常昂贵；并且 Iesi.Collections.Set.Intersect 方法调用了 Remove 方法。第二个缺点是枚举它们会给出散列顺序，而不是你添加项目的顺序。

总结一下，我创建了 LargeListDictionary，它在通过键添加和删除项目时，其性能将与 Hashtable 相似。然后我创建了 LargeListSet，以测试我的类在集合操作中的性能。

如果本文的任何部分让你感到困惑，请阅读上面提到的三篇文章，然后返回完成本文。

使用代码

LargeListDictionary.cs 包含两个类和一个 struct

LargeListDictionary - 一个实现 IDictionary、ICollection 和 IEnumerable 的类
LargeListDictionaryEnumerator - 一个实现 IDictionaryEnumerator 和 IEnumerator 的 struct
IndexedObject - 一个包含键、值和索引的简单类。

LargeListSet.cs 只包含一个类：LargeListSet 继承 Iesi.Collections.DictionarySet。

对于 LargeListDictionary，关键概念是它使用了两个 Hashtable 成员。两者都存储 IndexedObject 对象，但一个通过原始键存储，另一个通过内部生成的 int 索引存储。

public class LargeListDictionary : IDictionary, ICollection, IEnumerable
{
    private int _maxIndex;
    private Hashtable _itemsByKey;
    private Hashtable _itemsByIndex;

每当添加一个项目时，都会创建一个 IndexedObject，其中包含给定的键和值，并为其生成下一个索引。_maxIndex 跟踪到目前为止给出的最高索引。

public void Add(object key, object value)
{
    _maxIndex++;
    int newIndex = _maxIndex;

    IndexedObject io = new IndexedObject(newIndex, key, value);
    _itemsByKey.Add(io.Key, io);
    _itemsByIndex.Add(io.Index, io);
}

在这里，您可以看到 Add 的开销是创建 IndexedObject 并将其添加到两个 Hashtable 成员的开销。我愿意接受这额外的开销，因为真正的益处在于我检查或删除对象时。

public bool Contains(object key)
{
    return _itemsByKey.ContainsKey(key);
}

public void Remove(object key)
{
    if (_itemsByKey.ContainsKey(key))
    {
        IndexedObject io = (IndexedObject)_itemsByKey[key];
        _itemsByKey.Remove(key);
        _itemsByIndex.Remove(io.Index);
    }
}

在这里，您可以看到通过键检查项目可以获得直接的 HashTable 性能，而删除项目则可以获得基于从两个 HashTable 中删除的性能。当列表变大时，这两种操作都比 ListDictionary 有了巨大的改进。

我为什么不能只使用一个 Hashtable 呢？使用包装对象和两个 HashTable 似乎开销过大。这是为什么？

回想上面的要求，列表必须按照项目添加的顺序进行枚举。如果我添加 100、90、80、70 等，那么枚举它们应该返回 100、90、80、70。如果有一件事我们都会在某个时候粗鲁地学到，那就是 HashTable 并非如此。这就是我使用第二个以内部索引为键的 Hashtable 的原因。

秘密武器：LargeListDictionaryEnumerator

这才是有趣的地方。起初，我尝试用一个 HashTable 和一个 ArrayList 来实现一些东西，在删除项目后我会调整索引——糟糕的计划。我不会再详细说明这一点。使用两个 HashTable 实现，我可以轻松删除，但仍然存在一个问题。比方说，我将以下对象添加到 LargeListDictionary 中

键	值
555-1212	J. Smith
555-4567	M. Rose
555-3333	B. Jones

这些项目将用 IndexedObject 包装，并且 HashTable 将包含以下内容

键	目录	值
555-1212	0	J. Smith
555-4567	1	M. Rose
555-3333	2	B. Jones

查看结果，很容易看出我如何枚举。只需使用计数器并从以索引为键的 HashTable 中查找 IndexedObject。等等！一旦有人开始操作列表，它就不那么简单了。删除一个项目并添加一个新项目后，HashTable 将包含以下内容

键	目录	值
555-1212	0	J. Smith
555-3333	2	B. Jones
555-2727	3	新成员

在上面的例子中，我从列表中删除了 M. Rose 并添加了 New Guy。因为 LargeListDictionary 只增加其索引生成器，所以我在 M. Rose 曾经存在的地方留下了一个空隙。这引出了 LargeListDictionaryEnumerator 的实现。

public struct LargeListDictionaryEnumerator : 
                              IDictionaryEnumerator, IEnumerator
{

    private int _currentIndex;
    private Hashtable _itemsByIndex;
    private int _maxIndex;

    public LargeListDictionaryEnumerator(Hashtable itemsByIndex, int maxIndex)
    {
        _itemsByIndex = itemsByIndex;
        _currentIndex = -1;
        _maxIndex = maxIndex;
    }

    /// <summary>
    /// Increments the indexer. For the LargeListDictionary,
    /// if items have been removed, the next index (+1) may not
    /// have an entry.  Continue to increment until
    /// an item is found, or the max is reached.
    /// </summary>
    /// <returns>true if another item has been reached,
    /// false if the end of the list is hit.</returns>
    public bool MoveNext()
    {
        while (_currentIndex <= _maxIndex)
        {
            _currentIndex++;
            if (_itemsByIndex.ContainsKey(_currentIndex))
                return true;
        }

        return false;
    }

}

如您所见，秘密在于枚举器的 MoveNext 方法。在构造函数中，我传入了以我们给定索引为键的 HashTable 以及 LargeListDictionary 到目前为止分配的最大索引。当枚举器移动到下一个时，我们通过一个 while 循环来获取存在的下一个项目。瞧！枚举返回按添加顺序排列的项目。

其他性能考量

如果您对 LargeListDictionary 执行许多删除和添加操作，索引中会出现更多的“空隙”。这会减慢枚举速度。为了解决这个问题，您可以轻松添加一个“重新索引”方法来消除这些空隙。

如果您想实现 IList，可能还需要这种重新索引或某种其他类型的计数器。像 this[int index] 或 RemoveAt(int index) 这样的真实索引（非内部索引）访问不是我的要求之一。一旦内部索引中出现空隙，item[2] 就不能保证是 IndexedObject.Index = 2。

额外福利：LargeListSet

LargeListSet 类只是一个额外福利。它是一个继承 Iesi.Collections.DictionarySet 并使用 LargeListDictionary 作为底层 IDictionary 的类。

测试应用

提供的测试应用程序只是我为了比较基于实现 IDictionary 的不同底层类的集合的相对性能而编写的一个小型 GUI。它还有一个测试按钮，用于验证有序枚举。它包括 Jason Smith 关于集合的文章的源代码以及 Mike McPhail 关于 HashList 的文章的源代码。

结论

总而言之，我希望在需要大型字典但保留项目添加顺序很重要的情况下，其他人能发现 LargeListDictionary 的用途。

历史

2004 年 3 月 5 日：初版文章创建并提交。