集合设计 / 图形架构师中级开发 .NET C#

C# 中的正确列表支持

honey the codewitch

5.00/5 (23投票s)

2019 年 11 月 17 日

MIT

6分钟阅读

44342

370

在 .NET 中为自定义数据结构实现完整的列表支持

下载源文件 - 7.1 KB

screen shot

引言

本文是关于编码集合和列表类的。它将引导您完成在自定义数据结构上实现 IEnumerable<T>、IEnumerator<T>、ICollection<T> 和 IList<T> 的具体细节。

撰写本文的原因是该主题的覆盖不够全面，并且在使用这些接口时存在一些微软未明确记录的陷阱。因此，我们将把此类类的开发分解为可管理的步骤。

背景

.NET 中的列表类提供 foreach 枚举、索引寻址以及对任意可变数量项的通用存储。它们的工作方式与 .NET 中的 array 非常相似，只是它们本身是可调整大小的。在 .NET 中，arrays 本身实现了 IList<T>。

在 .NET 中使用列表的主要类是 List<T>。它在内部数组之上提供了一个功能齐全的列表，该数组会根据需要进行调整大小。

这通常是高效且合适的，但在某些情况下并非如此。微软在其 LinkedList<T> 类中提供了一个替代方案，该类将值存储为链表而不是数组。

我们将实现自己的链表。虽然使用微软高度优化的类是首选，但此类将很好地为我们提供说明。在此过程中，我们将涵盖实现列表的所有主要组成部分和陷阱，以便您的实际列表能够有效且高效地工作。

编写这个混乱的程序

为了保持清晰，我已将主要接口分解为几个部分类。

让我们从 LinkedList.cs 中的数据结构本身开始。

// the basic LinkedList<T> core
public partial class LinkedList<T>
{
    // a node class holds the actual entries.
    private class _Node
    {
        // holds the value of the current entry
        public T Value = default(T);
        // holds the next instance in the list
        public _Node Next = null;
    }
    // we must always point to the root, or null
    // if empty
    _Node _root=null;
}

它本身根本不是一个列表！它不实现任何相关接口，但它确实包含一个嵌套类，其中包含我们的节点结构。该节点是链表中的单个节点。它将其字段保存为类型 T 的值和下一个节点。外部类还包含一个字段，该字段保存列表中的第一个节点，如果列表为空，则为 null。

我喜欢通过实现 IEnumerable<T>/IEnumerator<T> 来开始编码列表接口，这可以实现 foreach，因为它是如此基本的操作，并且（通常）除了数据结构本身之外没有其他代码依赖项。让我们来看看 LinkedList.Enumerable.cs

partial class LinkedList<T> : IEnumerable<T>
{
    // versioning is used so that if the collection is changed
    // during a foreach/enumeration, the enumerator will know to
    // throw an exception. Every time we add, remove, insert,
    // set, or clear, we increment the version. This is used in 
    // turn by the enumeration class, which checks it before 
    // performing any significant operation.
    int _version = 0;

    // all this does is return a new instance of our Enumeration
    // struct
    public IEnumerator<T> GetEnumerator()
    {
        return new Enumerator(this);
    }
    // legacy collection support (required)
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        => GetEnumerator();

    // this is the meat of our enumeration capabilities
    struct Enumerator : IEnumerator<T>
    {
        // we need a link to our outer class for finding the
        // root node and for versioning.
        // see the _version field comments in LinkedList<T>
        LinkedList<T> _outer;
        // hold the value of _outer._version we got when the
        // enumerator was created. This is compared to the 
        // _outer._version field to see if it has changed.
        int _version;
        // this is our state so we know where we are while
        // enumerating. -1 = Initial, -2 = Past end,
        // -3 = Disposed, and 0 means enumerating
        int _state;
        // the current node we're on.
        _Node _current;
        
        public Enumerator(LinkedList<T> outer)
        {
            _outer = outer;
            _version = outer._version;
            _current = null;
            _state = -1;
        }
        // reset the enumeration to the initial state
        // if not disposed.
        public void Reset()
        {
            _CheckDisposed();
            _current = null;
            _state = -1;
        }
        // just set the state to inform the class
        void IDisposable.Dispose()
        {
            _state = -3;
        }
        // performs the meat of our _Node
        // traversal
        public bool MoveNext()
        {
            // can't enum if disposed
            _CheckDisposed();
            switch (_state)
            {
                case -1: // initial
                    _CheckVersion();
                    // just set the current to the root
                    _current = _outer._root;
                    if (null == _current)
                    {
                        // our enumeration is empty
                        _state = -2;
                        return false;
                    }
                    // we're enumerating
                    _state = 0;
                    break;
                case -2: // past the end
                    return false;
                case 0: // enumerating
                    _CheckVersion();
                    // just move to the next
                    // _Node instance
                    // if it's null, stop enumerating
                    _current = _current.Next;
                    if (null == _current)
                    {
                        _state = -2;
                        return false;
                    }
                    break;
            }
            return true;
        }
        public T Current {
            get {
                switch (_state)
                {
                    // throw the appropriate
                    // error if necessary
                    case -1:
                        throw new InvalidOperationException
                              ("The enumeration is before the start position.");
                    case -2:
                        throw new InvalidOperationException
                              ("The enumeration is past the end position.");
                    case -3:
                        // always throws
                        _CheckDisposed(); 
                        break;
                }
                _CheckVersion();
                return _current.Value;
            }
        }
        // legacy support (required)
        object System.Collections.IEnumerator.Current => Current;
        // throws if disposed
        void _CheckDisposed()
        {
            if (-3 == _state)
                throw new ObjectDisposedException(GetType().FullName);
        }
        // throws if versions don't match
        void _CheckVersion()
        {
            if (_version != _outer._version)
                throw new InvalidOperationException("The enumeration has changed.");
        }
    }
}

这可能有点难理解，您可能想知道为什么我们不直接使用 C# 迭代器通过 yield 语法，但有一个很好的理由，那就是版本控制和重置。

为了提供对集合的 IEnumerator<T> 的正确实现，必须跟踪集合以确保在枚举项时集合没有更改。这是使用 _version 字段完成的。每次更改集合时，版本都会递增。将此字段与枚举器的副本进行比较，以查看它们是否相同。如果不相同，枚举器将抛出异常。C# 迭代器不提供此功能，但对于列表的完整、正确和健壮实现来说，它是必需的。

此外，迭代器不支持 Reset()。虽然 foreach 不使用它，但它可以被其他使用者使用，并且通常期望它能正常工作。

如果您真的不想实现所有这些，可以使用 C# 迭代器支持，但请记住，这不是列表枚举器的规范或完整实现。

注意 _state 字段的使用。其主要目的是进行错误处理，以便我们可以区分位于开头、结尾还是已处理状态。还要注意，我们在几乎所有操作中都检查版本并检查是否已处理。这对于使枚举器健壮很重要。

与所有枚举器实现一样，MoveNext() 是枚举器的核心，其工作是将光标前进一位。如果还有更多要读取的项，则返回 true，否则返回 false 表示没有更多项。在此例程中，我们只是遍历链表，并适当地更新状态。同时，Current、Reset() 和 Dispose() 都很直接。

接下来，我们有 ICollection<T>，它本质上为我们的类中存储和检索项提供了一个基本接口。

// linked list collection interface implementation
partial class LinkedList<T> : ICollection<T>
{
    // we keep a cached _count field so that
    // we don't need to enumerate the entire 
    // list to find the count. it is altered
    // whenever items are added, removed, or
    // inserted.
    int _count=0;
    public int Count {
        get {
            return _count;
        }
    }
    // returns true if one of the nodes has 
    // the specified value
    public bool Contains(T value)
    {
        // start at the root
        var current = _root;
        while(null!=current)
        {
            // enumerate and check each value
            if (Equals(value, current.Value))
                return true;
            current = current.Next;
        }
        return false;
    }
    public void CopyTo(T[] array,int index)
    {
        var i = _count;
        // check our parameters for validity
        if (null == array)
            throw new ArgumentNullException(nameof(array));
        if (1 != array.Rank || 0 != array.GetLowerBound(0))
            throw new ArgumentException("The array is not an SZArray", nameof(array));
        if (0 > index)
            throw new ArgumentOutOfRangeException(nameof(index), 
                  "The index cannot be less than zero.");
        if (array.Length<=index)
            throw new ArgumentOutOfRangeException(nameof(index), 
                  "The index cannot be greater than the length of the array.");
        if (i > array.Length + index)
            throw new ArgumentException
            ("The array is not big enough to hold the collection entries.", nameof(array));
        i = 0;
        var current = _root;
        while (null != current)
        {
            // enumerate the values and set
            // each array element
            array[i + index] = current.Value;
            ++i;
            current = current.Next;
        }
    }
    // required for ICollection<T> but we don't really need it
    bool ICollection<T>.IsReadOnly => false;

    // adds an item to the linked list
    public void Add(T value)
    {
        _Node prev = null;
        // start at the root
        var current = _root;
        // find the final element
        while (null != current)
        {
            prev = current;
            current = current.Next;
        }
        if (null == prev)
        {
            // is the root
            _root = new _Node();
            _root.Value = value;
        }
        else
        {
            // add to the end
            var n = new _Node();
            n.Value = value;
            prev.Next = n;
        }
        // increment count and version
        ++_count;
        ++_version;
    }

    // simply clears the list
    public void Clear()
    {
        _root = null;
        _count = 0;
        ++_version;
    }
    // removes the first item with the 
    // specified value
    public bool Remove(T value)
    {
        _Node prev = null;
        var current = _root;
        while (null != current)
        {
            // find the value.
            if(Equals(value,current.Value))
            {
                if(null!=prev) // not the root
                    // set the previous next pointer
                    // to the current's next pointer
                    // effectively eliminating 
                    // current
                    prev.Next = current.Next;
                else // set the root
                    // we just want the next value
                    // it will eliminate current
                    _root = current.Next;
                // decrement the count
                --_count;
                // increment the version
                ++_version;
                return true;
            }
            // iterate
            prev = current;
            current = current.Next;
        }
        // couldn't find the value
        return false;
    }
}

注意 _count 字段的添加。集合使用者期望 Count 属性非常快。但是，为了检索链表中项的数量，您通常必须遍历列表中的每个项才能找到它。在这种情况下，这是不可取的，因此每次我们向列表中添加和删除项时，都会相应地更新 _count 字段。这样，我们就避免了不必要的性能下降。这是实现自定义集合时的一个非常常见的模式。

其余成员要么显而易见，要么如上面的注释所示。请注意，在更新集合时，要明智地使用错误处理，并仔细记录 _count 和 _version 字段。这至关重要。这里需要注意的一点是 CopyTo() 方法，该索引指的是开始复制的数组中的索引，而不是集合中的索引。也就是说，该索引是目标索引。

现在转向 LinkedList.List.cs 中 IList<T> 的实现。

    // gets or sets the value at the specified index
    public T this[int index] {
        get {
            // check the index for validity
            if (0 > index || index >= _count)
                throw new IndexOutOfRangeException();
            // start at the root
            var current = _root;
            // enumerate up to the index
            for (var i = 0;i<index;++i)
                current = current.Next;
            // return the value at the index
            return current.Value;
        }
        set {
            // check for value index
            if (0 > index || index >= _count)
                throw new IndexOutOfRangeException();
            // start at the root
            var current = _root;
            // enumerate up to the index
            for (var i = 0; i < index; ++i)
                current = current.Next;
            // set the value at the current index
            current.Value=value;
            // increment the version
            ++_version;
        }
    }
    // returns the index of the specified value
    public int IndexOf(T value)
    {
        // track the index
        var i = 0;
        // start at the root
        var current = _root;
        while (null!=current)
        {
            // enumerate checking the value
            if (Equals(current.Value, value))
                return i; // found
            // increment the current index
            ++i;
            // iterate
            current = current.Next;
        }
        // not found
        return -1;
    }
    // inserts an item *before* the specified position
    public void Insert(int index,T value)
    {
        // check index for validity
        // the index can be the count in order to insert
        // as the last item.
        if (0 > index || index > _count)
            throw new ArgumentOutOfRangeException(nameof(index));

        if (0 == index) // insert at the beginning
        {
            _Node n = new _Node();
            n.Next = _root;
            n.Value = value;
            _root = n;
        }
        else if (_count == index) // insert at the end
        {
            Add(value);
            return;
        }
        else // insert in the middle somewhere
        {
            // start at the root
            var current = _root;
            _Node prev = null;
            // iterate up to the index
            for (var i = 0; i < index; ++i)
            {
                prev = current;
                current = current.Next;
            }
            // insert a new node at the position
            var n = new _Node();
            n.Value = value;
            n.Next = current;
            prev.Next = n;
        }
        // update the version and increment the count
        ++_version;
        ++_count;
    }
    // removes the item at the specified index
    public void RemoveAt(int index)
    {
        // check the index for validity
        if (0 > index || index >= _count)
            throw new ArgumentOutOfRangeException(nameof(index));
        // start at the root
        var current = _root;
        _Node prev = null;
        if (1 == _count) // special case for single item
            _root = null;
        else
        {
            // iterate through the nodes up to index
            for (var i = 0; i < index; ++i)
            {
                prev = current;
                current = current.Next;
            }
            // replace the previous next node with
            // current next node, effectively removing
            // current
            if (null == prev)
                _root = _root.Next;
            else
                prev.Next = current.Next;
        }
        // decrement the count
        --_count;
        // increment the version
        ++_version;
    }
}

这里的许多代码都是不言而喻的。只需注意在 this[int index].set 属性上更新 _version！ Insert() 的复杂性仅在于链表的工作方式。关于 Insert() 的唯一注意事项是，该索引是指新项之后的索引，并且可以高达 _count 而不是 _count-1。因此，插入应被概念性地理解为“在索引之前插入”。

关于 IList<T> 的一点要注意：并非总是适合实现。事实上，由于链表的工作方式，它不像数组那样真正支持直接索引。当然，我们对其进行了模拟，但这并不真正高效，因为我们必须进行迭代。因此，在链表之上实现此接口不一定是好的做法。我们在此只是出于演示目的而这样做。在实际工作中，您将选择最适合您的数据结构的集合接口。这里，ICollection<T> 可能是合适的，而无需实现 IList<T>。

最后，转向我们在 LinkedList.Constructors.cs 中找到的构造函数。

partial class LinkedList<T>
{
    // optimized constructor for adding a collection
    public LinkedList(IEnumerable<T> collection)
    {
        // check for collection validity
        if (null == collection)
            throw new ArgumentNullException(nameof(collection));
        // track the count            
        int c = 0;
        // track the current node
        _Node current = null;
        // enumerate the collection
        foreach(var item in collection)
        {
            // if this is the first item
            if(null==current)
            {
                // set the root
                _root = new _Node();
                _root.Value = item;
                current = _root;
            } else
            {
                // set the next item
                var n = new _Node();
                n.Value = item;
                current.Next = n;
                current = n;
            }
            ++c;
        }
        // set the count
        _count = c;
    }
    // default constructor
    public LinkedList() {
    }
}

在这里，我们提供了一个默认构造函数和一个从现有 collection 或 array 复制的构造函数。这些是您确实想要提供的最基本的构造函数。根据您的内部数据结构，可能还有其他合适的构造函数。例如，如果它是基于数组的，您可能会有一个 capacity，而 B 树可能有一个 order。在这种情况下，采用集合的构造函数经过了优化，但即使它所做的只是在一个循环中调用 Add()，您通常也应该提供它。

就是这样！现在您拥有了一个健壮的列表实现，它可以正确处理错误，高效，并且能够处理正确实现列表的各种陷阱。也就是说，请勿在生产环境中使用此链表。微软的实现可能更优化，而且肯定经过了更多测试。但是，这应该为实现您自己的数据结构提供一个前进的道路。

历史

2019 年 11 月 17 日 - 初次提交