带覆盖保护和按需内存分配的缓冲区

Francesc Castells

4.57/5 (7投票s)

2008年3月10日

CPOL

7分钟阅读

29151

276

实现一个大型内存缓冲区，支持按需内存分配和覆盖保护。

引言

我目前正在开发的一个项目是一个框架，它将使其他开发人员在未来几年内能够添加将结果写入一个相当大的缓冲区的算法。这些数据可以从一个位到几百个 KWord。字的大小可以从一次执行到另一次执行而变化，并且算法将结果放置在缓冲区中的任意地址。

与分配一个巨大的字节数组或动态集合不同，我们认为将缓冲区分成小的、按需分配内存的块会很有用。出于安全原因，我们添加了一个位级别的覆盖保护机制，以便尽快检测到导致算法重复写入同一位置的错误。

在本文中，我将尝试解释我的缓冲区实现。

背景

此缓冲区的设计与我们的应用程序有着密切的关系。例如，默认情况下，所有位都设置为 1。除此之外，您可能需要一个具有覆盖保护、按需分配或我们缓冲区具有的其他功能的缓冲区，因此我不会尝试解释为什么这种设计对我们有用，但我会努力清楚地说明它是如何工作的以及如何使用它。

在文章的最后，我将提供一些关于如何改进此设计的想法。

缓冲区是什么样子的？

假设我们需要一个字大小为 17 位、深度为 10MWord 的缓冲区。我们如何组织缓冲区？

我们要做的第一件事是将缓冲区分成块；我们称这些块为 Sectors（扇区），在我们的例子中，它们大小都相同，但不必如此。所以，我们的缓冲区将有一个 Sectors 集合。这些 Sectors 所需的内存将在我们第一次尝试向它们写入数据时分配（请注意，我们不需要分配内存来读取默认值）。在我们的应用程序中，任意数量的这些 Sectors 将永远不会被写入。通过这种技术，我们节省了大量内存。在我们的示例中，我们可以将 10MW 分成 160 个大小为 64KW 的 Sectors。

我们希望按位级别写入数据。我们不直接向缓冲区用户提供对 Word（字）的访问，而是提供一套在 Word 类中实现的 Write 方法。为了能够配置每个 Word 的位数，我们将每个 Word 实现为一个字节数组，而这些 Write 方法封装了对这些数组的访问（在我们的例子中，每个 Word 3 个字节）。此外，由于我们一次分配一个完整的 Sector，而不是拥有一个字节数组的数组，所以我们每个 Sector 只有一个字节数组（3 字节/Word*64KW），而 Word 类处理该数组的一个子集。

为了实现覆盖保护，我们需要跟踪所有已写入的位。我们将此信息存储在一个我称之为 BlackAndWhiteBuffer 的缓冲区中，它为数据缓冲区中的每个位都有一个位。这意味着我们将使用两倍于没有覆盖保护所需内存的内存。这本身就证明了“锁定”一个 Sector 的概念，当我们完成对它的操作后，将其设为只读，因此我们不再需要 BlackAndWhiteBuffer，从而释放其内存。

我们引入的最后一个用于操作此缓冲区概念是 Window（窗口）。缓冲区的窗口提供对缓冲区子集的访问。算法通过这些窗口访问缓冲区；这样，我们就可以应用偏移量来处理它们的输出，并限制它们对内存的访问，因为窗口不能提供超出其边界的访问。

以下图表应有助于理解这些概念。

在图表中，我们看到可以通过三种不同的视图访问缓冲区

内存视图：连续访问整个缓冲区
窗口：通过定义的窗口访问
扇区视图：通过各个扇区访问

让我们分析图表中所示的示例

写访问 1 显示了写入缓冲区如何导致整个 Sector 和相应的 BlackAndWhiteBuffer 的分配。数据在所有视图中可见。
写访问 2 显示了在没有定义窗口的情况下，我们有相同的行为。
写访问 3 显示了一个 Sector 已被分配然后被锁定（BlackAndWhiteBuffer 不存在）。随后的 Sector 在分配之前已被锁定。读取它将返回默认的 Word 值。请注意，我们不能锁定一个窗口，因为它可以与另一个活动窗口共享一个 Sector。

实现

此解决方案实现了两个程序集

Buffers.IWriteOnceBuffers：定义了我们指南所需的所有类的接口（此处不讨论），以及自定义异常。
Buffers.WriteOnceBuffers：实现了指定的功能。

下面显示了主要接口的定义

关于接口的一些说明

Buffer、Sector 和 Window 都有一个 Words 属性，该属性提供对 Word 集合的访问。如前所述，它们不是真正的集合，而是访问底层缓冲区的不同方式。
Buffer 提供了两种添加和删除窗口的方法。
所有接口都实现了 IEquatable<T>。这些特定的实现会比较缓冲区的 contents。
Sector 和 Window 具有 StartAddress 和 EndAddress 属性，但数据访问是通过偏移量（从 0 到 Depth-1）进行的。
底层工作由 WordBuffer 类完成。该类被用来代替普通的 byte[]，并控制数据宽度、按需分配、初始化为默认值以及锁定。
在下面的代码片段中，您可以看到构造函数以及设置和获取数据的操作。

public WordBuffer(int numWords, int numBytesPerWord, long defaultValue)
{
  if(numBytesPerWord < cntMinNumBytesPerWord || numBytesPerWord > cntMaxNumBytesPerWord )
    throw new ArgumentOutOfRangeException("numBytesPerWord", 
    string.Format(CultureInfo.InvariantCulture, StringResources.ArgumentOutOfRangeMessage, 
    "numBytesPerWord", numBytesPerWord, cntMinNumBytesPerWord, cntMaxNumBytesPerWord));

  this.numWords = numWords;
  this.numBytesPerWord = numBytesPerWord;

  // Default value as a byte array
  long bwValue = defaultValue;
  defaultValueInBytes = new byte[numBytesPerWord];
  for(int i=0; i<numBytesPerWord; i++)
  {
    defaultValueInBytes[i] = (byte)bwValue;
    bwValue>>=8;
  }
  // Default value casted to the number of bytes
  for(int i=numBytesPerWord-1; i>=0; i--)
    this.defaultValue = (this.defaultValue<<8) + defaultValueInBytes[i];
}

/// <summary>
/// Gets the byte[] as an integer value depending on the numBytesPerWord
/// </summary>
public long GetIntegerData(int byteOffset)
{
  if(buffer==null)
  {
    return defaultValue;
  }
  else
  {
    switch(numBytesPerWord)
    {
      case 1:
        return (long)buffer[byteOffset];
      case 2:
        return (long)BitConverter.ToUInt16(buffer, byteOffset);
      case 4:
        return (long)BitConverter.ToUInt32(buffer, byteOffset);
      default:
        long data = 0;
        for(int i=numBytesPerWord-1; i>=0; i--)
          data = (data<<8) + buffer[byteOffset+i];
        return data;
    }
  }
}

/// <summary>
/// Sets the byte[] as an integer value depending on the numBytesPerWord
/// </summary>
public void SetIntegerData(int byteOffset, long value)
{
  if(IsLocked)
    throw new BufferIsReadOnlyException();

  if(buffer==null)
    AllocateBuffer();

  Buffer.BlockCopy(BitConverter.GetBytes(value), 0, buffer, byteOffset, numBytesPerWord);
}

WriteOnceBufferWord 类负责以正确的地址、正确的格式写入数据，并控制重叠。我们不需要每个 Word 都有这个类的实例，同一个实例可以定位在特定的地址并获得对数据的访问。WriteOnceBufferWordCollection 类负责将 Word 定位在正确的偏移量上。

以下是 WriteOnceBufferWord 中修改 Word 的两个方法

/// <summary>
/// Writes only the bits specified in the mask
/// </summary>
/// <param name="value">Value to be written</param>
/// <param name="mask">Bits set to 1 will be written</param>
/// <returns>Word after writing the value</returns>
public long Write(long value, long mask)
{
  // Check overlappings
  if( (IntegerBlackAndWhite & mask) != mask )
    throw new WriteOverlappingException();
  
  // (value|~mask) sets to 1 all bits we don't want to clear
  IntegerData &= (value|~mask);
  IntegerBlackAndWhite &= ~mask;
  
  return IntegerData;
}

/// <summary>
/// Writes a value int the specified bit offset and using the specified number of bits
/// </summary>
/// <param name="bitOffset"></param>
/// <param name="lengthInBits"></param>
/// <param name="value"></param>
/// <returns></returns>
public long Write(long value, int bitOffset, int lengthInBits)
{
  if(lengthInBits==0)
    return IntegerData;
  
  long mask = 0x01;
  // Creates a mask with the proper number of bits
  for(int i=1; i<lengthInBits; i++)
  {
    mask<<=1;
    mask|=0x01;
  }

  // Places the mask and the value in the proper bitOffset
  mask<<=bitOffset;
  value<<=bitOffset;

  return Write(value, mask);
}

如您所见，这些方法访问 IntegerData 和 IntegerBlackAndWhite。它们是同一个类的两个私有属性，用于封装对缓冲区中正确位置的访问。

/// <summary>
/// Gets or sets the byte[] as an integer value
/// </summary>
private long IntegerData
{
  get 
  {
    return ownerSector.DataBuffer.GetIntegerData(byteOffset);
  }
  set 
  {
    ownerSector.DataBuffer.SetIntegerData(byteOffset, value);
  }
}

/// <summary>
/// Gets or sets the black and white byte[] as an integer value
/// </summary>
private long IntegerBlackAndWhite
{
  get
  {
    return ownerSector.BlackAndWhiteBuffer.GetIntegerData(byteOffset);
  }
  set
  {
    ownerSector.BlackAndWhiteBuffer.SetIntegerData(byteOffset, value);
  }
}

为了将数据作为整数而不是 byte 数组来访问，我将其转换为 long。这限制了每个 Word 的最大位数为 63，这足以满足我们的应用程序。

示例应用程序

您可以下载的示例程序不是真实的应用程序，而是一系列测试，用于展示如何使用该类以及某些操作对分配内存的影响。请注意，所示值包括程序使用的其他变量和数据，并不旨在精确测量每个类或成员使用的内存。

关注点

此实现经过一些迭代改进其性能，已经达到了或多或少稳定的状态，其性能仍然受到默认 Word 值不是 0 的事实的严重影响。例如，通过实现按需初始化可以进行一些改进。另一个影响性能的点是 byte[] 和 long 之间的转换。也许这些转换在某些点可以被跳过。非常欢迎关于提高性能的其他想法。

如果您查看 WriteOnceBufferWordCollection 的实现，您会发现它实现了 IList<IWriteOnceBufferWord> 以提供对集合的类型安全访问，但它也实现了 IList。这允许使用 WinForms 2.0 中的 DataGridView 等控件进行数据绑定。您可以用两行代码将相当大的缓冲区绑定到网格。

对于极其大的缓冲区，实现某种磁盘序列化将很有趣，例如，用于锁定的 Sector。现在，可以通过提供的两个静态方法轻松地对整个缓冲区进行序列化和反序列化。这种序列化保持了缓冲区的精确状态。如果一个 Sector 被锁定，它显然需要更少的磁盘空间。

版本历史

2008/10/03: 首次发布。