Reading Image Headers to Get Width and Height

andywilsonuk

28 April 2009

CPOL

A look at techniques for quickly getting the width and height of an image

Introduction

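I needed to cache the orientation of the JPEG files in a set of folders. The simplest approach is to load each image and decide from its width and height whether it is landscape or portrait, perhaps like this: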

public bool IsLandscape(string path)
{
  using (Bitmap b = new Bitmap(path))
  {
    return b.Width > b.Height;
  }
}

This is fine for a handful of images, but it is very slow because the framework has to load each image into GDI+ and then marshal it over to .NET.

Improvement One - Multi-Threading

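Performance can be improved by creating a queue of image paths and loading them on multiple threads, perhaps using the ThreadPool, sized to the number of physical cores in the machine.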

foreach (string path in paths)
{
  // add each path as a task in the thread pool
  ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadCallback), path);
}

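// wait until all images have been loaded; looping on the counter means
// a pulse sent before this thread starts waiting is not lost
lock (this.Landscapes)
{
  while (imagesRemaining > 0)
  {
    Monitor.Wait(this.Landscapes);
  }
}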

private void ThreadCallback(object stateInfo)
{
  string path = (string)stateInfo;
  bool isLandscape = this.IsLandscape(path);
           
  lock (this.Landscapes)
  {
    if (isLandscape)
    {
      this.Landscapes.Add(path);
    }

    imagesRemaining--;

    if (imagesRemaining == 0)
    {
      // all images loaded, signal the main thread
      Monitor.Pulse(this.Landscapes);
    }
  }
}

OK, so performance has improved, but new problems appear:

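The main bottleneck is IO-bound: loading the image from disk and converting it into a usable bitmap takes far longer than reading its size once it is in memory.
Bitmaps take a lot of memory. Depending on the total number of pixels, a single image can need more than 10 MB.
Most computers have only a few cores, so the benefit of threading is limited.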

Improvement Two - Reading the Header

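I noticed that many applications can read width and height information very quickly, far too quickly to be reading the whole file. It turns out that image files contain headers that hold the width and height. Bingo.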

After a bit of searching, I found this tucked-away forum post, which gives a good example of how to read JPEG headers, as well as GIF, PNG, and BMP headers.
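To illustrate the idea on the simplest of those formats, here is a minimal sketch of reading PNG dimensions. It is not the code from the forum post; the class and method names (PngHeaderSketch, TryReadPngSize) are made up for this example. It relies only on the PNG layout: an 8-byte signature, then an IHDR chunk whose first two fields are the width and height as big-endian 32-bit integers.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the article's download.
public static class PngHeaderSketch
{
    // The fixed 8-byte signature that starts every PNG file.
    private static readonly byte[] PngSignature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Returns true and sets width/height when the stream starts with
    // a PNG signature followed by an IHDR chunk; otherwise returns false.
    public static bool TryReadPngSize(Stream stream, out int width, out int height)
    {
        width = 0;
        height = 0;

        // Signature (8) + chunk length (4) + chunk type (4) + width (4) + height (4).
        byte[] header = new byte[24];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) return false; // file is too short to be a PNG
            read += n;
        }

        for (int i = 0; i < PngSignature.Length; i++)
        {
            if (header[i] != PngSignature[i]) return false;
        }

        // Bytes 12-15 hold the chunk type, which must be "IHDR".
        if (header[12] != 'I' || header[13] != 'H' ||
            header[14] != 'D' || header[15] != 'R')
        {
            return false;
        }

        width = ReadBigEndianInt32(header, 16);
        height = ReadBigEndianInt32(header, 20);
        return true;
    }

    // Combines four bytes, most significant first, into an int.
    private static int ReadBigEndianInt32(byte[] buffer, int offset)
    {
        return (buffer[offset] << 24) | (buffer[offset + 1] << 16)
             | (buffer[offset + 2] << 8) | buffer[offset + 3];
    }
}
```

Only 24 bytes are read no matter how large the image is, which is why this is so much cheaper than constructing a Bitmap.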

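The post is great, but I found that, for some reason, it could not read the headers of every JPEG file. First, I modified DecodeJfif so that the chunk length can be an unsigned 16-bit integer (ushort in C#):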

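private static Size DecodeJfif(BinaryReader binaryReader)
{
  // every JPEG segment starts with 0xff followed by a marker byte
  while (binaryReader.ReadByte() == 0xff)
  {
    byte marker = binaryReader.ReadByte();
    // 16-bit segment length, which includes the two length bytes themselves
    short chunkLength = ReadLittleEndianInt16(binaryReader);
    if (marker == 0xc0)
    {
      // 0xc0 is the baseline start-of-frame marker: skip the sample
      // precision byte, then read the height followed by the width
      binaryReader.ReadByte();
      int height = ReadLittleEndianInt16(binaryReader);
      int width = ReadLittleEndianInt16(binaryReader);
      return new Size(width, height);
    }

    // the length is unsigned, so a value above 32767 arrives as a negative
    // short; reinterpret it as a ushort before skipping the rest of the segment
    ushort uchunkLength = (ushort)chunkLength;
    binaryReader.ReadBytes(uchunkLength - 2);
  }

  throw new ArgumentException(errorMessage);
}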
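One possible reason some JPEG files still fall through is that DecodeJfif only recognises marker 0xc0, the baseline start-of-frame. Progressive JPEGs use 0xc2, and the JPEG standard reserves 0xc0 to 0xcf for frame headers, with the exceptions of 0xc4, 0xc8, and 0xcc. The sketch below is a variation under that assumption, not the author's code: JpegHeaderSketch and its methods are hypothetical names, and it reads the 16-bit values explicitly as big-endian, which is the byte order JPEG uses.

```csharp
using System;
using System.IO;

// Hypothetical variation on DecodeJfif, not part of the article's download.
public static class JpegHeaderSketch
{
    // Walks the JPEG segments until a start-of-frame header is found.
    // Returns false if the stream is not a JPEG or no frame header is found.
    public static bool TryReadJpegSize(Stream stream, out int width, out int height)
    {
        width = 0;
        height = 0;

        // Every JPEG starts with the start-of-image marker FF D8.
        if (stream.ReadByte() != 0xFF || stream.ReadByte() != 0xD8) return false;

        while (true)
        {
            if (stream.ReadByte() != 0xFF) return false; // not at a marker (or end of file)

            int marker = stream.ReadByte();
            while (marker == 0xFF) marker = stream.ReadByte(); // skip fill bytes
            if (marker < 0) return false;

            // Start-of-scan or end-of-image: no frame header was seen.
            if (marker == 0xDA || marker == 0xD9) return false;

            // Segment length includes its own two bytes.
            int length = ReadBigEndianUInt16(stream);
            if (length < 2) return false;

            if (IsStartOfFrame(marker))
            {
                if (stream.ReadByte() < 0) return false; // sample precision
                height = ReadBigEndianUInt16(stream);
                width = ReadBigEndianUInt16(stream);
                return width >= 0 && height >= 0;
            }

            if (!Skip(stream, length - 2)) return false;
        }
    }

    // C0-CF are frame headers, except C4 (Huffman tables),
    // C8 (reserved) and CC (arithmetic coding conditioning).
    private static bool IsStartOfFrame(int marker)
    {
        return marker >= 0xC0 && marker <= 0xCF
            && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
    }

    // Reads two bytes, most significant first; returns -1 at end of stream.
    private static int ReadBigEndianUInt16(Stream stream)
    {
        int high = stream.ReadByte();
        int low = stream.ReadByte();
        if (high < 0 || low < 0) return -1;
        return (high << 8) | low;
    }

    // Reads and discards count bytes; returns false if the stream ends early.
    private static bool Skip(Stream stream, int count)
    {
        byte[] buffer = new byte[4096];
        while (count > 0)
        {
            int n = stream.Read(buffer, 0, Math.Min(count, buffer.Length));
            if (n == 0) return false;
            count -= n;
        }
        return true;
    }
}
```

Returning false instead of throwing would let a caller fall back to the Bitmap route without using exceptions for control flow, though that is a design preference rather than something the original code requires.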

Second, I wrapped the dimension lookup in a try/catch block, so that if the header cannot be found it falls back to the slow way.

public static Size GetDimensions(string path)
{
  try
  {
    using (BinaryReader binaryReader = new BinaryReader(File.OpenRead(path)))
    {
      try
      {
        return GetDimensions(binaryReader);
      }
      catch (ArgumentException e)
      {
        string newMessage = string.Format("{0} file: '{1}' ", errorMessage, path);

        throw new ArgumentException(newMessage, "path", e);
      }
    }
  }
  catch (ArgumentException)
  {
    //do it the old fashioned way

    using (Bitmap b = new Bitmap(path))
    {
      return b.Size;
    }              
  }
}

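Reading only the header gave such a large performance improvement that I removed the multi-threading and simply process each image in turn on a single thread.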

Bringing It Together

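To improve performance further, I created an XML cache file holding the width, height, and last-modified date of each image, so that only images that have changed need their headers re-read. I didn't want to write the XML file every time an image was cached, because that would become a new bottleneck, so I added a timer that saves the data to XML 5 seconds after the save method is called. I used LINQ to XML to save the list of ImageFileAttributes to disk: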
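The delayed save can be sketched with a one-shot System.Threading.Timer. This is an illustration only: DelayedSaver and RequestSave are hypothetical names, and the sketch assumes that each new request restarts the countdown, which the article does not specify.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the article's download.
public sealed class DelayedSaver : IDisposable
{
    private readonly Action save;
    private readonly int delayMilliseconds;
    private readonly Timer timer;

    public DelayedSaver(Action save, int delayMilliseconds)
    {
        this.save = save;
        this.delayMilliseconds = delayMilliseconds;
        // Created disabled; it only starts counting when RequestSave is called.
        this.timer = new Timer(OnTimer, null, Timeout.Infinite, Timeout.Infinite);
    }

    // Schedules a save after the delay. Calling this again before the timer
    // fires restarts the countdown, so a burst of requests causes one save.
    public void RequestSave()
    {
        timer.Change(delayMilliseconds, Timeout.Infinite);
    }

    // Runs on a thread pool thread once the delay has elapsed.
    private void OnTimer(object state)
    {
        save();
    }

    public void Dispose()
    {
        timer.Dispose();
    }
}
```

With a 5000 ms delay and a save action that calls ImageListToXml.SaveAsXml, caching many images in quick succession would write the file once rather than once per image. Because the callback runs on a thread pool thread, the list being saved would need to be protected against concurrent modification.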

class ImageListToXml
{
  private const string XmlRoot = "Cache";
  private const string XmlImagePath = "ImagePath";
  private const string XmlWidth = "Width";
  private const string XmlHeight = "Height";
  private const string XmlImageCached = "ImageCached";
  private const string XmlLastModified = "LastModified";

  public static void LoadFromXml(string filePath, ImageList list)
  {
    list.Clear();
    XDocument xdoc = XDocument.Load(filePath);
    list.AddRange(
      from d in xdoc.Root.Elements()
      select new ImageFileAttributes(
        (string)d.Attribute(XmlImagePath),
        new Size(
          (int)d.Attribute(XmlWidth),
          (int)d.Attribute(XmlHeight)),
        (DateTime)d.Attribute(XmlLastModified)));
  }

  public static void SaveAsXml(string filePath, ImageList list)
  {
    XElement xml = new XElement(XmlRoot,
      from d in list
      select new XElement(XmlImageCached,
        new XAttribute(XmlImagePath, d.Path),
        new XAttribute(XmlWidth, d.Size.Width),
        new XAttribute(XmlHeight, d.Size.Height),
        new XAttribute(XmlLastModified, d.LastModified ?? DateTime.MinValue)));

    xml.Save(filePath);
  }
}
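The lookup side of the cache, where the header is only re-read when the file has changed, might look something like the following. This is a sketch under assumptions rather than the code in the download: DimensionCacheSketch is a hypothetical class, the header reader is passed in as a delegate, and dimensions are returned as a two-element array to keep the example free of System.Drawing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the article's download.
public sealed class DimensionCacheSketch
{
    private struct Entry
    {
        public int Width;
        public int Height;
        public DateTime LastModifiedUtc;
    }

    private readonly Dictionary<string, Entry> entries =
        new Dictionary<string, Entry>(StringComparer.OrdinalIgnoreCase);

    // Assumed to return { width, height } for the file at the given path.
    private readonly Func<string, int[]> readDimensions;

    // Counts how often the (comparatively slow) reader was actually called.
    public int HeaderReads { get; private set; }

    public DimensionCacheSketch(Func<string, int[]> readDimensions)
    {
        this.readDimensions = readDimensions;
    }

    public int[] GetDimensions(string path)
    {
        DateTime modified = File.GetLastWriteTimeUtc(path);

        Entry entry;
        if (entries.TryGetValue(path, out entry) && entry.LastModifiedUtc == modified)
        {
            // File unchanged since it was cached: no file content is read.
            return new[] { entry.Width, entry.Height };
        }

        // New or modified file: read the header and refresh the cache entry.
        int[] size = readDimensions(path);
        HeaderReads++;
        entries[path] = new Entry
        {
            Width = size[0],
            Height = size[1],
            LastModifiedUtc = modified
        };
        return size;
    }
}
```

For a cached, unchanged file the only disk access is the timestamp query, which is consistent with the drop from seconds to milliseconds reported below.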

Results

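Reading the headers gives a large performance improvement, and caching the resulting dimensions reduces the processing time from seconds to milliseconds.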

Performance improvements for 563 images, from 198257ms to 69ms.

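This console output shows the scale of the gain: the total time drops from more than 3 minutes to under a tenth of a second, roughly a 2,900-fold speed-up.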

History

Version 1.0 - Initial release