OCR Line Detection

mehran ghainian hasaruye

10 Jul 2010

CPOL

A simple algorithm for extracting text lines from an image.

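Introduction

One of the first steps in developing an OCR system is line detection. Persian/Arabic text has some characteristics that make it hard to recognize. For example, Persian has characters that, like the English "i", consist of two parts but are recognized as a single character. The code below handles this case.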

Background

Readers should have basic GDI+ skills and be familiar with basic image-processing concepts.

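Using the code

First, note that this algorithm cannot detect text lines that overlap vertically, that is, lines that are not separated by at least one all-white pixel row.

The algorithm is very simple:

1. Binarize the image.
2. Project the text horizontally, so that each text line becomes one continuous vertical run in the projection.
3. Scan the image from top to bottom and find the top and bottom of each run from the previous step.
4. Because characters like "?" are detected as two lines, merge a line with the next one when the gap between them is smaller than a certain ratio of the line height.
5. Save the line images to the output directory.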

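First, we binarize the image. I used a simple fixed-threshold algorithm, but an adaptive method such as the well-known Otsu threshold would produce better images.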

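public Bitmap Threshold(Bitmap bitmap, int thresholdValue)
{
     // Fixed gray level (0-255) that separates black from white.
     byte thrByte = (byte)(thresholdValue);
     // Apply the Threshold filter to get a black-and-white image.
     bitmap = ApplyFilter(new Threshold(thrByte), bitmap);
     // Convert the result to an indexed pixel format.
     bitmap = GetIndexedPixelFormat(bitmap);
     return bitmap;
}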
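
Since Otsu's method is mentioned as a better alternative, here is a minimal sketch of how such a threshold could be computed. It is not part of the original source: the class name OtsuSketch and the method ComputeThreshold are hypothetical, and the input is assumed to be a flat array of 8-bit gray values rather than a Bitmap.

```csharp
using System;

public static class OtsuSketch
{
    // Returns the gray level t (0..255) that maximizes the between-class
    // variance when pixels with value <= t form one class and all brighter
    // pixels form the other.
    public static int ComputeThreshold(byte[] grayPixels)
    {
        if (grayPixels == null || grayPixels.Length == 0)
            throw new ArgumentException("grayPixels must not be empty.");

        // Histogram of the 256 possible gray levels.
        long[] histogram = new long[256];
        foreach (byte p in grayPixels)
            histogram[p]++;

        long total = grayPixels.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++)
            sumAll += i * (double)histogram[i];

        double sumDark = 0;      // sum of gray values in the dark class
        long weightDark = 0;     // number of pixels in the dark class
        double bestVariance = -1;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++)
        {
            weightDark += histogram[t];
            if (weightDark == 0) continue;      // no dark pixels yet
            long weightBright = total - weightDark;
            if (weightBright == 0) break;       // every pixel is already dark

            sumDark += t * (double)histogram[t];
            double meanDark = sumDark / weightDark;
            double meanBright = (sumAll - sumDark) / weightBright;
            double diff = meanDark - meanBright;
            double variance = (double)weightDark * weightBright * diff * diff;
            if (variance > bestVariance)
            {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }
}
```

The returned value could then be passed as thresholdValue to the Threshold method above, assuming the gray values are read from the same image.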

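In the second step, we project all black pixels horizontally to obtain the horizontal projection of the image. This yields a set of disjoint runs of black points, and we take the top and bottom of each run as the top and bottom of a text line.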

public List<Belt> ExtractBeltsBasedonCoveredHeight(Bitmap mehrImage)
{
    int y = 0;
    List<int> line_top = new List<int>(1000);
    List<int> line_bottom = new List<int>(1000);
    List<Belt> lines = new List<Belt>();

    // Collect the top and bottom of every run of rows that contain black pixels.
    while (true)
    {
        int x = 0;
        y = FindNextLine(mehrImage, y, ref x);
        if (y == -1)
            break; // no more black pixels below the last line
        line_top.Add(y);
        // FindBottomOfLine returns the last row with black pixels; store the
        // first white row after it and continue scanning from there.
        y = FindBottomOfLine(mehrImage, y) + 1;
        line_bottom.Add(y);
    }

    // Crop each line, padded by one row above, into its own Belt.
    for (int line_number = 0; line_number < line_top.Count; line_number++)
    {
        int height = line_bottom[line_number] - line_top[line_number] + 1;
        // Clamp the crop rectangle so it never leaves the image, for example
        // when a line starts at row 0 or ends at the last row.
        int cropTop = Math.Max(0, line_top[line_number] - 1);
        int cropHeight = Math.Min(height + 2, mehrImage.Height - cropTop);
        Bitmap bmp = GetSpecificAreaOfImage(
            new Rectangle(0, cropTop, mehrImage.Width, cropHeight), mehrImage);
        Belt belt = new Belt(bmp);
        belt.RelativeTop = line_top[line_number];
        belt.RelativeBottom = line_bottom[line_number];
        lines.Add(belt);
    }
    // Merge fragments (such as dots) that were detected as separate lines.
    lines = RemoveNoisyData(lines);
    return lines;
}

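To find the top and bottom of each line I wrote two functions: FindNextLine, which finds the first black pixel of the next run in the horizontal projection, and FindBottomOfLine, which scans down from the top of a line to the first row that contains no black pixel and returns the row just above it.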

public int FindBottomOfLine(Bitmap bitmap, int topOfLine)
{
     int y = topOfLine;
     bool rowHasBlack = true;
     while (rowHasBlack)
     {
         y++;
         rowHasBlack = false;
         // Past the last row nothing is scanned, so the loop ends there.
         for (int x = 0; y < bitmap.Height && x < bitmap.Width; x++)
         {
              if (Convert.ToString(bitmap.GetPixel(x, y)) == Shape.BlackPixel)
              {
                   rowHasBlack = true;
                   break; // one black pixel is enough for this row
              }
         }
     }
     // y is the first row without black pixels; the line ends one row above.
     return y - 1;
}

public int FindNextLine(Bitmap bitmap, int y, ref int x)
{
      if (y >= bitmap.Height)
          return -1;
      // Walk the pixels row by row until a non-white pixel is found.
      while (Convert.ToString(bitmap.GetPixel(x, y)) == Shape.WhitePixel)
      {
          x++;
          if (x == bitmap.Width)
          {
              // End of the row: continue at the start of the next one.
              x = 0;
              y++;
          }
          if (y >= bitmap.Height)
          {
              break;
          }
      }
      // -1 signals that no further line exists.
      return y < bitmap.Height ? y : -1;
}
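
The same top and bottom search can also be written as a single pass over the rows. The sketch below is not from the original source: RowProfileSketch and FindLineRuns are hypothetical names, and the binarized image is assumed to be available as a bool[,] array in which true marks a black pixel. Unlike the listing above, it returns the bottom of each line as the last row that still contains ink.

```csharp
using System;
using System.Collections.Generic;

public static class RowProfileSketch
{
    // black[y, x] is true where the binarized image has a black (ink) pixel.
    // Returns the (top, bottom) rows, both inclusive, of every maximal run
    // of consecutive rows that contain at least one ink pixel.
    public static List<Tuple<int, int>> FindLineRuns(bool[,] black)
    {
        int height = black.GetLength(0);
        int width = black.GetLength(1);
        var runs = new List<Tuple<int, int>>();
        int top = -1; // -1 means "currently between lines"

        for (int y = 0; y < height; y++)
        {
            // Horizontal projection of one row, reduced to "has ink or not".
            bool rowHasInk = false;
            for (int x = 0; x < width; x++)
            {
                if (black[y, x])
                {
                    rowHasInk = true;
                    break;
                }
            }

            if (rowHasInk && top == -1)
            {
                top = y; // a new line starts on this row
            }
            else if (!rowHasInk && top != -1)
            {
                runs.Add(Tuple.Create(top, y - 1)); // line ended on the previous row
                top = -1;
            }
        }

        if (top != -1)
            runs.Add(Tuple.Create(top, height - 1)); // line touches the bottom edge
        return runs;
    }
}
```

Each row is visited once here, whereas the GetPixel-based functions above convert every pixel to a string before comparing it, which is likely to be slower on large images.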

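Because characters like "?" are detected as two lines, we merge a line with the next one when the gap between them is smaller than a certain ratio of the line height.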

private static List<Belt> RemoveNoisyData(List<Belt> belts)
{
   // Merged images are written to a "temp" directory. CreateDirectory does
   // nothing if the directory already exists.
   Directory.CreateDirectory("temp");

   for (int i = 1; i < belts.Count; i++)
   {
         // Distance from the previous belt's baseline to the top of this belt
         // (assuming BaseHorizontalLine is measured from the top of its belt),
         // compared with a fraction of this belt's height.
         if (belts[i].RelativeTop - belts[i - 1].BaseHorizontalLine - 
             belts[i - 1].RelativeTop < 
             Belt.UpAndDownWhiteSpaceRatio * belts[i].Height)
         {
               // Stack the previous belt on top of the current one.
               Image<Gray, Byte> img1 = new Image<Gray, byte>(belts[i].Image);
               Image<Gray, Byte> img2 = new Image<Gray, byte>(belts[i - 1].Image);
               Image<Gray, Byte> img3 = img2.ConcateVertical(img1);
               string path = @".\temp\" + System.Guid.NewGuid().ToString();
               img3.Save(path);
               belts[i - 1].Image = (Bitmap)Bitmap.FromFile(path);
               belts[i - 1].RelativeBottom = belts[i].RelativeBottom;
               belts[i - 1].BaseHorizontalLine = -1;
               belts.RemoveAt(i);
               i--; // re-check the belt that has shifted into position i
         }
   }
   return belts;
}
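
To make the merging rule easier to follow without the image handling, here is a simplified sketch that works on row ranges only. It is an illustration under stated assumptions, not the author's method: MergeSketch and MergeCloseLines are hypothetical names, gapRatio stands in for Belt.UpAndDownWhiteSpaceRatio, and the gap is measured from the bottom of the previous line rather than from its baseline.

```csharp
using System;
using System.Collections.Generic;

public static class MergeSketch
{
    // lines: (top, bottom) rows, both inclusive, sorted from top to bottom.
    // gapRatio: hypothetical stand-in for Belt.UpAndDownWhiteSpaceRatio.
    public static List<Tuple<int, int>> MergeCloseLines(
        List<Tuple<int, int>> lines, double gapRatio)
    {
        var merged = new List<Tuple<int, int>>();
        foreach (var line in lines)
        {
            if (merged.Count > 0)
            {
                var previous = merged[merged.Count - 1];
                int gap = line.Item1 - previous.Item2 - 1;   // white rows between the two
                int height = line.Item2 - line.Item1 + 1;    // height of the lower line
                if (gap < gapRatio * height)
                {
                    // Close enough: extend the previous line down to this one.
                    merged[merged.Count - 1] = Tuple.Create(previous.Item1, line.Item2);
                    continue;
                }
            }
            merged.Add(line);
        }
        return merged;
    }
}
```

Because every line is compared with the last entry of the result list, a fragment that follows an already merged pair is still checked against it.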

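Finally, we save the line images to the output directory.

Experimental results

I tested the algorithm with different fonts and sizes, including Mitra, Times New Roman, Arial, and Zar. On noise-free samples it reaches 96% accuracy. On noisy samples the results vary with the noise ratio and are not acceptable.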

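History

I have spent two years developing an open-source Persian/Arabic OCR, and I would now like to share some of my experience here. If you are interested in developing Persian/Arabic OCR, you can join the following group: farsi_arabic_OCR@groups.yahoo.com.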