Android OCR Application Based on Tesseract

SergVoloshyn

5.00/5 (7 votes)

28 Jan 2019

CPOL

2 min read

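An easy way to build an Android OCR application

Download source code

Introduction

This app uses the Tesseract OCR engine (Tesseract 3), which works by recognizing character patterns (https://github.com/tesseract-ocr/tesseract). Tesseract supports Unicode (UTF-8) and can recognize more than 100 languages "out of the box".

Background

I tried the Google Text Recognition API (https://developers.google.com/vision/android/text-overview), but it did not suit my needs, so I looked for an alternative and found this amazing engine.

Using the Code

Let's get started! Create a new project in Android Studio (I used version 3.2.1), or download the source code and choose File > New > Import Project.

Add the following dependencies to the app-level build.gradle: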

implementation 'com.jakewharton:butterknife:8.8.1'
annotationProcessor 'com.jakewharton:butterknife-compiler:8.8.1'

implementation 'com.rmtheis:tess-two:9.0.0'

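I used the Butterknife library, which is very handy. The main library is 'tess-two:9.0.0': it contains a fork of Tesseract Tools for Android (tesseract-android-tools) that adds some extra functionality. We also need the camera and write permissions, so add them to AndroidManifest.xml: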

<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-feature android:name="android.hardware.camera" />
<uses-permission android:name="android.permission.CAMERA" />

Create a simple layout file containing a Button, a TextView, and an ImageView:

<?xml version="1.0" encoding="utf-8"?>
<ScrollView xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:fillViewport="true"
    tools:context=".MainActivity">

    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:orientation="vertical">

        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:orientation="vertical">

            <Button
                android:id="@+id/scan_button"
                android:layout_width="wrap_content"
                android:layout_height="wrap_content"
                android:layout_gravity="center"
                android:text="scan" />
        </LinearLayout>

        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:layout_margin="4dp"
            android:orientation="horizontal">

            <TextView
                android:id="@+id/ocr_text"
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:layout_gravity="fill"
                android:text=" text">

            </TextView>

        </LinearLayout>

        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:orientation="vertical">

            <ImageView
                android:id="@+id/ocr_image"
                android:layout_width="match_parent"
                android:layout_height="wrap_content" />

        </LinearLayout>

    </LinearLayout>
</ScrollView>

The resulting layout looks like this:

Write some code to check the permissions:

void checkPermissions() {
    if (!hasPermissions(context, PERMISSIONS)) {
        // Ask the user; until the request is answered the permissions count as missing
        requestPermissions(PERMISSIONS,
                PERMISSION_ALL);
        flagPermissions = false;
        return;
    }
    flagPermissions = true;
}

public static boolean hasPermissions(Context context, String... permissions) {
    if (context != null && permissions != null) {
        for (String permission : permissions) {
            if (ActivityCompat.checkSelfPermission(context, permission) 
                                       != PackageManager.PERMISSION_GRANTED) {
                return false;
            }
        }
    }
    return true;
}
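
The article does not show how flagPermissions is updated once the user answers the permission dialog. As a minimal sketch, assuming the activity overrides onRequestPermissionsResult and forwards its grantResults array, a plain-Java helper could decide whether every requested permission was granted. The class and method names are hypothetical; the constant mirrors PackageManager.PERMISSION_GRANTED, which is 0 on Android.

```java
// Hypothetical helper, not part of the original project.
final class PermissionResults {

    // Same value as android.content.pm.PackageManager.PERMISSION_GRANTED
    static final int PERMISSION_GRANTED = 0;

    private PermissionResults() {
    }

    /** Returns true only if the array is non-empty and every entry is "granted". */
    static boolean allGranted(int[] grantResults) {
        if (grantResults == null || grantResults.length == 0) {
            // Android delivers an empty array when the request was cancelled
            return false;
        }
        for (int result : grantResults) {
            if (result != PERMISSION_GRANTED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allGranted(new int[] {0, 0}));
        System.out.println(allGranted(new int[] {0, -1}));
    }
}
```

In the activity, onRequestPermissionsResult could then set flagPermissions = PermissionResults.allGranted(grantResults) when requestCode equals PERMISSION_ALL.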

Next, the code that creates the image file:

public File createImageFile() throws IOException {
    // Create an image file name
    String timeStamp = new SimpleDateFormat("MMdd_HHmmss").format(new Date());
    String imageFileName = "JPEG_" + timeStamp + "_";
    File storageDir = context.getExternalFilesDir(Environment.DIRECTORY_PICTURES);
    File image = File.createTempFile(
            imageFileName,  /* prefix */
            ".jpg",         /* suffix */
            storageDir      /* directory */
    );
    // Save a file: path for use with ACTION_VIEW intents
    mCurrentPhotoPath = image.getAbsolutePath();
    return image;
}
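
To see what kind of file name createImageFile() produces, the same File.createTempFile call can be tried in plain Java. This sketch substitutes a caller-supplied directory for getExternalFilesDir (an assumption made so that it runs off-device) and passes an explicit Locale to SimpleDateFormat; the class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical off-device demo of the naming scheme used by createImageFile().
final class ImageFileNameDemo {

    private ImageFileNameDemo() {
    }

    static File createImageFile(File storageDir) throws IOException {
        // A fixed locale keeps the digits ASCII regardless of the device language
        String timeStamp = new SimpleDateFormat("MMdd_HHmmss", Locale.US).format(new Date());
        String prefix = "JPEG_" + timeStamp + "_";
        // createTempFile inserts a generated string between the prefix and the suffix
        return File.createTempFile(prefix, ".jpg", storageDir);
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File image = createImageFile(dir);
        System.out.println(image.getName());
        // Clean up the empty demo file
        image.delete();
    }
}
```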

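First, we need to write the onClickScanButton function, which checks the permissions and launches the camera intent to take a picture: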

@OnClick(R.id.scan_button)
void onClickScanButton() {
    // check permissions
    if (!flagPermissions) {
        checkPermissions();
        return;
    }
    //prepare intent
    Intent takePictureIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);

    if (takePictureIntent.resolveActivity(context.getPackageManager()) != null) {
        File photoFile = null;
        try {
            photoFile = createImageFile();
        } catch (IOException ex) {
            Toast.makeText(context, errorFileCreate, Toast.LENGTH_SHORT).show();
            Log.i("File error", ex.toString());
        }
        // Continue only if the File was successfully created
        if (photoFile != null) {
            oldPhotoURI = photoURI1;
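            // Note: with targetSdkVersion 24 or higher, passing a file:// Uri to another app
            // throws FileUriExposedException; FileProvider.getUriForFile() is the usual alternative.
            photoURI1 = Uri.fromFile(photoFile);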
            takePictureIntent.putExtra(MediaStore.EXTRA_OUTPUT, photoURI1);
            startActivityForResult(takePictureIntent, REQUEST_IMAGE1_CAPTURE);
        }
    }
}

The result is handled in onActivityResult:

@Override
protected void onActivityResult(int requestCode, int resultCode, @Nullable Intent data) {
    super.onActivityResult(requestCode, resultCode, data);

    switch (requestCode) {
        case REQUEST_IMAGE1_CAPTURE: {
            if (resultCode == RESULT_OK) {
                Bitmap bmp = null;
                // Decode the captured photo; try-with-resources closes the stream
                try (InputStream is = context.getContentResolver().openInputStream(photoURI1)) {
                    bmp = BitmapFactory.decodeStream(is);
                } catch (Exception ex) {
                    Log.e(getClass().getSimpleName(), "Cannot decode the photo", ex);
                    Toast.makeText(context, errorConvert, Toast.LENGTH_SHORT).show();
                }

                if (bmp == null) {
                    // Decoding failed: there is nothing to display or recognize
                    return;
                }
                firstImage.setImageBitmap(bmp);
                doOCR(bmp);

                // Write the bitmap back to the photo file as a JPEG
                try (OutputStream os = new FileOutputStream(photoURI1.getPath())) {
                    bmp.compress(Bitmap.CompressFormat.JPEG, 100, os);
                    os.flush();
                } catch (Exception ex) {
                    Log.e(getClass().getSimpleName(), "Cannot save the photo", ex);
                    Toast.makeText(context, errorFileCreate, Toast.LENGTH_SHORT).show();
                }

            } else {
                // Capture was cancelled: restore the previous photo
                photoURI1 = oldPhotoURI;
                firstImage.setImageURI(photoURI1);
            }
        }
    }
}
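
The capture code decodes the photo at full resolution. If memory becomes a problem with large camera images, one option is to set BitmapFactory.Options.inSampleSize before decoding. The plain-Java sketch below computes a power-of-two sample size in the way the Android bitmap-loading guide describes; the class name and the target dimensions are illustrative assumptions, and whether downsampling hurts recognition accuracy for your images would need testing.

```java
// Hypothetical helper, not part of the original project.
final class SampleSize {

    private SampleSize() {
    }

    /**
     * Returns the largest power of two that keeps both decoded dimensions
     * at or above the requested width and height (1 means no downsampling).
     */
    static int calculateInSampleSize(int width, int height, int reqWidth, int reqHeight) {
        int inSampleSize = 1;
        if (height > reqHeight || width > reqWidth) {
            int halfHeight = height / 2;
            int halfWidth = width / 2;
            // Keep doubling while the next halving would still satisfy the request
            while ((halfHeight / inSampleSize) >= reqHeight
                    && (halfWidth / inSampleSize) >= reqWidth) {
                inSampleSize *= 2;
            }
        }
        return inSampleSize;
    }

    public static void main(String[] args) {
        // A 4000x3000 photo reduced towards 1000x750
        System.out.println(calculateInSampleSize(4000, 3000, 1000, 750));
    }
}
```

On Android, the returned value would be assigned to options.inSampleSize before calling BitmapFactory.decodeStream(is, null, options); the image size can be read beforehand with a first pass that sets options.inJustDecodeBounds = true.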

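Next, integrate Tesseract into the project by creating an additional class: TesseractOCR.

I put the trained data file "eng.traineddata" (for English) in the assets folder, so we need to copy it from the APK to the app's internal files directory and then initialize Tesseract: mTess.init(dstInitPathDir, language).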

import android.content.Context;
import android.graphics.Bitmap;
import android.util.Log;
import android.widget.Toast;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TesseractOCR {

    private static final String TAG = TesseractOCR.class.getSimpleName();

    private final TessBaseAPI mTess;

    public TesseractOCR(Context context, String language) {
        mTess = new TessBaseAPI();

        String srcFile = "eng.traineddata";

        // Tesseract expects the data at <dstInitPathDir>/tessdata/<language>.traineddata
        String dstInitPathDir = context.getFilesDir() + "/tesseract";
        File dstPathDir = new File(dstInitPathDir, "tessdata");
        File dstFile = new File(dstPathDir, srcFile);

        // Copy the trained data out of the APK assets only if it is not there yet
        if (!dstFile.exists()) {
            if (!dstPathDir.exists() && !dstPathDir.mkdirs()) {
                Toast.makeText(context, dstPathDir + " can't be created.", Toast.LENGTH_SHORT).show();
                return;
            }
            try (InputStream inFile = context.getAssets().open(srcFile);
                 OutputStream outFile = new FileOutputStream(dstFile)) {
                byte[] buf = new byte[1024];
                int len;
                while ((len = inFile.read(buf)) != -1) {
                    outFile.write(buf, 0, len);
                }
            } catch (IOException ex) {
                Log.e(TAG, "Cannot copy " + srcFile, ex);
                // Remove a partial copy so that the next start retries
                dstFile.delete();
                Toast.makeText(context, srcFile + " can't be read.", Toast.LENGTH_SHORT).show();
                return;
            }
        }

        try {
            mTess.init(dstInitPathDir, language);
        } catch (Exception ex) {
            Log.e(TAG, "Tesseract initialization failed", ex);
        }
    }

    public String getOCRResult(Bitmap bitmap) {
        mTess.setImage(bitmap);
        return mTess.getUTF8Text();
    }

    public void onDestroy() {
        if (mTess != null) mTess.end();
    }
}
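
Tesseract looks for its data at <datapath>/tessdata/<language>.traineddata, which is why the class copies the asset into a tessdata subdirectory and passes the parent directory to init. The plain-Java sketch below isolates that copy-if-missing step so it can be tried without a device. It accepts any InputStream in place of AssetManager.open, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;

// Hypothetical helper, not part of the original project.
final class TrainedDataInstaller {

    private TrainedDataInstaller() {
    }

    /**
     * Copies the stream to dataDir/tessdata/<language>.traineddata unless that
     * file already exists, and returns the target file.
     */
    static File install(InputStream source, File dataDir, String language) throws IOException {
        File tessDataDir = new File(dataDir, "tessdata");
        File target = new File(tessDataDir, language + ".traineddata");
        if (target.exists()) {
            // Already installed by an earlier run
            return target;
        }
        if (!tessDataDir.isDirectory() && !tessDataDir.mkdirs()) {
            throw new IOException("Cannot create " + tessDataDir);
        }
        try (OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[8192];
            int len;
            while ((len = source.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        File dataDir = Files.createTempDirectory("tesseract").toFile();
        // Placeholder bytes stand in for the real trained data
        File installed = install(new ByteArrayInputStream(new byte[] {1, 2, 3}), dataDir, "eng");
        System.out.println(installed.getAbsolutePath());
    }
}
```

The directory that would be handed to init in this layout is dataDir, not the tessdata subdirectory.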

The OCR code itself is simple: we pass the image (a Bitmap) to this object and get the result:

public String getOCRResult(Bitmap bitmap) {
    mTess.setImage(bitmap);
    return mTess.getUTF8Text();
}

OCR can take a long time, so we need to run it in a separate Thread:

private void doOCR(final Bitmap bitmap) {
    if (mProgressDialog == null) {
        mProgressDialog = ProgressDialog.show(this, "Processing",
                "Doing OCR...", true);
    } else {
        mProgressDialog.show();
    }
    new Thread(new Runnable() {
        public void run() {
            final String srcText = mTessOCR.getOCRResult(bitmap);
            runOnUiThread(new Runnable() {
                @Override
                public void run() {

                    if (srcText != null && !srcText.equals("")) {
                        ocrText.setText(srcText);
                    }
                    mProgressDialog.dismiss();
                }
            });
        }
    }).start();
}
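
The same hand-off pattern (do the slow work off the main thread, then deliver the result back) can be sketched in plain Java with an ExecutorService. Here a Function stands in for mTessOCR.getOCRResult and a second Executor stands in for runOnUiThread; both substitutions, and all names, are assumptions for illustration. The sketch uses Java 8 lambdas and java.util.function, so it is meant for trying the idea off-device.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch, not part of the original project.
final class BackgroundRecognizer<I> {

    // A single worker thread also keeps recognitions from running concurrently
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Function<I, String> recognizer;
    private final Executor resultExecutor;

    BackgroundRecognizer(Function<I, String> recognizer, Executor resultExecutor) {
        this.recognizer = recognizer;
        this.resultExecutor = resultExecutor;
    }

    /** Runs the recognizer on the worker thread and hands the text to the callback. */
    void recognize(final I image, final Consumer<String> callback) {
        worker.execute(() -> {
            final String text = recognizer.apply(image);
            // Deliver the result on the executor chosen by the caller
            resultExecutor.execute(() -> callback.accept(text));
        });
    }

    /** Stops accepting new work; tasks already submitted still finish. */
    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        BackgroundRecognizer<String> recognizer = new BackgroundRecognizer<String>(
                image -> "text from " + image, Runnable::run);
        recognizer.recognize("photo.jpg", text -> System.out.println(text));
        recognizer.shutdown();
    }
}
```

In an activity, the result executor could simply wrap runOnUiThread, and shutdown() would naturally be called from onDestroy.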

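The source image looks like this:

The OCR result looks like this:

Points of Interest

If you are interested in using the Tesseract OCR engine, I hope this article helps you get started, and you should find it easy to improve on this app. I enjoy developing apps, so you are welcome to try some of mine on Google Play: https://play.google.com/store/apps/developer?id=VOLOSHYN+SERGIY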