Creating an Android AI App with TensorFlow Lite

Joel Ivory Johnson

September 18, 2020

CPOL

In this article, we create an Android application and import our TensorFlow Lite model into it.

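This is the third article in a series on using neural networks with TensorFlow Lite on Android. In Part 2 of the series, we finished building a TensorFlow Lite model from a pretrained model. In this part, we create an Android application and import that model into it. You will need the .tflite file created in the previous part (yolo.tflite).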

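The flow of the application is as follows:

1. Select an image to analyze.
2. Resize the image to match the requirements of the TensorFlow Lite model being used.
3. Create an input buffer from the image.
4. Instantiate a TensorFlow Lite interpreter with optional delegates.
   - The GPU delegate runs some of the computations on graphics hardware.
   - The NNAPI delegate (Android 8.1 and later) can run on a GPU, a DSP, or a Neural Processing Unit (NPU).
5. Instantiate an output buffer.
6. Have the interpreter run the model against the input and place the results in the output.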

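The application's source bitmap can be obtained in several ways: it can be loaded from the file system, captured from the device camera, downloaded from the network, or acquired by some other means. As long as the image can be loaded as a bitmap, the rest of the code presented here is easy to adapt. In this sample program, I take the source image from the picture chooser.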

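Create a new Android application using the "Empty Activity" template. Once the application is created, it needs a few configuration steps. First, we add a reference to TensorFlow so that the application has the libraries it needs to use TensorFlow Lite and the TensorFlow Lite delegates for the GPU and NPU. Open the application's build.gradle and add the following to the dependencies section: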

implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
implementation 'org.tensorflow:tensorflow-lite-gpu:0.0.0-nightly'
implementation 'org.tensorflow:tensorflow-lite-support:0.0.0-nightly'

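Save the changes and build the application. If you get an error saying that a required NDK version is not present, refer to Part 1 of this series. If the application builds successfully, one more change to build.gradle is needed: Android Studio must be told not to compress *.tflite* files. In the android section of the file, add the following lines: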

aaptOptions {
	noCompress "tflite"
}

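The *.tflite* file goes in the project's "assets" folder. A new project does not have this folder. You can create it under *app/src/main* in the project.

Copy the file *yolo.tflite* into the assets folder.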

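We start with an application that lets the user select an image and then displays it. The *activity_main.xml* file needs only two elements: a button that activates the image chooser and an image view.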

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
   xmlns:app="http://schemas.android.com/apk/res-auto"
   xmlns:tools="http://schemas.android.com/tools"
   android:layout_width="match_parent"
   android:layout_height="match_parent"
   tools:context=".MainActivity">


   <ImageView
       android:id="@+id/selectedImageView"
       android:layout_width="match_parent"
       android:layout_height="match_parent"
       android:layout_marginBottom="64dp"
       app:layout_constraintBottom_toBottomOf="parent"
       app:layout_constraintEnd_toEndOf="parent"
       app:layout_constraintStart_toStartOf="parent"
       app:layout_constraintTop_toTopOf="parent" />

   <Button
       android:id="@+id/selectImageButton"
       android:text="@string/button_select_image"
       android:layout_width="wrap_content"
       android:layout_height="wrap_content"
       android:layout_marginTop="8dp"
       app:layout_constraintEnd_toEndOf="parent"
       app:layout_constraintStart_toStartOf="parent"
       app:layout_constraintTop_toBottomOf="@+id/selectedImageView" />
</androidx.constraintlayout.widget.ConstraintLayout>

The executing code goes in the file *MainActivity.java*. To reference the image view from that code, add a field named selectedImageView.

ImageView selectedImageView;

In onCreate(), add a line after setContentView() that assigns the loaded ImageView instance to selectedImageView.

@Override
protected void onCreate(Bundle savedInstanceState) {
   super.onCreate(savedInstanceState);
   setContentView(R.layout.activity_main);
   selectedImageView = findViewById(R.id.selectedImageView);
}

The button defined in the layout will trigger a function that opens the image chooser.

final int SELECT_PICTURE = 1;
public void onSelectImageButtonClicked(View view) {
   Intent intent = new Intent(Intent.ACTION_GET_CONTENT);
   intent.setType("image/*");
   Intent chooser = Intent.createChooser(intent, "Choose a Picture");
   startActivityForResult(chooser, SELECT_PICTURE);
}

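When the user activates this function, the system image chooser opens. After the user selects an image, control returns to the application. To retrieve the selection, the activity must implement the onActivityResult() method. The URI of the selected image is in the data object passed to this method.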

public void onActivityResult (int reqCode, int resultCode, Intent data) {
   super.onActivityResult(reqCode, resultCode, data);
   if (resultCode == RESULT_OK) {
       if (reqCode == SELECT_PICTURE) {
           Uri selectedUri = data.getData();
           String fileString = selectedUri.getPath();
           selectedImageView.setImageURI(selectedUri);
       }
   }
}

The button defined in *activity_main.xml* is not yet attached to any code. Add the following line to the button's definition:

android:onClick="onSelectImageButtonClicked"

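If you run the application now, you will see that it can load and display an image. We want to pass this image through the TensorFlow interpreter, so let's get the code ready for that.

There are various implementations of the YOLO algorithm. The one I am using here expects the image to be divided into 13 columns and 13 rows. Each cell in this grid is 32x32 pixels, so the input image is 416x416 pixels (13 * 32 = 416). These values are expressed as constants added to *MainActivity.java*. Also added are a constant that holds the name of the *.tflite* file to load and a variable that holds the TensorFlow Lite interpreter.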

final String TF_MODEL_NAME = "yolo.tflite"; // must match the file copied into the assets folder
final int IMAGE_SEGMENT_ROWS = 13;
final int IMAGE_SEGMENT_COLS = 13;
final int IMAGE_SEGMENT_WIDTH = 32;
final int IMAGE_SEGMENT_HEIGHT = 32;
final int IMAGE_WIDTH = IMAGE_SEGMENT_COLS * IMAGE_SEGMENT_WIDTH; //416
final int IMAGE_HEIGHT = IMAGE_SEGMENT_ROWS * IMAGE_SEGMENT_HEIGHT; //416

Interpreter tfLiteInterpreter;
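To make the grid arithmetic concrete, here is a minimal plain-Java sketch that repeats the constants above and maps a pixel position to the grid cell that contains it. The class `GridMath` and its two helper methods are hypothetical names used only for illustration; they are not part of the sample application.

```java
// Illustration only: GridMath and its methods are hypothetical helpers.
final class GridMath {
    static final int SEGMENT_COLS = 13;
    static final int SEGMENT_ROWS = 13;
    static final int SEGMENT_WIDTH = 32;
    static final int SEGMENT_HEIGHT = 32;
    static final int IMAGE_WIDTH = SEGMENT_COLS * SEGMENT_WIDTH;   // 416
    static final int IMAGE_HEIGHT = SEGMENT_ROWS * SEGMENT_HEIGHT; // 416

    /** Returns the grid column (0 to 12) containing pixel column x. */
    static int columnForPixel(int x) {
        if (x < 0 || x >= IMAGE_WIDTH) {
            throw new IllegalArgumentException("x out of range: " + x);
        }
        return x / SEGMENT_WIDTH; // integer division selects the cell
    }

    /** Returns the grid row (0 to 12) containing pixel row y. */
    static int rowForPixel(int y) {
        if (y < 0 || y >= IMAGE_HEIGHT) {
            throw new IllegalArgumentException("y out of range: " + y);
        }
        return y / SEGMENT_HEIGHT;
    }

    public static void main(String[] args) {
        System.out.println(IMAGE_WIDTH + "x" + IMAGE_HEIGHT);
        System.out.println(columnForPixel(100) + ", " + rowForPixel(415));
    }
}
```

Pixel column 100 falls in grid column 3, and pixel row 415 (the last row of the image) falls in grid row 12.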

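There are several options for resizing the image. I use the TensorFlow ImageProcessor. An ImageProcessor is built with a list of the operations we want applied to an image. When given a TensorImage, the ImageProcessor performs those operations on the image and returns a new TensorImage that is ready for further processing.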

void processImage(Bitmap sourceImage) {
   ImageProcessor imageProcessor =
           new ImageProcessor.Builder()
                   .add(new ResizeOp(IMAGE_HEIGHT, IMAGE_WIDTH, ResizeOp.ResizeMethod.BILINEAR))
                   .build();
   TensorImage tImage = new TensorImage(DataType.FLOAT32);
   tImage.load(sourceImage);
   tImage = imageProcessor.process(tImage);
   //...
}

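We need to initialize the TensorFlow Lite interpreter with our model so that it can process this image. The constructor of the Interpreter class accepts a byte buffer containing the model and an object containing the options to apply to the Interpreter instance. For the options, we add a GPU delegate and an NNAPI delegate. If the device has compatible hardware for accelerating some of the operations, that hardware is used when the TF interpreter runs.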

void prepareInterpreter() throws IOException {
   if(tfLiteInterpreter == null) {
       GpuDelegate gpuDelegate = new GpuDelegate();
       Interpreter.Options options = new Interpreter.Options();
       options.addDelegate(gpuDelegate);
       //Only add the NNAPI delegate if the device is running Android P or later.
       if(Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
           NnApiDelegate nnApiDelegate = new NnApiDelegate();
           options.addDelegate(nnApiDelegate);
       }
       MappedByteBuffer tfLiteModel = FileUtil.loadMappedFile(this, TF_MODEL_NAME);
       tfLiteInterpreter = new Interpreter(tfLiteModel, options);
   }
}

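You may recall that tensors are usually represented as arrays. In the `processImage` function, a few buffers are created to receive the output of the network model.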

float[][][][][] buf0 = new float[1][52][52][3][85];
float[][][][][] buf1 = new float[1][26][26][3][85];
float[][][][][] buf2 = new float[1][13][13][3][85];

These multidimensional arrays may look a bit intimidating at first. How do you work with the data in them?

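Let's focus on the last of the three arrays. Some network models are designed to process several instances of the data set of interest at once; the first dimension selects the instance, and it has a size of 1 here because we process a single image. You may recall that this algorithm divides the 416x416 pixel image into 13 rows and 13 columns. The second and third dimensions of the array are for the image rows and columns. In each cell of this grid, the algorithm can detect up to 3 bounding boxes for objects identified at that grid position. The fourth dimension, of size 3, is for each of these bounding boxes. The last dimension holds 85 elements. The first four items in the list define the bounding box coordinates (x, y, width, height). The fifth element is a value between 0 and 1 that expresses the confidence that the box matches an object. The next 80 elements are the probabilities that the match is a specific object.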

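The YOLO implementation I am using here recognizes up to 80 types of objects. Other implementations of YOLO detect a different number of classes. Sometimes you can guess how many items a network recognizes from the number of elements in the last dimension. Do not rely on this, however. To know which positions represent which objects, you need to consult the documentation for the network you are using. The meaning and interpretation of the values is discussed in detail in the next part of this series.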
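The layout described above can be sketched in plain Java. The example below pulls the 85-element vector for one bounding box out of a buffer shaped like `buf2` and separates the class scores from the box fields. The class `OutputLayout`, its constants, and the sample confidence value are assumptions made for illustration; the index positions follow the description in the text.

```java
// Illustration only: OutputLayout is a hypothetical helper, and the data is made up.
final class OutputLayout {
    static final int X = 0;
    static final int Y = 1;
    static final int WIDTH = 2;
    static final int HEIGHT = 3;
    static final int CONFIDENCE = 4;  // fifth element
    static final int FIRST_CLASS = 5; // class scores start here

    /** Returns the vector for one bounding box in one grid cell. */
    static float[] boxVector(float[][][][][] buf, int row, int col, int box) {
        return buf[0][row][col][box]; // index 0: the single image in the batch
    }

    /** Copies the class scores (elements 5 to the end) into their own array. */
    static float[] classScores(float[] vector) {
        float[] scores = new float[vector.length - FIRST_CLASS];
        System.arraycopy(vector, FIRST_CLASS, scores, 0, scores.length);
        return scores;
    }

    public static void main(String[] args) {
        float[][][][][] buf2 = new float[1][13][13][3][85];
        buf2[0][6][7][1][CONFIDENCE] = 0.75f; // made-up value for the demo

        float[] vector = boxVector(buf2, 6, 7, 1);
        System.out.println("vector length: " + vector.length);
        System.out.println("confidence: " + vector[CONFIDENCE]);
        System.out.println("class scores: " + classScores(vector).length);
    }
}
```

Keeping the index positions in named constants makes it easier to adjust the code if the network you use orders its outputs differently.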

These arrays are packed into a `HashMap` and passed to the TensorFlow Lite interpreter.

//...
HashMap<Integer, Object> outputBuffers = new HashMap<Integer, Object>();
outputBuffers.put(0, buf0);
outputBuffers.put(1, buf1);
outputBuffers.put(2, buf2);

tfLiteInterpreter.runForMultipleInputsOutputs(new Object[]{tImage.getBuffer()}, outputBuffers);

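The line that executes the YOLO neural network is the call to `runForMultipleInputsOutputs()`. When there is only one input and one output, a function named `run()` is used instead. The results are stored in the arrays passed in the second argument.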
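Once the interpreter has filled the output arrays, a first step is often to find which grid cell and box hold a confident match. The following plain-Java sketch scans one output array for the box with the highest confidence value. The class `OutputScan`, the method name, and the threshold are hypothetical, and the array contents in `main` are made-up stand-ins for real interpreter output.

```java
// Illustration only: OutputScan is a hypothetical helper, and the data is made up.
final class OutputScan {
    static final int CONFIDENCE = 4; // fifth element of each box vector

    /**
     * Returns {row, col, box} for the box with the highest confidence,
     * or null when no box reaches the threshold.
     */
    static int[] mostConfidentBox(float[][][][][] buf, float threshold) {
        float[][][][] grid = buf[0]; // single image in the batch
        int[] best = null;
        float bestConfidence = 0f;
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                for (int box = 0; box < grid[row][col].length; box++) {
                    float c = grid[row][col][box][CONFIDENCE];
                    if (c >= threshold && (best == null || c > bestConfidence)) {
                        bestConfidence = c;
                        best = new int[] {row, col, box};
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][][][][] buf2 = new float[1][13][13][3][85];
        buf2[0][2][3][1][CONFIDENCE] = 0.6f;
        buf2[0][10][4][2][CONFIDENCE] = 0.9f;

        int[] best = mostConfidentBox(buf2, 0.5f);
        System.out.println(best[0] + ", " + best[1] + ", " + best[2]);
    }
}
```

The same method works for all three output arrays, because it reads the grid dimensions from the array itself rather than assuming 13 rows and columns.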

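The network runs and produces output, but for that output to be useful or meaningful we need to know how to interpret it. For testing, I used a single test image.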

Digging into one of the output arrays, I get a series of numbers. Let's examine the first four.

00: 0.32  01: 0.46   02: 0.71   03: 0.46

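The first four numbers are the X and Y coordinates of a match, followed by its width and height. These values are scaled from -1 to 1. They must be adjusted to the range 0 to 416 to convert them to pixel dimensions.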
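One way to perform that adjustment is a simple linear mapping from the range -1 to 1 onto the range 0 to 416. The sketch below assumes that mapping; the text only states the two ranges, so treat the formula, the class `CoordinateScale`, and the method `toPixels` as assumptions to check against the documentation of the network you use.

```java
// Illustration only: assumes a linear mapping from [-1, 1] to [0, 416].
final class CoordinateScale {
    static final float IMAGE_SIZE = 416f;

    /** Maps a value in [-1, 1] linearly onto [0, IMAGE_SIZE]. */
    static float toPixels(float value) {
        return (value + 1f) / 2f * IMAGE_SIZE;
    }

    public static void main(String[] args) {
        // The four values shown above: x, y, width, height.
        float[] box = {0.32f, 0.46f, 0.71f, 0.46f};
        for (float v : box) {
            System.out.println(v + " -> " + toPixels(v));
        }
    }
}
```

Under this assumption, -1 maps to 0, 0 maps to 208, and 1 maps to 416.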

Most of the remaining values in the array are 0.0. In the result I am looking at, a nonzero value appears at position 19.

12: 0.00   13: 0.00   14: 0.00   15: 0.00
16: 0.00   17: 0.00   18: 0.00   19: 0.91
20: 0.00   21: 0.00   22: 0.00   23: 0.00

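The full array has 85 elements, but in this case the remaining values are also zero and have been omitted. The values from position 5 onward are the confidence ratings for each class of item that this neural network can recognize. Subtracting 5 from a value's position gives the index of the object class it refers to. To match this value to an object, we look at position 14 of the class list (19 - 5 = 14).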

00: person        01: bicycle       02: car          03: motorbike
04: airplane      05: bus           06: train        07: truck
08: boat          09: traffic light 10: fire hydrant 11: stop sign
12: parking meter 13: bench         14: bird         15: cat
16: dog           17: horse         18: sheep

In this case, the program has recognized a bird.
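The lookup just described can be sketched in plain Java: find the position of the highest class score, subtract 5, and use the result as an index into the class list. The class `ClassLookup` and its methods are hypothetical names, and the label array contains only the 19 classes listed above rather than the full set of 80.

```java
// Illustration only: ClassLookup is a hypothetical helper with a partial label list.
final class ClassLookup {
    static final int FIRST_CLASS = 5; // class scores start at position 5

    static final String[] LABELS = {
        "person", "bicycle", "car", "motorbike",
        "airplane", "bus", "train", "truck",
        "boat", "traffic light", "fire hydrant", "stop sign",
        "parking meter", "bench", "bird", "cat",
        "dog", "horse", "sheep"
    };

    /** Returns the class index (position minus 5) with the highest score. */
    static int bestClassIndex(float[] vector) {
        int bestPosition = FIRST_CLASS;
        for (int i = FIRST_CLASS + 1; i < vector.length; i++) {
            if (vector[i] > vector[bestPosition]) {
                bestPosition = i;
            }
        }
        return bestPosition - FIRST_CLASS;
    }

    /** Returns the label for a class index, or a placeholder if it is not listed. */
    static String labelFor(int classIndex) {
        if (classIndex >= 0 && classIndex < LABELS.length) {
            return LABELS[classIndex];
        }
        return "class " + classIndex;
    }

    public static void main(String[] args) {
        float[] vector = new float[85];
        vector[19] = 0.91f; // the nonzero value found at position 19
        int index = bestClassIndex(vector);
        System.out.println(index + ": " + labelFor(index));
    }
}
```

With the value 0.91 at position 19, the class index is 14 and the label is "bird", matching the result above.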

Next Steps

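Now that we have successfully run the model on an image, it is time to do something interesting with the model's output. Continue to the next article to learn how to interpret these results and create visualizations for them.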