AI Image Classification on iOS with ResNet

Jarek Szczegielniak

August 28, 2020

CPOL

In the previous article, we converted a ResNet model to the Core ML format. In this article, we use it in a simple iOS application.

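Introduction

Deep neural networks excel at tasks such as image classification. Results that, a decade ago, would have required millions of dollars and an entire research team are now within reach of anyone with a half-decent GPU. However, deep neural networks have a downside. They can be very large and slow, so they don't always run well on mobile devices. Fortunately, Core ML offers a solution: it enables you to create slim models that are suited to running on iOS devices.

In this article series, we'll show you how to use Core ML in two ways. First, you'll learn how to convert a pre-trained image classifier model to Core ML and use it in an iOS app. Then, you'll train your own machine learning (ML) model and use it to build a Not Hotdog app, just like the one you might have seen in HBO's Silicon Valley.

In the previous article, we converted a ResNet model to the Core ML format. Now we will use it in a simple iOS application.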

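Setting Up Your Sample Application

To focus on the main task at hand, showcasing the use of the converted ResNet model, we'll "borrow" the sample image classification app available on the Apple developer site. When you open the downloaded app project in Xcode, a short and to-the-point description is displayed.

Take a look: this description may answer quite a few of your questions. To run the sample app on an iOS device, you need to go through the usual steps of setting the team and a unique bundle identifier. We recommend running the app on a real device so that you can use the device's camera.

The sample application has three main methods, all in ImageClassificationViewController, that handle the ML processing.

Setting Up the Model

The model is set up and assigned to the lazily initialized classificationRequest variable: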

    lazy var classificationRequest: VNCoreMLRequest = {
        do {
            /*
             Use the Swift class `MobileNet` Core ML generates from the model.
             To use a different Core ML classifier model, add it to the project
             and replace `MobileNet` with that model's generated Swift class.
             */
            let model = try VNCoreMLModel(for: MobileNet().model)
            
            let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
                self?.processClassifications(for: request, error: error)
            })
            request.imageCropAndScaleOption = .centerCrop
            return request
        } catch {
            fatalError("Failed to load Vision ML model: \(error)")
        }
    }()

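The most important line in the code snippet above is the model assignment (let model = (…)). In many cases, this is the only line you need to update when switching to a different model.

Note the VN prefix in the class names. It indicates that these classes are part of the Vision framework. This framework provides a high-level API for computer vision tasks such as face and body detection, rectangle detection, body and hand pose detection, text detection, and more. Besides these high-level APIs, which internally use models created by Apple, the Vision framework also exposes an API that comes in handy when you use custom Core ML models for ML image analysis.

While you can use Core ML directly, the Vision layer relieves you of trivial tasks such as image scaling and cropping, and color space and orientation conversion.

In our sample application, a single line of code configures the required cropping and scaling:

request.imageCropAndScaleOption = .centerCrop

Each time the model completes a classification, the processClassifications method is called to update the UI accordingly.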

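Processing Classification Requests in Your Application

The next method, updateClassifications, is called by other application components to start image classification.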

    func updateClassifications(for image: UIImage) {
        classificationLabel.text = "Classifying..."
        
        let orientation = CGImagePropertyOrientation(image.imageOrientation)
        guard let ciImage = CIImage(image: image) else { fatalError("Unable to create \(CIImage.self) from \(image).") }
        
        DispatchQueue.global(qos: .userInitiated).async {
            let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation)
            do {
                try handler.perform([self.classificationRequest])
            } catch {
                /*
                 This handler catches general image processing errors. The `classificationRequest`'s
                 completion handler `processClassifications(_:error:)` catches errors specific
                 to processing that request.
                 */
                print("Failed to perform classification.\n\(error.localizedDescription)")
            }
        }
    }

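This method accepts a single argument, image, and performs the previously configured classificationRequest on a background queue, so the main (UI) thread is not blocked while the model runs.

Displaying the Classification Results

The last "main" method is responsible for updating the UI with the classification results.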

    func processClassifications(for request: VNRequest, error: Error?) {
        DispatchQueue.main.async {
            guard let results = request.results else {
                self.classificationLabel.text = "Unable to classify image.\n\(error!.localizedDescription)"
                return
            }
            // The `results` will always be `VNClassificationObservation`s, as specified by the Core ML model in this project.
            let classifications = results as! [VNClassificationObservation]
        
            if classifications.isEmpty {
                self.classificationLabel.text = "Nothing recognized."
            } else {
                // Display top classifications ranked by confidence in the UI.
                let topClassifications = classifications.prefix(2)
                let descriptions = topClassifications.map { classification in
                    // Formats the classification for display; e.g. "(0.37) cliff, drop, drop-off".
                    return String(format: "  (%.2f) %@", classification.confidence, classification.identifier)
                }
                self.classificationLabel.text = "Classification:\n" + descriptions.joined(separator: "\n")
            }
        }
    }

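This method displays the two predicted labels with the highest model confidence (let topClassifications = classifications.prefix(2)).

The remaining methods handle the camera and the captured photos. They are not ML-specific.

Inspecting the MobileNet Model

If you click the MobileNet.mlmodel file in Xcode's Project navigator, you can inspect the model details.

Besides the input and output definitions, a good deal of metadata is provided: the author, a detailed description, and the license.

Adding the Model to the Application

Now it is time to add our converted ResNet model to the project. The easiest way is to drag it from Finder and drop it into Xcode's Project navigator. Keep in mind that this only links the model to the application; the model is not physically copied into the project folder. If you want to keep the new model together with the rest of the application, you need to copy it there manually before linking it.

After this step, you can view the ResNet model description.

In our case, only the name, type, size, inputs, and outputs are specified. If you plan to distribute your model, you should consider filling in the remaining metadata fields with meaningful information. This can be done with the coremltools Python library.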

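Running Your App with the Converted ResNet Model

To use the converted model that you dropped into the Xcode project, we need to change one line of code in the ImageClassificationViewController.swift file.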
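As a rough sketch, the changed line might look like the following. It assumes the converted file is named ResNet.mlmodel, so that Xcode generates a Swift class called `ResNet`; that class name is an assumption, not something the sample project defines, so substitute whatever class Xcode generates for your own model file.

```swift
import CoreML
import Vision

// Inside the `classificationRequest` initializer of ImageClassificationViewController.
// Before (sample app default):
//     let model = try VNCoreMLModel(for: MobileNet().model)
// After (ASSUMPTION: Xcode generated a class named `ResNet` from ResNet.mlmodel):
let model = try VNCoreMLModel(for: ResNet().model)
```

The rest of the request setup, including the completion handler and `imageCropAndScaleOption`, can stay as it is.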

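Because we selected "13" as the minimum iOS version during conversion, you need to change the app's deployment target setting accordingly.

With the above changes in place, you can immediately run predictions with the ResNet model.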

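There is one noticeable difference between the MobileNet and ResNet models: MobileNet returns labels with confidence probabilities (thanks to its softmax layer), while ResNet returns "raw", unscaled neural network outputs. If needed, this can be addressed either by adding a custom layer to the ResNet model or by computing softmax on the returned results in the application.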
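The second option, computing softmax in the application, can be sketched in plain Swift as shown below. This is a minimal illustration rather than code from the sample app: the `softmax` function and the `rawScores` values are hypothetical names and data introduced here for demonstration only.

```swift
import Foundation

/// Converts raw, unscaled scores (logits) into probabilities that sum to 1.
/// Subtracting the maximum score first keeps `exp` from overflowing on large inputs.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }  // empty input, empty output
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

// Hypothetical raw scores for three labels (made-up values for illustration).
let rawScores: [Float] = [2.0, 1.0, 0.1]
let probabilities = softmax(rawScores)
print(probabilities)
```

In `processClassifications`, one way to apply this would be to pass `classifications.map { $0.confidence }` through `softmax` before formatting the labels, assuming the raw scores arrive in the `confidence` field. Note that softmax has to be computed over the scores of all classes, not only the top two that are displayed.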

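Summary

We now have a sample application that runs our converted ResNet image classification model without problems. It shows that iOS 13 devices can successfully run not only slimmed-down "mobile" ML models but also the original (large) ones.

It looks like we can use any image classification model (including converted ones) in an iOS app. To reach the final goal of this series, all we need now is a model that can detect hot dogs. Both the MobileNet and ResNet models can already detect hot dogs, but the interesting task here is to understand how they do it. In the next article, we will start preparing data for this new custom model, which we will later train with the Create ML framework.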